Re: unwanted (bogus) characters in marc

From: Ere Maijala <ere.maijala_at_nyob>
Date: Fri, 8 Oct 2010 09:31:04 +0300
To: CODE4LIB_at_LISTSERV.ND.EDU

On 7.10.2010 15:17, Thomas Krichel wrote:
>    Ere Maijala writes
>
>> # Fix non-UTF-8 characters with two highest bits set (we assume they
>> are actually ISO-8859-1)
>
>    What about
>
> use Encode::Guess qw/latin-1/;
> $decoded=decode("Guess", $dodgy_input);
>
>    $decoded then should be a utf-8 string with utf8 flag on.

Would that work for input that is predominantly valid UTF-8 with some
"mistakes" thrown in?

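For context, decode("Guess", ...) picks a single encoding for the whole
string, so a mixed input cannot come out fully right whichever encoding
is chosen. A minimal sketch of the per-byte fallback mentioned in the
quoted comment above follows. It assumes the stray bytes really are
ISO-8859-1; decode_mostly_utf8 is a hypothetical helper name, and this
is not necessarily how the original script does it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode bytes that are mostly UTF-8, treating any
# byte that does not fit a valid UTF-8 sequence as ISO-8859-1 (assumption).
sub decode_mostly_utf8 {
    my ($bytes) = @_;
    my $out = '';
    while (length $bytes) {
        # FB_QUIET decodes up to the first malformed byte, returns that
        # part, and leaves the unprocessed remainder in $bytes.
        $out .= decode('UTF-8', $bytes, Encode::FB_QUIET);
        # Whatever is now at the front is not valid UTF-8: take one
        # byte as ISO-8859-1 and carry on with the rest.
        $out .= decode('ISO-8859-1', substr($bytes, 0, 1, ''))
            if length $bytes;
    }
    return $out;
}

# "a-umlaut" as proper UTF-8 (C3 A4), then a stray ISO-8859-1
# "e-acute" (E9) at the end of the string.
my $decoded = decode_mostly_utf8("\xC3\xA4 caf\xE9");
```

Here $decoded should hold U+00E4, " caf" and U+00E9 as one character
string. The obvious weakness is that a pair of ISO-8859-1 bytes that
happens to form a valid UTF-8 sequence will be decoded as UTF-8.
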
--Ere
Received on Fri Oct 08 2010 - 02:32:13 EDT