Re: Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records

From: Ed Summers <ehs_at_nyob>
Date: Thu, 8 Mar 2012 15:18:59 -0500
To: CODE4LIB_at_LISTSERV.ND.EDU

Hi Terry,

On Thu, Mar 8, 2012 at 2:36 PM, Reese, Terry
<terry.reese_at_oregonstate.edu> wrote:
> This is one of the reasons you really can't trust the information found in position 9.  It is also why, when I wrote MarcEdit, I used a mixed process for working with data and determining the character set: a process that reads this byte and takes the information under advisement, but in the end treats it as a suggestion and as one part of a larger heuristic analysis of the record data to determine whether the information is in UTF-8 or not.  Fortunately, determining whether a set of data is in UTF-8 or something else is a fairly easy process.  Determining what the something else is is much more difficult, but generally not necessary.

Can you describe in a bit more detail how MarcEdit sniffs the record
to determine the encoding? This has come up enough times w/ pymarc to
make it worth implementing.
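
For concreteness, the kind of check described above might look roughly
like the Python below. This is only a sketch, not MarcEdit's actual logic
and not anything that exists in pymarc: the function name
`sniff_encoding` is made up, and it assumes the only two candidates are
UTF-8 and MARC-8. It treats leader position 9 as a hint ('a' for Unicode,
blank for MARC-8) and lets a strict UTF-8 decode of the raw bytes overrule
it.

```python
def sniff_encoding(record: bytes) -> str:
    """Guess 'utf-8' or 'marc-8' for one raw MARC 21 record.

    Hypothetical helper: assumes the data is one of those two encodings.
    """
    # Leader position 9: 'a' declares UCS/Unicode, blank declares MARC-8.
    # This is only a hint, since mislabelled records are common.
    declared = "utf-8" if record[9:10] == b"a" else "marc-8"

    # Pure ASCII is valid under both encodings, so there is nothing to
    # sniff and the leader's claim is the best information available.
    if all(byte < 0x80 for byte in record):
        return declared

    # Non-ASCII bytes are present. MARC-8 puts a combining diacritic
    # (a high byte) before a plain ASCII letter, which is almost never
    # a well-formed UTF-8 sequence, so a strict decode usually fails.
    try:
        record.decode("utf-8")
    except UnicodeDecodeError:
        return "marc-8"
    return "utf-8"
```

A short MARC-8 string can occasionally happen to be valid UTF-8, so the
answer is a guess rather than a guarantee. The check also stops at "not
UTF-8": it makes no attempt to work out what the something else is.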

//Ed