Re: Fred 2.0 files was RE: Browsing percentages / analytics

From: Michael Kreyche <mkreyche_at_nyob>
Date: Tue, 12 Feb 2008 10:59:35 -0500
To: NGC4LIB_at_listserv.nd.edu

On Mon, 11 Feb 2008 16:42:33 -0600, Bryan Baldus
<bryan.baldus_at_QUALITY-BOOKS.COM> wrote:

> On Monday, February 11, 2008 3:54 PM, Simon Spero wrote:
>>For LC authorities and bibliographic records from 12/2006, see
>>http://www.ibiblio.org/fred2.0/wordpress/?page_id=10

>I was able to get MarcEdit to successfully parse the records into something
>resembling a MARC format record. However, for records with certain
>diacritics (macrons, at least; acute/grave/tilde, probably others, seem
>fine), I get odd characters, perhaps because of the use of NFC (Composed
>Normal Form). For example, in the file headings-100-NFC--00000.xml, in NAR n
>42014288, Hirano, Ryūichi,$d 1920-$tKeijihō kenkyū, if I convert from
>MARCXML to MARC using the default settings, I get Hirano,
>Ryūichi, d1920- tKeijihō kenkyū

>What would be the best/easiest way to convert the files from their current
>format into raw MARC format, with diacritics coming out in MARC-8 format (or
>a format able to be converted to MARC-8)? Or am I doing something wrong?

Bryan,

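Try charlint.pl to convert the files from NFC to NFD, so that each
precomposed character (such as ū) becomes a base letter followed by a
combining mark: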

http://www.w3.org/International/charlint/
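
If Perl is not handy, the same NFC-to-NFD step can be sketched in Python
with the standard unicodedata module. This is only an illustration of the
normalization itself, not a reimplementation of charlint.pl; the function
name to_nfd is made up for the example, and it assumes the text has already
been read in as UTF-8.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Return text in Unicode Normalization Form D (canonical decomposition).

    Hypothetical helper for illustration: precomposed characters are split
    into a base letter followed by one or more combining marks.
    """
    return unicodedata.normalize("NFD", text)


# The heading from the quoted message, with the macron letters precomposed
# (U+016B for u-macron, U+014D for o-macron).
heading = "Hirano, Ry\u016bichi, 1920- Keijih\u014d kenky\u016b"
decomposed = to_nfd(heading)

# Show each macron as a separate combining character (U+0304) after its base.
for ch in decomposed:
    if unicodedata.combining(ch):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Applying to_nfd to the whole text of a MARCXML file should leave the markup
alone, since the tags themselves are plain ASCII, but check the result on a
small sample first.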

I imagine MarcEdit can do the rest of the job from there. An alternative
that should also work fine is MARC4J:

http://marc4j.tigris.org/

Mike
--
Michael Kreyche
Systems Librarian / Associate Professor
Libraries and Media Services
Kent State University
330-672-1918
Received on Tue Feb 12 2008 - 10:56:24 EST