On Mon, 11 Feb 2008 16:42:33 -0600, Bryan Baldus
<bryan.baldus_at_QUALITY-BOOKS.COM> wrote:
> On Monday, February 11, 2008 3:54 PM, Simon Spero wrote:
>>For LC authorities and bibliographic records from 12/2006, see
>>http://www.ibiblio.org/fred2.0/wordpress/?page_id=10
>I was able to get MarcEdit to successfully parse the records into something
>resembling a MARC format record. However, for records with certain
>diacritics (macrons, at least; acute/grave/tilde, probably others, seem
>fine), I get odd characters, perhaps because of the use of NFC (Composed
>Normal Form). For example, in the file headings-100-NFC--00000.xml, in NAR n
>42014288, Hirano, Ryūichi,$d 1920-$tKeijihō kenkyū, if I convert from
>MARCXML to MARC using the default settings, I get Hirano,
>Ryūichi, d1920- tKeijihŠkenkyū
>What would be the best/easiest way to convert the files from their current
>format into raw MARC format, with diacritics coming out in MARC-8 format (or
>a format able to be converted to MARC-8)? Or am I doing something wrong?
Bryan,
Try charlint.pl to convert from NFC to NFD:
http://www.w3.org/International/charlint/
I imagine MarcEdit can do the rest of the job from there. An alternative
that should also work fine is MARC4J:
http://marc4j.tigris.org/
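If running charlint.pl is inconvenient, the NFC-to-NFD step can probably also be done with a few lines of Python. This is only a rough sketch of the normalization step, not a replacement for the tools above: it assumes the file is UTF-8, the function names to_nfd and convert_file are made up for illustration, and the conversion to MARC-8 is still left to MarcEdit or MARC4J. Characters written as numeric references in the XML (such as &#x16B;) would not be touched.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Return text in Unicode Normalization Form D (decomposed)."""
    return unicodedata.normalize("NFD", text)


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a UTF-8 file and write an NFD-normalized UTF-8 copy.

    Hypothetical helper: it assumes the input is UTF-8 and small
    enough to read into memory in one go.
    """
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(to_nfd(text))


if __name__ == "__main__":
    composed = "Ry\u016bichi"  # u-with-macron as one code point (U+016B)
    decomposed = to_nfd(composed)
    # After NFD the macron is a separate combining character (U+0304)
    # that follows the plain letter "u".
    print([hex(ord(ch)) for ch in decomposed])
```

Something like convert_file("headings-100-NFC--00000.xml", "headings-100-NFD.xml") would then give a decomposed copy to feed to the MARCXML-to-MARC conversion; the output file name here is just an example.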
Mike
--
Michael Kreyche
Systems Librarian / Associate Professor
Libraries and Media Services
Kent State University
330-672-1918
Received on Tue Feb 12 2008 - 10:56:24 EST