Fred 2.0 files (was RE: Browsing percentages / analytics)

From: Bryan Baldus <bryan.baldus_at_nyob>
Date: Mon, 11 Feb 2008 16:42:33 -0600
To: NGC4LIB_at_listserv.nd.edu

 On Monday, February 11, 2008 3:54 PM, Simon Spero wrote:
>For LC authorities and bibliographic records from 12/2006, see
http://www.ibiblio.org/fred2.0/wordpress/?page_id=10

When I attempt to open one of these files using MarcEdit, in order to turn
the records into raw MARC format, I receive an error message, -99. According
to an e-mail I received from the author of MarcEdit, this is because the
files lack an XML header. After changing the first line of one of the files
from
<collection>
to:
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
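The manual edit described above can be scripted when there are many files. This is only a minimal sketch: the function name add_marcxml_header is hypothetical, and it assumes each file begins with a bare <collection> tag, as the file described here does.

```python
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'
MARC_NS = "http://www.loc.gov/MARC21/slim"


def add_marcxml_header(text: str) -> str:
    """Return text with an XML declaration and a namespaced <collection> root.

    Hypothetical helper: assumes the input starts with a bare <collection>
    tag and has no XML declaration yet.
    """
    if text.lstrip().startswith("<?xml"):
        return text  # already has a declaration; leave it untouched
    # Replace only the first bare <collection> tag with the namespaced form.
    patched = text.replace("<collection>", f'<collection xmlns="{MARC_NS}">', 1)
    return XML_DECLARATION + "\n" + patched


sample = "<collection>\n<record></record>\n</collection>\n"
fixed = add_marcxml_header(sample)
print(fixed)
```

For real files, the text would be read and written with encoding="utf-8" so that the diacritics pass through unchanged.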

I was able to get MarcEdit to parse the records into something
resembling a MARC format record. However, for records with certain
diacritics (macrons, at least; acute, grave, tilde, and probably others seem
fine), I get odd characters, perhaps because of the use of NFC (Unicode
Normalization Form C, i.e. precomposed characters). For example, in the file
headings-100-NFC--00000.xml, in NAR n
42014288, Hirano, Ryūichi,$d 1920-$tKeijihō kenkyū, if I convert from
MARCXML to MARC using the default settings, I get Hirano,
RyÅ«ichi, d1920- tKeijihÅ kenkyÅ«
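One possible reading of the odd characters, offered as an assumption rather than a diagnosis of MarcEdit: the output may be correct UTF-8 that is being displayed as a single-byte encoding such as Latin-1. The sketch below reproduces the garbled name that way.

```python
# "Ryūichi" with the precomposed (NFC) character U+016B.
original = "Ry\u016bichi"

# U+016B is encoded in UTF-8 as the two bytes C5 AB.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as Latin-1 turns C5 into "Å" and AB into "«".
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes)
print(misread)

# The misreading is reversible, so no data has been lost at this point.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == original)
```

If this is what is happening, the record itself may be intact and only the viewer's encoding setting is off.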

If I select the option to Translate to MARC-8, I get:
Hirano, Ry&#x16B;ichi, d1920- tKeijih&#x14D; kenky&#x16B;
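The &#x16B; and &#x14D; sequences are numeric character references. A small sketch, using only the Python standard library, shows which characters they stand for. That MarcEdit falls back to such references when it finds no MARC-8 equivalent for a precomposed character is a guess, not something confirmed here.

```python
import html
import unicodedata

# Author portion as it appears after the "Translate to MARC-8" conversion.
converted = "Hirano, Ry&#x16B;ichi"

# html.unescape resolves numeric character references such as &#x16B;.
decoded = html.unescape(converted)

# List every non-ASCII character with its code point and Unicode name.
non_ascii = [ch for ch in decoded if ord(ch) > 127]
for ch in non_ascii:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```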

For comparison, here is the author portion of the same record, converted
from raw MARC (saved out of our cataloging software, ITS for Windows, based
on the record in LC's database) into MarcEdit's mnemonic format:
Hirano, Ry{macr}uichi,$d1920-
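The mnemonic form puts the macron before the "u", which matches MARC-8, where a combining diacritic precedes its base letter. NFC data instead holds one precomposed character. The sketch below decomposes the NFC text and moves each combining mark in front of its base letter. The function name marks_before_base is hypothetical, and the result is still Unicode, not MARC-8 bytes, so it illustrates the ordering difference only.

```python
import unicodedata


def marks_before_base(text: str) -> str:
    """Decompose text (NFD) and place combining marks before their base letter.

    Hypothetical helper illustrating MARC-8 ordering; output is still Unicode.
    """
    clusters = []  # list of (base character, [combining marks])
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and clusters:
            clusters[-1][1].append(ch)  # attach mark to the preceding base
        else:
            clusters.append((ch, []))
    return "".join("".join(marks) + base for base, marks in clusters)


nfc_name = "Ry\u016bichi"  # precomposed u-with-macron, as in the NFC files
decomposed = unicodedata.normalize("NFD", nfc_name)
reordered = marks_before_base(nfc_name)

print([f"U+{ord(c):04X}" for c in decomposed])
print([f"U+{ord(c):04X}" for c in reordered])
```

A converter that maps U+0304 (combining macron) to its MARC-8 equivalent could then work character by character on the reordered string.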

What would be the best/easiest way to convert the files from their current
format into raw MARC format, with diacritics coming out in MARC-8 format (or
a format able to be converted to MARC-8)? Or am I doing something wrong?

Thank you for your assistance,

Bryan Baldus
Cataloger
Quality Books Inc.
1-800-323-4241x402
bryan.baldus_at_quality-books.com
eijabb_at_cpan.org
http://home.inwave.com/eija