Re: Linefeed Trick

From: Hickey,Thom <hickey_at_nyob>
Date: Tue, 25 May 2004 15:07:40 -0400
To: CODE4LIB_at_LISTSERV.ND.EDU

Actually, the Python code is all home-grown.  Its main virtue is speed,
since we regularly pass 50+ million records through it.  It avoids
user-defined classes for just about everything but the main record class.
I've generalized it a bit lately to handle OAI-harvested DC records, etc.

--Th

-----Original Message-----
From: Ed Summers [mailto:ehs_at_pobox.com]
Sent: Tuesday, May 25, 2004 2:26 PM
To: CODE4LIB_at_LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Linefeed Trick

On Tue, May 25, 2004 at 02:08:17PM -0400, Hickey,Thom wrote:
> Maybe others are doing this (or is everyone using XML?), but it's new to
> us here.  Maybe this would even work with MARC-XML if you restricted
> linefeeds to the end of record.
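A minimal Python sketch of the linefeed trick, assuming it means writing a linefeed after each MARC 21 record terminator (0x1D) so that line-oriented tools such as grep see exactly one record per line. It also assumes the records contain no linefeeds of their own. Because the terminator is a single byte and cannot occur inside a UTF-8 multibyte sequence, each chunk can be rewritten independently without splitting a record or a character. The function name `add_linefeeds` is hypothetical.

```python
import io

RECORD_TERMINATOR = b"\x1d"  # MARC 21 end-of-record character


def add_linefeeds(src, dst, chunk_size=1 << 20):
    # Copy binary stream src to dst, adding b"\n" after every record
    # terminator. A one-byte pattern can never straddle two chunks, so a
    # plain per-chunk replace is safe and keeps memory use bounded.
    # Assumes the input records contain no linefeeds of their own.
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk.replace(RECORD_TERMINATOR, RECORD_TERMINATOR + b"\n"))


if __name__ == "__main__":
    # Two toy "records" (not valid MARC), just to show the effect.
    out = io.BytesIO()
    add_linefeeds(io.BytesIO(b"rec1\x1e\x1drec2\x1e\x1d"), out)
    print(out.getvalue())  # b'rec1\x1e\x1d\nrec2\x1e\x1d\n'
```

For real files, open the source with "rb" and the destination with "wb" so Python does no decoding or newline translation.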

This is how MARC is read by MARC::File::USMARC in the MARC::Record CPAN
module :)

> On my workstation, grep can plow through 50 million Unicode MARC-21
> records in less than 15 minutes.  The best time our C software can do is
> more than half an hour and our Python code could take several hours.
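Once every record in a file ends with a linefeed, Python's ordinary line iteration yields one record at a time, so a grep-style filter takes only a few lines. This is a sketch, not a claim that Python can match grep's speed; `grep_records` is a hypothetical name, and the pattern is a bytes regular expression applied to the raw record.

```python
import io
import re


def grep_records(lines, pattern):
    # Yield each record (one per line) that contains a match for `pattern`.
    # `lines` is any iterable of byte strings, e.g. a file opened with "rb";
    # binary mode skips decoding and newline translation.
    rx = re.compile(pattern)
    for line in lines:
        if rx.search(line):
            yield line


if __name__ == "__main__":
    # Toy one-record-per-line data, not real MARC.
    data = io.BytesIO(b"aaa\x1d\nbbb\x1d\nabc\x1d\n")
    for rec in grep_records(data, rb"b"):
        print(rec)
```

With a real file, the call would look like `grep_records(f, rb"...")` inside `with open(path, "rb") as f:`, where `path` stands for whatever file holds the records.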

Cool! I've been working off and on on a Python port for MARC::Record,
and wasn't able to find an equivalent to $/ in Python. But I'm a Python
newbie, so perhaps I overlooked something?
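Python has no direct counterpart to Perl's `$/`: in text mode the `newline` argument of `open()` accepts only `None`, `''`, `'\n'`, `'\r'` and `'\r\n'`, and binary files always split lines on `b'\n'`. A common workaround is a small generator that buffers chunks and splits on the record terminator itself. The sketch below is one such generator, not part of any library; `iter_records` is a hypothetical name. Like Perl's readline with `$/` set, each yielded record keeps its terminator, and a trailing unterminated fragment comes back as a final record.

```python
import io


def iter_records(f, terminator=b"\x1d", chunk_size=1 << 16):
    # Rough equivalent of Perl's: local $/ = "\x1D"; while (my $rec = <FH>) {...}
    # f must be a binary file object. Records keep their terminator.
    if not terminator:
        raise ValueError("terminator must be non-empty")
    buf = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        # Keep the unconsumed tail in buf, so a multi-byte terminator split
        # across two reads is still found once the next chunk arrives.
        buf += chunk
        start = 0
        while True:
            end = buf.find(terminator, start)
            if end == -1:
                break
            end += len(terminator)
            yield buf[start:end]
            start = end
        buf = buf[start:]
    if buf:  # final record with no terminator, as Perl's readline returns it
        yield buf


if __name__ == "__main__":
    # In practice: with open(path, "rb") as f: for rec in iter_records(f): ...
    for rec in iter_records(io.BytesIO(b"rec1\x1drec2\x1d")):
        print(rec)
```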

//Ed