I've talked to the OL folks about exporting a set of MARC records for
all the items that have an open digital version. (Exporting OL records
for the set is relatively simple.) The difficulty is making sure there
is a good MARC record for each of them. It should be relatively easy
for any records with an LCCN, or perhaps an ISBN, but more difficult
for others.
If you've got the item, do you really need a full MARC record with it?
What would work for folks who want to link from OL digital works to
their own catalog records? An index of OL IDs and LCCNs/ISBNs/OCLC
numbers, linked to the Archive ID for the digital work?
I'd love to have a definitive idea of what would be useful.
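To make the index idea concrete, here is a rough sketch of what I have in mind. It is only an illustration: the field names, the `build_lookup` and `find` helpers, and the sample identifiers are all made up, and a real export might look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One row of the hypothetical index: an OL record and its identifiers."""
    ol_id: str
    archive_id: str
    lccns: list = field(default_factory=list)
    isbns: list = field(default_factory=list)
    oclc_numbers: list = field(default_factory=list)


def normalize_id(value):
    """Drop hyphens, spaces, and other punctuation so variants compare equal."""
    return "".join(ch for ch in value if ch.isalnum()).upper()


def build_lookup(entries):
    """Map (scheme, normalized identifier) to the entries that carry it."""
    lookup = {}
    for entry in entries:
        schemes = (
            ("lccn", entry.lccns),
            ("isbn", entry.isbns),
            ("oclc", entry.oclc_numbers),
        )
        for scheme, values in schemes:
            for value in values:
                lookup.setdefault((scheme, normalize_id(value)), []).append(entry)
    return lookup


def find(lookup, scheme, value):
    """Return every index entry that shares the given identifier."""
    return lookup.get((scheme, normalize_id(value)), [])


# Placeholder data only; these are not real OL, Archive, or ISBN values.
entries = [
    IndexEntry(
        ol_id="OL0000001M",
        archive_id="examplebook00auth",
        lccns=["00-000001"],
        isbns=["0-000-00000-0"],
    ),
]
lookup = build_lookup(entries)
hits = find(lookup, "isbn", "0000000000")
```

A library would run the identifiers from its own catalog records through `find` and get back the OL ID and Archive ID to link to. An identifier can map to more than one entry, which is why the lookup returns a list.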
kc
Quoting Eric Lease Morgan <emorgan_at_ND.EDU>:
> On Jul 1, 2010, at 11:40 AM, B.G. Sloan wrote:
>
>> Most of the book-scanning projects are focusing on digitizing works
>> in the public domain, right? And the public domain is basically
>> books published before 1923, right? So, aren't most of these
>> projects the equivalent of building a physical library collection
>> of pre-1923 books?
>
> Along the lines of what is outlined above, I have done a bit of an
> experiment to see how difficult it would be to supplement our
> physical holdings with the digital holdings of the Internet Archive.
> After all, the content there is free. Here's how:
>
> 1. Dump - Export all of your bibliographic MARC
> records to a file.
>
> 2. Parse - Extract the authors, titles, and
> other identifying information from a MARC
> record.
>
> 3. Search - Use the result of Step #2 to create
> REST-like searches of the Internet Archive
> making sure results are returned as XML (or
> some other machine-readable format).
>
> 4. Verify - Validate the search results making
> sure they correctly match the MARC. There may
> be false hits, or there may be multiple hits.
>
> 5. Download - For each Internet Archive record
> that adequately matches the MARC record,
> mirror the remote Internet Archive version of
> the data locally. The PDF as well as the plain
> text.
>
> 6. Update - For each downloaded record, update the
> MARC record with two additional URLs. One
> pointing to the Internet Archive, and another
> pointing to your local mirror.
>
> 7. Go to Step #2 - Continue the process for each
> record in your set of MARC records.
>
> 8. Reindex - Make your MARC records searchable, along
> with the full text that has been mirrored.
>
> 9. Provide services - Enable search against the index.
> Search results can point to your local physical
> copy, your local mirrored copy, as well as the
> remote (canonical) Internet Archive copy. Provide
> services against the results enabling users to do
> things like: print-on-demand, bind, do concordance
> against, generate word cloud, put on reserve, add
> to a syllabus, annotate, rank, review, graphically
> illustrate the use of frequently used n-grams, etc.
>
> As alluded to above, some of this work has been done. More
> specifically, using the MARC records from a thing called the
> "Catholic Portal", a graduate student and I did Steps #1 through #7.
> The hardest part is Step #4. The coolest part is Step #9.
>
> If we, as a profession, were to get to Step #9, then we would be
> seen as providing truly cutting edge and valuable services to our
> constituents. Step #9 represents the growth opportunity.
>
> --
> Eric Lease Morgan
> University of Notre Dame
>
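Steps #3 and #4 in Eric's list could be sketched roughly as below. This is not the code Eric and his student used. It assumes the Internet Archive's `advancedsearch.php` endpoint with its `q`, `fl[]`, `rows`, and `output` parameters (worth checking against the current API documentation), asks for JSON rather than XML, and uses a plain string-similarity threshold that I picked arbitrarily. The function names and the sample Archive identifiers are hypothetical.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def normalize_text(text):
    """Lowercase and strip punctuation so cataloging variants compare equal."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def build_search_url(title, author, rows=10):
    """Step #3: build a search URL from fields parsed out of a MARC record."""
    # Remove double quotes so they cannot break the quoted query phrases.
    title = title.replace('"', " ")
    author = author.replace('"', " ")
    query = f'title:("{title}") AND creator:("{author}") AND mediatype:texts'
    params = [
        ("q", query),
        ("fl[]", "identifier"),
        ("fl[]", "title"),
        ("fl[]", "creator"),
        ("rows", rows),
        ("output", "json"),
    ]
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def similarity(a, b):
    """Ratio between 0 and 1 of how alike two normalized strings are."""
    return SequenceMatcher(None, normalize_text(a), normalize_text(b)).ratio()


def best_similarity(wanted, found):
    """A result field may be a string or a list of strings; take the best score."""
    values = found if isinstance(found, list) else [found]
    return max((similarity(wanted, v) for v in values), default=0.0)


def classify(marc, candidates, threshold=0.9):
    """Step #4: sort search results into no match, one match, or ambiguous."""
    matches = [
        c for c in candidates
        if best_similarity(marc["title"], c.get("title", "")) >= threshold
        and best_similarity(marc["author"], c.get("creator", "")) >= threshold
    ]
    if not matches:
        return "no match", []
    if len(matches) == 1:
        return "match", matches
    return "ambiguous", matches


# Sample inputs; the identifiers are placeholders, not real Archive IDs.
marc = {"title": "Apologia pro vita sua", "author": "Newman, John Henry"}
candidates = [
    {"identifier": "example-id-1", "title": "Apologia pro vita sua.",
     "creator": "Newman, John Henry"},
    {"identifier": "example-id-2", "title": "The idea of a university",
     "creator": "Newman, John Henry"},
]
url = build_search_url(marc["title"], marc["author"])
status, matched = classify(marc, candidates)
```

Anything classified as "ambiguous" (multiple editions, multiple scans of the same edition) would still need a human or a stricter rule, which is probably why Step #4 is the hard one.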
--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Thu Jul 01 2010 - 14:47:45 EDT