FW: Reply to Library of Congress Working Group

From: Frances Dean McNamara <fdmcnama_at_nyob>
Date: Wed, 25 Jun 2008 15:34:12 -0500
To: NGC4LIB_at_LISTSERV.ND.EDU

Folks,
I meant to post this reply to my post, which came from David Williamson at LC.

Frances McNamara
University of Chicago

-----Original Message-----
From: David Williamson [mailto:dawi_at_loc.gov]
Sent: Tuesday, June 17, 2008 10:03 AM
To: Frances Dean McNamara
Subject: RE: Reply to Library of Congress Working Group

I am not a member of the list.  The message came to me from someone
who thought I would be interested.  Feel free to forward it if you
like.

DW

On Tue, 17 Jun 2008 09:18:14 -0500, Frances Dean McNamara wrote:

>Dear David,
>
>Thanks for your reply and clarifications.  Perhaps you should share this response with the list.  I think it is good that the "enrichments" like TOC are being supplied via an automated feed.  More power to you.
>
>I think the point you make about:
>
>"until the library community embraces the idea that the catalog record can have minor variations from the book in non-controlled fields, the data in ONIX records will drive many librarians right up the wall."
>
>is very apt and is a discussion and challenge that should be vetted by librarians.  Yes, the library community needs to change and if they don't they need to justify why they are spending money on those minor variations.  I think re-tooling the CIP program is what was suggested by the report, so I'm glad to hear that at some point in the future that is planned.
>
>I think you should post your message to the list, not just me.
>
>Frances McNamara
>University of Chicago
>
>-----Original Message-----
>From: David W Williamson [mailto:dawi_at_loc.gov]
>Sent: Monday, June 16, 2008 6:24 AM
>To: f-mcnamara_at_uchicago.edu
>Subject: Re: Reply to Library of Congress Working Group
>
>Dear Ms. McNamara,
>
>Your recent post on the NGC4LIB listserv about the Library of Congress Working Group made its way to me.  Your comments were specifically about the projects I work on so I felt I should respond and either answer your questions or clarify what was said in the report.
>
>Regarding the tables of contents and other links, while you may find these links problematic, many people find them incredibly useful.  Our TOC links alone generate half a million hits a month on our server.  This is by far the most popular enhancement we can make to our records.  Many libraries, however, do remove the links, and that is up to them to decide; I have no problem with that.  But the many other people we serve by including those links definitely make the project worthwhile.
>
>There has been a lot written about our TOC projects.  The last article was in ITAL, vol. 25, no. 1, 2006, http://www.lita.org/ala/lita/litapublications/ital/252006/2501mar/toc.cfm. "Enriching Traditional Cataloging for Improved Access to Information: Library of Congress Tables of Contents Projects" JOHN D. BYRUM JR. AND DAVID W. WILLIAMSON.  It explained the projects and gave cost information.  The generation of the data for the links from ONIX data is almost 100% automated, so there is little cost to LC.  The specific TOC link you cited was a special case in that the record started life as an electronic CIP record.  The cataloger was able to manipulate the TOC into the 505 field of the record using software originally developed by me to take ASCII text that the publisher supplies and convert it into MARC fields as the cataloger is creating the MARC record.  If the cataloger can do this quickly and easily with little or no fiddling with the TOC, then we add the 505 so that the natural language keywords are present in the catalog record.  Users have stated many times that they much prefer if we can get the TOC in the record so that it is searchable in the OPAC.  There was also a link to the TOC in the 856 field.  That is also useful.  That link was created from the ONIX data I receive from publishers.  The automated process I spoke of extracts the TOC, description, author bio, and/or sample text that may be present in the ONIX record for that title.  The HTML file is created, put on the server, and then the record is linked.  The process only looks for existing 856 links, it does not look to see if there is an existing 505 field.  This has proven useful because the TOC files (as well as the others) are being indexed by the search engines.  That means folks surfing around the internet may hit on one of these TOC files-- and they do.  Once they find the TOC, there is a link bringing them into the LC catalog where they can then do further searching by clicking on the links in the catalog record for the author, subjects, classification, or whatever other links are present in the OPAC display.  We've never been able before to grab someone surfing the net and bring them into the Library.  For that reason, we are allowing both the 505 and 856 links in our records, even if there is duplicate information.  I did a couple of surveys several years ago and 30% of users that were using our TOC files were finding them on the Internet through a search engine.  Another phenomenon that has been happening is that, for example, if I put up a TOC file about, say, a book on antique furniture, I have seen where an antique dealer who has that book and is selling it on his web site will link the TOC file on his web site, taking advantage of that file generated from the ONIX data received.  The project is able to help users in new ways never thought of previously.
>
>As far as the statement that our files seem crude as compared to Amazon, the data used to create those files is the exact same data that Amazon receives, LC just doesn't pretty it up.  If we are to keep the costs down to a minimum, then we cannot afford the staff time that Amazon puts in to processing the data from publishers.  It is Amazon that spends tons of money to have a data processing department to make the data look nice on their commercial web site.  LC is just making the raw data available for those who wish to use it.
>
>The other part of your posting dealt with the OCLC ONIX pilot.  I'm sorry, but I hope someone pointed out you were completely wrong in your assertion that we dismiss OCLC's creation of MARC from ONIX feeds.  To the contrary, I am looking forward to this happening at any moment.  This was exactly one of the projects I was assigned to try to accomplish-- take the ONIX data I receive, create initial MARC records from it, put them into a resource database and make them available to our staff for use in cataloging.  When I heard OCLC was working on this, my boss agreed that we should wait and see where this goes before putting any of my time into this.  I have already created software that performs a Z39.50 search against WorldCat, manipulates the records for LC needs, and then adds them into our catalog.  With the ONIX-derived records also available, that could potentially cut down the number of records we have to create from scratch for U.S. publications (initially) and eventually for foreign publications as more and more countries start using ONIX (I think about 15 are involved now).
>
>In the reply to the report, under 1.1.3.1 LC: Develop content and format guidelines for submission of ONIX data to the CIP program and require publishers participating in the program to comply with these guidelines, while we say that at the moment this isn't possible for many, many publishers, we do mention the ONIX pilot and say, "An OCLC pilot project to accept ONIX metadata from publishers, convert it into MARC, and create base-level records (equivalent to enriched IBC records) holds the possibility for making it easier for publishers who create ONIX data to apply for CIP" down the road when more publishers produce ONIX data.  In the same section we say, "LC participates in ONIX development with BIC/BISAC [that's me].  LC is participating in the OCLC Pilot Advisory Board [me again] for the pilot to accept ONIX data from publishers, convert it into MARC records, and make those records available in WorldCat. These will be basic, IBC-type records, as ONIX data are not geared towards library catalog use, but rather publishing industry use. When the pilot makes ONIX-derived records available, LC will examine them to see if they are suitable for use as the basis for CIP cataloging records."  I provided a lot of the reasoning for this because of my extensive (almost 8 years now) experience with ONIX and my participation on the BISAC Metadata Committee, the group in the U.S. responsible for developing the ONIX standard.  Many folks seem to think ONIX data will be the magic pill for bibliographic data, but until the library community embraces the idea that the catalog record can have minor variations from the book in non-controlled fields, the data in ONIX records will drive many librarians right up the wall.  I can provide many examples of where this will happen.
>
>The thing regarding ONIX that we disagree with comes in the next section, "1.1.3 Fully Automate the CIP process, 1.1.3.2 LC: Develop a mechanism to accept these data in a fully automated fashion so that the descriptive portion of the bibliographic record is created prior to cataloging." not because of the idea but "...because incompatibilities between ONIX and ECIP programming and publishers' workflows make this unworkable at this time."  Again, this relates only to the CIP program.  Retooling the CIP program would be a major undertaking, and before we go there, it's better to look at the OCLC ONIX pilot and see if we can take advantage of that now and then perhaps work on the recommendation once we see how these records will look.  With the software I have written, we can search for that ONIX-derived record and pull it in faster than creating a CIP record from scratch.  That will work for the short term and we can then see where we want to go in the longer term with ONIX data and the CIP program.
>
>There is still a lot more I could say, and I hope I have clarified the projects I am involved in.  If you would like more information about these, please feel free to contact me.
>
>David Williamson
>
>
>
>David Williamson
>Acting Team Leader, Romance Languages Team, HLCD
>Cataloging Automation Specialist
>Library of Congress
>Washington, D.C. 20540-4300
>
>202.707.5179 (voice)
>202.707.2824 (fax)

David Williamson
Cataloging Automation Specialist
Acquisitions and Bibliographic Access Directorate
Library of Congress
Washington, D.C. 20540-4300
202.707.5179 (voice)
202.707.2824 (fax)
dawi_at_loc.gov