Re: Online Catalogs: What Users and Librarians Want

From: Karen Coyle <lists_at_nyob>
Date: Thu, 23 Apr 2009 06:51:49 -0700
To: NGC4LIB_at_LISTSERV.ND.EDU
You don't need XML for that. It might be handy, but you can do it with 
just about any data format. As a matter of fact, I just got a list of 
'matching name variants' from some work being done on the Open Library 
to match names from MARC records to Wikipedia entries. Wikipedia has a 
lot of information, including full dates of birth and death, place of 
birth, titles and dates of works. Here are some of the name variants 
that were found in the MARC records, all of which an algorithm matched 
to a single name in Wikipedia. (Note: some of the differences are in 
Unicode encoding, and probably won't show up in an email message.)

    * *$a*A. C. Bhaktivedanta Swami Prabhupada*$d*1896-1977.
    * *$a*A. C. Bhaktivedanta Swami Prabhupada*$d*1896-1977
    * *$a*A.C. Bhaktivedanta Swami Prabhupada*$d*1896-1977
    * *$a*A. C. Bhaktivedanta Swami Prabhupa-da*$d*1896-1977.
    * *$a*A. C. Bhaktivedanta Swami Prabhupa-da*$d*1896-1977
    * *$a*A.C. Bhaktivedanta Swami Prabhupa-da*$d*1896-1977.
    * *$a*Bhaktivedanta, A. C.*$d*1896-1977.
    * *$a*Bhaktivedanta Swami, A. c.*$d*1896-
    * *$a*Bhaktivedanta Swami, A. C.*$d*1896-
    * *$a*Bhaktivedanta Swami, A.C.*$d*1896-
    * *$a*Bhaktivedanta Swami, A. C.*$d*1896-1977.
    * *$a*Bhaktivedanta Swami*$d*1896
    * *$a*Bhaktivedanta Swami Prabhupa-da*$d*1896-1977.


    * *$a*Athanasius*$c*Saint*$c*Patriarch of Alexandria*$d*d. 373.
    * *$a*Athanasius*$c*Saint*$c*Patriarch of Alexandria*$d*d. 373
    * *$a*Athanasius*$c*Saint*$d*295-373.
    * *$a*Athanasius*$c*Saint*$d*295-373 A.D.
    * *$a*Athanasius*$c*Saint*$d*ca. 298-373.
    * *$a*Athanasius*$c*Saint*$d*ca.298-373.
    * *$a*Athanasius*$c*Saint, Patriarch of Alexander*$d*d. 373.
    * *$a*Athanasius*$c*Saint, patriarch of Alexandria*$d*d. 373.
    * *$a*Athanasius*$c*Saint, Patriarch of Alexandria*$d*d. 373.
    * *$a*Athanasius*$c*Saint, Patriarch of Alexandria*$d*d. 373
    * *$a*Athanasius, Saint*$d*295-373 A.D.

You can see more here: http://edwardbetts.com/ol/marc_author_variants.html
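
To illustrate the kind of matching involved, here is a minimal Python sketch
that reduces a name heading to a crude match key. It is not the algorithm used
for the Open Library work, which isn't described here; the function name
name_key and the normalization steps (strip diacritics, lowercase, drop
punctuation, ignore word order) are assumptions made for illustration only.

```python
import re
import unicodedata


def name_key(heading: str) -> str:
    """Reduce a name heading to a crude match key (hypothetical helper)."""
    # Decompose accented characters, then drop the combining marks,
    # so that e.g. an a-with-macron compares equal to a plain "a".
    text = unicodedata.normalize("NFKD", heading)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, then replace anything that is not a letter, digit,
    # or space with a space ("A.C." and "A. C." become the same tokens).
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    # Sort the tokens so that inverted and direct order give the same key.
    return " ".join(sorted(text.split()))
```

With this key, "A. C. Bhaktivedanta Swami Prabhupada" and "A.C. Bhaktivedanta
Swami Prabhupada" compare equal, but "Bhaktivedanta Swami, A. C." still does
not match them, because it lacks a token. Closing that gap is where the dates
and the other information in Wikipedia would have to come in.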

I'd like to see a link between the LCCN in LC names and Wikipedia pages...

kc

Weinheimer Jim wrote:
> Deborah Fritz wrote:
>   
>>  Weinheimer Jim wrote:
>>  
>>  > I don't think FRBR is necessary. XML processing can eliminate
>>  > duplicates in all kinds of ways, so I still believe that the
>>  > main thing is to dump the ISO2709 format ASAP, change to some
>>  > kind of XML format, be it MARCXML or MODS, switch to URIs the
>>  > moment LC (finally) puts everything online, then share our
>>  > records widely (!!) in all different kinds of formats.
>>  
>>  Jim, can you clarify how "XML processing can eliminate duplicates"?
>>     
>
> Actually, it's XSLT processing that can eliminate duplicates. XML can do very little on its own; you need the style sheets that will transform the XML file into something more useful, such as an HTML page or PDF document. There are other XML tools as well, such as XQuery, which I understand less.
>
> There are all kinds of things you can do with XSLT such as sorting, transforming, etc. in all sorts of ways that I think will take some time for people to fully appreciate. But one thing it can do is detect duplicate values and display them as you want. It can also perform fuzzy value detection. I understand the principle quite well, but haven't implemented it in a long time. For a short, semi-technical discussion, see: http://www.xml.com/pub/a/2002/10/02/tr.html
>
> Therefore, you can write an XSLT that says: if records have the same 245abc, 250, 260, 300a, 4xx/8xx (don't know how this would work today with the new series treatments!), merge them into one record. You could also make it "fuzzy" with e.g. the 260.
>
> Or we could merge based on completely different criteria and find out... who knows? This is where you can play and perhaps discover something new.
>
> This is yet another reason why I hesitate to enact RDA and FRBR. If we want FRBR-type records, I think a *LOT* could be done with XSLTs to generate those new types of records automatically so that we can discover if they really are useful to our patrons or not. 
>
> There is less and less reason to de-duplicate manually today.
>
> Jim Weinheimer
>
>
>   
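
The field-based merge described in the quoted message can be sketched in a few
lines. This is Python rather than XSLT, which fits the point that the data
format matters less than the matching rule. The records are assumed to be
plain dicts keyed by labels such as "245abc"; that layout, and the names
MATCH_FIELDS, match_key and group_duplicates, are hypothetical and not part of
any MARC library. Series fields (4xx/8xx) and fuzzy matching on the 260 are
left out.

```python
from collections import defaultdict

# Fields whose values must agree for two records to count as duplicates.
# Assumption: each record is a dict mapping these labels to strings.
MATCH_FIELDS = ("245abc", "250", "260", "300a")


def match_key(record):
    """Build a comparison key from the match fields of one record."""
    # Lowercase and collapse runs of whitespace; a missing field
    # is treated the same as an empty one.
    return tuple(
        " ".join(record.get(field, "").lower().split())
        for field in MATCH_FIELDS
    )


def group_duplicates(records):
    """Group records that share the same key, in first-seen order."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return list(groups.values())
```

Each group with more than one member is a candidate for merging. Changing
MATCH_FIELDS is all it takes to try a different set of criteria.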


-- 
-----------------------------------
Karen Coyle / Digital Library Consultant
kcoyle@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------
Received on Thu Apr 23 2009 - 09:54:20 EDT