Deborah Fritz wrote:
> Weinheimer Jim wrote:
>
> > I don't think FRBR is necessary. XML processing can eliminate
> > duplicates in all kinds of ways, so I still believe that the
> > main thing is to dump the ISO2709 format ASAP, change to some
> > kind of XML format, be it MARCXML or MODS, switch to URIs the
> > moment LC (finally) puts everything online, then share our
> > records widely (!!) in all different kinds of formats.
>
> Jim, can you clarify how "XML processing can eliminate duplicates"?
Actually, it's XSLT processing that can eliminate duplicates. XML can do very little on its own; you need stylesheets that transform the XML file into something more useful, such as an HTML page or a PDF document. There are other XML tools as well, such as XQuery, which I understand less well.
You can do all kinds of things with XSLT, such as sorting and transforming records in many different ways, and I think it will take some time for people to fully appreciate them. One thing it can do is detect duplicate values and display them however you want. It can also detect "fuzzy" (near-duplicate) values. I understand the principle quite well, but haven't implemented it in a long time. For a short, semi-technical discussion, see: http://www.xml.com/pub/a/2002/10/02/tr.html
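The post describes doing this in XSLT. As a rough illustration of the two ideas (exact duplicates versus fuzzy, near-duplicate matches), here is a minimal Python sketch instead, using only the standard library (`collections.Counter` and `difflib.SequenceMatcher`). The normalization rule, the 0.9 similarity threshold, and the sample titles are illustrative assumptions, not anything taken from a cataloging standard.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations

def exact_duplicates(values):
    """Return the values that occur more than once, compared character for character."""
    counts = Counter(values)
    return sorted(value for value, n in counts.items() if n > 1)

def normalize(value):
    """Hypothetical normalization: lowercase, drop punctuation, collapse whitespace."""
    kept = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def fuzzy_duplicates(values, threshold=0.9):
    """Return pairs of distinct values whose normalized forms are at least `threshold` similar."""
    pairs = []
    for a, b in combinations(sorted(set(values)), 2):
        if SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold:
            pairs.append((a, b))
    return pairs

titles = ["Moby Dick", "Moby Dick", "Moby-Dick.", "Walden"]
print(exact_duplicates(titles))  # ['Moby Dick']
print(fuzzy_duplicates(titles))  # [('Moby Dick', 'Moby-Dick.')]
```

The exact check only catches the identical "Moby Dick" entries; the fuzzy check also pairs "Moby-Dick." with them, which is the kind of near-match a strict comparison misses.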
So you could write an XSLT stylesheet that says: if records have the same 245abc, 250, 260, 300a, and 4xx/8xx (I don't know how this would work today with the new series treatments!), merge them into one record. You could also make the match "fuzzy" on, e.g., the 260.
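A minimal sketch of this match-key idea, written in Python rather than XSLT and assuming MARCXML input in the standard `http://www.loc.gov/MARC21/slim` namespace. It builds a key from 245abc, 250, 260, 300a and the series fields (4xx plus 800/810/811/830, so that fields like 856 are ignored), then groups records whose keys agree. The `fuzzy_260` rule (compare only the first four-digit year in 260$c), the punctuation-stripping normalization, the helper names, and the sample records are hypothetical choices for illustration; actually building one merged record from each group is left out.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"marc": "http://www.loc.gov/MARC21/slim"}
SERIES_8XX = {"800", "810", "811", "830"}  # series added entries only

def subfields(record, tag_test, codes=None):
    """Join the subfield values of every datafield whose tag passes tag_test."""
    wanted = set(codes) if codes else None
    parts = []
    for field in record.findall("marc:datafield", NS):
        if tag_test(field.get("tag", "")):
            for sf in field.findall("marc:subfield", NS):
                if wanted is None or sf.get("code") in wanted:
                    parts.append(sf.text or "")
    return " ".join(parts)

def norm(text):
    """Lowercase, turn ISBD punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())

def match_key(record, fuzzy_260=False):
    title = subfields(record, lambda t: t == "245", "abc")
    edition = subfields(record, lambda t: t == "250")
    if fuzzy_260:
        # Hypothetical fuzzy rule: ignore publisher wording, compare only the first year in 260$c.
        years = re.findall(r"\d{4}", subfields(record, lambda t: t == "260", "c"))
        imprint = years[0] if years else ""
    else:
        imprint = subfields(record, lambda t: t == "260")
    extent = subfields(record, lambda t: t == "300", "a")
    series = subfields(record, lambda t: t.startswith("4") or t in SERIES_8XX)
    return tuple(norm(part) for part in (title, edition, imprint, extent, series))

def find_duplicate_groups(collection_xml, fuzzy_260=False):
    """Return the 001 control numbers of each group of records sharing a match key."""
    root = ET.fromstring(collection_xml)
    groups = defaultdict(list)
    for record in root.findall("marc:record", NS):
        control = record.find("marc:controlfield[@tag='001']", NS)
        groups[match_key(record, fuzzy_260)].append(control.text if control is not None else None)
    return list(groups.values())

def _record(control, title, publisher, date):
    """Build a tiny hypothetical MARCXML record for the demo below."""
    return (f'<record><controlfield tag="001">{control}</controlfield>'
            f'<datafield tag="245" ind1="1" ind2="0"><subfield code="a">{title}</subfield></datafield>'
            f'<datafield tag="260" ind1=" " ind2=" "><subfield code="a">Boston :</subfield>'
            f'<subfield code="b">{publisher}</subfield><subfield code="c">{date}</subfield></datafield>'
            f'<datafield tag="300" ind1=" " ind2=" "><subfield code="a">357 p. ;</subfield></datafield></record>')

SAMPLE = ('<collection xmlns="http://www.loc.gov/MARC21/slim">'
          + _record("r1", "Walden /", "Ticknor and Fields,", "1854.")
          + _record("r2", "Walden /", "Ticknor &amp; Fields,", "[1854]")
          + _record("r3", "Walden ; or, Life in the woods /", "Ticknor and Fields,", "1854.")
          + "</collection>")

print(find_duplicate_groups(SAMPLE))                  # [['r1'], ['r2'], ['r3']]
print(find_duplicate_groups(SAMPLE, fuzzy_260=True))  # [['r1', 'r2'], ['r3']]
```

With the strict key, r1 and r2 stay apart because "Ticknor and Fields" and "Ticknor & Fields" differ; with the fuzzy 260 rule they fall into one group, while r3 stays separate because its 245 differs.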
Or we could merge based on completely different criteria and find out... who knows? This is where you can play and perhaps discover something new.
This is yet another reason why I hesitate to implement RDA and FRBR. If we want FRBR-type records, I think a *LOT* could be done with XSLT stylesheets that generate those new types of records automatically, so that we can find out whether they are really useful to our patrons.
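To make the FRBR point concrete: one crude way to generate work-level records automatically is to cluster manifestation records on a normalized author/title key. The Python sketch below assumes a simplified, hypothetical record representation (a dict of MARC tag, or tag plus subfield, to string) rather than real MARCXML, and uses the uniform title (240) when present, otherwise 245$a. These keys are illustrative assumptions, not a tested FRBRization algorithm.

```python
import re
from collections import defaultdict

def norm(text):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", (text or "").lower()).split())

def work_key(rec):
    """Crude author/title work key; prefers the uniform title (240) over 245$a."""
    title = rec.get("240") or rec.get("245a", "")
    return (norm(rec.get("100")), norm(title))

def build_work_records(records):
    """Cluster manifestation records into FRBR-style work records."""
    works = defaultdict(list)
    for rec in records:
        works[work_key(rec)].append(rec["001"])
    return [{"author": author, "title": title, "manifestations": ids}
            for (author, title), ids in works.items()]

# Hypothetical, simplified records: a dict of MARC tag (plus subfield) -> string.
records = [
    {"001": "m1", "100": "Thoreau, Henry David, 1817-1862.", "245a": "Walden /"},
    {"001": "m2", "100": "Thoreau, Henry David, 1817-1862.", "240": "Walden",
     "245a": "Walden, or, Life in the woods /"},
    {"001": "m3", "100": "Melville, Herman, 1819-1891.", "245a": "Moby-Dick /"},
]
for work in build_work_records(records):
    print(work)
```

Here m1 and m2 end up under one Thoreau "Walden" work only because m2 carries a 240; whether such generated work records actually help patrons is exactly the kind of thing this sort of experiment could test.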
There is less and less reason to de-duplicate manually today.
Jim Weinheimer
Received on Thu Apr 23 2009 - 03:24:16 EDT