Re: Are MARC subfields really useful ?

From: Houghton,Andrew <houghtoa_at_nyob>
Date: Fri, 4 Jun 2010 10:24:45 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

> From: Next generation catalogs for libraries
> [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Dan Matei
> Sent: Friday, June 04, 2010 09:46 AM
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: Re: [NGC4LIB] Are MARC subfields really useful ?
> 
> > -----Original Message-----
> > From: Next generation catalogs for libraries
> > [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Demian Katz
> > Sent: 4 iunie 2010 15:27
> >
> >
> > VuFind also uses subfields to help with relevance ranking --
> > for example, words within subfield a of a title are given
> > more importance than words in the rest of the field --
> > extremely valuable if, for example, you're trying to find a
> > book ABOUT an author rather than a book BY that author.
> 
> 
> Right ! But that can be "reduced" to: a) before "/" b) after "/"

Yes it can, but the issue with MARC is that it mixes content and
presentation, and you are suggesting a presentation position rather
than a content position.  The unfortunate part of MARC is that it
doesn't use *enough* subfields to describe the content, and the
subfields it does define include presentation artifacts that
have to be stripped or replaced depending upon what you want to do
with the data.

I have been through this issue.  When the DDC folks decided to add
some additional fields and subfields to MARC, the thinking at LC
and MARBI at the time was a presentation position.  They couldn't
understand why the DDC folks wanted to keep the presentation
artifacts out of the content fields and why there was a need to
use a separate subfield for textual/punctuation information.  In
the end the DDC folks were able to keep the presentation artifacts 
out of the subfields they defined so that *applications* could use 
the data, as is, and *applications* could provide the correct 
presentation that made sense in their local or regional markets, 
rather than pulling the data from the subfield, removing the 
presentation artifacts and replacing them with the appropriate 
local or regional presentation artifacts.

I'm not "Mr. Spalding", but to provide context to your question
directed at him: when you are full-text indexing fields, it really
doesn't matter what presentation artifacts or punctuation you
have in the data, since they will be stripped out by the tokenizer
you are using.  However, this is just one use case for the data, 
albeit a common one.  Every time I want to grab an ISBN from a 
MARC record I cringe when I have to remove numerous presentation 
artifacts from the subfield and I cannot just use the data, as 
is...

  020 ## $a 9780446561808 (pbk.)

vs.

  020 ## $a 9780446561808 $i (pbk.)

Yes, subfield-i isn't defined, but maybe it should be, since
"(pbk.)" shouldn't be in subfield-a if it's defined as the
*ISBN*.
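
To make the cleanup concrete, here is a rough sketch in Python of the
kind of thing an application ends up doing today.  The helper name
split_020a and the pattern are my own assumptions for illustration,
not taken from any MARC library, and the pattern is deliberately
simple (it checks length only, not the ISBN check digit).

```python
import re

# Hypothetical pattern: a leading run of digits, hyphens and spaces
# that ends in a digit or an X check character.
ISBN_RE = re.compile(r"^\s*([0-9][0-9\- ]*[0-9Xx])")


def split_020a(value):
    """Split an 020 $a value into (isbn, qualifier).

    Returns (None, original_text) when no 10- or 13-character ISBN
    can be found at the start of the value.
    """
    match = ISBN_RE.match(value)
    if match is None:
        return None, value.strip() or None

    # Drop hyphens and spaces inside the number; normalize "x" to "X".
    isbn = re.sub(r"[\s-]", "", match.group(1)).upper()
    if len(isbn) not in (10, 13):
        return None, value.strip() or None

    # Whatever follows the number is presentation: remove the
    # surrounding parentheses and trailing ISBD punctuation.
    qualifier = value[match.end():].strip("() :;") or None
    return isbn, qualifier


if __name__ == "__main__":
    print(split_020a("9780446561808 (pbk.)"))
```

If the qualifier lived in its own subfield, none of this would be
needed: the application would read subfield-a and use it, as is.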

Andy.