Re: After MARC...MODS?

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Mon, 19 Apr 2010 16:29:31 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU
Cory Rockliff wrote:
> MARC, as is often pointed out, was conceived of as a data exchange 
> format; and while our library systems, bizarrely, continue to provide 
> the MARC tags view as the default input mode for catalogers, they almost 
> universally store that data internally in an RDBMS. Do we no longer need 
> a record-like data exchange format if our systems use, e.g., 
> triplestores internally (a prospect which is a ways off, I think)?
>   

We _definitely_ still need to exchange data.  We need to do more, not 
less, sharing of data, cooperative cataloging, etc., and you can't do 
that without a standard exchange format.

We just need an exchange format which _really is_ just an exchange 
format, and does not become our standard element vocabulary schema too, 
or our rules for entering data.  MARC has become all of this, which is 
one of the reasons I've thought for a while that we desperately need to 
get away from MARC -- not because MARC is not capable of expressing what 
we need (it is capable of expressing MOST but not all of it, although 
not always easily or prettily), but because moving away from MARC is the 
only realistic way to conceptually disentangle our data transmission 
format from our data vocabulary schema(ta), and from our rules and 
guidelines for entering data.

What will that standard exchange format look like? Do we need a "record 
model", or can we just get by with free-floating RDF atomic 
assertions?  I don't know.   I am less confident hitching our boat to 
RDF than some.  And certainly our data practices should not REQUIRE RDF 
among all technologies, but be somewhat agnostic toward semantic web 
technologies, IMO.  If it's well-designed data, though, it should 
_support_ serialization as RDF.   If we DO end up with RDF-compatible 
data, then the standard transmission format COULD be expressed as RDF, 
and probably other ways too. (Note that it's not enough to simply say 
"RDF" -- to get to a standard exchange format, you'd need to constrain a 
whole bunch more things, including RDF vocabularies used, and RDF 
serialization formats recognized.)
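
To illustrate that parenthetical, here is a minimal Python sketch, not 
a proposal.  It assumes one RDF vocabulary (Dublin Core Terms) and one 
serialization (N-Triples) have been agreed on; the element names, the 
mapping table, and the example.org subject URI are all hypothetical.

```python
# Hypothetical mapping from local element names to ONE chosen RDF
# vocabulary (Dublin Core Terms). Choosing it is itself a constraint
# that "just use RDF" does not settle.
DCTERMS = "http://purl.org/dc/terms/"
ELEMENT_TO_PREDICATE = {
    "title": DCTERMS + "title",
    "creator": DCTERMS + "creator",
    "issued": DCTERMS + "issued",
}


def escape_literal(value):
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(subject_uri, record):
    """Serialize {element: [values]} as N-Triples, one assertion per line."""
    lines = []
    for element, values in record.items():
        predicate = ELEMENT_TO_PREDICATE[element]  # KeyError if unmapped
        for value in values:
            lines.append('<%s> <%s> "%s" .'
                         % (subject_uri, predicate, escape_literal(value)))
    return "\n".join(lines)


# Illustrative data only; the subject URI is made up.
record = {
    "title": ["Moby Dick"],
    "creator": ["Melville, Herman"],
    "issued": ["1851"],
}
ntriples = to_ntriples("http://example.org/manifestation/1", record)
print(ntriples)
```

Swap in a different vocabulary or a different serialization and the 
output changes even though the data does not, which is why both have to 
be pinned down before RDF can serve as an exchange format.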

But I agree it's just as likely that we'll want "packages" of metadata 
in the form of "records" for some time to come, not JUST bundles of 
atomic RDF assertions.  

But I'm still not sure MODS is very helpful.  My reasons for not being 
enthused with MODS are NOT, as Cory suggests, "because it embodies the 
hierarchical document model of XML."  I've got nothing against a 
hierarchical document model, and I've got nothing against a "record" 
package based exchange format.   My reason for being suspicious of MODS 
is that it STILL holds too closely to MARC; it's basically just a 
slightly prettified MARC.   It doesn't allow one to do _very_ much more 
than MARC does, and it still makes it harder for us to conceptually 
separate our _transmission_ format from our data schemata and rules for 
entering data.

So the key thing here is that our _transmission format_ ought not to 
matter very much.  If we can get down our element schemata in a formal 
and clear and flexible way, and we can provide our guidelines for 
entering data in a transmission-format-independent way.... then the 
transmission format(s) are _easy_ after that.   The schemata and the 
guidelines are the hard parts.  Get those ducks in a row, and it's no 
longer a very hard problem to create one, two, many 
transmission/exchange serializations that all work fine.
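
A rough Python sketch of that point, under stated assumptions: the 
three-element schema below is invented for illustration and stands in 
for the real (hard) work of defining elements and rules.  Once it 
exists, each serialization is a few lines, and both round-trip to the 
same record.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical element schema: element name -> whether it may repeat.
# This is the format-independent part.
SCHEMA = {"title": False, "creator": True, "subject": True}


def validate(record):
    """Check a {element: [values]} record against SCHEMA."""
    for name, values in record.items():
        if name not in SCHEMA:
            raise ValueError("unknown element: " + name)
        if len(values) > 1 and not SCHEMA[name]:
            raise ValueError("element may not repeat: " + name)
    return record


def to_json(record):
    return json.dumps(validate(record), sort_keys=True)


def from_json(text):
    return validate(json.loads(text))


def to_xml(record):
    root = ET.Element("record")
    for name in sorted(validate(record)):
        for value in record[name]:
            ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    record = {}
    for child in ET.fromstring(text):
        record.setdefault(child.tag, []).append(child.text or "")
    return validate(record)


# Illustrative data only.
record = {
    "title": ["Moby Dick"],
    "creator": ["Melville, Herman"],
    "subject": ["Whaling", "Sea stories"],
}

# Two different wire formats, one underlying record.
assert from_json(to_json(record)) == record
assert from_xml(to_xml(record)) == record
```

Nothing in validate() knows about JSON or XML; the serializers are 
interchangeable wrappers around it.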

Getting those ducks in a row is (at least from one perspective) the goal 
(or one of the goals) of RDA.  I am not certain how well it has 
succeeded; I have some trepidation.

Thinking that the mere exchange/serialization format is _important_, 
that it will determine how we build metadata, is a symptom of getting 
confused about the role of a serialization format vs the role of a 
formally defined element vocabulary/schema.   It's the latter that's 
hard, it's the latter that needs to be flexible enough to handle the 
various realistic possibilities we see for how we manage metadata.  The 
exchange/serialization format is not so important or hard.  That MARC 
seems so important and central to ALL our metadata management is 
testament to how out-of-control MARC has grown, becoming WAY more than 
just a data exchange format.

> Setting aside the question of our cooperative cataloging ecosystem 
> (OCLC, mostly, for the time being), which is dependent on the record 
> model, isn't it still handy to have an abstraction representing a 
> discrete parcel of bibliographic information (e.g., "manifestation 
> record" or "work record") rather than always needing to decide, on a 
> case-by-case basis, how much and what sort of data to harvest from 
> someone else's system?
>
> What I'm wondering is, has the "record" as an abstraction truly outlived 
> its usefulness? Is it a good idea to dismiss the many real wins of a 
> standard like MODS (which I won't enumerate here, unless someone would 
> like me to), a standard which has some traction in the digital library 
> world, because it embodies the hierarchical document model of XML?
>
>   
Received on Mon Apr 19 2010 - 16:30:36 EDT