encoding formats (was Re: Next Gen Catalog and FRBR)

From: Jonathan Rochkind <rochkind_at_nyob> Date: Wed, 23 May 2007 10:22:58 -0400 To: NGC4LIB_at_listserv.nd.edu

As long as we bury MARC-8 character encoding forever, and use UTF-8.
(Character encoding is really tricky, and using our own MARC-8 encoding
just makes things worse, and is causing real problems right now). And
get rid of the length limit in MARC, both for the record as a whole, and
for individual fields. And, like Bernhard says, increase the number of
field and subfield codes available.  And add field-level versioning.
And add the ability to have better hierarchical data. For instance, we
currently have transcribed contents in a 5xx, and then controlled
contents in a 7xx, but there's no good way to correlate a
transcribed item from the 5xx with its controlled version in the
7xx. This is a problem.
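
To make the length-limit point concrete, here is a minimal Python sketch of
where the limits come from in the binary (ISO 2709) record layout: the leader
stores the record length in 5 digits, and each 12-character directory entry
stores a 3-character tag, a 4-digit field length, and a 5-digit starting
position. The function name directory_entry and the constants are hypothetical
names for illustration, not part of any MARC library.

```python
# Fixed-width length fields in a binary MARC (ISO 2709) record.
RECORD_LENGTH_DIGITS = 5  # leader positions 00-04
FIELD_LENGTH_DIGITS = 4   # directory entry, after the 3-character tag
START_POS_DIGITS = 5      # directory entry, after the field length

# The largest values that fit in those fixed-width decimal slots.
MAX_RECORD_LENGTH = 10 ** RECORD_LENGTH_DIGITS - 1
MAX_FIELD_LENGTH = 10 ** FIELD_LENGTH_DIGITS - 1
MAX_START_POS = 10 ** START_POS_DIGITS - 1


def directory_entry(tag: str, field_length: int, start: int) -> str:
    """Build one 12-character directory entry (hypothetical helper)."""
    if len(tag) != 3:
        raise ValueError("tag must be exactly 3 characters")
    if not 0 <= field_length <= MAX_FIELD_LENGTH:
        raise ValueError("field length does not fit in 4 digits")
    if not 0 <= start <= MAX_START_POS:
        raise ValueError("starting position does not fit in 5 digits")
    return f"{tag}{field_length:04d}{start:05d}"
```

A long contents note is the obvious casualty: any field over 9999 bytes simply
cannot be described by a directory entry, whatever the cataloger wants to
record.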

Sure, MARC _could_ be extended to do all this. But once you've done all
this, you've changed MARC so fundamentally... why are you using MARC
at all? How about at least using MARC-XML instead, which has already done
some of this and makes the rest of it easier? But I'd like MODS even
better. We also need to get rid of some of the duplication of data
and the ambiguity about where to put data. (And YES, I realize that a
transcribed field and a controlled field are not duplicates; they are
each different. Even once you realize this, there is still a bunch of
confusing duplication of data in our current MARC practices. I think one
of the problems with MARC is that it's in fact _too_ flexible: there are
too many different ways to do the same thing, which only increases the
chance that different people will do it differently.)

But I agree that encoding format is not our fundamental problem.  First
we've got to be clear about what data we have/want/need, and THEN we can
decide how to encode it. But this ALSO assumes that MARC _is_ just an
encoding format. In fact, we use it as our conceptual domain model too,
and we use it to provide guidance for how to formulate values too.
We've GOT to get rid of this mentality, and it's going to be hard to do
without getting rid of MARC too.
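
As a rough illustration of keeping the domain model separate from the
encoding, here is a minimal Python sketch. It models a contents entry that
carries both its transcribed and its controlled form together (the correlation
that the 5xx/7xx split loses), and then serializes the same model two ways.
All class, function, and element names here are hypothetical; this is not
MODS, MARC-XML, or any existing schema.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ContentsItem:
    transcribed: str  # as it appears on the item
    controlled: str   # the controlled (authorized) form


@dataclass
class Record:
    title: str
    contents: List[ContentsItem] = field(default_factory=list)


def to_json(record: Record) -> str:
    """One possible encoding of the model: JSON."""
    return json.dumps(asdict(record), sort_keys=True)


def to_xml(record: Record) -> str:
    """Another encoding of the same model: XML with made-up element names."""
    root = ET.Element("record")
    ET.SubElement(root, "title").text = record.title
    contents = ET.SubElement(root, "contents")
    for item in record.contents:
        node = ET.SubElement(contents, "item")
        ET.SubElement(node, "transcribed").text = item.transcribed
        ET.SubElement(node, "controlled").text = item.controlled
    return ET.tostring(root, encoding="unicode")
```

The model decides what data exists and how the pieces relate; the two
serializers are interchangeable details that can be swapped without touching
it.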

Bernhard Eversberg wrote:
> Karen Coyle wrote:
> >
> > So if nothing else, we need to move to a format that doesn't keep us
> > from adding to our metadata record.
>
>
> But the potential of MARC is still far from exhausted! Why
> do tags have to be numeric only and why are capital
> letters not usable as subfield codes? Get rid of these
> restrictions. Anything non-MARC will have a much longer and
> harder time to become mainstream.
>
> OTOH, it isn't really the format we should focus on but the content.
> MARC record content is too terse. Add ToC data, at least, and
> figure out how to do some useful ranking based on that.
> AND make browsable indexes a feature of NGC. Our structured
> data makes them possible, search engines can't do it.
>
> B.Eversberg
>

--
Jonathan Rochkind
Sr. Programmer/Analyst
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu