Harvey,
All of this change to the MARC record to extend its life is great and
all, but to what end? It's highly unlikely that this would make MARC
any more appealing to any non-library content-producing entity, and it
does nothing to help us bridge the gap and fit into an increasingly
information-aware universe.
How is this any more than blindly clinging to our beloved and ever
sacred cow, furiously polishing our turd as the rest of the world
ignores us?
I honestly have great appreciation for MARC, but I don't think its
purpose was to isolate us in the digital age.
-Ross.
On 8/24/07, Hahn, Harvey <hhahn_at_ahml.info> wrote:
> Katherine McConnell wrote:
> |Quoting Bernhard Eversberg <ev_at_BIBLIO.TU-BS.DE>:
> |> Hahn, Harvey wrote:
> |>> I've argued the non-need of ISBD in MARC records
> |>> repeatedly in cataloging forums to no avail. Oh, well...
> |
> | I am loving this line of discussion where we pull apart the
> |components of what we work with. A MARC database is a thing of
> |beauty. Separating it from the constraints of AACR2R and/or ISBD and
> |even LCSH leaves it open for use in areas other than traditional
> |libraries. The design of the database and the level of indexing
> |available and done by most ILSs leaves me wanting to use it for other
> |data stores. And MARCXML starts to look a lot more attractive. Time
> |for MARC to break out of the library system?
>
> What's really interesting is that many (most??) people are unaware that
> MARC21 is only one of thousands (maybe millions) of structures for
> records that are possible using the "MARC" format. You have to "think
> between the lines" of the MARC21 structural definitions to see the far
> more general possibilities:
>
> <http://www.loc.gov/marc/specifications/specrecstruc.html>
>
> There are four hardcoded values in the general MARC structure: (1) the
> leader is a 24-character ASCII alphanumeric string, (2) the record
> length is a 5-character ASCII numeric string, (3) the base address of
> data (part of the leader) is a 5-character ASCII numeric string and (4)
> a tag is a 3-character ASCII alphanumeric string; everything else
> defines a specific type of MARC record structure. (There is another
> "hardcoding" aspect in that some of this data is limited to a *single*
> digit, that is, a maximum value of 9.) At this point in time, there is
> one and only one defined and implemented MARC record structure: MARC21.
> The MARC21 implementation further defines other hardcoded values in the
> leader that contribute to what that particular record structure looks
> like.
>
> There are some other hardcoded groupings in MARC as well--there are
> three parts to a MARC record: (1) leader area, (2) directory area, and
> (3) data area; and there are three delimiting characters: (1) subfield
> delimiter (SFD), (2) field terminator (FT), and (3) record terminator
> (RT).
>
> I might note that, although created in 1986, the Tag(ged) Image File
> Format (TIFF) is conceptually similar to the general MARC format from 20
> years earlier (a tribute to the genius of Henriette Avram and her
> team!), particularly since they both share the idea of tagged data.
>
> As I mentioned earlier, what makes MARC into MARC21 are certain values
> in the leader. But what if those values are changed? Here they are
> (zero-based from the start of the leader), with all values in the range
> 0 to 9:
>
> 10: number of indicators
> 11: length of identifier (subfield code)
> 20: max number of digits in length-of-field value
> 21: max number of digits in starting-character-position
>
> The respective values 2, 2, 4, and 5 (and tags limited to numeric values
> 000 to 999) define the structure of what we know as MARC21 records. But
> there's nothing to say that you couldn't change these values to come up
> with a *different* (i.e., non-MARC21) kind of MARC record.
>
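To make the four leader positions above concrete, here is a minimal
Python sketch that reads them out of a 24-character leader. The names
LeaderStructure, parse_leader, and is_marc21_shaped are made up for
illustration, and the sample leaders are invented values, not records
from any real catalog. Positions 0-4 and 12-16 are the record length
and base address of data as laid out in the MARC21 leader.

```python
from dataclasses import dataclass


@dataclass
class LeaderStructure:
    record_length: int           # leader positions 0-4
    indicator_count: int         # leader position 10
    subfield_code_length: int    # leader position 11
    base_address: int            # leader positions 12-16
    length_of_field_digits: int  # leader position 20
    start_position_digits: int   # leader position 21


def parse_leader(leader: str) -> LeaderStructure:
    """Pull the structural values out of a 24-character leader."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 characters")
    return LeaderStructure(
        record_length=int(leader[0:5]),
        indicator_count=int(leader[10]),
        subfield_code_length=int(leader[11]),
        base_address=int(leader[12:17]),
        length_of_field_digits=int(leader[20]),
        start_position_digits=int(leader[21]),
    )


def is_marc21_shaped(s: LeaderStructure) -> bool:
    """True when the leader carries the 2, 2, 4, 5 values of MARC21."""
    return (
        s.indicator_count,
        s.subfield_code_length,
        s.length_of_field_digits,
        s.start_position_digits,
    ) == (2, 2, 4, 5)


# Invented sample leaders: one MARC21-shaped, one with 3 indicators and
# 3-character subfield codes (a hypothetical non-MARC21 variant).
MARC21_LEADER = "00714cam a2200205 a 4500"
VARIANT_LEADER = "00714cam a3300205 a 4500"
```

A parser written this way would treat the second leader as a legal
general-MARC structure that simply is not MARC21.
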
> If you think about varying the values for a little bit, you'll probably
> note that the values in positions 20 and 21 above are actually
> constrained by the 5-character record length; in other words, the
> "practical" ranges of values would be 2-4 for length-of-field and 3-5
> for starting-character-position, with the greatest practicality and
> flexibility at the high ends. I think there would be greater value to
> increasing the number of indicators by 1 or 2 and increasing the size of
> the subfield code by 1 to permit things such as $aa, $b3, and $12. Both
> of these changes would increase granularity and flexibility in coding
> data. I first came across these kinds of thoughts 20-some years ago in
> Walt Crawford's book "MARC for Library Use". In the second edition,
> page 33, he says:
>
> "The standard allows a very wide range of implementations. A format
> need not have any indicators or subfields to be a Z39.2 format (i.e.,
> positions 10 and 11 of the leader could both be '0'). A format could
> also have eight indicators per field and subfield codes which were six
> characters long--with positions 10 and 11 being '86'--and still be a
> Z39.2 format.
>
> "An implementation could even *theoretically* have different directory
> structures for different records, since the leader in each record
> defines that record's directory. In practice such an implementation
> would be quite difficult to use, as the associated data dictionaries and
> parsing rules would be extremely complex."
>
> But there are two *other* things that could be changed, too--one legal,
> one currently illegal.
>
> The legal change has to do with the content of the 3-character tags.
> MARC21 limits their values to numeric ASCII values, but the MARC
> definition of a tag indicates that it contains *alphanumeric* values,
> that is, each of the three values of a tag can be numeric and/or
> alphabetic. This gives a possibility of up to 46,655 tags rather than
> "merely" 999. (Of course, this is "legal" MARC, but *illegal* MARC21.)
>
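The tag arithmetic can be checked in a few lines of Python. This sketch
assumes 36 symbols per tag position (ten digits plus one case of the 26
letters); the variable names are just for illustration. Counting every
combination gives 1,000 numeric and 46,656 alphanumeric tags, and
leaving the all-zero tag out of both counts gives the 999 and 46,655
quoted above.

```python
import string

# Assumption: one case of letters only, so 10 + 26 = 36 symbols per position.
ALPHANUMERIC = string.digits + string.ascii_uppercase
TAG_LENGTH = 3

numeric_tags = len(string.digits) ** TAG_LENGTH    # 000 through 999
alphanumeric_tags = len(ALPHANUMERIC) ** TAG_LENGTH  # 000 through ZZZ

# Leaving out the all-zero tag "000" from each count.
numeric_tags_without_000 = numeric_tags - 1
alphanumeric_tags_without_000 = alphanumeric_tags - 1

print(numeric_tags, alphanumeric_tags)
print(numeric_tags_without_000, alphanumeric_tags_without_000)
```
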
> The illegal change has to do with the challenge in today's electronic
> world that some people wish that the MARC record could carry digital
> data content within the MARC record itself. This is currently
> impossible because of the hardcoded record length of 5 digits, limiting
> record lengths to 99,999 ASCII characters; with multibyte Unicode
> characters, the number of characters that fit would be cut in half (or
> more!) in the blink of an eye, because the limit is still 99,999 8-bit
> bytes of data.
>
> I can think of two ways around the limitation--but, of course, it means
> changing the world! ;-) One method, not easily human-readable at all
> and requiring all currently existing MARC records to be rewritten, would
> be to redefine the five numeric digits from base-10 to either base-16
> (hexadecimal), base-32, or base-36. All of the latter could
> meaningfully use both numeric and alphabetic characters in each of the
> digit positions. For example, most of you know that the hexadecimal
> system uses the numbers 0 to 9 and the letters A to F for the 16 needed
> digits; the base-32 system (16 doubled) would use the numbers 0 to 9 and
> the letters A to V; a base-36 system (all 10 numbers and 26 letters)
> would use 0 to 9 and A to Z. Although base-16 (max record size =
> 1,048,575 characters), base-32 (33,554,431 characters), and base-36
> (60,466,175 characters) increase the maximum MARC record size, many
> current digital files (and most future digital files) still would not
> fit within the increased size constraints.
>
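The maximum sizes quoted for the first method follow from
base ** 5 - 1. A small Python sketch, with hypothetical helper names
max_record_length and encode_length, shows the arithmetic and what a
five-character length would look like in each base. Python's built-in
int(text, base) does the decoding for any base up to 36.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def max_record_length(base: int, width: int = 5) -> int:
    """Largest value that fits in `width` digits of the given base."""
    return base ** width - 1


def encode_length(value: int, base: int, width: int = 5) -> str:
    """Write `value` as a zero-padded, fixed-width string in `base`."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if not 0 <= value <= max_record_length(base, width):
        raise ValueError("value does not fit in the length field")
    out = []
    for _ in range(width):
        value, remainder = divmod(value, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


for base in (10, 16, 32, 36):
    limit = max_record_length(base)
    text = encode_length(limit, base)
    # Round trip: decoding the encoded limit gives the limit back.
    assert int(text, base) == limit
    print(base, limit, text)
```

Note that a base-10 reader would misread any of the other encodings
whenever the field happens to contain only the digits 0 to 9, which is
one reason this method forces every existing record to be rewritten.
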
> A second (and, I think, much more flexible for the future) method would
> retain the initial 5 characters for record length *info* and the
> 24-character standard for the leader (as the first method above also
> does) but, instead of using the first 5 characters as the actual length,
> it would use them as a *pointer* to where in the MARC record the true
> length can be found. Since no known MARC records have a length anywhere
> near the maximum value of 99999, I suggest that an initial
> digit "9" would indicate that the number is a pointer rather than a
> value. (That way, all current MARC records can exist as is, without any
> changes needed.) The remaining four digits would then indicate a
> position (or, perhaps, an offset to a position). The position would be
> just after the directory and before the data content. This location
> (containing the actual length of the record) could be variable in length
> (just like data content) and terminated with a field terminator
> character, just like the directory and variable fields.
>
> There's a *second* set of 5 digits within the leader that needs to be
> redefined when the first 5 characters are a pointer rather than a value:
> the "base address of data = length of leader + length of directory + 1"
> needs to be changed to "base address of data = length of leader + length
> of directory + 1 + length of record size + 1". (The two 1's represent
> the length of each of the two field terminators involved.)
>
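A rough Python sketch of the pointer idea, under several assumptions
that the proposal leaves open: the four digits after the "9" are taken
as an absolute zero-based offset from the start of the record, the true
length is stored there as ASCII decimal digits, and the field
terminator is the usual 0x1E byte. The function names and the sample
record are invented for illustration.

```python
FIELD_TERMINATOR = b"\x1e"
LEADER_LENGTH = 24


def read_record_length(record: bytes) -> int:
    """Return the record length, following the pointer when present."""
    info = record[0:5].decode("ascii")
    if not info.isdigit():
        raise ValueError("record size information must be 5 digits")
    if info[0] != "9":
        return int(info)  # classic case: the 5 digits are the length
    pointer = int(info[1:])  # assumed: absolute offset into the record
    end = record.index(FIELD_TERMINATOR, pointer)
    return int(record[pointer:end].decode("ascii"))


def extended_base_address(directory_length: int, size_field_length: int) -> int:
    """Leader + directory + 1 + record-size field + 1, as proposed above."""
    return LEADER_LENGTH + directory_length + 1 + size_field_length + 1


# Invented sample: one 12-character directory entry, then the true length.
directory = b"245001000000"
size_field = b"1234567"
pointer = LEADER_LENGTH + len(directory) + 1  # just after the directory
leader = ("9%04d" % pointer).encode("ascii") + b"x" * 19  # filler leader
record = (leader + directory + FIELD_TERMINATOR
          + size_field + FIELD_TERMINATOR + b"...data...")

print(read_record_length(record))
print(extended_base_address(len(directory), len(size_field)))
```

An existing record whose first five characters are, say, "00714" takes
the first branch and is read exactly as it is today.
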
> With this approach, all current MARC records can be handled without any
> changes to parsing and reading/writing routines. The difference is that
> MARC software would need to *add* new parsing and reading/writing
> routines to handle the new situation where the record size begins with
> the digit "9". New reading/writing routines might have to be added as
> well to handle the new digital content that might exist within records.
> This content might be identified either with one or more "standardized"
> 9XX tags or, perhaps preferably, with *alphanumeric* tags (permitted in
> MARC but not currently in MARC21), where a leading alphabetic character
> might perhaps indicate a particular type of digital content. It would
> probably work best if these new tags would be exempt from field length
> limitations.
>
> Obviously, what I just said would work only when there is a single
> digital content element in the record (because the starting position of
> the digital content would still be a relatively small number, capable of
> being handled by the current MARC directory structure). If records
> needed to carry multiple digital content elements, then the directory
> structure would have to be revamped (either to permit lengthier fields
> or to use pointers, like the record size in the leader that I've
> proposed), and parsing and reading/writing routines would have to be
> newly written for these records containing digital contents. The
> problem with "merely" lengthening the directory fields is that the
> single-digit values in the leader for this data (the "4" and "5" near
> the end of the leader) would limit positions to only 9 digits, that is,
> a positional value of 999,999,999--a pretty big number but still limited
> in terms of possible future needs. To use (and where and how to store)
> multiple pointers instead gets complicated, and I haven't thought about
> the ease or challenge of that.
>
> The *big* challenge for any solution is how to avoid rewriting or
> restructuring all existing MARC records. My suggestion of changing the
> first 5 characters of the leader from "record size" to "record size
> information" (permitting a record size or a pointer to the record size,
> depending upon a key value to determine which) accomplishes that.
> Existing MARC records (and future ones requiring no more or no different
> cataloging information than now) can be handled as they are now.
> However, "special" MARC records containing digital content (coded with a
> first digit of "9" in the "record size information" area of the leader)
> can also be handled with new parsing and reading/writing software
> routines.
>
> As I recall, I may have described a lot of this a year or two (or
> more) ago on either this list or some other list. In any case, I don't
> pretend to have all the answers, but maybe my thoughts above might
> stimulate some further explorations of enhanced MARC solutions to some
> of the issues discussed here.
>
> Harvey
>
> --
> ===========================================
> Harvey E. Hahn, Manager, Technical Services Department
> Arlington Heights (Illinois) Memorial Library
> 847/506-2644 - FX: 847/506-2650 - Email: hhahn(at)ahml(dot)info
> OML & Scripts web pages: http://www.ahml.info/oml/
> Personal web pages: http://users.anet.com/~packrat
>
Received on Fri Aug 24 2007 - 18:42:39 EDT