LibraryThing and FRBR entities

From: Brenndorfer, Thomas <tbrenndorfer_at_nyob>
Date: Wed, 25 Apr 2007 10:24:04 -0400
To: NGC4LIB_at_listserv.nd.edu

I thought I would cross-post to this list from the FRBR list, just to
join in the fray. This posting follows up on some observations of Tim
Spalding's work on LibraryThing.

> -----Original Message-----
> From: Zoltan Tomory [mailto:zoltan.tomory_at_mobot.org]
> Sent: April 24, 2007 2:27 PM
> To: Brenndorfer, Thomas; frbr_at_infoserv.inist.fr
> Subject: RE: LibraryThing and FRBR entities
>
> > LibraryThing relies to
> > some extent on their end-users, and so perhaps, wiki-like, we can use
> > some of the user data and further user input to fill the gaps in the
> > old data set. There is a limit in size to the data, but no limit to the
> > time required to set things straight. Or perhaps we can draw upon other
> > services, like
>
> I'm all for a community of contributing catalogers loaning information
> back and forth.  Of course, if OCLC is doing a decent job, we may not
> have to.  Still, it is comforting that in the worst case we could share
> data peer to peer like Napster, but using content we generate ourselves,
> unlike Napster.
>
>
> > I wonder if there shouldn't be a cutoff point, between existing data
> > and new data better suited for the new environment. That might very
> > well mean that legacy data sets will be forever weaker, but there
> > might be
>
> If you do something, it is worth doing well.  If you have a new format,
> convert your old data. Period.
>
> > I don't think making better use of web pages should be the only thing
> > catalogers should be working towards-- there is still a need for A-Z
> > lists and single entry formats, for example. But there do seem to be
> > enough solutions out there that could be applied to the perceived
> > weaknesses in our current cataloguing practices.
>
> The web pages exist exclusively at the presentation layer--if you are
> doing it right.  You can spin web pages without MARC or other
> bibliographic metadata, but in doing so you emulate the card catalog and
> allow errors to propagate, waste time on filing, checking filing, etc.

I do think that the underlying data has to be structured and normalized
correctly, and the web page is just the presentation of all that data,
but I think the question here is about the direction to take: what
underlying data needs to be fixed so that catalogs display in ways (in
my view, FRBR ways) that make our catalog data most user-friendly.

As an example...

Our catalog has gone through various generational changes, and one
evolution that is most instructive to follow is the way in which users
find related works once a bibliographic record is on the screen.

In earlier versions of our catalog there was a Related Works screen that
listed authority controlled headings (author, subject, series) that
users could select. The next screen to appear would be a bibliographic
summary screen of titles linked to that heading.

Once we went to a web-based catalog, the Related Works screen was
replaced by underlined hyperlinks. At first the only option was to click
and navigate to a bibliographic summary screen of titles linked to that
heading. This saved a step from before, but the result was the same.

Eventually another option emerged, and this option was most useful for
subject headings. Clicking on a subject heading brought the user back to
the subject browse list. This was very practical especially with subject
headings with many subdivisions. Users could see that subject heading
and its context and so become aware of other ways to expand their
search.

The next leap is to have the user click on the subject heading and have
a page controlled by that heading appear. Similar to the way
LibraryThing handles things, on that page would be both the browse list
segment of similar headings AND the list of titles attached to that
heading. Ideally further ways to refine the search (with facets) or to
expand the search (with links to related headings) could be added to
this page.
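
To make the idea of a heading-controlled page concrete, here is a minimal
sketch in Python. Everything in it is an assumption for illustration: the
names HeadingPage and build_heading_page, the dictionary indexes, and the
"window" of neighbouring headings are hypothetical and do not describe how
LibraryThing or any particular catalog system works.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeadingPage:
    """One page 'controlled' by a single subject heading (hypothetical model)."""
    heading: str
    browse_segment: List[str]   # neighbouring headings from the A-Z browse list
    titles: List[str]           # titles attached to this heading
    related: List[str] = field(default_factory=list)  # links to expand the search


def build_heading_page(heading: str,
                       browse_list: List[str],
                       title_index: Dict[str, List[str]],
                       related_index: Dict[str, List[str]],
                       window: int = 2) -> HeadingPage:
    """Combine the browse-list context and the attached titles on one page."""
    ordered = sorted(browse_list)
    pos = ordered.index(heading)          # raises ValueError if the heading is unknown
    start = max(0, pos - window)
    segment = ordered[start:pos + window + 1]
    return HeadingPage(
        heading=heading,
        browse_segment=segment,
        titles=sorted(title_index.get(heading, [])),
        related=list(related_index.get(heading, [])),
    )
```

The point of the sketch is only that the browse segment and the title list
come from the same heading lookup, so both can appear on one page instead
of on two separate screens.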

Conceptualizing this I immediately drew upon the FRBR (and FRAR)
diagrams of bibliographic entities in boxes, each connected outwards to
other entities as the way to depict relationships. I could see how each
entity would have its own web page, and once established this way, the
entity could then become the target for a seemingly endless number of
operations such as those offered by social networking tools. The web
page would still be underpinned by good cataloger-supplied data. It
would make sense to give the web page some permanence, which
gets into the idea of work identifiers or authority numbering. Perhaps
some variation of external/internal construction can take place, with
authority data pulled in through network services from central agencies,
and local information integrated below on the screen. Enriched content
services like Syndetics get us part of the way there.

Fixing that data, though, is I think of some importance. In following the
discussions about FRBR, RDA, and the next generation catalog, I see a
conflation of what needs to remain very distinct in all discussions of
metadata: content rules, semantics, and syntactical rules. AACR2 and RDA
are primarily content rules and MARC is primarily a set of syntax rules
for machine manipulation. Both sets overlap in determining the semantics
of what is represented in each field (e.g., does the 100 field mean an
"author," or is it really a primary access point that could represent an
entity in the form of a person (or maybe a family) responsible for the
work, the first work, or the collective work contained in the
bibliographic resource?).

A good illustration is the problem of constructing good work records out
of our existing records:

The 245 title proper can be the heading of the work.
Often it's co-tagged with a 100,11X field, and the initial articles
remain present in the 245. Together these fields help to identify a
work.
Sometimes the 240 substitutes for the 245 and is co-tagged with the
100,11X (our system generates an actual authority record for these
co-tagged fields).
In the event of a title main entry, the 130 can substitute for the 245
to identify a work.
Works can also be represented by the 700,71X fields but only when a
subfield $t is present. 730s can also represent works.
The 700,71X fields are also used for other entities such as co-authors,
editors, illustrators, etc.
The work entities in the 7XX fields can be coded as related works or as
analytical works (works contained in the resource in hand) depending on
the choice for the second indicator.
A similar construction for works can be built out of 600,61X,630 fields.
If series are considered as works the 8XX fields fulfill this purpose.
But if the title string on the resource matches the established form
then a 440 is used (initial articles might be retained in the 440 but
this may conflict with an 830 for the same heading).
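
The rules above can be sketched as code, which shows how many special
cases are needed just to find the works in one record. This is a
simplified illustration, not a real MARC parser: the Field tuple, the
function names, and the "contained"/"related"/"subject"/"series" labels
are my own assumptions, only subfields $a and $t are considered, and
initial articles are skipped only for the 245 (using its second
indicator as the nonfiling character count).

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Field(NamedTuple):
    """A much-simplified MARC field (hypothetical structure for this sketch)."""
    tag: str
    ind1: str
    ind2: str
    subfields: Dict[str, str]


NAMES_1XX = {"100", "110", "111"}
NAMES_7XX = {"700", "710", "711"}
NAMES_6XX = {"600", "610", "611"}
SERIES = {"800", "810", "811", "830", "440"}


def first(record: List[Field], tags) -> Optional[Field]:
    """Return the first field whose tag is in tags, or None."""
    for f in record:
        if f.tag in tags:
            return f
    return None


def title_proper(f245: Field) -> str:
    """245 $a with the initial article skipped and trailing punctuation trimmed."""
    skip = int(f245.ind2) if f245.ind2.isdigit() else 0
    return f245.subfields.get("a", "")[skip:].strip(" /:")


def primary_work(record: List[Field]) -> Optional[str]:
    """Heading for the main work: 130, else 1XX + 240, else (1XX +) 245."""
    name = first(record, NAMES_1XX)
    f130, f240, f245 = (first(record, {t}) for t in ("130", "240", "245"))
    if f130:                                  # title main entry
        return f130.subfields.get("a", "")
    if name and f240:                         # uniform title replaces the 245
        title = f240.subfields.get("a", "")
    elif f245:
        title = title_proper(f245)
    else:
        return None
    return f"{name.subfields.get('a', '')} {title}".strip() if name else title


def other_works(record: List[Field]) -> List[Tuple[str, str]]:
    """Works found in 6XX, 7XX and series fields, each with a rough role label."""
    found = []
    for f in record:
        has_title = "t" in f.subfields
        if (f.tag in NAMES_7XX and has_title) or f.tag == "730":
            role = "contained" if f.ind2 == "2" else "related"
        elif (f.tag in NAMES_6XX and has_title) or f.tag == "630":
            role = "subject"
        elif f.tag in SERIES:
            role = "series"
        else:
            continue                          # co-authors, editors, topical subjects, etc.
        parts = (f.subfields.get("a", ""), f.subfields.get("t", ""))
        found.append((role, " ".join(p for p in parts if p)))
    return found
```

Even this cut-down version needs a separate branch for almost every tag,
and it still ignores the 440/830 conflict over initial articles.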

Oh, and many of these syntactical fields also double or triple up as the
foundation for expression and manifestation headings.
Variations on titles of the work can appear in the bibliographic record
(in 246 or 740 fields) or in the authority record (4XX fields).
Relationships between works can be made by the use of "added entries" in
bibliographic records or by the use of SEE/SEE ALSO references in
authority records. The various ways of identifying a work with alternate
headings are not necessarily aggregated from all the bibliographic and
authority records representing that work.
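
As an illustration of the aggregation that is missing, the sketch below
gathers variant titles for each work from both kinds of record. It
assumes, hypothetically, that every record has already been assigned a
"work" key and that fields are plain (tag, value) pairs; neither is true
of our data as it stands, which is exactly the problem.

```python
from collections import defaultdict
from typing import Dict, List


def aggregate_variants(bib_records: List[dict],
                       authority_records: List[dict]) -> Dict[str, List[str]]:
    """Collect variant titles per work from bibliographic and authority records."""
    variants = defaultdict(set)
    for rec in bib_records:
        for tag, value in rec["fields"]:
            if tag in {"246", "740"}:        # variant titles in the bibliographic record
                variants[rec["work"]].add(value)
    for rec in authority_records:
        for tag, value in rec["fields"]:
            if tag.startswith("4"):          # 4XX see-from references in the authority record
                variants[rec["work"]].add(value)
    return {work: sorted(titles) for work, titles in variants.items()}
```

The two loops are kept separate because a 4XX tag means a see-from
reference only in an authority record; in a bibliographic record the 4XX
range holds series statements.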

Our system also has a "multi-use authority" function, whereby, for
example, the work heading from a 730 can also be used in a 630. A series
can be a subject (or even a related heading found in a 730, but it's
helpful to have that heading with no initial articles).
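
A small sketch shows why the initial article matters for multi-use
headings. It is not how our system implements the function: the
heading_key and group_uses names are hypothetical, and stripping a short
list of English articles is a deliberately naive stand-in for MARC
nonfiling indicators (it would wrongly trim a title that genuinely begins
with "A").

```python
from typing import Dict, Iterable, Set, Tuple

ARTICLES = ("the ", "a ", "an ")   # English only; an assumption for this sketch


def heading_key(text: str) -> str:
    """Normalize a heading so the same work matches across different fields."""
    key = " ".join(text.lower().split()).rstrip(".")
    for article in ARTICLES:
        if key.startswith(article):
            key = key[len(article):]
            break
    return key


def group_uses(fields: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each normalized heading to the set of tags that use it."""
    uses: Dict[str, Set[str]] = {}
    for tag, heading in fields:
        uses.setdefault(heading_key(heading), set()).add(tag)
    return uses
```

With a shared key, a heading entered with its article in a 440 and
without it in an 830 lands in the same group, as do the 730 and 630 uses
of one work.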

In conclusion, I think all of this encoding works very well for the
production of printed catalog cards, but it seems overly convoluted for
the next generation catalog. RDA as the basis for content rules remains
quite sensible, but there continues to be a need for greater
alignment with semantic and syntactical rules, and for ongoing data
normalization to really pin down what are the "things" and
"relationships" we want to catalog and convey to the user.

Thomas Brenndorfer, B.A., M.L.I.S.
Guelph Public Library
100 Norfolk St.
Guelph, ON
N1H 4J6
(519) 824-6220 ext. 276
tbrenndorfer_at_library.guelph.on.ca