Next Gen Catalog and FRBR

From: Brenndorfer, Thomas <tbrenndorfer_at_nyob>
Date: Mon, 14 May 2007 10:00:27 -0400
To: NGC4LIB_at_listserv.nd.edu

The general needs for a next generation catalogue were articulated in
the original FRBR report at http://www.ifla.org/VII/s13/frbr/frbr.pdf .

The original guiding principles for the construction of catalogues were
derived from the Paris Principles and the International Standard
Bibliographic Description.

Factors contributing to the change in the environment in which
cataloguing is done include:

- ongoing development of automated systems to create and process
bibliographic data

- growth of large-scale databases and shared cataloguing

- need to reduce cataloguing costs by minimizing duplicate cataloguing
effort

- new formats that require adaptations in cataloguing codes and
practices

- increasing range of user expectations and needs

What should a next gen catalogue do? I think the user tasks as defined
by FRBR are actually a very good basis for eliciting and evaluating
ideas. While RDA is really about descriptive data that satisfies these
user needs, the overall next gen catalogue should be about software
services that meet those needs.

I think what's very important is the other part of FRBR: the part
about entities and relationships. Whatever data structures are used in
the next gen catalogue, there should be a strong adherence to these
entities. For example, is an author an attribute of an entity, or is it
part of a relationship between entities? (I'd say the latter.) Should a
relationship between the entities of work, expression, and manifestation
be clearly delineated? (I'd say yes.) Is a series title an attribute of
a work, or is it an entity that has a relationship with the work? (I'd
say the latter.)
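
To make the entity-versus-attribute distinction concrete, here is a
minimal Python sketch. The class names, role labels, and sample data are
illustrative assumptions of mine (the edition and the series are made
up); nothing here is prescribed by FRBR or RDA. The point is only that
the author and the series are separate entities linked to the work,
rather than text fields stored on a record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str   # "person", "work", "expression", "manifestation", "series"
    label: str


@dataclass(frozen=True)
class Relationship:
    source: Entity
    role: str
    target: Entity


# Illustrative sample data only; the edition and the series are hypothetical.
tolkien = Entity("person", "Tolkien, J. R. R.")
work = Entity("work", "The Hobbit")
expression = Entity("expression", "The Hobbit (English text)")
manifestation = Entity("manifestation", "The Hobbit (hypothetical paperback edition)")
series = Entity("series", "Hypothetical Fantasy Classics")

RELATIONSHIPS = [
    Relationship(tolkien, "created", work),
    Relationship(work, "realized through", expression),
    Relationship(expression, "embodied in", manifestation),
    Relationship(work, "part of series", series),
]


def targets(source, role):
    """Entities that `source` points to through the given role."""
    return [r.target for r in RELATIONSHIPS if r.source == source and r.role == role]


def sources(target, role):
    """Entities that point to `target` through the given role."""
    return [r.source for r in RELATIONSHIPS if r.target == target and r.role == role]


# The author is found by following a relationship, not by reading a field.
print([e.label for e in sources(work, "created")])
```

Because the work carries no author field at all, the same person entity
can be linked to any number of works without being re-keyed each time.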

In reverse order, the FRBR tasks are OBTAIN, SELECT, IDENTIFY, and FIND.
The FIND function is really about making sure there are good index
structures in place, especially ones that are helpful in directing users
such as SEE and SEE ALSO references, and newer functions such as spell
checking and "Did you mean" features.
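
As a rough sketch of how references and a "Did you mean" feature could
sit on top of an index, the Python below resolves SEE references first
and falls back to fuzzy matching with the standard library's
difflib.get_close_matches. The headings, the reference tables, and the
find() function are hypothetical examples, not a description of any
existing catalogue.

```python
import difflib

# Hypothetical authority index: preferred headings plus references.
HEADINGS = {
    "Tolkien, J. R. R.",
    "Twain, Mark",
    "Cataloguing",
    "Bibliographic description",
}
SEE = {
    "Clemens, Samuel Langhorne": "Twain, Mark",
    "Cataloging": "Cataloguing",
}
SEE_ALSO = {"Cataloguing": ["Bibliographic description"]}


def find(query):
    """Resolve a query to a preferred heading, or suggest near matches."""
    heading = SEE.get(query, query)   # follow a SEE reference if one exists
    if heading in HEADINGS:
        return {
            "heading": heading,
            "see_also": SEE_ALSO.get(heading, []),
            "did_you_mean": [],
        }
    # No exact hit: offer the closest headings or references instead.
    candidates = sorted(HEADINGS | set(SEE))
    suggestions = difflib.get_close_matches(query, candidates, n=3, cutoff=0.6)
    return {"heading": None, "see_also": [], "did_you_mean": suggestions}


print(find("Clemens, Samuel Langhorne"))
print(find("Tolkein, J. R. R."))
```

A production index would normalize case and punctuation before
matching; that step is left out here to keep the sketch short.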

The OBTAIN function is all about getting hold of the physical or
electronic resource. Everything from call numbers to purchase
information to interlibrary loan forms is covered here.

The SELECT function is defined this way in the RDA objectives
http://www.collectionscanada.ca/jsc/docs/5rda-objectives.pdf :

select a resource that is appropriate to the user's requirements with
respect to content, format, etc.

but in the original FRBR paper the word "entity" is used instead of
"resource":

using the data to select an entity that is appropriate to the user's
needs (e.g., to select a text in a language the user understands, or to
choose a version of a computer program that is compatible with the
hardware and operating system available to the user)

When I look at a catalogue front-end service such as the one provided by
Endeca, I see the SELECT function at play. Bibliographic resources in the
form of regular catalogue records (what I would call records based on
the manifestation level in FRBR) are flanked by options to limit or
refine the search.

Instead of relying only on descriptive data that the catalogue user has
to eyeball in each record, the attributes that one utilizes for selection
(finding something appropriate to the user's needs) are brought to the
foreground. Using the Endeca interface at McMaster University in Ontario
(http://libcat.mcmaster.ca/), the options to use to refine the search
include: call number, publication year, location, format, subject,
publication type, language, geographical region, subject era, and
author.
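
The refine-by-attribute idea can be sketched in a few lines of Python.
This is not how Endeca works internally (I don't know its internals);
the records, field names, and the facet_counts() and refine() helpers
are made-up illustrations of counting attribute values across a result
set and then narrowing the set by the values the user picks.

```python
from collections import Counter

# Hypothetical manifestation-level records with a few selectable attributes.
RECORDS = [
    {"title": "Record A", "format": "Book", "language": "English", "year": 1997},
    {"title": "Record B", "format": "DVD", "language": "English", "year": 2003},
    {"title": "Record C", "format": "Book", "language": "French", "year": 2003},
]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    return Counter(r[facet] for r in records)


def refine(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [
        r for r in records
        if all(r.get(name) == value for name, value in selected.items())
    ]


print(facet_counts(RECORDS, "format"))
print([r["title"] for r in refine(RECORDS, format="Book", language="English")])
```

The counts are what the interface shows beside each option; refine() is
what happens when the user clicks one.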

I see a problem with the use of the term "resource" versus "entity". We
can't really do what the original FRBR document states. We don't have
properly "FRBRized" records whereby entities are clearly separated and
linked by relationships.

I can think of a great example of why it's important to think of
entities, especially with the IDENTIFY user task. The Internet Movie
Database (http://www.imdb.com/ ) gets it right in my view.

The IDENTIFY task in RDA is defined as:

identify the resource described (i.e., to confirm that the resource
described corresponds to the resource sought, or to distinguish between
two or more resources with similar characteristics).

In FRBR, the IDENTIFY task is defined as "using the data retrieved to
identify an entity."

I see this as a disambiguation function, one in which the search result
is not a list of resources or regular bibliographic records (defined at
the manifestation level), but a list of entities, divided by the type of
entity.

This is exactly what IMDB does!

If I do a search on "Tolkien" in IMDB I get a web page that separates
ways in which "Tolkien" could be identified with a particular entity.

The divisions are: popular name (the person entity), keywords (these
seem to be user supplied tags), names-partial matches (more people
entities), titles-partial matches (work/expression/manifestation
entities), and companies-partial matches (corporate body entities).

When I look at that type of result in IMDB, I see something that
resembles the reference interview. When the user asked for "Tolkien" did
the user mean the author? Was the user thinking of that name in a title?
Maybe there is a corporate body or event that the user had in mind? Or
is the user scouring for anything at all, so that a wide general keyword
search that dredges everything up would be an appropriate response?
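
A disambiguation result of this kind amounts to grouping hits by entity
type instead of returning one flat list. The Python sketch below shows
the idea under stated assumptions: the index entries are invented
examples, the identify() function is hypothetical, and simple substring
matching stands in for whatever matching a real system would use.

```python
from collections import defaultdict

# Hypothetical index of (entity type, label) pairs; sample data only.
INDEX = [
    ("person", "Tolkien, J. R. R."),
    ("person", "Tolkien, Christopher"),
    ("work", "Tolkien: a hypothetical biography title"),
    ("corporate body", "Tolkien Society"),
    ("work", "The Hobbit"),
]


def identify(query):
    """Group matching index entries by the type of entity they name."""
    groups = defaultdict(list)
    needle = query.lower()
    for kind, label in INDEX:
        if needle in label.lower():
            groups[kind].append(label)
    return dict(groups)


for kind, labels in identify("Tolkien").items():
    print(kind, labels)
```

Each group corresponds to one of the questions a reference librarian
might ask: did you mean the person, a title, or an organization?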

So what would a next gen catalogue have?

How about good index structures with references and other tools like
spell-check to facilitate the FIND function?

How about a disambiguation function for FRBR entities to meet the
IDENTIFY function? We need to FRBRize, that is, deconstruct and
reassemble, our catalogue records to align them in sensible
relationships and match them to user needs.

How about filtering features such as faceting and clustering software to
help users SELECT the resource they want?

How about a rich set of networked and interlinked features that allow
the user to OBTAIN the item?

I see RDA as handling the descriptive elements for all this, as well as
the basic index structures. I think the next generation of MARC and
other software services will have to handle the raw manipulation of
data. How and where the data will be stored seems to be a significant
piece of the puzzle in building the next gen catalogue.

Thomas Brenndorfer, B.A., M.L.I.S.

Guelph Public Library

100 Norfolk St.

Guelph, ON

N1H 4J6

(519) 824-6220 ext. 276

tbrenndorfer_at_library.guelph.on.ca