Re: Book tagging: Amazon and LibraryThing

From: Tim Spalding <tim_at_nyob>
Date: Mon, 26 Feb 2007 13:33:57 -0500
To: NGC4LIB_at_listserv.nd.edu

Jonathan et al.,

So, first, I want to reiterate that FRBR is a good idea. Large-scale
adoption of it or any similar system would be a huge boon to
libraries. My money is not where my mouth is--LibraryThing itself has
a totally binary understanding of works. It puts our data head and
shoulders above even sites like Amazon, where the tags and reviews for
the British paperback and the American hardback are in separate
buckets. Any system is better than none, and FRBR is a lot better than
none! I also think a "Fuzzy FRBR" could be 95% the same model as the
unfuzzy one. I do not want to overturn FRBR.

What I want to argue for—and I'll stop after this, since you've heard
it too often—is that FRBR partakes of the same all-or-nothing "binary"
logic that much of the rest of library data does. I think this is out
of step with reality and with how information is organized and
searched today. "Show me the editions of Hamlet" is a question to
which a FRBRized OPAC can provide no relevancy ranking. "Here's the
Pop-up Hamlet and here's the Folio edition—take your pick!" This isn't
how we think, and it's very much not how search engines work. It puts
FRBR on the wrong side of an epochal shift in how data is processed
and understood.
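
To make the contrast concrete, here is a minimal Python sketch of the two ways of answering "show me the editions of Hamlet". The edition names come from the example above; the membership scores, the Edition class and both function names are invented for illustration and do not describe how LibraryThing or any FRBR implementation actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edition:
    title: str
    # Hypothetical score in [0, 1] for how strongly this edition "belongs"
    # to the work. The values used below are made up for the example.
    membership: float


def binary_editions(editions):
    """Binary model: an edition belongs to the work or it does not.

    The result is an unordered set, so there is nothing to rank by.
    """
    return {e.title for e in editions if e.membership > 0}


def ranked_editions(editions):
    """Fuzzy model: the same editions, ordered by strength of membership."""
    ordered = sorted(editions, key=lambda e: e.membership, reverse=True)
    return [e.title for e in ordered]


hamlet = [
    Edition("Pop-up Hamlet", 0.2),   # assumed weight
    Edition("Folio edition", 0.9),   # assumed weight
]

print(binary_editions(hamlet))   # a set: both titles, in no particular order
print(ranked_editions(hamlet))   # a list: strongest member first
```

The only difference between the two functions is whether the score survives into the answer. Where such a score would come from (holdings, usage, links) is left open here.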

I don't think I misunderstand the model. The Hamlet example shows that
you must create a concept of "the work Hamlet" over and above all real
and existing copies. It exists only as an ideal. It exists or doesn't.
And its children are all equal. This is certainly a clean and
powerful model, but it doesn't do everything we'd want it to do. And
we err when we forget that it's just a model.

Certainly rigorous models can be helpful, but the idea that "The
digital world demands a rigorous formal data model" and cannot "work
on an implicit, sloppy, ambiguous, un-stated model like we de facto
have now" is where I get off the bus.

This is very much the crux of the matter. As I see it, the web works
on just the chaos you dismiss.

Search for Hamlet on Google, and Google does not consult a database of
ideals for the Platonic "Hamlet" and then check all its children. It
has no rigorous data model. The pages about Hamlet are assembled and
sorted statistically, drawing on the content and actions of millions of
pages and people. It works on sloppy, ambiguous data.
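
As a toy illustration of ordering by statistics rather than by a formal model, the sketch below ranks pages purely by how many distinct pages link to them. This is a deliberate simplification and not a description of Google's actual algorithm; the page names and the function name are hypothetical.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Rank target pages by the number of distinct (source, target) links.

    `links` is an iterable of (source_page, target_page) pairs. Repeated
    pairs are counted once. No schema or hierarchy is consulted.
    """
    unique_links = dict.fromkeys(links)  # drop duplicates, keep first-seen order
    counts = Counter(target for _source, target in unique_links)
    return [page for page, _count in counts.most_common()]


# Hypothetical link data: who links to which Hamlet page.
links = [
    ("blog-a", "hamlet-folio"),
    ("blog-b", "hamlet-folio"),
    ("blog-c", "hamlet-popup"),
    ("blog-a", "hamlet-folio"),  # repeated link, counted once
]

print(rank_by_inbound_links(links))
```

Nothing in the input says what a "work" or an "edition" is; the ordering falls out of what people chose to link to.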

Libraries have failed to play their due part in the great web
conversation in part because their models do not work like the web
works. Things like links, regular people and statistics are not
traditional elements of library data. So, rather than putting
themselves wholeheartedly on the web, they've tried to cut special
deals with search engines. Rather than let linking sort out relevance,
they have stuck with OPACs that nobody can link to and which have no
social existence. They have stuck with models of "aboutness" and, with
FRBR, "belonging" which are binary, not rich.

PS: Explicitly private tags are an excellent idea. If possible, the UI
should allow them while not adding much complexity. I favor allowing
something like "[given by my mistress]" to separate boxes, etc. We've
avoided them largely because security is hard and we're growing so
fast we're likely to screw up. Even Amazon had a day when all the
reviewers were exposed. (As you might expect, authors reviewed
themselves a lot!) If you don't promise privacy, you can't fail to
deliver...