Re: $$$ Library data is the best $$$

From: Rinne, Nathan (ESC) <RinneN_at_nyob>
Date: Tue, 15 Sep 2009 10:20:46 -0500
To: NGC4LIB_at_LISTSERV.ND.EDU

Ugh: 

"And I don't think that to say this is not to say that what the
marketplace pursues is wrong"

Should be 

"And I don't think that to say this is to say that what the marketplace
pursues is wrong"

(no "not")

Nathan Rinne

Media Cataloging Technician

Educational Service Center

11200 93rd Avenue North

Maple Grove MN. 55369

Email: rinnen_at_district279.org

-----Original Message-----
From: Rinne, Nathan (ESC) 
Sent: Tuesday, September 15, 2009 9:51 AM
To: 'Next generation catalogs for libraries'
Subject: [NGC4LIB] $$$ Library data is the best $$$ 

All,

First off, I like what Bernhard has to say about the "openness of our
product".  What you see is what you get.  Any educated person who looks
at it closely will not only be able to figure out how it works and how
to use it - but can also clearly see its weaknesses (which yes, are
many).

So yes, its transparency is one of its strengths, even if now, because
of our current environment, this does not make a large impression on us.
With Google, on the other hand, because of the "under-the-hood" nature
of the beast, that cannot be said - even if, given the current
environment, this does not strike us as problematic.  

So, who really is "open"?  It depends on what we are talking about, and
what we find desirable at the time, depending on our circumstances.  

Second, if people can get on board with what Aaron Dobbs suggests, I
hope it works - that the "trick" can be pulled off.  I really do hope
that Google is ready to listen ("...it would be nice if Google, et. al.
WOULD use our data and develop cool mash-ups that we could piggy-back
off of  to the mutual benefit of both ourselves AND Google."[--Jane W.
Jacobs]). 

But. 

Jonathan said: 

"The idea of libraries collectively "demanding" that Google figure out
how to get meaning out of our data that we haven't managed to encode in
an un-ambiguous machine-readable way in the first place (not only
legacy, but we STILL don't do it right)...   while at the same time
complaining that all Google ever does is take from us and we'd rather
they didn't have our data at all or had to pay a lot of money for it...
It's pretty ironic."

Again, it's good that we can point out all the problems we have.  But
again, as Bernhard points out - we, and anybody who takes the time to
look really hard, at least *can clearly see* all the problems.  And
let's not forget all that is good about what we have!  There *is* value.
Tremendous value.  And scholars and the elites of society know it.

Based on the arguments of Jane W. Jacobs a few posts back ("the
perversity of human psychology might be at work... it may be the
availability NOT the inaccessibility of MARC records that make some
people ignore them or question their value") and the views of Thomas
Mann, I am going to take the contrarian position that we should not be
giving our metadata away freely without knowing how it is going to be
used - to Google, or anyone (yes, data is a "public good", but the
government doesn't just give away roads, or anything else for that
matter, without knowing how they will be used).  ***It is a fool's
errand mainly because if we do this, the data, in effect, will be
perceived as valueless - or at least of no greater value than any man's
tags.***  If we are not proud (not arrogant) of our data - and cannot
see that it is more valuable than everyman's tags - eventually people
will not only not see the value of using it, but also not see the value
in continuing to produce it (since producing it is expensive).  The
comprehensive data that we have was created to be the backbone of one,
whole, functioning system (and therefore its abuse cannot be justified
using reasoning like the following [from Google]: "We have over 100
metadata sources, and this is why we have so many errors: if you have
only one source of truth, you never have any doubt") and definitely not
to just be extra keywords thrown in the "vocabulary-controlled-less"
hopper.  Again, our data was created to collocate items that are
determined to go together - and to reveal relationships (through
browsing, cross-referencing, etc.).  Yes, it has all kinds of problems
(thank you Kelley McGrath) and needs to be updated to be more
computer-friendly - and mashed-up in different ways! - but the
underlying function and purpose of our data is lost when used with
Google books the way it is now. 

This is hard for us to deal with, because librarians love to be helpful
and giving - to a fault (it's why many of us don't feel like we'd be good
at all in the private sector).  And who doesn't - on occasion at least -
love to give things away for free, without expecting anything in return?
Carrots and sticks be damned!, right? 

Anyway, from the article that Jim linked to: 

"Will the fate of the digital republic of letters be determined by the
laws of the marketplace or will there be provisions to protect the
public good?"

(from the library director of Harvard U:
http://www.publishersweekly.com/article/CA6696290.html)

Indeed.  And I don't think that to say this is not to say that what the
marketplace pursues is wrong:

"Justice is a denial of mercy, and mercy is a denial of justice.  Only a
higher force can reconcile these opposites: wisdom.  The problem cannot
be solved, but wisdom can transcend it.  Similarly, societies need
stability and change, tradition and innovation, public interest and
private interest, planning and laissez-faire, order and freedom, growth
and decay.  Everywhere society's health depends on the simultaneous
pursuit of mutually opposed activities or aims.  The adoption of a final
solution means a kind of death sentence for man's humanity and spells
either cruelty or dissolution, generally both... Divergent problems
offend the logical mind."

Schumacher, E. F. A Guide for the Perplexed. New York: Harper & Row,
1977, 127.

So, in sum, to say what I've said above ***doesn't mean that libraries
should not share their data as much as possible***.  It just means that,
in the pursuit of some intelligent balance, we should think more in
terms of negotiated contracts and agreements, not simply just giving
stuff away (even with guidance, which for-profit firms may determine it
is in their best interest to ignore: "I would also be far from shocked
if Google says "Um, that's really not worth to us how many resources it
would take to figure out."[--Jonathan Rochkind, in response to
Bernhard's saying collocation of volumes of a multipart was key])

Hence OCLC should, if Google does not think it is in their best
interests to negotiate, back out if they can until we can get things
right. 

Also, if you don't like the subject line, how about this?: 

"Library data is the worst form of data, except for all those other
forms that have been tried from time to time."   

Regards, 

Nathan Rinne

-----Original Message-----
From: Next generation catalogs for libraries
[mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind
Sent: Monday, September 14, 2009 12:40 PM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] Library functions and GBS

Bernhard Eversberg wrote:
>
> Collocation:
> Presently, it is difficult to get all volumes of a multipart together,
> be it monographic or serial/periodical. GBS scans individual volumes
> and records their title page titles without regard to series title in
> the metadata.
>   

Ha, have you tried doing that in WorldCat, or with your own catalog 
records? Our standard cataloging practices do NOT make this easy.  It 
doesn't help that an individual record _could_ be for the multi-volume 
set OR could be for just one volume in the set, depending on what the 
cataloging library held at the time they cataloged. (Bernhard has written 
about how German cataloging handles multi-volume sets a LOT less 
ambiguously).

Now, granted, Google is practically the _expert_ at trying to pull 
meaningful data out of soup where it's not clearly expressed; that's the 
business they are in. I have no doubt that if they decided to throw 
sufficient resources at the problem of collocating multi-volume sets, 
they could arrive at a reasonable (but not perfect) approximation. 

I would also be far from shocked if Google says "Um, that's really not 
worth to us how many resources it would take to figure out."

The idea of libraries collectively "demanding" that Google figure out 
how to get meaning out of our data that we haven't managed to encode in 
an un-ambiguous machine-readable way in the first place (not only 
legacy, but we STILL don't do it right)...   while at the same time 
complaining that all Google ever does is take from us and we'd rather 
they didn't have our data at all or had to pay a lot of money for it... 
It's pretty ironic.

Jonathan