Re: The Return of Cards? [mailing list]

From: James Weinheimer <weinheimer.jim.l_at_nyob> Date: Sun, 13 Oct 2013 22:57:23 +0200 To: NGC4LIB_at_LISTSERV.ND.EDU

On 10/13/2013 7:27 AM, Alexander Johannesen wrote:
<snip>
I'd just like to add a few bits about why Linked Data is or was 
important. It's not really about sharing the data anymore, it has become 
almost a secondary nice to have feature of meta data; surely you give 
out the meta data in order to make things findable? No, the real 
importance of why the library world should have been quicker and smarter 
about it is about namespace real-estate, and the power of identifiers, 
and it's this subtler connection in which things are truly found.
...
So, for example, we want to talk about Mark Twain. I could link my data 
to a URI (which is just a string of letters to make up an identifier; 
that it's a URI that you can plonk in a browser or do a HTTP GET on to 
resolve it is an added bonus) so that we can make sure that when I talk 
about Mark Twain, I mean the Mark Twain that is linked to this one
http://id.loc.gov/authorities/names/n79021164

And wouldn't it be great if that was the case?
</snip>

[Sorry for the long message, but that is usual for me and I can't find a
way to make it simpler]

If this is not just a rhetorical question but one that is seriously
asked, then I have an answer that, so far as my own experience is
concerned, is definitive (although others may have had different
experiences): Wouldn't it be great if that was the case? The answer is
decidedly no.

When id.loc.gov first came out, I *really, really, really* wanted to
include it in my catalog in some way. I don't believe the API had come
out yet, but there are other ways if you are creative enough, although
they may not be perfect. I showed it to several of my users (students
and faculty), and while they found it kind of neat, especially the
"Visualization" tool, it did not provide them with any information they
thought would be useful for their purposes. I think this offers a clear
example of how differently a tool like this looks to a developer, to a
cataloger, and to a user.

The underlying purpose of the kind of record we see in id.loc.gov is
*not* so much to provide *data* to manipulate in all kinds of new and
wonderful ways, but to help people discover information that is *within
a particular collection*. So, with the record for Mark Twain, what is
there? We find various forms of his name, which are not important in and
of themselves, but they are there so that when someone searches for,
e.g., "Tuwen, Make", they see a reference that says: "See: Twain, Mark,
1835-1910." (http://1.usa.gov/162o37r)

In this case with Mark Twain, you also discover that he has different
"bibliographic identities" (in cataloger-speak), which in plain language
means: if you want to find everything by Mark Twain, you also have to
look under the names:
Clemens, Samuel Langhorne, 1835-1910
Conte, Louis de, 1835-1910
Snodgrass, Quintus Curtius, 1835-1910
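To make the two mechanisms above concrete (the "See:" reference from a variant form, and the need to search several bibliographic identities), here is a minimal sketch in Python. It assumes a toy in-memory authority table; the names `AUTHORITIES`, `see_reference`, and `headings_to_search` are hypothetical and do not reflect the actual id.loc.gov data model or API, and the variant list is cut down to the one example mentioned above.

```python
# Hypothetical toy authority file, NOT the id.loc.gov data model or API.
# Each authorized heading lists variant forms (which produce "See:"
# references) and related bibliographic identities (other headings the
# user must also search to find "everything by" the author).
AUTHORITIES = {
    "Twain, Mark, 1835-1910": {
        "variants": ["Tuwen, Make"],  # abbreviated to the example in the text
        "related": [
            "Clemens, Samuel Langhorne, 1835-1910",
            "Conte, Louis de, 1835-1910",
            "Snodgrass, Quintus Curtius, 1835-1910",
        ],
    },
}


def see_reference(query):
    """Return the authorized heading that `query` matches or refers to, else None."""
    for heading, record in AUTHORITIES.items():
        if query == heading or query in record["variants"]:
            return heading
    return None


def headings_to_search(query):
    """Return the authorized heading plus every related identity to search."""
    heading = see_reference(query)
    if heading is None:
        return []
    return [heading] + AUTHORITIES[heading]["related"]


if __name__ == "__main__":
    print(see_reference("Tuwen, Make"))  # Twain, Mark, 1835-1910
    for h in headings_to_search("Tuwen, Make"):
        print(h)
```

Note that the matching here is exact string comparison; a real catalog would normalize case, punctuation, and diacritics before comparing.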

The rest of the information in the record is for catalogers, documenting
where the information for each form of name came from, along with a few
other details. So, for the user, this information is useful only for
*resource discovery* within the realm of the *specific catalogs* that use
these forms. Other catalogs have different rules and different forms. For
example, pre-AACR2 rules (and many other rule sets as well) treat the
concept of "bibliographic identities" differently, and the heading to
search for everything by Mark Twain was only "Clemens, Samuel Langhorne,
1835-1910". We can see how this was handled during the transition at
Princeton University by looking at the first card under "Clemens, Samuel"
(http://bit.ly/1ajsS8s), but if you browse to the next cards, you will
see that his books are filed under "Clemens", as was correct before AACR2.

So, the only real information from id.loc.gov that is of use to the
public is that they have to look under three other forms of name to find
everything by Twain. Reviving this type of information would only result
in a tool that begins to work the way the catalog was designed to work
(i.e., back in the 19th century). That is important, by the way.

If we look for an author who did not use pseudonyms, all we see are
different forms of the name, e.g. "Goethe, Johann Wolfgang von,
1749-1832" (http://id.loc.gov/authorities/names/n79003362.html). It is of
minimal use for the user to know that Goethe has also been published
under "Ko-tê, 1749-1832", although if they search for "Ko-tê", they will
find the reference to Goethe.

When we use VIAF (http://viaf.org/viaf/50566653/), we get something
that may be more useful to the public: the correct form of the name to
search in different catalogs. So, we discover that we need to search
"Твен, Марк, 1835-1910" in Russian catalogs, and an Arabic-script form
of the name, with the same dates, in Arabic catalogs.
A tool could be made to search Mark Twain's Russian form of name
automatically in the correct catalogs, e.g. http://bit.ly/1boDipB in the
Russian catalogs. It may or may not be useful to someone to know that
materials cataloged in Russia use this form and can be searched
correctly with it.

In WorldCat Identities (http://www.worldcat.org/identities/lccn-n79-21164)
we find different information derived from the catalog: genres, roles,
his most widely held works, and a word cloud of his subjects. WorldCat
Identities, and especially the word cloud at the bottom, *may* be the
most useful to the public of all of these tools, but that needs to be
tested. Again, when I have shown these tools to people, although they
found them interesting, they could not tell me how those tools could
help them in any substantive way, for anything they could imagine.

Compare these tools to DBpedia (http://dbpedia.org/page/Mark_Twain),
which gives lots of concrete information and tons of links about Mark
Twain.

Today, all of this could be linked together with linked data (it can
definitely be done), but following John Marr's questions, it seems to me
that doing so would create the very definition of "information overload".

I want to make it clear that I am *not* saying that some kind of tool
should not be built, because one definitely should be built, but we must
look at it through the eyes of the person consuming it. Otherwise, we may
be creating something for *us* and not for the people who need to use it.
Linked data may end up creating a different kind of chaos. This is why I
say that linked data *may* create something useful for the public, but it
may just as well confuse them more than ever.
-- 
James Weinheimer weinheimer.jim.l_at_gmail.com
First Thus http://catalogingmatters.blogspot.com/
First Thus Facebook Page https://www.facebook.com/FirstThus
Cooperative Cataloging Rules http://sites.google.com/site/opencatalogingrules/
Cataloging Matters Podcasts http://blog.jweinheimer.net/p/cataloging-matters-podcasts.html