Re: Link resolvers as loosely coupled systems for holdings?

From: Ross Singer <ross.singer_at_nyob> Date: Mon, 10 Sep 2007 14:23:42 -0400 To: NGC4LIB_at_listserv.nd.edu

So, I can go into a little more detail about the Umlaut now.

Although it was initially designed to be a small piece of an (as yet
unrealized) much larger social citation management/catalog
application, its purpose was to analyze an incoming citation,
determine the context of that citation in relation to the specific
user's access to information, enhance the metadata and present options
for 'acting' upon the citation.

So I realize that's a pretty buzzword-laden paragraph, but I'll try to
clarify it a bit.

The first thing we wanted to exploit from the link resolver (we use
SFX) was the fact that it could, just by doing its normal routine,
let us know:
  1. Where the person was coming from
  2. What they were looking for
  3. Where they went
which has all sorts of practical uses, especially if you can add
another data point:  who the person is.
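
To make the first two data points concrete, here is a minimal Python
sketch of pulling them out of an incoming OpenURL.  It assumes OpenURL
1.0 key/value pairs (rfr_id for the referrer, rft.* for the referent).
The resolver address, the citation and the function name are invented
for illustration; this is not how SFX or the Umlaut does it internally.

```python
from urllib.parse import urlsplit, parse_qs


def describe_request(openurl):
    """Split an OpenURL (1.0 key/value style) into source and citation."""
    params = parse_qs(urlsplit(openurl).query)
    # parse_qs returns a list per key; keep the first value of each.
    first = {key: values[0] for key, values in params.items()}
    return {
        # 1. Where the person was coming from
        "source": first.get("rfr_id"),
        # 2. What they were looking for (referent keys, "rft." removed)
        "citation": {key[len("rft."):]: value
                     for key, value in first.items()
                     if key.startswith("rft.")},
    }


# Hypothetical resolver address and citation, for illustration only.
example = ("http://resolver.example.edu/sfx?url_ver=Z39.88-2004"
           "&rfr_id=info:sid/example.com:database"
           "&rft.issn=0028-0836&rft.volume=22&rft.date=1984")
request = describe_request(example)
```

The third data point (where they went) is not in the OpenURL itself; it
would have to be recorded when the user picks a target from the menu.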

What the link resolver was really bad at was letting the user know if
the citation was actually available to them in some medium other than
the subscribed electronic publications of the link resolver's home
institution (read:  if a Georgia Tech user was searching in a Georgia
Tech subscribed database, our SFX setup was good at notifying said
user if the /journal article/ they wanted to access was available to
them /electronically/.  Anything outside of that constraint had a much
higher margin of error, which was compensated for by long disclaimers
in our SFX menu about how it might appear that we don't hold things
that we actually do and if it appears we don't hold something, please
contact a reference librarian).

Where things would break down (from the user's perspective) was:
  1.  For print, we chose not to load our holdings into SFX, so the
only link to our print subscriptions was a link to our OPAC on
ISSN/ISBN.  Even if successful, the user had to wade through the
holdings statements to determine if the library held the item
requested (and more on this in a bit)
  2.  For books (rarer in OpenURL requests -- but possible via
Worldcat and other sources), unless the library held the exact ISBN
requested, it would fail.
  3.  Conference proceedings (which are a huge deal at an engineering
school like ours) were a disaster (more on this in a bit).
  4.  Most importantly, the link resolver did nothing about resources
that the user had access to but weren't explicitly part of Georgia
Tech's "collection".  This could be items in the user's public
library, but even more importantly, open access pre/post print
materials in repositories like arxiv.org and citeseer.

Trying to solve numbers 1 & 3 really showed how deficient our ILS
systems and data are for dealing with regular, run-of-the-mill
OpenURL requests and, especially, for handling workarounds to more
complicated OpenURL resolving issues.

Our catalog's (Voyager) Z39.50 server, I would say, is probably pretty
representative of the sort of access one can hope for from their ILS.
It cannot do full field searching (so, for example, you cannot say, "I
want the title 'nature'"; Voyager will return everything with the word
'nature' in it, which becomes a problem if you don't have a standard
identifier in your OpenURL).  It, unlike some other ILSes, /does/
allow you to retrieve holdings data, but it is not in a particularly
machine-readable format.

What it is not terribly good at, though, is quickly responding to a
bunch of requests when you're trying to find out what a citation is.
In fact, while developing the Umlaut, I was consistently bringing
Voyager down under the weight (which, in reality, was incredibly light)
of multiple requests to identify an item that might not be cataloged
the same way that the citation was indexed in a vendor database.

What we finally had to do was export our catalog records out of
Voyager into an Index Data Zebra index to get the latency and stability
to acceptable levels.  What this also afforded us was the ability to
index whatever fields we needed to, which allowed us to address #3,
the conference proceedings problem.  We could index the 440 & 490 $v
which then allowed us to resolve incoming citations from Compendex
(which is our most heavily used database).  Compendex sends OpenURLs
for conferences with the ISSN, but we generally catalog with the ISBN
(assuming there is one), since the ISBN is unique to the volume.  What
this allowed us to do is search for:
Proceedings of SPIE--the international society for optical engineering
Volume:  1140
Year:  1989 (or 1988 or 1990 -- since it's not clear if this is the
date of the conference or the publication)

See:  http://findit.library.gatech.edu/go/1054047
vs.  http://tinyurl.com/ypvku3
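
The matching idea above (series title plus volume, with a year on
either side) can be sketched roughly like this in Python.  The records
are made-up dictionaries standing in for whatever an index of the
440/490 would return, and the field names and function name are
assumptions for illustration, not the Umlaut's or Zebra's actual
interface.

```python
def find_proceedings(records, series_title, volume, year, slack=1):
    """Return records whose series, volume and year fit the citation."""
    def norm(text):
        # Compare titles case-insensitively, ignoring punctuation/spacing.
        return "".join(ch for ch in text.lower() if ch.isalnum())

    wanted = norm(series_title)
    return [
        rec for rec in records
        if norm(rec["series"]) == wanted
        and rec["volume"] == str(volume)
        # The citation year may be the conference or the publication
        # year, so accept anything within `slack` years of it.
        and abs(rec["year"] - year) <= slack
    ]


SPIE = ("Proceedings of SPIE--the International Society "
        "for Optical Engineering")
# Invented sample records, for illustration only.
catalog = [
    {"id": "a", "series": SPIE, "volume": "1140", "year": 1989},
    {"id": "b", "series": SPIE, "volume": "1141", "year": 1989},
    {"id": "c", "series": "Lecture notes in computer science",
     "volume": "1140", "year": 1996},
]
hits = find_proceedings(
    catalog,
    "Proceedings of SPIE--the international society for optical engineering",
    1140, 1990)
```

Here only record "a" matches: same series, same volume, and 1989 is
within a year of the 1990 in the citation.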

Of course, the introduction of non-indigenous species presents another
set of problems and by using Zebra, we had now lost the capability of
getting holdings from Voyager.  So yet another custom piece had to be
built to retrieve holdings directly from Voyager's Oracle database.
These pieces should probably be merged (read:  drop Zebra and design a
solution using Voyager's Oracle backend exclusively), but that
requires more time and resources than the current solution.

#2 had an easy solution with OCLC's xISBN service.  For ISBN requests,
we ask xISBN for all related ISBNs before querying Zebra.  Thankfully,
ISBNs in OpenURL requests are pretty infrequent (for us, anyway), so
we're not at the point that we need to pay for the service.
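
As a rough Python sketch of that flow: expand the requested ISBN into
its related ISBNs, then try each one against the index.  Both lookups
are passed in as plain functions, because the real calls (xISBN, Zebra)
are not shown here; the function names and the placeholder ISBNs are
invented for illustration.

```python
def search_by_isbn(isbn, related_isbns, search_index):
    """Search the index for an ISBN and for every related ISBN."""
    def clean(value):
        return value.replace("-", "").replace(" ", "").upper()

    # Start with the requested ISBN, then add related ones once each.
    candidates = [clean(isbn)]
    for other in related_isbns(candidates[0]):
        other = clean(other)
        if other not in candidates:
            candidates.append(other)

    hits = []
    for candidate in candidates:
        hits.extend(search_index(candidate))
    return hits


# Stand-ins for the related-ISBN service and the catalog index.
# The numbers are placeholders, not real ISBNs.
RELATED = {"0000000001": ["0000000002", "0000000003"]}
INDEX = {"0000000003": ["record-42"]}

hits = search_by_isbn(
    "0-00-000000-1",
    lambda isbn: RELATED.get(isbn, []),
    lambda isbn: INDEX.get(isbn, []),
)
```

The requested ISBN itself is not in the stand-in index, but one of its
related ISBNs is, so the request still finds the record.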

That left #4.  The Umlaut uses the Google and Yahoo APIs to see if an
item is in an open access repository and links to it as if it was any
other fulltext target.

See:  http://findit.library.gatech.edu/go/1054124

The other thing the Umlaut does to determine access to items is query
OCLC's Resolver Registry with the user's IP address and incorporate
the link resolver registered for the location where they are
physically sitting.

The intention was for the user to be able to add however many
institutions apply to them:
  I am staff at Georgia Tech
  I am taking graduate classes online at Florida State
  I have a library card for Atlanta-Fulton County Public Library
  I have alumni privileges from the University of Michigan
(well, let's say a fictional user does, /I/ don't)
Holdings from any of these locations would appear.  The basic
infrastructure for this is in place for the next release of Umlaut,
but it's unpolished, as it hasn't been possible to make work on the
Umlaut much of a priority anymore.
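
One way to picture the multiple-affiliation idea is to send the same
citation to each institution's resolver.  The Python sketch below only
builds the OpenURLs; the base URLs are hypothetical (real ones would
come from a registry lookup or the user's profile), and this is an
illustration rather than the Umlaut's actual code.

```python
from urllib.parse import urlencode


def links_for_user(resolvers, citation):
    """Build one OpenURL per affiliation from the same citation."""
    query = urlencode(
        [("url_ver", "Z39.88-2004")]
        + [("rft." + key, value) for key, value in sorted(citation.items())]
    )
    return {name: base + "?" + query for name, base in resolvers.items()}


# Hypothetical resolver base URLs, for illustration only.
resolvers = {
    "Georgia Tech": "http://resolver.gatech.example/openurl",
    "Florida State": "http://resolver.fsu.example/openurl",
}
links = links_for_user(resolvers, {"issn": "0028-0836", "volume": "22"})
```

Each affiliation gets its own link, and whatever each resolver reports
back could then be merged into one menu for the user.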

I realize this post is incredibly long and all over the place, but it
begins to highlight the sorts of things we need to be thinking about
(and working around) when we try to bring our services together.

-Ross.

On 9/10/07, Stephens, Owen <o.stephens_at_imperial.ac.uk> wrote:
> Thanks for the information about Umlaut - I had (I'm afraid) assumed
> this was simply an open source link resolver - I realise now it is much
> more along the lines that I was thinking when I wrote my initial mail in
> this thread. (btw is there a live instance I can look at in action?)
>
> I agree that a link resolver could be seen as anything that can
> interpret an OpenURL and provide some service. However, the development
> of the OpenURL came from the concept of 'appropriate copy', which was
> driven by the idea that e-journals were available through many different
> routes, only some of which were relevant in a particular context.
>
> I think my point is that in the form of the current link resolvers I can
> see how a loosely coupled holdings system would work, but for some
> reason we have generally stopped at the e-journal information (as
> Jonathan points out, mainly outsourced). There is perhaps more than one
> issue here:
>
> An OpenURL link has essentially become an 'electronic holdings' link.
> There is absolutely no reason why it shouldn't become a 'holdings' link
> as far as I can see. Why have we not taken this additional step? (some
> sites have with journals I think, but not perhaps with books?).
>
> Many Universities already have a commercial link resolver - so perhaps
> we already have in our hands the power to implement this aspect of an
> NGC? Added to this, link resolvers have a tendency to be simpler, cheaper
> and perhaps easier to develop than an ILS.
>
> Based on Jonathan's mail and the information about Umlaut, perhaps the
> most important thing to say is that I don't think that holdings should
> be in a single database, but I would wholeheartedly agree that any
> system storing holdings information should be able to "have its data
> made available through an API such that other actual user-facing
> interfaces can use this data" so that one or more systems can be easily
> queried given [the information contained in] an OpenURL and return
> holdings information in a standardised format. To take it a step
> further, I would like holdings data to be expressed in a format that can
> be used to calculate how it relates to any particular query (e.g. the
> user wants vol. 22 published in 1984) - the commercial link resolvers
> tend to make holdings data available in a way that allows their
> evaluation against search criteria.
>
> The point about it not necessarily being a single database is that we
> have personal views of the information world depending on what we have
> access to. I may have access to a physical location because of my
> current geographical location, an electronic collection because of my
> institutional affiliation(s), and possibly even personal collections
> that I own or pay private subscriptions to. It would be nice if Umlaut
> or its equivalent could query a set of holdings information relevant to
> me rather than just that defined by the database owner. But this is
> looking several steps ahead.
>
> Owen
>
>
>
>
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochkind_at_jhu.edu]
> Sent: 07 September 2007 20:14
> To: Next generation catalogs for libraries
> Cc: Stephens, Owen
> Subject: Re: [NGC4LIB] Link resolvers as loosely coupled systems for
> holdings?
>
> The thing is, that a 'link resolver' is really just about the
> _interface_.  A 'link resolver' is anything that takes an OpenURL and
> returns... well, anything, really.
>
> So where does it get this information? Usually it gets this information
> from an internal database.
>
> The Umlaut link resolver front-end, which I am working on, instead gets
> its data _both_ from SFX's database, AND from my ILS. If all the data
> was in the ILS, could it just look there?  Sure. But of course, most of
> our ILSes aren't capable of controlling this kind of data---and also,
> one of the things most of us look for in a 'link resolver' is actually
> the _outsourcing_ of the maintenance of _some_ of this data. That's what
> we get with SFX, or with SerSol's product.
>
> So here's how I'd translate what you're saying:
> All of our holdings info, physical, electronic, etc., should be in _one_
> database. This should be neither our 'link resolver', nor our 'OPAC',
> but ideally a free-standing module of its own, for maintaining
> holdings. This database needs to have its data made available through
> an API such that other actual user-facing interfaces can use this data.
> Those other user-facing interfaces need to include a 'link resolver'
> (i.e., some software that responds to OpenURLs), they need to include
> our 'OPAC' (some software that lets users search, and then tells them
> what we have and what the holdings are), and they probably need to
> include other things too.
>
> The way these interfaces and functions are actually divided up among
> software packages is another story. Perhaps one piece of software will
> do all these things: "OPAC", "link resolver", etc. More likely, there
> will be several. Right now, the division is usually between one
> software package bundling a bunch of functions called an 'opac', and
> another called a 'link resolver'. That division of responsibilities
> for _interface_ might change.
>
> But regardless, yes, all our holdings info should be in one single
> database. That's not a 'link resolver' though, 'link resolver' is the
> interface, in fact.
>
> How you accomplish this--especially taking into account libraries'
> current desire to outsource the management of current 'link resolver'
> data--is not exactly clear.
>
> Jonathan
>
> Stephens, Owen wrote:
> > I'm a big fan of link servers (my experience is all with the SFX
> > product to date). Recent postings in the FRBRization threads have
> > made me consider how they work as a loosely coupled system for
> > libraries, and I think they point towards a (slightly more) FRBRized
> > view of the world. In fact I would guess that actually most (all?)
> > link resolvers are built with (to some extent) a FRBRized view of
> > e-journals because it was the logical way to build them.
> >
> > I feel that potentially link resolvers could be leveraged much more
> > than currently and some of the things I'd like to see from an NGC
> > point of view might be possible with tools that are already available
> > to us. In the best "oh well, it's Friday" tradition, the following
> > (slightly long and possibly rambling) post is an exploration of this
> > idea - for those who can be bothered I'd be interested to know:
> >
> > Do others share my view of the potential here?
> > Any critical reaction (constructive if you can!)?
> > Is anyone aware of work in this area?
> >
> > Just to think about journals to start with, as this already works to
> > some extent.
> >
> > If we have an OpenURL with each journal record in the catalogue, then
> > we are essentially putting a 'click here for electronic holdings'
> > link next to each title. At this point it ceases to be relevant
> > whether the user is looking at the print or e- record for the journal
> > in the catalogue - in terms of presenting the electronic holdings,
> > the OpenURL link does the same in both cases. This starts to suggest
> > that having one or two bib records to represent the journal's
> > electronic holdings is irrelevant.
> >
> > If we go one step further and have an OpenURL that picks up the user's
> > Resolver address rather than just the local institution's address,
> > then we present the electronic holdings that the user in question has
> > access to - personalised holdings statements - brilliant.
> >
> > However, we can also see the limitations. In most cases the resolvers
> > only deal with electronic holdings. I can't see any real reason for
> > this except that this is the space they were designed to work in
> > (What I wouldn't give for some nice, machine-parsable, holdings
> > statements for our print journals). Some libraries have taken the
> > step of putting their print holdings into their resolvers, and some
> > have worked out ways of getting their resolvers to display print
> > holding information from their catalogues - either seems quite a big
> > step forward to me.
> >
> > If we think about books, then link resolvers have much more limited
> > use to date. SFX certainly deals with some of the e-book packages,
> > but not all, and I've not seen any real implementations of this -
> > probably because the use of OpenURLs in A&I databases is so much more
> > immediately powerful when dealing with journal citations. I think
> > this is bound to change. It would be interesting to experiment with
> > putting book manifestation/edition/holding(item) information into a
> > link resolver and see how it worked - has anyone got any experience
> > with this type of thing?
> >
> > Finally, another limitation is that link resolvers tend not to talk to
> > each other. If I'm from Institution A and I'm searching the catalogue
> > of Institution B and find an item I want, then what might I want to
> > know? Whether A has it electronically (i.e. I can access it now),
> > whether A has it physically (i.e. I can go to my own library),
> > whether B has it physically (i.e. I can go and get it) and possibly
> > if B has it electronically (if I have access to B's electronic
> > collections, or if it is available to me if I go to B and use it in
> > the library). (There are almost certainly other
> > combinations/possibilities, but you can fill these in.) To answer
> > these questions would require A's link resolver and B's link resolver
> > to communicate all their electronic and physical holdings into a
> > central place (probably actually A's resolver I guess), and present
> > me with a unified list of access details. I think some consortia
> > (e.g. CDL) have done something like this when running multiple link
> > resolvers across a consortium, but I've not seen any examples where
> > the resolvers can spontaneously communicate on demand.
> >
> > So - some questions.
> > Should we all start moving our print journal holdings into link
> > resolvers? If not, why not?
> > Should we be putting e-book or print book information into link
> > resolvers? Ditto?
> > Where should we start in terms of making it easy for link resolvers to
> > share information with each other?
> > Does anyone else think that the idea of an OPAC with holdings
> > information driven purely by link resolvers has potential? (I suppose
> > more generally - can we build on the idea of link resolvers to form a
> > loosely coupled holdings information system?)
> >
> > Best
> >
> > Owen
> >
> > Owen Stephens
> > Assistant Director: e-Strategy and Information Resources
> > Imperial College London Library
> > Imperial College London
> > South Kensington
> > London SW7 2AZ
> >
> >
> > Tel: 020 7594 8829
> > Email: o.stephens_at_imperial.ac.uk
> >
> >
> > -----Original Message-----
> > From: Next generation catalogs for libraries
> > [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Dykas, Felicity A.
> > Sent: 06 September 2007 18:06
> > To: NGC4LIB_at_listserv.nd.edu
> > Subject: Re: [NGC4LIB] Cutter's Rules in full text - a case for
> > FRBRization
> >
> > Aggregator neutral records are being used for serials and I think we
> > should implement them for monographs.  If this is done there will be
> > one record in WorldCat for all digitized copies of a particular book.
> > Separate records in WorldCat for the NetLibrary version, ebrary
> > version, Google-scanned version, etc., are a problem.  I cringe when
> > I add another record because the provider is different.
> >
> > We've cataloged a few books that were scanned by Google and are
> > creating one record for a title, even if more than one copy has been
> > scanned.  In the URL field we are indicating who held the original
> > book:
> > http://laurel.lso.missouri.edu/search/Y?searchtype=o&searcharg=166255505&SORT=D&searchscope=8
> > Cataloging rules for online materials continue to be in flux (or at
> > least not clear) and we may be taking some liberties in what we're
> > doing.
> >
> > I think separate records for print and online will facilitate
> > searching and identification (eventually).
> >
> > Felicity Dykas
> > MU Libraries
> > University of Missouri--Columbia
> >
> > -----Original Message-----
> > From: Next generation catalogs for libraries
> > [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Frances Dean McNamara
> > Sent: Thursday, September 06, 2007 9:42 AM
> > To: NGC4LIB_at_listserv.nd.edu
> > Subject: Re: [NGC4LIB] Cutter's Rules in full text - a case for
> > FRBRization
> >
> > At ALA OCLC was describing how they will start adding records to
> > Worldcat for Google and Google member library e-books from the Google
> > Book Search project.  However they plan to add new separate bibs for
> > every instance, using "institutional records" where there are separate
> > instances of the same book for Michigan, Harvard, NYPL, etc.  They
> > will automatically retain the OCLC# for the print copy.  In fact they
> > are creating these new records from that print copy.
> >
> > The proliferation of separate bibs in Worldcat for all these copies of
> > the same thing is probably going to be messy.  I don't think that is
> > being done to help people searching for the title, it's to help
> > librarians know what's been digitized and who has the file, I think.
> >
> > What we really want is an easy way to know that something is available
> > in print and electronic form and to easily be able to decide which
> > form is the right one for what we are doing at that moment, don't you
> > think?  Isn't this like link resolver linking?  Wouldn't it be better
> > to keep that information somewhere and use a link resolver to go find
> > out which electronic versions are available to me?  Especially since
> > we are already finding that what is available to someone in one
> > country may not be available in another.
> >
> > I'm not understanding why people think separate bib records are useful
> > for this.  I can't help thinking that adding these things to
> > knowledgebases for link resolvers may provide a better end result for
> > users.
> >
> > Frances McNamara
> > University of Chicago
> >
> >
>
> --
> Jonathan Rochkind
> Digital Services Software Engineer
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886
> rochkind (at) jhu.edu
>