Re: Link resolvers as loosely coupled systems for holdings?

From: Stephens, Owen <o.stephens_at_nyob>
Date: Tue, 11 Sep 2007 09:59:07 +0100
To: NGC4LIB_at_listserv.nd.edu

Thanks all for the responses.

One of the questions I asked was 'why have we stopped at electronic
journals?' [when using link resolvers].

Karen specifically tackled the question of 'why not books?', and
suggested that this was to do with scalability. I don't know enough to
say if this is an issue or not (although my instinct is that it would
be). At least some Link Resolvers do deal with e-books, although I don't
know how successfully (anyone with any experience of this?). I'd also
question at what point scalability becomes an issue - the library I have
recently left had something in the region of 300,000 unique monograph
titles - would this present a problem to the link resolvers on the
market?

Also, if we see e-books going in the same direction as e-journals
(available via multiple suppliers on the web, only some of which an
institution will have access to), then the 'appropriate copy' problem
arises again - and this is what link resolvers were designed to solve
for journals - will we need a similar solution for books?

In the end, I suppose I think that trying to use a link resolver for
monograph holdings would be an interesting experiment that could help us
understand the practical problems of achieving a loosely coupled
architecture - any volunteers :)

I also think Karen's idea for Google books (and OCA etc.) is a good one.
But surely the question is why isn't Google providing the list of
available books indexed by identifier? Surely it is in the interest of
Google for us to drive traffic to their site? If they provided an ISBN
and/or OCLC lookup then it would be trivial to add 'view this book at
Google' to an SFX/SerSoln/Umlaut etc menu.
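To make the point concrete, here's a rough Python sketch of what that
menu integration could look like. To be clear, the endpoint URL and the
response fields below are invented for illustration - Google offers no
such lookup today, which is exactly the complaint:

# Sketch: adding a 'view this book at Google' target to a resolver menu,
# assuming Google exposed an ISBN-keyed JSON lookup. The endpoint URL and
# the 'available'/'preview_url' response fields are invented assumptions.
import json
import urllib.request

GOOGLE_LOOKUP = "https://books.example.com/lookup?isbn="  # hypothetical

def google_book_target(isbn):
    """Return a menu entry for the Google copy of this ISBN, or None."""
    with urllib.request.urlopen(GOOGLE_LOOKUP + isbn) as resp:
        record = json.load(resp)
    if not record.get("available"):        # assumed response field
        return None
    return {"label": "View this book at Google",
            "url": record["preview_url"]}  # assumed response field

A resolver menu would simply append the returned entry to its list of
fulltext targets.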

Jonathan raised a few issues. Firstly, he agreed with me about the way
we express holdings data, and the fact that the physical holdings data
in the ILS cannot be evaluated by software. Ross also mentioned this
problem and how it led to users having to wade through lists of print
holdings. Ross also said they had decided not to put their print
holdings in SFX. So - my question would be - why are we still doing
this? Why haven't we just dumped our print holdings into our link
resolvers? Or re-expressed them in the ILS to allow computers to read
them sensibly? I'm as guilty as anyone else here, but my worry is that
the reason I haven't pushed for this is that I'm somehow scared to break
the tradition of recording holdings in MARC format in the ILS. Is this
it - or are there real practical reasons we haven't done this? More than
the book issue (which is clearly more complicated) the question of why
we haven't taken this step with print journals baffles and worries me.
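For what it's worth, here's a toy Python sketch of what 're-expressed'
print journal holdings might look like - structured ranges instead of a
free-text statement, so software can answer a concrete question like 'do
we hold vol. 22 (1984)?'. The data shape is just an illustrative
assumption, not any existing standard:

# Sketch: print journal holdings as structured ranges rather than a
# free-text MARC statement. The data shape is an illustrative assumption.

holdings = [
    {"location": "Main Library", "from_vol": 1, "to_vol": 30,
     "from_year": 1963, "to_year": 1992},
    {"location": "Store", "from_vol": 31, "to_vol": None,   # open-ended run
     "from_year": 1993, "to_year": None},
]

def held_at(volume, year):
    """Return the locations whose ranges cover the requested issue."""
    def covers(h):
        vol_ok = (h["from_vol"] <= volume and
                  (h["to_vol"] is None or volume <= h["to_vol"]))
        year_ok = (h["from_year"] <= year and
                   (h["to_year"] is None or year <= h["to_year"]))
        return vol_ok and year_ok
    return [h["location"] for h in holdings if covers(h)]

print(held_at(22, 1984))   # -> ['Main Library']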

Finally, Ross kindly posted some more information about Umlaut. I can
really relate to where he is coming from, and think that the work on
Umlaut so far sounds very interesting and exciting. It seems a shame
that the commercial link resolvers aren't doing more work in this
direction (although Jonathan's point about them being simple because
they don't try to do too much is well taken), and also that it sounds
like Ross isn't able to devote as much development to Umlaut at the
moment.

I have to say that I especially liked the way Umlaut starts to open up
access to repositories - this has been something that has troubled me
for a few years - if we think that 'appropriate copy' is a problem for
journals, then when we go down to article-level publishing it's going
to get really messy...

Owen

-----Original Message-----
From: Next generation catalogs for libraries
[mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Ross Singer
Sent: 10 September 2007 19:24
To: NGC4LIB_at_listserv.nd.edu
Subject: Re: [NGC4LIB] Link resolvers as loosely coupled systems for
holdings?

So, I can go into a little more detail about the Umlaut now.

Although it was initially designed to be a small piece of an (as yet
unrealized) much larger social citation management/catalog
application, its purpose was to analyze an incoming citation,
determine the context of that citation in relation to the specific
user's access to information, enhance the metadata and present options
for 'acting' upon the citation.

So I realize that's a pretty buzzword-laden paragraph, but I'll try to
clarify it a bit.

The first thing we wanted to exploit from the link resolver (we use
SFX) was the fact that it could, just by doing its normal routine,
let us know:
  1. Where the person was coming from
  2. What they were looking for
  3. Where they went
which has all sorts of practical uses, especially if you can add
another data point:  who the person is.
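To make that concrete, here's a minimal Python sketch of how those
data points fall out of a routine OpenURL 0.1 request (the query-string
keys follow the OpenURL 0.1 convention; the sample values are invented):

# Sketch: the three data points come straight out of a routine OpenURL
# 0.1 request. Keys follow the OpenURL 0.1 convention; values invented.
from urllib.parse import parse_qs

def log_request(query_string, clicked_target=None):
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    return {
        "came_from": params.get("sid"),   # 1. where the person was coming from
        "looked_for": {k: params[k]       # 2. what they were looking for
                       for k in ("genre", "issn", "volume", "spage")
                       if k in params},
        "went_to": clicked_target,        # 3. where they went
    }

print(log_request("sid=EI:Compendex&genre=article&issn=1234-5678&volume=22",
                  clicked_target="publisher fulltext"))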

What the link resolver was really bad at was letting the user know if
the citation was actually available to them in some medium other than
the subscribed electronic publications of the link resolver's home
institution (read: if a Georgia Tech user was searching in a Georgia
Tech subscribed database, our SFX setup was good at notifying said
user if the /journal article/ they wanted to access was available to
them /electronically/.  Anything outside of that constraint had a much
higher margin of error, which was compensated for by long disclaimers
in our SFX menu about how it might appear that we don't hold things
that we actually do, and if it appears we don't hold something, please
contact a reference librarian).

Where things would break down (from the user's perspective):
  1.  For print, we chose not to load our holdings into SFX, so the
only link to our print subscriptions was a link to our OPAC on
ISSN/ISBN.  Even if successful, the user had to wade through the
holdings statements to determine if the library held the item
requested (and more on this in a bit)
  2.  For books (rarer in OpenURL requests -- but possible via
Worldcat and other sources), unless the library held the exact ISBN
requested, it would fail.
  3.  Conference proceedings (which are a huge deal at an engineering
school like ours) were a disaster (more on this in a bit).
  4.  Most importantly, the link resolver did nothing about resources
that the user had access to but weren't explicitly part of Georgia
Tech's "collection".  This could be items in the user's public
library, but even more importantly, open access pre/post print
materials in repositories like arxiv.org and citeseer.

Trying to solve numbers 1 & 3 really showed how deficient our ILS
systems and data are for dealing with regular, run-of-the-mill
OpenURL requests and, especially, for trying to handle workarounds to
more complicated OpenURL resolving issues.

Our catalog's (Voyager) Z39.50 server, I would say, is probably pretty
representative of the sort of access one can hope for from their ILS.
It cannot do full-field searching (so, for example, you cannot say, "I
want the title 'nature'"; Voyager will return everything with the word
'nature' in it, which becomes a problem if you don't have a standard
identifier in your OpenURL).  It, unlike some other ILSes, /does/
allow you to retrieve holdings data, but it is not in a particularly
machine-readable format.
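Here's roughly what that word-indexed title search looks like from a
client's point of view, sketched with PyZ3950 (a Python Z39.50 library).
The host, port and database name are placeholders, and exact search
behaviour depends on the server's Bib-1 attribute configuration, so
treat this as a sketch rather than a recipe:

# Sketch: a one-word title search against a Z39.50 server via PyZ3950.
# Host, port and database name are placeholders.
from PyZ3950 import zoom

conn = zoom.Connection('opac.example.edu', 210)   # placeholder server
conn.databaseName = 'VOYAGER'
conn.preferredRecordSyntax = 'USMARC'

# A title search for 'nature' is matched word-by-word, not as a whole
# field, so 'Nature', 'The nature of things', etc. all come back.
result = conn.search(zoom.Query('CCL', 'ti="nature"'))
print(len(result), "records matched a one-word journal title")
conn.close()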

What it is not terribly good at, though, is quickly responding to a
bunch of requests when you're trying to find out what a citation is.
In fact, while developing the Umlaut, I was consistently bringing
Voyager down under the weight (which, in reality, was incredibly light)
of multiple requests to identify an item that might not be cataloged
the same way that the citation was indexed in a vendor database.

What we finally had to do was export our catalog records out of
Voyager into an Indexdata Zebra index to get the latency and stability
to acceptable levels.  What this also afforded us was the ability to
index whatever fields we needed to, which allowed us to address #3,
the conference proceedings problem.  We could index the 440 & 490 $v
which then allowed us to resolve incoming citations from Compendex
(which is our most heavily used database).  Compendex sends OpenURLs
for conferences with the ISSN, but we generally catalog with the ISBN
(assuming there is one), since the ISBN is unique to the volume.  What
this allowed us to do is search for:
Proceedings of SPIE--the international society for optical engineering
Volume:  1140
Year:  1989 (or 1988 or 1990 -- since it's not clear if this is the
date of the conference or the publication)

See:  http://findit.library.gatech.edu/go/1054047
vs.  http://tinyurl.com/ypvku3
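The lookup itself amounts to something like the following sketch -
series title plus the volume we indexed from the 440/490 $v, tried
across a +/-1 year window. The index names here are illustrative; the
real ones depend on your Zebra configuration:

# Sketch: candidate queries for a conference proceedings citation.
# The index names are illustrative, not actual Zebra defaults.
def proceedings_queries(title, volume, year):
    # Try the stated year plus one either side, since the OpenURL year
    # may be the conference date or the publication date.
    for y in (year, year - 1, year + 1):
        yield ('series_title = "%s" and series_volume = %d and year = %d'
               % (title, volume, y))

title = ("Proceedings of SPIE--the international society "
         "for optical engineering")
for q in proceedings_queries(title, 1140, 1989):
    print(q)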

Of course, the introduction of non-indigenous species presents another
set of problems, and by using Zebra we had now lost the capability of
getting holdings from Voyager.  So yet another custom piece had to be
built to retrieve holdings directly from Voyager's Oracle database.
These pieces should probably be merged (read:  drop Zebra and design a
solution using Voyager's Oracle backend exclusively), but that
requires more time and resources than the current solution.
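For the curious, that custom piece boils down to something like this
sketch. The table and column names follow my memory of Voyager's
reporting schema (BIB_MFHD, MFHD_MASTER, LOCATION) and should be
verified against a local instance; the connection credentials are
placeholders:

# Sketch: holdings straight from Voyager's Oracle back end. Table and
# column names are from memory and must be checked locally.
import cx_Oracle

SQL = """
select loc.location_display_name, mm.display_call_no
  from bib_mfhd bm
  join mfhd_master mm on mm.mfhd_id = bm.mfhd_id
  join location loc on loc.location_id = mm.location_id
 where bm.bib_id = :bib_id
"""

def holdings_for_bib(conn, bib_id):
    cur = conn.cursor()
    cur.execute(SQL, bib_id=bib_id)
    return cur.fetchall()

conn = cx_Oracle.connect("reports/secret@voyagerdb")  # placeholder
for location, call_no in holdings_for_bib(conn, 12345):
    print(location, call_no)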

#2 had an easy solution with OCLC's xISBN service.  For ISBN requests,
we ask xISBN for all related ISBNs before querying Zebra.  Thankfully,
ISBNs in OpenURL requests are pretty infrequent (for us, anyway), so
we're not at the point that we need to pay for the service.
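The xISBN step is essentially this sketch - expand the incoming ISBN to
its related editions before querying the local index. The endpoint and
response layout shown are my best recollection of the service, so check
OCLC's documentation before relying on them:

# Sketch: expand an incoming ISBN to related editions via OCLC's xISBN.
# Endpoint and response layout are recollections, not verified.
import json
import urllib.request

def related_isbns(isbn):
    url = ("http://xisbn.worldcat.org/webservices/xid/isbn/%s"
           "?method=getEditions&format=json" % isbn)
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return [entry["isbn"][0] for entry in data.get("list", [])]

# Query the index for any edition the library holds, not just the
# exact ISBN in the OpenURL request.
for candidate in related_isbns("0201896834"):
    print(candidate)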

That left #4.  The Umlaut uses the Google and Yahoo APIs to see if an
item is in an open access repository and links to it as if it were any
other fulltext target.

See:  http://findit.library.gatech.edu/go/1054124
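In outline, the check looks something like this sketch - run the
citation title through a web search API restricted to known repository
domains and treat a hit as a fulltext target. The search endpoint below
is a placeholder, not the actual Google or Yahoo request format:

# Sketch: check open access repositories by searching for the citation
# title on known repository domains. Endpoint is a placeholder.
import urllib.parse

REPOSITORIES = ["arxiv.org", "citeseer.ist.psu.edu"]

def repository_queries(title):
    for domain in REPOSITORIES:
        q = 'site:%s "%s"' % (domain, title)
        yield "https://search.example.com/?q=" + urllib.parse.quote(q)

title = "A relational model of data for large shared data banks"
for url in repository_queries(title):
    print(url)   # a hit here becomes a fulltext target on the menu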

The other thing the Umlaut does to determine access to items is query
OCLC's Resolver Registry with the user's IP address; it will
incorporate the link resolver registered for the location where they
are physically sitting.
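Sketched in Python, that registry lookup is something like the
following. The lookup URL and the XML element name searched for are
assumptions from memory, not verified against the live service:

# Sketch: ask OCLC's Resolver Registry which resolver serves the user's
# IP address. URL and element name are unverified assumptions.
import urllib.request
import xml.etree.ElementTree as ET

def resolver_for_ip(ip):
    url = "http://www.worldcat.org/registry/lookup?IP=" + ip
    with urllib.request.urlopen(url) as resp:
        tree = ET.parse(resp)
    elem = tree.find(".//baseURL")   # assumed element name
    return elem.text if elem is not None else None

print(resolver_for_ip("130.207.0.1"))   # e.g. an on-campus address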

The intention was for the user to be able to add however many
institutions apply to them:
  I am staff at Georgia Tech
  I am taking graduate classes online at Florida State
  I have a library card for Atlanta-Fulton County Public Library
  I have alumni privileges from the University of Michigan
(well, let's say fictional user does, /I/ don't)
Holdings from any of these locations would appear.  The basic
infrastructure for this is in place for the next release of Umlaut,
but it's unpolished, as working on the Umlaut hasn't been able to
remain a priority lately.
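The shape of that infrastructure is simple enough to sketch: keep each
institution the user claims as an OpenURL resolver base URL and fan the
same citation out to all of them (all but the first base URL below are
invented placeholders):

# Sketch: fan one citation out to the resolver of every affiliation the
# user claims. All but the first base URL are invented placeholders.
AFFILIATIONS = {
    "Georgia Tech (staff)": "http://findit.library.gatech.edu/go",
    "Florida State (online grad classes)": "http://resolver.fsu.example.edu",
    "Atlanta-Fulton County PL": "http://resolver.afplweb.example.org",
    "U. of Michigan (alumni)": "http://resolver.umich.example.edu",
}

def fan_out(citation_query):
    """Yield one resolver URL per affiliation for the same citation."""
    for name, base in AFFILIATIONS.items():
        yield name, base + "?" + citation_query

for name, url in fan_out("genre=article&issn=1234-5678&volume=22"):
    print(name, "->", url)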

I realize this post is incredibly long and all over the place, but it
begins to highlight the sorts of things we need to be thinking about
(and working around) when we try to bring our services together.

-Ross.

On 9/10/07, Stephens, Owen <o.stephens_at_imperial.ac.uk> wrote:
> Thanks for the information about Umlaut - I had (I'm afraid) assumed
> this was simply an open source link resolver - I realise now it is
> much more along the lines that I was thinking when I wrote my initial
> mail in this thread. (btw is there a live instance I can look at in
> action?)
>
> I agree that a link resolver could be seen as anything that can
> interpret an OpenURL and provide some service. However, the
> development of the OpenURL came from the concept of 'appropriate
> copy', which was driven by the idea that e-journals were available
> through many different routes, only some of which were relevant in a
> particular context.
>
> I think my point is that in the form of the current link resolvers I
> can see how a loosely coupled holdings system would work, but for some
> reason we have generally stopped at the e-journal information (as
> Jonathan points out, mainly outsourced). There is perhaps more than
> one issue here:
>
> An OpenURL link has essentially become an 'electronic holdings' link.
> There is absolutely no reason why it shouldn't become a 'holdings'
> link as far as I can see. Why have we not taken this additional step?
> (some sites have with journals I think, but not perhaps with books?).
>
> Many Universities already have a commercial link resolver - so perhaps
> we already have in our hands the power to implement this aspect of an
> NGC? Added to this, link resolvers have a tendency to be simpler,
> cheaper and perhaps easier to develop than an ILS.
>
> Based on Jonathan's mail and the information about Umlaut, perhaps the
> most important thing to say is that I don't think that holdings should
> be in a single database, but I would wholeheartedly agree that any
> system storing holdings information should be able to "have its data
> made available through an API such that other actual user-facing
> interfaces can use this data" so that one or more systems can be
> easily queried given [the information contained in] an OpenURL and
> return holdings information in a standardised format. To take it a
> step further, I would like holdings data to be expressed in a format
> that can be used to calculate how it relates to any particular query
> (e.g. the user wants vol. 22 published in 1984) - the commercial link
> resolvers tend to make holdings data available in a way that allows
> their evaluation against search criteria.
>
> The point about it not necessarily being a single database is that we
> have personal views of the information world depending on what we have
> access to. I may have access to a physical location because of my
> current geographical location, an electronic collection because of my
> institutional affiliation(s), and possibly even personal collections
> that I own or pay private subscriptions to. It would be nice if Umlaut
> or its equivalent could query a set of holdings information relevant
> to me rather than just that defined by the database owner. But this is
> looking several steps ahead.
>
> Owen
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochkind_at_jhu.edu]
> Sent: 07 September 2007 20:14
> To: Next generation catalogs for libraries
> Cc: Stephens, Owen
> Subject: Re: [NGC4LIB] Link resolvers as loosely coupled systems for
> holdings?
>
> The thing is that a 'link resolver' is really just about the
> _interface_.  A 'link resolver' is anything that takes an OpenURL and
> returns... well, anything, really.
>
> So where does it get this information? Usually it gets this
> information from an internal database.
>
> The Umlaut link resolver front-end, which I am working on, instead
> gets its data _both_ from SFX's database, AND from my ILS. If all the
> data was in the ILS, could it just look there?  Sure. But of course,
> most of our ILSes aren't capable of controlling this kind of
> data---and also, one of the things most of us look for in a 'link
> resolver' is actually the _outsourcing_ of the maintenance of _some_
> of this data. That's what we get with SFX, or with SerSol's product.
>
> So here's how I'd translate what you're saying:
> All of our holdings info, physical, electronic, etc., should be in
> _one_ database. This should be neither our 'link resolver', nor our
> 'OPAC', but ideally a free-standing module of its own, for maintaining
> holdings. This database needs to have its data made available through
> an API such that other actual user-facing interfaces can use this
> data. Those other user-facing interfaces need to include a 'link
> resolver' (i.e., some software that responds to OpenURLs), our 'OPAC'
> (some software that lets users search, and then tells them what we
> have and what the holdings are), and probably other things too.
>
> The way these interfaces and functions are actually divided up among
> software packages is another story. Perhaps one piece of software will
> do all these things - "OPAC", "link resolver", etc. More likely, there
> will be several. Right now, the division is usually between one
> software bundling a bunch of functions called an 'opac', and another
> called a 'link resolver'. That division of responsibilities for
> _interface_ might change.
>
> But regardless, yes, all our holdings info should be in one single
> database. That's not a 'link resolver' though; 'link resolver' is the
> interface, in fact.
>
> How you accomplish this--especially taking into account libraries'
> current desire to outsource the management of current 'link resolver'
> data--is not exactly clear.
>
> Jonathan
>
> Stephens, Owen wrote:
> > I'm a big fan of link servers (my experience is all with the SFX
> > product to date). Recent postings in the FRBRization threads have
> > made me consider how they work as a loosely coupled system for
> > libraries, and I think point towards a (slightly more) FRBRized view
> > of the world. In fact I would guess that actually most (all?) link
> > resolvers are built with (to some extent) a FRBRized view of
> > e-journals because it was the logical way to build them.
> >
> > I feel that potentially link resolvers could be leveraged much more
> > than currently, and some of the things I'd like to see from an NGC
> > point of view might be possible with tools that are already
> > available to us. In the best "oh well, it's Friday" tradition, the
> > following (slightly long and possibly rambling) post is an
> > exploration of this idea - for those who can be bothered I'd be
> > interested to know:
> >
> > Do others share my view of the potential here?
> > Any critical reaction (constructive if you can!)?
> > Is anyone aware of work in this area?
> >
> > Just to think about journals to start with, as this already works to
> > some extent.
> >
> > If we have an OpenURL with each journal record in the catalogue,
> > then we are essentially putting a 'click here for electronic
> > holdings' link next to each title. At this point it ceases to be
> > relevant whether the user is looking at the print or e- record for
> > the journal in the catalogue - in terms of presenting the electronic
> > holdings, the OpenURL link does the same in both cases. This starts
> > to suggest that having one or two bib records to represent the
> > journal's electronic holdings is irrelevant.
> >
> > If we go one step further and have an OpenURL that picks up the
> > user's resolver address rather than just the local institution's
> > address, then we present the electronic holdings that the user in
> > question has access to - personalised holdings statements -
> > brilliant.
> >
> > However, we can also see the limitations. In most cases the
> > resolvers only deal with electronic holdings. I can't see any real
> > reason for this except that this is the space they were designed to
> > work in (what I wouldn't give for some nice, machine-parsable
> > holdings statements for our print journals). Some libraries have
> > taken the step of putting their print holdings into their resolvers,
> > and some have worked out ways of getting their resolvers to display
> > print holdings information from their catalogues - either seems
> > quite a big step forward to me.
> >
> > If we think about books, then link resolvers have much more limited
> > use to date. SFX certainly deals with some of the e-book packages,
> > but not all, and I've not seen any real implementations of this -
> > probably because the use of OpenURLs in A&I databases is so much
> > more immediately powerful when dealing with journal citations. I
> > think this is bound to change. It would be interesting to experiment
> > with putting book manifestation/edition/holding(item) information
> > into a link resolver and see how it worked - has anyone got any
> > experience with this type of thing?
> >
> > Finally, another limitation is that link resolvers tend not to talk
> > to each other. If I'm from Institution A and I'm searching the
> > catalogue of Institution B and find an item I want, then what might
> > I want to know? Whether A has it electronically (i.e. I can access
> > it now), whether A has it physically (i.e. I can go to my own
> > library), whether B has it physically (i.e. I can go and get it) and
> > possibly whether B has it electronically (if I have access to B's
> > electronic collections, or if it is available to me if I go to B and
> > use it in the library). (There are almost certainly other
> > combinations/possibilities, but you can fill these in.) To answer
> > these questions would require A's link resolver and B's link
> > resolver to communicate all their electronic and physical holdings
> > into a central place (probably actually A's resolver I guess), and
> > present me with a unified list of access details. I think some
> > consortia (e.g. CDL) have done something like this when running
> > multiple link resolvers across a consortium, but I've not seen any
> > examples where the resolvers can spontaneously communicate on
> > demand.
> >
> > So - some questions.
> > Should we all start moving our print journal holdings into link
> > resolvers? If not, why not?
> > Should we be putting e-book or print book information into link
> > resolvers? Ditto?
> > Where should we start in terms of making it easy for link resolvers
> > to share information with each other?
> > Does anyone else think that the idea of an OPAC with holdings
> > information driven purely by link resolvers has potential? (I
> > suppose more generally - can we build on the idea of link resolvers
> > to form a loosely coupled holdings information system?)
> >
> > Best
> >
> > Owen
> >
> > Owen Stephens
> > Assistant Director: e-Strategy and Information Resources
> > Imperial College London Library
> > Imperial College London
> > South Kensington
> > London SW7 2AZ
> >
> >
> > Tel: 020 7594 8829
> > Email: o.stephens_at_imperial.ac.uk
> >
> >
> > -----Original Message-----
> > From: Next generation catalogs for libraries
> > [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Dykas, Felicity A.
> > Sent: 06 September 2007 18:06
> > To: NGC4LIB_at_listserv.nd.edu
> > Subject: Re: [NGC4LIB] Cutter's Rules in full text - a case for
> > FRBRization
> >
> > Aggregator neutral records are being used for serials and I think
> > we should implement them for monographs.  If this is done there will
> > be one record in WorldCat for all digitized copies of a particular
> > book.  Separate records in WorldCat for the NetLibrary version,
> > ebrary version, Google-scanned version, etc., are a problem.  I
> > cringe when I add another record because the provider is different.
> >
> > We've cataloged a few books that were scanned by Google and are
> > creating one record for a title, even if more than one copy has been
> > scanned.  In the URL field we are indicating who held the original
> > book:
> > http://laurel.lso.missouri.edu/search/Y?searchtype=o&searcharg=166255505&SORT=D&searchscope=8
> > Cataloging rules for online materials continue to be in flux (or at
> > least not clear) and we may be taking some liberties in what we're
> > doing.
> >
> > I think separate records for print and online will facilitate
> > searching and identification (eventually).
> >
> > Felicity Dykas
> > MU Libraries
> > University of Missouri--Columbia
> >
> > -----Original Message-----
> > From: Next generation catalogs for libraries
> > [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Frances Dean McNamara
> > Sent: Thursday, September 06, 2007 9:42 AM
> > To: NGC4LIB_at_listserv.nd.edu
> > Subject: Re: [NGC4LIB] Cutter's Rules in full text - a case for
> > FRBRization
> >
> > At ALA OCLC was describing how they will start adding records to
> > Worldcat for Google and Google member library e-books from the
> > Google Book Search project.  However they plan to add new separate
> > bibs for every instance, using "institutional records" where there
> > are separate instances of the same book for Michigan, Harvard, NYPL,
> > etc.  They will automatically retain the OCLC# for the print copy.
> > In fact they are creating these new records from that print copy.
> >
> > The proliferation of separate bibs in Worldcat for all these copies
> > of the same thing is probably going to be messy.  I don't think that
> > is being done to help people searching for the title, it's to help
> > librarians know what's been digitized and who has the file, I think.
> >
> > What we really want is an easy way to know that something is
> > available in print and electronic form and to easily be able to
> > decide which form is the right one for what we are doing at that
> > moment, don't you think?  Isn't this like link resolver linking?
> > Wouldn't it be better to keep that information somewhere and use a
> > link resolver to go find out which electronic versions are available
> > to me?  Especially since we are already finding that what is
> > available to someone in one country may not be available in another.
> >
> > I'm not understanding why people think separate bib records are
> > useful for this.  I can't help thinking that adding these things to
> > knowledgebases for link resolvers may provide a better end result
> > for users.
> >
> > Frances McNamara
> > University of Chicago
> >
> >
>
> --
> Jonathan Rochkind
> Digital Services Software Engineer
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886
> rochkind (at) jhu.edu
>
Received on Tue Sep 11 2007 - 03:01:52 EDT