Re: Cataloging Web Resources - policies

From: Jonathan Rochkind <rochkind_at_nyob> Date: Mon, 4 Aug 2008 11:51:53 -0400 To: NGC4LIB_at_LISTSERV.ND.EDU

The problem I keep running into is there is actually no aggregate of 
only _open access_ (scholarly) materials on the net.

Contrary to popular belief, OAIster is not limited to open access 
materials, but includes metadata for a significant number of articles 
that are pay-to-view.

You might think the Directory of Open Access Repositories would be a 
more reliable way to find open access content to aggregate. But many 
of the (mostly university) repositories listed there also include a 
significant amount of content that is not open access (temporarily 
embargoed, available only to licensed university patrons, etc.), even 
though the DOAR collection policy says this isn't so.

This has been an issue as I try to find ways to expose more online 
content to my users. I do not want to send my user to a link where, 
once she clicks on it, she is told she can't view the item, or worse, 
is asked to pay $40 to view it. If she wants that kind of search, 
Google or plenty of other existing free search engines can be used. I 
see my role as providing a more "curated" collection, and part of that 
is not sending her to articles she can't actually view, which wastes 
her time and causes frustration.
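A minimal sketch of the harvesting step behind such an aggregate, assuming Python 3 and only its standard library: OAIster and similar services collect records over OAI-PMH, paging through `ListRecords` responses with a `resumptionToken`. The function names and the `http://example.org/oai` base URL are hypothetical, and fetching is passed in as a plain function so the paging logic can be tried without a network. Nothing in a harvested record reliably says whether the full text is actually open, so the filtering problem described above would still have to be solved on top of this.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, List, Optional, Tuple

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL (hypothetical helper).

    The first request names a metadataPrefix; follow-up requests send only
    the resumptionToken, which OAI-PMH treats as an exclusive argument.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text: str) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Return (identifier, Dublin Core title) pairs and the next token, if any."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI_NS + "record"):
        ident = rec.findtext(f"{OAI_NS}header/{OAI_NS}identifier", default="")
        title = rec.findtext(f".//{DC_NS}title", default="")
        records.append((ident, title))
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    # An empty or absent resumptionToken marks the last page of the list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token or None


def harvest(base_url: str, fetch: Callable[[str], str]) -> Iterator[Tuple[str, str]]:
    """Yield every record, following resumptionTokens page by page.

    `fetch` maps a URL to the response text; it is injected so the paging
    logic can be exercised without a network connection.
    """
    token = None
    while True:
        records, token = parse_page(fetch(list_records_url(base_url, token)))
        yield from records
        if not token:
            break
```

For a live repository, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`, with error handling and polite delays added.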

Jonathan

Weinheimer Jim wrote:
> OAIster is great. The resources I link to with the "extend search" tool from my own catalog are Intute, Infomine, OAIster, BASE (the German one you may be thinking of, at Bielefeld: http://www.base-search.net/), and the NDLTD theses and dissertations database (http://www.ndltd.org). (The NDLTD site is down, but the database is working, e.g. http://www.scirus.com/srsapp/search?btn=Search&ds=ndlrep&rep=ndl&q=rome.)
>
> There are many more open archives on all different topics, and there will doubtless be more and more. The records will have to be harvested into bigger and bigger databases, or some kind of super-federated search will have to be built.
>
> Maybe it will end up with a lot of subject-based OCLC clearinghouse databases that everyone can link into, harvest records from, and so on.
>
> It's amazing all of this could be done right now!
>
> One concern, however, is that Google has stopped working with OAI-PMH and is moving toward its own approach: XML sitemaps. I don't know how that will turn out.
>
> Jim
>
>   
>> Hi,
>>
>> James Weinheimer wrote:
>>
>> "I really believe that web resources must be handled cooperatively if we
>> are to have the slightest chance of success."
>>
>> I fully agree!
>>
>> There is OAIster
>> (http://quod.lib.umich.edu/cgi/b/bib/bib-idx?c=oaister;page=simple) for
>> institutional repositories and, in general, any 'OAI-compatible catalogue' of
>> e-documents from any institution that wishes to participate.
>>
>> I can't find it again, but there is a huge "open access e-journals" database
>> (which tries to be comprehensive) maintained somewhere in Germany. I
>> remember the owner was looking for volunteers to take over
>> responsibility for individual lists of e-journals (for instance, someone for
>> DOAJ, someone else for BioMed / PubMed Central, someone else for Hindawi, etc.).
>>
>> What about a shared (at least OAI) catalogue of "more general web
>> resources"? And the same for "e-books"?
>>
>> Best regards
>> Cécile Gass
>> Bibliothèques
>> Université Libre de Bruxelles CP180
>> 1050 Bruxelles
>> 32-2-650-47-39
>>
>> -----Original Message-----
>> From: Next generation catalogs for libraries [mailto:NGC4LIB_at_LISTSERV.ND.EDU]
>> On behalf of James Weinheimer
>> Sent: Wednesday, 30 July 2008 10:09
>> To: NGC4LIB_at_LISTSERV.ND.EDU
>> Subject: Re: [NGC4LIB] Cataloging Web Resources - policies
>>
>> Kyle Banerjee wrote:
>>
>> > The catalog is quite good at what it does, so the trick is to figure
>> > out how to adapt it so that it makes sense in an environment where
>> > information is distributed and patrons are in whatever environment
>> > they are in.
>> >
>> > We should avoid using it for things it is poorly suited for such as
>> > keeping track of Web resources. Otherwise, we detract from the value
>> > it contributes and relegate it more quickly to irrelevance.
>>
>> If this is accepted, then we run into the problem that as more and more
>> materials wind up as web resources, fewer and fewer people will use the
>> library catalog and as a result, use the library less and less.
>>
>> I think the problem, judging from the comments I have seen in this thread,
>> is that libraries want to believe that web resources are essentially the same
>> as physical items, and that therefore we can fit web resources into our
>> traditional processes. I think this is absolutely wrong. So long as each
>> separate library insists on doing its own selection, cataloging, and
>> maintenance--as it is done with books, etc.--I don't think there is the
>> slightest chance of success, since this is essentially an impossible task.
>> Doing it all separately with physical materials has been bad enough, but at
>> least we have a certain amount of help: book dealers help a lot with
>> selection, copy cataloging is available, and so on.
>>
>> For selection of web resources, there is precious little help out there, and
>> certainly nothing as organized as the book publishers, book dealers, and so
>> on. Also, web resources aren't any harder to catalog than anything else
>> (certainly no more difficult than any serial, movie, or book); the problems
>> lie much more in examining the item (try to find the latest date of update;
>> it's hard to know where a site even begins and ends), and in the fact that
>> any part of it can change at any time.
>>
>> Of course, any serial or loose-leaf publication can change just as much as a
>> web resource, but a web resource changes without any notification. At least
>> with a serial or loose-leaf publication, you get the "notification" of the
>> change when the new issue or update arrives in the mail. As these materials
>> work their way through the library workflow, the necessary record
>> maintenance can be performed in an orderly manner. So, if there were
>> notifications of changes (a possibility for RSS feeds?), at least one part
>> of the problem could be solved.
>>
>> But these are subsidiary concerns. The main issue is something a little
>> different. More and more people are questioning whether each library should
>> be redoing the cataloging of the same book over and over and over again.
>> I've written about this issue myself. The only real argument for editing the
>> record for a book that was cataloged elsewhere is that the item received
>> locally may be different in some subtle but important ways. (The records may
>> be substandard as well, but we'll bypass this for now.) For instance, I may have
>> a slightly different edition; perhaps I must bring out some subjects that my
>> users need, and so on. Although these arguments can be debated--and rejected
>> or accepted--we must admit that they really do not hold true for web
>> resources.
>>
>> With a web resource, we are all looking at precisely the same thing. My
>> "image" of it may be slightly different if I am looking at it with
>> Firefox and someone else with Opera and someone else with Explorer, but we
>> are still looking at the same things. With web resources, it really doesn't
>> make any sense, especially economic sense, to redo selection and cataloging
>> in each institution. If someone wants to "upgrade" a metadata record with
>> better subjects, that's fine--just let everybody benefit.
>>
>> Maintenance is the worst part of it all, and shows the fundamental
>> difference from physical items. A selector can have worked miracles to find
>> and select an item, the cataloger can make one of the best records in the
>> entire database, and ... the site changes tomorrow! It changes so much that
>> you can't even tell it's the same thing. With physical items this is not so
>> depressing, since the original item is still around, so the selection and
>> cataloging are still valid. In the world of the web, this is not so.
>>
>> Somewhere (I think in "The House of the Dead") Dostoevsky described his
>> incarceration in Siberia and mentioned that the prisoners were worked almost
>> to death on various building projects. He noted that there was a bright
>> side: at least the laborers could take some pride in their work as they
>> watched the buildings grow. A much worse torture, in his opinion, would be
>> incredibly hard work that had no meaning or use at all, such as pouring
>> water back and forth between glasses for hours on end, or digging holes only
>> to fill them up immediately.
>>
>> I think that doing record maintenance for web resources, where all your work
>> goes down the tubes, would perhaps qualify for such a torment. And pity the
>> poor selector who can't even know if he or she has already selected a
>> specific web resource! And finally, imagine this happening in thousands of
>> libraries every single day with people agonizing over exactly the same
>> materials.
>>
>> For all of these reasons, and probably some more I could come up with, I
>> really believe that web resources must be handled cooperatively if we are to
>> have the slightest chance of success. I think it might work, since we are
>> all literally looking at the same things: not at separate physical copies of
>> a "manifestation" of a book that may differ in a few crucial ways, but at
>> exactly the same files. In a correctly configured system (I am thinking of
>> something along the lines of Intute or Infomine), cataloging needs to be done
>> once--and only once. Maintenance can be done on this one record, and
>> everyone could benefit. With some tweaking of the metadata record, an
>> "audience"-type code could be made, handled by the selectors, to create
>> filters for the general public, undergraduates, graduates, or researchers,
>> and in this way, selectors could cooperate among themselves. Perhaps the
>> users could also get involved through Web 2.0 possibilities. I am sure the
>> web resource creators could be involved as well, for updates and other
>> possibilities.
>>
>> The relationship of the local OPAC to this database could be handled through
>> an extend-search mechanism, such as in my own catalog, or automatic harvesting,
>> or in other, innovative ways that creative sorts could invent.
>>
>> I guess that's what I think the "next generation catalog" will be:
>> something that allows true and deep cooperation with all kinds of groups we
>> have never cooperated with before. Of course, that means there would have to
>> be a lot of trust, but that is another topic and I've gone on long enough.
>>
>> James Weinheimer  j.weinheimer_at_aur.edu
>> Director of Library and Information Services
>> The American University of Rome
>> via Pietro Roselli, 4
>> 00153 Rome, Italy
>> voice- 011 39 06 58330919 ext. 327
>> fax-011 39 06 58330992
>>     
>
>   
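The change-notification idea in the quoted message (something like RSS as the web equivalent of a new serial issue arriving in the mail) presupposes that someone detects the change in the first place. A minimal sketch, assuming Python 3 and that each shared record stores a hash of the page text taken when it was catalogued; `CatalogRecord`, `fingerprint`, and `needs_review` are hypothetical names, and fetching the current pages is left out. Volatile page content (dates, counters, ads) would cause false alarms, so this only narrows down what a person needs to look at.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Dict, List


def fingerprint(page_text: str) -> str:
    """SHA-256 of the page text after collapsing runs of whitespace,
    so that reflowed markup alone does not count as a change."""
    normalized = re.sub(r"\s+", " ", page_text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class CatalogRecord:
    # Hypothetical minimal record: the resource URL and the fingerprint
    # taken when the resource was last catalogued or reviewed.
    url: str
    stored_hash: str


def needs_review(records: List[CatalogRecord],
                 current_pages: Dict[str, str]) -> List[str]:
    """Return URLs whose current text no longer matches the stored hash.

    `current_pages` maps URL -> freshly fetched page text. A missing entry
    (for example a failed fetch) is flagged too, since the site may be gone.
    """
    flagged = []
    for rec in records:
        text = current_pages.get(rec.url)
        if text is None or fingerprint(text) != rec.stored_hash:
            flagged.append(rec.url)
    return flagged
```

The flagged URLs could then be published as a feed for whoever maintains the shared record, so a change is reviewed once rather than in every library. Where a server supplies `ETag` or `Last-Modified` headers, an HTTP conditional request is a cheaper way to ask whether anything changed.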

-- 
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886 
rochkind (at) jhu.edu