Re: leading zeros on OCLC numbers and Google book search

From: Emily Lynema <emily_lynema_at_nyob> Date: Fri, 24 Jul 2009 09:39:37 -0400 To: NGC4LIB_at_LISTSERV.ND.EDU

At NCSU, we have just stripped the leading 0s from all of our OCLC 
numbers in our 035 fields. We do not maintain OCLC number in the 001. 
Based on our questions to OCLC, it sounds like the way the numbers are 
treated in these 2 fields is different. But the link Ed passed on below 
[1] (which OCLC also pointed us to) indicates that records pulled down 
from OCLC will have an unpadded OCLC number in the 035. So it seems 
pretty clear how the 035 should be handled.

This change was required to make our local ILS compatible with WorldCat 
Local, so yes, unpadded numbers in the 035 do indeed work with that system.

We had actually padded all of our OCLC numbers out to 9 digits in a 
recent batch reclamation. This turned out to be a huge source of 
problems for integration with OCLC services. I'm not sure if you will 
see the same integration problems if you have left your OCLC numbers 
padded to 8 digits in the 035.

And yes, this did require a complete re-index in our ILS to make this 
change.

Maurice may have sent this already, but here's a quote from Tom Miller 
at OCLC (although sometimes the things they say seem to conflict):

"In terms of best practice for ILS interoperability with services such 
as WorldCat Local, we always recommend that (if it's re-indexing) an
institution locally index the OCLC number in 035 $a with NO padding and
NO prefixing.  (The only exception to this recommendation is for Voyager
sites, which doesn't pertain to you.)  The second most common practice
that we accommodate is to index the OCLC number in the 001 field; again,
with no padding or prefixing."
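
For anyone scripting this cleanup, the recommendation above (no padding, no prefixing) can be sketched roughly as follows. This is only an illustration, not an OCLC-published routine: the function name normalize_oclc_number is made up, and it assumes the only prefixes in play are the ones mentioned in this thread (oclc, ocl7, ocm, ocn).

```python
import re

# Assumption: the only alphabetic prefixes we need to handle are the ones
# named in this thread. "ocl7" is treated as a prefix, so its "7" is not
# part of the number. Matching is case-insensitive ("OCLC07913025" works).
_OCLC_PATTERN = re.compile(r"(?:oclc|ocl7|ocm|ocn)?0*(\d+)", re.IGNORECASE)


def normalize_oclc_number(raw: str) -> str:
    """Return the bare OCLC number: no prefix, no leading zeros.

    Hypothetical helper for illustration; raises ValueError when the
    input does not look like an OCLC number at all.
    """
    match = _OCLC_PATTERN.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"not a recognizable OCLC number: {raw!r}")
    return match.group(1)


if __name__ == "__main__":
    for sample in ["ocm07913025", "OCLC07913025", "007913025", "ocn123456789"]:
        print(sample, "->", normalize_oclc_number(sample))
```

Two stored values can then be compared by normalized form, so that "ocm07913025" and "7913025" are treated as the same record number.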

-emily lynema
NCSU Libraries

[1] http://www.oclc.org/support/documentation/WorldCat/tb/253/

------------------------------
Date: Thu, 23 Jul 2009 08:52:41 -0700
From: Ed Jones <ejones_at_NU.EDU>
Subject: Re: leading zeros on OCLC numbers and Google book search

There's a detailed history of the OCLC control number at the end of
their Technical Bulletin 253 (September 2006):
http://www.oclc.org/support/documentation/WorldCat/tb/253/

Ed Jones

-----Original Message-----
From: Next generation catalogs for libraries
[mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Karen Coyle
Sent: Thursday, July 23, 2009 8:34 AM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] leading zeros on OCLC numbers and Google book search

The weird padding comes about from the history of the number. First,
remember that OCLC numbers were first developed in a world where fixed
lengths, especially for identifiers, were the norm. The number was
always issued as a fixed length, but the length had to change. So the
prefix changed, and the numbers, when exported, were padded with zeroes.
This is what I can reconstruct:

oclc999999 (10 chars, 6 digits)
ocl79999999 (11 chars, 7 digits)
ocm99999999 (11 chars, 8 digits)
ocn999999999 (12 chars, 9 digits)

kc

Jonathan Rochkind wrote:
 > > Thanks Maurice, that's helpful. Bah, this isn't how I would have done
 > > it.  As OCLC numbers are simply incremented integers.... why mess
 > > around with all this prefixing and 0-padding in the first place?  If I
 > > were OCLC, I'd just say "a normalized number has all alphabetic
 > > prefixes and leading zeroes removed, it's just an integer."  And write
 > > my internal routines to do this to all OCLC numbers before trying to
 > > match them. That would be a LOT simpler and less error prone.
 > > Is there something I'm missing about some value in prefixing some
 > > OCLCnums with "ocm" and others not, some with padded leading zeroes,
 > > some not?  It seems pointlessly confusing to me.
 > >
 > > Oh, but wait, OCLC DOES go on to say that. After explaining padding
 > > and prefixing, they THEN go on to say they recommend an "institution
 > > locally index the OCLC number in 035 $a with NO padding and NO
 > > prefixing."  Um, okay, that's what I said. So why do they want it
 > > indexed one way, but insist it be stored in the record in a different
 > > crazy way?   This is just asking for confusion.
 > >
 > > Jonathan
 > >
 > > Maurice York wrote:
 >> >> For what it's worth, we had quite a go-around with padded OCLC
 >> >> numbers in our local database when trying to implement WorldCat
 >> >> Local. Even OCLC's internal documentation was a bit vague on
 >> >> standard practice. After a good bit of back and forth, they
 >> >> clarified best practice for us and updated their documentation.
 >> >> Here's how they describe correctly normalized numbers:
 >> >>
 >> >> OCLC numbers less than eight digits are zero padded to eight digits
 >> >> and prefixed with ocm
 >> >>
 >> >> OCLC numbers equal to eight digits are not zero padded, but are
 >> >> prefixed with ocm
 >> >>
 >> >> OCLC numbers equal to nine digits are neither prefixed nor padded
 >> >>
 >> >> Their guidance on interoperability with WorldCat services (and I'm
 >> >> sure they gave the same guideline to Google when they were setting
 >> >> up GBS with WCL) is:
 >> >>
 >> >> "In terms of best practice for ILS interoperability with services
 >> >> such as WorldCat Local, we always recommend that (if it's
 >> >> re-indexing) an institution locally index the OCLC number in 035 $a
 >> >> with NO padding and NO prefixing.  (The only exception to this
 >> >> recommendation is for Voyager sites.)  The second most common
 >> >> practice that we accommodate is to index the OCLC number in the 001
 >> >> field; again, with no padding or prefixing."
 >> >>
 >> >> So, I think the upshot is, if you're going to have interaction
 >> >> between your local ILS and WCL or GBS services, you're going to need
 >> >> to strip those zeros. I'm guessing Google is going to go with OCLC's
 >> >> standard, and they're highly unlikely to change it.
 >> >>
 >> >> -M
 >> >>
 >> >>
 >> >> ************************************
 >> >> Maurice York
 >> >> Head, Information Technology
 >> >> NCSU Libraries
 >> >> North Carolina State University
 >> >> Raleigh, NC 27695
 >> >>
 >> >> maurice_york_at_ncsu.edu
 >> >> Phone: 919-515-3518
 >> >>
 >> >>
 >> >> On Wed, Jul 22, 2009 at 3:50 PM, Jonathan Rochkind <rochkind_at_jhu.edu>
 >> >> wrote:
 >> >>
 >> >>
 >>> >>> So we really do need feedback from Google on how they want us to
 >>> >>> normalize OCLC numbers before sending to them, and what, if any,
 >>> >>> OCLCnum normalization they do on their end, and whether they
 >>> >>> could start.
 >>> >>>
 >>> >>> Good luck getting that feedback though, like I said, when I've
 >>> >>> tried, there's nobody left at Google who cares about the GBS API
 >>> >>> at all, and certainly nobody who cares about OCLC numbers. Or at
 >>> >>> least nobody I could find. Whoever worked on the original
 >>> >>> implementation is now off to some other project.
 >>> >>>
 >>> >>> Jonathan
 >>> >>>
 >>> >>>
 >>> >>> Xiaoming Liu wrote:
 >>> >>>
 >>> >>>
 >>>> >>>> On Wed, Jul 22, 2009 at 2:37 PM, Jonathan Rochkind
 >>>> >>>> <rochkind_at_jhu.edu> wrote:
 >>>> >>>>
 >>>> >>>>
 >>>> >>>>
 >>>> >>>>
 >>>>> >>>>> What we actually need is for OCLC to publish a spec on
 >>>>> >>>>> "normalizing" OCLC
 >>>>> >>>>> numbers.  Which I guess would actually be as simple as "remove
 >>>>> >>>>> leading
 >>>>> >>>>> zeroes."
 >>>>> >>>>>
 >>>>> >>>>>
 >>>>> >>>>>
 >>>>> >>>>>
 >>>> >>>> I cannot speak for OCLC, but the xOCLCNUM service includes a
 >>>> >>>> "getVariants" service which normalizes OCLCNUM somehow, such as:
 >>>> >>>>
 >>>> >>>> http://xisbn.worldcat.org/webservices/xid/oclcnum/07913025?method=getVariants
 >>>> >>>>
 >>>> >>>> The API document has a link to how OCLCNUM variants are used:
 >>>> >>>>
 >>>> >>>> http://xisbn.worldcat.org/xisbnadmin/xoclcnum/api.htm#getvariants
 >>>> >>>>
 >>>> >>>> http://www.oclc.org/support/documentation/worldcat/tb/253/253.pdf
 >>>> >>>>
 >>>> >>>> It may be clear from the service that when you use a naked
 >>>> >>>> OCLCNUM, you should remove the leading zeros, but if you use it
 >>>> >>>> with the prefix "ocm", it is recommended to pad the number to 8
 >>>> >>>> digits, such as "ocm07913025".
 >>>> >>>>
 >>>> >>>> The getVariants service was suggested by Tod Matola in OCLC.
 >>>> >>>>
 >>>> >>>> xiaoming
 >>>> >>>>
 >>>> >>>>
 >>>> >>>>
 >>>> >>>>
 >>>> >>>>
 >>>>> >>>>> So I actually think Google is doing an acceptable thing, and
 >>>>> >>>>> you should remove leading zeroes before making a query to it.
 >>>>> >>>>> Although it would be kind of Google to normalize on making a
 >>>>> >>>>> query too. But I wouldn't hold your breath; my impression on
 >>>>> >>>>> this stuff, after trying to talk to Google about it before, is
 >>>>> >>>>> that it's pretty much a Finished Thing that nobody at Google is
 >>>>> >>>>> currently working on and nobody at Google currently cares
 >>>>> >>>>> about.
 >>>>> >>>>>
 >>>>> >>>>> But it would be nice if OCLC published a statement saying
 >>>>> >>>>> "remove leading zeroes from OCLC numbers before comparing two
 >>>>> >>>>> OCLC numbers to see if they match, or submitting an OCLC number
 >>>>> >>>>> to a foreign system for comparison."
 >>>>> >>>>>
 >>>>> >>>>> Jonathan
 >>>>> >>>>>
 >>>>> >>>>>
 >>>>> >>>>> Jimmy Ghaphery wrote:
 >>>>> >>>>>
 >>>>> >>>>>
 >>>>> >>>>>
 >>>>> >>>>>
 >>>>>> >>>>>> NGC4LIB,
 >>>>>> >>>>>>
 >>>>>> >>>>>> We have noticed an issue with using the Google API for older
 >>>>>> >>>>>> items where
 >>>>>> >>>>>> we have leading zeros in the OCLC number.
 >>>>>> >>>>>>
 >>>>>> >>>>>> For example with the leading zero, no result found:
 >>>>>> >>>>>> http://books.google.com/books/feeds/volumes?q=OCLC07913025
 >>>>>> >>>>>>
 >>>>>> >>>>>> Take out the zero:
 >>>>>> >>>>>> http://books.google.com/books/feeds/volumes?q=OCLC7913025
 >>>>>> >>>>>>
 >>>>>> >>>>>> What is the collective take on this? Does this seem like a
 >>>>>> >>>>>> reasonable accommodation that Google should make (ideally at
 >>>>>> >>>>>> someone's request with more juice than me, hint OCLC)? Or
 >>>>>> >>>>>> should I scurry about and make changes locally?
 >>>>>> >>>>>>
 >>>>>> >>>>>> -Jimmy
 >>>>>> >>>>>>
 >>>>>> >>>>>> --
 >>>>>> >>>>>> Jimmy Ghaphery
 >>>>>> >>>>>> Head, Library Information Systems
 >>>>>> >>>>>> VCU Libraries
 >>>>>> >>>>>> http://www.library.vcu.edu
 >>>>>> >>>>>> --