Re: leading zeros on OCLC numbers and Google book search

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Thu, 23 Jul 2009 10:19:40 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

Thanks Maurice, that's helpful. Bah, this isn't how I would have done 
it.  As OCLC numbers are simply incremented integers.... why mess around 
with all this prefixing and 0-padding in the first place?  If I were 
OCLC, I'd just say "a normalized number has all alphabetic prefixes and 
leading zeroes removed, it's just an integer."  And write my internal 
routines to do this to all OCLC numbers before trying to match them. 
That would be a LOT simpler and less error prone. 
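
A minimal sketch of that normalization in Python, assuming the only
variations are an optional alphabetic prefix (such as "ocm" or "OCLC"),
surrounding whitespace, and leading zeros. The name normalize_oclcnum is
hypothetical, not anything OCLC publishes:

```python
import re


def normalize_oclcnum(raw: str) -> int:
    """Strip an alphabetic prefix and leading zeros; return a plain integer.

    Hypothetical helper: assumes the input is optional letters followed
    by digits, and rejects anything else instead of guessing.
    """
    match = re.fullmatch(r"\s*[A-Za-z]*\s*(\d+)\s*", raw)
    if match is None:
        raise ValueError(f"not a recognizable OCLC number: {raw!r}")
    # int() discards the leading zeros for us.
    return int(match.group(1))
```

Two OCLC numbers would then match when their normalized integers are
equal, so "ocm07913025", "OCLC7913025" and "07913025" all compare as the
same record.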

Is there something I'm missing about some value in prefixing some 
OCLCnums with "ocm" and others not, some with padded leading zeroes, 
some not?  It seems pointlessly confusing to me.

Oh, but wait, OCLC DOES go on to say that. After explaining padding and 
prefixing, they THEN go on to say they recommend an "institution locally 
index the OCLC number in 035 $a with NO padding and NO prefixing."  Um, 
okay, that's what I said. So why do they want it indexed one way, but 
insist it be stored in the record in a different crazy way?   This is 
just asking for confusion.
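
For comparison, here is a rough Python sketch of the stored form, going
only by the three rules Maurice quotes below. The function name is
hypothetical, and since those rules say nothing about numbers longer than
nine digits, the sketch refuses them rather than guessing:

```python
def format_oclcnum_for_record(number: int) -> str:
    """Render an OCLC number in the padded/prefixed form quoted from OCLC.

    Hypothetical helper covering only the three quoted cases:
    fewer than 8 digits, exactly 8 digits, and exactly 9 digits.
    """
    if number <= 0:
        raise ValueError("OCLC numbers are positive integers")
    digits = str(number)
    if len(digits) <= 8:
        # Zero pad to eight digits (a no-op at eight) and prefix with "ocm".
        return "ocm" + digits.zfill(8)
    if len(digits) == 9:
        # Nine digits: neither prefixed nor padded.
        return digits
    raise ValueError("the quoted rules do not cover more than nine digits")
```

So 7913025 is stored as "ocm07913025" but indexed as plain 7913025,
which is exactly the two-forms situation I'm complaining about.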

Jonathan

Maurice York wrote:
> For what it's worth, we had quite a go-around with padded OCLC numbers in
> our local database when trying to implement WorldCat Local. Even OCLC's
> internal documentation was a bit vague on standard practice. After a good
> bit of back and forth, they clarified best practice for us and updated their
> documentation. Here's how they describe correctly normalized numbers:
>
> OCLC numbers less than eight digits are zero padded to eight digits and
> prefixed with ocm
>
> OCLC numbers equal to eight digits are not zero padded, but are prefixed
> with ocm
>
> OCLC numbers equal to nine digits are neither prefixed nor padded
>
> Their guidance on interoperability with WorldCat services (and I'm sure they
> gave the same guideline to Google when they were setting up GBS with WCL) is:
>
> "In terms of best practice for ILS interoperability with services such as
> WorldCat Local, we always recommend that (if it's re-indexing) an
> institution locally index the OCLC number in 035 $a with NO padding and
> NO prefixing.  (The only exception to this recommendation is for Voyager
> sites.)  The second most common practice that we accommodate is to index
> the OCLC number in the 001 field; again, with no padding or prefixing."
>
> So, I think the upshot is, if you're going to have interaction between your
> local ILS and WCL or GBS services, you're going to need to strip those
> zeros. I'm guessing Google is going to go with OCLC's standard, and they're
> highly unlikely to change it.
>
> -M
>
>
> ************************************
> Maurice York
> Head, Information Technology
> NCSU Libraries
> North Carolina State University
> Raleigh, NC 27695
>
> maurice_york_at_ncsu.edu
> Phone: 919-515-3518
>
>
> On Wed, Jul 22, 2009 at 3:50 PM, Jonathan Rochkind <rochkind_at_jhu.edu> wrote:
>
>   
>> So we really do need feedback from Google on how they want us to normalize
>> OCLC numbers before sending to them, and what, if any, OCLCnum normalization
>> they do on their end, and, if they do none, whether they could start.
>>
>> Good luck getting that feedback though, like I said, when I've tried,
>> there's nobody left at Google who cares about the GBS API at all, and
>> certainly nobody who cares about OCLC numbers. Or at least nobody I could
>> find. Whoever worked on the original implementation is now off to some other
>> project.
>>
>> Jonathan
>>
>>
>> Xiaoming Liu wrote:
>>
>>     
>>> On Wed, Jul 22, 2009 at 2:37 PM, Jonathan Rochkind <rochkind_at_jhu.edu> wrote:
>>>
>>>
>>>
>>>       
>>>> What we actually need is for OCLC to publish a spec on "normalizing" OCLC
>>>> numbers.  Which I guess would actually be as simple as "remove leading
>>>> zeroes."
>>>>
>>>>
>>>>
>>>>         
>>> I cannot speak for OCLC, but the xOCLCNUM service includes a "getVariants"
>>> method which normalizes OCLCNUM somehow, such as:
>>>
>>>
>>> http://xisbn.worldcat.org/webservices/xid/oclcnum/07913025?method=getVariants
>>>
>>> The API document has a link to how OCLCNUM variants are used:
>>>
>>> http://xisbn.worldcat.org/xisbnadmin/xoclcnum/api.htm#getvariants
>>>
>>> http://www.oclc.org/support/documentation/worldcat/tb/253/253.pdf
>>>
>>> It may be clear from the service that when you use a naked OCLCNUM, you
>>> should remove the leading zeros, but if you use it with the prefix "ocm",
>>> it is recommended to pad the number to 8 digits, such as "ocm07913025".
>>>
>>> The getVariants service was suggested by Tod Matola at OCLC.
>>>
>>> xiaoming
>>>
>>>
>>>
>>>
>>>       
>>>> So I actually think Google is doing an acceptable thing, and you should
>>>> remove leading zeroes before making a query to it. Although it would be
>>>> kind of Google to normalize on making a query too. But I wouldn't hold
>>>> your breath; my impression on this stuff, after trying to talk to Google
>>>> about it before, is that it's pretty much a Finished Thing that nobody at
>>>> Google is currently working on and nobody at Google currently cares about.
>>>>
>>>> But it would be nice if OCLC published a statement saying "remove leading
>>>> zeroes from OCLC numbers before comparing two OCLCnumbers to see if they
>>>> match, or submitting an OCLC number to a foreign system for comparison."
>>>>
>>>> Jonathan
>>>>
>>>>
>>>> Jimmy Ghaphery wrote:
>>>>
>>>>
>>>>
>>>>         
>>>>> NGC4LIB,
>>>>>
>>>>> We have noticed an issue with using the Google API for older items where
>>>>> we have leading zeros in the OCLC number.
>>>>>
>>>>> For example with the leading zero, no result found:
>>>>> http://books.google.com/books/feeds/volumes?q=OCLC07913025
>>>>>
>>>>> Take out the zero:
>>>>> http://books.google.com/books/feeds/volumes?q=OCLC7913025
>>>>>
>>>>> What is the collective take on this? Does this seem like a reasonable
>>>>> accommodation that Google should make (ideally at someone's request with
>>>>> more juice than me, hint OCLC)? Or should I scurry about and make
>>>>> changes locally?
>>>>>
>>>>> -Jimmy
>>>>>
>>>>> --
>>>>> Jimmy Ghaphery
>>>>> Head, Library Information Systems
>>>>> VCU Libraries
>>>>> http://www.library.vcu.edu
>>>>> --