Re: leading zeros on OCLC numbers and Google book search

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Thu, 23 Jul 2009 11:46:24 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

Okay, but... that's not the world we live in anymore. Anybody. Anywhere.

Karen Coyle wrote:
> The weird padding comes about from the history of the number. First,
> remember that OCLC numbers were first developed in a world where fixed
> lengths, especially for identifiers, were the norm. The number was
> always issued as a fixed length, but the length had to change. So the
> prefix changed, and the numbers, when exported, were padded with zeroes.
> This is what I can reconstruct:
>
> oclc999999  (10 chars, 6 digits)
> ocl79999999 (11 chars, 7 digits)
> ocm99999999 (11 chars, 8 digits)
> ocn999999999 (12 chars, 9 digits)
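Taking Karen's reconstructed table at face value (she calls it a reconstruction, so the prefix-to-width mapping here is an assumption, not an OCLC spec), a decoder that turns any of these historical forms back into a plain integer might look like the sketch below. The "ocl7" prefix contains a digit, so a naive "strip the letters" rule would misread it. `decode_legacy_oclc` is a hypothetical helper, not an OCLC API.

```python
# Prefix -> number of digits that follow it, per Karen Coyle's reconstruction
# (an assumption, not an official OCLC specification). No prefix is a prefix
# of another, so the order in which they are tried does not matter.
LEGACY_PREFIXES = {
    "oclc": 6,
    "ocl7": 7,  # note: this "prefix" itself contains a digit
    "ocm": 8,
    "ocn": 9,
}


def decode_legacy_oclc(value: str) -> int:
    """Return the integer encoded in one of the fixed-width prefixed forms."""
    value = value.strip().lower()
    for prefix, width in LEGACY_PREFIXES.items():
        if value.startswith(prefix):
            digits = value[len(prefix):]
            if len(digits) != width or not digits.isdigit():
                raise ValueError(
                    f"{value!r}: expected {width} digits after {prefix!r}"
                )
            return int(digits)  # int() drops the zero padding
    raise ValueError(f"{value!r}: unrecognized prefix")
```

For example, `decode_legacy_oclc("ocl79999999")` returns 9999999, where blindly stripping letters would give 79999999.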
>
> kc
>
> Jonathan Rochkind wrote:
>> Thanks Maurice, that's helpful. Bah, this isn't how I would have done
>> it. Since OCLC numbers are simply incremented integers, why mess
>> around with all this prefixing and 0-padding in the first place? If I
>> were OCLC, I'd just say "a normalized number has all alphabetic
>> prefixes and leading zeroes removed, it's just an integer."  And write
>> my internal routines to do this to all OCLC numbers before trying to
>> match them. That would be a LOT simpler and less error prone.
>> Is there something I'm missing about some value in prefixing some
>> OCLCnums with "ocm" and others not, some with padded leading zeroes,
>> some not?  It seems pointlessly confusing to me.
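Jonathan's proposed rule ("all alphabetic prefixes and leading zeroes removed, it's just an integer") can be sketched as a comparison helper. This is a minimal illustration, assuming the input is either a bare number or a number with a purely alphabetic prefix such as "ocm" or "ocn"; `oclc_to_int` and `same_oclc` are hypothetical names. Taken literally, the rule misreads the "ocl7" form in Karen's table, so that case would need special handling.

```python
import re

# Optional alphabetic prefix (e.g. "ocm", "ocn"), then the digits.
_OCLC_RE = re.compile(r"^\s*([A-Za-z]*)(\d+)\s*$")


def oclc_to_int(value: str) -> int:
    """Strip any alphabetic prefix and leading zeroes, returning an integer.

    Assumption: the prefix contains no digits, so the legacy "ocl7" form
    is NOT handled correctly by this rule.
    """
    match = _OCLC_RE.match(value)
    if match is None:
        raise ValueError(f"not an OCLC number: {value!r}")
    return int(match.group(2))  # int() discards leading zeroes


def same_oclc(a: str, b: str) -> bool:
    """Compare two OCLC numbers after normalizing both to integers."""
    return oclc_to_int(a) == oclc_to_int(b)
```

With this, `same_oclc("ocm07913025", "7913025")` is true, which is exactly the match that fails when one system keeps the padding and the other doesn't.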
>>
>> Oh, but wait, OCLC DOES go on to say that. After explaining padding
>> and prefixing, they THEN go on to say they recommend an "institution
>> locally index the OCLC number in 035 $a with NO padding and NO
>> prefixing."  Um, okay, that's what I said. So why do they want it
>> indexed one way, but insist it be stored in the record in a different
>> crazy way?   This is just asking for confusion.
>>
>> Jonathan
>>
>> Maurice York wrote:
>>> For what it's worth, we had quite a go-around with padded OCLC numbers in
>>> our local database when trying to implement WorldCat Local. Even OCLC's
>>> internal documentation was a bit vague on standard practice. After a good
>>> bit of back and forth, they clarified best practice for us and updated
>>> their documentation. Here's how they describe correctly normalized numbers:
>>>
>>> OCLC numbers less than eight digits are zero padded to eight digits and
>>> prefixed with ocm
>>>
>>> OCLC numbers equal to eight digits are not zero padded, but are prefixed
>>> with ocm
>>>
>>> OCLC numbers equal to nine digits are neither prefixed nor padded
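The three rules quoted above amount to a small formatting function. The sketch implements exactly those rules and nothing more; `to_stored_form` is a hypothetical name, and numbers longer than nine digits are rejected because the quoted rules don't cover them. Karen's table lists an "ocn" prefix for nine-digit numbers, which this quoted rule does not, so it is worth checking which convention your system expects.

```python
def to_stored_form(number: int) -> str:
    """Format an OCLC number per the quoted rules (hypothetical helper).

    - fewer than 8 digits: zero-pad to 8 and prefix with "ocm"
    - exactly 8 digits: prefix with "ocm" (zfill leaves it unchanged)
    - exactly 9 digits: no prefix, no padding
    """
    if number < 1:
        raise ValueError("expected a positive integer")
    digits = str(number)
    if len(digits) <= 8:
        return "ocm" + digits.zfill(8)
    if len(digits) == 9:
        return digits
    raise ValueError("the quoted rules only cover up to nine digits")
```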
>>>
>>> Their guidance in terms of interoperability with WorldCat services (and
>>> I'm sure they gave the same guideline to Google when they were setting up
>>> GBS with WCL) is:
>>>
>>> "In terms of best practice for ILS interoperability with services such as
>>> WorldCat Local, we always recommend that (if it's re-indexing) an
>>> institution locally index the OCLC number in 035 $a with NO padding and
>>> NO prefixing. (The only exception to this recommendation is for Voyager
>>> sites.) The second most common practice that we accommodate is to index
>>> the OCLC number in the 001 field; again, with no padding or prefixing."
>>>
>>> So, I think the upshot is, if you're going to have interaction between
>>> your local ILS and WCL or GBS services, you're going to need to strip
>>> those zeros. I'm guessing Google is going to go with OCLC's standard, and
>>> they're highly unlikely to change it.
>>>
>>> -M
>>>
>>>
>>> ************************************
>>> Maurice York
>>> Head, Information Technology
>>> NCSU Libraries
>>> North Carolina State University
>>> Raleigh, NC 27695
>>>
>>> maurice_york_at_ncsu.edu
>>> Phone: 919-515-3518
>>>
>>>
>>> On Wed, Jul 22, 2009 at 3:50 PM, Jonathan Rochkind <rochkind_at_jhu.edu>
>>> wrote:
>>>
>>>> So we really do need feedback from Google on how they want us to
>>>> normalize OCLC numbers before sending them, what OCLC number
>>>> normalization, if any, they do on their end, and whether they could
>>>> start doing some.
>>>>
>>>> Good luck getting that feedback, though. Like I said, when I've tried,
>>>> there's nobody left at Google who cares about the GBS API at all, and
>>>> certainly nobody who cares about OCLC numbers. Or at least nobody I could
>>>> find. Whoever worked on the original implementation is now off on some
>>>> other project.
>>>>
>>>> Jonathan
>>>>
>>>>
>>>> Xiaoming Liu wrote:
>>>>
>>>>> On Wed, Jul 22, 2009 at 2:37 PM, Jonathan Rochkind <rochkind_at_jhu.edu>
>>>>> wrote:
>>>>>
>>>>>> What we actually need is for OCLC to publish a spec on "normalizing"
>>>>>> OCLC numbers. Which I guess would actually be as simple as "remove
>>>>>> leading zeroes."
>>>>>>
>>>>> I cannot speak for OCLC, but the xOCLCNUM service includes a
>>>>> "getVariants" service that normalizes OCLC numbers in some way, for
>>>>> example:
>>>>>
>>>>> http://xisbn.worldcat.org/webservices/xid/oclcnum/07913025?method=getVariants
>>>>>
>>>>> The API documentation links to an explanation of how OCLC number variants are used:
>>>>>
>>>>> http://xisbn.worldcat.org/xisbnadmin/xoclcnum/api.htm#getvariants
>>>>>
>>>>> http://www.oclc.org/support/documentation/worldcat/tb/253/253.pdf
>>>>>
>>>>> It may be clear from the service that when you use a naked OCLC number,
>>>>> you should remove the leading zeros, but if you use it with the "ocm"
>>>>> prefix, the recommendation is to pad the number to 8 digits, as in
>>>>> "ocm07913025".
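Following Xiaoming's reading of the service, a local helper could produce both spellings of a number (the naked form with leading zeros removed, and the "ocm" form padded to 8 digits) so a lookup can try each against a system whose indexing is unknown. This is a hypothetical sketch of that idea only; it is not a reimplementation of getVariants, whose response format isn't described here, and it assumes the input has no prefix other than "ocm" or "ocn".

```python
from typing import List


def oclc_variants(value: str) -> List[str]:
    """Return the naked and "ocm"-padded spellings of an OCLC number."""
    digits = value.strip().lower()
    if digits.startswith(("ocm", "ocn")):
        digits = digits[3:]
    if not digits.isdigit():
        raise ValueError(f"unexpected OCLC number: {value!r}")
    naked = str(int(digits))  # naked form: no leading zeros
    variants = [naked]
    if len(naked) <= 8:  # the "ocm" form is padded to 8 digits
        variants.append("ocm" + naked.zfill(8))
    return variants
```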
>>>>>
>>>>> The getVariants service was suggested by Tod Matola at OCLC.
>>>>>
>>>>> xiaoming
>>>>>
>>>>>> So I actually think Google is doing an acceptable thing, and you should
>>>>>> remove leading zeroes before making a query to it. It would be kind of
>>>>>> Google to normalize incoming queries on their end too, but I wouldn't
>>>>>> hold your breath; my impression on this stuff, after trying to talk to
>>>>>> Google about it before, is that it's pretty much a Finished Thing that
>>>>>> nobody at Google is currently working on and nobody at Google currently
>>>>>> cares about.
>>>>>>
>>>>>> But it would be nice if OCLC published a statement saying "remove leading
>>>>>> zeroes from OCLC numbers before comparing two OCLC numbers to see if they
>>>>>> match, or submitting an OCLC number to a foreign system for comparison."
>>>>>>
>>>>>> Jonathan
>>>>>>
>>>>>>
>>>>>> Jimmy Ghaphery wrote:
>>>>>>> NGC4LIB,
>>>>>>>
>>>>>>> We have noticed an issue with using the Google API for older items where
>>>>>>> we have leading zeros in the OCLC number.
>>>>>>>
>>>>>>> For example, with the leading zero, no result is found:
>>>>>>> http://books.google.com/books/feeds/volumes?q=OCLC07913025
>>>>>>>
>>>>>>> Take out the zero:
>>>>>>> http://books.google.com/books/feeds/volumes?q=OCLC7913025
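Stripping the leading zero before building the query, the change that made Jimmy's second URL work, can be done locally before calling the API. A minimal sketch, assuming the caller already has a bare digit string (no "ocm"/"ocn" prefix); the endpoint is copied from the example URLs above, belongs to the 2009-era feed API, and may no longer respond. `gbs_oclc_query_url` is a hypothetical name.

```python
from urllib.parse import urlencode

# Feed endpoint copied from the example URLs; it may no longer be served.
GBS_FEED = "http://books.google.com/books/feeds/volumes"


def gbs_oclc_query_url(oclc_number: str) -> str:
    """Build a Google Books feed query using the zero-stripped OCLC number."""
    digits = oclc_number.strip()
    if not digits.isdigit():
        raise ValueError(f"expected a bare OCLC number, got {oclc_number!r}")
    return GBS_FEED + "?" + urlencode({"q": "OCLC" + str(int(digits))})
```

Passing "07913025" yields the second, working URL rather than the first.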
>>>>>>>
>>>>>>> What is the collective take on this? Does this seem like a reasonable
>>>>>>> accommodation that Google should make (ideally at someone's request with
>>>>>>> more juice than me, hint OCLC)? Or should I scurry about and make
>>>>>>> changes locally?
>>>>>>>
>>>>>>> -Jimmy
>>>>>>>
>>>>>>> --
>>>>>>> Jimmy Ghaphery
>>>>>>> Head, Library Information Systems
>>>>>>> VCU Libraries
>>>>>>> http://www.library.vcu.edu
>>>>>>> --
>
> --
> -----------------------------------
> Karen Coyle / Digital Library Consultant
> kcoyle@kcoyle.net http://www.kcoyle.net
> ph.: 510-540-7596   skype: kcoylenet
> fx.: 510-848-3913
> mo.: 510-435-8234
> ------------------------------------