Re: leading zeros on OCLC numbers and Google book search

From: Karen Coyle <lists_at_nyob>
Date: Thu, 23 Jul 2009 08:33:56 -0700
To: NGC4LIB_at_LISTSERV.ND.EDU

The weird padding comes about from the history of the number. First, 
remember that OCLC numbers were first developed in a world where fixed 
lengths, especially for identifiers, were the norm. The number was 
always issued as a fixed length, but the length had to change. So the 
prefix changed, and the numbers, when exported, were padded with zeroes. 
This is what I can reconstruct:

oclc999999  (10 chars, 6 digits)
ocl79999999 (11 chars, 7 digits)
ocm99999999 (11 chars, 8 digits)
ocn999999999 (12 chars, 9 digits)
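
A minimal sketch, in Python, of reading those four export forms back into a plain integer. The function name parse_oclc_number and the regular expression are illustrative assumptions, not anything OCLC publishes; the literal "7" in the "ocl7" prefix is taken from the reconstruction above.

```python
import re

# Prefixes from the reconstruction above. This is a hypothetical helper,
# not an OCLC API; matching is case-insensitive so "OCLC07913025" also works.
_PREFIX = re.compile(r"^(oclc|ocl7|ocm|ocn)", re.IGNORECASE)


def parse_oclc_number(raw: str) -> int:
    """Strip one known prefix and any zero padding, returning a plain integer."""
    text = raw.strip()
    digits = _PREFIX.sub("", text, count=1)
    # Require plain ASCII digits so that int() below cannot fail.
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not an OCLC number: {raw!r}")
    return int(digits)


if __name__ == "__main__":
    for sample in ("oclc999999", "ocl79999999", "ocm07913025", "ocn999999999"):
        print(sample, "->", parse_oclc_number(sample))
```

Comparing the resulting integers, rather than the exported strings, sidesteps the prefix and padding differences entirely.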

kc

Jonathan Rochkind wrote:
> Thanks Maurice, that's helpful. Bah, this isn't how I would have done 
> it.  As OCLC numbers are simply incremented integers.... why mess 
> around with all this prefixing and 0-padding in the first place?  If I 
> were OCLC, I'd just say "a normalized number has all alphabetic 
> prefixes and leading zeroes removed, it's just an integer."  And write 
> my internal routines to do this to all OCLC numbers before trying to 
> match them. That would be a LOT simpler and less error prone.
> Is there something I'm missing about some value in prefixing some 
> OCLCnums with "ocm" and others not, some with padded leading zeroes, 
> some not?  It seems pointlessly confusing to me.
>
> Oh, but wait, OCLC DOES go on to say that. After explaining padding 
> and prefixing, they THEN go on to say they recommend an "institution 
> locally index the OCLC number in 035 $a with NO padding and NO 
> prefixing."  Um, okay, that's what I said. So why do they want it 
> indexed one way, but insist it be stored in the record in a different 
> crazy way?   This is just asking for confusion.
>
> Jonathan
>
> Maurice York wrote:
>> For what it's worth, we had quite a go-around with padded OCLC numbers in
>> our local database when trying to implement WorldCat Local. Even OCLC's
>> internal documentation was a bit vague on standard practice. After a good
>> bit of back and forth, they clarified best practice for us and updated
>> their documentation. Here's how they describe correctly normalized numbers:
>>
>> OCLC numbers less than eight digits are zero padded to eight digits and
>> prefixed with ocm
>>
>> OCLC numbers equal to eight digits are not zero padded, but are prefixed
>> with ocm
>>
>> OCLC numbers equal to nine digits are neither prefixed nor padded
>>
>> Their guidance on interoperability with WorldCat services (and I'm sure
>> they gave the same guideline to Google when they were setting up GBS with
>> WCL) is:
>>
>> "In terms of best practice for ILS interoperability with services such as
>> WorldCat Local, we always recommend that (if it's re-indexing) an
>> institution locally index the OCLC number in 035 $a with NO padding and
>> NO prefixing.  (The only exception to this recommendation is for Voyager
>> sites.)  The second most common practice that we accommodate is to index
>> the OCLC number in the 001 field; again, with no padding or prefixing."
>>
>> So, I think the upshot is, if you're going to have interaction between
>> your local ILS and WCL or GBS services, you're going to need to strip
>> those zeros. I'm guessing Google is going to go with OCLC's standard, and
>> they're highly unlikely to change it.
>>
>> -M
>>
>>
>> ************************************
>> Maurice York
>> Head, Information Technology
>> NCSU Libraries
>> North Carolina State University
>> Raleigh, NC 27695
>>
>> maurice_york_at_ncsu.edu
>> Phone: 919-515-3518
>>
>>
>> On Wed, Jul 22, 2009 at 3:50 PM, Jonathan Rochkind <rochkind_at_jhu.edu> 
>> wrote:
>>
>>  
>>> So we really do need feedback from Google on how they want us to normalize
>>> oclcnumbers before sending to them, and what, if any OCLCnum
>>> normalization they do on their end, and if they could start.
>>>
>>> Good luck getting that feedback though, like I said, when I've tried,
>>> there's nobody left at Google who cares about the GBS API at all, and
>>> certainly nobody who cares about OCLC numbers. Or at least nobody I could
>>> find. Whoever worked on the original implementation is now off to some
>>> other project.
>>>
>>> Jonathan
>>>
>>>
>>> Xiaoming Liu wrote:
>>>
>>>    
>>>> On Wed, Jul 22, 2009 at 2:37 PM, Jonathan Rochkind <rochkind_at_jhu.edu>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>      
>>>>> What we actually need is for OCLC to publish a spec on "normalizing"
>>>>> OCLC numbers.  Which I guess would actually be as simple as "remove
>>>>> leading zeroes."
>>>>>
>>>>>
>>>>>
>>>>>         
>>>> I cannot speak for OCLC, but the xOCLCNUM service includes a
>>>> "getVariants" method which normalizes OCLCNUM somehow, such as:
>>>>
>>>>
>>>> http://xisbn.worldcat.org/webservices/xid/oclcnum/07913025?method=getVariants 
>>>>
>>>>
>>>> The API document has a link to how OCLCNUM variants are used:
>>>>
>>>> http://xisbn.worldcat.org/xisbnadmin/xoclcnum/api.htm#getvariants
>>>>
>>>> http://www.oclc.org/support/documentation/worldcat/tb/253/253.pdf
>>>>
>>>> It may be clear from the service that when you use a naked OCLCNUM, you
>>>> should remove the leading zeros, but if you use it with a prefix "ocm",
>>>> it was recommended to pad the number to 8 digits, such as "ocm07913025".
>>>>
>>>> The getVariants service was suggested by Tod Matola at OCLC.
>>>>
>>>> xiaoming
>>>>
>>>>
>>>>
>>>>
>>>>      
>>>>> So I actually think Google is doing an acceptable thing, and you should
>>>>> remove leading zeroes before making a query to it. Although it would be
>>>>> kind of Google to normalize on making a query too. But I wouldn't hold
>>>>> your breath; my impression on this stuff, after trying to talk to Google
>>>>> about it before, is that it's pretty much a Finished Thing that nobody
>>>>> at Google is currently working on and nobody at Google currently cares
>>>>> about.
>>>>>
>>>>> But it would be nice if OCLC published a statement saying "remove
>>>>> leading zeroes from OCLC numbers before comparing two OCLCnumbers to see
>>>>> if they match, or submitting an OCLC number to a foreign system for
>>>>> comparison."
>>>>>
>>>>> Jonathan
>>>>>
>>>>>
>>>>> Jimmy Ghaphery wrote:
>>>>>
>>>>>
>>>>>
>>>>>        
>>>>>> NGC4LIB,
>>>>>>
>>>>>> We have noticed an issue with using the Google API for older items
>>>>>> where we have leading zeros in the OCLC number.
>>>>>>
>>>>>> For example with the leading zero, no result found:
>>>>>> http://books.google.com/books/feeds/volumes?q=OCLC07913025
>>>>>>
>>>>>> Take out the zero:
>>>>>> http://books.google.com/books/feeds/volumes?q=OCLC7913025
>>>>>>
>>>>>> What is the collective take on this? Does this seem like a reasonable
>>>>>> accommodation that Google should make (ideally at someone's request
>>>>>> with more juice than me, hint OCLC)? Or should I scurry about and make
>>>>>> changes locally?
>>>>>>
>>>>>> -Jimmy
>>>>>>
>>>>>> -- 
>>>>>> Jimmy Ghaphery
>>>>>> Head, Library Information Systems
>>>>>> VCU Libraries
>>>>>> http://www.library.vcu.edu
>>>>>> -- 
>
>
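
The two forms discussed in the quoted thread can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names index_form and record_form are hypothetical, and the rules are the ones Maurice relays from OCLC (fewer than eight digits: zero-pad to eight and prefix "ocm"; exactly eight digits: prefix "ocm"; nine digits: neither). The thread says nothing about numbers longer than nine digits, so the sketch simply leaves them bare.

```python
def index_form(number: int) -> str:
    """Bare form for local indexing and GBS queries: no prefix, no padding."""
    if number <= 0:
        raise ValueError("OCLC numbers are positive integers")
    return str(number)


def record_form(number: int) -> str:
    """Form stored in the record, per the rules quoted in the thread."""
    digits = index_form(number)
    if len(digits) <= 8:
        # Up to eight digits: pad to eight with zeros and prefix "ocm".
        return "ocm" + digits.zfill(8)
    # Nine digits: neither prefixed nor padded. Longer numbers are not
    # covered by the thread; returning them bare is an assumption.
    return digits


if __name__ == "__main__":
    n = 7913025  # the number from Jimmy's example
    print(record_form(n))            # stored form
    print("OCLC" + index_form(n))    # query term in the form Jimmy found to work
```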

-- 
-----------------------------------
Karen Coyle / Digital Library Consultant
kcoyle@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------