Re: leading zeros on OCLC numbers and Google book search

From: Karen Coyle <lists_at_nyob>
Date: Thu, 23 Jul 2009 08:51:52 -0700
To: NGC4LIB_at_LISTSERV.ND.EDU
Right. So knock off those leading zeroes and have at it. - kc

Jonathan Rochkind wrote:
> Okay, but... that's not the world we live in anymore. Anybody. Anywhere.
>
> Karen Coyle wrote:
>> The weird padding comes about from the history of the number. First,
>> remember that OCLC numbers were first developed in a world where fixed
>> lengths, especially for identifiers, were the norm. The number was
>> always issued as a fixed length, but the length had to change. So the
>> prefix changed, and the numbers, when exported, were padded with zeroes.
>> This is what I can reconstruct:
>>
>> oclc999999  (10 chars, 6 digits)
>> ocl79999999 (11 chars, 7 digits)
>> ocm99999999 (11 chars, 8 digits)
>> ocn999999999 (12 chars, 9 digits)
>>
>> kc
>>
>> Jonathan Rochkind wrote:
>>  
>>> Thanks Maurice, that's helpful. Bah, this isn't how I would have done
>>> it.  As OCLC numbers are simply incremented integers.... why mess
>>> around with all this prefixing and 0-padding in the first place?  If I
>>> were OCLC, I'd just say "a normalized number has all alphabetic
>>> prefixes and leading zeroes removed, it's just an integer."  And write
>>> my internal routines to do this to all OCLC numbers before trying to
>>> match them. That would be a LOT simpler and less error prone.
>>> Is there something I'm missing about some value in prefixing some
>>> OCLCnums with "ocm" and others not, some with padded leading zeroes,
>>> some not?  It seems pointlessly confusing to me.
>>>
>>> Oh, but wait, OCLC DOES go on to say that. After explaining padding
>>> and prefixing, they THEN go on to say they recommend an "institution
>>> locally index the OCLC number in 035 $a with NO padding and NO
>>> prefixing."  Um, okay, that's what I said. So why do they want it
>>> indexed one way, but insist it be stored in the record in a different
>>> crazy way?   This is just asking for confusion.
>>>
>>> Jonathan
>>>
>>> Maurice York wrote:
>>>    
>>>> For what it's worth, we had quite a go-around with padded OCLC numbers in
>>>> our local database when trying to implement WorldCat Local. Even OCLC's
>>>> internal documentation was a bit vague on standard practice. After a good
>>>> bit of back and forth, they clarified best practice for us and updated
>>>> their documentation. Here's how they describe correctly normalized numbers:
>>>>
>>>> OCLC numbers less than eight digits are zero padded to eight digits and
>>>> prefixed with ocm
>>>>
>>>> OCLC numbers equal to eight digits are not zero padded, but are prefixed
>>>> with ocm
>>>>
>>>> OCLC numbers equal to nine digits are neither prefixed nor padded
>>>>
>>>> In terms of interoperability with WorldCat services, their guidance (and
>>>> I'm sure they gave the same guideline to Google when they were setting up
>>>> GBS with WCL) is:
>>>>
>>>> "In terms of best practice for ILS interoperability with services such as
>>>> WorldCat Local, we always recommend that (if it's re-indexing) an
>>>> institution locally index the OCLC number in 035 $a with NO padding and
>>>> NO prefixing.  (The only exception to this recommendation is for Voyager
>>>> sites.)  The second most common practice that we accommodate is to index
>>>> the OCLC number in the 001 field; again, with no padding or prefixing."
>>>>
>>>> So, I think the upshot is, if you're going to have interaction between
>>>> your local ILS and WCL or GBS services, you're going to need to strip
>>>> those zeros. I'm guessing Google is going to go with OCLC's standard, and
>>>> they're highly unlikely to change it.
>>>>
>>>> -M
>>>>
>>>>
>>>> ************************************
>>>> Maurice York
>>>> Head, Information Technology
>>>> NCSU Libraries
>>>> North Carolina State University
>>>> Raleigh, NC 27695
>>>>
>>>> maurice_york_at_ncsu.edu
>>>> Phone: 919-515-3518
>>>>
>>>>
>>>> On Wed, Jul 22, 2009 at 3:50 PM, Jonathan Rochkind <rochkind_at_jhu.edu> wrote:
>>>>
>>>>> So we really do need feedback from Google on how they want us to
>>>>> normalize OCLC numbers before sending to them, and what, if any, OCLCnum
>>>>> normalization they do on their end, and whether they could start doing so.
>>>>>
>>>>> Good luck getting that feedback though, like I said, when I've tried,
>>>>> there's nobody left at Google who cares about the GBS API at all, and
>>>>> certainly nobody who cares about OCLC numbers. Or at least nobody I could
>>>>> find. Whoever worked on the original implementation is now off to some
>>>>> other project.
>>>>>
>>>>> Jonathan
>>>>>
>>>>>
>>>>> Xiaoming Liu wrote:
>>>>>
>>>>>> On Wed, Jul 22, 2009 at 2:37 PM, Jonathan Rochkind <rochkind_at_jhu.edu> wrote:
>>>>>>
>>>>>>> What we actually need is for OCLC to publish a spec on "normalizing"
>>>>>>> OCLC numbers.  Which I guess would actually be as simple as "remove
>>>>>>> leading zeroes."
>>>>>>>
>>>>>> I cannot speak for OCLC, but the xOCLCNUM service includes a
>>>>>> "getVariants" method which normalizes OCLCNUM somehow, such as:
>>>>>>
>>>>>>
>>>>>> http://xisbn.worldcat.org/webservices/xid/oclcnum/07913025?method=getVariants 
>>>>>>
>>>>>>
>>>>>>
>>>>>> The API document has a link to how OCLCNUM variants are used:
>>>>>>
>>>>>> http://xisbn.worldcat.org/xisbnadmin/xoclcnum/api.htm#getvariants
>>>>>>
>>>>>> http://www.oclc.org/support/documentation/worldcat/tb/253/253.pdf
>>>>>>
>>>>>> It may be clear from the service that when you use a naked OCLCNUM, you
>>>>>> should remove the leading zeros, but if you use it with the prefix
>>>>>> "ocm", it is recommended to pad the number to 8 digits, such as
>>>>>> "ocm07913025".
>>>>>>
>>>>>> The getVariants service was suggested by Tod Matola in OCLC.
>>>>>>
>>>>>> xiaoming
>>>>>>
>>>>>>> So I actually think Google is doing an acceptable thing, and you should
>>>>>>> remove leading zeroes before making a query to it. Although it would be
>>>>>>> kind of Google to normalize on making a query too. But I wouldn't hold
>>>>>>> your breath; my impression on this stuff, after trying to talk to
>>>>>>> Google about it before, is that it's pretty much a Finished Thing that
>>>>>>> nobody at Google is currently working on and nobody at Google currently
>>>>>>> cares about.
>>>>>>>
>>>>>>> But it would be nice if OCLC published a statement saying "remove
>>>>>>> leading zeroes from OCLC numbers before comparing two OCLCnumbers to
>>>>>>> see if they match, or submitting an OCLC number to a foreign system for
>>>>>>> comparison."
>>>>>>>
>>>>>>> Jonathan
>>>>>>>
>>>>>>>
>>>>>>> Jimmy Ghaphery wrote:
>>>>>>>
>>>>>>>> NGC4LIB,
>>>>>>>>
>>>>>>>> We have noticed an issue with using the Google API for older items
>>>>>>>> where we have leading zeros in the OCLC number.
>>>>>>>>
>>>>>>>> For example with the leading zero, no result found:
>>>>>>>> http://books.google.com/books/feeds/volumes?q=OCLC07913025
>>>>>>>>
>>>>>>>> Take out the zero:
>>>>>>>> http://books.google.com/books/feeds/volumes?q=OCLC7913025
>>>>>>>>
>>>>>>>> What is the collective take on this? Does this seem like a reasonable
>>>>>>>> accommodation that Google should make (ideally at someone's request
>>>>>>>> with more juice than me, hint OCLC)? Or should I scurry about and make
>>>>>>>> changes locally?
>>>>>>>>
>>>>>>>> -Jimmy
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Jimmy Ghaphery
>>>>>>>> Head, Library Information Systems
>>>>>>>> VCU Libraries
>>>>>>>> http://www.library.vcu.edu
>>>>>>>> -- 
>>>>>>>>
>>
>>
>> -- 
>> -----------------------------------
>> Karen Coyle / Digital Library Consultant
>> kcoyle@kcoyle.net http://www.kcoyle.net
>> ph.: 510-540-7596   skype: kcoylenet
>> fx.: 510-848-3913
>> mo.: 510-435-8234
>> ------------------------------------
>>
>>   
>
>


-- 
-----------------------------------
Karen Coyle / Digital Library Consultant
kcoyle@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------
Received on Thu Jul 23 2009 - 11:54:19 EDT
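
The normalization rules discussed in this thread can be sketched in a few
lines of Python. This is an illustration only, not an OCLC or Google
specification. The function names normalize_oclc and format_for_record are
hypothetical, the prefix list comes from Karen Coyle's reconstruction above,
and the padding rules follow Maurice York's summary of OCLC's guidance. The
thread does not say how to store numbers longer than nine digits, so leaving
them bare is an assumption.

```python
import re

# Prefixes mentioned in the thread (oclc, ocl7, ocm, ocn), matched
# case-insensitively so that a query-style "OCLC07913025" also parses.
_OCLC_PATTERN = re.compile(r"^\s*(?:ocl7|oclc|ocm|ocn)?\s*(\d+)\s*$", re.IGNORECASE)


def normalize_oclc(raw: str) -> int:
    """Strip any known prefix and leading zeros; return the bare integer.

    Hypothetical helper: this is the "it's just an integer" form suggested
    in the thread for indexing and matching.
    """
    match = _OCLC_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"not recognizable as an OCLC number: {raw!r}")
    # int() discards the leading zeros for us.
    return int(match.group(1))


def format_for_record(number: int) -> str:
    """Render the stored form as summarized in the thread.

    Up to eight digits: zero padded to eight and prefixed with "ocm".
    Nine digits: neither prefixed nor padded.
    Longer numbers are not covered by the thread; returned bare (assumption).
    """
    digits = str(number)
    if len(digits) <= 8:
        return "ocm" + digits.zfill(8)
    return digits


if __name__ == "__main__":
    n = normalize_oclc("ocm07913025")
    print(n)                     # bare integer for indexing and matching
    print(format_for_record(n))  # padded, prefixed form for the record
    print(f"OCLC{n}")            # query term without the leading zero
```

Comparing two identifiers then reduces to comparing the integers returned by
normalize_oclc, whichever padded or prefixed form each one arrived in.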