The weird padding comes about from the history of the number. First,
remember that OCLC numbers were first developed in a world where fixed
lengths, especially for identifiers, were the norm. The number was
always issued as a fixed length, but the length had to change. So the
prefix changed, and the numbers, when exported, were padded with zeroes.
This is what I can reconstruct:

  oclc999999    (10 chars, 6 digits)
  ocl79999999   (11 chars, 7 digits)
  ocm99999999   (11 chars, 8 digits)
  ocn999999999  (12 chars, 9 digits)
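
To make the practical consequence concrete, here is a minimal sketch in
Python of stripping these forms down to a bare number for matching. It
assumes only the four prefixes listed above; the function names are
hypothetical, not anything OCLC publishes. Note that in "ocl7" the digit
belongs to the prefix, so removing only the letters would give the wrong
number. The second function goes the other way and follows the storage
rules Maurice quotes from OCLC below; what to do beyond nine digits is
my own assumption.

```python
import re

# Prefixes taken from the list above. "ocl7" must be matched as a whole,
# because its trailing 7 is part of the prefix, not of the number.
_OCLC_RE = re.compile(r"^(?:ocl7|oclc|ocm|ocn)?0*(\d+)$", re.IGNORECASE)


def normalize_oclc_number(raw: str) -> str:
    """Return the bare digits: no prefix, no leading zeroes."""
    match = _OCLC_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"not a recognizable OCLC number: {raw!r}")
    return match.group(1)


def to_record_form(raw: str) -> str:
    """Rebuild the padded/prefixed form described in the quoted OCLC rules.

    Eight digits or fewer: zero-pad to eight and prefix with "ocm".
    Nine digits: bare number. Longer numbers are returned bare as well,
    which is an assumption; the quoted rules do not cover them.
    """
    digits = normalize_oclc_number(raw)
    if len(digits) <= 8:
        return "ocm" + digits.zfill(8)
    return digits


if __name__ == "__main__":
    print(normalize_oclc_number("ocm07913025"))   # 7913025
    print(normalize_oclc_number("OCLC07913025"))  # 7913025
    print(normalize_oclc_number("ocl79999999"))   # 9999999
    print(to_record_form("7913025"))              # ocm07913025
```

Comparing two numbers then comes down to comparing their normalized
forms, whatever padding or prefix each system happened to store.
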
kc
Jonathan Rochkind wrote:
> Thanks Maurice, that's helpful. Bah, this isn't how I would have done
> it. As OCLC numbers are simply incremented integers.... why mess
> around with all this prefixing and 0-padding in the first place? If I
> were OCLC, I'd just say "a normalized number has all alphabetic
> prefixes and leading zeroes removed, it's just an integer." And write
> my internal routines to do this to all OCLC numbers before trying to
> match them. That would be a LOT simpler and less error prone.
> Is there something I'm missing about some value in prefixing some
> OCLCnums with "ocm" and others not, some with padded leading zeroes,
> some not? It seems pointlessly confusing to me.
>
> Oh, but wait, OCLC DOES go on to say that. After explaining padding
> and prefixing, they THEN go on to say they recommend an "institution
> locally index the OCLC number in 035 $a with NO padding and NO
> prefixing." Um, okay, that's what I said. So why do they want it
> indexed one way, but insist it be stored in the record in a different
> crazy way? This is just asking for confusion.
>
> Jonathan
>
> Maurice York wrote:
>> For what it's worth, we had quite a go-around with padded OCLC numbers
>> in our local database when trying to implement WorldCat Local. Even
>> OCLC's internal documentation was a bit vague on standard practice.
>> After a good bit of back and forth, they clarified best practice for
>> us and updated their documentation. Here's how they describe correctly
>> normalized numbers:
>>
>> OCLC numbers less than eight digits are zero padded to eight digits and
>> prefixed with ocm
>>
>> OCLC numbers equal to eight digits are not zero padded, but are prefixed
>> with ocm
>>
>> OCLC numbers equal to nine digits are neither prefixed nor padded
>>
>> Their guidance on interoperability with WorldCat services (and I'm
>> sure they gave the same guideline to Google when they were setting up
>> GBS with WCL) is:
>>
>> "In terms of best practice for ILS interoperability with services such
>> as WorldCat Local, we always recommend that (if it's re-indexing) an
>> institution locally index the OCLC number in 035 $a with NO padding
>> and NO prefixing. (The only exception to this recommendation is for
>> Voyager sites.) The second most common practice that we accommodate is
>> to index the OCLC number in the 001 field; again, with no padding or
>> prefixing."
>>
>> So, I think the upshot is, if you're going to have interaction between
>> your local ILS and WCL or GBS services, you're going to need to strip
>> those zeros. I'm guessing Google is going to go with OCLC's standard,
>> and they're highly unlikely to change it.
>>
>> -M
>>
>>
>> ************************************
>> Maurice York
>> Head, Information Technology
>> NCSU Libraries
>> North Carolina State University
>> Raleigh, NC 27695
>>
>> maurice_york_at_ncsu.edu
>> Phone: 919-515-3518
>>
>>
>> On Wed, Jul 22, 2009 at 3:50 PM, Jonathan Rochkind <rochkind_at_jhu.edu>
>> wrote:
>>
>>
>>> So we really do need feedback from Google on how they want us to
>>> normalize OCLC numbers before sending to them, what, if any, OCLCnum
>>> normalization they do on their end, and whether they could start
>>> doing so.
>>>
>>> Good luck getting that feedback though, like I said, when I've tried,
>>> there's nobody left at Google who cares about the GBS API at all, and
>>> certainly nobody who cares about OCLC numbers. Or at least nobody I
>>> could find. Whoever worked on the original implementation is now off
>>> to some other project.
>>>
>>> Jonathan
>>>
>>>
>>> Xiaoming Liu wrote:
>>>
>>>
>>>> On Wed, Jul 22, 2009 at 2:37 PM, Jonathan Rochkind <rochkind_at_jhu.edu>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>
>>>>> What we actually need is for OCLC to publish a spec on "normalizing"
>>>>> OCLC numbers. Which I guess would actually be as simple as "remove
>>>>> leading zeroes."
>>>>>
>>>>>
>>>>>
>>>>>
>>>> I cannot speak for OCLC, but the xOCLCNUM service includes a
>>>> "getVariants" method which normalizes an OCLCNUM somehow, such as:
>>>>
>>>>
>>>> http://xisbn.worldcat.org/webservices/xid/oclcnum/07913025?method=getVariants
>>>>
>>>>
>>>> The API document has a link to how OCLCNUM variants are used:
>>>>
>>>> http://xisbn.worldcat.org/xisbnadmin/xoclcnum/api.htm#getvariants
>>>>
>>>> http://www.oclc.org/support/documentation/worldcat/tb/253/253.pdf
>>>>
>>>> It may be clear from the service that when you use a naked OCLCNUM,
>>>> you should remove the leading zeros, but if you use it with the
>>>> prefix "ocm", the recommendation is to pad the number to 8 digits,
>>>> such as "ocm07913025".
>>>>
>>>> The getVariants service was suggested by Tod Matola at OCLC.
>>>>
>>>> xiaoming
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> So I actually think Google is doing an acceptable thing, and you
>>>>> should remove leading zeroes before making a query to it. Although
>>>>> it would be kind of Google to normalize on making a query too. But I
>>>>> wouldn't hold your breath; my impression on this stuff, after trying
>>>>> to talk to Google about it before, is that it's pretty much a
>>>>> Finished Thing that nobody at Google is currently working on and
>>>>> nobody at Google currently cares about.
>>>>>
>>>>> But it would be nice if OCLC published a statement saying "remove
>>>>> leading zeroes from OCLC numbers before comparing two OCLCnumbers to
>>>>> see if they match, or submitting an OCLC number to a foreign system
>>>>> for comparison."
>>>>>
>>>>> Jonathan
>>>>>
>>>>>
>>>>> Jimmy Ghaphery wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> NGC4LIB,
>>>>>>
>>>>>> We have noticed an issue with using the Google API for older items
>>>>>> where we have leading zeros in the OCLC number.
>>>>>>
>>>>>> For example with the leading zero, no result found:
>>>>>> http://books.google.com/books/feeds/volumes?q=OCLC07913025
>>>>>>
>>>>>> Take out the zero:
>>>>>> http://books.google.com/books/feeds/volumes?q=OCLC7913025
>>>>>>
>>>>>> What is the collective take on this? Does this seem like a
>>>>>> reasonable accommodation that Google should make (ideally at the
>>>>>> request of someone with more juice than me, hint OCLC)? Or should I
>>>>>> scurry about and make changes locally?
>>>>>>
>>>>>> -Jimmy
>>>>>>
>>>>>> --
>>>>>> Jimmy Ghaphery
>>>>>> Head, Library Information Systems
>>>>>> VCU Libraries
>>>>>> http://www.library.vcu.edu
>>>>>> --
--
-----------------------------------
Karen Coyle / Digital Library Consultant
kcoyle@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596 skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------
Received on Thu Jul 23 2009 - 11:36:19 EDT