Re: Book tagging: Amazon and LibraryThing

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Thu, 8 Mar 2007 17:22:17 -0500
To: NGC4LIB_at_listserv.nd.edu
Hey, what's in "Book" in addition to "Fiction" and "Non-fiction" that
you aren't showing us? Looks like it might be a bug of some kind.

Total Book:            139410    216657    64.3
FICTION TOTAL:    31948     39755    80.4
NONFIC TOTAL:     83592    124348    67.2

Does not add up.
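
To make the gap concrete, here is a minimal sketch in Python that sums the two subtotals and compares them with the "Book" line. The figures are the three rows quoted above; the variable and function names are only illustrative.

```python
# Rows copied from the table above as (matches, items).
book = (139410, 216657)
fiction = (31948, 39755)
nonfiction = (83592, 124348)


def pct(matches, items):
    """Match rate as a percentage, rounded to one decimal place."""
    return round(100 * matches / items, 1)


# Add the two subtotals and see what is left over in "Book".
sub_matches = fiction[0] + nonfiction[0]
sub_items = fiction[1] + nonfiction[1]
gap_matches = book[0] - sub_matches
gap_items = book[1] - sub_items

print(f"fiction + nonfiction: {sub_matches} / {sub_items}")
print(f"unaccounted for:      {gap_matches} / {gap_items} "
      f"({pct(gap_matches, gap_items)}%)")
```

So roughly 52,000 "Book" items fall into neither subtotal. Whether that is a bug or a category left out of the breakdown cannot be told from the table alone.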

Jonathan

Tim Spalding wrote:
> Harvey,
>
> They are useful—fascinating even. All told, I think the coverage is
> good. I like how things are pretty evenly distributed, but SciFi does
> a bit better than average and westerns fall off a cliff! That seems
> about right; I've never noticed westerns being big on LibraryThing.
> They don't look too big in your holdings either, but the percent is
> still off.
>
> Thanks for the data. Now all we have to do is figure out how to
> present LibraryThing data to libraries. I think we're going to ask for
> an ISBN import. Spidering the site and so on seems too fiddly.
>
> Tim
>
> On 3/6/07, Hahn, Harvey <hhahn_at_ahml.info> wrote:
>> I previously wrote:
>> |Tim Spalding wrote:
>> ||I made an ISBN feed, so libraries could compare their holdings with
>> ||LibraryThing
>> |
>> |I've downloaded your file to compare with our public library's
>> |333,000-record bib database, not all of which have ISBNs. (It'll take
>> |a while, though--I've got a lot of other things on my plate, too.)
>>
>> Well (after filtering out microforms, maps, pamphlets, magazines, and
>> equipment), here are the results for ADULT materials *only*:
>>
>> [The first line would be read something like this:
>> 145650 (56.6%) of the 257475 ISBNs that our library has in the given
>> category (in this case, "grand total") matched ISBNs in LibraryThing's
>> ISBN list]
>>
>> =====================================================
>> NUMBER OF ISBNs THAT MATCHED LibraryThing's ISBN LIST
>> =====================================================
>>
>>                  Matches   Items     Percent
>>
>> GRAND TOTAL:     145650    257475    56.6
>>
>> Book:            139410    216657    64.3
>> Nonbook:           6240     40818    15.3
>>
>> Book subtotals:
>>
>> FICTION TOTAL:    31948     39755    80.4
>>
>> Fiction:          19537     23969    81.5
>> Mystery:           7928     10296    77.0
>> SciFi:             2873      2941    97.7
>> Western:            281       864    32.5
>> Gen pbk fic:        472       611    77.3
>> Romances:           857      1074    79.8
>>
>> NONFIC TOTAL:     83592    124348    67.2
>>
>> 000's:             2350      3080    76.3
>> 100's:             3504      4753    73.7
>> 200's:             4870      5906    82.5
>> 300's:            11535     19047    60.6
>> 400's:             1142      1460    78.2
>> 500's:             3114      4308    72.3
>> 600's:            18688     29770    62.8
>> 700's:            14331     21615    66.3
>> 800's:             6573     10014    65.6
>> 900's:            11113     16369    67.9
>> Biographies:       6372      8026    79.4
>>
>> =====================================================
>>
>>
>> As I mentioned, I've got other things going on, too, so it may be a week
>> or two before I can post YOUTH results, unless you don't want/need those
>> results because of the nature of LT's ISBN list.  By the way, these
>> breakdowns are pretty easy on our system because we created a
>> 5-character positional code (LOCATION) in every item, based primarily on
>> formats and DDC/genres.  I massaged the output of our III system to
>> eventually (after comparing against LT) come up with a file where each
>> entry had the form "xxxxxiiiiiiiiii n"--x is the positional code, i is
>> the ISBN, and n is 1 or 0, depending on whether that ISBN was found (via
>> a binary search) in LT's sorted ISBN list or not.  By using VBS's
>> regular expression object in an OCLC Connexion OML (VBA-like) macro, I
>> can do counts on the file using a pattern match on the 5-character codes
>> and summing the 0/1 results of matching LibraryThing's ISBN list.
>>
>> I hope these results are interesting and useful!
>>
>> Harvey
>>
>> --
>> ===========================================
>> Harvey E. Hahn, Manager, Technical Services Department
>> Arlington Heights (Illinois) Memorial Library
>> 847/506-2644 - FX: 847/506-2650 - Email: hhahn(at)ahml(dot)info
>> OML & Scripts web pages: http://www.ahml.info/oml/
>> Personal web pages: http://users.anet.com/~packrat
>>
>
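
For reference, the counting approach Harvey describes in the quoted message (one line per item holding a 5-character code, the ISBN, and a 0/1 flag set by a binary search, then pattern-matched counts) could be sketched roughly as follows. This is Python, not the VBScript regular expression object in an OML macro that he used. The function names, location codes, and ISBNs are invented for illustration, since his actual LOCATION scheme isn't shown in the thread.

```python
import re
from bisect import bisect_left


def in_sorted(sorted_isbns, isbn):
    """Binary search: True if isbn occurs in the sorted list."""
    i = bisect_left(sorted_isbns, isbn)
    return i < len(sorted_isbns) and sorted_isbns[i] == isbn


def flag_entries(holdings, lt_isbns):
    """Build 'xxxxxiiiiiiiiii n' lines from (code, isbn) pairs."""
    lt_sorted = sorted(lt_isbns)
    return [
        f"{code}{isbn} {int(in_sorted(lt_sorted, isbn))}"
        for code, isbn in holdings
    ]


def count_matches(lines, code_pattern):
    """Return (matches, items, percent) for codes matching the regex."""
    code_re = re.compile(code_pattern)
    matches = items = 0
    for line in lines:
        code, flag = line[:5], line[-1]
        if code_re.fullmatch(code):
            items += 1
            matches += int(flag)  # summing the 0/1 flags gives the match count
    percent = round(100 * matches / items, 1) if items else 0.0
    return matches, items, percent


# Hypothetical sample data: these codes and ISBNs are made up.
holdings = [
    ("ABFSF", "0000000011"),
    ("ABFSF", "0000000022"),
    ("ABFWE", "0000000033"),
    ("ABN50", "0000000044"),
]
lt_isbns = ["0000000044", "0000000011", "0000000099"]

lines = flag_entries(holdings, lt_isbns)
print(count_matches(lines, r"ABF.."))  # every code starting with "ABF"
print(count_matches(lines, r"....."))  # grand total
```

Because each category is just a pattern over the positional code, a subtotal and the total above it come from the same file, which makes it easy to check that they agree.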

--
Jonathan Rochkind
Sr. Programmer/Analyst
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
Received on Thu Mar 08 2007 - 16:22:31 EST