Re: Book tagging: Amazon and LibraryThing

From: Tim Spalding <tim_at_nyob>
Date: Thu, 8 Mar 2007 16:42:59 -0500
To: NGC4LIB_at_listserv.nd.edu

Harvey,

They are useful—fascinating even. All told, I think the coverage is
good. I like how things are pretty evenly distributed, but SciFi does
a bit better than average and westerns fall off a cliff! That seems
about right; I've never noticed westerns being big on LibraryThing.
They don't look too big in your holdings either, but the percent is
still off.

Thanks for the data. Now all we have to do is figure out how to
present LibraryThing data to libraries. I think we're going to ask for
an ISBN import. Spidering the site and so on seems too fiddly.

Tim

On 3/6/07, Hahn, Harvey <hhahn_at_ahml.info> wrote:
> I previously wrote:
> |Tim Spalding wrote:
> ||I made an ISBN feed, so libraries could compare their holdings with
> ||LibraryThing
> |
> |I've downloaded your file to compare with our public library's
> |333,000-record bib database, not all of which have ISBNs. (It'll take
> |a while, though--I've got a lot of other things on my plate, too.)
>
> Well (after filtering out microforms, maps, pamphlets, magazines, and
> equipment), here are the results for ADULT materials *only*:
>
> [The first line would be read something like this:
> 145650 (56.6%) of the 257475 ISBNs that our library has in the given
> category (in this case, "grand total") matched ISBNs in LibraryThing's
> ISBN list]
>
> =====================================================
> NUMBER OF ISBNs THAT MATCHED LibraryThing's ISBN LIST
> =====================================================
>
>                  Matches   Items     Percent
>
> GRAND TOTAL:     145650    257475    56.6
>
> Book:            139410    216657    64.3
> Nonbook:           6240     40818    15.3
>
> Book subtotals:
>
> FICTION TOTAL:    31948     39755    80.4
>
> Fiction:          19537     23969    81.5
> Mystery:           7928     10296    77.0
> SciFi:             2873      2941    97.7
> Western:            281       864    32.5
> Gen pbk fic:        472       611    77.3
> Romances:           857      1074    79.8
>
> NONFIC TOTAL:     83592    124348    67.2
>
> 000's:             2350      3080    76.3
> 100's:             3504      4753    73.7
> 200's:             4870      5906    82.5
> 300's:            11535     19047    60.6
> 400's:             1142      1460    78.2
> 500's:             3114      4308    72.3
> 600's:            18688     29770    62.8
> 700's:            14331     21615    66.3
> 800's:             6573     10014    65.6
> 900's:            11113     16369    67.9
> Biographies:       6372      8026    79.4
>
> =====================================================
>
>
> As I mentioned, I've got other things going on, too, so it may be a week
> or two before I can post YOUTH results, unless you don't want/need those
> results because of the nature of LT's ISBN list.  By the way, these
> breakdowns are pretty easy on our system because we created a
> 5-character positional code (LOCATION) in every item, based primarily on
> formats and DDC/genres.  I massaged the output of our III system to
> eventually (after comparing against LT) come up with a file where each
> entry had the form "xxxxxiiiiiiiiii n"--x is the positional code, i is
> the ISBN, and n is 1 or 0, depending on whether that ISBN was found (via
> a binary search) in LT's sorted ISBN list or not.  By using VBS's
> regular expression object in an OCLC Connexion OML (VBA-like) macro, I
> can do counts on the file using a pattern match on the 5-character codes
> and summing the 0/1 results of matching LibraryThing's ISBN list.
>
> I hope these results are interesting and useful!
>
> Harvey
>
> --
> ===========================================
> Harvey E. Hahn, Manager, Technical Services Department
> Arlington Heights (Illinois) Memorial Library
> 847/506-2644 - FX: 847/506-2650 - Email: hhahn(at)ahml(dot)info
> OML & Scripts web pages: http://www.ahml.info/oml/
> Personal web pages: http://users.anet.com/~packrat
>
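
For anyone who wants to try the same kind of count on their own holdings,
here is a minimal sketch of the approach Harvey describes, written in
Python rather than as a VBS/OML macro. It is not his actual code. The
function names, the location codes ("AFSCI", "AFWES") and the ISBNs are all
hypothetical; only the line layout "xxxxxiiiiiiiiii n", the binary search
against a sorted ISBN list, and the regex-plus-sum counting step come from
his message.

```python
import re
from bisect import bisect_left

# Assumed line layout, taken from the description in the quoted message:
# 5-character location code, 10-character ISBN, a space, then 1 or 0.
LINE_RX = re.compile(r"^(.{5})(.{10}) ([01])$")


def in_sorted(sorted_isbns, isbn):
    """Binary search: True if isbn occurs in the sorted list."""
    i = bisect_left(sorted_isbns, isbn)
    return i < len(sorted_isbns) and sorted_isbns[i] == isbn


def flag_items(items, lt_isbns):
    """Turn (code, isbn) pairs into 'xxxxxiiiiiiiiii n' lines."""
    lt_sorted = sorted(lt_isbns)
    return [f"{code}{isbn} {int(in_sorted(lt_sorted, isbn))}"
            for code, isbn in items]


def percent(matches, items):
    """Matches as a percentage of items, to one decimal place."""
    return round(100 * matches / items, 1) if items else 0.0


def count_matches(lines, code_pattern):
    """Count lines whose code fully matches code_pattern; sum their flags."""
    code_rx = re.compile(code_pattern)
    matches = items = 0
    for line in lines:
        m = LINE_RX.match(line)
        if m and code_rx.fullmatch(m.group(1)):
            items += 1
            matches += int(m.group(3))
    return matches, items, percent(matches, items)


if __name__ == "__main__":
    # Hypothetical codes and ISBNs, for illustration only.
    holdings = [("AFSCI", "0000000001"),
                ("AFSCI", "0000000002"),
                ("AFWES", "0000000003")]
    lt_list = ["0000000002", "0000000001"]
    lines = flag_items(holdings, lt_list)
    print(count_matches(lines, r"AF..."))  # (2, 3, 66.7)
```

The percent helper is the same arithmetic as the table: percent(145650,
257475) gives 56.6, the grand-total figure. Matching the pattern against
only the 5-character code keeps ISBN digits from triggering a false hit.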
Received on Thu Mar 08 2007 - 15:42:46 EST