You probably know that there is a part of the ISBN that identifies the
publisher. Edward Betts of the Open Library did a run through the OL
database and matched up the variant forms of publisher names based on
the ISBN in the record. His blog post
http://blog.openlibrary.org/2009/07/20/isbn-publisher-codes/
links to the full file for downloading with counts for each publisher.
In the index at http://home.us.archive.org/~edward/isbn/index.html, if you
click on an individual publisher prefix, you see all the variant publisher
names and the date ranges in which they are used (which sometimes don't mean
anything, but at other times show publisher name changes), something like:
0-06:
  41084: (1073-1997) Harper & Row
  15191: (1953-2010) HarperCollins
   6351: (   1-2009) HarperCollins Publishers
   5122: (1921-2007) HarperSanFrancisco
   3550: (1933-2009) HarperPerennial
   2704: (1970-2009) HarperCollinsPublishers
   2121: (1947-1988) Barnes & Noble Books
   1908: (1993-2009) William Morrow
   1642: (1900-2004) Perennial Library
   1599: (1952-1988) Barnes & Noble
It seems to me that this would be a good start for 1) creating an
identifier for publishers (http://blahblah/0-06), and 2) a beginning of
an authority record with all forms of the name.
Yes, there are errors (as you can see above), so there would need to be
some cleanup, but I'm excited to be able to even think about having a
publisher "entity" and not just a string in our data.
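
As a rough sketch of what such a publisher "entity" might look like, the
listing for one prefix could be parsed into a small record that carries an
identifier plus every name form. Everything named here is an assumption for
illustration only: BASE_URI reuses the placeholder http://blahblah/ from
above, the NameForm and PublisherRecord classes and parse_listing function
are hypothetical, and the 1450 cutoff used to flag doubtful start years is
an arbitrary choice, not something taken from Edward's file.

```python
import re
from dataclasses import dataclass, field

# Placeholder base URI taken from the example identifier in the post.
BASE_URI = "http://blahblah/"

# Matches lines such as "41084: (1073-1997) Harper & Row".
LINE_RE = re.compile(r"^\s*(\d+):\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)\s*(.+?)\s*$")


@dataclass
class NameForm:
    name: str
    count: int        # number of records using this form of the name
    first_year: int
    last_year: int
    suspect: bool     # True when the first year looks implausible


@dataclass
class PublisherRecord:
    prefix: str       # ISBN publisher prefix, e.g. "0-06"
    names: list = field(default_factory=list)

    @property
    def uri(self):
        return BASE_URI + self.prefix


def parse_listing(prefix, lines, earliest_year=1450):
    """Build a record from 'count: (first-last) name' lines.

    earliest_year is an assumed cutoff for flagging dates that need cleanup.
    """
    record = PublisherRecord(prefix)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip anything that is not a name line
        count, first, last = int(m.group(1)), int(m.group(2)), int(m.group(3))
        record.names.append(
            NameForm(m.group(4), count, first, last, first < earliest_year)
        )
    return record


sample = [
    "41084: (1073-1997) Harper & Row",
    "15191: (1953-2010) HarperCollins",
    "6351: (   1-2009) HarperCollins Publishers",
]
record = parse_listing("0-06", sample)
for form in record.names:
    flag = "  <-- check dates" if form.suspect else ""
    print(f"{record.uri}  {form.name} ({form.first_year}-{form.last_year}){flag}")
```

The suspect flag only marks entries for review; deciding which name forms
really belong to one publisher would still need human cleanup.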
kc
--
-----------------------------------
Karen Coyle / Digital Library Consultant
kcoyle@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596 skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------
Received on Tue Jul 21 2009 - 09:19:40 EDT