And I forgot to put this in my last message, but I don't think the
work that the Open Library has done on this front should be
underestimated. They have a lot of resources that they are coining
URIs for:
http://openlibrary.org/b/OL22369858M (see:
http://openlibrary.org/b/OL22369858M.rdf)
which is a start -- they add some bibliographic data (but not a lot of
detail) which can then be expanded upon:
http://semanticlibrary.org/items/47406.html
which doesn't show much, but under the hood reveals:
http://semanticlibrary.org/items/47406.rdf
which provides considerably more data, but builds upon the data
already there (this gets a little complicated -- Semantic Library is a
data incubator project intended to show what can be done with open
data, generally designed as if there were no RDF from the source --
hence the rather weak assertions back to Open Library -- but since the
OL has RDF, this will probably eventually need to be reconciled).
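The `.rdf` pattern above is easy to work with programmatically. A minimal sketch using only the Python standard library: it derives the RDF/XML URL from a book URI and pulls `dc:title` values out of a document. The sample document here is canned (with an invented title) so the sketch doesn't depend on the live service; in practice you'd fetch the real URL.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

def rdf_url(book_uri):
    """Open Library serves RDF/XML at the book URI plus an .rdf suffix."""
    return book_uri + ".rdf"

def titles_from_rdf(rdf_xml):
    """Pull dc:title values out of an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    return [el.text for el in root.iter(DC + "title")]

# Canned sample standing in for http://openlibrary.org/b/OL22369858M.rdf
# (the title is invented -- fetch the real document for actual data):
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://openlibrary.org/b/OL22369858M">
    <dc:title>An example title</dc:title>
  </rdf:Description>
</rdf:RDF>"""
```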
It would be nice (and, honestly, trivial) to see something similar
done with http://lccn.loc.gov/, for example:
http://lccn.loc.gov/67029221
If the "Dublin Core" link used DC in RDF/XML rather than OAI DC, it
would be possible for others to build upon what's there and we'd all
be talking about the same resource.
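To make the point concrete, here is one possible shape such a DC-in-RDF/XML document could take for the LCCN permalink, generated with the Python standard library. This is a hypothetical rendering, not what LC actually serves, and the title passed in below is a placeholder, not the real data for LCCN 67029221.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)

def lccn_rdf(lccn, title):
    """Describe an lccn.loc.gov permalink as an RDF/XML resource.
    The title is a caller-supplied placeholder, not real LC data."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": f"http://lccn.loc.gov/{lccn}"})
    ET.SubElement(desc, f"{{{DC}}}title").text = title
    return ET.tostring(root, encoding="unicode")
```

The key is the `rdf:about` attribute: everyone asserting statements against that URI is, by construction, talking about the same resource.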
-Ross.
On Fri, Oct 23, 2009 at 2:43 PM, Ross Singer <rossfsinger_at_gmail.com> wrote:
> On Fri, Oct 23, 2009 at 10:44 AM, James Weinheimer <j.weinheimer_at_aur.edu> wrote:
>> Karen mentioned that the entire file of LC is in the Internet Archive. I was
>> unaware of that, but I can't find it. The files I can find are MARC21
>> ISO2709 files which is the equivalent of what TBL said about pdf files.
>> While MARC may be "well-documented" it is not "well-understood" by anybody
>> except catalogers. Nobody will dig the information out of that.
>
> For the people that haven't found the MARC records on archive.org:
> http://www.archive.org/details/marcrecords
>
> I disagree that MARC is much of an impediment to data sharing.
> Certainly it isn't conducive to it (outside of the library domain, of
> course), but it, in and of itself, is no harder to work with than your
> notion of text-delimited files. There are, after all, parsers in
> pretty much any programming language you could possibly want, and
> tools (yaz-marcdump + XSLT, for instance) for turning it into some
> other serialization that may or may not be preferable. I cannot see
> how you would share the data that we have in CSV format in any ideal
> way.
>
> The problem is not the data carrier, it's the data. As Matthew Beacom
> mentioned, the corpus of records we have is prose, not a data set, and
> as such, is extraordinarily difficult to glean the hard facts from.
> Further complicating matters is that it's a very select set
> (librarians, and, more realistically, catalogers) that understand the
> nuances of the prose (especially the punctuation). This is what is
> so frustrating about turning the larger collections of records into
> linked data: we have our best minds, /with access to the people who
> understand the embedded semantics/, and we can't figure out how to
> model it efficiently. God help the poor soul with a CSV file and no
> library background.
>
> That being said, Jim, I understand your restlessness. Part of what
> makes linked data so good, so powerful and so necessary is that we
> don't have to solve all of our problems at once. Get the low hanging
> fruit (titles, subjects, control numbers, standard numbers, etc.),
> mint URIs for them and release the data. Then, as we free more and
> more data from our own cleverness, it can be asserted as it comes
> along, because the identifiers (URIs) are already out there and we
> know we're talking about the same thing. So, yes, you're right.
> Let's just release something.
>
> The key is not to get swept under by the criticisms that it's not
> perfect (see: lcsh.info/id.loc.gov/authorities) by our own ranks.
>
> -Ross.
>
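The quoted point that MARC's carrier format is parseable in any language can be illustrated with a minimal ISO 2709 reader. This is a sketch only (real code would reach for a library like pymarc, and the test record is invented): the leader gives the base address of the data, a directory of 12-byte entries maps tags to offsets, and fields are delimited by the 0x1E terminator.

```python
def parse_iso2709(record: bytes):
    """Split one ISO 2709 (MARC transmission format) record into (tag, data) pairs."""
    leader = record[:24]
    base = int(leader[12:17])          # base address of data, from the leader
    directory = record[24:base - 1]    # 12-byte entries, ends with a field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])       # field length, terminator included
        start = int(entry[7:12])       # offset from the base address
        data = record[base + start: base + start + length - 1]  # drop terminator
        fields.append((tag, data.decode("utf-8")))
    return fields
```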
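The quoted remark about the semantics buried in punctuation is worth a concrete example. ISBD punctuation in a title statement (" : " before a subtitle, " / " before the statement of responsibility) carries structure that a naive consumer would miss. A deliberately simplistic split, to show the idea (real records have far more punctuation conventions than this handles):

```python
def split_title_statement(field_245):
    """Naive split of an ISBD-punctuated title statement (MARC 245).
    A sketch only: " / " introduces the statement of responsibility,
    " : " introduces the subtitle. Real records are much messier."""
    text = field_245.rstrip(". ")
    title_part, _, responsibility = text.partition(" / ")
    title, _, subtitle = title_part.partition(" : ")
    return {
        "title": title,
        "subtitle": subtitle or None,
        "responsibility": responsibility or None,
    }
```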
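The "low hanging fruit" strategy in the quoted message -- mint URIs for records and release the easy fields now, assert the rest later -- can be sketched as a few lines of N-Triples generation. The `{base}/record/{id}` URI pattern and the example values are invented for illustration; any stable scheme would do, since the point is only that the identifier exists before the full model does.

```python
DC = "http://purl.org/dc/elements/1.1/"

def mint_triples(base_uri, control_number, title, subjects):
    """Mint a URI for a record and emit N-Triples for the easy fields.
    The {base}/record/{id} pattern is hypothetical."""
    s = f"<{base_uri}/record/{control_number}>"
    lines = [f'{s} <{DC}title> "{title}" .']
    for subject in subjects:
        lines.append(f'{s} <{DC}subject> "{subject}" .')
    return lines
```

Later assertions about the same record simply reuse the minted URI, so nothing released now has to be retracted.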
Received on Fri Oct 23 2009 - 16:00:55 EDT