Re: OCLC recommends Open Data Commons Attribution License

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Tue, 11 Sep 2012 18:12:12 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU
I know what 'provenance' means, come on.

Yes, I understand if one statement comes from Amazon and another from 
OCLC, they could have separate attribution/provenance. You're missing my 
potential cases.

I start out ingesting data from OCLC into my system. Okay, the title 
statement "Makbeth" comes from OCLC, and needs attribution/provenance 
data there to comply with their license.

Now someone corrects the typo to "Macbeth". Does that statement still 
legally require attribution/provenance to OCLC (assuming their "BY" type 
license is enforceable in the first place)?  (And to the person that 
edited it to fix it, if they share their fixes under a "BY" license and 
that's legally enforceable?)  There are certainly ways of keeping an 
attribution/provenance chain as well, but it's not as simple as a 'quad'.
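
To make the "chain" idea concrete, here is a rough sketch in Python, not 
anything OCLC or the provenance folks have specified. The class names, the 
"local-editor" label, and the placeholder identifier are all made up for 
illustration. It only records who touched a statement; it says nothing 
about who legally must be credited.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Revision:
    value: str    # the literal value asserted at this step
    source: str   # who supplied it (hypothetical labels)
    license: str  # license the supplier attached to it


@dataclass
class Statement:
    subject: str
    predicate: str
    history: List[Revision] = field(default_factory=list)

    @property
    def current(self) -> str:
        # The most recent revision is the value the system displays.
        return self.history[-1].value

    def revise(self, value: str, source: str, license: str) -> None:
        # Append rather than overwrite, so earlier sources are not lost.
        self.history.append(Revision(value, source, license))

    def attribution_chain(self) -> List[str]:
        # Every party that ever contributed to this statement, in order.
        return [rev.source for rev in self.history]


# "isbn:example" is a placeholder identifier, not a real ISBN.
title = Statement("isbn:example", "title")
title.revise("Makbeth", "OCLC", "ODC-BY")
title.revise("Macbeth", "local-editor", "ODC-BY")

print(title.current)
print(title.attribution_chain())
```

Note that a single (subject, predicate, object, source) quad can only hold 
the last entry of that history; the earlier links need somewhere else to 
live.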

The Amazon/OCLC combo example is more complicated. Let's say I have a 
system that gets data from a bunch of places.  It's got data from OCLC 
for a certain ISBN and data from Amazon for that same ISBN, and maybe 
some other places too. The OCLC one says "Makbeth", but all the others 
say "Macbeth". So my system algorithmically picks "Macbeth". Let's say 3 
other systems all agreed "Macbeth" -- and they all demand "BY" 
attribution in licensing (and assuming it's enforceable) -- do I legally 
need to keep provenance/attribution to... all 3 of those systems that 
agreed it was 'Macbeth'?  (This is one reason copyright of 'facts' is a 
mess: everyone agrees on 'em 'cause, well, they're just facts.)  If my 
system is RDF-like and keeps individual data elements as entities, then 
clearly OCLC doesn't need attribution/provenance of that new "Macbeth" 
title I got from the other 3 sources -- but later when someone corrects 
the Worldcat record to "Macbeth", it's going to _look_ like I should 
have given OCLC attribution credit.
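
A minimal sketch of that scenario in Python, assuming a plain majority 
vote (the function name and the "source-b"/"source-c" labels are 
hypothetical, standing in for the unnamed other systems):

```python
from collections import Counter


def pick_title(claims):
    """claims is a list of (source, value) pairs.

    Returns the majority value, the sources that agreed with it,
    and the sources that did not.
    """
    counts = Counter(value for _, value in claims)
    winner, _ = counts.most_common(1)[0]
    agreeing = [source for source, value in claims if value == winner]
    dissenting = [source for source, value in claims if value != winner]
    return winner, agreeing, dissenting


claims = [
    ("OCLC", "Makbeth"),
    ("Amazon", "Macbeth"),
    ("source-b", "Macbeth"),
    ("source-c", "Macbeth"),
]

winner, agreeing, dissenting = pick_title(claims)
print(winner, agreeing, dissenting)
```

The code can tell you which sources agreed with the chosen value at the 
time it was chosen. It can't tell you whether all of them are owed 
attribution, and once OCLC fixes its record, nothing in the output 
distinguishes "got it from OCLC" from "never used OCLC's value" unless 
you also keep the dissenting list and a timestamp around.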

There are potential solutions to all these things, I'm not saying it's 
impossible. I'm saying it's complicated. If you have an application with 
use cases such that you decide the value of tracking provenance is worth 
the complexity and added cost to implement, then great, and you'll 
probably want to look into the work the linked data provenance folks are 
doing and such. If you have an application where you decide the value of 
tracking provenance is not high enough to justify the 
development/implementation cost... but you want to use data which 
legally requires you to do so anyway....   you are unhappy. And pointers 
to ongoing innovation by the linked data provenance folks probably won't 
make you too much happier.

 >> But to license your data (assuming it's enforceable) such that it's
 >> only convenient/non-burdensome to comply with the license using one
 >> particular not-even-yet-finished-being-invented technology, and not
 >> other technologies, is obviously a barrier/burden to use.
 >
 > Again, I'm a bit confused. Are you thinking that linked data provenance
 > would be used in licensing OCLC data? I haven't heard anything that
 > would imply that

I may be the one that's confused, but I thought that was what we were 
talking about!  This thread is about OCLC's recommendation to use an 
attribution license. In at least some places, OCLC claims to be 
licensing their data under an ODC-BY license requiring attribution already.

Some people (including the very good post Graham Seaman referred to at 
http://creativecommons.org/weblog/entry/33768) suggest this can be 
technically very challenging.

I thought you were suggesting the linked data provenance work as help in 
that challenge. Was I misunderstanding?

I think the linked data provenance work is neat and I'm glad they are 
doing it; I think it's unfortunate for OCLC to license data, or recommend 
others license data, in such a way that many uses will require such 
fairly innovative technology to comply with the license.
Received on Tue Sep 11 2012 - 18:13:11 EDT