Re: OCLC recommends Open Data Commons Attribution License

From: Jonathan Rochkind <rochkind_at_nyob>
Date: Tue, 11 Sep 2012 15:38:50 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU
On 9/11/2012 3:26 PM, Karen Coyle wrote:
> I suggest you keep an eye on the W3C provenance work. It was recently
> explained to me that they see a move from triples to quads, where the
> source is no more burdensome than the subject, predicate or object, and
> there is no "keeping track." It comes with the data.

But data is not immutable. What happens when a human edits a literal 
value? Does it need multiple provenance values? Must the original 
attribution be maintained? Does it depend on how much they edit? Do you 
have to run a 'diff' and drop the attribution only if there are NO 
substrings in common? What if a machine edits it, possibly by merging 
two data sets?
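For illustration only, here is a minimal sketch of the naive 'diff' rule asked about above. It assumes a hypothetical function `retains_attribution` and a hypothetical threshold `min_len` (the minimum length of a shared substring), and it uses Python's standard `difflib`. It is not a rule anyone has proposed. Under the literal "NO substrings in common" reading (`min_len=1`), even two unrelated titles usually share a single character, so the attribution would almost never be dropped.

```python
from difflib import SequenceMatcher


def retains_attribution(original: str, edited: str, min_len: int = 1) -> bool:
    """Hypothetical rule: keep the original attribution if the edited value
    still shares at least one common substring of min_len characters."""
    matcher = SequenceMatcher(None, original, edited)
    # Longest common contiguous block across the full extent of both strings.
    match = matcher.find_longest_match(0, len(original), 0, len(edited))
    return match.size >= min_len


# Hypothetical example values, not real catalog records.
worldcat_title = "The adventures of Tom Sawyer"
amazon_title = "Adventures of Tom Sawyer"

print(retains_attribution(worldcat_title, amazon_title))  # True
print(retains_attribution(worldcat_title, "Moby-Dick"))   # True: both contain "o"
print(retains_attribution(worldcat_title, "Moby-Dick", min_len=10))  # False
```

Any threshold other than 1 is an arbitrary choice, which is exactly the difficulty raised here.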

Even if you keep a complete history of all immutable snapshots 
(obviously an increased technical cost in itself, which you may or may 
not have wanted to take on otherwise), it's not at all clear when a given 
piece of data has changed 'enough' that the original attribution is no 
longer required. If it's been entirely replaced with a value from another 
source, it SEEMS like attribution would no longer be required. But what if 
that value from another source is actually MOSTLY the same as the original 
value, even though the new value came entirely from another source with 
its OWN licensing requirements? We're talking about data here: the title I 
get from Amazon is quite likely to be similar to the title I get from 
WorldCat, even though the 'provenance' of each does not include the 
other. If, for a particular book, I start with a title from WorldCat and 
replace it with a title from Amazon, does the WorldCat attribution 
license still apply because my data set has somehow been tainted?

Also, it's great that the W3C is doing provenance work to make this 
easier (great in my opinion because it's a genuinely useful function for 
reasons other than license requirements).

But licensing your data (assuming such a license is enforceable) so that 
complying with it is convenient and non-burdensome only with one 
particular not-even-yet-finished-being-invented technology, and not with 
other technologies, is obviously a barrier to use.
Received on Tue Sep 11 2012 - 15:40:00 EDT