On 9/11/12 12:38 PM, Jonathan Rochkind wrote:
> On 9/11/2012 3:26 PM, Karen Coyle wrote:
>> I suggest you keep an eye on the W3C provenance work. It was recently
>> explained to me that they see a move from triples to quads, where the
>> source is no more burdensome than the subject, predicate or object, and
>> there is no "keeping track." It comes with the data.
>
> But data is not immutable. What happens when a human edits a literal
> value? Does it need multiple provenance values? Must the original
> attribution be maintained? Does it depend on how much they edit? Do you
> have to run a 'diff', and if there are NO substrings in common, it loses
> it? What if a machine edits it, possibly by merging together two data sets?
Jonathan, the PROV primer [1] has a good explanation of what provenance
means and of how it relates to versioning. I haven't yet found what
I consider to be a good description of how updates and versioning are
intended to work with linked data, but I have heard assumptions that it
will be wiki-like, with all previous versions maintained with time
stamps. This will be unlike what we have today with MARC, where you don't
have a choice of versions (although in theory the 040 could at least
allow you to select records based on who edited them last). So, no,
there isn't a concept, AFAIK, of multiple provenance, only different
versions of a statement.
Also, the provenance is at the statement level, not the dataset or
record level.
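To make the quad idea concrete, here is a minimal sketch in Python. It is not PROV or any real RDF library; the names Quad, Version and current_statements, the "ex:" identifiers, and the timestamps are all hypothetical and chosen only for illustration. It assumes the wiki-like model described above: every version of a statement is kept with a time stamp, and each statement carries its own source.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """A triple plus a fourth element naming where the statement came from."""
    subject: str
    predicate: str
    obj: str
    source: str  # hypothetical source identifier; travels with the data


class Version(NamedTuple):
    """One time-stamped version of a statement (assumed wiki-like history)."""
    timestamp: str  # ISO 8601, so plain string order is chronological order
    quad: Quad


# Nothing is overwritten: an edit just adds a newer version.
history = [
    Version("2012-01-01T00:00:00Z",
            Quad("ex:book1", "dc:title", "A Title", "ex:sourceA")),
    Version("2012-03-01T00:00:00Z",
            Quad("ex:book1", "ex:placeOfPublication", "Berkeley", "ex:sourceA")),
    Version("2012-06-01T00:00:00Z",
            Quad("ex:book1", "dc:title", "A Title: With Subtitle", "ex:sourceB")),
]


def current_statements(versions):
    """Return the newest quad for each (subject, predicate) pair."""
    latest = {}
    for version in sorted(versions):  # oldest first, so newer ones win
        key = (version.quad.subject, version.quad.predicate)
        latest[key] = version.quad
    return latest


current = current_statements(history)
for quad in current.values():
    print(quad.predicate, "|", quad.obj, "| from", quad.source)
```

In this sketch the current title has one source (the later one) and the place of publication has another, while the earlier title remains in the history as a previous version rather than as a second provenance value on the same statement.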
>
> Even if you keep a complete history of all immutable snapshots
> (obviously an increased technical cost in itself which you may or may
> not have wanted to do otherwise), it's not at all clear when a given
> piece of data has changed 'enough' that the original attribution is no
> longer required (although if it's been entirely replaced with a value
> from another source, it SEEMS like it would be. But what if that value
> from another source is actually MOSTLY the same as the original value,
> even if the new value came entirely from another source with its OWN
> licensing requirements. We're talking about data here, the title I get
> from Amazon is quite likely to be similar to the title I get from
> WorldCat, even though the 'provenance' of each does not include the
> other. If for a particular book, I start with a title from WorldCat,
> and replace it with a title from Amazon.... does the WorldCat
> attribution license still apply because my data set has somehow been
> tainted?)
Again, provenance would be at the statement level, so you could use a
title from Amazon and, for example, a place of publication from
WorldCat, and you would know where they came from. That's all provenance
gives you. It doesn't have anything to do with licensing, per se, but I
believe you are responding to the "ODC-BY" license that OCLC now
recommends. "BY" just means "with attribution." And note that ODC
licenses apply to databases only, not the underlying data in the
database. (Which makes it even harder to think about what this means for
WorldCat data, rather than WorldCat. But at that point I just get confused.)
OCLC's attribution requirements [2] do not seem to be contrary to
mashups, and they don't require that every record or every field get
attribution, so I don't see the problem as you state it. They do make
reference to "attribution stacking" in their FAQ [3], but I read that as
being on the database as a whole, not individual elements or records.
>
> Also, it's great that the W3C is doing provenance work to make this
> easier (great in my opinion because it's a genuinely useful function
> for reasons other than license requirements).
>
> But to license your data (assuming it's enforceable) such that it's
> only convenient/non-burdensome to comply with the license using one
> particular not-even-yet-finished-being-invented technology, and not
> other technologies, is obviously a barrier/burden to use.
Again, I'm a bit confused. Are you thinking that linked data provenance
would be used in licensing OCLC data? I haven't heard anything that
would imply that.
Finished technology? What a concept!
kc
[1] http://www.w3.org/TR/prov-primer/
[2] http://www.oclc.org/us/en/data/attribution.html
[3] http://www.oclc.org/worldcat/recorduse/datalicensing/questions.htm
--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Tue Sep 11 2012 - 16:52:00 EDT