In general, I think we care about privacy more than the users do. I
don't think this means we care about privacy too much; it is indeed our
responsibility to safeguard our users' privacy even when they don't
think about it. As with many things, it's our job to think about these
issues so they don't have to.

I think there are certainly ways to use recommender data like this
without invading privacy, though; this stuff seems totally appropriate
to me. But it is useful and important to go over various 'attack'
scenarios.

In Tim's earlier example, a user is the only person to have checked
out two books, which would allow someone to figure out what books that
user had checked out from recommender data. But wouldn't this require
the attacker _knowing_ that the user was the only person to check out
those books? How would they know that?
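
To make the data-point threshold from the quoted thread below concrete, here is a minimal Python sketch. It assumes checkout data is available as (patron, book) pairs. The function name `recommendations`, the parameter `min_patrons`, and the sample data are all made up for illustration; this is not a description of how SOPAC or the Huddersfield service is actually implemented.

```python
from collections import defaultdict
from itertools import combinations


def recommendations(checkouts, min_patrons=3):
    """Map each book to the books borrowed alongside it, keeping only
    pairs shared by at least `min_patrons` distinct patrons."""
    # Collect the set of books per patron so repeat checkouts count once.
    books_by_patron = defaultdict(set)
    for patron, book in checkouts:
        books_by_patron[patron].add(book)

    # Count how many distinct patrons borrowed each pair of books.
    pair_counts = defaultdict(int)
    for books in books_by_patron.values():
        for a, b in combinations(sorted(books), 2):
            pair_counts[(a, b)] += 1

    # Suppress any pair below the threshold (hypothetical privacy floor).
    recs = defaultdict(list)
    for (a, b), n in pair_counts.items():
        if n >= min_patrons:
            recs[a].append(b)
            recs[b].append(a)
    return {book: sorted(others) for book, others in recs.items()}


# Made-up sample data: three patrons pair A with B, one pairs A with C.
sample = [
    ("p1", "A"), ("p1", "B"),
    ("p2", "A"), ("p2", "B"),
    ("p3", "A"), ("p3", "B"),
    ("p4", "A"), ("p4", "C"),
]
```

With `min_patrons=1`, looking up book A in this sample would surface book C on the strength of a single patron's record, which is the situation Tim describes; with a threshold of 3, that pair is simply never shown.
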
Jonathan
David Pattern wrote:
> Because we had a large amount of checkout data to start with (from memory, it was around 2 million transactions over a 10-year period), we went for a data point of 7 or 8 (I'd need to double-check the code to find the exact figure).
>
> Our "people who borrowed this, also borrowed..." service has been live since Nov 2005 and has increasingly grown in popularity, getting up to 4000 clicks per month. Our users are also able to view their entire circ history from within their account page on the OPAC.
>
> Although I'd argue that we protect user privacy just as strongly in the UK as you do in the US, the UK's Data Protection Act allows for a more flexible framework for collecting user-generated data. The bottom line is that data must not be used in a way that identifies an individual, and data must not be stored for longer than is necessary. Once a student graduates, their borrower record is deleted, and that breaks the link between the circulation transactions and a specific individual.
>
> When we launched the service, I did expect we'd get a few queries from users (e.g. "what data is the library collecting?", "what does the library do with the data?", etc) but, to date, we've not received any.
>
> regards
> Dave Pattern
> University of Huddersfield
>
> ________________________________
>
> From: Next generation catalogs for libraries on behalf of Tim Spalding
> Sent: Wed 21/05/2008 03:26
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: Re: [NGC4LIB] User Privacy (was: [NGC4LIB] bibtip (How it works))
>
> What do you people think is the appropriate number of data points
> necessary to protect patron privacy in a recommendation system?
>
> One point would be a situation where, if only one user took out or
> looked at both Book A and Book B, the recommendation system would
> reveal this coincidence. I contend this would violate patron
> privacy: if you knew one book someone took out, you could discover
> others. The logic of small numbers would undermine the idea of
> anonymity.
>
> I'm thinking you need at least three, and probably more. John Blyberg
> went for three or more in his SOPAC recommendations
> (http://www.blyberg.net/2007/01/31/dynamic-item-recommendations/). I'm
> not sure if that was for quality or privacy. That was based on opt-in
> data.
>
> Tim
>
--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
Received on Wed May 21 2008 - 09:20:38 EDT