Re: User Privacy

From: Kevin M Kidd <kiddk_at_nyob>
Date: Wed, 21 May 2008 12:45:53 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

I would like to piggy-back on Edward's excellent comments on user privacy. This is a big soapbox issue for me, because I feel that we as a profession must begin to leverage this user/usage data right now.

First, I don't think I have ever read or heard a librarian say that he/she wants to abandon circulation confidentiality. What exactly does that mean? I would like to know who is advocating such a thing.

To simply say that leveraging any user and usage data is tantamount to publishing it for Big Brother to harvest smacks of an attempt to suppress a real debate - a debate that needs to happen right now.

1.) Why shouldn't a user be able to opt in to new services which make available data that libraries traditionally kept private? Are they complete idiots? Is this what we think?

2.) What is the difference between circ data that a user might want to share with other library users and a personal book list a LibraryThing user shares with the world (a list which is, I must say, public by default)?

3.) There *is* a difference between implicit recommendations built using statistical analysis (à la BibTip) and personalized, profile-based recommendations built by Amazon and Netflix. Again, the point is, we can build services without referring *at all* to the actions and/or preferences of individual, identifiable patrons. Why should we not do this? This is a no-brainer.

4.) The point of our profession is to provide the best service possible to our patrons. That is the number one goal - is it not? We talk about Web 2.0 and Library 2.0 as opportunities to improve service. But it is obvious that we treat this "new service model" as if it is just a grab-bag of individual toys we can use to improve our Web sites. But all of these new services we are scrambling to add to our arsenal are fundamentally about data - user and usage data. The effectiveness of such services, when de-coupled from the data that drives them, is greatly diminished.

While we have no right to use patron data in such a way that it can point back to an individual patron *without his or her consent*, it is clear that we can greatly improve service by developing innovative uses for the data we do collect every minute of every day.
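As a rough illustration of point 3, here is a minimal sketch in Python of "people who borrowed this, also borrowed..." counts built only from aggregate co-occurrence. Everything in it is an assumption made for illustration: the function name `co_borrow_recommendations`, the `(borrower_token, item_id)` input shape, and the default threshold of three distinct borrowers (the figure Tim Spalding suggests as a minimum lower in this thread). It is not a description of how BibTip or any library system actually works.

```python
from collections import defaultdict
from itertools import combinations


def co_borrow_recommendations(loans, min_borrowers=3):
    """Build item-to-item suggestions from (borrower_token, item_id) pairs.

    A pair of items is only reported if at least `min_borrowers` distinct
    borrowers took out both, so a single patron's history never shows up
    on its own. The threshold is a hypothetical default, not a standard.
    """
    # Collect the set of items per borrower; a set ignores repeat loans.
    items_by_borrower = defaultdict(set)
    for borrower, item in loans:
        items_by_borrower[borrower].add(item)

    # Count how many distinct borrowers share each unordered item pair.
    pair_counts = defaultdict(int)
    for items in items_by_borrower.values():
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1

    # Keep only pairs that meet the threshold; borrower tokens are dropped.
    recs = defaultdict(list)
    for (a, b), n in pair_counts.items():
        if n >= min_borrowers:
            recs[a].append((b, n))
            recs[b].append((a, n))

    # Most frequently co-borrowed items first, ties broken by item id.
    for item in recs:
        recs[item].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(recs)
```

The output contains only item identifiers and counts. Whether three is a high enough threshold is exactly the question debated further down the thread.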

--------------------------------------
Kevin M. Kidd, MA, MLIS
Library Applications & Systems Manager
Boston College Libraries
Phone: 617-552-1359
Fax: 617-552-1089
e-Mail: kevin.kidd_at_bc.edu
Blog: http://datadrivenlibrary.blogspot.com/

-----Original Message-----
From: Next generation catalogs for libraries [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Edward M. Corrado
Sent: Wednesday, May 21, 2008 12:03 PM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] User Privacy

There are certainly ways to put in safeguards to be able to make
constructive use of circulation and other data while protecting patron
privacy. There is a need for legal safeguards (such as those in place
with medical records) but there is also a responsibility of libraries to
try to "scrub" any personal data so individual people can't be tracked
down by it. Dr. Scott Nicholson has addressed various ways to do this in
his Bibliomining research (http://www.bibliomining.com/). That said, I
would argue that today many librarians typically do a better job talking
about how they don't keep this personal data than they do actually
making sure that they don't (you keep backups, right? how about
transaction logs, e-mail server logs, etc.?). Some libraries are better
at this than others but I bet many librarians would be shocked at how
much private/confidential information they have about some of their
patrons "lying around" if they ever performed an extensive audit.
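
To make the idea of "scrubbing" concrete, here is a minimal sketch in Python. It assumes a loan record is a dict with hypothetical fields `patron_id`, `item_id`, and `loan_date` (an ISO date string); none of these names come from a real library system. It replaces the patron ID with a keyed hash and keeps only the month of the loan. This is pseudonymization rather than full anonymization: anyone holding the key can recompute a patron's token.

```python
import hashlib
import hmac


def scrub_loan(record, key):
    """Return a copy of a loan record with direct identifiers removed.

    `record` is assumed to be a dict with 'patron_id', 'item_id' and
    'loan_date' (YYYY-MM-DD); any other fields (name, e-mail, ...) are
    dropped. `key` is a secret byte string held by the library.
    """
    # Keyed hash: the same patron always maps to the same token under one
    # key, but the token cannot be recomputed without that key.
    token = hmac.new(
        key, record["patron_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return {
        "patron_token": token,
        "item_id": record["item_id"],
        # Keep only year and month to make records harder to single out.
        "loan_month": record["loan_date"][:7],
    }
```

A key could be generated with `secrets.token_bytes(32)` from the standard library. The same scrubbing would have to be applied to backups and transaction logs for it to mean anything, which is the audit problem raised above.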

I'd also argue that this is not a new issue, or a generational issue
(I've seen discussion where more "experienced" librarians have
criticized new librarians who advocate using this data as not
understanding or respecting patron privacy to a large enough degree).
Studies from the 1980s and early 1990s have shown that librarians would
regularly give out the names and topics of mediated searches [1, 2] and
a 1993 study questioned the confidentiality of inter-library loan
records [3]. The concept of government agencies (or others) being
interested in circulation records is not a new issue in 2008, or even in
the 2000s. The FBI Library Awareness Program existed in the 1970s and
80s. There was also an earlier program in the 1940s, and after the 1968
Democratic National Convention, the FBI examined circulation records in
several public and academic libraries.

Libraries going forward need to figure out what value-added services
they can, and should, provide. No longer can libraries stay relevant just
by acquiring resources. The School of Science, for example, can pay for
ScienceDirect as easily as the library can. All they need is a purchase
order. One of the ways that libraries can add value to the institution
is by making use of this trove of information available to them. As
Estabrook [4] wrote in 1996, "in the name of one good--keeping patron
records confidential--we are sacrificing another: targeted and tailored
services to library users." The trick is how to do this in an
economically feasible, yet useful way while still providing an
acceptable level of privacy/confidentiality. There are a lot of great
possibilities waiting to be discovered in this area, which makes being
a librarian in this era quite exciting.

Edward

[1] Isbell, Mary K., and M. Kathleen Cook. 1986. Confidentiality of
online bibliographic searches: Attitudes and practices. RQ 25: 483-487.

[2] Wilkes, Adeline W., and Susan Marie Grant. 1995. Confidentiality
policies and procedures of the reference departments in Texas academic
libraries. RQ 34 (4): 473.

[3] Nolan, Christopher W. 1993. The confidentiality of interlibrary loan
records. Journal of Academic Librarianship 19 (2): 81.

[4] Estabrook, Leigh S. 1996. Sacred trust or competitive opportunity:
Using patron records. Library Journal 121 (2): 48.

Walt Crawford wrote:
> Jonathan, off-list:
>
> What a fine paragraph. The first sentence had me wary (because so many
> people use it as an excuse to weaken privacy policies), and then you
> immediately turn it around with professional responsibility. Great stuff.
> Thanks!
> -walt crawford-
>
> On Wed, May 21, 2008 at 7:42 AM, Jonathan Rochkind <rochkind_at_jhu.edu> wrote:
>
>
>> In general, I think we care about privacy more than the users do.  I
>> don't think this means we care about privacy too much; it is indeed our
>> responsibility to safeguard our users' privacy even when they don't
>> think about it. As with many things, it's our job to think about things
>> so they don't have to.
>>
>> I think there are certainly ways to use recommender data like this
>> without a privacy invasion, though; this stuff seems totally appropriate
>> to me.  But it is useful and important to go over various 'attack'
>> scenarios.
>>
>> In Tim's early example where a user is the only person to have checked
>> out two books, which would allow someone to figure out what books they
>> had checked out from recommender data---wouldn't this require the
>> attacker _knowing_ that they were the only person to check out those
>> books? How would they know that?
>>
>> Jonathan
>>
>> David Pattern wrote:
>>
>>
>>> Because we had a large amount of checkout data to start with (from memory,
>>> it was around 2 million transactions over a 10 year period), we went for a
>>> threshold of 7 or 8 data points (I'd need to double-check the code to find
>>> the exact figure).
>>>
>>> Our "people who borrowed this, also borrowed..." service has been live
>>> since Nov 2005 and has increasingly grown in popularity, getting up to 4000
>>> clicks per month.  Our users are also able to view their entire circ history
>>> from within their account page on the OPAC.
>>>
>>> Although I'd argue that we protect user privacy just as strongly in the UK
>>> as you do in the US, the UK's Data Protection Act allows for a more flexible
>>> framework for collecting user-generated data.  The bottom line is that data
>>> must not be used so that it identifies an individual and data must not be
>>> stored for longer than is necessary.  Once a student graduates, their
>>> borrower record is deleted, and that breaks the link between the circulation
>>> transactions and a specific individual.
>>>
>>> When we launched the service, I did expect we'd get a few queries from
>>> users (e.g. "what data is the library collecting?", "what does the library
>>> do with the data?", etc) but, to date, we've not received any.
>>>
>>> regards
>>> Dave Pattern
>>> University of Huddersfield
>>>
>>>
>>>
>>> ________________________________
>>>
>>> From: Next generation catalogs for libraries on behalf of Tim Spalding
>>> Sent: Wed 21/05/2008 03:26
>>> To: NGC4LIB_at_LISTSERV.ND.EDU
>>> Subject: Re: [NGC4LIB] User Privacy (was: [NGC4LIB] bibtip (How it works))
>>>
>>>
>>>
>>> What do you people think is the appropriate number of data points
>>> necessary to protect patron privacy in a recommendation system?
>>>
>>> One point would be a situation where, if only one user took out or
>>> looked at both Book A and Book B, the recommendation system would
>>> reveal this coincidence. I contend this would violate patron
>>> privacy--if you knew one book someone took out you could discover
>>> others. The logic of small numbers would undermine the idea of
>>> anonymity.
>>>
>>> I'm thinking you need at least three, and probably more. John Blyberg
>>> went for three or more in his SOPAC recommendations
>>> (http://www.blyberg.net/2007/01/31/dynamic-item-recommendations/). I'm
>>> not sure if that was for quality or privacy. That was based on opt-in
>>> data.
>>>
>>> Tim
>>>
>> --
>> Jonathan Rochkind
>> Digital Services Software Engineer
>> The Sheridan Libraries
>> Johns Hopkins University
>> 410.516.8886
>> rochkind (at) jhu.edu
>>
>>