I forgot that I had already submitted an enhancement request to the
Innovative Users Group to request a numeric data type. Today I updated it
with a pointer to this conversation. But of course, this type of new
direction in library catalogs is not likely to be selected via an
enhancement voting process among users of commercial systems.
Cindy Harper
On Thu, Jan 22, 2015 at 7:49 AM, Cindy Harper <cindyharper1145_at_gmail.com>
wrote:
> Eric, I also would like to see library catalogs and discovery systems
> permit the implementers to define their own variables, with at least a
> numerical data type in addition to the standard text type, then permit
> users to rank the results by that numerical variable, or to include
> numerical ranges in their queries.
>
> I envision web mining that collects, for instance, the number of mentions
> for a title in a given set of web pages, or the number of awards that book
> has won. I'm slowly working on an example of the former by collecting the
> websites associated with the print periodicals we subscribe to, and
> creating a Google custom search based on those. Of course, I can't, under
> Google CSE's terms of use, retain the results from automated searches of
> the CSE, but I'm thinking about this problem.
>
> My understanding of linked data is limited at best, but I think these kinds
> of datasets - the awards and reviews - would be the kind of data that could
> be shared and munged into the discovery system with linked data. Am I wrong?
>
> What is needed is the ability for the site to have user-defined numeric
> variables, with support for numeric operations on those variables in the
> discovery system.
>
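To make the request above concrete, here is a minimal sketch of site-defined numeric variables with range filtering and ranking. It is an illustration only, not how any existing catalog or discovery product works: the `Record` class, the `search` function, and the `awards` variable are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A catalog record with hypothetical site-defined numeric variables."""
    title: str
    numeric: dict = field(default_factory=dict)  # e.g. {"awards": 3}


def search(records, variable, minimum=None, maximum=None, descending=True):
    """Keep records whose numeric variable falls in [minimum, maximum],
    then rank the hits by that variable."""
    hits = []
    for record in records:
        value = record.numeric.get(variable)
        if value is None:
            continue  # the variable was never set for this record
        if minimum is not None and value < minimum:
            continue
        if maximum is not None and value > maximum:
            continue
        hits.append(record)
    return sorted(hits, key=lambda r: r.numeric[variable], reverse=descending)


# Hypothetical data: the number of awards each title has won.
records = [
    Record("A", {"awards": 3}),
    Record("B", {"awards": 0}),
    Record("C"),
    Record("D", {"awards": 7}),
]
ranked = [r.title for r in search(records, "awards", minimum=1)]
```

Here `ranked` holds the titles with at least one award, most-awarded first. Records that lack the variable are skipped rather than treated as zero, which is one possible design choice among several.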
> Cindy Harper
>
> On Mon, Jan 19, 2015 at 11:41 AM, Eric Lease Morgan <emorgan_at_nd.edu>
> wrote:
>
>> I still believe our library catalogs and “discovery” systems do not do
>> everything they can do. Specifically, I believe they can include more
>> quantitative data/information.
>>
>> With the advent of digitized materials (like the things found in the
>> HathiTrust, institutional repositories, journal article indexes/databases,
>> and “digital libraries”) it is possible to count and measure
>> characteristics of individual items and then have those measurements saved
>> in the surrogate index record. Some of the things include, but are not
>> limited to:
>>
>> * length of document in words
>> * 100 most frequently used words or ngrams (excluding stop words)
>> * 100 most frequently used parts-of-speech
>> * list of unique or infrequently used words
>> * 25 most statistically significant words or phrases
>> * a list of the frequent or statistically significant named entities
>>
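A few of the measurements listed above can be sketched in a handful of lines. This is a rough illustration rather than a definitive method: the `measure` function, its field names, and the tiny `STOP_WORDS` set are assumptions made for this example, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; a real index would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def measure(text, top_n=100):
    """Return simple measurements that could be saved in a surrogate index record."""
    # Naive tokenizer: lowercase runs of letters, allowing one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {
        "length_in_words": len(words),
        "top_words": [w for w, _ in counts.most_common(top_n)],
        "unique_words": sorted(w for w, c in counts.items() if c == 1),
    }


sample = "The whale and the sea. The whale swims in the sea of ice."
stats = measure(sample, top_n=2)
```

The resulting dictionary holds plain numbers and lists, so values such as `length_in_words` could be stored in numeric index fields and used for sorting or range queries.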
>> There are other measurements that could be taken, such as the likelihood
>> the material was written by a man or a woman, or the likelihood the document
>> corresponds to a particular genre. The reading level of the document could
>> be calculated and scaled against education levels. Specialized coefficients
>> can be modeled (such as a “great books” coefficient) and then applied to
>> each item to denote how it discusses the “great ideas”. [1]
>>
>> Given these sorts of things in the surrogate index records, it would be
>> possible for our catalogs and discovery systems to answer questions such as:
>>
>> * find me a short, easy-to-read philosophy book
>> * find me a thorough, college-level biology book
>> * find me a book that takes place in and around Paris and from a
>> woman’s point of view
>> * given this set of previously marked items, create a graph
>> illustrating their use of pronouns
>> * given this set of previously marked items, create a timeline
>> illustrating what takes place when
>> * given this set of previously marked items, create a world map
>> illustrating what takes place where
>>
>> With the advent of full text, our systems can go beyond find & discover
>> and towards use & understanding.
>>
>> [1] “great books” - http://bit.ly/1AC2aFd
>>
>> —
>> Eric Lease Morgan
>>
>
>
Received on Sun Jan 25 2015 - 16:49:51 EST