Re: Adding EAD to the 'layer of discovery'?

From: Custer, Mark <CUSTERM_at_nyob>
Date: Mon, 28 Dec 2009 11:02:27 -0500
To: NGC4LIB_at_LISTSERV.ND.EDU

First off, I'd like to thank both Tod and Derek for their very informative responses.  Right now, AquaBrowser and the TRLN Endeca-based system are still the only two live implementations that I know of that are searching full MARC records side-by-side with full EAD records.  I think that Blacklight is capable of this "out-of-the-box" as well, but I just don't know of any live sites that are indexing both.

In any event, I thought that I'd point out one more example, even though neither of the two OPACs represented therein incorporates EAD.  The resource that I'm referring to is the Hathi Trust Digital Library:
http://catalog.hathitrust.org/

Here, there are two different ways to search their digitized texts (though I imagine these two might be merged in the future):

1) "About" interface (search box on the far left), which searches the MARC records (this one is based on VuFind, I think)
2) "Within" interface (search box in the middle, labeled beta), which searches the full text from the OCR (this one is also based on Solr/Lucene, as VuFind is)

Regarding EAD documents, however, all of an EAD record (not just its MARC derivative) would still fall into the "about" camp (though some of it might need to be filtered out, such as information about its creator).  But I suspect it is only now, as "full-text" OPACs are starting to emerge, that other types of non-MARC metadata are starting to be merged in more thoroughly as well.

Mark Custer

-----Original Message-----
From: Next generation catalogs for libraries [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Derek Rodriguez
Sent: Wednesday, December 23, 2009 10:03 AM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] Adding EAD to the 'layer of discovery'?

Hi Mark,

    The TRLN libraries did recently complete the incorporation of EAD 
documents into our Endeca-based system.  Thanks for calling attention to 
this effort!  The work began as a pilot 
(<http://www.trln.org/endeca/task-groups/ead/index.htm>), and we took it 
into production in August.  Currently, we are harvesting over 6,400 EAD 
encoded finding aids nightly from Duke University Libraries, the Duke 
University Medical Center Archives, NC State University, and UNC Chapel 
Hill. We support search and display of EAD content in the consortium UI, 
Search TRLN, <http://search.trln.org>, and each of our member 
institutions' scoped Endeca interfaces. 

    For indexing purposes, we extract a handful of EAD fields using XSLT 
and merge them with content from collection level MARC records before 
indexing.  The <eadid/> is added to the MARC records to facilitate this 
match.  Since we define our own data model in Endeca, we can map the 
contents of each EAD element to an appropriate field.  The elements we 
index include <bioghist/>, <overview/>, and <scopecontent/>, as well as 
<unittitle/> at all levels of the <dsc/>.
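For readers who want to see the shape of this extract-and-merge step, here is a minimal sketch. TRLN did the extraction with XSLT; this swaps in Python's standard-library ElementTree instead, and it assumes the collection-level MARC records have already been loaded into a lookup keyed on <eadid/>. The names `extract_ead_fields`, `merge_with_marc`, and the `ead_` field prefix are hypothetical, not part of Endeca or TRLN's actual pipeline.

```python
import xml.etree.ElementTree as ET

# Elements named in the post; <unittitle> is collected only inside <dsc>.
DESCRIPTIVE_ELEMENTS = ("bioghist", "overview", "scopecontent")


def local_name(tag):
    """Drop a '{namespace}' prefix so EAD files with or without a namespace both work."""
    return tag.rsplit("}", 1)[-1]


def flatten_text(elem):
    """Join all text inside an element and collapse runs of whitespace."""
    return " ".join("".join(elem.itertext()).split())


def extract_ead_fields(ead_xml):
    """Pull the eadid plus a few indexable fields out of one EAD document."""
    root = ET.fromstring(ead_xml)
    fields = {"eadid": None, "unittitle": []}
    for name in DESCRIPTIVE_ELEMENTS:
        fields[name] = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "eadid":
            fields["eadid"] = flatten_text(elem)
        elif name in DESCRIPTIVE_ELEMENTS:
            text = flatten_text(elem)
            if text:
                fields[name].append(text)
        elif name == "dsc":
            # Component titles at every level of the container list.
            for child in elem.iter():
                if local_name(child.tag) == "unittitle":
                    fields["unittitle"].append(flatten_text(child))
    return fields


def merge_with_marc(marc_by_eadid, ead_fields):
    """Attach EAD fields to the collection-level MARC record sharing its eadid.

    marc_by_eadid is a hypothetical lookup: eadid -> dict of MARC-derived fields.
    """
    record = dict(marc_by_eadid.get(ead_fields["eadid"], {}))
    for name, values in ead_fields.items():
        if name != "eadid":
            record["ead_" + name] = values
    return record
```

Because the <eadid/> is the join key, the MARC record keeps supplying its own fields (title, subjects, format), and the EAD text rides along in separately named fields that can be weighted on their own.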

    We use some of the EAD-sourced fields stored in Endeca, such as 
<accessrestrict/>, <userestrict/>, and <prefercite/>, to support display.  
To display <abstract/>, <overview/>, <bioghist/>, and <dsc/> content, we 
retrieve the EAD documents on the fly, transform them with XSLT, and display 
them in tabs on our full record screen as shown in the "Ammons papers" 
example you provided.  These representations are intended to support 
discovery and so we still link out to the finding aid of record 
maintained by each archives department.

    The main benefit to users is discovery of archival materials 
alongside published materials and keyword indexing of specific EAD 
elements.  Your point about relevance ranking is a good one, since many 
of the elements in our EAD documents are significantly larger than most 
of the metadata records in our indexes, making it possible that records 
matching on these fields could appear at the top of most results lists.  
To counteract this, we refined the Endeca relevance rank settings to 
weight matches in these fields much lower than matches in other fields.  
For facets, we don't actually populate them with metadata from the EAD 
records; we just use the metadata in the collection-level MARC records 
for this.  Since our archives departments had been maintaining MARC 
records for these finding aids for several years, this did not represent 
a change in workflow.
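To make the down-weighting idea concrete, here is a minimal sketch of field-weighted scoring. It is not Endeca's relevance-ranking configuration (which isn't described here); it is a generic weighted term count in Python, and the field names and weights in `FIELD_WEIGHTS` are hypothetical values chosen only to illustrate the effect.

```python
# Hypothetical weights: MARC-derived fields count far more than the long EAD-sourced ones.
FIELD_WEIGHTS = {
    "title": 10.0,
    "subject": 5.0,
    "ead_bioghist": 0.5,
    "ead_scopecontent": 0.5,
    "ead_unittitle": 0.5,
}


def field_values(record, field):
    """Return a field's values as a list, whether stored as a string or a list."""
    value = record.get(field, [])
    return [value] if isinstance(value, str) else list(value)


def score(record, term):
    """Weighted count of case-insensitive term occurrences across fields."""
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        hits = sum(v.lower().count(term) for v in field_values(record, field))
        total += weight * hits
    return total


def rank(records, term):
    """Order records by descending weighted score."""
    return sorted(records, key=lambda r: score(r, term), reverse=True)
```

With equal weights, a finding aid whose biographical note mentions "automobile" eight times would outscore a book with "automobile" once in its title (8 vs. 1). With the weights above, the book scores 10.0 and the finding aid 4.0, so the title match ranks first while the EAD record still appears in the results.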

    You also mention advanced search options.  The goal of this project 
was to make EAD-sourced content available to all users in our standard 
interfaces.  So at this point, we have not implemented advanced search 
functionality specifically geared toward EAD content in this discovery 
layer.  That said, some of our libraries do provide an advanced search 
option for granular searches of their 'finding aids of record'.  A good 
example is that offered by NCSU 
<http://www.lib.ncsu.edu/findingaids/search/advanced>.

    I hope this is helpful.  Please let me know if you have questions.

    Derek

-- 
Derek Rodriguez
Program Officer
Triangle Research Libraries Network
CB# 3940, Wilson Library
Chapel Hill, NC 27514-8890
919-962-8022 fax:919-962-4452
derek_at_trln.org
http://www.trln.org

Custer, Mark wrote:
> I'm curious if anyone on the list has experience with adding their EAD documents into a larger discovery system?
>
> Here are two examples of what I mean:
>
> * Triangle Research Libraries Network now indexes (and displays) entire EAD documents.
>
> Example (in which I've restricted my results to "archival materials" and entered "ammons" as my keyword):
>
> http://search.trln.org/search?Nty=1&Ntk=Keyword&Ntt=ammons&N=200092
>
>
> * The University of Chicago Library's implementation of AquaBrowser seems to index entire EAD documents.
>
> Example (in which I've searched for "American Automobile Brief History", quotes included, and where the first 3 results returned should be for archival finding aids):
> http://lens.lib.uchicago.edu/?q=%22american%20automobile%20brief%20history%22
>
> So, this leads me to three questions in particular:
>
> 1. Can anyone point me to any other online examples of "discovery tools" that are ingesting entire EAD documents?  Summon, Encore, Primo, Blacklight, etc.?  (But, again, I'm not asking about OPACs that only search a surrogate of the EAD.)
>
> 2. For those of you who are including the entire EAD in your library's discovery tool, did you already have surrogate MARC records for those collections in your catalog?  If so, how are you dealing with those now that you're adding the EAD?
>
> 3. What do you think of the whole retrieval experience (advanced search options, facets, incorporation into the relevancy algorithm, etc.)?
>
> Thanks in advance for any and all advice and/or other examples that might be out there,
>
>
> Mark Custer