Re: The problem with OPACs [was: New subject keyword search]

From: Ted P Gemberling <tgemberl_at_nyob>
Date: Mon, 30 Jul 2007 11:52:11 -0500
To: NGC4LIB_at_listserv.nd.edu
Michael and Ross,
I worked on Michael's catalog for a while last night. I found that of the
218 hits Ross retrieved for Spanish Civil War (without quotation marks),
about 63 (almost a third) are completely irrelevant. For example, books
on the Spanish-American War will often be retrieved because that war
came only about 30 years after the American Civil War, and many of the
people involved in the one were also involved in the other.

Also, some books that really are on the Spanish Civil War are treated as
"low relevance," on the last page of hits:

Franco, Franco, Franco
Soldiers of Salamis
Breaking point: Hemingway, Dos Passos, and the murder of Jose Robles
El archivo que Franco expolio de Cataluna (apologies for the lack of
diacritics on the Spanish)
Spain, Portugal, and the great powers, 1931-1941

Now, when I put quotes around Spanish Civil War on the basic search, I
get a tighter group of 101 items that really are on that war. But I know
to put quotes around the words only because Michael said so in his last
message. That is bibliographic instruction. Nothing on the initial
search screen indicates that.

By using Michael's advanced search screen and putting spain, civil war,
and 1936-1939 on separate lines (as subject keywords), I discover that
there actually are 203 titles in the catalog that have the subject
heading Spain--History--Civil War, 1936-1939 (including additional
subdivisions). So the basic search with Spanish Civil War in quotes
misses about half of them. Some of the missed titles seem like they
would be particularly important:

Agrarian reform and peasant revolution in Spain: origins of the Civil
War
Homage to Catalonia
The Revolution and the Civil War in Spain
Spain, the unfinished revolution
The Republic and the Civil War in Spain
France and Munich: before and after the surrender
Durruti, the people armed

It will also miss any books in Spanish or French, of which there are
quite a few:

Islam y Guerra Civil Espanola: moros con Franco y con la Republica
La Guerra civil Espanola
Los que perdimos
La guerre civile espagnole et la litterature francaise

From what I could see, it also misses almost all "primary sources," that
is, accounts written at the time by observers, apparently because the
term "Spanish Civil War" wasn't in common use at the time:

The yoke and the arrows: a report on Spain
Civil war in Spain (1938)
Correspondent in Spain (1938)
Men in battle: a story of Americans in Spain
Defense of Madrid (1937)

Now, actually, I have to admit there are a few things that will be
retrieved by the basic search and not by the subject search. I believe I
found one or two titles that had "Spanish Civil War" somewhere in a
contents note but not in the subject headings. For example, one had the
subject heading Literature--History and had Spanish Civil War in a
contents note, along with many other things. But since that is not a
prominent subject in that book, it probably wouldn't be one of the most
important books for someone to find at the beginning of their research.
But I wouldn't want
to say that keyword searches are useless. They're a very helpful tool
when we're looking for something that may have "fallen through the
cracks." Online catalogs are a big advance over card catalogs.
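
The counts reported in this message can be put side by side with a few lines of arithmetic. This is only a sketch: the numbers are the ones quoted above, the variable names are made up for illustration, and the last ratio is a rough comparison of set sizes, since a few quoted-phrase hits are not in the subject-heading set at all.

```python
# Counts as reported in this message; nothing here is measured independently.
unquoted_hits = 218           # basic search, Spanish Civil War without quotes
unquoted_irrelevant = 63      # hits judged completely irrelevant (approximate)
quoted_hits = 101             # basic search with the phrase in quotes
subject_heading_titles = 203  # titles under Spain--History--Civil War, 1936-1939


def share(part: int, whole: int) -> float:
    """Return part/whole as a fraction rounded to three decimal places."""
    return round(part / whole, 3)


# Share of the unquoted result set that was irrelevant, and its complement.
irrelevant_share = share(unquoted_irrelevant, unquoted_hits)
relevant_share = share(unquoted_hits - unquoted_irrelevant, unquoted_hits)

# Size of the quoted-phrase result set relative to the subject-heading set.
# This is a size comparison, not true recall, because the two sets only
# partly overlap.
quoted_vs_subject = share(quoted_hits, subject_heading_titles)

print(f"irrelevant share of unquoted hits: {irrelevant_share}")
print(f"relevant share of unquoted hits:   {relevant_share}")
print(f"quoted hits / subject titles:      {quoted_vs_subject}")
```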

Michael, you said that Voyager doesn't allow options other than what
your catalog gives. But compare these two Voyager catalogs. Wichita
State University, where I used to work:

http://libcat.wichita.edu/cgi-bin/Pwebrecon.cgi?DB=local&PAGE=hbSearch

or the University of Alabama at Birmingham's Sterne Library:

http://www.mhsl.uab.edu/cgi-bin/Pwebrecon.cgi?DB=local&PAGE=First

They both provide a "basic search" that includes not just keywords but
alphabetical searches, and a "guided" or "advanced" search equivalent to
your "advanced search." I don't think your catalog is using its
available metadata very efficiently.

Ted Gemberling
UAB Lister Hill Library
(205)934-2461
Not an official statement of the UAB Lister Hill Library

-----Original Message-----
From: Next generation catalogs for libraries
[mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Doran, Michael D
Sent: Saturday, July 28, 2007 3:13 PM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] The problem with OPACs [was: New subject keyword
search]


> Your catalog seems to be entirely keyword searches of
> various kinds (basic and "advanced").

With the Voyager ILS we essentially have a choice of making the
"Advanced Search" a guided boolean keyword search, or a combination of
left-anchored/keyword search/browses.  Personally, I would prefer the
combination of left-anchored search/browses, but the consensus in our
library is currently leaning towards the Advanced Search that users see.


> So when I search for "Civil war" in a basic search, I get 4701 hits,
> mostly on the American Civil War, but also Julius Caesar's Civil War,
> Karl Marx's Civil war in France, books on the Spanish Civil War,
> a Chinese civil war, and others. A lot of stuff to sort through.

Our basic search is all about expected search behavior.  And that's
exactly the type of results most people would expect.  I.e. they don't
expect the search engine to be a mind reader and know that what they
were really looking for was (for example) the Spanish Civil War.
Typically what you, or I, or anybody else would do at that point after
seeing all the different results, would be to redo a search on "spanish
civil war" [1].  We do that *all* the time in
Google/Yahoo/Amazon/whatever when we get too many hits that aren't
relevant to what we are looking for.

The important thing (see below) is that the user *got* results, and
those results gave them the information they needed to refine their
search.  Note: My *preference* would have been an OPAC search interface
with facets (ala VUFind/Primo/et al.) that would have allowed a user a
one-click way to narrow the search to Spanish Civil War, but that's not
an option with Voyager.

> Let's say I use your advanced search and type in Spanish civil war,
> as a phrase, in subjects. I get nothing.

If the intention was to make *my* point, then that has been
accomplished.  The exact search (a phrase search for "Spanish civil war"
[1]) that returned 120 relevance ranked hits in the default basic search
interface *fails* in the advanced subject search, precisely because the
user doesn't know the secret handshake of Library of Congress Subject
Heading syntax.

> If I had typed Civil war, "as a phrase," and Spain,
> "all of these words," both in subjects, I would get
> 201 items. But how many users would know you have to
> use Spain on a separate line instead of Spanish civil war?

Again, this is making my exact point!  I whole-heartedly agree that a
user shouldn't *have* to know that.  The whole purpose of the UT
Arlington default OPAC search is that they don't need to know that
kind of stuff.

> Compare that with a "topic or genre/form search" for Spanish
> civil war on the UCLA archive.

Per my original response, explain to me again how the user knows (sans
bibliographic instruction) that he/she needs/wants to use a "topic or
genre/form search" (out of the twelve search types) when searching for
items on the Spanish civil war?

> There you get a display of headings that show subject
> relationships.

Librarians are *keenly* interested in subject relationships.  Most of
our users are not -- they want 'stuff' (preferably online in full text).
Our task is to make the subject relationships work IN THE BACKGROUND
(and transparently to the user) to return the results that he/she needs.
That's one of the great things about the faceting model -- do a keyword
search, get *hits* (like most people expect) but have the subject facets
available off to the side to use, or not, for refining the search.
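
A minimal sketch of that faceting model, assuming a toy in-memory catalog: a keyword search returns hits first, and the subject headings attached to those hits are counted so they can be offered as optional refinements. The records, titles, and function names are hypothetical, and this is not how Voyager, VUFind, or Primo implement it.

```python
from collections import Counter

# Hypothetical records: invented titles, illustrative LCSH-style subject strings.
RECORDS = [
    {"title": "Sample memoir of the civil war in Spain",
     "subjects": ["Spain--History--Civil War, 1936-1939--Personal narratives"]},
    {"title": "Sample study of air power, civil war",
     "subjects": ["Spain--History--Civil War, 1936-1939--Aerial operations"]},
    {"title": "Sample regimental history",
     "subjects": ["United States--History--Civil War, 1861-1865--Campaigns"]},
    {"title": "Sample survey of air warfare",
     "subjects": ["Spain--History--Civil War, 1936-1939--Aerial operations",
                  "World War, 1939-1945--Aerial operations"]},
]


def keyword_search(records, query):
    """Keyword-anywhere search: every query word must occur in title or subjects.

    Substring matching is crude (for example "war" also matches "warfare");
    it is only meant to keep the sketch short.
    """
    terms = query.lower().split()
    hits = []
    for record in records:
        text = " ".join([record["title"], *record["subjects"]]).lower()
        if all(term in text for term in terms):
            hits.append(record)
    return hits


def subject_facets(hits):
    """Count how many hits carry each subject heading."""
    return Counter(s for record in hits for s in record["subjects"])


def narrow(hits, heading):
    """One-click refinement: keep only hits that carry the chosen heading."""
    return [record for record in hits if heading in record["subjects"]]


hits = keyword_search(RECORDS, "civil war")
facets = subject_facets(hits)
for heading, count in facets.most_common():
    print(count, heading)
```

The user never has to type a heading. The headings appear beside the hits, and picking one calls something like narrow(hits, heading).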

> Admittedly, the vocabulary takes some getting used to.

You think?  Does that have usability repercussions?

> If you search directly for "Spanish Civil War," it will appear
> on a browse screen,

If the user was wanting headings, rather than hits, I suppose that could
be considered a successful search.

> ... but you have to click on the "more info" button, and it'll
> take you to the established heading, Spain--History--Civil
> War, 1936-1939.

And what leads us to believe that users would even click on the "more
info" button?  Even if they do, they get *more* headings, but still no
hits.

> But clicking on that will provide a whole array of headings,
> some with subdivisions like aerial operations, artillery operations,
> campaigns (subdivided by place), casualties, evacuation of civilians,
> and many more.

And even *more* headings -- the average user is probably ready to slit
their wrists at this point.

> You don't have to examine hundreds of titles to find the
> specific ones you need.

When did our user discover that he/she was really looking for (for
example) artillery operations in the Spanish civil war?  If
"spanish civil war artillery" is searched in our basic search, it will
pull up a small number of relevant items (if we have any, natch) since a
keyword anywhere search searches subject headings (and we can
field-weight subject headings high if we choose).  AND it will pull up
items that have artillery in the title or table of contents, or notes,
or other field, but for whatever reason, were not assigned that LC
subject heading subdivision.  The basic keyword search with relevance
ranked results not only seems a lot easier, it also fits the model that
users are most familiar with.
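
Field weighting of this kind can be sketched as follows. The weights, field names, and records are assumptions made up for illustration, and Voyager's actual relevance algorithm is not described in this thread. The sketch requires every query word to occur somewhere in the record, then counts a match in the subject field for more than one in the title or contents note.

```python
import re

# Hypothetical weights: subject headings count most, then title, then contents.
FIELD_WEIGHTS = {"subjects": 3.0, "title": 2.0, "contents": 1.0}

# Hypothetical records with invented titles.
RECORDS = [
    {"id": "A", "title": "Sample study of the Spanish war",
     "subjects": "Spain--History--Civil War, 1936-1939--Artillery operations"},
    {"id": "B", "title": "Artillery in the Spanish civil war",
     "subjects": "Military history"},
    {"id": "C", "title": "Sample essays", "subjects": "Literature--History",
     "contents": "Spanish civil war artillery; other topics"},
    {"id": "D", "title": "American Civil War artillery",
     "subjects": "United States--History--Civil War, 1861-1865"},
]


def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(record, query):
    """Return a weighted score, or 0.0 if any query word is missing everywhere."""
    terms = tokens(query)
    field_tokens = {f: set(tokens(record.get(f, ""))) for f in FIELD_WEIGHTS}
    everywhere = set().union(*field_tokens.values())
    if not all(term in everywhere for term in terms):
        return 0.0
    # Each field contributes its weight once per query word found in it.
    return sum(weight * sum(term in field_tokens[f] for term in terms)
               for f, weight in FIELD_WEIGHTS.items())


def ranked(records, query):
    """Return matching records, highest score first."""
    scored = [(score(r, query), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for s, r in scored if s > 0]


for record in ranked(RECORDS, "spanish civil war artillery"):
    print(record["id"], score(record, "spanish civil war artillery"))
```

Record A ranks first on the strength of its subject heading, while B and C are still retrieved through the title and the contents note. Note that A is found only because "Spanish" happens to appear in its title: the heading itself says "Spain," which is the gap Ted describes.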

> Which system is more "user friendly" when you
> factor in that?

I'll leave that for our list readers to decide for themselves.

-- Michael

[1] Without quotes, a UT Arlington Library OPAC basic search on "Spanish
civil war" gets 228 hits; within quotes, "Spanish civil war" gets 120
hits. (see http://pulse.uta.edu/)

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 cell
# doran_at_uta.edu
# http://rocky.uta.edu/doran/


________________________________

From: Next generation catalogs for libraries on behalf of Ted P
Gemberling
Sent: Fri 7/27/2007 6:19 PM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] The problem with OPACs [was: New subject keyword
search]



Michael,
Thanks for the response.

I realize that the UCLA search page is not as "user friendly" as it
could be. Frankly, I don't know what some of the searches are. I
wondered for a while why there was no author search, and just now
realized that "credits search" retrieves people's names in direct order,
if they are in 5XX notes.
"Credit variants" only picks up names that are in authority records. So
it will pick up Woody Allen (even in direct order) but not Naomi Watts,
since there's no authority in their database for her. You can get hits
for her with the credits search.

Keep in mind, though, that this is a specialized database for people
studying films. So they will probably get bibliographic instruction for
using it. It's not UCLA's general catalog.

Also, it wouldn't be very hard to present the "Topic or genre/form
search" as "subject keyword search (browsable arrays)" alongside
"subject keyword search (titles)." I'm not an expert on labels, so I
don't know how good those are, but it does seem to me that this would be
a relatively easy thing to solve by labeling.

Your catalog seems to be entirely keyword searches of various kinds
(basic and "advanced"). So when I search for "Civil war" in a basic
search, I get 4701 hits, mostly on the American Civil War, but also
Julius Caesar's Civil War, Karl Marx's Civil war in France, books on the
Spanish Civil War, a Chinese civil war, and others. A lot of stuff to
sort through. If I do civil war in an advanced search, looking for words
in subjects, I get 3001 hits.

Let's say I use your advanced search and type in Spanish civil war, as a
phrase, in subjects. I get nothing. If I had typed Civil war, "as a
phrase," and Spain, "all of these words," both in subjects, I would get
201 items. But how many users would know you have to use Spain on a
separate line instead of Spanish civil war?

Compare that with a "topic or genre/form search" for Spanish civil war
on the UCLA archive. There you get a display of headings that show
subject relationships. Admittedly, the vocabulary takes some getting
used to. If you search directly for "Spanish Civil War," it will appear
on a browse screen, but you have to click on the "more info" button, and
it'll take you to the established heading, Spain--History--Civil War,
1936-1939. But clicking on that will provide a whole array of headings,
some with subdivisions like aerial operations, artillery operations,
campaigns (subdivided by place), casualties, evacuation of civilians,
and many more. You don't have to examine hundreds of titles to find the
specific ones you need. Which system is more "user friendly" when you
factor in that?
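
The browse behavior described here can be sketched roughly as follows, assuming a tiny hypothetical authority file: a see reference maps the user's term to the established heading, and a left-anchored match then lists that heading together with its subdivisions. The data structures and function names are illustrative, and this is not how the UCLA catalog or Voyager is implemented.

```python
# Hypothetical see reference: the term a user types, mapped to the established heading.
SEE_REFERENCES = {
    "spanish civil war": "Spain--History--Civil War, 1936-1939",
}

# Hypothetical slice of a heading index, kept in alphabetical order.
HEADINGS = sorted([
    "Spain--History--Civil War, 1936-1939",
    "Spain--History--Civil War, 1936-1939--Aerial operations",
    "Spain--History--Civil War, 1936-1939--Artillery operations",
    "Spain--History--Civil War, 1936-1939--Campaigns",
    "Spain--History--Civil War, 1936-1939--Casualties",
    "World War, 1939-1945--Campaigns",
])


def resolve(term):
    """Follow a see reference if there is one, else return the term unchanged."""
    cleaned = term.strip()
    return SEE_REFERENCES.get(cleaned.lower(), cleaned)


def browse(term):
    """Left-anchored browse: the established heading plus all its subdivisions.

    The prefix match is case-sensitive, which is a simplification.
    """
    established = resolve(term)
    return [h for h in HEADINGS if h.startswith(established)]


for heading in browse("Spanish Civil War"):
    print(heading)
```

The result is a list of headings rather than titles, which is the point on which the two of us disagree. Each heading would still need to link on to its titles.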

I'm puzzled as to why you don't provide any alphabetical (heading)
searches on your catalog when Voyager provides that possibility. They
are not options at all, as near as I can see.

Ted Gemberling
UAB Lister Hill Library
(205)934-2461

-----Original Message-----
From: Next generation catalogs for libraries
[mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Doran, Michael D
Sent: Thursday, July 26, 2007 3:48 PM
To: NGC4LIB_at_LISTSERV.ND.EDU
Subject: Re: [NGC4LIB] The problem with OPACs [was: New subject keyword
search]

> I'm sorry for my outburst about "libraries not being
> educational institutions." That is caricaturing what you
> said.

No offense taken.

> But have you tried the search interface Yee sent?
> http://cinema.library.ucla.edu/

Yes, I have.

Now, I'm going to put my "usability" hat on and ask y'all to do the
same.  Compare the UCLA Film and Television Archive default OPAC search
interface (http://cinema.library.ucla.edu/) to our University of Texas
at Arlington Library's default OPAC search interface
(http://pulse.uta.edu/).  They are both the same Voyager ILS, so can
stand comparison as to the search interface itself (if not the data).

Look at the search tab labels.  Compare "Recommended Searches" and
"Complex boolean keyword and cross index searching with index
specification" to "Search" and "Advanced Search."  Which do you think
most users would intuitively understand and know when to use one and
when to use the other?  Usability tells us that we don't want the user
to be expending all their time figuring out how to use the *tool*;
if designed right, they should be able to (figuratively) just pick up
the tool and start using it.

On the UCLA "Recommended Searches" search tab, a user has to decide what
type of search to do (among twelve possibilities) and the reason that
interface is loaded with search tips/help, is that for most of the
choices, the user has to know one of the secret handshakes, vs. the UTA
"Search" tab where no search type choice is necessary.

In the UTA "Search" tab search:
        words within quotation marks are treated as a phrase
        words outside of quotation marks are automatically boolean ANDed
        the search is a keyword everywhere search [1]
        the results are relevance ranked
        either an asterisk ("*") or a question mark ("?") can be used
        for truncation

Author names can be searched last name first, or first name first.
Titles can be searched with, or without, a beginning article.  ISBNs can
be searched.  Subject headings can be searched.  Same box, same search
button.  Pretty much any search will pull up relevant hits.  No secret
handshakes required.
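
The search rules listed above can be sketched in a few lines. This is an illustrative approximation, not the Voyager implementation: quoted strings are treated as phrases, the remaining words are ANDed in any order, and both "*" and "?" are treated as truncation. Relevance ranking is left out, and the function names are made up.

```python
import re


def parse_query(query):
    """Split a query into quoted phrases and loose words, all lowercased."""
    phrases = re.findall(r'"([^"]+)"', query)
    rest = re.sub(r'"[^"]*"', " ", query)
    return [p.lower() for p in phrases], [w.lower() for w in rest.split()]


def word_pattern(word):
    """Build a regex for one word, treating * and ? as truncation marks."""
    parts = re.split(r"[*?]", word)
    return r"\b" + r"\w*".join(re.escape(p) for p in parts) + r"\b"


def phrase_pattern(phrase):
    """Build a regex that matches the words of a phrase in order."""
    return r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"


def matches(record_text, query):
    """True if every phrase and every loose word occurs in the record text."""
    phrases, words = parse_query(query)
    text = record_text.lower()
    patterns = [phrase_pattern(p) for p in phrases]
    patterns += [word_pattern(w) for w in words]
    return all(re.search(pattern, text) for pattern in patterns)


print(matches("The Spanish Civil War: a history", '"spanish civil war"'))
print(matches("Civil war in Spain", '"spanish civil war"'))
print(matches("Civil war in Spain", "spain civil war"))
print(matches("Libraries and catalogs", "librar* catalog?"))
```

Because loose words are matched in any order, "shakespeare hamlet" and "hamlet shakespeare" behave the same way, which is also why a name can be entered last name first or first name first.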

If that behavior sounds Google-like, it's because that was the
intention.  <important>NOT because Google is the be-all and end-all of
search functionality, but because Google SETS THE USER EXPECTATIONS
ABOUT HOW A SEARCH INTERFACE SHOULD WORK.</important>  When a search
interface behaves like most users expect it to behave, you don't need to
put all the search tips that explain why they got a "No hits" response
when they searched on "Mark Twain" or "The Color Purple".

Does this approach mean we give up some search precision?  Yes it does.
Are most users already acclimatized to that with the other search
interfaces they use?  Yes they are.

I would argue that there are also 'discovery' benefits with this search
approach: A user searching on "hamlet shakespeare" will not only find
hits for the play, they will also find works *about* the play.

Are the more precise search types important?  <important>Yes they are --
I'm just saying, move them to the advanced search and for the default
search give users a tool that they can pick right up and start
using.</important>

> The "topic or genre/form" search option provides a very
> user-friendly way for people to get familiar with controlled
> vocabulary.

User-friendly?  How are they going to know to pick that search type out
of the twelve search types listed?  Realistically, how many people do
you think are actually going to choose that search?  If you snatched a
student off of the quad and asked "In the context of an online search of
a Film and Television Archive, what do you think a 'topic or genre/form'
search is?  What do you think is being searched and what type of results
would you expect?"  What kind of answer (if any) do you think you would
get?  Just choosing a search type is going to leave a lot of users
scratching their heads: "What's a credit variance search?"  "I kind of
know what a call number is, but what's an Inventory number and why and
when would I search that?"  "How about a 'pre-existing works' search?"
"What's a holdings search?"  "Or a SPAC search?"  What does xref mean to
the *average* user?

Most users would be saying to themselves, "Why can't I just enter the
search terms I want and hit the search button?"  The same way that they
do in many, if not most, of the search interfaces outside of
library-land.

I'm sorry, but the UCLA Film Archive OPAC is a user interface designed
for librarians, not for end users.  Now that may be what UCLA wants and
needs, but I'm not buying that it is user-friendly.  I would put all
that stuff in an advanced search.

Please re-read the <important></important> parts before flaming!  :-)

-- Michael

[1] In the Voyager ILS, we have some control over relevancy ranking with
field weighting

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# doran_at_uta.edu
# http://rocky.uta.edu/doran/


> -----Original Message-----
> From: Next generation catalogs for libraries
> [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Ted P Gemberling
> Sent: Thursday, July 26, 2007 12:22 PM
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: Re: [NGC4LIB] The problem with OPACs [was: New
> subject keyword search]
>
> Michael,
> I'm sorry for my outburst about "libraries not being
> educational institutions." That is caricaturing what you
> said. But have you tried the search interface Yee sent?
> http://cinema.library.ucla.edu/
>
> The "topic or genre/form" search option provides a very
> user-friendly way for people to get familiar with controlled
> vocabulary. As I pointed out to someone on Autocat, if you
> search for world war battles or battles world war, you get
> these terms:
>
> World War, 1914-1918--Battles, sieges, etc.
> World War, 1939-1945--Battles, sieges, etc.
>
> Those are all see references (terms on authority records that
> are not the established forms). Clicking on the "more info"
> buttons by them, these established headings came up:
> World War, 1914-1918--Aerial operations.
> World War, 1914-1918--Campaigns.
> World War, 1939-1945--Aerial operations.
> World War, 1939-1945--Campaigns.
> World War, 1939-1945--Naval operations.
>
> When you click on the first of those, it is further expanded to:
> World War, 1914-1918--Aerial operations.
> World War, 1914-1918--Aerial operations, American--Drama.
> World War, 1914-1918--Aerial operations, British--Drama.
> World War, 1914-1918--Aerial operations--Caricatures and cartoons.
> World War, 1914-1918--Aerial operations--Drama.
> World War, 1914-1918--Aerial operations, German--Drama.
>
> They take you directly to the titles.
>
> It seems the problem with the position you are advocating is
> that you're missing the distinction between controlled and
> uncontrolled vocabulary.
> If I do a keyword search for world war battles from that same
> search page, I get 18 hits. 18 titles. But there is no
> indication of how the various hits relate to each other as
> subjects. A person has to laboriously go through each one,
> and probably a significant number will not be what she's looking for.
>
> What you said about standardization of design is well taken.
> I don't own a car but rent them occasionally, and it is
> annoying when I can't figure out where the lever for opening
> the gas cap and such like are. But the acquisition of
> knowledge is, I think, a more complex process than driving a
> car. Driving a car is expected to be habitual: once you learn
> the task, it's supposed to be something that requires little
> or no thought. Research isn't that kind of thing. The
> question is, do we want to transfer the effort and expense of
> organizing information entirely to the users, or do we want
> to continue to do some of it for them?
>
> I imagine Selden is right about the transaction logs in her
> system. But more research needs to be done on the research
> behavior and needs of scholarly users. Those are the people
> we want our freshmen to grow into, and if we remove tools
> they need, it may seriously impair our ability as a society
> to produce high-quality scholarship.
>
> Ted Gemberling
> UAB Lister Hill Library
> (205)934-2461
>
>
>
> -----Original Message-----
> From: Next generation catalogs for libraries
> [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Doran, Michael D
> Sent: Thursday, July 26, 2007 9:32 AM
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: [NGC4LIB] The problem with OPACs [was: New subject
> keyword search]
>
> >  Selden Deemer wrote:
> >
> > Whatever the considerable benefits of browse displays (I read, and
> > took to heart Thomas Mann's comments), the fact remains that, when I
> > look at our search log stats, users (as opposed to librarians) simply
> > do NOT browse (and it's not for lack of instruction).
>
> I'm convinced that the underlying "problem" with our OPACs
> (from a usability perspective) is that they are sold once to
> librarians, rather than many times to end users.  If each
> user was making an individual purchase decision, OPACs would
> have quickly evolved to meet their needs.
> I believe ILS vendors (who we often unfairly blame) are quite
> capable of producing an awesome OPAC.  But the vendors are
> building OPACs to meet our (i.e. librarians') perceived needs,
> because vendors are smart and are in business to make money
> and they understand that *we* are the ones writing that big
> check every 10-15 years or so.  As Selden points out, OPAC
> features that are important/essential to us, are often ones
> that our users could care less about, despite all our
> well-meaning instruction.
>
> And that is assuming that OPAC functionality/usability is
> even a prime consideration in the purchase decision of an
> ILS.  Very often that's not the case, as acquisitions,
> cataloging, or circulation module features drive the decision
> and the OPAC is an afterthought.  If we want to find out
> who's responsible for sucky OPACs, the first place we need to
> look is in the mirror [1].
>
> On the bright side, products like VUFind, Primo, AquaBrowser,
> and Endeca unbundle the OPAC from the ILS, giving us a chance
> to atone for past ILS purchase decisions (which can't easily
> be undone).  One of the problems inherent in an ILS-bundled
> OPAC is that the 10-15 year (give or take) ILS replacement
> cycle does not allow for significant changes to what quickly
> becomes a calcified code base.  I'm particularly excited
> about Andrew Nagy's recently released open-source OPAC; with
> VUFind, the library-land development community has a golden
> opportunity to craft an OPAC that genuinely meets our users'
> needs.  However, doing so will require that we resist the
> temptation to create the ideal OPAC for *librarians*, and
> instead focus on creating an OPAC that meets our
> *users'* search needs.  I think that would be an OPAC that
> doesn't require instruction (however well-meaning) or require
> an initial search page that is 80% search tips.
>
> Just my opinion...
>
> -- Michael
>
> [1] Karen Schneider asks: "But the interesting questions are:
> Why don't online catalog vendors offer true search in the
> first place? and Why [don't we] demand it? Save the time of
> the reader!"  I would answer that vendors don't offer it, and
> we don't demand it, because the ILS (OPAC) check-writers have
> other priorities.
> See: Karen Schneider, How OPACs Suck, Part 1
> http://www.techsource.ala.org/blog/2006/03/how-opacs-suck-part-1-relevance-rank-or-the-lack-of-it.html
>
> # Michael Doran, Systems Librarian
> # University of Texas at Arlington
> # 817-272-5326 office
> # 817-688-1926 mobile
> # doran_at_uta.edu
> # http://rocky.uta.edu/doran/
>
Received on Mon Jul 30 2007 - 10:40:55 EDT