Owen,
First, thanks for the nice summary of the differences between Rob and myself. But I would like to take issue with the wwi search: when I search "wwi" I get quite different results from searching "great war," and if I search "great european war" I get different results again. What I meant is that searching "wwi" will get you primary documents only *if* someone has tagged them that way. So searching "wwi primary" works only because someone somewhere linked to the BYU page with the text "wwi primary". But this is the point: it's based on text. Also, what are we missing? Will we get the same results with "great war primary" and "great european war primary"? No.
In all, I think it is difficult to call it a concept search when three searches that all aim at the same concept each retrieve a different result. It seems that, by definition, a concept search should retrieve the same results for all three.
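To make that definition concrete, here is a minimal sketch in Python of the difference I have in mind. Everything in it is an assumption made up for illustration: the tiny AUTHORITY table, the three toy documents and the function names are hypothetical, and it is not how any real catalog or search engine is implemented. A text search matches only the string that was typed; a concept search first maps the variant to one heading and retrieves on that heading.

```python
# Hypothetical authority table: every variant form points to one heading.
AUTHORITY = {
    "wwi": "World War, 1914-1918",
    "great war": "World War, 1914-1918",
    "great european war": "World War, 1914-1918",
}

# Toy collection (made up): each document has its own text plus a
# heading assigned by a cataloger.
DOCUMENTS = [
    {"id": 1, "text": "letters home during the great war",
     "heading": "World War, 1914-1918"},
    {"id": 2, "text": "wwi timeline and overview",
     "heading": "World War, 1914-1918"},
    {"id": 3, "text": "dispatches on the great european war",
     "heading": "World War, 1914-1918"},
]


def text_search(query):
    """Return ids of documents whose own text contains the query string."""
    q = query.lower()
    return sorted(d["id"] for d in DOCUMENTS if q in d["text"])


def concept_search(query):
    """Map the query to its heading, then return every document under it."""
    heading = AUTHORITY.get(query.lower())
    if heading is None:
        return []
    return sorted(d["id"] for d in DOCUMENTS if d["heading"] == heading)
```

In this toy collection, text_search gives a different document for each of the three phrasings, while concept_search gives the same three documents whichever phrasing is used.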
Additionally, if people point to the wikipedia page about Dostoyevsky using "dostoievski," that may work for the wikipedia page, but it still doesn't mean that by searching dostoievski you will find what is available in Google about the Russian author.
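A rough sketch of this, again in Python and again under loud assumptions: the LINKS list, the page names and the two functions are hypothetical, and this is only a toy index keyed on link text, not a description of what Google actually does. It shows that a variant spelling reaches only those pages that somebody happened to link to with that spelling.

```python
from collections import defaultdict

# Hypothetical (link text, target page) pairs; the pages are made up.
LINKS = [
    ("Dostoyevsky", "wikipedia/Dostoyevsky"),
    ("Dostoievski", "wikipedia/Dostoyevsky"),
    ("Dostoyevsky", "example.org/crime-and-punishment"),
    ("Dostoyevsky", "example.org/dostoyevsky-letters"),
]


def build_anchor_index(links):
    """Index each target page under the exact text used to link to it."""
    index = defaultdict(set)
    for text, target in links:
        index[text.lower()].add(target)
    return index


def search(index, query):
    """Return the pages that were linked to with exactly this text."""
    return sorted(index.get(query.lower(), set()))
```

With this data, searching the index for "dostoievski" returns only the wikipedia page, while "dostoyevsky" returns all three pages. The other two pages are about the same author, but nobody linked to them under the variant spelling, so the variant cannot find them.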
Could the information in the authority files be better exploited? Absolutely.
James Weinheimer
> I'm slightly reluctant to get involved with this, as I think this is well
> trodden ground on the list, but it is Friday, and it's almost lunch time, so..
>
> I think the point that Rob was making is that Google does more than just use a
> link as a 'vote' - it also uses the link to infer information about the thing
> being linked to. So, if I link to the Wikipedia page on Dostoyevsky using the
> text "here is some useful information on Dostoievski", then this
> would mean the Wikipedia page would start to appear in Google results under
> searches for Dostoievski as well as Dostoyevsky.
>
> By exploiting the 'network effect' Google starts to build up 'concepts' as
> opposed to just text, as each web page is effectively 'tagged' by the pages
> linking to it and the text used to link to it. This is clearly an informal
> mechanism for building concepts as opposed to the more formal authority files
> used in libraries - but if every library in the world with a Web enabled
> catalogue containing references to Dostoyevsky (any spelling) hyperlinked this
> to the wikipedia page, Google would eventually exploit this and get 'better' at
> searching across all alternatives. Also, as noted, this is open to exploitation
> using 'Google Bombs'.
>
> James states that this has 'nothing to do with concept searching' - perhaps
> this is where the disagreement lies. When a librarian adds a subject heading to
> a book they are saying 'it is about this concept'. I believe that when a web
> author links to a page with a text label, they are often also saying 'it is
> about this concept'. When people use meaningless text to link to pages, this is
> lost (hence 'click here' is really bad text to use for a link, and not good
> practice), but over a large enough population, with enough people using
> reasonably good practice, it seems to work.
>
> I'm not saying this works perfectly - it doesn't - just trying to clarify that
> Google can search more than simply the text in the page it is finding.
>
> With the example of WWI, then it is clearly not true that searching for WWI
> won't find primary documentation - although again, I'm not suggesting that
> searching for WWI is a particularly good search, or that the results you get
> from Google are particularly good. The first hit for WWI is the wikipedia
> article (surprise), and it contains two films which I would certainly describe
> as primary material. Note also that a search for the 'great war' also turns up
> the same wikipedia article. Going further and searching for "WWI
> Primary" takes you to http://wwi.lib.byu.edu/index.php/Main_Page which
> contains more 'primary' sources. The point is that this works because of the
> way the web works - it links stuff together.
>
> I should leave comments on the semantic web to Rob, as I'm sure he knows far
> more than me :), but in theory the "Semantic Web" (note
> capitalisation) would allow us to start linking together disparate
> terminologies and formally say 'this is the same as that', whereas at the
> moment we can only infer it from looking at the network of links and saying
> 'links that link to here are likely to encapsulate the same concept, which is
> represented by this page'.
>
> I'm tempted to launch into a discussion about the 'professional' status of
> librarians vs DRs, censorship and China, and a whole load of other points
> raised in this thread, but now it is lunchtime, so perhaps not today,
>
> Owen
>
> Owen Stephens
> Assistant Director: e-Strategy and Information Resources
> Imperial College London Library
> Imperial College London
> South Kensington
> London SW7 2AZ
>
>
> Tel: 020 7594 8829
> Email: o.stephens_at_imperial.ac.uk
>
>
> > -----Original Message-----
> > From: Next generation catalogs for libraries
> > [mailto:NGC4LIB_at_listserv.nd.edu] On Behalf Of Weinheimer Jim
> > Sent: 04 January 2008 11:49
> > To: NGC4LIB_at_listserv.nd.edu
> > Subject: Re: [NGC4LIB] Relevance ranking: was Aqua Brow
> >
> > > I'm sorry Jim, but you are quite wrong in this assertion that Google
> > > searches only text and that library catalogs represent concepts. The
> > > broad PageRank technology is discussed in detail on wikipedia
> > > (http://en.wikipedia.org/wiki/PageRank). You yourself cite Google Bombs,
> > > which by their very nature show how google is searching exactly the
> > > concepts you suggest. (http://en.wikipedia.org/wiki/Google_bomb for
> > > those wanting more on google bombs)
> >
> > Sorry right back, but page rank has nothing to do with
> > concepts. Here is the Page Rank explained:
> > "PageRank relies on the uniquely democratic nature of the web
> > by using its vast link structure as an indicator of an
> > individual page's value. In essence, Google interprets a link
> > from page A to page B as a vote, by page A, for page B. But,
> > Google looks at more than the sheer volume of votes, or links
> > a page receives; it also analyzes the page that casts the
> > vote. Votes cast by pages that are themselves "important"
> > weigh more heavily and help to make other pages "important""
> >
> > This says nothing about searching 50 different versions of a
> > name, or the term for a concept, although it seems to use
> > fuzzy algorithms, which are also based on text. (See fuzzy
> > string searching: http://en.wikipedia.org/wiki/Fuzzy_string_searching)
> >
> > It is not searching different forms of "Dostoyevsky" in the
> > way it is done in the library catalog. When you search the
> > authorized form of "Dostoyevsky, Fyodor, 1821-1881", you are
> > also retrieving all of the following forms. (This is taken
> > from Bernhard's copy of the LC Authority file)
> > Dostoievski, Fédor Mikhailovitch
> > Dostoievski, Fiodor
> > Dostojevski, F. M.
> > Dostojewskij, Fjodor M.
> > To-ssu-to-yeh-fu-ssu-chi
> > Dostoevsky, Fyodor
> > Zuboskal
> > Dostoevskii, Fedor Mikhailovich
> > Dostoevskii, F. M.
> > Dostojewski, Fjedor Michailowitsch
> > Dustuyafski, Fidur
> > Dostoievsky, F.
> > Dosztojevszkij, Fjodor Mihajlovics
> > Tu-ssu-to-yeh-fu-ssu-chi
> > Dostojewski
> > Dostojewski, Fiodor
> > Dostoevskij, Fedor
> > Dostojewskij, F. M.
> > Dostojevskij, F. M.
> > Dostoevskii, Fedor
> > Dostojevskij, Fjodor
> > D̲ostogiephski, Ph. M.
> > Dostoïevsky, Th. M.
> > D̲ostogiephsky, Phiontor Michaelovits
> > Dostoïevski, Fiodor
> > Dostoiewskij
> > Dostojewski, Fjodor
> > Dostoevsky, Fedor
> > Dostoïevsky, Fédor
> > Dostoevsky, F. M.
> > Dostojevskis, F.
> > Dostoevski, F.
> > Dostojewsky
> > Dosṭoyevsḳi, Fyodor Mikhailovits'
> > Dostogephske, Th
> > Dostojewski, Teodor
> > Dastavaski
> > D̲ostogephski
> > Dostoyewski, Fedor
> > Dosztojevszkij, F. M.
> > Dosṭoyeṿsḳi, F. M.
> > Dostojevskij, Fedor Michajlovič
> >
> > Some records are more complex than this.
> >
> > Google cannot do this. Another example that I use (that
> > people probably get tired of) is: WWI. When you search wwi in
> > Google, you get 600,000 hits, with the first one to Wikipedia.
> >
> > So, the information expert immediately asks: What are we
> > looking at? Is this a good search? Someone who doesn't
> > understand the problems will be happy with the search--that
> > is, until you realize that this search *cannot find primary
> > documents about WWI*. Why? Because nobody called it world war
> > one until world war two began 20 years later.
> >
> > So, unless someone has gone in and manually added wwi to the
> > primary documents, the text search for "WWI" cannot retrieve
> > primary documents. This can be repeated with literally
> > hundreds of thousands of examples.
> >
> > Google bombs work by citations (text) to specific pages. The
> > famed "miserable failure" example (killed in Google--and a
> > discussion could take place whether this is censorship--but
> > it still works in Yahoo) is based on people adding links of
> > "miserable failure" to the White House page of George Bush,
> > but it has nothing to do with concept searching.
> >
> > > I can't help thinking that perhaps you are asking the wrong question
> > > here. You ask google about 'dostoyevsky'. Without any additional
> > > information they infer that you are asking about the russian author
> > > and present you with a page full of results about him - primarily
> > > summaries about him, his writing and the period in history as well as
> > > a lot of detail on where to find more information.
> > >
> > > What question was it that you were trying to answer about Dostoyevsky
> > > when starting the search? When he was born? What he wrote? What
> > > question does it fail to answer in the first page of results? Knowing
> > > that would really help in knowing how to build a better search tool.
> >
> > For this discussion, I just want to know what is available in
> > Google by and about Dostoyevsky. A library catalog is
> > designed to do this, while Google cannot do it. I don't think
> > most people understand this. If we want to decide that the
> > traditional goals of a catalog no longer apply: i.e. to show
> > what a collection has by its authors, titles, and subjects,
> > that would be one thing, but it must be debated first.
> >
> > Again, there is nothing wrong with Google, but it has major
> > weaknesses. I also want to build a better search tool, but it
> > is vital that we all understand the strengths and weaknesses
> > of all the tools we presently have.
> >
> > James Weinheimer
> >
Received on Fri Jan 04 2008 - 08:52:59 EST