This has been an interesting thread.
In these comparisons between Google and libraries or commercial
indexing services / subject databases, the main thing that gets
neglected is that Google is just indexing what's freely available and
not making any concerted effort to curate a collection or to index
comprehensively.
You might find this article interesting. What emerges from this study
is that Google Scholar is good for recent material, where searching on
topics yields unambiguous results. It falters especially on author
searches, because it has no authority control.
Here are some snippets:
Finding Chemical Information Using Google Scholar: A Comparison with
Chemical Abstracts Service
Levine-Clark, Michael, Kraus, Joseph
Science & Technology Libraries, 27(4): 3-17 (2007)
doi:10.1300/J122v27n04_02 [subscription required]
Abstract: Since its introduction in November 2004, Google Scholar has
been the subject of considerable discussion among librarians. Though
there has been much concern about the lack of transparency of the
product, there has been relatively little direct comparison between
Google Scholar and traditional library resources. This study compares
Google Scholar and Chemical Abstracts Service (CAS) as resources for
finding chemistry information. Of the 702 records found in six
different searches, 65.1% were in Google Scholar and 45.1% were in
CAS. Of these, 55.0% were unique to Google Scholar, 34.9% were unique
to CAS, and 10.1% overlapped. When each record found was searched by
title in the two databases, the figures change, with 79.5% in Google
Scholar, 85.6% in CAS, and 65.1% overlapping. Based on this,
researchers are more likely to find known published information
through CAS than in Google Scholar. Results vary by type of search,
type of resource, and date. For many types of searching, CAS performs
significantly better than Google Scholar. This is especially true for
searches on compounds or a personal name, both of which take advantage
of advanced search features in CAS. For simple keyword searches,
Google Scholar tends to perform better, most probably because Google
Scholar searches through the full text of journal articles, while a
keyword search through CAS only finds abstract and index terms.
Chemical Abstracts Service (CAS) has long been the major research tool
for chemistry. It is a valuable resource, containing information back
to 1907 from over 40,000 scientific journals, and has complex search
features uniquely suited to chemistry information retrieval (About
CAS). It is also very expensive, so for many libraries, a subscription
to CAS is not an option. With that in mind, it seemed worthwhile to
examine the relative utility of CAS, delivered via the SciFinder
Scholar platform, and Google Scholar for locating chemistry-related
information. Google Scholar, the free and much-hyped Google tool for
locating scholarly information, could be a useful substitute for an
expensive specialized database such as CAS. Five paired searches,
representing different common types of chemistry search strategies,
were conducted in each database, allowing a direct comparison of the
two.
Five searches were conducted over a one-month period in Google Scholar
and CAS. These included two topical searches (hypericin and epr and
pulsed epr and biology and enzymes), two searches on compounds
(hydroxybutyranilide and myrigalone), and one on a personal name
(Andrei Kutateladze). The lists of resources returned by these
searches were compared, giving a quantitative measure of the relative
quality of the two databases in terms of search results.
A second comparison was made by searching each database individually
for each title initially found in only one database. The resulting
list of titles allows a comparison of the content of the two databases
while addressing the quality of the indexing and abstracting of each.
Each of these titles was checked to determine whether all of the
keywords were present. In the search for hydroxybutyranilide, for
example, the initial searches yielded 135 separate records. Of these,
29 were in Google Scholar and 107 in CAS. Searches by keywords in the
title upped the number to 57 in Google Scholar and 135 in CAS.
The five searches returned a total of 702 records, of which 457
(65.1%) were in Google Scholar and 316 (45.1%) were in CAS. Of these,
386 (55.0%) were unique to Google Scholar, 245 (34.9%) were unique to
CAS, and 71 (10.1%) overlapped between the two. Based on the searches
done, Google Scholar returns larger sets of results. However, when the
titles found in these initial searches were searched on keywords from
the titles in the databases in which they had not initially been
located, the totals increased to 558 (79.5%) in Google Scholar and 601
(85.6%) in CAS. The degree of overlap (65.0%) is a little higher than
the 60% in the Neuhaus (2006) study.
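
[Aside, not from the article: a rough Python sketch for re-deriving the
percentages quoted above from the raw counts. The counts are the ones
given in the snippet; the variable names and the pct() helper are my
own. The second-stage overlap is worked out by inclusion-exclusion,
which assumes every one of the 702 records ends up in at least one of
the two databases. Under ordinary rounding the CAS share of the
initial results comes out at 45.0% rather than the quoted 45.1%, and
the second-stage overlap at 65.1%, which matches the abstract rather
than the 65.0% in the body.]

```python
# Counts quoted in the snippet (initial keyword searches).
TOTAL = 702      # distinct records across all searches
ONLY_GS = 386    # found only in Google Scholar
ONLY_CAS = 245   # found only in CAS
BOTH = 71        # found in both


def pct(count, total=TOTAL):
    """Share of the pooled record set, as a percentage to one decimal."""
    return round(100 * count / total, 1)


# Totals per database are the unique records plus the shared ones.
in_gs = ONLY_GS + BOTH     # 457
in_cas = ONLY_CAS + BOTH   # 316

# Counts after the follow-up title searches, as quoted.
GS_AFTER = 558
CAS_AFTER = 601

# Inclusion-exclusion. ASSUMPTION: each of the 702 records is in at
# least one database, so |GS and CAS| = |GS| + |CAS| - |GS or CAS|.
both_after = GS_AFTER + CAS_AFTER - TOTAL

print("initial:  GS", pct(in_gs), "CAS", pct(in_cas), "both", pct(BOTH))
print("by title: GS", pct(GS_AFTER), "CAS", pct(CAS_AFTER),
      "both", pct(both_after))
```
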
CAS, while not as successful a tool for keyword searching in terms of
the numbers of records found, is a little better in terms of overall
numbers of records available. Part of the difference in the searching
may be due to the fact that Google Scholar searches the full text of
materials in some cases, making it more likely that search terms will
be present. The records found in CAS, based on terms in the citations,
abstracts, and index terms, are likely more focused on the topics in
question than those in Google Scholar. Based on a measure of quantity,
Google Scholar is a superior searching tool. Based on a measure of
quality, CAS may be superior.
For aggregate results, Google Scholar performs better than CAS. This
is generally true for results by date as well. For the 170 pre-1989
titles, coverage is significantly better in CAS. After 1990, Google
Scholar performs better.
The three types of searches present very different results. For the
two topical searches (hypericin and epr and pulsed epr and biology and
enzymes), 429 total records were found. Of these, 84.1% were found in
Google Scholar versus 19.8% in CAS. Though there is a range of results
within these searches, it is clear that for topical searches, Google
Scholar outperforms CAS.
For the personal-name search (Andrei Kutateladze), 93 records were
found, with 97.8% in CAS and 52.7% in Google Scholar. For the two
searches on compounds (hydroxybutyranilide and myrigalone), there were
180 total records found, of which 77.8% were in CAS and 26.1% were in
Google Scholar. ... [I]t is clear that scholars searching for
information about compounds will do better with CAS than Google
Scholar as they will for name searches.
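
[Aside, not from the article: a small consistency check on the
per-search-type figures. The snippet gives only totals and percentages
for each search type, so the record counts below are back-calculated
by rounding (percentage x total) and should be treated as my
assumption, not as numbers reported by the authors. If the
reconstruction is right, the implied counts should add back up to the
aggregate 457 Google Scholar and 316 CAS records quoted earlier.]

```python
# (search type, total records, % in Google Scholar, % in CAS), as quoted.
BY_TYPE = [
    ("topical", 429, 84.1, 19.8),
    ("personal name", 93, 52.7, 97.8),
    ("compound", 180, 26.1, 77.8),
]


def implied_count(total, percent):
    """Nearest whole number of records for a quoted percentage.

    ASSUMPTION: the article does not give these counts; they are
    reconstructed from the rounded percentages.
    """
    return round(total * percent / 100)


records_total = sum(total for _name, total, _gs, _cas in BY_TYPE)
gs_total = sum(implied_count(total, gs) for _name, total, gs, _cas in BY_TYPE)
cas_total = sum(implied_count(total, cas) for _name, total, _gs, cas in BY_TYPE)

for name, total, gs, cas in BY_TYPE:
    print(name, total, implied_count(total, gs), implied_count(total, cas))

print("sums:", records_total, gs_total, cas_total)
```
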
Received on Sat Jan 05 2008 - 15:50:16 EST