Information Retrieval List Digest 169 (June 29, 1993)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-169

IRLIST Digest ISSN 1064-6965
June 29, 1993
Volume X, Number 25
Issue 169

**********************************************************

I. NOTICES
   A. Meeting Announcements/Calls for Papers
      1. 2nd call for posters for Hypertext '93
      2. 2nd call for papers for Central European Conference &
         Exhibition for Academic Libraries & Informatics
   B. Publications
      1. Dissertation Available
IV. PROJECT WORK
   C. Abstracts
      1. IR-Related Dissertation Abstracts Available

**********************************************************

I. NOTICES

I.A.1.
Fr: Gary Perlman
Re: ACM Hypertext'93 Call for POSTER Submissions

This is the second call for posters. Another will follow the 4th of July.

HYPERTEXT'93 CALL FOR POSTERS
1993 ACM Conference on Hypertext
Seattle, Washington, USA
November 14-18, 1993

Gary Perlman
Poster Session Chair
Computer & Information Science
228 Bolz Hall, Ohio State University
2036 Neil Avenue, Columbus, OH 43210-1277
phone: +01-614-292-2566, fax: +01-614-292-2911
email: perlman@cis.ohio-state.edu

DEADLINE FOR RECEIPT: Monday, August 2, 1993

The deadline for papers, panels, and technical briefings has passed, but there is still time to submit a poster for presentation at Hypertext'93. Poster presentations will allow researchers to present late-breaking results, significant work in progress, or work that is best presented in conversation. Poster sessions let conference attendees exchange ideas one-on-one with authors and let authors discuss their work in more detail with those attendees most deeply interested in a topic. (There's also time to submit a demonstration; contact william@atc.boeing.com.)

Posters will be accepted much later than papers in order to provide an opportunity for presenting and getting feedback on hot new ideas.
Posters will be reviewed by a panel of subject-matter experts and will be selected on the basis of their contribution to research or practice. Because of the interactive nature of poster presentations, only one submission will be accepted per author.

Submit an EXTENDED abstract of at most two pages emphasizing:
  * the problem,
  * what was done, and
  * why the work is important.

Please provide cover information:
  * the title,
  * the name and affiliation of the author(s),
  * a few keyword phrases,
  * complete contact address for the author to whom correspondence
    should be addressed (including telephone, fax, e-mail).

For accepted posters, authors will be asked for an ABBREVIATED abstract for dissemination at the conference and in the poster session report for the ACM SIGLINK newsletter.

E-mail and fax submissions will be accepted. E-mail submissions are PREFERRED over paper, which is preferred over fax. PostScript and RTF are okay. LaTeX, troff, Scribe, Word, WP, etc. are not.

Answers to some expected questions follow.

Q: When will the posters be displayed?
A: Posters will be on display almost all the time during the conference. There will be one or two 2-3-hour blocks of time dedicated to the posters and demonstrations during which poster presenters will be expected to be available to answer questions. At other times, conference attendees will be able to view the posters.

Q: Will the posters be published as part of the proceedings?
A: No, but abstracts of the posters will be available at the conference. Posters will be technical "presentations" but not "publications". Some posters might make good papers for the SIGLINK newsletter, or other outlets. The Hypertext'91 poster abstracts were published in the ACM SIGLINK Newsletter, Vol. 1, No. 2, pp. 17-24.

Q: How many posters will be accepted?
A: We have room for 30-40 posters, but are prepared to accept fewer. For Hypertext'91 about 50% of submissions were accepted.

**********

I.A.2.
Fr: Dr Algirdas Pakstas
Re: Central European Conference & Exhibition for Academic Libraries & Informatics

Second Call for Papers

CENTRAL EUROPEAN CONFERENCE AND EXHIBITION
FOR ACADEMIC LIBRARIES AND INFORMATICS
VILNIUS, LITHUANIA
27-29 September, 1993

EMPOWERING USERS IN THE 21ST CENTURY

The patrons of academic libraries constitute the core of a country's intellectual power, present and future. The development of their countries depends largely on them. Academic libraries of Central and Eastern Europe thus may well become the centres of promotion of new technologies in information infrastructure. An academic library implementing new technologies today will produce thousands of empowered users in the 21st century.

Three main themes of the Conference are:
  1. New Technologies in Libraries.
  2. Status of Academic Libraries in Central and Eastern Europe.
  3. Implementation of New Technologies.

Themes of concurrent sessions will depend on your suggestions and the abstracts you send.

Our aim is to provide an opportunity for Central and Eastern European libraries to get acquainted with new technologies in libraries and the companies providing them, to review different projects of library automation, and to discuss problems of Central and Eastern European libraries with an expectation of future collaboration and possible joint ventures.

The major event will be the exhibition, where companies from the USA and Europe will demonstrate their latest achievements in information technology.

YOUR PAPERS ARE WELCOME!

Announcement

Papers on the topics of the Conference are invited. Prospective speakers are invited to submit an abstract of 300 words in English. Please mail the abstract by July 31, 1993 to:

Vida Maceviciene,
Vilnius Technical University Library
Ausros Vartu 7a
2600 Vilnius
Lithuania
Fax: +370 2 765210
E-mail: Vida.Maceviciene@AIA.VTU.LT

Authors will be notified of their acceptance by August 31.
The Exhibition and the Conference are open to all academic, special, national libraries and documentation centres in ministries, research centres, information centres and producers, enterprises as well as to other interested persons.

FOR COMPLETE INFORMATION CONTACT VILNIUS, LITHUANIA: VIDA.MACEVICIENE@AIA.VTU.LT

**********

I.B.1.
Fr: Johannes Scholtes
Re: Dissertation Available

[submission edited for space, ed.]

........REPRINT........

Ph.D. DISSERTATION AVAILABLE
on
Neural Networks, Natural Language Processing, Information Retrieval
292 pages and over 350 references
===================================================================

A copy of the dissertation "Neural Networks in Natural Language Processing and Information Retrieval" by Johannes C. Scholtes can be obtained by contacting:

University of Amsterdam
J.C. Scholtes
Dufaystraat 1
1075 GR Amsterdam
The Netherlands
scholtes@alf.let.uva.nl

Do not forget to mention a surface shipping address. Please allow 2-4 weeks for delivery.

ABSTRACT

1.0 MACHINE INTELLIGENCE:
For over fifty years the two main directions in machine intelligence (MI), neural networks (NN) and artificial intelligence (AI), have been studied by various persons with many different backgrounds. NN and AI seemed to conflict with many of the traditional sciences as well as with each other. The lack of a long research history and well defined foundations has always been an obstacle for the general acceptance of machine intelligence by other fields.

2.0 NATURAL LANGUAGE PROCESSING:
The study of natural language processing (NLP) has existed even longer than that of MI. Already in the beginning of this century people tried to analyse human language with machines. However, serious efforts had to wait until the development of the digital computer in the 1940s, and even then, the possibilities were limited. For over 40 years, symbolic AI has been the most important approach in the study of NLP.
That this has not always been the case may be concluded from the early work on NLP by Harris. As a matter of fact, Chomsky's Syntactic Structures was an attack on the lack of structural properties in the mathematical methods used in those days. But, as the latter's work remained the standard in NLP, the former has been forgotten completely until recently. As the scientific community in NLP devoted all its attention to the symbolic AI-like theories, the only useful practical implementations of NLP systems were those that were based on statistics rather than on linguistics. As a result, more and more scientists are redirecting their attention towards the statistical techniques available in NLP. The field of connectionist NLP can be considered as a special case of these mathematical methods in NLP.

3.0 INFORMATION RETRIEVAL:
The study of information retrieval (IR) was traditionally related to libraries on the one hand and military applications on the other. However, as PCs grew more popular, most common users lost track of the data they produced over the last couple of years. This, together with the introduction of various "small platform" computer programs, made the field of IR relevant to ordinary users.

4.0 THE COMBINATION:
As this study is a very multi-disciplinary one, the risk exists that it remains restricted to a surface discussion of many different problems without analyzing one in depth. To avoid this, some central themes, applications and tools are chosen. The themes in this work are self-organization, distributed data representations and context. The applications are NLP and IR; the tools are (variants of) Kohonen feature maps, a well known model from neural network research.

**********************************************************

IV. PROJECT WORK

IV.C.1.
Fr: Susanne M. Humphrey
Re: Selected IR-Related Dissertation Abstracts

The following are citations selected by title and abstract as being related to Information Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of the Dissertation Abstracts Online database produced by University Microfilms International (UMI). Included are UMI order number, title, author, degree, year, institution; number of pages, one or more Dissertation Abstracts International (DAI) subject descriptors chosen by the author, and abstract. Unless otherwise specified, paper or microform copies of dissertations may be ordered from University Microfilms International, Dissertation Copies, Post Office Box 1764, Ann Arbor, MI 48106; telephone for U.S. (except Michigan, Hawaii, Alaska): 1-800-521-3042, for Canada: 1-800-268-6090. Price lists and other ordering and shipping information are in the introduction to the published DAI. An alternate source for copies is sometimes provided. Dissertation titles and abstracts contained here are published with permission of University Microfilms International, publishers of Dissertation Abstracts International (copyright by University Microfilms International), and may not be reproduced without their prior permission.

AN University Microfilms Order Number ADG92-24130.
AU LEUNG, TING YU.
TI QUERY PROCESSING AND OPTIMIZATION IN TEMPORAL DATABASE SYSTEMS.
IN University of California, Los Angeles Ph.D. 1992, 199 pages.
SO DAI V53(04), SecB, pp1928.
DE Computer Science.
AB There is a tremendous need for keeping evolving history of the "enterprise" of interest online, resulting in a temporal database. In this dissertation, we consider query processing and optimization aspects for temporal databases. Since storing temporal information in databases is a new application area, it is not surprising that the characteristics of temporal queries are much different from those of conventional queries.
Conventional relational systems are often inefficient for temporal query processing because the new characteristics are not taken into consideration. As an example, a temporal join often contains a conjunction of several inequalities involving only time attributes. In conventional relational systems, this type of query is processed using the nested-loop join algorithm, which may not be the most efficient method. However, it can be processed much more efficiently when new processing strategies are used.

We present a stream processing approach for temporal query processing. Given properly sorted data, implementation of temporal joins and semijoins as stream processors can be very efficient. We discuss the tradeoffs between sort orderings, the amount of local workspace, and multiple scans over input streams. Stream processing algorithms for various temporal joins and semijoins, and their workspace requirements for various data sort orderings, are presented.

We propose a novel indexing technique for temporal data streams that is based on the stream processing techniques. The index can be exploited in processing complex multi-way joins that are qualified with snapshot operators (e.g., the "as of" operator). The advantages and limitations of the scheme and a quantitative analysis of the storage requirements are presented. We propose optimization alternatives that can reduce the storage requirements.

Multiprocessor database machines are probably more cost-effective at storing a huge volume of temporal data than centralized DBMSs. In this dissertation, we discuss issues involving temporal data fragmentation, temporal query processing, and query optimization in such an environment. We propose parallel strategies for multi-way joins which are based on partitioning relations on time attribute values, and optimizations for processing join queries qualified with comparison predicates involving time attributes.
We analyze the schemes quantitatively, and show their advantages in computing complex temporal joins.

AN University Microfilms Order Number ADGDX-96527.
AU PEPPERRELL, CATHERINE.
TI AN INVESTIGATION OF METHODS FOR SIMILARITY SEARCHING IN DATABASES OF THREE-DIMENSIONAL CHEMICAL STRUCTURES.
IN University of Sheffield (United Kingdom) Ph.D. 1991, 320 pages.
SO DAI V53(04), SecB, pp1932.
DE Computer Science.
AB Available from UMI in association with The British Library.

Recent developments in chemical information systems have included methods for 2-D similarity searching and 3-D substructure searching. However, there has been relatively little attention paid to 3-D similarity searching. The first part of the work described in this thesis consisted of a comparison of four different methods for 3-D similarity searching, varying in sophistication and, correspondingly, in computational requirements.

As a result of this comparison, the Atom Mapping method was found to perform better than the other three methods in simulated property prediction experiments. Its performance in these experiments was shown to be significantly different from that of a random ranking. It was also considerably faster than the MCS method, which has previously been considered for use in this context. The Atom Mapping method was also found to perform comparably with a more conventional 2-D similarity method using this criterion, and retrieved different structures in many cases. It performed significantly better than a 2-D method describing the shape of molecules by means of topological indices.

Subsequently, the Atom Mapping method was speeded up approximately 5 times by a combination of basic improvements in efficiency and the use of a simple upperbound calculation to pre-rank the data set.
A series of modifications to the basic algorithm was also explored, including the use of different atom types, a different similarity coefficient, the use of angular rather than distance information, and the use of a weighting scheme. None of these modifications significantly altered the performance of the method in the property prediction experiments.

The Atom Mapping method was then tested by carrying out a set of sample searches on a larger data set. It appeared to retrieve molecules with a fairly high degree of spatial similarity to the query, and gave very different results from the 2-D similarity method tested. In addition, it successfully retrieved a large number of the active molecules used to 'seed' the data set. The range of atom types available was then extended further to the use of more physicochemical information, which resulted in a wide range of novel types of molecules being retrieved at the cost of increased numbers of 'false drops'.

Finally, the integration of all the program developments into a single system was described. This system was found to be sufficiently fast to search databases of around 50,000 structures, but would probably be too slow for interactive use on very large databases (e.g., 500,000 structures). In a brief user-oriented evaluation of the basic Atom Mapping method, this system was found to retrieve molecules that were spatially similar to the query, but was considered to take insufficient account of the chemical and physical properties of the molecules.

AN University Microfilms Order Number ADG92-24710.
AU CHEN, SHU-HSIEN LAI.
TI A STUDY OF ONLINE CATALOG SEARCHING BEHAVIOR OF HIGH SCHOOL STUDENTS.
IN University of Georgia Ed.D. 1992, 197 pages.
SO DAI V53(04), SecA, pp1133.
DE Education, Technology. Library Science.
AB The purpose of this study was to examine the search behavior of high school students using the online catalog.
The study attempted to investigate: (1) success rate of students accessing information through various search approaches, (2) the types of search errors made by students, (3) the reformulation patterns used during the search process, and (4) the relationships between students' search success and their academic achievement and aptitude.

The study was conducted in a high school media center, where an online catalog entitled MacLAP had been recently installed. Thirty-five students randomly selected from four English classes in the eleventh grade were employed as subjects of the study. The students were first given instructions on how to use the online catalog; next, they were required to complete computer search problems. A video camera recorded students' search behaviors as they used the online catalog.

The study revealed that among several search approaches, students had greater success with author and title searches than with subject searches. The errors that affected success rate included several types: typographical and spelling errors, errors in using system commands, errors in exploring correct search terms, errors in interpreting information, and errors in recording search results.

When students made errors, they reformulated their searches and continued their efforts. They generally reformulated in two ways--they changed the search approach or the search term. As students changed terms to improve search results, three patterns emerged: substituting general terms for specific terms, using synonyms for related concepts, and selecting an entirely different concept to search.

The study also found that students' search performance was positively correlated with their academic achievement and aptitude. A moderate correlation existed between successful search performance and high scores on a standardized achievement test and a cognitive test.
The findings of this study led to two conclusions: (1) students' difficulties in searching might be attributed to their lack of adequate library skills and English skills and (2) flaws of the system could also have imposed some barriers to effective and efficient searches. Further studies are needed to examine some other factors that might be associated with search difficulties.

AN University Microfilms Order Number ADG92-25560.
AU DAURIA, JENNIFER PIERSMA.
TI A BIBLIOMETRIC ANALYSIS OF PUBLISHED MATERNAL AND CHILD HEALTH NURSING RESEARCH FROM 1976 TO 1990.
IN The University of Texas at Austin Ph.D. 1992, 165 pages.
SO DAI V53(04), SecB, pp1782.
DE Health Sciences, Nursing. Library Science. Information Science. Health Sciences, Obstetrics and Gynecology.
AB The two purposes of this bibliometric analysis were to explore and describe (a) the evolving patterns of scholarly activity and (b) the evolving intellectual structure of the scholarly community in the maternal and child health nursing (MCN) subfield as represented in the citation patterns of published nursing research from 1976 to 1990. Research articles in Journal of Advanced Nursing, Nursing Research, Research in Nursing & Health, and Western Journal of Nursing Research (N = 325) were used as the source of citation data for this bibliometric analysis.

The research literature of the MCN subfield has a structure resembling that of a scientific literature. The majority of the total number of citations in the MCN research literature were to journals. The median age of the journal citations in the two 3-year sets of data (1979-1981 and 1988-1990) was less than 10 years. Over 60% of the cited journal literature originated within the subject classification of Medical Sciences.

The findings of this study demonstrated an increasing trend of recurring authorship patterns (citing authors) among published MCN researchers.
The trends in cited author patterns supported the emergence of increasing numbers of nurse authors into the citation networks of MCN research literature. These citation characteristics supported the emergence of a community of nurse researchers with citation and research histories in the MCN subfield.

The limited intellectual structure in the MCN subfield was evident in the generally low consensus among citing authors in the referencing of authors' works as well as the relationships (cocitations) among authors' works in the MCN subfield. These data supported the emergence of a network of MCN scholars with increasing citation histories and cocitations in the research specialty area focused on maternal behavior during the prenatal period, postpartum period, and/or early infancy period.

Research recommendations included the continued application of bibliometric strategies to selected bodies of the nursing literature to track the evolution of intellectual development, to identify the contributions of nurse authors, and to serve as an adjunct to qualitative analysis of scientific development in subfield and

**********************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA 94612-3550.

Send subscription requests to: LISTSERV@UCCVMA.BITNET
Send submissions to IRLIST to: IR-L@UCCVMA.BITNET

Editorial Staff:
Clifford Lynch   calur@uccmvsa.ucop.edu or calur@uccmvsa.bitnet
Nancy Gusack     ncgur@uccmvsa.bitnet or ncgur@uccmvsa.ucop.edu
Mary Engle       meeur@uccmvsa.bitnet

The IRLIST Archives is now set up for anonymous FTP, as well as via the LISTSERV. Using anonymous FTP via the host dla.ucop.edu, the files will be found in the directory pub/irl, stored in subdirectories by year (e.g., /pub/irl/1993). Using LISTSERV, send the message INDEX IR-L to LISTSERV@UCCVMA.BITNET.
To get a specific issue listed in the Index, send the message GET IR-L LOGYYMM, where YY is the year and MM is the numeric month in which the issue was mailed, to LISTSERV@UCCVMA (Bitnet) or LISTSERV@UCCVMA.UCOP.EDU. You will receive the issues for the entire month you have requested.

These files are not to be sold or used for commercial purposes.

Contact Nancy Gusack or Mary Engle for more information on IRLIST.

THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THE CONTENTS OF THEIR SUBMISSIONS TO IRLIST.