Information Retrieval List Digest 134 (October 20, 1992)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-134
=========================================================================
Date: Tue, 20 Oct 1992 11:49:12 PST
Reply-To: "Information Retrieval List"
Sender: "Information Retrieval List"
From: IRLIST
Subject: IR-L Digest, Vol.IX, No.38, Issue 134

IRLIST Digest ISSN 1064-6965
October 20, 1992
Volume IX, Number 38
Issue 134

**********************************************************

I. NOTICES
   A. Meeting Announcements/Calls for Papers
      1. Symposium on Case-Based Reasoning & Info Retrieval
   C. Miscellaneous
      1. News from RLG
      2. WLN Library Database
IV. PROJECT WORK
   C. Abstracts
      1. IR-Related Dissertation Abstracts

**********************************************************

I. NOTICES

I.A.1.
Fr: anick@aiag.enet.dec.com
Re: Symposium on Case-Based Reasoning and Information Retrieval

Call for Papers

Symposium on Case-Based Reasoning and Information Retrieval
Exploring the Opportunities for Technology Sharing

1993 AAAI Spring Symposium Series
March 23, 24, 25, 1993
Stanford University
Stanford, California

DESCRIPTION:

The fields of Case-Based Reasoning and Information Retrieval have a shared interest in the indexing of information, the formulation of query expressions suited to retrieving relevant cases, heuristic matching, the measurement of similarity, and the use of domain knowledge to improve search. Whereas Case-Based Reasoning research has historically worked with small collections requiring a fair degree of hand-tailoring of well structured cases, researchers in Information Retrieval have concentrated on indexing and querying over very large collections of primarily textual data, with the aim of minimizing the need for hand-tailoring. Perhaps due to these differences, there has been relatively little interaction to date among practitioners in the respective fields.
Case-Based Reasoning researchers are now embarking on an ambitious second phase whose goal is the implementation of systems that use large databases containing a variety of information types. For example, a case base with several thousand cases that contain video data is currently under development at the Institute for the Learning Sciences. Knowledge-based approaches have been introduced into Information Retrieval systems as well, in the form of thesauri, semantic nets, concept frames, etc. There is a recent trend in the IR community toward extracting and utilizing structured information to complement full-text retrieval methods, and to extend textual retrieval systems to encompass multi-media.

Not only does the intersection of the two fields appear to be growing rapidly, but many tasks (such as text categorization) and many application domains (such as legal, medical, help-desk) could utilize both IR and CBR, raising the question of how to best integrate the methodologies into a single system with a uniform interface.

The purpose of this symposium is to bring researchers from both communities together to discuss issues of common interest, share the results and experiences of their respective research, and seek areas of potential future technology transfer or convergence. Specific topics of interest include, but are not limited to, such questions as:

1. How might IR integrate and take advantage of more structured information, as used in CBR? Can weak and strong retrieval methods be effectively combined?
2. How does CBR scale up to large collections of semi-structured information? How can CBR minimize the need for hand-tailoring of the data in a case base?
3. What similarity assessment methods and metrics have been developed?
4. How well do CBR/IR techniques apply to multi-media information bases?
5. How can the construction of viable queries for retrieving desired information be facilitated through CBR methods?
6.
What kinds of knowledge representations are needed to support reasoning (over cases) as opposed to retrieval? What is the role of reasoning in retrieval?
7. Can/should the functions of textual information retrieval and case-based reasoning be integrated into a single application? How can the effectiveness of such a hybrid CBR/IR system be evaluated?

The symposium will consist of individual presentations and panel discussions with ample opportunity for group discussion. Participation will be by invitation only. We will strive for equal participation from the two communities, as well as between academia and industry. However, we will give higher priority to achieving equal participation from the two communities before trying to achieve equal participation from academia and industry.

SUBMISSIONS:

Those who wish to make presentations should submit a draft paper of at most ten pages. All prospective submitters are encouraged to contact Evangelos Simoudis (simoudis@titan.rdd.lmsc.lockheed.com) or Peter Anick (anick@aiag.enet.dec.com) to discuss how their proposed presentation fits the objectives of the symposium. Those who wish to attend without presenting a paper should send a description of their research interests and a list of related publications.

Four copies of submissions should be sent to arrive by October 16 (but see note below) to:

Evangelos Simoudis
Lockheed AI Center
O/96-20 B/254F
3251 Hanover Street
Palo Alto, CA 94304

ORGANIZING COMMITTEE:

Peter Anick (co-chair), Bruce Croft, William Mark, Chris Riesbeck, Evangelos Simoudis (co-chair)

NOTE: This call has been distributed earlier through a number of channels. If this is the first announcement you have seen and you would like to participate but feel you cannot meet the Oct. 16 submission date, please contact Peter Anick at anick@aiag.enet.dec.com to discuss a possible extension.

**********

I.C.1.
Fr: Jennifer Porro
Re: News from RLG

(The following is an announcement from the Research Libraries Group. It has been cross-posted to several library-related listservs.)

RLG DEVELOPS Z39.50 SERVER FOR INTERNET USE

October 6, 1992 -- The Research Libraries Group (RLG) has developed a Z39.50 server for searching its RLIN and CitaDel databases, and 14 institutions nationwide have now successfully tested the server over the Internet. The Z39.50 server, when fully implemented, will make it easier for users of other online catalogs to search RLG's databases.

Z39.50 is a national standard (ANSI/NISO Z39.50) for computer-to-computer information retrieval that enables users to search other online library catalogs and information sources using the same commands they use to search their local online catalog. As more and more information providers implement this standard, the goal of global information resource sharing will come closer to attainment.

The Z39.50 protocol translates commands back and forth between the requesting system (called a "client") and the system with the database being searched (called a "server"), even if the two systems run on different hardware or use different commands and screen displays. As long as both systems support Z39.50, users of one system can search on the other as if it were their local system.

Institutions that have tested RLG's server include AT&T Bell Laboratories, Data Research Associates, Innovative Interfaces, Library of Congress, NOTIS, Pennsylvania State University, Stanford University, University of California at Berkeley, and University of California Division of Library Automation -- all are members of a Z39.50 implementation group established by the Coalition for Networked Information (CNI) and provide feedback to each other on testing. Others who have tested the RLG server are Brown University, Dartmouth College, Gaylord Brothers, OCLC, and University of Tennessee at Knoxville. All testers have their own Z39.50 clients.
For more information about RLG's Z39.50 server, please contact: Lennie Stovel, Research Libraries Group, 1200 Villa Street, Mountain View, CA 94041-1100; phone 415/691-2259; FAX 415/964-0943; e-mail BL.MDS@RLG.BITNET or BL.MDS@RLG.STANFORD.EDU (Internet).

**********

I.C.2.
Fr: brandis@rs6a.wln.com
Re: WLN Library Database

WLN PROVIDES FREE USE OF THE WLN EASY ACCESS SYSTEM THROUGH INTERNET

(October 16, 1992 (Rev.))

WLN began providing free use when it inaugurated the new WLN Easy Access System through Internet on August 1, 1992. To access WLN, type TELNET WLN.COM or 192.156.252.2 at your system's prompt. You'll see some copyright information and instructions on WLN. Then just follow the instructions.

Easy Access is a new easy-to-use searching interface to the WLN online system. It combines and reduces the dozens of sophisticated WLN commands to a few essential search types. Through online user instructions and context-sensitive help screens, Easy Access assists the untrained or infrequent user in searching the WLN database.

The WLN database currently contains nearly 7.5 million bibliographic records in all formats and a broad range of languages, and over 16 million holdings. Records are contributed regularly to the WLN database from the Library of Congress, U.S. Government Printing Office, National Library of Medicine, National Library of Canada, WLN member libraries and several other sources.

Easy Access is also available by dial-up and from leased line workstations. For additional information on accessing the WLN database via Internet and Easy Access searching, contact Rush Brandis at 1-800-DIALWLN, (206) 923-4000, or info@wln.com.

SEARCH IAC REFERENCE DATABASES FREE ON WLN UNTIL NOVEMBER 1, 1992

Three of the Information Access Company's most popular indexing and abstracting services are now available through WLN free of charge until November 1, 1992.
IAC's Magazine Index/Plus, Business Index, and Expanded Academic Index databases are searched through WLN Easy Access, WLN's new easy-to-use searching interface, which is available by dial-up, WLN leased line workstations and the Internet. After the free trial period, libraries can continue to search the IAC databases on WLN after negotiating a flat rate license fee with IAC.

MAGAZINE INDEX PLUS: A comprehensive index to over 400 of the most popular and widely read magazines. Subject areas covered include current affairs, consumer information, travel, arts and entertainment. Magazine Index Plus includes the most current 60 days' indexing of the New York Times.

BUSINESS INDEX: Business Index includes indexing of over 800 business, management, trade journals and newspapers, along with business-related articles from over 3,000 other publications. It contains abstracts for approximately 150 major management and computer journals. Newspaper coverage includes the business and financial sections of the New York Times, Wall Street Journal, Asian Wall Street Journal and the Financial Times of Canada, as well as business-related articles from other national newspapers.

EXPANDED ACADEMIC INDEX: Provides indexing and abstracting of approximately 1,500 scholarly journals covering the humanities, social sciences, and science and technology with emphasis on areas of high academic interest such as communication studies, computer science, engineering, environmental studies and women's studies. Also includes the current six months' indexing of the New York Times.

For additional information about searching IAC reference databases on WLN contact Rush Brandis at 1-800-DIALWLN, (206) 923-4000 or info@wln.com.

**********************************************************

IV. PROJECT WORK

IV.C.1.
Fr: Susanne M.
Humphrey
Re: Selected IR-Related Dissertation Abstracts

The following are citations selected by title and abstract as being related to Information Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of the Dissertation Abstracts Online database produced by University Microfilms International (UMI). Included are UMI order number, title, author, degree, year, institution; number of pages, one or more Dissertation Abstracts International (DAI) subject descriptors chosen by the author, and abstract. Unless otherwise specified, paper or microform copies of dissertations may be ordered from University Microfilms International, Dissertation Copies, Post Office Box 1764, Ann Arbor, MI 48106; telephone for U.S. (except Michigan, Hawaii, Alaska): 1-800-521-3042, for Canada: 1-800-268-6090. Price lists and other ordering and shipping information are in the introduction to the published DAI. An alternate source for copies is sometimes provided. Dissertation titles and abstracts contained here are published with permission of University Microfilms International, publishers of Dissertation Abstracts International (copyright by University Microfilms International), and may not be reproduced without their prior permission.

AN University Microfilms Order Number ADGNN-59542.
AU TAKAGAKI, KEN.
TI A FORMALISM FOR OBJECT-BASED INFORMATION SYSTEMS DEVELOPMENT.
IN The University of British Columbia (Canada) Ph.D. 1990, 328 pages.
SO DAI V52(10), SecB, pp5378.
DE Computer Science.
IS ISBN: 0-315-59542-6.
AB Most current approaches to Information Systems Development (ISD) tend to derive from past experience and practice, rules of thumb and technology trends. The lack of theoretical foundations hinders the systematic development and evaluation of new ISD methodologies.
The research undertaken in this thesis addresses this issue by proposing a formal, theory-based model, Ontology/Object-Based Conceptual Modelling (OBCM), for conceptually representing IS applications. The formalism is novel in that it is grounded in first principles derived from metaphysics, in particular the system of Ontology developed by Mario Bunge. Underlying this approach is the premise that an Information System is a model of reality and that model should be therefore rooted in a theory of reality, i.e. a metaphysics. As a result, basic assumptions in reality such as thing, substance, property, attribute, time, state and change are explicitly and rigorously addressed. OBCM features an ontologically well-defined construct, "object", which is used to directly represent entities in reality, thus lending theoretical credence to the so-called object-oriented paradigm found in recent programming languages and databases. In addition, the thesis presents a framework, Ontology/Object-Based Information System (OBIS), for systems implementation based on this model. This framework directly implements the object construct so that it can be immediately utilized by the information systems user in a "direct manipulation" style of end-user interaction. Further, OBIS strives for a single, homogeneous concept of system operation drawn from ontology rather than in terms of IS or computing technology. In principle, this one concept can be applied to any object in the IS, thus simplifying the understanding and use of the Information System. In this way, the model attempts to unify the analysis, implementation and user-interface aspects of Information Systems Development, thereby reducing the so-called "semantic gap" which has often been observed between the reality of the application and its final implementation in an IS. A "proof of concept" prototype is described which illustrates the main principles and explores practical applications of the proposed model.
This prototype is implemented as a single, stand-alone "shell" which can be used to support a wide variety of applications as well as providing the basis of a rapid prototyping or CASE tool. The prototype is used to implement sample problems including the well-known IFIP Working Conference problem, thus demonstrating the feasibility of the overall approach.

AN This item is not available from University Microfilms International ADG03-84177.
AU VALIVETI, RADHAKRISHNA S.
TI LEARNING ALGORITHMS FOR DATA RETRIEVAL AND STORAGE.
IN Carleton University Ph.D. 1991.
SO ADD X1991.
DE Engineering, Electronics and Electrical.

AN University Microfilms Order Number ADG92-02263.
AU ELKALIFA, ELSUNI SIDAHMED.
TI THE EFFECT OF COLLECTION HOMOGENEITY ON TERM ASSOCIATION AS A METHOD OF REQUEST EXPANSION IN INFORMATION RETRIEVAL.
IN Case Western Reserve University Ph.D. 1991, 122 pages.
SO DAI V52(10), SecA, pp3463.
DE Information Science.
AB Statistical techniques have been proposed as alternatives to traditional methods of request expansion or feedback mechanisms. These statistical measures are derived from formulas which attempt to correlate two given index terms on the basis of their frequency of co-occurrence in the documents of a given collection. These techniques attempt to relax the retrieval requirement that the request terms should exactly match the document descriptors before the documents can be judged relevant to the request. Simple though the concept seems, the complexity of the natural language and the irregularities that govern the syntactic and semantic structure make the application of such techniques rather complicated. Due to this, most of the previous investigations failed to produce any efficient alternatives to traditional information retrieval systems. A major problem is false or spurious association between semantically and conceptually independent terms.
It is believed that the failure of these studies is mainly due to the heterogeneity of the collections used rather than to the inefficiency of the techniques themselves. A combination of two techniques is used to create a more powerful request expansion technique. Cluster analysis techniques are used to subdivide the document collection into small more homogeneous collections; then term association techniques are applied to determine which terms could be used to expand the original request. A method used to compute the degree of association between original request terms and document descriptors is based on the formula,

$$R_{j1} = \frac{N(W_j W_1)}{N(W_j) + N(W_1) - N(W_j W_1)}$$

where: $R_{j1}$ is the coefficient of association between term $W_j$ and term $W_1$; $N(W_j W_1)$ = number of documents in which both term $W_j$ and term $W_1$ appeared; $N(W_j)$ = number of documents in which term $W_j$ occurred; $N(W_1)$ = number of documents in which term $W_1$ occurred.

The document file consisted of the significant words in the titles, abstracts, and identifiers of 150 documents. Three search strategies were formulated for each request: the first consisted of the original search terms, the second included terms extracted from the entire collection while the third consisted of a combination of terms extracted from specific clusters and the original request terms. Results indicate that statistical term association techniques are effective methods of request expansion. (Abstract shortened with permission of author.).

**********************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.
Send subscription requests to: LISTSERV@UCCVMA.BITNET
Send submissions to IRLIST to: IR-L@UCCVMA.BITNET

Editorial Staff:
Clifford Lynch calur@uccmvsa.ucop.edu or calur@uccmvsa.bitnet
Nancy Gusack ncgur@uccmvsa.bitnet
Mary Engle meeur@uccmvsa.bitnet

The IRLIST Archives will be set up for anonymous FTP, and the address will be announced in future issues. To access back issues presently, send the message INDEX IR-L to LISTSERV@UCCVMA.BITNET. To get a specific issue listed in the Index, send the message GET IR-L LOGYYMM, where YY is the year and MM is the numeric month in which the issue was mailed, to LISTSERV@UCCVMA (Bitnet) or LISTSERV@UCCVMA.UCOP.EDU. You will receive the issues for the entire month you have requested.

These files are not to be sold or used for commercial purposes.

Contact Nancy Gusack or Mary Engle for more information on IRLIST.

THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THE CONTENTS OF THEIR SUBMISSIONS TO IRLIST.
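
Editorial illustration (not part of the original digest): the coefficient of association quoted in the Elkalifa abstract under IV.C.1 has the same form as the Jaccard coefficient, that is, the number of documents containing both terms divided by the number containing either. The sketch below is one possible reading of that formula, not the dissertation's implementation. The function name `association`, the `postings` table, and its contents are hypothetical and chosen only to make the arithmetic visible.

```python
from typing import Dict, Set


def association(docs_with_j: Set[int], docs_with_1: Set[int]) -> float:
    """Coefficient of association R_j1 between two index terms.

    Each argument is the set of ids of the documents in which the
    term occurs (an assumed representation, not given in the abstract).
    """
    both = len(docs_with_j & docs_with_1)               # N(W_j W_1)
    denom = len(docs_with_j) + len(docs_with_1) - both  # N(W_j) + N(W_1) - N(W_j W_1)
    # Guard against two terms that occur in no document at all.
    return both / denom if denom else 0.0


# Hypothetical toy postings: term -> ids of the documents containing it.
postings: Dict[str, Set[int]] = {
    "retrieval": {1, 2, 3, 4},
    "indexing": {2, 3, 5},
    "cluster": {6},
}

# Two shared documents out of five distinct ones: 2 / (4 + 3 - 2).
r_ri = association(postings["retrieval"], postings["indexing"])
# No shared documents, so the coefficient is zero.
r_rc = association(postings["retrieval"], postings["cluster"])
print(r_ri, r_rc)
```

Under this reading, a request term would be expanded with the descriptors whose coefficient against it is highest, computed either over the whole collection or within a single cluster, which corresponds to the second and third search strategies the abstract describes.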