Information Retrieval List Digest 159 (April 20, 1993)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-159

IRLIST Digest   ISSN 1064-6965
April 20, 1993
Volume X, Number 15   Issue 159
**********************************************************
I. NOTICES
   A. Meeting Announcements/Calls for Papers
      1. EACL '93 [I'm a little late on this..]
II. QUERIES
   B. Requests for Information
      1. Codes for the Calculation of Similarity
      2. Automated Aids to Controlled Vocabulary Indexing
IV. PROJECT WORK
   C. Abstracts
      1. IR-Related Dissertation Abstracts
**********************************************************
I. NOTICES

I.A.1.
Fr: EACL 1993
Re: EACL93: Final Programme and Conference Information

Sixth Conference of the European Chapter of the
Association for Computational Linguistics
21-23 April 1993, Utrecht
Hosted by OTS (Research Institute for Language and Speech)

OVERVIEW OF THE PROGRAMME:

Invited Speakers:
* Ken Church (AT&T Bell Laboratories): ``Termworks: Tools for Human Translators''
* Ivan Sag (CSLI Stanford): ```Extraction' without Traces, Empty COMPs or Function Composition''
* Johan van Benthem (ILLC, University of Amsterdam): ``Grammar as Proof Theory''

Tutorials:
* Uses of Dynamic Logic in NL Processing
  Jeroen Groenendijk and Martin Stokhof (ILLC, University of Amsterdam)
* Recent Developments in Unification-based NL Processing
  Hans Uszkoreit (University of Saarbruecken)
* Statistical Methods in NL Processing
  Mark Liberman (IRCS, University of Pennsylvania) and Yves Schabes (MERL, Cambridge, MA)
* Complexity Issues in NL Processing
  Leen Torenvliet (ILLC, University of Amsterdam)

CONFERENCE:
Papers will be presented on a wide range of topics in Computational Linguistics. The programme features special sessions on
* Data-oriented methods in CL and
* Logic and CL.

STUDENT SESSION:
This year, for the first time, the EACL conference will include a student session. This session provides a forum for students to present work in progress.

INFORMATION SESSION:
An information session will be held on European Infrastructural Organisations (EACL, EAGLES, EAMT, ECI, ELSNET, FoLLI) with the cooperation of Susan Armstrong-Warwick, Norbert Brinkhoff, Roberto Cencioni, Maghi King, Ewan Klein, Erik-Jan van der Linden and Antonio Zampolli.

POSTER SESSIONS AND DEMONSTRATIONS:
Authors will present and discuss their projects and/or demonstrate NLP-programs.

EXHIBITIONS:
At the conference there will be a book exhibition by publishers in the field of CL (Walter de Gruyter & Co, Elsevier Science Publishers, Kluwer Academic Publishers Group, Cambridge University Press), demonstrations of commercial linguistic software (Silver Platter Information), and information desks of ACL and OTS.

ADDITIONAL MEETINGS:
Two meetings will be organised in conjunction with EACL93:
* a workshop on MT Lexicons, and
* the General Assembly of the EAMT (European Association for Machine Translation).
For information on these meetings see below.

PARALLEL ACTIVITIES:

Workshop on MT Lexicons:
The workshop will consist of four moderated discussion sessions, with audience participation encouraged. The discussion topics are:
* Lexical Semantics, General Lexicography and MT Lexicography. Bonnie Dorr (moderator), David Farwell, Martha Palmer, Antonio Sanfilippo, Clare Voss.
* Economy of Lexicon Acquisition due to Generativity of the Lexicon. Ann Copestake, James Pustejovsky (moderator).
* Metalanguages for Meaning Specification. Wilfried Hoetker, Petra Ludewig, Sergei Nirenburg (moderator), Boyan Onyshkevich, Patrick St-Dizier.
* Automating MT Lexicon Acquisition. James Cowie, Louise Guthrie (moderator), Judith Klavans, Yuji Matsumoto, Evelyne Tsoukermann.

The size of the workshop is restricted by the available space. Participant slots are still available. They will be allocated on a first-come, first-served basis.
To request participation, please contact:

   Sergei Nirenburg
   Center for Machine Translation, School of Computer Science
   Carnegie Mellon University, 5000 Forbes Avenue
   Pittsburgh, PA 15213-3890
   fax: 1-412-268-6298, e-mail: sergei.nirenburg@cs.cmu.edu

General Assembly of the EAMT (European Association for Machine Translation):
Non-members are welcome but will not have voting rights. For further information, please contact:

   Maghi King
   ISSCO
   54 Route des Acacias
   CH-1227 Geneva
   Switzerland
   email: king@divsun.unige.ch

GENERAL ADDRESSES:
If you want more information on the conference, or if you want to leave a message for one of the conference participants, please contact:

CONFERENCE OFFICE:

   Before 19 April:             During the conference (19-23 April):
   EACL93                       EACL93
   OTS                          CSB
   Trans 10                     Kromme Nieuwegracht 39
   NL-3512 JK Utrecht           NL-3512 HD Utrecht
   The Netherlands              The Netherlands
   Tel: +31 30 53 63 77         Tel: +31 30 364515
   Fax: +31 30 53 60 00         Fax: +31 30 53 60 00
   Email: eacl93@let.ruu.nl     Email: eacl93@let.ruu.nl

For information on the ACL in general, contact Don Walker (global), or Mike Rosner (for Europe):

   Dr. Donald E. Walker (ACL)            Dr. Michael Rosner (ACL)
   Bellcore, MRE 2A379                   IDSIA
   445 South Street, Box 1910            Corso Elvezia 36
   Morristown, NJ 07960-1910, USA        CH-6900 Lugano, Switzerland
   walker@flash.bellcore.com             mike@idsia.uu.ch

We all wish you a very pleasant conference,

Steven Krauwer, Michael Moortgat, Louis des Tombe (Conference Chair)
Anne-Marie Mineur, Yvon Wijnen (Student Chair)
Renee Pohlmann (Local Coordinator)

**********

I.A.2.
Fr: Mike Shepherd
Re: Canadian Association for Information Science 21st Annual Conference

PRELIMINARY PROGRAM

INFORMATION AS A GLOBAL COMMODITY: COMMUNICATION, PROCESSING AND USE

CAIS/ACSI'93
21st Annual Conference
Canadian Association for Information Science
St.
Francis Xavier University
Antigonish, Nova Scotia, Canada
12-14 July 1993

The annual CAIS/ACSI conference provides a national and international forum for the presentation and discussion of research and development in information science. The focus of this year's conference is on information as a global commodity. A description of a half-day tutorial on "Accessing Large Amounts of Text", the preliminary program, and conference registration follow:

SUNDAY, July 11, 1993

2-5: TUTORIAL: Accessing Large Amounts of Text: Query Languages, Interaction Protocols, and Text Models (Dr. Forbes Burkowski, Univ. of Waterloo)

Recent advances in computer technology (LAN interconnected RISC based computers coupled with arrays of small disks) have provided the economic basis for powerful yet inexpensive distributed computing environments. The key system implementation issues revolve around the theme of client/server architectures with a general trend to implement departmental or enterprise-wide solutions that often displace the mainframe platform in a downsizing exercise. When this client/server environment supports a text retrieval system, the user interface (incorporating some type of query language) runs on the client machine and communicates with one or more retrieval engines each running on a server machine. Many system implementation issues (modularity, ease of modification, etc.) will be aided if this client/server communication adheres to the rigorous specification of a well formulated interaction protocol.

This tutorial presents a review of various query languages and interaction protocols (CCL, CD-RDx, and SFQL) now being used in the text retrieval industry. This will be followed by a review of research being done in the area (interaction protocols based on hierarchical text algebras, context-free grammars, etc.). The motivation, history, and architecture of each interaction protocol are presented along with a description of the underlying text model.
A comparative analysis of each protocol discusses its suitability for various application domains.

BIOGRAPHY OF THE PRESENTER:
Forbes Burkowski is an Associate Professor in the Computer Science Department at the University of Waterloo. He has developed text retrieval systems in both the research and commercial sectors. His recent work has concentrated on two areas of research: access methods for structured documents and the modelling of text hierarchies, the objective being to provide a well defined conceptual layer between user interface and lower level access methods. While on sabbatical in 1992, he worked as a Chief Scientist in Text Retrieval for Systemhouse Ltd. While in this position he worked at Dow Jones and Company evaluating software and hardware configurations for their future SuperText news retrieval service.

PRELIMINARY CONFERENCE PROGRAM

MONDAY, July 12, 1993

9-10:15: Opening Addresses
   Assessment Indicators and the Impact of Information on Development (Martha Stone, IDRC, Ottawa).

10:30-12: IT in Less-Developed Countries
   IT Landmarks in Less-Developed Countries: The Chilean Case (R. Baeza-Yates, D. Fuller, J.A. Pino, University of Chile);
   Developing Modern Information Infrastructure in Africa (J. Abawagy, Dalhousie University);
   CD-ROM for Agricultural Researchers in Egypt (B. Grainger, McGill University).

1:30-3: Information Technology and Libraries
   Inter-Institutional Borrowing in Nova Scotia Higher Education Institutes (L. Beltaos, St. Francis Xavier Univ);
   Evaluating CD-ROM Software: A Model (T. Richards & C. Robinson, Univ. of Western Ontario);
   New Approach to IBM-PC for Accessing Library of National Chiao-Tung University, Taiwan).

3:30-5: Communication
   Communication Technologies and Human Subjectivity (B. Frohmann, Univ. of Western Ontario);
   How NOT to Market Telecommunications (W. Sheridan, Ottawa);
   Study of Facilitating Interorganizational Collaboration (R. Inskip, Univ. of Alberta).

TUESDAY, July 13, 1993

9-10: Invited Speaker (IT in Nova Scotia, NovaKnowledge).

10:30-12: IT Issues
   X.500 More than a Global Directory (J. Hong, A. Marshall, M. Bauer, Univ. of Western Ont.);
   A Study of the Effect of Controlling Flow of Information through Imposition of Statutes (M.A. Williamson, Univ. of Western Ontario);
   Launching the SnoopGuard PC Access-Control (J. Nash & M. Nash, Ottawa).

1:30-3: Informatics
   Assumptions in the Naming of Information (H. Olson, Univ. of Alberta);
   A Note on Maximum Impact Factors (R. Rousseau, Belgium);
   Research Methods Used in Information Science (P. Bernhard, Univ. of Montreal).

3:30-5: Annual General Meeting

BANQUET

WEDNESDAY, July 14, 1993

9-10: Invited Speaker (Multilingual Access to Document Databases, Steven Pollitt, University of Huddersfield, UK)

10:30-12: Information Retrieval - I
   Performance in ART1-like Neural Network for Document Clustering (K. MacLeod, Saint Mary's Univ., Halifax);
   A Multi-Agent Distributed Retrieval System (J. Nie, N. Anquetil, J. Vaucher, Univ. de Montreal);
   Multimodal Access to Text Data Streams (C. Watters & M. Shepherd, Dalhousie Univ.).

1:30-3: Information Retrieval - II
   Hypertext Maintenance (R. Robson & T. Guan, Univ. of New Brunswick);
   Performance Evaluation of Large Text Retrieval Systems (F. Burkowski, Univ. of Waterloo);
   Fast Adaptive Data Compression for Information Retrieval (M. Nelson, Univ. of Western Ontario).

3:30-5: Information Retrieval - III
   Information Retrieval from MIDI Encoded Music Files (J. Tague-Sutcliffe, Univ. of Western Ontario);
   The Effect of a CD-ROM Interface on Children's Retrieval Performance (J. Beheshti, McGill Univ);
   VIBE Access to Data Sets (R. Korfhage & B. Pharmanto, Univ. of Pittsburgh);
   Evaluation of Genetic Algorithm Solutions (C. Carrick & K. MacLeod, Saint Mary's Univ., Halifax).

FOR COMPLETE INFORMATION CONTACT:

   Prof. Ernst J. Schuegraf
   Box 55, St.
   Francis Xavier University
   Antigonish, Nova Scotia
   Canada B2G 1C0
   Fax: 902-867-5153
   email: schuegraf@essex.stfx.ca

**********************************************************
II. QUERIES

II.B.1.
Fr: Yanhong Li
Re: Codes for the Calculation of Similarity

I am looking for code (C, C++, Perl, shell scripts, Pascal, etc.) for the calculation of document similarity, especially for Cosine Similarity. Other similarity calculations are also needed.

Thanks,

Yanhong Li                            716-838-6493(h) 645-3198(o)
Dept. of Computer Science             Good good study
SUNY at Buffalo, Buffalo, NY 14260    Up up every day

**********

II.B.2.
Fr: David Lewis
Re: Query: Automated Aids to Controlled Vocabulary Indexing

In a recent discussion on PACS-L, several people expressed the belief that artificial intelligence software was more likely to be of use as an aid to human indexing than as a replacement. I know of a couple of systems of this sort that have been fielded by Carnegie Group. I would be interested in getting pointers to other such software that currently exists, or to published research in this area. I'm particularly interested in systems that suggest controlled vocabulary categories to be assigned, with the human indexer having the opportunity to confirm or override the machine's suggestion.

If you reply to me I will post a summary of replies to this list, and will include your name along with your comment unless otherwise requested.

Thanks, Dave

**********************************************************
IV. PROJECT WORK

IV.C.1.
Fr: Susanne M. Humphrey
Re: Selected IR-Related Dissertation Abstracts

The following are citations selected by title and abstract as being related to Information Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of the Dissertation Abstracts Online database produced by University Microfilms International (UMI).
Included are UMI order number, title, author, degree, year, institution; number of pages, one or more Dissertation Abstracts International (DAI) subject descriptors chosen by the author, and abstract. Unless otherwise specified, paper or microform copies of dissertations may be ordered from University Microfilms International, Dissertation Copies, Post Office Box 1764, Ann Arbor, MI 48106; telephone for U.S. (except Michigan, Hawaii, Alaska): 1-800-521-3042, for Canada: 1-800-268-6090. Price lists and other ordering and shipping information are in the introduction to the published DAI. An alternate source for copies is sometimes provided. Dissertation titles and abstracts contained here are published with permission of University Microfilms International, publishers of Dissertation Abstracts International (copyright by University Microfilms International), and may not be reproduced without their prior permission.

AN University Microfilms Order Number ADG92-11645.
AU SRINIVASAN, VENKATACHARY.
TI ON-LINE PROCESSING IN LARGE-SCALE TRANSACTION SYSTEMS.
IN The University of Wisconsin - Madison Ph.D. 1992, 210 pages.
SO DAI V53(03), SecB, pp1475.
DE Computer Science.
AB In this thesis, we provide techniques to adapt current database technology to account for the following trends that can be observed in database management system (DBMS) usage: (1) DBMSs are being increasingly used in applications, like computerized stock trading, that have very high transaction rates. (2) Database sizes are growing rapidly, and future databases are expected to be several orders of magnitude larger than the largest databases in operation today. (3) Next generation DBMSs are expected to gravitate more and more towards what is referred to as 24(hour) $\times$ 7(day) operation. In order to handle high transaction rates, future DBMSs have to use highly concurrent algorithms for managing often-used auxiliary data structures like indices.
To better understand the performance of concurrency control algorithms for index access, we first compare the performance of B-tree concurrency control algorithms using a simulation model of a centralized DBMS. Our performance study compares a number of proposed algorithms over a wide range of resource conditions, tree structures, and workloads. Based on the performance results, we characterize how specific details of a concurrency control algorithm can enhance or reduce concurrency. On-line DBMS utilities are an important step towards achieving the goal of 24 $\times$ 7 operation for very large databases. This thesis addresses issues involved in executing on-line utilities by developing several new algorithms for on-line index construction. These algorithms each permit an index to be built while the corresponding data is concurrently accessed for reads and writes. A comprehensive performance study of the proposed on-line index construction algorithms is used to determine the best candidate for use in a DBMS. Applying the techniques used for on-line index construction to query processing leads to a new, highly concurrent method of query execution called compensation-based query processing. In this new approach to query processing, concurrent updates to any data participating in a query are communicated to the query's on-line query processor, which then compensates for these updates so that the final answer reflects changes caused by the updates. Very high concurrency is achieved by locking data only briefly, at the tuple-level, while still delivering transaction-consistent answers to queries.

AN University Microfilms Order Number ADG92-22193.
AU SU, LIHENG MARK.
TI CONCEPTUAL MODELING FRAMEWORK AND QUERY LANGUAGE FOR MULTI-MODEL INFORMATION SYSTEMS.
IN Illinois Institute of Technology Ph.D. 1991, 134 pages.
SO DAI V53(03), SecB, pp1476.
DE Computer Science. Information Science.
AB This thesis proposes a general framework for developing a fully integrated Multi-Model Information System (MMIS) environment. Ideas from the database, knowledge-base, object modeling, information retrieval and software engineering disciplines are used to establish a foundation of support for the entity modeling, process modeling and policy modeling aspects of an integrated MMIS environment. A Multi-Model Query Language is presented which allows the expression of multi-model selection criteria for extracting information specified in the multi-model knowledge base. It also serves as a common interface standard for the design methodology it supports. It is the finding of this research that (1) a general conceptual reference framework provides an excellent common ground for providing solutions to MMIS integration issues; (2) the capabilities of the proposed multi-model query/browsing facility far exceed that of conventional query languages, and (3) the proposed modeling and query approaches are easily adaptable to various multi-model information systems such as CASE, and CAD/CAM.

AN University Microfilms Order Number ADGMM-62338.
AU SHAW, GOUTAM KUMAR.
TI DESIGN AND IMPLEMENTATION OF A NATURAL LANGUAGE BASED EXPERT SYSTEM FOR A MULTIMEDIA DATABASE.
IN University of Ottawa (Canada) M.A.Sc. 1990, 135 pages.
SO MAI V30(03) pp848.
DE Engineering, Electronics and Electrical. Artificial Intelligence. Health Sciences, Radiology.
IS ISBN: 0-315-62338-1.
AB This thesis is concerned with the design and implementation of a Natural Language Based Expert System for a Multimedia Database for radiology applications. The design consists of three independent modules, namely the Natural Language Processor, the Expert System for query analyzing and result outputting, and the Database Management module (query execution module). These modules interact with each other through shared data and storage areas.
The thesis emphasizes the integration of an expert system with a database management system to retrieve multimedia data stored in the database. At this stage of the development, all the three modules can interact with each other. The input to the system is a query in English for retrieving the data from the database. The system can check the syntax and semantics of the input queries, generate as well as execute equivalent database queries, and produce adequate results. The Natural Language Interface and the query and result analyzers are written in MPROLOG on a SUN workstation, whereas the database software module is written in C with embedded SQL commands for retrieving the data managed by ORACLE Relational DBMS. Examples of data modeling and the system organization are discussed and some sample runs are illustrated.

**********************************************************
IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.

Send subscription requests to: LISTSERV@UCCVMA.BITNET
Send submissions to IRLIST to: IR-L@UCCVMA.BITNET

Editorial Staff:
   Clifford Lynch   calur@uccmvsa.ucop.edu or calur@uccmvsa.bitnet
   Nancy Gusack     ncgur@uccmvsa.bitnet or ncgur@uccmvsa.ucop.edu
   Mary Engle       meeur@uccmvsa.bitnet

The IRLIST Archives is now set up for anonymous FTP, as well as via the LISTSERV. Using anonymous FTP via the host dla.ucop.edu, the files will be found in the directory pub/irl, stored in subdirectories by year (e.g., /pub/irl/1993). Using LISTSERV, send the message INDEX IR-L to LISTSERV@UCCVMA.BITNET. To get a specific issue listed in the Index, send the message GET IR-L LOGYYMM, where YY is the year and MM is the numeric month in which the issue was mailed, to LISTSERV@UCCVMA (Bitnet) or LISTSERV@UCCVMA.UCOP.EDU. You will receive the issues for the entire month you have requested.

These files are not to be sold or used for commercial purposes.
Contact Nancy Gusack or Mary Engle for more information on IRLIST.

THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THE CONTENTS OF THEIR SUBMISSIONS TO IRLIST.