Information Retrieval List Digest 196 (January 17, 1994)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-196

IRLIST Digest ISSN 1064-6965
January 17, 1994
Volume XI, Number 3
Issue 196

**********************************************************

I. NOTICES
   A. Meeting Announcements/Calls for Papers
      1. ASIS Mid-Year '94: Advance Notice
   C. Miscellaneous
      1. Optical Disc Archive System/Satellite Images
II. QUERIES
   A. Requests for Information
      1. IR Textbooks
      2. Data Compression
IV. PROJECT WORK
   C. Abstracts
      1. IR-Related Dissertation Abstracts

**********************************************************

I. NOTICES

I.A.1.
Fr: American Society for Information Science
Re: ASIS - Mid-Year Meeting Info

-- ADVANCE NOTICE --
- ASIS CONTINUING EDUCATION OFFERINGS -
- AT THE -
- 1994 MID-YEAR MEETING -
- NAVIGATING THE NETWORKS -
- PORTLAND, OREGON -
- MAY 21-25, 1994 -

FOR REGISTRATION INFORMATION -- Call 301/495-0900 or send e-mail to asis@cni.org

NOTE: All ASIS members will receive a Conference program and registration form in the mail.

Saturday, May 21, 1994

NEW! * BUILDING AN ELECTRONIC NETWORK INFORMATION CENTER: IMPLEMENTING NETWORK TOOLS (Alan Emtage)
     * CROSSING THE INTERNET THRESHOLD (Roy Tennant)
     * TELECOMMUNICATIONS FOR DATA TRANSFER AND MANAGEMENT (Jim Rush)

Sunday, May 22, 1994

NEW! * BEYOND THE INTERNET THRESHOLD: RESOURCE TOOLS (Roy Tennant)
NEW! * LIBRARY AUTOMATION SOFTWARE, SYSTEMS, AND SERVICES: AN UPDATE ON AVAILABLE RESOURCES (Pamela R. Cibbarelli)
NEW! * MANAGEMENT AND PRIVACY ISSUES FOR INTERNET SERVICE PROVIDERS (Alan Emtage)
NEW! * PRACTICAL INDEXING (Jessica Milstead)
NEW! * ENTERTAINMENT TECHNOLOGY AND INFORMATION SERVICES (Tom Kinney)

**********

I.C.1.
Fr: Nick Kew
Re: Optical Disc Archive System/Satellite Images

Optical Disc Archive of Satellite Images
Optical Disc Archive System (ODAS).

*** THIS IS NOT AN OFFICIAL ANNOUNCEMENT ***

Hello,

I have just joined this list, on the basis of the "ListofLists" description.
In case members may be interested, I am posting a brief description of my work on an Optical Disc Archive System, and its context. Further information is available by request.

*********************************************************************

ESRIN is the data handling centre of the European Space Agency (ESA). Its functions include the archiving, processing and distribution of satellite images from ESA's and partner organisations' missions.

In general, several versions of each image are stored. These include different levels of processing, and "quicklook" or "browse" products. The latter are GIF format files, typically of the order of 100K. The full images may in some cases be 100Mb for a single image.

ESRIN currently uses an optical disc jukebox for physical storage. This was supplied with low-level driver software and a UNIX-emulation filesystem.

Archive management software has been developed at ESRIN, providing interactive and batch access to the archive and a dynamic map thereof, and implementing a full system of protections and privileges. Access to the archive is limited by the hardware (OD mount times, speed of read/write), but the software is configured to give priority to quick and/or privileged requests.

The ODAS system is now operational at ESRIN. At the time of writing, the archive includes CZCS (Coastal Zone Color Scanner) and AVHRR (Advanced Very High Resolution Radiometer) products, from the NIMBUS and NOAA satellites respectively. It is planned to bring further product types into the ODA in the near future. The archive currently occupies about 130 9Gb optical discs (not all full) and is growing.

Conceptually, the archive extends beyond the jukebox. In future, further hardware and different archive media may be introduced, although this is by no means certain.

Limited access to the Archive is available by anonymous ftp. Work in progress is developing much improved facilities for general users.
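
The prioritisation mentioned above (quick and/or privileged requests served first) can be illustrated with a small sketch. The post does not describe the actual ODAS scheduling policy, so everything here is an assumption: the class name `RequestQueue`, the choice of file size as a stand-in for "quick", and the rule that privileged requests always go ahead of unprivileged ones are all hypothetical.

```python
import heapq
import itertools


class RequestQueue:
    """Hypothetical archive request queue (not the ODAS implementation).

    Ordering assumption: privileged requests first, then smaller
    (quicker) requests, then first-come first-served.
    """

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving arrival order

    def submit(self, name, size_bytes, privileged=False):
        # Lower tuples sort first: 0 = privileged, 1 = ordinary.
        key = (0 if privileged else 1, size_bytes, next(self._counter))
        heapq.heappush(self._heap, (key, name))

    def next_request(self):
        # Pop the highest-priority request name.
        _key, name = heapq.heappop(self._heap)
        return name

    def __len__(self):
        return len(self._heap)


queue = RequestQueue()
queue.submit("full_image", 100_000_000)          # large full-resolution product
queue.submit("quicklook.gif", 100_000)           # small browse product
queue.submit("operator_batch", 50_000_000, privileged=True)

order = [queue.next_request() for _ in range(len(queue))]
print(order)
```

Under these assumed rules the privileged batch job is served first, then the small quicklook file, then the full image. A real system limited by disc mount times would probably also group requests by the disc they need, which this sketch ignores.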
---------------------------------------------------------------------

If anyone is interested, or is involved in similar or related work in which I might be interested, I should be pleased to hear from you.

Nick Kew
nick@mail.esrin.esa.it

**********************************************************

II. QUERIES

II.A.1.
Fr: Stephen Quirolgico
Re: Seeking Good Textbooks

Can anyone suggest some good information retrieval textbooks that provide a good foundation of the theory and computation aspects of info retrieval? I have looked at some texts, and they seem to be very non-technical and geared towards library science. Are there any other, more technical books available?

Thanks,
Stephen Quirolgico

**********

II.A.2.
Fr: Nick Kew
Re: Data Compression

I have recently developed archive management software, for an optical disc archive of satellite images. I am now considering data compression for the Archive, and whether we can do better than UNIX "compress":

* Files of the order of 100 Mb (uncompressed)
* Access times hardware-limited, so unpacking speed unlikely to be critical.
* No requirement for "zcat" or equivalent.
* Must be reliable

I have a mathematical background including some information theory. However, I have no easy access to an academic library or bookshop, so references could be a problem if not available online. (Is there a real online general library yet? Please tell me?)

Thank you,
Nick Kew
nick@mail.esrin.esa.it

**********************************************************

IV. PROJECT WORK

IV.C.1.
Fr: Susanne M. Humphrey
Re: Selected IR-Related Dissertation Abstracts

The following are citations selected by title and abstract as being related to Information Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of the Dissertation Abstracts Online database produced by University Microfilms International (UMI).
Included are UMI order number, title, author, degree, year, institution; number of pages, one or more Dissertation Abstracts International (DAI) subject descriptors chosen by the author, and abstract. Unless otherwise specified, paper or microform copies of dissertations may be ordered from University Microfilms International, Dissertation Copies, Post Office Box 1764, Ann Arbor, MI 48106; telephone for U.S. (except Michigan, Hawaii, Alaska): 1-800-521-3042, for Canada: 1-800-268-6090. Price lists and other ordering and shipping information are in the introduction to the published DAI. An alternate source for copies is sometimes provided.

Dissertation titles and abstracts contained here are published with permission of University Microfilms International, publishers of Dissertation Abstracts International (copyright by University Microfilms International), and may not be reproduced without their prior permission.

AN University Microfilms Order Number ADG92-32964.
AU VADAPARTY, KUMAR V.
TI QUERYING DATABASES WITH NON-DETERMINISTIC INFORMATION.
IN Rutgers The State University of New Jersey - New Brunswick Ph.D. 1992, 224 pages.
SO DAI V53(06), SecB, pp2995.
DE Computer Science.
AB An important requirement of the advanced database applications involving planning, designing, scheduling, etc., is the ability to store and reason with "choices" or non-determinism such as "Part #12 can be implemented using either nickel or cobalt," "The conference will be in either NYC or Boston," etc. Most of the current database formalisms do not have this capability. This thesis develops the necessary theory that enables storing non-deterministic data possibly associated with constraints, and evaluating ad hoc queries on such data. First, it develops a data model and a query language to store and reason with non-deterministic data. Second, it addresses the safety issues of the data model and the data complexity and the expressibility of the query language.
Next, it identifies a number of database and query language parameters, and shows how the database administrator can use these parameters to restrict the data complexity to PTIME, coNP, different rungs of the exponential hierarchy, etc. Finally, it extends the data model to represent constraints as well, and formalizes a notion of redundant choices and examines the complexity of detecting redundant choices. It also develops algorithms that detect and delete redundant choices.

AN University Microfilms Order Number ADG92-30569.
AU WAISANEN, ANTHONY.
TI A MULTI-PARADIGM APPROACH TO QUERY PROCESSING: INTEGRATING STRUCTURAL, BEHAVIORAL, AND HISTORICAL KNOWLEDGE.
IN George Mason University Ph.D. 1992, 132 pages.
SO DAI V53(06), SecB, pp2995.
DE Computer Science. Engineering, System Science. Information Science.
AB This dissertation describes the general problem of integrating structural, behavioral, and historical knowledge sources in terms of the specific problem of processing access queries. These queries are presumed to be submitted to a federation of semantically heterogeneous, physically distributed, autonomous database systems. The processing of these queries is controlled by a centralized processor. The component databases in the federation provide the centralized processor with structural and behavioral knowledge, views of their respective schemas and statistical profiles (such as the number of tuples expected to be retrieved during a SELECT operation, the cardinality of each relation fragment, the size of each attribute in each relation fragment, and the number of distinct values for each attribute in each relation fragment). These views and profiles, however, may be inaccurate at a given moment since the component databases are not tightly coupled with the federation-level query processing system. Thus, an approach which generates access strategies solely on the basis of schemas and statistical profiles risks the possibility of making erroneous calculations.
In our approach, access strategies are generated by using multiple paradigms which cooperate throughout the reasoning process. The views of the data dictionaries and statistical profiles of the component databases are used in conjunction with histories of previously executed queries and query fragments. We show how our approach of using multiple knowledge sources and multiple, cooperating systems avoids regenerating inefficient or costly access strategies even in the presence of incomplete or inaccurate information. An architecture for the system that uses this approach is described in detail. Various approaches to query processing are presented and are contrasted with our approach. We also show how our approach may be applied to a related problem, fault diagnosis in long-distance telecommunications networks.

AN University Microfilms Order Number ADG92-29682.
AU KIM, YANGWOO.
TI DESIGN OF OPTICAL PATTERN MATCHER FOR VERY LARGE FULL-TEXT INFORMATION RETRIEVAL SYSTEM.
IN Syracuse University Ph.D. 1992, 181 pages.
SO DAI V53(06), SecB, pp3069.
DE Engineering, Electronics and Electrical. Information Science.
AB Processing of very large unformatted databases or information retrieval requires very large secondary storage, very high I/O bandwidth, and massive parallelism. However, the processing of such large unformatted databases in conventional electronic computers becomes I/O as well as compute bounded. In this research, optics, which is known to have inherent parallelism, very high bandwidth, and the noninterfering propagation property, is examined as a possible solution to the problem. This research is divided into three parts. The first part is devoted to identifying the problems and studying optical factors related to information retrieval such as optical storage and optical digital data processing.
In the second part, the role and impact of optics on information retrieval are studied, and various optical techniques are examined and applied in order to improve the performance of information retrieval systems. Finally, the last part consists of a design and evaluation of an optoelectronic hybrid full-text information retrieval system that handles the data in optical form from storage to processing. The design called Optoelectronic Full Text Retrieval System (OPTORETRIEV) includes a two-dimensional optical pattern matcher containing an optical disk based photorefractive joint transform correlator. The design is an optoelectronic hybrid type in order to take advantage of the inherent parallelism and very high bandwidth of optics as well as the advantage of proven reliability and complexity of electronic systems. Detailed design and development of algorithms for various full-text search operations, and performance evaluation of OPTORETRIEV are given. It is estimated that the system can perform pattern matching at a rate which is two or more orders of magnitude faster than current electronic systems, depending on the type of search operation. These preliminary results indicate that significant improvement in performance can be achieved by incorporating optical techniques in information retrieval, provided that suitable optical hardware is available.

AN University Microfilms Order Number ADG92-30755.
AU LI, LIYA.
TI A HISTORY OF CHINESE LIBRARY CLASSIFICATION: 1949-1991.
IN Southern Illinois University at Carbondale Ph.D. 1992, 132 pages.
SO DAI V53(06), SecA, pp1707.
DE Information Science. Library Science.
AB As the theoretical foundation of Chinese library classification systems, "san xing" had been the focus of study among Chinese library researchers ever since its creation in the early 1950s. In the past, the emphasis in the research was always on the ideological aspect.
No one studied the internal relationships among its three components and their influence on the structure and organizations of the Chinese library classification systems created after 1949. By tracing the origin, development and the application of the theory "san xing" in the four Chinese library classification systems, this researcher studied and assessed its influence on the formation of the Library Classification of the People's University of China (LCPUC), the Library Classification for Small-and-Medium-Sized Libraries (LCS), the Library Classification of Chinese Academy of Sciences (LCCAS), and the Chinese Library Classification (CLC). An evaluation of the validity of "san xing" in serving as the theoretical foundation of the four systems was an additional focus of her study. The study was conducted from a historical perspective, because the historical method provided the best approach in probing into the causes for the creation and development of the theory "san xing," in helping her analyze the internal relationships among its three components, in revealing its nature, and in predicting the trend of the development of this theory. A review of related literature was conducted by examining 56 research articles concerning the problems of this study. Analysis of data was accomplished by studying the original copies of the 1980 edition of the LCPUC and the 1980 and 1990 editions of the CLC. Earlier editions of the LCPUC, the LCS, the LCCAS, and the CLC were also examined and analyzed through secondary sources. This researcher found that the four Chinese library classification systems were heavily influenced by Mao Zedong's political and cultural philosophy and that such political influence caused practical problems and instability of the four Chinese library classification systems. 
The findings also revealed that "san xing," the theoretical foundation of the four Chinese library classification systems, was the product of political needs and was not valid as a theory from either a theoretical or a practical point of view.

AN University Microfilms Order Number ADGMM-63229.
AU PAULLEY, GLENN N.
TI INFORMATION RETRIEVAL USING SIGNATURES IN AN OFFICE ENVIRONMENT.
IN The University of Manitoba (Canada) M.Sc. 1990, 174 pages.
SO MAI V30(04) pp940.
DE Information Science. Computer Science. Business Administration, General.
IS ISBN: 0-315-63229-1.
AB An implementation of an information retrieval application for an office environment is described. The application uses a relatively new signature method developed by Faloutsos (Doubly-Compressed Bit Slices, or DCBS). DCBS was originally intended for use with Optical Disk technology, but may also be used with standard magnetic media. The implementation of the retrieval application is discussed from a number of perspectives, including a comparison of known signature techniques and implementation trade-offs. Characteristics of the office environment are discussed, and comparisons are made between theoretical search results and actual results in the model office. An evaluation of the use of this application in the model office shows a high degree of success in meeting the requirements of its users. Finally, ideas for future enhancements are presented along with topics for future research.

**********************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.
Send subscription requests to: NCG@UCCMVSA.UCOP.EDU
Send submissions to IRLIST to: NCG@UCCMVSA.UCOP.EDU

Editorial Staff:
   Clifford Lynch  calur@uccmvsa.ucop.edu or calur@uccmvsa.bitnet
   Nancy Gusack    ncg@uccmvsa.ucop.edu or nancy.gusack@dla.ucop.edu
   Mary Engle      mee@uccmvsa.ucop.edu or mary.engle@dla.ucop.edu

The IRLIST Archives is now set up for anonymous FTP, as well as via the LISTSERV. Using anonymous FTP via the host dla.ucop.edu, the files will be found in the directory pub/irl, stored in subdirectories by year (e.g., /pub/irl/1993). Using LISTSERV, send the message INDEX IR-L to LISTSERV@UCCVMA.BITNET. To get a specific issue listed in the Index, send the message GET IR-L LOGYYMM, where YY is the year and MM is the numeric month in which the issue was mailed, to LISTSERV@UCCVMA (Bitnet) or LISTSERV@UCCVMA.UCOP.EDU. You will receive the issues for the entire month you have requested.

These files are not to be sold or used for commercial purposes.

Contact Nancy Gusack or Mary Engle for more information on IRLIST.

THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THE CONTENTS OF THEIR SUBMISSIONS TO IRLIST.
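
As an illustration of the archive naming scheme described above, the following sketch builds the LISTSERV message body and the FTP directory for a given mailing date. The function names are hypothetical helpers introduced only for this example; the command format (GET IR-L LOGYYMM) and the directory layout (/pub/irl/ followed by the year) are taken from the text above, and nothing here contacts a server.

```python
from datetime import date


def listserv_get_command(mailed: date, list_name: str = "IR-L") -> str:
    """Return the message body requesting all issues mailed in that month.

    YY is the two-digit year and MM the two-digit month, as described
    in the digest footer.
    """
    return f"GET {list_name} LOG{mailed:%y%m}"


def ftp_directory(mailed: date) -> str:
    """Return the anonymous-FTP subdirectory for the year of the issue."""
    return f"/pub/irl/{mailed.year}"


issue_date = date(1994, 1, 17)           # mailing date of this issue
print(listserv_get_command(issue_date))  # GET IR-L LOG9401
print(ftp_directory(issue_date))         # /pub/irl/1994
```

The GET request returns every issue mailed in the requested month, so the day of the month does not affect the command.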