Information Retrieval List Digest 123 (August 3, 1992)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-123
=========================================================================
Date: Mon, 3 Aug 1992 11:29:37 PST
Reply-To: "Information Retrieval List"
Sender: "Information Retrieval List"
From: IRLIST
Subject: IR-L Digest, Vol.IX,No.27,Issue 123

IRLIST Digest    August 3, 1992    Volume IX, Number 27    Issue 123
**********************************************************
I. NOTICES
   C. Miscellaneous
      1. New Information Visualization Tools
      2. IR-L Policy Questions
IV. PROJECT WORK
   A. Initiatives and Proposals
      1. Call for Statement of Interest and Experience: Capture and Storage of Electronic Theses and Dissertations
   C. Abstracts
      1. IR-Related Dissertation Abstracts
**********************************************************
I. NOTICES

I.C.1.
Fr: Ben Shneiderman
Re: New Information Visualization Tools

The first of a new generation of information visualization tools is available from the Human-Computer Interaction Laboratory at the University of Maryland. After 2 years of development, the algorithms that permit color presentation of hierarchical structures have now been converted into a program, TreeViz, for viewing all of the files and directories on a Macintosh hard disk. Each file appears as a rectangle whose size is proportional to the file size, enabling users to spot large files at any level in the hierarchy. TreeViz uses color to show file type, e.g. text, picture, application, etc. By pointing at and clicking a rectangle, TreeViz users can bring up detailed information about nodes such as filename, path, creation date, etc. Other options include sound, which offers an additional dimension of data revelation. Users can hear the directories and files as they are displayed or hear patterns by dragging the mouse. Various scaling factors, nesting offsets, depth controls, shape adjustments, shading, and size controls complete the list of TreeViz features.
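To illustrate how each file can be given a rectangle whose area is proportional to its size, here is a minimal sketch of a "slice-and-dice" tree-map layout in Python. It is not the TreeViz source code. The names Node and layout, the alternating split direction, and the sample file names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A file (size in bytes, no children) or a directory (children)."""
    name: str
    size: int = 0
    children: List["Node"] = field(default_factory=list)

    def total(self) -> int:
        # A directory weighs as much as everything beneath it.
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


Rect = Tuple[str, float, float, float, float]  # name, x, y, width, height


def layout(node: Node, x: float, y: float, w: float, h: float,
           depth: int = 0, out: Optional[List[Rect]] = None) -> List[Rect]:
    """Divide the rectangle (x, y, w, h) among the children of `node`
    in proportion to their total sizes, alternating the split direction
    at each level. Returns one rectangle per leaf."""
    if out is None:
        out = []
    if not node.children:
        out.append((node.name, x, y, w, h))
        return out
    total = node.total()
    offset = 0.0
    for child in node.children:
        frac = child.total() / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: slice the width into side-by-side strips.
            layout(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            # Odd depth: slice the height into stacked strips.
            layout(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


if __name__ == "__main__":
    # Hypothetical disk contents, sizes in bytes.
    disk = Node("disk", children=[
        Node("a.txt", 60),
        Node("docs", children=[Node("b.txt", 30), Node("c.txt", 10)]),
    ])
    for name, rx, ry, rw, rh in layout(disk, 0.0, 0.0, 100.0, 100.0):
        print(f"{name}: x={rx:.1f} y={ry:.1f} w={rw:.1f} h={rh:.1f}")
```

Because every directory passes its whole rectangle down to its children, the leaf rectangles fill the display completely, and a large file stands out as a large rectangle regardless of how deeply it is nested.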
The original concept for TreeViz was developed by Dr. Ben Shneiderman, Professor of Computer Science, in response to the common problem of a filled hard disk. Since the hard disk in the HCIL was shared by 14 users, it was difficult to determine how and where space was used. Finding large files that could be deleted, or even determining which users consumed the largest shares of disk space, were difficult tasks. Finding an effective visualization strategy took only a few months, but producing a working piece of software took over a year. Brian Johnson implemented the algorithms and refined the presentation strategies while preserving rapid performance even with 5,000 node hierarchies. The TreeViz application runs on all color Macintosh models.

TreeViz Orders
Office of Technology Liaison
4312 Knox Road
University of Maryland
College Park, MD 20742
(301) 405-4208
FAX: (301) 314-9871

Related Technical Papers:

Brian Johnson and Ben Shneiderman. Tree-maps: A Space-Filling Approach to the Visualization of Hierarchical Information Structures. Proc. IEEE Visualization'91 (San Diego, California, October 1991), 284-291.

Ben Shneiderman. Tree visualization with Tree-maps: A 2-d space-filling approach. ACM Transactions on Graphics 11(1) (January 1992), 92-99.

Dave Turo and Brian Johnson. Improving the Visualization of Hierarchies with Treemaps: Design Issues and Experimentation. May 1992, CAR-TR-626, CS-TR-2901, Department of Computer Science, University of Maryland, to appear in IEEE Visualization '92.

**********
I.C.2.
Fr: Nancy Gusack, IR-L Moderator
Re: IR-L Policy Questions

In IR-L Digest Volume IX, Number 12, Issue 108 we published an announcement for the 13th National Computer Conference and Exhibition in Riyadh, Saudi Arabia. One of our subscribers subsequently pointed out that the Saudi government does not grant visas to Jews, and suggested that the inclusion of the conference announcement was inappropriate. It was not our intent to offend anyone.
We try to keep IR-L a low-overhead, timely means of distributing material of interest to the IR community. Our editorial policies have reflected this: We filter material primarily on the basis of relevance and interest to our community, and have never investigated the politics or policies of any contributing or participating nation or organization. The decision to publish or not to publish a submission does not imply endorsement of any politics or policies.

Certain types of submissions are considered inappropriate to IR-L: We receive book advertisements, and try to share the information yet remove the enterprise aspect. This way we can keep readers current, without seeming to endorse the purchase of any material.

We moderate, rather than edit, the submissions to IR-L Digest. Some of our subscribers may be dissatisfied with our moderating policies, and we suffer occasional doubts too. Please send us your opinions on this subject: How far should our moderating powers take us? How should we define or redefine our publication policies? We will publish the feedback, or dialog if such develops, in subsequent issues of IR-L Digest.

**********************************************************
IV. PROJECT WORK

IV.A.1.
Fr: Joan K. Lippincott
Re: Call for Statements of Interest and Experience

INTRODUCTION: The following is a Call for Statement of Interest and Experience prepared and distributed by the secretariat of the Coalition for Networked Information to announce an initiative in a manner that promotes the widest and fairest possible identification of individuals, institutions, or organizations that are willing and able to contribute to that initiative.
The Coalition encourages you to read and reflect upon the following Call and then to contact the person identified below if you are interested in the subject initiative, if you have a contribution that you are willing to make to the subject initiative, and if you would like to discuss this interest and contribution with someone associated with the subject initiative.

PROJECT: The Capture and Storage of Electronic Theses and Dissertations.

DESCRIPTION: The Coalition for Networked Information, Virginia Polytechnic Institute and State University, the Council of Graduate Schools, and University Microfilms International (UMI) seek your assistance in developing a pilot project in which graduate students will write their theses and dissertations in electronic format, either to be stored locally or to be submitted in electronic format to University Microfilms International for future distribution over networks. The work of this group will be directed toward the selection and testing of common software and standards for the writing of theses and dissertations, and the requirements for their storage and retrieval from an Internet server. Participants will also take part in discussions of issues of access, copyright, and usage fees in the networked information environment. The intention of this project is to improve the storage of and access to information in theses and dissertations, to acquaint future scholars with publishing electronically, to increase the amount of scholarly information on networks, and to foster development of new products and services which will evolve from electronic theses and dissertations.

REQUIREMENTS: Institutions and organizations that are interested in this project are encouraged to contact the person identified below (a) to state their interest in the project, and (b) to briefly describe the relevant experience that they have regarding the purposes and outcomes of the project.
The Coalition and its partners in this project are interested in having three representatives from each institution involved with the project: one representing the graduate school, one representing the library, and a third with specific knowledge of text, image, and graphics storage and retrieval and representing institution-wide information technologies. The Coalition and UMI are prepared, if necessary, to defray some of the travel and other expenses incurred by the institutions and organizations that are selected to participate in this process. Please submit a Statement of Interest and Experience for your institution or organization to the person identified below on or before July 31.

CONTACT:
Paul Evan Peters
Director
Coalition for Networked Information
1527 New Hampshire Avenue NW
Washington, DC 20036
Voice: 202-232-2466
Fax: 202-462-7849
Internet: paul@cni.org

BACKGROUND INFORMATION: The Coalition for Networked Information, a joint project of the Association of Research Libraries, CAUSE, and EDUCOM, was founded in March, 1990 to promote the creation of and access to information resources in networked environments in order to enrich scholarship and to enhance intellectual productivity. Currently 167 organizations and institutions belong to the Coalition Task Force, a group of institutions and organizations that make special contributions to the Coalition's projects and activities. Included in the Task Force membership are higher education institutions, publishers, network service providers, computer hardware and system companies, library networks and organizations, and public and state libraries.

Periodically the Coalition issues Calls for Statements of Interest and Experience as a vehicle for announcing initiatives in a manner that promotes the widest and fairest possible identification of individuals, institutions, or organizations that are willing and able to contribute to those initiatives.
Each Call provides a description of the proposed project or study and most include supporting documents. Individuals, institutions, and organizations are encouraged to respond to the Call by submitting (1) a brief (one to two page) statement of the reasons for their particular interest in the project or study, and, if appropriate, a description of how they would pursue the objectives of the project or study, and (2) information about themselves and/or their organization or institution that describes the relevant background and expertise that they have to contribute to the purposes and outcomes of the project.

Calls for Statements of Interest and Experience are not requests for proposals (RFPs) and are not intended to solicit detailed, formal documents as responses. Individuals, organizations, and institutions do not have to be affiliated with the Coalition or members of the Coalition Task Force to respond to the Calls.

Statements of interest and experience are reviewed by relevant Coalition Working Group leaders with the support of the Coalition staff and guidance from the Coalition Steering Committee. Other parties may be involved as explained by an individual Call. Additional information is sometimes requested during this review process. Reviews of statements of interest and experience are carried out in as expeditious and as flexible a fashion as possible, taking care to balance the benefits of a wide and fair search for individuals, institutions, or organizations willing and able to contribute to Coalition initiatives with the benefits of focused and timely action by those initiatives. Upon selection of individuals and/or groups to contribute to a given project or study, all those who submitted a response to the Call are notified of the outcome by the Coalition secretariat.

**********
IV.C.1.
Fr: Susanne M. Humphrey
Re: Selected IR-Related Dissertation Abstracts

The following are citations selected by title and abstract as being related to Information Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of the Dissertation Abstracts Online database produced by University Microfilms International (UMI). Included are UMI order number, title, author, degree, year, institution; number of pages, one or more Dissertation Abstracts International (DAI) subject descriptors chosen by the author, and abstract.

Unless otherwise specified, paper or microform copies of dissertations may be ordered from University Microfilms International, Dissertation Copies, Post Office Box 1764, Ann Arbor, MI 48106; telephone for U.S. (except Michigan, Hawaii, Alaska): 1-800-521-3042, for Canada: 1-800-268-6090. Price lists and other ordering and shipping information are in the introduction to the published DAI. An alternate source for copies is sometimes provided.

Dissertation titles and abstracts contained here are published with permission of University Microfilms International, publishers of Dissertation Abstracts International (copyright by University Microfilms International), and may not be reproduced without their prior permission.

AN University Microfilms Order Number ADG91-29755.
AU MILLER, TODD MARTIN.
TI REACTION CLASSIFICATION FROM COMPUTER DATABASES AND THEIR RETRIEVAL FOR THE SYNTHESIS GENERATION PROGRAM.
IN Brandeis University Ph.D. 1991, 158 pages.
SO DAI V52(06), SecB, pp3071.
DE Chemistry, Organic.
AB The thesis illustrates that the integration of a synthesis generator program and databases of chemical reactions is a practical idea. Retrieve, a computer program, is described which translates widely used reaction databases such as Reaccs and Synlib into the same terms as are used by the Syngen program.
The Syngen program was developed to create a synthesis from a basis of chemical logic to produce synthetic routes which are "theoretically reasonable" but not tied to specific literature examples. Routes that were previously just "paper chemistry" in Syngen are now linked to concrete examples of closely matching reactions from the classified databases. The same logical basis is used in Retrieve again to classify refunctionalization reactions in the databases as well. For both types of reactions a comprehensive reaction retrieval interface was also developed and is described. Within the work, database reactions are subdivided into constructions and refunctionalizations by the net structural change. These reactions then provide distinct reaction types which are organized by successive layers of definition for retrieval of analogous precedents. The result is an interface to over 27,000 construction reaction precedents for Syngen and another 53,000 refunctionalizations accessible by searching with the Retrieve program. Syngen now embodies both the logic of synthesis design and empirical reaction expertise from the literature to automatically provide computer generated synthesis routes with examples of current, analogous reactions from the literature. The classification overlaid upon the databases also has the benefit of expressing the content of the database in terms of generalized reaction types. The resulting programs represent the integration of synthesis design and reaction databases.

AN University Microfilms Order Number ADGD--93744.
AU NKWENTI-AZEH, BLAISE.
TI AN INVESTIGATION INTO THE STRUCTURE OF THE TERMINOLOGICAL INFORMATION CONTAINED IN SPECIAL LANGUAGE DEFINITIONS. (VOLUMES I AND II).
IN Univ. of Manchester Inst. of Science and Technology (United Kingdom) Ph.D. 1989, 510 pages.
SO DAI V52(06), SecB, pp3153.
DE Computer Science. Artificial Intelligence. Information Science.
AB Available from UMI in association with The British Library. Requires signed TDF. This thesis is concerned with a description of the structure of special subject definitions, with a view to the representation of specialised terminology for a diverse range of computational applications. The fundamental premise on which the study is based is that special language definitions contain information about the knowledge structure of special subjects; a formalisation of this information would, besides enabling a better understanding of the conceptual structure of each field, provide a new type of information based on inferences about the relationships between the defined term and the defining terms. The thesis falls into three major parts. The first part presents an overview of some theories of concept and term description and reviews the differences that have been posited between general language words and special language terms. The second part of the thesis describes in some detail the existing conventional and computational approaches and tools for representing lexicographic and terminological information, and assesses their potential adequacy for representing relationships between special language concepts. The final part of the thesis contains results of statistical and other analyses carried out on a corpus of approximately 1,000 definitions of standardised Data Processing terms. A partial formalisation of the terminological content of these definitions is presented, to demonstrate the types of formalisation that may be required. The implications of the formalisation for retrieval are outlined and an indication is given of possible areas of application for the methodology and results of the study.

AN University Microfilms Order Number ADG91-34704.
AU WANG, TSONG-LI.
TI QUERY OPTIMIZATION IN DATABASE AND INFORMATION RETRIEVAL SYSTEMS.
IN New York University Ph.D. 1991, 168 pages.
SO DAI V52(06), SecB, pp3159.
DE Computer Science.
AB Recently, several prototype and commercial systems based on a loosely-coupled shared-nothing architecture have been proposed and built for database applications. To achieve speed-ups proportional to the number of processors for operations such as selections and joins, such systems often distribute data across storage units using a hashing function. In the first part of this thesis, we investigate ways of minimizing response time for various multi-join queries in such systems. We develop a dynamic programming algorithm for queries whose closures are chains. We next prove the NP-completeness for more general queries and propose four heuristics for them. We then evaluate experimentally the relative performance of these heuristics and their performance relative to optimums. The empirical results show that a hybrid heuristic combining our chain algorithm with a heuristic related to Kruskal's spanning tree algorithm performs well. In the second part of the thesis, we present a scheme to answer best-match queries from a file containing a collection of objects. A best-match query is to find the objects in the file which are closest (according to some (dis)similarity measure) to a given target. Previous work suggested that one can reduce the computational effort required to achieve the desired results using the triangle inequality when starting with a data structure for the file which reflects some precomputed intrafile distances. We generalize the technique to allow the optimum use of any given set of precomputed intrafile distances. We then extend our scheme to a class of queries for retrieving similar or dissimilar objects that commonly arise in vision and molecular biology. Artificial data and actual protein sequences are used to illustrate the effectiveness of our scheme for different queries, and to compare its performance with previous algorithms. 
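[Editorial illustration, not part of the abstract.] The idea of using the triangle inequality with precomputed intrafile distances to answer a best-match query can be sketched as follows. This is a simple pivot-based scheme chosen for illustration, not the algorithm developed in the thesis. The names build_table and best_match, the choice of pivots, and the sample points are hypothetical, and the distance function is assumed to be a true metric.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Dist = Callable[[Point, Point], float]


def build_table(objects: Sequence[Point], pivots: Sequence[Point],
                dist: Dist) -> List[List[float]]:
    """Precompute the distance from every object in the file to every pivot."""
    return [[dist(obj, p) for p in pivots] for obj in objects]


def best_match(target: Point, objects: Sequence[Point],
               pivots: Sequence[Point], table: List[List[float]],
               dist: Dist) -> Tuple[Point, float, int]:
    """Return (closest object, its distance, number of object distances computed).

    For any pivot p, the triangle inequality gives
        dist(target, obj) >= |dist(target, p) - dist(p, obj)|,
    so an object whose lower bound is already no better than the best
    distance found so far can be skipped without computing its distance.
    """
    to_pivots = [dist(target, p) for p in pivots]
    best, best_d, computed = None, math.inf, 0
    for obj, row in zip(objects, table):
        lower = max((abs(tp - op) for tp, op in zip(to_pivots, row)),
                    default=0.0)
        if lower >= best_d:
            continue  # cannot beat the current best; skip the computation
        d = dist(target, obj)
        computed += 1
        if d < best_d:
            best, best_d = obj, d
    return best, best_d, computed


if __name__ == "__main__":
    # Hypothetical file of 2-D points with Euclidean distance.
    points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (10.0, 0.0), (1.0, 1.0)]
    pivots = [(0.0, 0.0), (10.0, 0.0)]
    table = build_table(points, pivots, math.dist)
    print(best_match((2.0, 1.0), points, pivots, table, math.dist))
```

The saving comes from the skipped calls to dist, which matters when the distance itself is expensive to compute, as with the sequence and tree comparisons mentioned in the abstract. How many calls are skipped depends on the pivots and on the order in which objects are visited.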
Finally, we implement our techniques into a tree information system that enables users to retrieve and extract information from trees based on approximate comparison. We expect this system to have applications in pattern recognition, biology, linguistics, and programming languages. The system is implemented in C and X-windows, and is fully operational on SUN workstations.

**********************************************************
IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.

Send subscription requests to: LISTSERV@UCCVMA.BITNET
Send submissions to IRLIST to: IR-L@UCCVMA.BITNET

Editorial Staff:
   Clifford Lynch   lynch@uccmvsa.ucop.edu or calur@uccmvsa.bitnet
   Nancy Gusack     ncgur@uccmvsa.bitnet
   Mary Engle       meeur@uccmvsa.bitnet

The IRLIST Archives will be set up for anonymous FTP, and the address will be announced in future issues.

To access back issues presently, send the message INDEX IR-L to LISTSERV@UCCVMA.BITNET. To get a specific issue listed in the Index, send the message GET IR-L LOG ***, where *** is the month and day on which the issue was mailed, to LISTSERV@UCCVMA.BITNET.

These files are not to be sold or used for commercial purposes. Contact Nancy Gusack or Mary Engle for more information on IRLIST.

The opinions expressed in IRLIST do not represent those of the editors or the University of California. Authors assume full responsibility for the contents of their submissions to IRLIST.