Information Retrieval List Digest 142 (December 16, 1992)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-142

IRLIST Digest ISSN 1064-6965
December 16, 1992
Volume IX, Number 46
Issue 142

**********************************************************
IV. PROJECT WORK
    C. Abstracts
       1. IR-Related Dissertation Abstracts
**********************************************************

IV. PROJECT WORK

IV.C.1.
Fr: Susanne M. Humphrey
Re: Selected IR-Related Dissertation Abstracts

The following are citations selected by title and abstract as being related to Information Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of the Dissertation Abstracts Online database produced by University Microfilms International (UMI). Included are UMI order number, title, author, degree, year, institution; number of pages, one or more Dissertation Abstracts International (DAI) subject descriptors chosen by the author, and abstract. Unless otherwise specified, paper or microform copies of dissertations may be ordered from University Microfilms International, Dissertation Copies, Post Office Box 1764, Ann Arbor, MI 48106; telephone for U.S. (except Michigan, Hawaii, Alaska): 1-800-521-3042, for Canada: 1-800-268-6090. Price lists and other ordering and shipping information are in the introduction to the published DAI. An alternate source for copies is sometimes provided.

Dissertation titles and abstracts contained here are published with permission of University Microfilms International, publishers of Dissertation Abstracts International (copyright by University Microfilms International), and may not be reproduced without their prior permission.

AN University Microfilms Order Number ADG92-17472.
AU BOOKMAN, LAWRENCE ALAN.
TI A TWO-TIER MODEL OF SEMANTIC MEMORY FOR TEXT COMPREHENSION.
IN Brandeis University Ph.D. 1992, 293 pages.
SO DAI V53(01), SecB, pp387.
DE Computer Science.
AB How can the background knowledge associated with our everyday concepts be represented in a machine and how can we obtain and encode such information? This thesis presents a new architecture for semantic memory that provides a framework for addressing the "background-knowledge" problem and discusses the implications of this architecture for a model of text comprehension. Semantic memory consists of two tiers: a relational tier that represents the underlying structure of our cognitive world expressed as a set of dependency relationships between concepts, and an analog semantic feature (ASF) tier that represents the common or shared knowledge about the concepts in the relational tier, expressed as a set of statistical associations. I present an information theoretic approach to automatically acquiring and encoding this knowledge from on-line text corpora. In this approach the background knowledge common to a community is encoded using a finite vocabulary of ASFs. The ASFs used are based on the category structure of a thesaurus. The two levels of semantic memory support two complementary views of comprehension. One view, the "fine-grain" view, captures the many details of interaction between context and world knowledge as time-trajectories through concept space. This view permits a deeper understanding of a text. A second view, the "coarse-grain" view, captures in the form of a weighted semantic graph called an interpretation graph, a set of explicit semantic relationships that can be used to reason about the understanding of a text, which includes the ability to summarize the text and extract what is important. This view corresponds to a shallow understanding of the text. Several computational techniques are presented for comparing at two levels--the relational and the ASF--the underlying similarity of two passages.
The techniques developed are embodied in two computer programs--LeMICON, a structured connectionist implementation, and SSS, a symbolic implementation--designed to explore the system's comprehension of 16 short texts from the stock market domain. The thesis describes an architecture and a mode of processing in which memory is dynamic, exhibits hysteresis effects, and emphasizes what is new about the effect of a given input on the knowledge represented there.

AN University Microfilms Order Number ADG92-16026.
AU KNIGHT, KEVIN CRAWFORD.
TI INTEGRATING KNOWLEDGE ACQUISITION AND LANGUAGE ACQUISITION.
IN Carnegie-Mellon University Ph.D. 1991, 115 pages.
SO DAI V53(01), SecB, pp393.
DE Computer Science. Artificial Intelligence. Language, Linguistics.
AB Very large knowledge bases (KB's) constitute an important step for artificial intelligence and will have significant effects on the field of natural language processing. This thesis addresses the problem of effectively acquiring two large bodies of formalized knowledge: knowledge about the world (a KB), and knowledge about words (a lexicon). The central observation is that these two bodies of knowledge are highly redundant. For example, the syntactic behavior of a noun (or a verb) is highly correlated with certain physical properties of the object (or event) to which it refers. It should be possible to take advantage of this type of redundancy in order to greatly reduce both the time and expertise required to build large KB's and lexicons. This thesis describes LUKE, a software tool that allows a knowledge base builder to create an English language interface by associating words and phrases with KB entities. LUKE assumes no linguistic expertise on the part of the user, because that expertise is built directly into the tool itself. LUKE draws its power from a large set of heuristics about how words are typically used to describe the world. These heuristics exploit the redundancy between linguistic and world knowledge.
When a word or phrase is associated with some KB entity, LUKE is able to accurately guess features of the word based on features of the KB entity. LUKE can also hypothesize new words and word senses based on the existence of others. All of LUKE's hypotheses are displayed to the user for verification, using a format designed to tap the user's basic linguistic intuitions. LUKE stores its lexicon in the KB. Truth maintenance links ensure that changes in the KB are automatically propagated to the lexicon. LUKE compiles lexical entries into data structures convenient for natural language parsing and generation programs. Lexicons acquired by LUKE have been used by KBNL, a knowledge-based natural language system, for applications in information retrieval, machine translation, and KB navigation. This work identifies several dozen heuristics that encode redundancies between linguistic representations and representations of world knowledge. It also demonstrates the usefulness of these heuristics in a working lexical acquisition system.

AN University Microfilms Order Number ADG92-17860.
AU MUMICK, INDERPAL SINGH.
TI QUERY OPTIMIZATION IN DEDUCTIVE AND RELATIONAL DATABASES.
IN Stanford University Ph.D. 1991, 239 pages.
SO DAI V53(01), SecB, pp397.
DE Computer Science. Artificial Intelligence.
AB Optimization is critical to the success of declarative database systems. We develop a powerful extended magic-sets transformation (EMST) for optimization of complex queries in relational and deductive database systems. EMST works by rewriting database queries so that predicates are applied as early as possible (predicate push down) during a bottom-up evaluation of the rewritten query. The magic-sets transformation has earlier been proposed as an optimization technique for recursive queries in deductive databases. We strengthen the technique into an invaluable optimization for all types of queries in practical database systems.
(1) The magic-sets transformation can only use equality predicates to restrict computation. We develop the ground magic-sets transformation to push down arbitrary built-in predicates, such as Salary > 70K.
(2) The magic-sets transformation is not applicable in the presence of duplicates and aggregates supported by practical query languages such as SQL. We define formal semantics for a language with duplicates, aggregates, and recursion, and define EMST for such a language.
(3) We demonstrate the importance of EMST on nonrecursive queries through performance experiments on IBM's DB2 database system. We compare EMST with correlation, a traditional SQL optimization technique for pushing down predicates in nonrecursive queries. The conclusion is that EMST is a stable transformation and should replace correlation.
A subset of the extended magic-sets transformation has been implemented in the Starburst extensible database system being developed at the IBM Almaden Research Center. The extended magic-sets transformation furthers the state of the art in query optimization for deductive and relational databases.

AN University Microfilms Order Number ADG92-16394.
AU SEMMEL, RALPH D.
TI A KNOWLEDGE-BASED APPROACH TO AUTOMATED QUERY FORMULATION.
IN University of Maryland Baltimore County Ph.D. 1992, 368 pages.
SO DAI V53(01), SecB, pp398.
DE Computer Science. Artificial Intelligence.
AB Formulating queries over a relational database is a complex activity requiring extensive syntactic and semantic knowledge. Syntactic knowledge is required to ensure that a query is well-formed and references existing relations and attributes. Semantic knowledge is required to ensure that a query satisfies user intent. While a user often has a general understanding of the contents of the database, design decisions to partition and store data in optimal ways may conflict with the user's intuition or expectations.
Consequently, database specialists are needed to formulate queries, and ad hoc access to the database is impeded. This dissertation presents an approach to automated query formulation that significantly reduces the amount of knowledge required by database users. In particular, database design knowledge is used to formulate queries automatically in response to high-level requests that specify only attributes of interest. By using knowledge of the target query language, syntactically correct queries are assured. Similarly, by using a knowledge-rich conceptual model of the database, the most likely intent of a request can be inferred. This dissertation makes five main contributions. First, it demonstrates how database design knowledge can be used to automate query formulation. Specifically, the benefits of using a knowledge-rich conceptual schema based on the Entity-Relationship (ER) model are presented. Second, it introduces contexts as a means for organizing database design knowledge into overlapping conceptual schema subgraphs that correspond to relations which can be natural joined in a lossless way. By using ER graph dependency knowledge, contexts can be generated automatically. Moreover, once design knowledge has been organized into contexts, formulating queries is straightforward. Third, the dissertation demonstrates that contexts can be used as a database design tool to identify semantic inconsistencies in a conceptual schema. These inconsistencies can be reduced via an iterative process of conceptual schema refinement and context adaptation. Fourth, it establishes how ER graph properties and transformation knowledge can be used for semantic query optimization. Resultant queries will use the fewest relations possible. Finally, it illustrates how contexts and design knowledge can be used to construct intelligent database interfaces.

AN University Microfilms Order Number ADG92-17458.
AU MENDRINOS, ROXANNE BAXTER.
TI APPLICATIONS OF CD-ROM TECHNOLOGY FOR REFERENCE PURPOSES: A SURVEY OF SECONDARY SCHOOL LIBRARY MEDIA SPECIALISTS IN PENNSYLVANIA AND MAINE.
IN Boston College Ph.D. 1992, 284 pages.
SO DAI V53(01), SecA, pp128.
DE Education, Technology. Education, Secondary. Library Science.
AB CD-ROM's entry into the secondary school library media center has been referred to as the silent revolution. CD-ROM is a 4.75 inch laser disk capable of holding 250,000 pages of information or 550 megabytes which translates into thirty 20 megabyte hard disk drives, 2,000 high resolution pictures, or 74 minutes of high fidelity sound in any combination. Information on a compact disk is in digital form and can be read and printed directly from the optical disk to the computer and its printer. CD-ROM databases are rapidly appearing in traditionally print-oriented school library media centers requiring school library media specialists to reevaluate practices of information retrieval. There is no empirical research on the applications of CD-ROM as a reference tool within the environment of secondary school library media centers. There is a knowledge void on how the introduction of CD-ROM based data alters reference services in secondary school library media centers. This study has implications for (a) the physical layout of the school library media center, (b) services that are provided by trained staff and (c) services that the library user can conduct on their own. The purpose of this dissertation research is to establish a baseline of data relating to the use of CD-ROM databases for reference purposes in secondary school library media centers. This use will be examined with respect to the knowledge, experience, and attitudes of school library media specialists as well as selected demographic and financial data.
The information from this study will allow school library media specialists and administrators contemplating the use of CD-ROM technology: (a) to learn from the experiences of others; (b) to identify the most widely used CD-ROM laser disks; (c) to examine the implications of budget, and staffing on CD-ROM use; (d) to compare patterns of use within the curriculum; (e) to compare services offered by trained staff; (f) to provide data on security issues and (g) to compare methods of evaluation.

AN University Microfilms Order Number ADG92-17847.
AU LEHMANN, HAROLD PHILIP.
TI A BAYESIAN COMPUTER-BASED APPROACH TO THE PHYSICIAN'S USE OF THE CLINICAL RESEARCH LITERATURE.
IN Stanford University Ph.D. 1992, 287 pages.
SO DAI V53(01), SecB, pp195.
DE Health Sciences, Medicine and Surgery. Biology, Biostatistics. Computer Science. Artificial Intelligence.
AB To date, automated statistical methods used to help physicians use the clinical research literature for making clinical decisions have been limited in the degree to which they can represent methodological and domain concepts that are crucial to the physician who must take clinical action. In this dissertation, I consider the thesis that Bayesian decision theory can provide the foundation for a computer-based environment that helps physicians to use the research literature. On the basis of a knowledge-level analysis of this problem, I argue for the use of Bayesian statistics over classical statistics. To show that this new paradigm can be implemented in a functioning computer system, I have developed a prototype system, called THOMAS, that enables a physician to read a research report, to incorporate her domain knowledge and methodological concerns, and to evaluate their impact on the clinical significance of the conclusion. The system effectively automates the Confidence Profile Method of Eddy, Hasselblad, and Shachter (1991).
THOMAS operates in the domain of randomized clinical trials that compare the effects of different drugs on a patient's survival. To incorporate any methodological concern, THOMAS (1) requires a statistical submodel for the concern, and (2) requires a visual metaphor through which the physician can communicate the particular concern. THOMAS contains submodels for the methodological concerns of loss to followup, withdrawal, noncompliance, crossing-over, and measurement unreliability. The system uses the visual metaphor of the patient-flow diagram for physician input. In the course of each consultation, the user implicitly constructs a statistical model. Statistical models are represented as hierarchical, typed influence diagrams, a structure that limits the interactions among parameters in a statistical model. Prespecified construction steps dictate how the primitive methodological submodels are pieced together. A metadata-state diagram, containing basic methodological knowledge assessed from a statistical expert and from the methodological literature, limits the sequence of construction steps the user is allowed. This dissertation puts on the medical-informatics agenda the question of how physicians should act on the basis of research data, and suggests novel methods for storing, using, and retrieving the contents of the biomedical research literature.

AN University Microfilms Order Number ADG92-15572.
AU JOHNSON, ROBERT RALPH.
TI RHETORIC AND USE: TOWARD A THEORY OF USER-CENTERED COMPUTER DOCUMENTATION.
IN Purdue University Ph.D. 1991, 309 pages.
SO DAI V53(01), SecA, pp135.
DE Language, General. Computer Science. Psychology, Industrial.
AB This study investigates the nature of user-centered print and online computer documentation, and draws from the disciplines of rhetoric and human factors to argue that print and online user documents are part of a discourse complex that is situationally constrained.
A theoretical framework, called the discourse complex of user-centered computer documentation, is offered as a foundation for analyzing user documents within a variety of situational constraints. Specifically, this study focuses on the situational activity of users within their active situations of learning through doing and doing. Theories of human activity from human activity design research are used to develop a vocabulary for analyzing both print and online user documents within these active situations. An analysis is carried out using this vocabulary. In addition, the problems of literacy are addressed, and it is suggested that print and online user documentation will help to redefine the parameters of literacy in the years to come.

AN University Microfilms Order Number ADGNN-66124.
AU SPACEK, RICHARD ADAM MIROSLAV.
TI STAGE ACTION IN TUDOR AND STUART DRAMA: ANALYSIS AND CLASSIFICATION.
IN The University of New Brunswick (Canada) Ph.D. 1990, 266 pages.
SO DAI V53(01), SecA, pp23.
DE Theater. Literature, English.
IS ISBN: 0-315-66124-0.
AB The interpretation of textual evidence concerning the character of Tudor and Stuart theatrical performance is often treated as a simple operation unworthy of extensive examination. Accordingly, the process is taken for granted, and methodological discussion is rare. This is inappropriate; however obvious the textual indications of stage action may have been to their contemporary audience, the disruption of English theatrical practice caused by the closing of the theatres in 1642 ensured that the modern interpreter must read at a disadvantage. Staging practices must be reconstructed from evidence of various kinds, including that provided by play texts. Thus, in the study of Tudor and Stuart drama there is a particular need for a system of organizing the available information into a manageable form. Such a system can only be produced by the consistent application of logically structured interpretative methods.
Computer-generated concordances (such as those prepared by Marvin Spevack) and textual analysis software packages represent useful tools, but they do not by themselves solve the problem of selectively retrieving examples of particular types of staging from the great corpus of Tudor and Stuart drama. For a variety of reasons, stage directions describing both simple and complex actions are not included with any regularity in the primary texts; the mingling of theatrical and dramatic references in such directions as there are further confuses matters. In any case, concordances and simple searches are inadequate to the task of detecting evidence of action contained in dialogue. The present study of sixty plays is an attempt to devise a system of classification which cross-indexes indications of theatrical action with dramatic events in such a way as to allow for flexible comparisons of analogous staging. Following an Introduction outlining aspects of the history of theatrical and dramatic taxonomy, the first chapter provides a brief assessment of the range of staging that has traditionally been the object of study. The second offers techniques of analyzing indications of theatrical and dramatic action, and the final chapter presents the classification system created through the analysis of a group of texts consisting of Shakespeare's Folio plays together with a number of works by other dramatists, including Marlowe, Beaumont, Fletcher, Heywood, and Ford. The ideal product of this study would have been a classification system of universal applicability. Owing to the nature of the material, it was impossible to produce a truly objective scheme which met the criteria of simplicity, conciseness, and utility. However, the orientation towards stage action of the system developed does offer particular advantages to scholars interested in the reconstruction of Tudor and Stuart staging. 
**********************************************************
**********************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.

Send subscription requests to: LISTSERV@UCCVMA.BITNET
Send submissions to IRLIST to: IR-L@UCCVMA.BITNET

Editorial Staff:
Clifford Lynch  calur@uccmvsa.ucop.edu or calur@uccmvsa.bitnet
Nancy Gusack    ncgur@uccmvsa.bitnet
Mary Engle      meeur@uccmvsa.bitnet

The IRLIST Archives will be set up for anonymous FTP, and the address will be announced in future issues.

To access back issues presently, send the message INDEX IR-L to LISTSERV@UCCVMA.BITNET. To get a specific issue listed in the Index, send the message GET IR-L LOGYYMM, where YY is the year and MM is the numeric month in which the issue was mailed, to LISTSERV@UCCVMA (Bitnet) or LISTSERV@UCCVMA.UCOP.EDU. You will receive the issues for the entire month you have requested.

These files are not to be sold or used for commercial purposes.

Contact Nancy Gusack or Mary Engle for more information on IRLIST.

THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THE CONTENTS OF THEIR SUBMISSIONS TO IRLIST.