Information Retrieval List Digest 217 (June 13)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-217

6.1 June 13, 1994 Volume XI, Number 24 Issue 217
**********************************************************
IV. PROJECTS
    A. Abstracts
       1. IR-Related Dissertation Abstracts
**********************************************************
IV. PROJECT WORK

IV.A.1.
Fr: Susanne M. Humphrey
Re: Selected IR-Related Dissertation Abstracts

The following are citations selected by title and abstract as being related to Information Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of the Dissertation Abstracts Online database produced by University Microfilms International (UMI). Included are UMI order number, title, author, degree, year, institution; number of pages, one or more Dissertation Abstracts International (DAI) subject descriptors chosen by the author, and abstract.

Unless otherwise specified, paper or microform copies of dissertations may be ordered from University Microfilms International, Dissertation Copies, Post Office Box 1764, Ann Arbor, MI 48106; telephone for U.S. (except Michigan, Hawaii, Alaska): 1-800-521-3042, for Canada: 1-800-268-6090. Price lists and other ordering and shipping information are in the introduction to the published DAI. An alternate source for copies is sometimes provided.

Dissertation titles and abstracts contained here are published with permission of University Microfilms International, publishers of Dissertation Abstracts International (copyright by University Microfilms International), and may not be reproduced without their prior permission.

AN University Microfilms Order Number ADG94-02326.
AU DAOUD, AMJAD M.
TI EFFICIENT DATA STRUCTURES FOR INFORMATION RETRIEVAL.
IN Virginia Polytechnic Institute and State University Ph.D. 1993, 197 pages.
SO DAI v54(08), SecB, pp4244.
DE Computer Science.
AB This dissertation deals with the application of efficient data structures and hashing algorithms to the problems of textual information storage and retrieval. We have developed static and dynamic techniques for handling large dictionaries, inverted lists, and optimizations applied to ranking algorithms. We have carried out an experiment called REVTOLC that demonstrated the efficiency and applicability of our algorithms and data structures. Also, the REVTOLC experiment revealed the effectiveness and ease of use of advanced information retrieval methods, namely extended Boolean (p-norm), vector, and vector with probabilistic feedback methods. We have developed efficient static and dynamic data structures and linear algorithms to find a class of minimal perfect hash functions for the efficient implementation of dictionaries, inverted lists, and stop lists. Further, we have developed a linear algorithm that produces order preserving minimal perfect hash functions. These data structures and algorithms enable much faster indexing of textual data and faster retrieval of best match documents using advanced information retrieval methods. Finally, we summarize our research findings and some open problems that are worth further investigation.

AN University Microfilms Order Number ADG94-03493.
AU FUJIHARA, HIROKO.
TI KNOWLEDGE ACQUISITION PROCESS MODEL: STRUCTURING CONCEPTS FROM UNSTRUCTURED DATA.
IN Texas A&M University Ph.D. 1993, 246 pages.
SO DAI v54(08), SecB, pp4246.
DE Computer Science.
AB Knowledge acquisition (KA) is one of the most important and problematic aspects of developing knowledge-based systems. Many automated tools have been introduced in the past; however, manual techniques are still heavily used. Interviewing is one of the most commonly used manual techniques for a KA process; however, few automated tools or other support exist to help knowledge engineers enhance their performance.
This dissertation proposes a KA process model in which the knowledge engineer can effectively retrieve, structure, and formalize knowledge components, so that the resulting knowledge base is more accurate and complete. The approach proposed in this work is a hybrid of information retrieval and machine learning techniques. Two IR techniques employing best-match strategies are used: the vector space model and the probabilistic ranking principle model. A prototype of the KA model, Knowledge Acquisition process Model - Structuring Knowledge from Unstructured Data (KAM-SCUD), was implemented to demonstrate the concept. The results from KAM-SCUD were compared with the outputs from a manual KA process in terms of the amount of information retrieved and the process time spent. An analysis of the results shows that the process time to retrieve knowledge components (e.g., facts, rules, protocols, uncertainty) of KAM-SCUD is about half that of the manual process and the number of knowledge components retrieved from KA activities is four times more than that retrieved through a manual process. KAM-SCUD demonstrates the effectiveness of the KA process model proposed in this dissertation.

AN This item is not available from University Microfilms International ADG05-74077.
AU KIM, JUN-TAE.
TI SEMANTIC KNOWLEDGE ACQUISITION FOR INFORMATION EXTRACTION FROM TEXTS ON PARALLEL MARKER-PASSING COMPUTER.
IN University of Southern California Ph.D. 1993.
SO DAI v54(08), SecB, pp4250.
DE Computer Science.
AB Today is known as the information age. The amount of available on-line text is rapidly increasing, and the need for computerized information processing is greater than ever before. For information extraction and retrieval from texts, the knowledge based natural language processing approach has been studied for a long time, and has been successfully applied to selected tasks.
However, knowledge based text processing always faces the difficulty of knowledge base construction when a practical, large scale application is considered. A large knowledge base of domain dependent, semantic and phrase patterns is needed, and manual encoding of such semantic patterns is a major obstacle for a real world application. To overcome the scalability problem, an automated acquisition of knowledge should be provided. This thesis deals with two important issues for practical textual information processing: the scalability and the speed. For the scalability, an automatic knowledge acquisition system for information extraction is developed. For the speed, marker-passing based massively parallel pattern matching for parsing is introduced. Efficiency of parallel marker-passing is also demonstrated through the parallel classification algorithm. A semantic pattern knowledge representation for information extraction, which is suitable for automated acquisition and parallel processing, is provided. Through the experiments with a set of news articles, the feasibility of the representation and the acquisition method is demonstrated. The time to construct semantic patterns is significantly reduced, and the saturation of knowledge base is clearly shown. This thesis shows that an automated semantic pattern acquisition, together with an appropriate representation, can provide scalability to knowledge based information extraction by overcoming the knowledge engineering bottleneck. (Copies available exclusively from Micrographics Department, Doheny Library, USC, Los Angeles, CA 90089-0182.)

AN University Microfilms Order Number ADG94-00451.
AU FOX, LOUISE WATSON.
TI DEVELOPMENT OF A SYSTEM FOR PRODUCING PROPERTY-OWNER TIME LINES WITH EDUCATIONAL APPLICATIONS.
IN East Texas State University Ed.D. 1993, 290 pages.
SO DAI v54(08), SecA, pp2865.
DE Education, Curriculum and Instruction. History, United States. History, Latin American. Sociology, Demography.
Information Science. Education, Social Sciences.
AB Purpose. The major purpose of the study was to research, define, plan, and devise a system appropriate for educational and commercial purposes, to be used to create an electronic database in which historical data from land records would be stored. From the database, information on geographically and historically related tracts of land would, through the use of unique identifiers, be retrieved as time lines of property owners. Procedure. Current computer storage and retrieval systems were investigated. Criteria for the proposed system were developed, and plans were devised for storing data and retrieving information as time lines of property owners in particular geographic areas and in specific tracts of land. Research was conducted on sample tracts of land and data were entered in a computer database. Results. Using the identification system, three model time lines, one for each of the three most commonly found land configurations, were produced. Models were for land currently or previously lying within (a) one county and one original survey; (b) one county and two or more original surveys; and (c) two or more counties and one original survey. Conclusions. This database or other databases using the same criteria and method can be utilized in the classroom as well as by religious scholars, historians, genealogists, historical novelists, folklorists, organizations that stress the historical significance of individual persons and for demographic, sociological, and anthropological studies. Such databases are also appropriate for use by land title and abstract companies, practicing attorneys, and real estate and petroleum industries.

AN University Microfilms Order Number ADGNN-81469.
AU NICHOLS, SUSAN E.
TI LAND REGISTRATION IN AN INFORMATION MANAGEMENT ENVIRONMENT.
IN The University of New Brunswick (Canada) Ph.D. 1992, 357 pages.
SO DAI v54(08), SecB, pp4297.
DE Engineering, Civil. Computer Science.
IS ISBN: 0-315-81469-1.
AB A narrow conveyancing perspective in land registration has led to the development of numerous unconnected, specialized registries in most jurisdictions, each maintaining a specific set of land tenure information. A focus on complex legal procedures has also inhibited innovation and system reform. The objective of this research has been to demonstrate how land registration can be more effectively designed to meet broader land management requirements. The conclusion is that this can be accomplished by putting greater emphasis on the information management function of land registration. This thesis provides a synthesis of land registration from an information management perspective. It examines the requirements for tenure information in land management and land administration and develops models for these processes to demonstrate the potential role of land registration. Problems in existing land registration arrangements and recent trends in system development are reviewed. A set of conceptual models has been designed to describe land registration functions, processes, information, and systems from an information management perspective. One of the advantages of the models is that they are independent of specific legal, technical, or administrative arrangements. Using these models, the thesis provides a methodology for evaluating land registration systems and requirements, and a framework for identifying appropriate reform options and developing reform strategies. The research was based on a detailed analysis of requirements in three Canadian jurisdictions, on an evaluation of the Swedish Land Data Bank System, and on site visits in other countries. Although the case studies led to the development of the theoretical models, the research also made practical contributions. The studies became a focus for improved government co-ordination in Newfoundland and the Northwest Territories.
In Prince Edward Island the research also contributed to new government policy and departmental reorganization to improve the management of land tenure information.

AN University Microfilms Order Number ADG94-01679.
AU GLUCK, MYRON HENRY.
TI UNDERSTANDING PERFORMANCE IN INFORMATION SYSTEMS: AN INVESTIGATION OF SYSTEM AND USER VIEWS OF GEOGRAPHIC INFORMATION.
IN Syracuse University Ph.D. 1993, 388 pages.
SO DAI v54(08), SecA, pp2775.
DE Information Science. Library Science. Geography.
AB The goal of this exploratory study was to discover meaningful relationships between the system's and user's views of performance with geographic information, accounting for the role of task and format. The system's view instrumentation tested subjects' ability to perform geographic reading, analysis and interpretation tasks with maps and text. The system's view, based upon the concept of cognitive fit, collected and combined time-on-task and accuracy into a single system performance measure of competence. Subjects described a recent situation in which they had a need for geographic information to build the user's view of system performance. The user's view, based on the sense-making metaphor, collected measures of relevance and satisfaction. The analyses of this study generated direct links between the system's view measure of competence and the user's view measures of relevance and satisfaction. The study also exposed task, format, and experience as indirect links between the views of performance in geographic contexts. Significant findings in this study had effect sizes of approximately one half a standard deviation. Specifically, low competence was related to low satisfaction and low relevance while medium to high competence was related to medium to high satisfaction and relevance. If formats 'fit' the task, they can be safely ignored from a system view. From a user view, both maps and text provide support in resolving geographic information needs.
Tasks are pivotal in understanding performance since competence, satisfaction, and relevance all degrade with increasing task demands. The study also found a moderately strong relationship between relevance and satisfaction at the level of the user's concern. Several hypotheses, based on quantitative and qualitative criteria, were generated that are consistent with a merged, no-fault view of performance in information systems. Future work needs to confirm and validate these hypotheses. This study began to explain the consequences and benefits of a no-fault approach to understanding information systems. Much work remains to apply these results within geographic information systems and products as well as to test them in other domains.

AN University Microfilms Order Number ADG94-01925.
AU SPINK, AMANDA HELEN.
TI FEEDBACK IN INFORMATION RETRIEVAL.
IN Rutgers The State University of New Jersey - New Brunswick Ph.D. 1993, 311 pages.
SO DAI v54(08), SecA, pp2776.
DE Information Science.
AB This study explores the human aspects of feedback in information retrieval (IR). Feedback is an important concept within information retrieval studies, as an underlying aspect of the interaction between user and information retrieval system. But the problem analysis and literature review reveal that feedback is relatively ambiguous and underresearched in information retrieval studies. This lack of research suggests a systematic study of feedback is required to add to our understanding and modeling of the interaction process. The aim of the study is to add to our understanding of the nature of feedback during information retrieval interaction. The general research question is: What is the nature of feedback in IR interaction? The specific questions are: (1) What are the types of feedback? (2) What is the relationship between types of feedback? (3) What is the relationship between feedback and search terminology?
and (4) What are the roles of the user, intermediary and information retrieval system in feedback? Research methods include an exploratory grounded theory micro-analysis, log-linear analysis and Markov analysis. A feedback unit of analysis was identified. 885 feedback occurrences and five types of feedback, reflecting user-intermediary concern with magnitude, relevance and strategy, were identified from 40 user-intermediary-information retrieval system interactions. Sequences of feedback occurrences were examined using log-linear and Markov analysis. Quantitative analysis was also conducted into the effectiveness of search terms identified during relevance feedback. The results show that feedback is a more complex process than previously understood in information retrieval research. Magnitude and relevance feedback were found to be major elements in the online search process. The study proposes a conceptual framework for the concept of feedback and a notion of a grammar of information retrieval interactions.

**********************************************************
IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.

Send subscription requests to: LISTSERV@UCOP.EDU
Send submissions to IRLIST to: IR-L@UCOP.EDU
Or send subscription requests and submissions to: NANCY.GUSACK@UCOP.EDU

Editorial Staff:
    Clifford Lynch    clifford.lynch@ucop.edu
    Nancy Gusack      nancy.gusack@ucop.edu
    Mary Engle        mary.engle@ucop.edu

The IRLIST Archives is now set up for anonymous FTP, as well as via the LISTSERV. Using anonymous FTP via the host dla.ucop.edu, the files will be found in the directory pub/irl, stored in subdirectories by year (e.g., /pub/irl/1993). Using LISTSERV, send the message INDEX IR-L to LISTSERV@UCOP.EDU. To get a specific issue listed in the Index, send the message GET IR-L LOGYYMM, where YY is the year and MM is the numeric month in which the issue was mailed, to LISTSERV@UCOP.EDU.
You will receive the issues for the entire month you have requested.

These files are not to be sold or used for commercial purposes. Contact Nancy Gusack or Mary Engle for more information on IRLIST.

THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THE CONTENTS OF THEIR SUBMISSIONS TO IRLIST.