Information Retrieval List Digest 109 (April 25, 1992)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-109
=========================================================================
Date: Sun, 26 Apr 1992 21:19:54 PST
Reply-To: "Information Retrieval List"
Sender: "Information Retrieval List"
From: IRLIST
Subject: IR-L Digest, Vol.IX, No.13, Issue 109

IRLIST Digest
April 25, 1992
Volume IX, Number 13
Issue 109

**********************************************************

I. NOTICES
   A. Meeting Announcements/Calls for Papers
      1. Electronic Texts in the Humanities: Methods and Tools,
         Summer Seminar, Princeton U., New Jersey, August 9-21, 1992
      2. 3rd ASIS SIG/CR Classification Research Workshop,
         Pittsburgh, PA, October 25, 1992.

II. QUERIES
   B. Requests for Information
      1. Response to request for software for IR experimentation

III. JOB ANNOUNCEMENTS
      1. Lectureship, U. Glasgow, Computer Science Department

IV. PROJECT WORK
   C. Abstracts
      1. IR-Related dissertation abstracts

**********************************************************

I. NOTICES

I.A.1.
Fr: CETH@ZODIAC.BITNET
Re: Humanities Computing Summer Seminar

CENTER FOR ELECTRONIC TEXTS IN THE HUMANITIES

Electronic Texts in the Humanities: Methods and Tools
August 9-21, 1992
Summer Seminar, Princeton University, New Jersey

Co-sponsored by the Centre for Computing in the Humanities, University of Toronto

This first Summer Seminar of the Center for Electronic Texts in the Humanities (CETH) will address a wide range of challenges and opportunities that electronic texts and software offer to teachers and scholars in the humanities. Discussions on text creation, markup, retrieval, presentation, and analysis will prepare the participant for extensive hands-on experience with illustrative software packages, such as MTAS, Micro-OCP, WordCruncher, Tact, Collate, Beowulf Workstation, Perseus, and CD-Word.
Systems of markup, from ad hoc schemes to the systematic approach of the Text Encoding Initiative, will be surveyed and considered.

The focus of the Seminar will be practical and methodological, concerned with the demonstrable benefits of using electronic texts in teaching and research, the typical problems one encounters and how to solve them, and the ways in which software fits or can be adapted to methods common amongst the humanities. Participants will be given the opportunity to work on a coherent project. Those with projects already in progress or preparation will be encouraged to bring them; texts and exercises will be provided for those without a specific project in mind.

The seminar is intended for researchers, librarians and computer center advisers who have basic computing experience, but little or no experience of computers in a humanities research environment. The number of participants will be limited to 26.

SCHEDULE

Week 1, August 9-14, 1992

Sunday, August 9. Registration

Monday, August 10. The electronic text
a.m. What an electronic text is and where to find one; survey of existing inventories, archives, and other current resources. History of computer-assisted text analysis in the humanities. Introduction to simple concordancing with MTAS, including practical session.
p.m. Creating and capturing texts in electronic form; keyboard entry vs. optical scanning. Demonstration of optical character-recognition technology. Introduction to text encoding, surveying ad hoc methods, e.g. COCOA, WordCruncher, TLG beta code; problems of these methods. Systematic approach of the Text Encoding Initiative. Practical exercise in deciding what to encode in typical texts.

Tuesday, August 11. Concordancing
a.m. A focussed look at computer-assisted concordance generation; types of concordances, their specific advantages and disadvantages. Alphabetization, character sequences, sorting, and forms of presentation. Introduction to Micro-OCP; practical session in its use.
p.m. Further work on concordancing with Micro-OCP.

Wednesday, August 12. The interactive concordance
a.m. Indexed, interactive retrieval vs. batch concordance generation. Textual problems particularly suitable to an interactive system; the continuing use of concordances in hardcopy. Preparation of text for indexed retrieval; differing roles of markup and external "rules"; kinds of displays; post-processing of displays. Introduction to Tact.
p.m. Practical work using Tact: simple markup, compilation of a textual database, and methods of inquiry.

Thursday, August 13. Stylistics
a.m. Stylistic comparisons and authorship studies using concordance tools; basic statistics for lexical and stylistic analysis. Case studies, e.g. Federalist Papers, Kenny on Aristotle, Burrows on Jane Austen.
p.m. Practical session using Micro-OCP and/or Tact for stylistic analysis.

Friday, August 14. Critical editions
a.m. Overview of tools for preparing critical editions. Constructing glossaries and material for commentary; application of Micro-OCP and/or Tact.
p.m. Collation; single-text vs. multiple-text methods. Overview of software tools. Introduction to Collate.

Week 2, August 17-21, 1992

Monday, August 17. Text analysis
a.m. Review of the previous week's work. Discussion on the limitations of existing software. Advanced analytical tools not commonly available, e.g. pattern recognizers, lemmatization systems, morphological analyzers, parsers; overview of these.
p.m. Simple, practical morphological analysis and lemmatization with Micro-OCP and/or Tact.

Tuesday, August 18. Developing and Extending Current Resources
a.m. How far do existing textual databases and software go towards satisfying the needs of teachers and scholars, e.g. WordCruncher (ETC) texts, Oxford Electronic Texts, the Thesaurus Linguae Graecae (TLG), the ARTFL database, the Dante Database? How these are accessed and used.
p.m. The electronic dictionary; from machine-readable dictionary to computational lexicon.
What the New OED and other online dictionaries can do for the scholar. Uses of lexical knowledge bases in text retrieval. Building a simple online lexicon with Tact.

Wednesday, August 19. Hypertext
a.m. Hypertext and hypermedia: alternative or complementary approaches to text analysis and presentation? Overview of some ongoing hypertextual projects in the humanities: Beowulf Workstation, Perseus, CD-Word. What essential role does hypertext play in these? How might hypertext and concordancing methods be combined?
p.m. Practical session in building a hypertextual system, using HyperCard or Guide. A brief look at Annota.

Thursday, August 20. Projects (1)
a.m. Illustration of how to tackle projects using one of the methods covered earlier in the seminar; beginning of practical work.
p.m. Practical work continued.

Friday, August 21. Projects (2)
a.m. Practical work continued.
p.m. Concluding discussion of methodologies and problems. Do the results justify the amount of work involved? How is one's perspective on text changed by using automatic methods? What can one learn from the collision of these methods with intuitive perceptions? How can the machine better assist the educated imagination?

CETH: The Center for Electronic Texts in the Humanities was established in October 1991 by Rutgers and Princeton Universities with external support from the Mellon Foundation and the National Endowment for the Humanities. It is intended to become a national focus of interest in the U.S. for those who are involved in the creation, dissemination and use of electronic texts in the humanities, and it will act as a national node on an international network of centers and projects which are actively involved in the handling of electronic texts. Developed from the international inventory of machine-readable texts which was begun at Rutgers in 1983 and is held on RLIN, the Center is now reviewing the records in the inventory and continues to catalog new texts.
The acquisition and dissemination of text files to the community is another important activity, concentrating on a selection of good quality texts which can be made available over Internet with suitable retrieval software and with appropriate copyright permission. The Center also acts as a clearinghouse on information related to electronic texts, directing enquirers to other sources of information.

INSTRUCTORS: The seminar will be taught by Willard McCarty and Susan Hockey, with assistance from Hannah Kaufman, Toby Paff and Mary Sproule.

FOR FULL INFORMATION ABOUT THE SEMINAR AND REGISTRATION, PLEASE CONTACT:

Summer Seminar 1992
Center for Electronic Texts in the Humanities
169 College Avenue
New Brunswick, NJ 08903
U.S.A.

phone: (908) 932-1384
fax: (908) 932-1386
email: ceth@zodiac (bitnet)
       ceth@zodiac.rutgers.edu (internet)

**********

I.A.2.
Fr: Phil Smith
Re: 3rd ASIS SIG/CR Classification Research Workshop

CALL FOR PARTICIPATION
3RD ASIS SIG/CR CLASSIFICATION RESEARCH WORKSHOP

The American Society for Information Science Special Interest Group on Classification Research (ASIS SIG/CR) invites submissions for the 3rd ASIS Classification Research Workshop, to be held at the 55th Annual Meeting of ASIS in Pittsburgh, PA. The workshop will take place Sunday, October 25th, 1992, 8:30 a.m. - 5:00 p.m. ASIS '92 continues through Thursday, October 29th.

The CR Workshop is designed to be an exchange of ideas among active researchers with interests in the creation, development, management, representation, display, comparison, compatibility, theory, and application of classification schemes. Emphasis will be on semantic classification, in contrast to statistically based schemes.
Topics include, but are not limited to:

* Warrant for concepts in classification schemes
* Concept acquisition
* Basis for semantic classes
* Automated techniques to assist in creating classification schemes
* Statistical techniques used for developing explicit semantic classes
* Relations and their properties
* Inheritance and subsumption
* Knowledge representation schemes
* Classification algorithms
* Procedural knowledge in classification schemes
* Reasoning with classification schemes
* Software for management of classification schemes
* Interfaces for displaying classification schemes
* Data structures and programming languages for classification schemes
* Image classification
* Comparison and compatibility between classification schemes
* Applications such as subject analysis, natural language understanding, information retrieval, expert systems.

The CR Workshop welcomes submissions from various disciplines. Those interested in participating are invited to submit a short (1-2 page single-spaced) position paper summarizing substantive work that has been conducted in the above areas or other areas related to semantic classification schemes, and a statement briefly outlining the reason for wanting to participate in the workshop. Submissions may include background papers as attachments.

Participation will be of two kinds: presenter and regular participant. Those selected as presenters will be invited to submit expanded versions of their position papers and to speak to those papers in brief presentations during the workshop. All position papers (both expanded and short papers) will be published in proceedings to be distributed prior to the workshop.

The workshop registration fee is $35.00, which includes lunch and refreshments.
Submissions should be made by email, or diskette accompanied by paper copy, or paper copy only (fax or postal) to arrive by May 15, 1992 to:

Raya Fidel, Graduate School of Library and Information Science, University of Washington, FM-30, Seattle, WA 98195; Internet: fidelr@u.washington.edu. Phone: 206-543-1888; Fax: 206-685-8049.

**********************************************************

II. QUERIES

II.B.1.
Fr: Mark Zimmermann
Re: free indexer/browser software --- reply to IRLIST queries

A couple of msgs in IRLIST of 14 Apr 92 ask about free software for IR experimentation; I have some programs I wrote that may help some users, and they're free, with full source code, under GNU General Public License ... in brief, qndxr.c builds inverted index files to large text files at 50-80 MB/hour (on current Sun and NeXT workstations, in my recent experience); brwsr.c is a command-line-driven browser program that lets you see word lists, key-word-in-context displays, and full text on demand, and does very simple proximity search. Each is about 50kB long, mostly comments.

Todd Kaufmann of CMU has a very nice GNU Emacs interface to the brwsr which makes it easier to use (if you are a GNU Emacs user). I also have a good Macintosh interface to my indexed text files (via external C functions hiding behind HyperCard); it is about 250kB long in binhex'd stuffit'd form, and is called "Free Text", version 1.02.

I've begun rewriting the software to add features such as multifile databases, more flexibility in alphabetization and character mapping, better proximity search, etc.; I'd be happy to share that experimental raw C code too (zndxr.c, zmrgr.c, zbrwsr.c).

Liam Quin has another nice UNIX system for free text IR, 'lq-text', which comes with source code.

A place to discuss these sorts of things is the PARA group, where a few msgs/week are posted on hypertext, free-text IR, Emacs interfaces, etc. Sign up at 'para-request@cs.cmu.edu' if interested...
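
As a rough illustration of the three techniques named in this message (an inverted index, a key-word-in-context display, and a simple proximity search), here is a minimal Python sketch. It is not the qndxr.c or brwsr.c code, which is written in C and works on index files on disk; the function names build_index, kwic, and near, the tokenization rule, and the in-memory data layout are all assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Return (word, character_offset) pairs; words are lowercased runs of letters/digits."""
    return [(m.group().lower(), m.start()) for m in re.finditer(r"[A-Za-z0-9]+", text)]


def build_index(text):
    """Build an inverted index: each word maps to the token positions where it occurs."""
    tokens = tokenize(text)
    index = defaultdict(list)
    for pos, (word, _offset) in enumerate(tokens):
        index[word].append(pos)
    return tokens, dict(index)


def kwic(text, tokens, index, word, width=20):
    """Key-word-in-context: show `width` characters on each side of every occurrence."""
    lines = []
    for pos in index.get(word.lower(), []):
        start = tokens[pos][1]
        end = start + len(word)
        left = text[max(0, start - width):start].replace("\n", " ")
        right = text[end:end + width].replace("\n", " ")
        lines.append(f"{left:>{width}}[{text[start:end]}]{right}")
    return lines


def near(index, a, b, distance=5):
    """Proximity search: positions of `a` that have some `b` within `distance` words."""
    b_positions = index.get(b.lower(), [])
    return [p for p in index.get(a.lower(), [])
            if any(abs(p - q) <= distance for q in b_positions)]


if __name__ == "__main__":
    sample = ("The index maps each word to its positions. "
              "A browser reads the index and shows each word in context.")
    tokens, index = build_index(sample)
    for line in kwic(sample, tokens, index, "word"):
        print(line)
    print(near(index, "browser", "index", distance=3))
```

Storing token positions rather than only a word-to-document mapping is what makes both the context display and the proximity test possible from the index alone; a program meant for large files would presumably keep the index on disk instead of in a dictionary.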
^z - Mark Zimmermann - science@oasys.dt.navy.mil

**********************************************************

III. JOB ANNOUNCEMENTS

III.1.
Fr: Prof Keith Van Rijsbergen
Re: Lectureship at University of Glasgow, CS Department

University of Glasgow, Computing Science Department, Lectureship

Applications are invited for a lectureship in this thriving department. We are seeking a computer scientist specialising in Artificial Intelligence, but with an interest in applying AI in at least one of the following research areas: databases, human computer interaction, or information retrieval. The successful applicant will be expected to contribute significantly to the experimental and practical side of large scale computation in at least one of the above application areas.

Salary will be within the Lecturer scale (12,860 - 23,739 pounds per annum), with placement according to qualifications and experience. The successful applicant will be eligible to join the Universities' Superannuation Scheme and the Universities' Supplementary Dependants' Pension Scheme. Further information regarding these schemes is available from the Superannuation Officer, who is also prepared to advise on questions relating to the transfer of superannuation benefits. Applicants are asked to provide a brief note on the state of their health.

The University of Glasgow is an equal opportunities employer.

Further particulars may be obtained from the Academic Personnel Office, University of Glasgow, Glasgow, G12 8QQ; or from the Head of the Computing Science Department by phoning 041 330 4463 or by sending e-mail to keith@dcs.glasgow.ac.uk

Those who wish to be considered should send to the Academic Personnel Office not later than 30 April 1992, eight copies of a statement of their qualifications and experience. Testimonials are not required but the names and addresses should be given of three persons to whom reference may be made.

In reply please quote Ref. No. 7555.
**********************************************************

IV. PROJECT WORK

IV.C.1.
Fr: Susanne M. Humphrey
Re: Selected IR-Related Dissertation Abstracts

The following are citations selected by title and abstract as being related to Information Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of the Dissertation Abstracts Online database produced by University Microfilms International (UMI). Included are UMI order number, title, author, degree, year, institution; number of pages, one or more Dissertation Abstracts International (DAI) subject descriptors chosen by the author, and abstract. Unless otherwise specified, paper or microform copies of dissertations may be ordered from University Microfilms International, Dissertation Copies, Post Office Box 1764, Ann Arbor, MI 48106; telephone for U.S. (except Michigan, Hawaii, Alaska): 1-800-521-3042, for Canada: 1-800-268-6090. Price lists and other ordering and shipping information are in the introduction to the published DAI. An alternate source for copies is sometimes provided. Dissertation titles and abstracts contained here are published with permission of University Microfilms International, publishers of Dissertation Abstracts International (copyright by University Microfilms International), and may not be reproduced without their prior permission.

AN University Microfilms Order Number ADG91-21157.
AU RODGERS, CHERYL WILEY.
TI AN INTELLIGENT PATH MECHANISM IN HYPERTEXT: INFORMATION FILTERING USING ARTIFICIAL INTELLIGENCE IN A COOPERATIVE PROBLEM-SOLVING ENVIRONMENT.
IN The University of Texas at Arlington Ph.D. 1990, 190 pages.
SO DAI V52(02), SecB, pp938.
DE Computer Science. Artificial Intelligence.
AB In recent years researchers have begun experimenting with coupling executable code with hypertext to transform what was once considered a basically passive medium into an active one. The cooperative problem solving application described in this paper performs two functions.
The first is algorithmic problem solving; the second provides online reference material on demand during a problem solving session. The algorithmic problem solver was implemented using an intelligent path embedded in a hypertext document. The path itself represented the basic design algorithm of the application domain; procedures attached to the path nodes complete the problem solving function. Online domain specific reference material is available at any point during a problem solving session to provide assistance to the user. If the reference material is used during a problem solving session, a filtering mechanism is available to control navigational access in the hypertext document. Filtering has the effect of pruning the search space available for browsing thereby focusing the user on material that is pertinent to the task at hand. Hypertext systems have been described as consisting of three components: file access, hypertext representation, and user interface. The research reported in this paper describes the use of a fourth component which applies predicates to node and link attributes to produce indices for the purpose of achieving information filtering as means of managing navigational access in a hypertext network. Both the intelligent path mechanism and the information filtering mechanism are described in this paper.

AN University Microfilms Order Number ADG91-21550.
AU SCHATZ, BRUCE RAYMOND.
TI INTERACTIVE RETRIEVAL IN INFORMATION SPACES DISTRIBUTED ACROSS A WIDE-AREA NETWORK.
IN The University of Arizona Ph.D. 1991, 152 pages.
SO DAI V52(02), SecB, pp939.
DE Computer Science. Information Science. Engineering, Electronics and Electrical.
AB The potential to provide interactive data manipulation across high-speed nationwide networks is stimulating development of new database technology.
An information space is a data model that can support rapid browsing of large amounts of information contained in a digital library physically distributed across many disparate sources. This dissertation discusses supporting interactive retrieval of objects inside an information space across the nationwide scientific network. Implementing such interactive retrieval requires designing caching policies that enable fetching requested objects into a local user workstation from a remote file server with sufficiently short response time to support effective browsing interaction. An adequate caching policy should utilize properties of user perception and data representation within an information space. This dissertation describes a series of new techniques for caching objects within an information space and gives measurements of their performance across the NSFNET. These policies take advantage of special features of interactive retrieval within information spaces, such as initially fetching only the subset of requested objects that will be immediately displayed and prefetching additional objects during idle time when the user is considering which command to issue next. A prototype built by the author, the Telesophy System, supports interactive retrieval for information spaces across local-area networks and serves as a basis for identification of special features. To consider additional needs for efficient implementation across wide-area networks, the significant parameters and policies in implementing caching are systematically identified. Specific values of these caching parameters are used to evaluate the performance of a range of caching policies under a variety of interactions relevant to browsing information spaces. Finally, an incremental caching policy is proposed, which combines many techniques taking advantage of special features of interacting with information spaces. 
Measurements of the performance of this policy under a variety of conditions demonstrate that interactive retrieval is possible across wide-area networks and that appropriate optimization of the caching policy can produce performance comparable to that across local-area networks.

**********************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.

Send subscription requests to: LISTSERV@UCCVMA.BITNET
Send submissions to IRLIST to: IR-L@UCCVMA.BITNET

Editorial Staff:
Clifford Lynch  lynch@uccmvsa.ucop.edu or calur@uccmvsa.bitnet
Nancy Gusack    ncgur@uccmvsa.bitnet
Mary Engle      engle@cmsa.berkeley.edu or meeur@uccmvsa.bitnet

The IRLIST Archives will be set up for anonymous FTP, and the address will be announced in future issues.

To access back issues presently, send the message INDEX IR-L to LISTSERV@UCCVMA.BITNET. To get a specific issue listed in the Index, send the message GET IR-L LOG ***, where *** is the month and day on which the issue was mailed, to LISTSERV@UCCVMA.BITNET.

These files are not to be sold or used for commercial purposes.

Contact Nancy Gusack or Mary Engle for more information on IRLIST.

The opinions expressed in IRLIST do not represent those of the editors or the University of California. Authors assume full responsibility for the contents of their submissions to IRLIST.