Information Retrieval List Digest 260 (June 19, 1995)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-260

IRLIST Digest
ISSN 1064-6965
June 19, 1995
Volume XII, Number 23
Issue 260

**********************************************************

II. JOBS
   1. Int'l. U. of Japan: Information Access Librarian
III. NOTICES
   B. Meetings
      1. Information Retrieval & Automatic Construction of Hypermedia
      2. Text Encoding for Information Interchange
   C. Miscellaneous
      1. Rutgers U.: Experimental Interface to the Library Catalog
IV. PROJECTS
   D. Initiatives & Proposals
      1. Human Language Resources (NSF/ARPA)

**********************************************************

II. JOBS

II.1.
Fr: Kazuto Shibuya
Re: International University of Japan: Information Access Librarian

International University of Japan, a two-year graduate program offering MAs and MBAs, is seeking an individual who combines computer expertise, public service skills, and the desire to have an impact as part of the Reference/Internet service team working to build a library for the 21st century. The available position is that of Information Access Librarian at the Matsushita Library and Information Center.

The Information Access Librarian is responsible for helping library patrons use the library effectively and find the information they require for their study and research. A significant portion of this librarian's time is spent providing a high level of service to library users, both students and faculty. The main working language is English; however, candidates with Japanese language skills are encouraged to apply. Good English communication and consultation skills are required. This position demands expertise in the use of electronic (e.g., online database services, CD-ROM systems, and networks) as well as traditional information resources.
The successful candidate will be able to conduct library orientations, including classroom sessions, tours, individual training, and the preparation of user guides designed to increase user success with electronic and traditional information resources.

The position requires: information librarianship; a minimum of 3 years' experience in public-oriented reference or information access services; demonstrated commitment to public service; awareness of SGML and HTML; broad interest in the organization and retrieval of electronic information via the Internet; some online searching experience with Dialog, LEXIS/NEXIS, and others; network information retrieval and resource discovery; English oral and written communication skills; and a working knowledge of DOS, Windows, and Macintosh hardware and software (e.g., communications, CD-ROM, and LANs). The successful candidate must be flexible, a self-starter, an effective communicator, and an enthusiastic participant in a team-oriented environment. A background in the humanities or social sciences is desired.

The International University of Japan is a certified graduate institution with 200 students, offering both an MA in international relations and an MBA. Courses are taught in English to students from some 35 countries around the world, whose average age is 28. Forty percent of the student body and the majority of the staff are Japanese nationals. Faculty come from all over the world. The Matsushita Library and Information Center houses 100,000 volumes and over 1,200 periodical titles in several languages, predominantly English and Japanese. The campus is preparing to install a campus-wide LAN and the latest library automation technology. Located in rural Niigata Prefecture, the IUJ campus is just 1.5 hours from downtown Tokyo and is close to skiing, tennis, hot springs, and hiking.
To apply for this unique position, send a CV with cover letter and references to:

   Mr. Shinichiro Oda
   Deputy Manager, Matsushita Library and Information Center
   International University of Japan
   Yamato-machi, Niigata, 949-72 JAPAN

or send them by e-mail to MLICJOB@JPNIUJ00.BITNET.

Interviews will be scheduled as conveniently as possible for the applicant's location. The application deadline is June 30, 1995. The position begins September 1, 1995. The contract is for an initial one-year period, with the option to renew, and for the right candidate the position may become permanent. Compensation is attractive and commensurate with experience and skills. Benefits include health care coverage.

**********************************************************

III. NOTICES

III.B.1.
Fr: James Allan
Re: SIGIR '95 Workshops (Current Information)

Research Workshop

INFORMATION RETRIEVAL AND AUTOMATIC CONSTRUCTION OF HYPERMEDIA

to be held in association with
SIGIR '95: 18th International Conference on Research and Development in Information Retrieval
Seattle, WA, USA
July 13, 1995, 8:30 a.m. - 3:30 p.m.

The workshop will address IR methods and tools that can be used to construct a hypermedia collection automatically, producing an informative set of documents (nodes) and links that can be searched and browsed by content. For example, typical IR measures of document similarity can provide a motivation for linking documents. Also, recent work with passage retrieval shows that it can be used to structure a collection of "flat" documents for use in hypermedia. These and other methods for the automatic authoring of hypermedia collections will be presented and discussed in the workshop. Both techniques that construct a hypertext from an unlinked set of data and those that can be applied to an existing hypertext/media (augmenting its set of links) are relevant to the workshop.
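As a rough illustration of the similarity-based linking mentioned above, the sketch below proposes a link between every pair of documents whose TF-IDF cosine similarity meets a threshold. It is a minimal sketch under stated assumptions, not the method of any speaker listed here: the raw-count TF times log(N/df) weighting, the hypothetical `propose_links` helper, and the 0.2 cutoff are all illustrative choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumed weighting: raw term count * log(N / df). Terms that occur in
    every document get weight 0 and so cannot motivate a link.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def propose_links(docs, threshold=0.2):
    """Hypothetical helper: return (i, j, score) for every document pair
    whose cosine similarity is at least `threshold` (an assumed cutoff)."""
    vecs = tfidf_vectors(docs)
    links = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            score = cosine(vecs[i], vecs[j])
            if score >= threshold:
                links.append((i, j, round(score, 3)))
    return links


if __name__ == "__main__":
    docs = [
        "hypertext links from passage retrieval",
        "passage retrieval builds hypertext links",
        "skiing and hot springs near tokyo",
    ]
    print(propose_links(docs))  # prints [(0, 1, 0.353)]
```

The pairwise loop is quadratic in the number of documents, which is acceptable for a small illustration; applying the same scoring to passages rather than whole documents would be one way to approximate the passage-level structuring the notice describes.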
The workshop will also discuss issues such as static links, dynamic links, automatically assigning types to the generated links, and evaluation of link quality.

The following researchers will talk about their work toward automatically constructing or evaluating hypermedia. (This list is accurate as of June 19, 1995, but is subject to change.)

* INVITED SPEAKER: Gerard Salton, Cornell University
  "Text Structure Analysis and its Use for Text Retrieval, Text Traversal, and Text Summarization"
* Maristella Agosti, Universita di Padova
  "Automatic authoring and construction of hypermedia for IR"
* James Allan, University of Massachusetts
  "Automatic Hypertext Construction"
* Niels K. Bauer, Texas A&M University
  "AutoLink: An Automated Link Generator for Building Hypertext"
* James Blustein, University of Western Ontario
  "Using LSI to evaluate the quality of hypertext links" (with Robert E. Webber)
* Paul Thistlewaite, Australian National University
  "The PASTIME project: Hypermedia in the Australian Parliament"

The workshop will also include time for general discussion and some small-group discussion of specific subtopics of interest.

Attendance at SIGIR '95 is not required, but you must register for the workshop using the conference registration form. The workshop fee is $55, which includes a box lunch and workshop documentation.

A copy of the registration form, plus full information on SIGIR '95 (including descriptions of other workshops, several tutorials, all technical sessions, and accommodation), is available via anonymous FTP from:

   ftp.u.washington.edu (/public/sigir95/program)

or via WWW at:

   http://info.sigir.acm.org/sigir/conferences/SIGIR_95_adv.pgm.html

or contact sigir95@u.washington.edu to request a copy of the program by mail.

**********

III.B.2.
Fr: Eric Dahlin
Re: TEI Workshop

TEXT ENCODING FOR INFORMATION INTERCHANGE
A Tutorial Introduction to the Text Encoding Initiative
A workshop to be held at ACH/ALLC '95 in Santa Barbara

The organizers of ACH/ALLC '95 are pleased to announce a pre-conference workshop on the Text Encoding Initiative Guidelines.

TITLE: Text Encoding for Information Interchange: A Tutorial Introduction to the Text Encoding Initiative
DATE: 10 July 1995, 9 a.m. to 4 p.m.
PLACE: UCSB Microcomputer Laboratory
INSTRUCTORS: C. M. Sperberg-McQueen, Lou Burnard, David Chesnutt
REGISTRATION FEE: $50

This workshop will introduce the encoding scheme recommended by the Text Encoding Initiative (TEI) in its Guidelines for Text Encoding and Interchange. The main focus will be on introducing the tag set defined in the Guidelines, but the context within which the TEI Guidelines were developed and general problems of text markup will also be addressed.

TOPICS:

1. General Principles of Text Markup: What is markup for? Varieties of markup; effect of markup. What are electronic texts for? Markup and interpretation. Markup as a means of enabling intelligent retrieval.

2. Basics of SGML: What it is and isn't; the case for using it. Basic SGML syntax for the document instance (tags, entity references, comment declarations). Examination and explication of simple examples.

3. Document Analysis: What document analysis is, and why it is an essential part of any e-text project. Phases of document analysis. Group document analysis of a sample text.

4. Basics of the TEI: Origins and goals of the TEI; overall organization of the TEI encoding scheme; basic structural notions of the TEI DTD and the pizza model: the base, additional, and core tag sets, and how they may be extended, modified, and documented; group tagging of the sample document.

5. Hands-on Session: Introduction to a standard commercial or public-domain SGML-aware editor.

6. Putting the TEI into Practice: Types of software available for SGML, how the adoption of TEI encoding affects the practical work of an e-text project, and a review of where to go for further information.

THE TEXT ENCODING INITIATIVE:

The Text Encoding Initiative (TEI) is an international cooperative research effort whose goal is to define a set of generic Guidelines for the representation of all kinds of textual materials in electronic form, so as to enable researchers in any discipline to interchange texts and datasets in machine-readable form, independently of the software or hardware in use, and independently of the particular application for which such electronic resources are used.

The first full version of the TEI Guidelines was published in May 1994, after six years of development in Europe and the US. It takes the form of a substantial reference manual documenting a modular and extensible SGML document type definition (DTD), which can be used to describe electronic encodings of all kinds of texts, of all times, and in all languages. It is sometimes said that the Standard Generalized Markup Language (SGML: ISO 8879) provides only the syntax for text markup; the TEI aims to provide a semantics.

Computer-aided research now crosses many political, linguistic, temporal, and disciplinary boundaries; the TEI Guidelines have been designed to be applied to texts in any language, from any period, in any genre, encoded for research of any kind. As far as possible, the Guidelines eschew controversy; where consensus has not been established, only very general recommendations are made. The object is to help researchers make their position explicit, not to dictate what that position should be.

Viewed as a standard, the TEI scheme attempts to occupy the middle ground. It offers neither a single all-embracing encoding scheme, solving all problems once and for all, nor an unstructured collection of tag sets.
Rather, it offers an extensible framework containing a common core of features, a choice of frameworks or bases, and a wide variety of optional additions for specific application areas. Somewhat light-heartedly, we refer to this as the Chicago Pizza model (in which the customer chooses a particular base, say deep dish or whole crust, and adds the toppings of his or her choice), by contrast with both the Chinese menu or laissez-faire approach (which allows any combination of dishes, even the ridiculous) and the set meal approach, in which you must have the entire menu.

MATERIALS AND PRESENTERS:

All participants will be provided with a printed introductory summary guide to the TEI scheme and supporting materials on PC disks, including full versions of the TEI DTDs, public-domain SGML software, and sample TEI texts. Subject to availability, participants may be able to acquire the CD-ROM of the TEI Guidelines at a discounted price.

The tutorial will be taught by three instructors: C. M. Sperberg-McQueen (Computer Center, University of Illinois at Chicago), Lou Burnard (Oxford University Computing Services), and David Chesnutt (Dept. of History, University of South Carolina).

Please register before July 1, 1995.

FOR COMPLETE INFORMATION, CONTACT:
   Sally Vito
   Phone: (805) 893-3072
   E-mail: hr03vito@ucsbvm.ucsb.edu

**********

III.C.1.
Fr: kantorp@bimacs.cs.biu.ac.il
Re: Rutgers U.: Experimental Interface to the Library Catalog

An experimental interface to the library catalog at Rutgers University is available over the Internet. This system, called the Adaptive Network Library Interface (ANLI), permits users of the catalog to record, and to browse, anonymously contributed links between items in that catalog. It thus puts a kind of hypertext layer on top of the existing catalog. In use, it appears as a transparent interface to the Rutgers IRIS (a GEAC system).
When it recognizes that you are considering a unique bibliographic item, it invites you to browse the related items and to offer suggestions. All interested persons are invited to experiment with it.

To access ANLI over the Internet, follow these steps:

   telnet mozart.rutgers.edu
   login: anli
   password: anli
   anli ID: (optional; you may use your own initials)

To end your session, type "end". There is a brief four-question exit interview. To skip a question, press Enter until the cursor leaves the reply box.

The interface was developed by S. Zhao, T. Badics, R. Settergen, L. Nordmann, and R. Schwartz, working under the direction of Prof. Paul Kantor at the Alexandria Project Laboratory, SCILS, Rutgers University. The development was supported in part by a grant from the US Department of Education. For further information about the ANLI project, contact Lorene Reba at lreba@scils.rutgers.edu.

**********************************************************

IV. PROJECTS

IV.D.1.
Fr: Maria Zemankova
Re: Human Language Resources Initiative (USA)

THIS NOTICE IS HEAVILY CUT FOR SPACE PURPOSES.

FOR COMPLETE INFORMATION, CONTACT:
   Gary W. Strong, Program Director
   Interactive Systems
   (703) 306-1928
   gstrong@nsf.gov

HUMAN LANGUAGE RESOURCES
Program Solicitation

A JOINT INITIATIVE OF:
NATIONAL SCIENCE FOUNDATION
COMPUTER AND INFORMATION SCIENCE AND ENGINEERING DIRECTORATE
and
ADVANCED RESEARCH PROJECTS AGENCY
SOFTWARE AND INTELLIGENT SYSTEMS TECHNOLOGY OFFICE

DEADLINE: JULY 14, 1995

INTRODUCTION:

The Information, Robotics and Intelligent Systems Division (IRIS) and the Cross-Disciplinary Activities Office (CDA) of the Computer and Information Science and Engineering Directorate (CISE) of the National Science Foundation (NSF), together with the Software and Intelligent Systems Technology Office (SISTO) of the Advanced Research Projects Agency (ARPA), plan to jointly support research and development on linguistic resources for use in human language technology.
The aim of this joint initiative between NSF and ARPA is to accelerate progress in human language technology by supporting the research and development of widely accessible and affordable language resources and closely related data resources. The initiative also seeks to encourage access to these resources by exploring alternative delivery mechanisms, which researchers may include as requested resources in their proposals.

TOPICS OF INTEREST:

This initiative has three main foci:
(1) the continued improvement and extension of speech, text, and closely related language resources to support research and development in human language technology and associated areas, such as interlanguage communication;
(2) focused experimental research and data collection involving multimodal types of human language data resources; and
(3) innovative ways to make these resources widely available to potential users for both research and education.
The last two foci are covered by the Type II awards described below.

TYPE I AWARD. Improvement in Basic Speech and Text Data Resources.

Resources of interest are those created, maintained, and distributed to provide broad training and evaluation data for basic research and technological advances in the following areas:
- Speech recognition, including the transcription of high-quality continuous speech and other contextual information from talkers unknown to the system.
- Speech understanding, in which the focus is primarily on domain-specific database query and update by voice.
- Information retrieval, in which the retrieval request is made in terms of speech, text, or other closely associated modalities.
- Machine translation, including computer-aided human translation and interlanguage dialog.

TYPE II AWARDS. New Approaches and Means of Data Collection and Distribution.
While the primary interest of this initiative is resource support for research in speech and text recognition and understanding, related support on a smaller scale is also available for the following areas of innovation:
- Development of innovative resources. Examples include the collection and annotation of video showing facial gestures and hand movements during speech, to advance research on multimodal communication using kinesics; and dialogue data collection and annotation to serve as a foundation for research on natural language understanding in realistic situations of human-to-human communication.
- Novel methods of delivering multimedia resources to support, for example, the study of prosody, facial expression understanding, multi-agent dialogues, or other areas.
- Transportable software tools for speech and written language data access and analysis.
- Novel mechanisms for language data capture: means to capture and make available samples, such as contrived online speech understanding experiments or scenarios for public access and data collection, and experiments using such data to advance research on speech recognition in noisy environments over telephones by ordinary users.

SCOPE OF SUPPORT:

This initiative is expected to provide a total of approximately $3.5 million, depending on funding availability, to one or more awardees in the following two categories:
- One large, standard award in the broad area of data collection, archival, and distribution of speech, text, and closely related modalities or supportive annotations (Type I Award above). This award may be in the form of an NSF grant or cooperative agreement, depending on the structure of the project. Funding for this award will begin in late FY95. The total budget should not exceed $2 million over a 30-month period. Its duration may depend on the proposer's method for achieving self-sufficiency.
- Several smaller grants in the range of $150K to $250K per year for up to three years toward one or more innovative approaches to language data or its delivery (Type II Awards above). Funding for these awards will be made when FY96 funds are available.

**********************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA 94612-3550.

Send subscription requests and submissions to: NCGUR@UCCMVSA.UCOP.EDU

Editorial Staff:
   Clifford Lynch   calur@uccmvsa.ucop.edu
   Nancy Gusack     ncgur@uccmvsa.ucop.edu

The IRLIST Archives are available via anonymous FTP as well as via LISTSERV. Using anonymous FTP to the host dla.ucop.edu, the files are in the directory pub/irl, stored in subdirectories by year (e.g., /pub/irl/1993). Using LISTSERV, send the message INDEX IR-L to LISTSERV@UCOP.EDU. To get a specific issue listed in the index, send the message GET IR-L LOGYYMM, where YY is the year and MM is the numeric month in which the issue was mailed, to LISTSERV@UCOP.EDU. You will receive the issues for the entire month you requested.

These files are not to be sold or used for commercial purposes. Contact Nancy Gusack for more information on IRLIST.

THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THE CONTENTS OF THEIR SUBMISSIONS TO IRLIST.