Information Retrieval List Digest 191 (December 6, 1993)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-191

IRLIST Digest           ISSN 1064-6965
December 6, 1993        Volume X, Number 47
Issue 191

**********************************************************

I. NOTICES
   A. Meeting Announcements/Calls for Papers
      1. ETS Conference on Natural Language Processing Techniques in Assessment and Education
      2. ACL-94 Student Call for Papers
      3. SIGIR-94 Formats Finalized
II. QUERIES
   B. Requests for Information
      1. Responses to Query on Automated Aids to Indexing

**********************************************************

I. NOTICES

I.A.1.
Fr: Jill C Burstein
Re: ETS Conference on Natural Language Processing Techniques in Assessment and Education

The Educational Testing Service Conference on Natural Language Processing Techniques in Assessment and Education
May 18th - 19th, 1994
Chauncey Conference Center
Educational Testing Service
Rosedale Road
Princeton, New Jersey 08541

CONFERENCE PURPOSE: Natural language processing (NLP) techniques have proven increasingly useful in the domains of assessment and education. The purpose of this conference is to bring together researchers from the NLP community and from the assessment and education communities to share ideas about how NLP techniques can be applied to tasks in assessment and education. Speakers are being invited from industry and academia to discuss their research and applications of NLP in assessment and education. We anticipate that the conference will encourage ongoing discussion between the NLP community and the assessment and education communities.

TOPICS:
   NLP Techniques for Assessment of Natural Language Responses to Test Items
   Computer-Aided Design in Education
   Automatic Spelling Correction for Automated Scoring of Natural Language Responses
   Intelligent Tutors

The conference will be held at the Chauncey Conference Center on ETS' Princeton campus.
The Chauncey Conference Center has rooms for conference guests who choose to stay overnight. The price of the conference varies depending on the type of accommodations requested. Address inquiries to:

   Corrine Cohen
   Mailstop 16-R
   Educational Testing Service
   Rosedale Road
   Princeton, NJ 08541
   phone: (609) 734-1108

**********

I.A.2.
Fr: Don Walker
Re: ACL-94 Student Call for Papers

ACL-94 CALL FOR STUDENT PAPERS

Student Sessions at the 32nd Annual Meeting of the Association for Computational Linguistics
27 June - 1 July 1994
New Mexico State University
Las Cruces, New Mexico, USA

PURPOSE: The goal of these sessions is to provide a forum for student members to present WORK IN PROGRESS and receive feedback from other members of the computational linguistics community, particularly senior researchers. The sessions will be workshop-style, consisting of short paper presentations and discussion. The papers will be published in a special section of the conference proceedings.

Note that the student sessions in NO way influence the treatment of student-written papers submitted to the main conference. Rather, the student sessions provide an entirely separate track emphasizing students' work in progress rather than completed work.

REQUIREMENTS: Papers should describe original, unpublished work in progress that demonstrates insight, creativity, and promise. Topics of interest are the same as for the main conference. All authors must hold ACL Student Membership at the time of the conference (or be students who pay the regular member rate because they earn a regular income). For membership information, see "ACL AND CONFERENCE INFORMATION" below. Papers submitted to the main conference cannot be considered for the student sessions. Students may, of course, submit DIFFERENT papers to BOTH the main conference and the student sessions, or papers on different aspects of a particular problem or project.
FORMAT FOR SUBMISSION: Student authors should submit papers limited to 3 pages (including a mandatory abstract, references, figures, and appendices), together with a title page and an identification page in the format described below. Papers that do not meet the length and formatting requirements may be rejected without review.

Papers should be headed by a title page containing the paper title, a short (5-line) summary, and the subject area(s). Since reviewing will be "blind", the title page should omit author names and addresses. Furthermore, self-references that reveal the authors' identity (e.g., "We previously showed (Smith, 1991) . . .") should be avoided; use references like "Smith previously showed (1991) . . ." instead. To identify each paper, a separate identification page should be supplied, containing the paper's title, the name(s) of the author(s), complete addresses, a short (5-line) summary, and the subject area(s).

MEDIA OF SUBMISSION: Authors must submit their papers by BOTH hardcopy and email if possible, or else by hardcopy only. Unlike the main ACL session, there is no email-only option; we encourage you to use the hardcopy-and-email option. Electronic submissions should be either self-contained LaTeX source or plain text. LaTeX submissions must use the ACL submission style (aclsub.sty), retrievable from the ACL LISTSERV server (write to listserv@cs.columbia.edu for information). Hardcopy submissions should consist of four (4) copies of the paper and one (1) copy of the identification page. For both kinds of submission, if at all possible, a plain-text version of the identification page should also be sent separately by electronic mail, using the following format:

   title:
   author:        <name of first student author>
   address:       <address of first student author>
   ...
   author:        <name of last student author>
   address:       <address of last student author>
   abstract:      <abstract>
   subject areas: <first area>, ..., <last area>

Papers should be submitted to:

   Beryl Hoffman
   Computer and Information Sciences
   University of Pennsylvania
   200 South 33rd Street
   Philadelphia, PA 19104, USA
   phone: +1-215-898-5868
   fax: +1-215-898-0587
   e-mail: hoffman@linc.cis.upenn.edu

SCHEDULE: Submissions in either format must be RECEIVED by 1 FEBRUARY 1994. Late papers will not be considered. Receipt of submissions will be acknowledged by 5 FEBRUARY 1994. Authors will be notified of acceptance by 15 MARCH 1994. Authors will then have time to revise their papers, taking the reviews into account. Camera-ready copies of final papers, prepared in a double-column format (preferably using a laser printer), must be received by 1 MAY 1994, along with a signed copyright release statement. The ACL LaTeX proceedings format is available through the ACL LISTSERV.

STUDENT SESSIONS INFORMATION: Contact Beryl Hoffman at the addresses above.

ACL AND CONFERENCE INFORMATION: For other information on the conference and on the ACL more generally, contact Judith Klavans (ACL), Columbia University, Computer Science, New York, NY 10027, USA; +1-914-478-1802 phone/fax; acl@cs.columbia.edu.

**********

I.A.3.
Fr: Alan Smeaton <ASMEATON@COMPAPP.DCU.IE>
Re: SIGIR'94 Formats Finalized

The format for submission of papers, panel proposals, tutorial proposals, and workshop proposals for the SIGIR94 conference in Dublin, Ireland, in July 1994 has been finalised and may be obtained by sending e-mail to sigir-format@compapp.dcu.ie. You will automatically receive a copy of the format by return e-mail. If there are any problems or inordinate delays, contact asmeaton@compapp.dcu.ie. Remember that the deadline for papers is 6 January 1994.

**********************************************************

II. QUERIES

II.B.1.
Fr: David Lewis <lewis@research.att.com>
Re: Responses to Query on Automated Aids to Indexing

[Original query filed in both X-13-157 and X-15-159.]

I posted a query back in March to PACS-L, INDEX-L, and IR-L asking for information on automated aids to human controlled-vocabulary indexing, and I realize that I never posted the responses I got, as promised. Here they are, very tardily.

The phrasing of my query was ambiguous, and several people thought that I was interested only in "artificial intelligence" aids to indexing. That was not the case, but the misunderstanding influenced some of the answers below.

Several people below recommended one or another version of an NFAIS report on such aids. This report is expensive, but I did find that it contained many useful pointers.

David D. Lewis
AT&T Bell Laboratories             email: lewis@research.att.com
600 Mountain Ave.; Room 2C-408     ph. 908-582-3976
Murray Hill, NJ 07974; USA         dept. fax. 908-582-7550

*****************************************************************
FROM: JEANNE BOHLEN <JLB@USIP.ORG>:

I am new to PACS-L but saw your message about artificial intelligence and indexing. I remember listening to an audiotape of an ALA annual conference presentation, probably from 1990 or 1991. I think it was on the subject of thesaurus building; it may have been a LITA program. In any case, the part I am sure of was the description of thesaurus software being used by the National Library of Medicine: it described how terms were suggested to the indexer, with windows being used and terms shown in a tree form. I believe there was another presenter talking about the Art and Architecture Thesaurus being developed at the Getty Museum in California. The one most closely applicable to your question, though, was at NLM. I hope this is helpful.

*****************************************************************
FROM: C.G. CHUTE <CHUTE@MAYO.EDU>:

Then you should be aware of Susanne Humphrey (humphrey@lhc.nlm.nih.gov), who is using a LISP-based tool to create semantic frames for Medical Subject Headings (MeSH) concepts. This tool is intended to be used by journal indexing staff to assist their process. It is called MedIndEx, I think, and is fairly straight AI. One of her papers appeared in:

   B.H. Kwasnik, R. Fidel (eds). Advances in Classification Research, Vol. II: Proceedings of the 2nd ASIS SIG/CR Workshop on Classification Research. Medford, NJ: Learned Information Inc., 1992.

under the title "Use and Management of Classification Systems for Knowledge-Based Indexing." I just happen to know this because my paper is the one before hers. She probably has many better references, and she has written a 100-page technical report that is quite complete.

[DDL: Nick Belkin also suggests Susanne Humphrey's work, as do I.]

*****************************************************************
FROM: DAN CLARK, MUSIC LIBRARY, JMU <FAC_DCLARK@VAX1.ACS.JMU.EDU>:

The new version of INMAGIC (called INMAGIC PLUS) offers what I think you described. It offers a number of data validation features, including various types of "masking" that can force, for example, a social security number to be in the format ###-##-#### (that's number number number hyphen number number hyphen number number number number; I'm not sure how the symbols will appear in your message). This would reject anything that didn't appear in that format, or anything with an alphabetical character instead of numbers. It can do the same for letters, alpha-only fields, numeric-only fields, etc.

It also (and I think this is more germane to your question) can create verification tables for any indexed fields.
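The input masking Dan Clark describes amounts to a position-by-position check of each character against a template. The Python sketch below is illustrative only and is not INMAGIC's implementation: the function name `matches_mask` is hypothetical, only the `#` (digit) placeholder comes from his message, and the `@` (letter) placeholder is an assumed stand-in for the letter-only masks he mentions.

```python
def matches_mask(value: str, mask: str) -> bool:
    """Check value against an input mask, one character per position.

    '#' requires an ASCII digit (as in Clark's ###-##-#### example).
    '@' requires an ASCII letter (a hypothetical placeholder, assumed here).
    Any other mask character must appear literally, e.g. the hyphens.
    """
    if len(value) != len(mask):
        return False  # too short or too long for the template
    for ch, slot in zip(value, mask):
        if slot == "#":
            if ch not in "0123456789":
                return False  # e.g. a letter where a number belongs
        elif slot == "@":
            if not (ch.isascii() and ch.isalpha()):
                return False  # a digit or symbol where a letter belongs
        elif ch != slot:
            return False  # literal character (such as '-') missing or wrong
    return True


SSN_MASK = "###-##-####"
print(matches_mask("123-45-6789", SSN_MASK))  # True
print(matches_mask("123-4A-6789", SSN_MASK))  # False: letter in a numeric slot
print(matches_mask("123456789", SSN_MASK))    # False: hyphens missing
```

A data-entry screen would run a check like this before accepting a field value and reject the entry whenever it returns False.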
These verification tables can contain single words or character strings (up to around 60 or 70 characters, I think), and the creator of the database can set the validation either to "required", i.e., the entry *must* match something in the table, or to "over-rideable" (or some such phrase), where the database inputter can override the verification and enter something different. The database can also be set up so that certain fields *must* contain a response.

INMAGIC PLUS has been available for a few months now, but because we were waiting for new micros, we did not upgrade to this new version until just a few weeks ago, so what I've listed above is pretty much all I can tell you. Their address is:

   Inmagic Inc.
   2067 Massachusetts Ave.
   Cambridge, MA 02140-1338
   Phone (617) 661-8124
   Fax (617) 661-6901

*****************************************************************
FROM: CYNTHIA A. HODGSON 412-337-2434 <@MRGATE.AL.ALCOA.COM:HODGSON1%A1@ALFIE>:

You may want to take a look at the recent publication:

   AUTOMATED SUPPORT TO INDEXING
   Gail M. Hodge
   1992. 176 pp.
   The National Federation of Abstracting and Information Services
   1429 Walnut Street
   Philadelphia, PA 19102
   ISBN 0-942308-36-0
   Softcover.

Over a third of the book is made up of 23 narrative case studies of database producers describing the types of automated support used in their current indexing environments. A chapter on academic and corporate research projects lists representative examples of research projects in expert systems, online reference support, natural language processing, and knowledge base development. A chapter on commercial indexing software is written by another author, Sarah Syen. It's not an excellent book (only fair) and rather costly. You might want to try to borrow it from a library.

*****************************************************************
FROM: JESSICA MILSTEAD <76440.2356@COMPUSERVE.COM>:

Have you seen my article in Information Processing & Management (28(3):407-431, 1992), which has a good bit on this subject?
Gail Hodge built on this work and extended it significantly in her Automated Support to Indexing, an NFAIS report published in 1992.

*****************************************************************
FROM: EDWARD PAI <EPAI@CS.UCLA.EDU>:

Have you seen the NFAIS report "Automated Support to Indexing" by Gail M. Hodge (report #3)? It reviews, in case-study form, a wide variety of operational "automatic" indexing systems. NFAIS is the National Federation of Abstracting and Indexing Services, located in Philadelphia, PA. You can call them at (215) 563-2406. I just finished reading it, and while it doesn't really go into much detail about the techniques the systems use, it does provide pointers to the systems (i.e., to the people and places behind them). Also, as might be expected, some of the more interesting commercial systems (such as Topic) are not included.

*****************************************************************
FROM: EDIE RASMUSSEN <EMR1@VMS.CIS.PITT.EDU>:

There are a number of such systems; the ones I have seen published reports on are at NLM, NASA, and API (American Petroleum Institute). Some references:

   Susanne M. Humphrey. MedIndEx System: Medical Indexing Expert System. IP&M 25(1): 73-88 (1989).
   Ronald L. Buchan. Computer Aided Indexing at NASA. Reference Librarian 13: 269-277 (Summer '87).
   E.H. Brenner et al. American Petroleum Institute's Machine-Aided Indexing and Searching Project. Science & Technology Libraries 5: 49-62 (Fall '84).

More currently, there is a section on computer-aided indexing in a forthcoming ASIS monograph on indexing. Phil Smith (phil+@osu.edu) was coordinating that section and could probably give you more recent information.
*****************************************************************
FROM: MICHAEL SCHWANTNER <MSC0H@FIZVAX.KFK.DE>:

Although I do not claim that the AIR system (which I think you already know) is an artificial intelligence product, it really is an aid for indexing with a controlled vocabulary: FIZ Karlsruhe has used it for input production of the PHYS database since 1985, and about 5000 documents are indexed with AIR weekly. We actually plan to extend the system so that it can be used for other bibliographic databases, and for this task we are looking for cooperation partners. I have listed some literature below; the last article will probably be the most helpful for you. I myself am VERY interested in the (hopefully many) answers you will get!

   Martinez, C.; Lucey, J.; Linder, E. (1987): An Expert System for Machine-Aided Indexing. J. of Chemical Information and Computer Sciences 27(4), pp. 158-62.

   Klingbiel, P.H. (1985): Phrase Structure Rewrite Systems in Information Systems. Inf. Proc. & Management 21, pp. 113-6.

   [In these two approaches, a dictionary is used that consists of replacement rules of the form (text-term is-to-be-replaced-by descriptor). The rules are derived from a thesaurus and can be supplemented manually.]

   Todeschini, C.; Farell, M.P. (1989): An Expert System for Quality Control in Bibliographic Databases. JASIS 40(1), pp. 1-11.

   Hamill, K.A.; Zamora, A. (1980): The Use of Titles for Automatic Document Classification. JASIS 31(6), pp. 396-402.

   [In both papers, statistical relations between terms and classification codes are described.]

   Milstead, J.L. (1990): Methodologies for Subject Analysis in Bibliographic Databases. Paper and report of a meeting sponsored by the International Atomic Energy Agency and the Energy Technology Data Exchange. The JELEM Company, PO Box 5063, Brookfield, CT 06804 USA, 66 pp.
   [Very good survey; lists and describes many systems and approaches. There is probably a revised version in progress...]

   Finin, T.; Silvermann, D. (1986): Interactive Classification as a Knowledge Acquisition Tool. In: Kerschberg, L. (Ed.): Proc. of 1st Int. Workshop on Expert Database Systems, pp. 79-80. Benjamin/Cummings Publ. Comp.

   Malone, L.C.; Wildman-Pepe, J.; Driscoll, J.R. (1990): Evaluation of an Automated Keywording System. Microcomputers for Information Management 7(2), June 1990, pp. 127-148.

   Sano, H. (1991): Extraction of Facet Terms from Article Titles and their Display in Tabular Form. J. of Information Science 17, pp. 43-48.

   Li, W.; Lee, B.; Krausz, F.; Sahin, K. (1991): Text classification by a neural network. In: Pace, D. (Ed.): Proceedings of the 1991 Summer Computer Simulation Conference (Twenty-Third Annual Summer Computer Simulation Conference), San Diego, CA, USA, 1991, pp. 313-18.
   [System extracts relationships between input data and output classes automatically; runs on VAX.]

   Hayes, P.J.; Andersen, P.M.; Nirenburg, I.B.; Schmandt, L.M. (1990): TCS: a shell for content-based text categorization. Sixth Conference on Artificial Intelligence Applications. Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 1990, pp. 320-6.
   [Commercial system, rule-based.]

   Giere, W.; Dettmer, H. (1986): Free text classification and retrieval based on a thesaurus: eight years of experience at the Johann-Wolfgang-Goethe University Medical School. Proceedings of the Tenth Annual Symposium on Computer Applications in Medical Care. Washington, DC, USA: IEEE Comput. Soc. Press, 1986, pp. 85-8.
   [Autopsy reports are automatically classified using a thesaurus.]

   Arapov, M.V. (1980): Mathematical models of classification in application to some problems of statistical linguistics. In: Viks, Ue. (Ed.): Computational linguistics and related topics. Summaries of a symposium. Reval, SU, 1980, pp. 14-16.

*****************************************************************
FROM: YIMING YANG <YANG@MAYO.EDU>:

Here are a few lines about our text categorization system.
Our system is an example-based approach to automatic classification. The system "learns" an empirical mapping function according to the likelihoods suggested by a training set, that is, a collection of texts with human-assigned categories. A least squares fit (LSF) technique is used to compute this mapping function. We use the LSF mapping function to determine the relevance scores of likely matches in a search space. This system has been evaluated on both library document retrieval and clinical classification of patient records; superior performance has been observed compared to alternative approaches [COLING 92], [SIGIR 93]. The system is currently used for clinical classification at Mayo's Medical Information Resources, as an automated aid for human experts who assign disease categories to the textual descriptions in patient records. Those categories are used for indexing our clinical database.

**********************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA 94612-3550.

Send subscription requests to: NCG@UCCMVSA.UCOP.EDU
Send submissions to IRLIST to: NCG@UCCMVSA.UCOP.EDU

Editorial Staff:
   Clifford Lynch   calur@uccmvsa.ucop.edu or calur@uccmvsa.bitnet
   Nancy Gusack     ncgur@uccmvsa.bitnet or ncgur@uccmvsa.ucop.edu
   Mary Engle       meeur@uccmvsa.bitnet

The IRLIST Archives are now set up for anonymous FTP, as well as via LISTSERV. Using anonymous FTP via the host dla.ucop.edu, the files will be found in the directory pub/irl, stored in subdirectories by year (e.g., /pub/irl/1993).

Using LISTSERV, send the message INDEX IR-L to LISTSERV@UCCVMA.BITNET. To get a specific issue listed in the index, send the message GET IR-L LOGYYMM, where YY is the year and MM is the numeric month in which the issue was mailed, to LISTSERV@UCCVMA (Bitnet) or LISTSERV@UCCVMA.UCOP.EDU. You will receive the issues for the entire month you have requested.
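The archive naming convention above is mechanical enough to sketch in code. The Python below is only an illustration: the function names are hypothetical, and only the `GET IR-L LOGYYMM` pattern and the by-year `/pub/irl/...` directory layout come from this notice.

```python
def listserv_get_command(year: int, month: int) -> str:
    """Build the LISTSERV request for one month of IR-L issues.

    Follows the GET IR-L LOGYYMM pattern: YY is the two-digit year and
    MM the two-digit month in which the issue was mailed.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return f"GET IR-L LOG{year % 100:02d}{month:02d}"


def ftp_directory(year: int) -> str:
    """Per-year archive subdirectory on dla.ucop.edu, e.g. /pub/irl/1993."""
    return f"/pub/irl/{year}"


# This issue is dated December 6, 1993; assuming it was mailed that month:
print(listserv_get_command(1993, 12))  # GET IR-L LOG9312
print(ftp_directory(1993))             # /pub/irl/1993
```

Because LISTSERV returns the whole month, the command identifies a monthly log file rather than a single issue.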
These files are not to be sold or used for commercial purposes. Contact Nancy Gusack or Mary Engle for more information on IRLIST.

THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THE CONTENTS OF THEIR SUBMISSIONS TO IRLIST.