Information Retrieval List Digest 393 (February 16, 1998) URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-393.txt IRLIST Digest ISSN 1064-6965 February 16, 1998 Volume XV, Number 7 Issue 393 ****************************************************************** II. JOBS 1. UW-Milwaukee: SLIS: Two Assistant Professors III. NOTICES A. Publications 1. FARNET's: Washington Update, February 2, 1998 2. Conference Proceedings: The Global Digital Library 3. Statistical Methods for Speech Recognition by Jelinek B. Meetings 1. IRAL'98 2. Wholes and Their Parts 3. 1998 Machine Learning Conference 4. Workshop: Applications of Machine Learning and Data Mining in Finance C. Miscellaneous 1. New CDL Associate Director, Shared Collections and Services 2. 1998 Oxford Program 3. NLP and the Best Theory of Syntax IV. PROJECTS C. Awards, Fellowships, Grants, & Scholarships 1. Peace Digital Library Internship ****************************************************************** II. JOBS II.1. Fr: Dietmar Wolfram Re: UW-Milwaukee: SLIS: Two Assistant Professors University of Wisconsin-Milwaukee School of Library and Information Science Two New Faculty Positions: Assistant Professor, SLIS The School of Library and Information Science (SLIS) at the University of Wisconsin-Milwaukee (UWM) invites applications for two new full-time tenure-track positions at the Assistant Professor level. The individuals will teach courses and conduct research in one or more of the following areas: information processing, human computer interaction, records management, information science, indexing, information or multimedia technology, or database systems design and analysis. The successful applicants will teach courses in the recently approved undergraduate B.S. program in Information Resources and the graduate M.L.I.S. program. A Ph.D. in Information Science or related field is required as is demonstrated ability in research and teaching. 
Competitive salary for an academic year (9 month) appointment, plus additional compensation for possible summer teaching and generous fringe benefits. The University of Wisconsin-Milwaukee is a major university committed to academic excellence. It is one of the two doctoral degree-granting institutions in the multi-campus University of Wisconsin system, and has a student enrollment of over 22,000. The School of Library and Information Science offers programs leading to a nationally accredited Masters in Library and Information Science, a B.S. program in Information Resources, a certificate in advanced studies, and a multidisciplinary doctorate. The School has a strong research faculty, 350+ students, and state-of-the-art information technology laboratories. UWM is located in the cultural, commercial, and educational hub of the state, in a pleasant residential neighborhood overlooking Lake Michigan. Deadline of application: Postmarked by March 2, 1998 The starting date is August 24, 1998. Send letters of application, resume, and three letters of reference to: Dr. Judith J. Senkevitch, Chair, Executive Committee School of Library and Information Science University of Wisconsin-Milwaukee P.O. Box 413 Milwaukee, WI 53201 Phone: (414) 229-5027 Fax: (414) 229-4848 Email: senkevit@csd.uwm.edu The University of Wisconsin-Milwaukee is an affirmative action, equal opportunity employer with a strong commitment to the diversity of faculty, staff, and student body. ****************************************************************** III. NOTICES III.A.1. Fr: Garret Sern Re: FARNET's: Washington Update, February 2, 1998 FARNET'S WASHINGTON UPDATE ---FEBRUARY 2, 1998 FARNET (http://www.farnet.org) is a non-profit public interest Internetworking organization with a primary focus on the education, research and related communities. 
IN THIS ISSUE: Federal technology programs may receive financial boost from proposed "21st Century Research Fund" White House releases green paper on Privatization of Internet Domain Name System >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Written from FARNET's Washington office, "FARNET's Washington Update" is a service to FARNET members and other interested subscribers. We gratefully acknowledge EDUCOM's NTTF and the Coalition for Networked Information (CNI) for additional support. If you would like more information about the Update or would like to offer comments or suggestions, please contact Garret Sern at garret@farnet.org. ********** III.A.2. Fr: Joan K Lippincott Re: Conference Proceedings: The Global Digital Library A full report of the presentations given at Beyond the Beginning: The Global Digital Library, an international conference organised by the UK Office of Library Networking on behalf of JISC, CNI, BLRIC, CAUSE and CAUL and held on June 16 and 17, 1997 in London, UK, is now available at a mirror site at CNI: A wide variety of speakers from the UK, US, Japan, and Australia gave presentations on topics concerning the global digital library, scholarship, and higher education. 
According to the compilation editor, a number of themes emerged from the conference, including: "- the developing world of digital information, and its impacts on professionals and infrastructures; - research and development programmes, notably those of the European Union, the UK's eLib, the British Library, Japan, and Die Deutsche Bibliothek; - the changing relationships between information, education and learning, with their fascinating and tantalising glimpses of possible societal futures; - measuring activities in the information field, in real institutions and in research environments; - progress in the essential field of metadata, where efforts continue to make the internet live up to its potential by making its contents easy to navigate; - recent developments in the domain of user authentication; - current issues in the fraught area of intellectual property, source of thorny problems made even sharper when global requirements are taken into account." ********** III.A.3. Fr: Jud Wolfskill Re: Statistical Methods for Speech Recognition by Jelinek The following is a book which readers of this list might find of interest. For more information please visit http://mitpress.mit.edu/promotions/books/JELSHF97 Statistical Methods for Speech Recognition Frederick Jelinek This book reflects decades of important research on the mathematical foundations of speech recognition. It focuses on underlying statistical techniques such as hidden Markov models, decision trees, the expectation-maximization algorithm, information theoretic goodness criteria, maximum entropy probability estimation, parameter and data clustering, and smoothing of probability distributions. The author's goal is to present these principles clearly in the simplest setting, to show the advantages of self-organization from real data, and to enable the reader to apply the techniques. Language, Speech, and Communication series. A Bradford Book. January 1998. 300 pp. ISBN 0-262-10066-5. 
MIT Press * 5 Cambridge Center * Cambridge, MA 02142 * (617) 625-8569. Jud Wolfskill Publicity Assistant Phone: (617) 258-0603 MIT Press Fax: (617) 258-6779 Five Cambridge Center E-mail: wolfskil@mit.edu Cambridge, MA 02142-1493 http://mitpress.mit.edu ********** III.B.1. Fr: K. Rajaraman Re: IRAL'98 Call for Papers The 3rd International Workshop on Information Retrieval with Asian Languages - IRAL'98 15-16 October, 1998 Organized by, and to be held at: Kent Ridge Digital Labs (KRDL) Singapore (KRDL is a new Research Institute incorporating the former Kent Ridge Digital Labs and the Information Technology Institute) In cooperation with: ACM SIGIR (pending) ACM Hong Kong Chapter SIG-NLP, Information Processing Society of Japan SIG-DBS, Information Processing Society of Japan Japanese Association for Natural Language Processing Association for Computational Linguistics and Chinese Language Processing, Taiwan Singapore Computer Society CONFERENCE WEB SITE: URL: http://sdmc.krdl.org.sg/IRAL98 ABOUT THE WORKSHOP: The purpose of the IRAL workshop is to bring together researchers and developers who are interested in exchanging new ideas and presenting results in the field of information retrieval (IR), with an emphasis on the issues related to Asian languages and multilingual applications. The first International Workshop was held with the name "Information Retrieval with Oriental Languages" in 1996, in Taejon, Korea, and the second, renamed as "Information Retrieval with Asian Languages" to increase the scope, was held in Tsukuba City in Japan in 1997. INSTRUCTIONS FOR CONTRIBUTORS: Papers (4 hardcopies) should be submitted in English only to the Program Chair of the 3rd International Workshop as follows: Dr. Mun-Kew Leong (IRAL'98 submission) Kent Ridge Digital Labs 21 Heng Mui Keng Terrace Singapore 119613 Email: mkleong@krdl.org.sg Electronic submissions WILL NOT be supported because of the difficulty of printing different language fonts. 
Papers should be at most 5000 words in length, and should be double-spaced. The first page must contain the title of the paper and an abstract of no more than 100 words, and no indication about the author(s) and affiliation(s). In addition, authors must attach a separate page with the title, the author name(s) and respective affiliations, plus complete contact information (mailing address, telephone, fax, email) for the author to whom correspondence should be sent. Email will be the default means of communication. IMPORTANT DATES 15 May 1998: The deadline for receipt of papers (4 hardcopies) 13 Jul 1998: Notification of result to authors (by email) 7 Aug 1998: Final manuscript due in camera ready format ********** III.B.2. Fr: Roberto Poli Re: Wholes and Their Parts WHOLES AND THEIR PARTS (W/P) Bolzano, Maretsch Castle, 17-19 June 1998, Italy 1. The list of speakers includes Bill Lawvere, John Bell, Ieke Moerdijk, Colin McLarty, Carlo Cellucci, Steve Vickers, Gonzalo Reyes, John Mayberry, Niles Eldredge, Alberto Peruzzi, Roberto Poli, Ettore Casari, Alf Zimmer, Ron Langacker, George Lakoff, Basil Hiley. 2. Having received a number of requests for adding further talks to the conference's programme, Alberto and I decided to reorganize the schedule so as to find time for some short presentations of 20' each. Scholars interested in giving a short presentation (20') should submit before March 31 an extended abstract (< 5,000 words) to the addresses below. Notification will be mailed to the authors by April 15. We are also ready to consider contributions to the volume of proceedings. Information will be mailed to the interested scholars in due time. 3. To keep you updated with more information on the conference this is the URL for the W/P home page: http://www.soc.unitn.it/dsrs/IMC.htm 4. 
Conference committee: Alberto Peruzzi: peruzzi@dada.it Roberto Poli: poli@risc1.gelso.unitn.it Roberto Poli Department of Sociology and Social Research 26, Verdi street 38100 Trento -- Italy Tel. ++39-461-881-403 Fax: ++39-461-881-348 e-mail: poli@risc1.gelso.unitn.it Axiomathes: http://www.soc.unitn.it/dsrs/axiomathes.htm IMC: http://www.soc.unitn.it/dsrs/IMC.htm ********** III.B.3. Fr: Jude Shavlik Re: 1998 Machine Learning Conference FINAL Call for Papers THE FIFTEENTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING July 24-26, 1998 Madison, Wisconsin, USA The Fifteenth International Conference on Machine Learning (ICML-98) will be held at the University of Wisconsin, Madison from July 24 to July 26, 1998. ICML-98 will be collocated with the Eleventh Annual Conference on Computational Learning Theory (COLT-98) and the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-98). Seven additional conferences, including the Fifteenth National Conference on Artificial Intelligence (AAAI-98), will also be held in Madison (see http://www.cs.wisc.edu/icml98/ for a complete list). Submissions are invited that describe empirical, theoretical, and cognitive-modeling research in all areas of machine learning. Submissions that present algorithms for novel learning tasks, interdisciplinary research involving machine learning, or innovative applications of machine learning techniques to challenging, real-world problems are especially encouraged. ***** The deadline for submissions is MARCH 2, 1998. ****** (An electronic version of the title page is due February 27, 1998.) PLEASE NOTE THAT THESE ARE FIRM DEADLINES. See http://www.cs.wisc.edu/icml98/callForPapers.html for submission details. (There also will be three joint ICML/AAAI workshops. The submission deadline for these WORKSHOPS is MARCH 11, 1998. See http://www.cs.wisc.edu/icml98/ for further details.) ********** III.B.4. 
Fr: Elmar Steurer Re: Workshop: Applications of Machine Learning and Data Mining in Finance Workshop: Application of Machine Learning and Data Mining in Finance 10th European Conference on Machine Learning (ECML-98) Chemnitz, Germany, April 24 1998 General Information: In conjunction with the 10th European Conference on Machine Learning (ECML-98) the workshop "Application of Machine Learning and Data Mining in Finance" will be held in Chemnitz, Germany, on April, 24th 1998. The main conference takes place from April, 21st to 23rd 1998. Motivation: Advanced data analysis and forecasting technologies such as neural networks, symbolic machine learning and genetic algorithms are being increasingly applied to support financial asset management and credit risk management. These methods are considered by many financial management institutions as innovative technologies to support conventional quantitative techniques. Their use in computational finance will have a major impact in the modelling of the currency markets, in tactical asset allocation, bond and stock valuation and portfolio optimisation. In addition the application of these tools for scoring tasks delivers valuable support for the management of client credit risk. Targets: This workshop is designed to bring together researchers in the field of Machine Learning with those practicing financial consulting. The purpose is twofold: - Practitioners should become familiar with the state of the art in machine learning research for predictive modelling and scoring systems. - The research community should receive ideas and requirements from participants from the financial world with the aim to improve the acceptance of Machine Learning applications and to identify future areas of research. Research papers representing new and significant developments in methodology as well as applications of practical use will be presented. 
Topics include: Application aspects: - Scoring systems: Application and Behavioural Scoring - Trading- and forecasting models - Volatility models - Value at Risk - Financially motivated objective functions Methodological aspects: - Symbolic Learning in financial engineering - Neural Networks for financial applications - Aspects and dependencies of data transformation and model selection - Backtest procedures: Advantages and bottlenecks - Pre-testing as an alternative to backtest - Data Mining process model for financial applications Submission of papers: Authors wishing to present a paper should send an electronic version (uuencoded compressed PostScript) not later than 28 February 98 to: Dr. Elmar Steurer DAIMLER-BENZ AG - Research and Technology Postfach 2360 89013 Ulm Tel.: 0049 - 731 / 505 -2868 Fax: 0049 - 731 / 505 4210 Email: elmar.steurer@dbag.ulm.DaimlerBenz.COM Accepted papers will be published in the workshop notes. Selected papers will be issued in a proceedings. Contributors will be allocated 20 minutes for an oral presentation during the workshop. Further invited talks and a panel discussion are planned. Important Dates: Submission deadline: 28 February 1998 Notification of acceptance: 15 March 1998 Camera ready copy: 28 March 1998 Workshop: 24 April 1998 For further information about the main conference and registration please contact: ecml98@lri.fr ecml98@informatik.tu-chemnitz.de or visit the web site: http://www.tu-chemnitz.de/informatik/ecml98 ********** III.C.1. Fr: Mary Jean Moore Re: New CDL Associate Director, Shared Collections and Services Beverlee French Accepts Post as Associate Director, Shared Collections and Services, for the California Digital Library Richard Lucier, University Librarian and Executive Director of the California Digital Library, is pleased to announce the appointment of Beverlee French as Associate Director, Shared Collections and Services, the California Digital Library. 
The University of California's President, Richard C. Atkinson, established the California Digital Library in the fall of 1997. As part of its mission, the CDL will provide access to shared digital collections that support the University's research and teaching missions and offer services that facilitate access to those collections. Ms. French will be responsible for planning and implementing the shared collections of the CDL. Her work will involve substantial collaboration with senior librarians and faculty in developing innovative plans and programs and devising implementation strategies for expanding the University's shared digital holdings. Richard Lucier says, "The depth and breadth of Ms. French's experience in both the print and electronic world, and her long history with UC libraries, will be a real asset as the CDL develops a vision to guide the development of UC's digital collections. These collections will include not only digital versions of the traditional published literature but also UC's unique special collections in digitized format, databases available only in digital form, and new forms of scholarly and scientific communication." Ms. French has a wide range of expertise in collection development, library management and technology, and public services. Most recently she has served as the Assistant/Associate University Librarian for Sciences and Systems at UC Davis (1992-present), where she administered the science libraries (Biological and Agricultural Sciences Department, Physical Sciences, Health Sciences, and Medical Center Libraries), Government Information and Maps, and library computing services. She has also been Acting Assistant University Librarian for Collections (1988-89) and Assistant University Librarian for the Sciences (1987-1992) at UC Davis, as well as Chair of two systemwide committees, Heads of Public Services and the Computer Files Committee. 
Prior to her appointment at UC Davis, she served as Head of the Science and Engineering Library, and as a reference librarian and cataloger at UC San Diego. She holds an A.B. in social sciences and an M.L.S. from UC Berkeley. ********** III.C.2. Fr: Oxford Program Coordinator Re: 1998 Oxford Program LIBRARIANS AND GRADUATE STUDENTS -- STUDY ABROAD AT OXFORD The University of North Carolina at Chapel Hill's School of Information and Library Science (SILS) and Oxford University's world-renowned Bodleian Library are offering "Libraries and Librarianship: Past, Present and Future" for the sixth year. This two-week seminar, held at Oxford University in England, will trace the Bodleian Library's past and chart the future of information and technology. This is a unique opportunity for professionals and graduate students in the fields of information and library science to discuss trends in academic librarianship and to meet professional peers from around the world. Participants have included corporate librarians, special collection librarians, graduate students and retired librarians. Participants learn about developments in library automation in Britain and Europe as well as preservation, conservation and collection policies that represent both the Oxford and British national viewpoints. The program is instructional in nature; UNC-CH SILS students may earn course credit (3 credit hours). There is an additional fee of $200 to gain course credit and a written paper is required. (Non-UNC-CH SILS students should check with their respective institutions to inquire about requirements for receiving credit in their programs.) The series of presentations is supplemented by visits to the Bodleian Library, some Oxford College libraries, the City of Oxford Central Library and to the headquarters of Oxford University Press and Blackwell's, the booksellers. 
Additional information about course content is available from the SILS webpage at http://ils.unc.edu (in the continuing education section). The dates for the program are May 17-30, 1998. The cost of the program is $2300 for shared accommodations and $2540 for single accommodations, and does NOT include transportation to Oxford, England. Participants receive room and board at Rewley House in Oxford. The deadline for registration for the Oxford Seminar is March 31, 1998. If you wish to receive a brochure for the 1998 Oxford program, contact Lucia Zonn, UNC-CH SILS, (919) 962-8366, or oxford@ils.unc.edu. ********** III.C.3. Fr: Philip A. Bralich, Ph.D. Re: NLP and the Best Theory of Syntax On March 17th I will be giving a talk at the University of Hawaii's Linguistic Department Tuesday Seminar called, "The Best Theory of Syntax." In this talk I intend to make the rather non-controversial point that the best theory of syntax must necessarily be the one that demonstrates itself to be most completely implemented in a programming language. I am writing to the group to ask for references, obscure or otherwise, where this basic proposition has been put forth before in the literature or through personal communications. Comments, criticism, and discussion of this argument are also welcome. I will post a summary of the references to the list. (Be sure to mention if you do not want your name mentioned in the summary.) Some might argue that I am merely putting complex arguments into simple language, but these arguments have substance and effect in either simple or complex language. This is especially true when we are dealing with the application of syntax to a multi-billion dollar industry such as NLP. 
More specifically, I intend to present the argument that the best independent and objective measure of a theory of syntax' overall effectiveness is its ability to generate, in a computer program, standard grammatical structures and to manipulate these structures in the same way as users of the language being described. That is, I intend to argue that the best theory of syntax is the one that produces the best parsers. Following that I will present a very ordinary set of standards for the evaluation of parsers and then based on the comparison of theories using those standards, I will argue that the theory of syntax that underlies the Ergo Linguistic Technologies' parser is the best theory of syntax and that all others should be relegated to the scrap heap of "wannabe" theories until such time as they can produce equal or better parsers. The logic that I will present to support this is: 1) If there is ever to be a way to determine which of the competing, extant theories of syntax is preferable to the others, there must be an independent and objective means of weighing the relative value and completeness of these theories in terms of their ability to accomplish the tasks they were originally designed for. Specifically, there must be an independent and objective means of verifying which theories are indeed most capable of expressing all and only those generalizations about language that describe and explain the observed facts of their structure. 2) Since computers have the ability to represent and execute binary algorithms, any theory that is composed of binary algorithms should be able to be implemented in a programming language. Thus, any theory of syntax that has reached a level of maturity should be able to represent its generalizations in working parsers. 
In fact, all programming languages and compilers are based on early syntactic discoveries such as phrase structure rules, for which Noam Chomsky is the default reference for much of the early work, and they have already demonstrated their aptness for this sort of comparison. 3) The degree to which a theory of syntax and its algorithms cannot be implemented in a programming language is the degree to which that theory and its algorithms have not been completely or correctly worked out and should not be considered a mature enough theory to be included in the discussion of which theory is to be preferred. 4) The theory which is most thoroughly worked out will naturally have the most thorough and comprehensive parsing programs associated with it, and for that reason is to be considered the best theory of syntax as determined by this independent, objective criterion. I will also propose a method for judging which theories have been "best" implemented in a programming language. Specifically, I will argue that the standards described below are the minimum standards that a theory of syntax would have to meet in order to be able to say that it had reached some level of maturity, and that this same set of criteria would be used to determine exactly which theories of syntax had most effectively accomplished the task of modeling the mechanisms that generate all and only the sentences of a language. In addition, the comparison of individual parses will of course use the Penn Treebank II guidelines established by the Linguistic Data Consortium at the University of Pennsylvania. Of course, any theory of syntax, whatever its assumptions and methods, should be able to translate its structures into the Penn Treebank style if the work is thorough and complete. The ability to generate these labeled brackets and trees in itself constitutes a good test of a theory's maturity. 
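For readers unfamiliar with the labeled-bracket notation mentioned above, the following is a minimal sketch in Python of how such a string can be read into a tree and written back out. It is an editorial illustration only and is not taken from the Ergo software or from the Penn Treebank tools; the function names (parse_brackets, leaves, to_brackets) and the example sentence's bracketing are assumptions chosen for the illustration, and real Treebank II output also carries function tags and empty elements that this sketch ignores.

```python
# Illustrative sketch only: a tiny reader/writer for Penn Treebank-style
# labeled brackets. All names here are hypothetical, not from any real tool.

def parse_brackets(text):
    """Parse one labeled bracketing into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]          # constituent or part-of-speech label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                     # consume the closing ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


def leaves(tree):
    """Return the words of the sentence in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


def to_brackets(tree):
    """Write a tree back out as a single-line labeled bracketing."""
    label, children = tree
    parts = [to_brackets(c) if isinstance(c, tuple) else c for c in children]
    return "(" + " ".join([label] + parts) + ")"


# Hand-written example bracketing (an assumption, not real parser output).
EXAMPLE = "(S (NP (NNP John)) (VP (VBD gave) (NP (NNP Mary)) (NP (DT a) (NN book))))"
example_tree = parse_brackets(EXAMPLE)
```

Calling leaves(example_tree) recovers the words of the sentence, and to_brackets(example_tree) reproduces the input string, which is the kind of round trip a bracket-generating parser's output could be checked against.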
The motivation for such comparisons and standards is of course to provide an independent and objective means of evaluation of the merits and relative success of research in this area that can be judged and discussed not only by those with a particular theoretical orientation, but also by those with different theoretical backgrounds, those in different areas of linguistics, and of course those from fields outside of linguistics who need to evaluate and discuss such materials. THE STANDARDS: In addition to using the Penn Treebank II guidelines for the generation of trees and labeled brackets and a dictionary that is at least 35,000 words in size and works in real time and handles sentences up to 15 to 20 words in length, we suggest that NLP parsers should also meet standards in the following seven areas before being considered "complete." The seven areas are: 1) the structural analysis of strings, 2) the evaluation of acceptable strings, 3) the manipulation of strings, 4) question/answer, statement/response repartee, 5) command and control, 6) the recognition of the essential identity of ambiguous structures, and 7) lexicography. (These same criteria have been proposed for the coordination of animations with NLP with the Virtual Reality Modeling Language Consortium--a consortium (whose standards were recently accepted by the ISO) designed to standardize 3D environments. See http://www.vrml.org/WorkingGroups/NLP-ANIM.) It is important to recognize that EAGLES and the MUC conferences, groups that are charged with the responsibility of developing standards for NLP, do not mention any of the following criteria and instead limit themselves to largely general characteristics of user acceptance or vague categories such as "rejects ungrammatical input" rather than specific proposals detailed in terms of syntactic and grammatical structures and functions that are to be rejected or accepted. 
The EAGLES site is made up of hundreds of pages of introductory material that is very confusing and difficult to navigate; however, once you actually find the few standards that are being proposed you will find that they do not come close to the level of precision and depth that is being proposed here and for that reason should be rejected until such time as these higher and more demanding levels of expectation of NLP systems are included there as well. These are serious matters and a group like EAGLES should not ignore extant NLP tools simply because they are not mainstream or because mainstream parsers cannot meet these requirements (even though the Ergo parser is better known than almost all other parsers). Just go through their pages and try to find EXACTLY what a parser is expected to do under these guidelines. There is almost no reference to specific grammatical structures, the Penn Treebank II guidelines, or references to current working parsers as models (http://www.ilc.pi.cnr.it/EAGLES/home.html). If the EAGLES' standards are ever to gain any credibility and respect they are going to have to be far more specific about grammatical and syntactic phenomena that a system can and cannot support. There should also be some requirement that the systems being judged offer a demonstration of their abilities to generate labeled brackets and trees in the style of the Penn Treebank II guidelines. I suggest the following as a far more exacting and far more demanding test of systems than is offered by EAGLES or any of the MUC conferences. HERE IS A BRIEF PRESENTATION OF STANDARDS IN THOSE SEVEN AREAS: 1. 
At a minimum, from the point of view of the STRUCTURAL ANALYSIS OF STRINGS, the parser should: 1) identify parts of speech, 2) identify parts of sentence, 3) identify internal clauses (what they are and what their role in the sentence is as well as the parts of speech, parts of sentence and so on of these internal clauses), 4) identify sentence type (without using punctuation), 5) identify tense and voice in main and internal clauses, and 6) do 1-5 for internal clauses. 2. At a minimum, from the point of view of EVALUATION OF STRINGS, the parser should: 1) recognize acceptable strings, 2) reject unacceptable strings, 3) give the number of correct parses identified, 4) identify what sort of items succeeded (e.g. sentences, noun phrases, adjective phrases, etc.), 5) give the number of unacceptable parses that were tried, and 6) give the exact time of the parse in seconds. 3. At a minimum, from the point of view of MANIPULATION OF STRINGS, the parser should: 1) change yes/no and information questions to statements and statements to yes/no and information questions, 2) change actives to passives in statements and questions and change passives to actives in statements and questions, and 3) change tense in statements and questions. 4. At a minimum, based on the above basic set of abilities, from the point of view of QUESTION/ANSWER, STATEMENT/RESPONSE REPARTEE, the parser should: 1) identify whether a string is a yes/no question, wh-word question, command or statement, 2) identify tense (and recognize which tenses would provide appropriate responses), 3) identify relevant parts of sentence in the question or statement and match them with the needed relevant parts in text or databases, 4) return the appropriate response as well as any sound or graphics or other files that are associated with it, and 5) recognize the essential identity between structurally ambiguous sentences (e.g. 
recognize that either "John was arrested by the police" or "The police arrested John" are appropriate responses to either, "Was John arrested (by the police)?" or "Did the police arrest John?"). 5. At a minimum, from the point of view of RECOGNITION OF THE ESSENTIAL IDENTITY OF AMBIGUOUS STRUCTURES, the parser should recognize and associate structures such as the following: 1) existential "there" sentences with their non-there counterparts (e.g. "There is a dog on the porch," "A dog is on the porch"), 2) passives and actives, 3) questions and related statements (e.g. "What did John give Mary" can be identified with "John gave Mary a book."), 4) possessives should be recognized in three forms, "John's house is big," "The house of John is big," "The house that John has is big," 5) heads of phrases should be recognized as the same in non-modified and modified versions ("the tall thin man in the office," "the man in the office," "the tall man in the office" and "the tall thin man in the office" should be recognized as referring to the same man (assuming the text does not include a discussion of another, "short man" or "fat man" in which case the parser should request further information when asked simply about "the man")), and 6) others to be decided by the group. 6. At a minimum, from the point of view of COMMAND AND CONTROL, the parser should: 1) recognize commands, 2) recognize the difference between commands for the operating system and commands for characters or objects, and 3) recognize the relevant parts of the commands in order to respond appropriately. 7. 
At a minimum, from the point of view of LEXICOGRAPHY, the parser should: 1) have a minimum of 50,000 words, 2) recognize single and multi-word lexical items, 3) recognize a variety of grammatical features such as singular/plural, person, and so on, 4) recognize a variety of semantic features such as +/-human, +/-jewelry and so on, 5) have tools that facilitate the addition and deletion of lexical entries, 6) have a core vocabulary that is suitable to a wide variety of applications, 7) be extensible to 75,000 words for more complex applications, and 8) be able to mark and link synonyms.

THE CONCLUSIONS I WILL DRAW FROM THIS ARE:

1) The theory that underlies the software at Ergo Linguistic Technologies is not only the best theory of syntax, but is the ONLY theory of syntax that has reached a sufficiently developed state to even attempt the standards described here.

2) Those who do not mention this theory in their research proposals, grant applications, publications and so on are guilty of negligence (and could be sued if there are grants, contracts, jobs, or other such items of material value at stake and where the offerer of these jobs, grants, etc. has reason to expect that the applicant is an expert in his field and is providing an accurate picture of the competitive environment). In addition, computational linguistics departments who do not mention these tools or use tools of this calibre are remiss in their duty to present the full range of available materials to their students.

3) All current theories of syntax, such as Chomsky's latest or even older versions of his theory, HPSG, LFG, etc., should all be relegated to the scrap heap of "wannabe" systems until such time as they have been worked out in sufficient detail to allow the creation of programs that can execute their algorithms to the degree required by the above standards.
(I do not want to imply that the use of these theories to analyze the world's languages cannot or has not contributed greatly to the store of knowledge about the nature of the world's languages. As a matter of fact, the theory that we are working with owes a tremendous debt to all the work that has come before it in the form of these earlier theories. The only problem is that these other theories have not yet completed their basic research and have not yet reached a level of sufficient maturity to work with the standards described above, and for that reason can only be considered works in progress or "wannabe" theories.)

I will finish my UH talk with a demonstration of the software that has been developed from our theory of syntax, focusing on demonstrations from the seven standards described above and handouts from the output of other parsers. In addition to our standard demo (as seen on our web site, http://www.ergo-ling.com), I will use the tools called "The BracketDoctor" (a device that generates labeled brackets and trees in the style of the Penn Treebank II guidelines), "The English Sentence Enhancer" (an ESL grammar checker), "The Logic Doctor" (a program that handles first order predicate calculus, syllogistic reasoning, inferencing and basic logic) and "The Q&A Demo" (a program that shows our ability to handle question/answer, statement/response repartee) to demonstrate our strengths using the Penn Treebank II style trees and labeled brackets as well as practical illustrations to demonstrate the abilities of our theory of syntax in those seven areas. (All these tools except the "Logic Doctor" and the "Q&A Demo" are available for free download from our web site at http://www.ergo-ling.com or by email by writing me at bralich@hawaii.edu. These are Windows 95 programs that fit on one disk and can be installed with a standard setup function from WIN95.) Please be advised that these programs are copyrighted and patent pending.
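
For readers unfamiliar with labeled-bracket notation of the kind the Penn Treebank II guidelines describe, the following is a minimal sketch in Python of how such a bracketing can be read into a nested tree and its words recovered. It is not output from The BracketDoctor, and the functions `parse_brackets` and `leaves` are hypothetical names introduced only for this illustration. The bracketing in `EXAMPLE` was written by hand for the sentence "John gave Mary a book" and has not been checked against the guidelines.

```python
from typing import List, Union

# A tree is either a word (str) or a list [label, child, child, ...].
Tree = Union[str, list]


def parse_brackets(text: str) -> Tree:
    """Read one labeled bracketing, e.g. "(S (NP John) (VP slept))"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            word = tokens[pos]      # a bare word is a leaf
            pos += 1
            return word
        pos += 1                    # consume "("
        children = [tokens[pos]]    # first item after "(" is the label
        pos += 1
        while tokens[pos] != ")":
            children.append(node())
        pos += 1                    # consume ")"
        return children

    try:
        tree = node()
    except IndexError:
        raise ValueError("unbalanced brackets") from None
    if pos != len(tokens):
        raise ValueError("trailing material after the first bracketing")
    return tree


def leaves(tree: Tree) -> List[str]:
    """Return the words of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree[1:]:          # skip the label at index 0
        words.extend(leaves(child))
    return words


# Hand-written illustration only (an assumption, not tool output).
EXAMPLE = ("(S (NP-SBJ (NNP John)) "
           "(VP (VBD gave) (NP (NNP Mary)) (NP (DT a) (NN book))))")

if __name__ == "__main__":
    tree = parse_brackets(EXAMPLE)
    print(tree[0])
    print(" ".join(leaves(tree)))
```

Two bracketings produced by different parsers for the same string can be compared by reading both into this nested form and checking whether the labels and spans agree.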
In sum, I would like to know of references and to receive comments in support of or against the following argument: 1) that computers are the ideal devices for comparing different theories' abilities to model the phenomena they seek to describe (all and only the grammatical sentences of a language); 2) that any theory that cannot be fully implemented in a programming language as described in the standards outlined above is flawed in some way; and 3) that the best independent and objective measure of a theory's scope, efficiency, and effectiveness is the degree to which it can be implemented in a programming language. (Of course, the basis for judgment will be the Penn Treebank II guidelines and the standards described above.)

Then, based on the ability of Ergo Linguistic Technologies' tools to compete in all the standards, I suggest that the theories of Brame, Chomsky, Kaplan and Bresnan, Pollard and Sag, Starosta, et al., be set aside until such time as they can be shown to generate programs that are as good as or better than those produced at Ergo Linguistic Technologies' offices.

Phil Bralich

P.S. We recommend that you download these tools and take them with you (on a laptop is best, of course) to any linguistics, NLP, Computational Linguistics, MT, or logic conference or workshop that will discuss work in these areas. They should provide you with an interesting source of comparison material as well as with some interesting and challenging questions for the presenters. Of course, this may also be of value for students in their classes. Linguistics and Computer Science departments that are currently not committed to any particular theory of syntax or approach might want to consider collaborative involvements with this theory as a means of producing commercially viable products and as a source of research grants. You may also wish to compare results in published reports with results that these tools provide.
You may also want to email copies of one or more of these tools to classmates, teachers, and co-workers (please avoid sending them to competitors like a big bunch of unordered pizzas). P.P.S. As the field of linguistics is dominated by very intelligent, very informed individuals who are also quite competitive, you can measure the success of this argument on the field overall by the reactions of the readers to this post--the smaller the response, the higher the acceptance (begrudging though it may be). That is, people are certainly willing to criticize any argument they can, but they merely keep quiet if they cannot. Praise for a competitor's arguments is not likely. Thus, a lack of criticism should be interpreted as acceptance of these arguments. Philip A. Bralich, President Ergo Linguistic Technologies 2800 Woodlawn Drive, Suite 175 Honolulu, HI 96822 tel:(808)539-3920 fax:(880)539-3924 ****************************************************************** IV. PROJECTS IV.C.1. Fr: Margarita Studemeister Re: Peace Digital Library Internship January 21, 1998 Jeannette Rankin Library Program United States Institute of Peace Internship Opportunity Effective immediately, an internship position is available at the Jeannette Rankin Library Program to support the development of a peace agreements digital collection. This is an unpaid educational experience of short duration designed to advance the professional development of graduate students, primarily, but not exclusively, in the field of library and information science. The United States Institute of Peace is an independent, nonpartisan federal institution created by Congress to promote research, education, and training on the peaceful resolution of international conflicts. 
Established in 1984, the Institute meets its congressional mandate through an array of programs, including research grants, fellowships, professional training programs, conferences and workshops, library services, publications and other educational activities. The Institute's Jeannette Rankin Library Program seeks to create a digital collection of post-1989 peace agreements ending international conflicts, and to contribute to the development of a federal digital library of international relations resources. Specifically, the initial goals of the digital collection of peace agreements are:

* To collect, maintain and provide access via the World Wide Web to the full text in English of peace agreements related to inter- and intrastate conflicts since 1989, as a research and learning tool on peaceful means to end international conflict.
* To collaborate with the United States Information Agency (USIA) in the creation of a digital library with a focus on international affairs by contributing metadata, or information about the Institute's digital collection of peace agreements.
* To create a valuable digital collection for scholars and practitioners, and the public at large, compatible with the mission of the Institute.

To our external audience, the peace agreements will be accessible in several ways:

* Via the Institute's World Wide Web site (), and searchable using the site's search engine.
* Through queries on World Wide Web search engines, such as Alta Vista, Lycos, etc.
* From the end-user interface of the USIA-sponsored digital library.

The unpaid internship position involves approximately 15 hours of work per week during a semester. Students may be able to arrange for academic credit through their university.
Interns will be selected on the basis of motivation; an informed interest in international peace negotiations, or other issue in international relations relevant to the Institute's mission; computer literacy; attitude, skills and ability to develop and work in accordance with goals and objectives, to work independently and contribute to a team, and to effectively utilize guidance in the implementation of work plans; and ability to draft and generate reports and correspondence. Interested individuals should submit a cover letter expressing interest in the unpaid internship position and describing background in international relations and library science, a completed application form (see below) and a resume. All applicants will be considered according to the above criteria, and the needs and priorities of the digital collection project.

For more information contact:
Margarita S. Studemeister, Director
Jeannette Rankin Library Program
United States Institute of Peace
1550 M Street NW Suite 700, Washington, D.C. 20005-1708
email: mss@usip.org; phone/voice mail: 202 429 3850; fax: 202 429 6063

******************************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.

Send subscription requests and submissions to: nancy.gusack@ucop.edu

Editorial Staff: Nancy Gusack nancy.gusack@ucop.edu
Cliff Lynch (emeritus) cliff@cni.org

The IRLIST Archives is set up for anonymous FTP. Using anonymous FTP via the host ftp.dla.ucop.edu, the files will be found in the directory /data/ftp/pub/irl, stored in subdirectories by year (e.g., data/ftp/pub/irl/1993).

Search or browse archived IR-L Digest issues on the Web at: http://www.dcs.gla.ac.uk/idom/irlist/

These files are not to be sold or used for commercial purposes. Contact Nancy Gusack for more information on IRLIST.
THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THEIR MATERIAL.