Information Retrieval List Digest 396 (March 9, 1998)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-396.txt

IRLIST Digest ISSN 1064-6965 March 9, 1998 Volume XV, Number 10 Issue 396
******************************************************************
I. QUERIES
 1. IR Systems and Software
 2. My Last NLP Post (of this thread)
II. JOBS
 1. RAND: Data Librarian
 2. U. Tennessee/Knoxville: Director, SIS
III. NOTICES
 A. Publications
  1. Katharine Sharp Review Call for Papers
  2. IWE -- El Profesional de la Informacion (formerly Information World en Espanol)
  3. Information Retrieval: Special Issue
 B. Meetings
  1. Reminder of COLING-ACL'98 Workshop Deadline (CVIR'98)
  2. AAAI-98 Workshop
 C. Miscellaneous
  1. Java Speech API
IV. PROJECTS
 D. Research
  1. Study of Information Seeking Behavior on the Web
******************************************************************
I. QUERIES

I.1. Fr: Philip A. Bralich
Re: IR Systems and Software

I would like to purchase, download, or view any and all IR systems that are available either on the web or for my own use. Could someone help me get started in finding all those that are available? Is there a clearinghouse somewhere?

Phil Bralich

Philip A. Bralich, Ph.D.
President and CEO
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822
Tel: (808) 539-3920
Fax: (808) 539-3924

**********

I.2. Fr: Philip A. Bralich, Ph.D.
Re: My Last NLP Post (of this thread)

There have been several more reactions to my post concerning whether or not the ability of a theory of syntax to be implemented in a programming language constitutes a fair, accurate, independent, and objective test of a theory's scope and efficiency. In order to save bandwidth I will respond one last time to this thread and try to cover as wide a range of criticisms as possible.
I am sorry to be the one to bring you news of a serious problem in your field, but the fact remains that the theories you have grown to know and love over the last 30+ years have a dirty little secret: they cannot be programmed to save their lives. This thread has taken on more of a life than I expected, so if all are agreed I will make this the last post for this particular thread (though not this subject, I am sure). Please do not see this as an opportunity to let your venom fly, as I will respond to posts that I feel must be responded to. I think it is easiest to frame this in terms of arguments that are "out there" and my responses to them.

The garden-path arguments and my responses:

1. The standards I have proposed have already been met. (They have not -- not by a long shot.) Just print out the standards, put a copy of Ergo software in your pocket, and then go and compare them with any parsing system anywhere.

2. The standards I propose are idiosyncratic to Ergo's theory, or they are somehow unfair. Look at them yourself and ask whether you and most of the field haven't regarded them as commonplace expectations for any theory or any parser.

3. Current problems with NLP have to do with working with the last 10%. That is, the pretense is that parsers can already handle 90% of what needs to be done but more is required. This is dead wrong. Parsers outside of Ergo hardly begin to touch the standards we have proposed: few of them do anything more than part-of-speech analysis. If you look at the output of speech recognition systems you will see their NLP abilities cover well under 1% of the task (handling only a few hundred commands). Ergo can improve that by another 60-80%, increasing the number of possible commands to many thousands and making the first spoken-language operating systems possible.

4. Parsing is not a good test of a theory, even though there has never been a theoretical mechanism proposed that in principle could not be programmed.
Note that other NLP researchers are not anxious to argue that their theories are better BECAUSE they cannot be programmed. That would end virtually any hope of funding that may exist for them in the NLP arena. Thus, I believe it is safe to say that all other syntactic theoreticians agree wholeheartedly that programming is a good test of a theory; I have yet to see a single theoretical syntactician dispute this claim. Though it does seem that there are those in the field who believe parsing is not a good test. (Statisticians, probably -- the last thing they would want is for a theory of syntax to do better than their number crunching.) Perhaps syntacticians with other theories would like to take up the debate. Would a theory of math that could not be programmed into calculators then be a better theory of math because it was using less mundane criteria than formal consistency?

5. Statistics alone is sufficient to analyze the facts of human language. Wrong: statistics will never provide sufficient information about the internal structure of strings to manipulate structures or to do question/answer, statement/response repartee. (Aside: Does a vote for Ergo equal a vote against statistics? Perhaps.)

6. People will not accept NLP until disfluencies and other gaps are handled. This is more than a little bizarre. By this logic speech recognition should have sold nothing to date, and even current products should be stamped as not fit for human consumption. Believe me, when you can type or speak the following to your search engine, people will forget about the disfluencies and gaps. "Who was the eighth President of the United States?" "Hey Mickey, what time is it?"

7. Parsers are too cumbersome to be made readily available to the general public. Again not true: ours is a standard Windows 95 program that fits on one disk (including the 75,000-word dictionary) and will run on any 486 or better PC. If it is NOT superior to the others, they should be able to do the same.

8.
There is something inherently wrong with the Penn Treebank standard. It doesn't matter: it is a true demonstration of a parser's ability to do part-of-speech tagging as well as a thorough analysis of internal structure. If this has been done, it shouldn't take more than a few weeks for the programmers to convert their parser's output into the Penn Treebank style; that is just not a big programming task. Besides, the Penn Treebank II guidelines are the standards accepted by this field. (Of course, we also need equivalent standards for other languages.)

9. Changing one structure into another or doing Q&A makes untenable theoretical claims about the relationships between structures. Again not so: if you have properly analyzed the internal structure of strings, you should be able to change a question to a statement and a statement to a question whether or not you believe this is what goes on in the brain. The structures are so totally predictable, one from the other, that this too should take a programmer only a week or so (if the analysis of internal structure has been done correctly in the first place).

10. People could respond intelligently to my claims; they are just too busy with other things or too put off by my arrogance (accuracy?). Wrong: this is a written record respected in the community and as available as a library book (just type my name into a Net search if you want to find these arguments). Not to respond is to acquiesce.
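To make the conversion claim concrete, here is a minimal sketch of rendering a parser's internal tree as a Penn Treebank-style labeled bracketing. This is not Ergo's actual code; the tree representation, tags, and sentence are invented for illustration.

```python
# Hypothetical sketch: a parser's internal tree, represented here as
# nested (label, children) pairs, rendered as a Penn Treebank-style
# labeled bracketing. Tags (S, NP, VP, NNP, VBD) follow Treebank usage.

def to_bracketing(node):
    """Render a (label, children) tree as a labeled bracket string."""
    label, children = node
    if isinstance(children, str):          # leaf: (POS tag, word)
        return "(%s %s)" % (label, children)
    return "(%s %s)" % (label, " ".join(to_bracketing(c) for c in children))

tree = ("S",
        [("NP", [("NNP", "John")]),
         ("VP", [("VBD", "saw"),
                 ("NP", [("NNP", "Mary")])])])

print(to_bracketing(tree))
# (S (NP (NNP John)) (VP (VBD saw) (NP (NNP Mary))))
```

The point of the sketch is that once a parser has the structure internally, emitting it in Treebank notation is a mechanical traversal, which is why the conversion is claimed to be a small programming task.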
There is still a serious problem underlying the lack of response from people who know this field. For syntacticians, if they say that theories can be tested by their ability to be implemented as a parser, they have to produce a parser of at least equal quality to the Ergo parser or concede ours is best; however, if they say that there are more important issues than parsing (thereby demonstrating their theory CANNOT be implemented in a parser), they must forever write off funds for parsing until such time as they have amended their theory or their opinion. For statisticians, if they say that a theory of syntax can be parsed at all, they are in danger of admitting there is no particular need for statistical parsers. If they say that theories of syntax cannot create parsers, or cannot create parsers equal to statistical parsers, they must come up with a statistical parser that can meet or beat those very ordinary standards that I have proposed. This is especially difficult for them because there is no way that a statistical parser will ever analyze internal structure to a significant enough degree to do Q&A or manipulate structures (otherwise they would have developed a theory of syntax and would once again remove the need for statistical parsers).

Finally, download a BracketDoctor (and perhaps these arguments as well), take it to classes or to presentations or to conferences, and ask questions based on what it can do. If you are given straight answers with evidence of better results from other parsers, you will KNOW I am wrong. If anything else occurs (e.g., dead silence, dirty looks, accusations of political incorrectness, shunning, or whatever), you know there is substance in my arguments. Gauge my arguments not by the intellectualized cloudiness of responses, but by the lack or presence of physical evidence (don't go by oral reports alone) from other parsers that can meet the standards I have provided.
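To illustrate how predictable the statement/question relationship is once internal structure is analyzed, here is a toy sketch. The flat dictionary "parse" and the example sentence are invented for illustration; a real parser's analysis (and its handling of tense and do-support) would be far richer.

```python
# Hypothetical sketch of the "manipulation of strings" idea: once the
# subject, auxiliary, and predicate of a clause are identified, a
# yes/no question and its statement counterpart are each mechanically
# derivable from the other by subject-auxiliary inversion.

def statement_to_question(parse):
    """Invert subject and auxiliary to form a yes/no question."""
    return "%s %s %s?" % (parse["aux"].capitalize(),
                          parse["subject"].lower(),
                          parse["predicate"])

def question_to_statement(parse):
    """Restore subject-first order to form a statement."""
    return "%s %s %s." % (parse["subject"].capitalize(),
                          parse["aux"].lower(),
                          parse["predicate"])

parse = {"subject": "the police", "aux": "did", "predicate": "arrest John"}
print(statement_to_question(parse))  # Did the police arrest John?
print(question_to_statement(parse))  # The police did arrest John.
```

The transformations operate on the analyzed structure, not on the surface string, which is the sense in which each form is "totally predictable" from the other.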
I have provided very ordinary standards (repeated below) such that anyone should be able to judge this. Look closely at the standards; you will see they are fair and relatively simple. Then, BracketDoctor and arguments in hand, go out and find the physical evidence yourself.

Phil Bralich

THE STANDARDS: In addition to using the Penn Treebank II guidelines for the generation of trees and labeled brackets, using a dictionary that is at least 35,000 words in size, working in real time, and handling sentences up to 15 to 20 words in length, we suggest that NLP parsers should also meet standards in the following seven areas before being considered "complete." The seven areas are: 1) the structural analysis of strings, 2) the evaluation of acceptable strings, 3) the manipulation of strings, 4) question/answer, statement/response repartee, 5) command and control, 6) the recognition of the essential identity of ambiguous structures, and 7) lexicography. (These same criteria have been proposed, for the coordination of animations with NLP, to the Virtual Reality Modeling Language Consortium -- a consortium, whose standards were recently accepted by the ISO, designed to standardize 3D environments. See http://www.vrml.org/WorkingGroups/NLP-ANIM.)

It is important to recognize that EAGLES and the MUC conferences, groups that are charged with the responsibility of developing standards for NLP, do not mention any of these criteria. Instead they limit themselves largely to general characteristics of user acceptance or vague categories such as "rejects ungrammatical input," rather than specific proposals detailed in terms of the syntactic and grammatical structures and functions that are to be rejected or accepted.
The EAGLES site is made up of hundreds of pages of introductory material that is very confusing and difficult to navigate; however, once you actually find the few standards that are being proposed, you will find that they do not come close to the level of precision and depth being proposed here, and for that reason they should be rejected until such time as these higher and more demanding expectations of NLP systems are included there as well. These are serious matters, and a group like EAGLES should not ignore extant NLP tools simply because they are not mainstream or because mainstream parsers cannot meet these requirements (even though the Ergo parser is better known than almost all other parsers). Just go through their pages and try to find EXACTLY what a parser is expected to do under these guidelines. There is almost no reference to specific grammatical structures, to the Penn Treebank II guidelines, or to current working parsers as models (http://www.ilc.pi.cnr.it/EAGLES/home.html). If the EAGLES standards are ever to gain any credibility and respect, they are going to have to be far more specific about the grammatical and syntactic phenomena that a system can and cannot support. There should also be some requirement that the systems being judged offer a demonstration of their ability to generate labeled brackets and trees in the style of the Penn Treebank II guidelines. I suggest the following as a far more exacting and far more demanding test of systems than is offered by EAGLES or any of the MUC conferences.

HERE IS A BRIEF PRESENTATION OF STANDARDS IN THOSE SEVEN AREAS:

1.
At a minimum, from the point of view of the STRUCTURAL ANALYSIS OF STRINGS, the parser should: 1) identify parts of speech, 2) identify parts of sentence, 3) identify internal clauses (what they are and what their role in the sentence is, as well as the parts of speech, parts of sentence, and so on of these internal clauses), 4) identify sentence type (without using punctuation), 5) identify tense and voice in main and internal clauses, and 6) do 1-5 for internal clauses.

2. At a minimum, from the point of view of the EVALUATION OF STRINGS, the parser should: 1) recognize acceptable strings, 2) reject unacceptable strings, 3) give the number of correct parses identified, 4) identify what sort of items succeeded (e.g. sentences, noun phrases, adjective phrases, etc.), 5) give the number of unacceptable parses that were tried, and 6) give the exact time of the parse in seconds.

3. At a minimum, from the point of view of the MANIPULATION OF STRINGS, the parser should: 1) change yes/no and information questions to statements and statements to yes/no and information questions, 2) change actives to passives in statements and questions and change passives to actives in statements and questions, and 3) change tense in statements and questions.

4. At a minimum, building on the above basic set of abilities, from the point of view of QUESTION/ANSWER, STATEMENT/RESPONSE REPARTEE, the parser should: 1) identify whether a string is a yes/no question, wh-word question, command, or statement, 2) identify tense (and recognize which tenses would provide appropriate responses), 3) identify relevant parts of sentence in the question or statement and match them with the needed relevant parts in text or databases, 4) return the appropriate response as well as any sound or graphics or other files associated with it, and 5) recognize the essential identity between structurally ambiguous sentences (e.g.
recognize that either "John was arrested by the police" or "The police arrested John" is an appropriate response to either "Was John arrested (by the police)?" or "Did the police arrest John?").

5. At a minimum, from the point of view of RECOGNITION OF THE ESSENTIAL IDENTITY OF AMBIGUOUS STRUCTURES, the parser should recognize and associate structures such as the following: 1) existential "there" sentences with their non-there counterparts (e.g. "There is a dog on the porch," "A dog is on the porch"), 2) passives and actives, 3) questions and related statements (e.g. "What did John give Mary?" can be identified with "John gave Mary a book."), 4) possessives, which should be recognized in three forms: "John's house is big," "The house of John is big," "The house that John has is big," 5) heads of phrases, which should be recognized as the same in non-modified and modified versions ("the tall thin man in the office," "the man in the office," "the tall man in the office," and "the thin man in the office" should be recognized as referring to the same man, assuming the text does not include a discussion of another "short man" or "fat man," in which case the parser should request further information when asked simply about "the man"), and 6) others to be decided by the group.

6. At a minimum, from the point of view of COMMAND AND CONTROL, the parser should: 1) recognize commands, 2) recognize the difference between commands for the operating system and commands for characters or objects, and 3) recognize the relevant parts of the commands in order to respond appropriately.

7.
At a minimum, from the point of view of LEXICOGRAPHY, the parser should: 1) have a minimum of 50,000 words, 2) recognize single- and multi-word lexical items, 3) recognize a variety of grammatical features such as singular/plural, person, and so on, 4) recognize a variety of semantic features such as +/-human, +/-jewelry, and so on, 5) have tools that facilitate the addition and deletion of lexical entries, 6) have a core vocabulary that is suitable to a wide variety of applications, 7) be extensible to 75,000 words for more complex applications, and 8) be able to mark and link synonyms.

Philip A. Bralich, Ph.D.
President and CEO
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822
Tel: (808) 539-3920
Fax: (808) 539-3924
******************************************************************
II. JOBS

II.1. Fr: Elizabeth Gill
Re: RAND: Data Librarian

RAND, a not-for-profit multi-disciplinary research organization in Santa Monica, California, has an immediate opening in its Library for the following:

Description: Data Collection Librarian (Exempt)
Reports to: Library Assistant Director

Position Summary: Administers the overall operation of the RAND Library Data Facility, including the following responsibilities:
*Assisting in identifying appropriate data for specific research applications, helping users access data files, and identifying variables for specific research projects.
*Acquiring data files and documentation from government agencies, data archives, or other suppliers; locating and acquiring significant RAND-generated data files.
*Verifying, copying, and indexing files and documentation received from outside sources and RAND projects.
*Verifying data formats; storing data on UNIX; updating, controlling, and maintaining the physical tapes and documentation for all supported facility holdings; writing file abstracts; preparing user specification forms; and keyword indexing.
*Disseminating acquired data and documentation and publishing file abstracts internally; disseminating RAND-generated data and documentation to outside agencies and data users.
*Assisting in file access and consulting on data problems related to file usage.
*Representing data users' interests in forums influencing public data policy.
*Assisting the Library Director in development of relevant policies and procedures.
*Maintaining close relations with RAND programmers and researchers to assure an effective collection.
*Performing other special projects and assignments as directed.

Education: B.A. in Computer Science or a social science field such as Economics, Political Science, or Statistics. M.L.S. from an ALA-accredited program.

Work Experience: Minimum of three years' experience in a data archive or library data facility.

Qualifications: Understanding of social science and survey data; knowledge of data handling procedures and archiving standards; familiarity with data collection methods; experience with UNIX, Windows, and Microsoft Office; knowledge of SAS a plus; familiarity with a variety of computer equipment. Attention to detail is extremely important. Should be a self-starter, able to prioritize demands and effectively organize work. Must be very comfortable in a rapidly emerging electronic library environment. Must have strong oral and written communication skills. Expected to exercise fiscal responsibility in running the Data Facility. Must be able to multi-task and possess a helpful, service-oriented attitude. U.S. citizenship is required.

Salary: Very competitive.

Contact:
Ken Logan
RAND
1700 Main Street
P.O. Box 2138
Santa Monica, CA 90407-2138

RAND is an Affirmative Action Employer

**********

II.2. Fr: Richard Pollard
Re: U.
Tennessee/Knoxville: Director, SIS

Director, The School of Information Sciences
The University of Tennessee, Knoxville (UTK)

Applications and nominations are invited for the position of Director of the School of Information Sciences (SIS) at the University of Tennessee, Knoxville.

The University: Founded in 1794, UTK is Tennessee's comprehensive land-grant university and a Carnegie I institution. UTK enrolls approximately 19,000 undergraduate and 6,000 graduate students and employs 1,300 faculty and instructional staff. It offers master's programs in 77 fields and doctoral programs in 52, more than any other institution in the state, and attracts nearly $80 million in sponsored research programs. Located at a bend of the Tennessee River, UTK's 532-acre campus is within easy commuting distance of the area's principal residential neighborhoods. The Knoxville metropolitan area is adjacent to the Great Smoky Mountains and lies at the intersection of major north-south and east-west interstate highways. With an area population of 631,000, mild seasons, and a variety of recreational, educational, and cultural opportunities, Knoxville is consistently ranked as one of the country's "most livable" cities.

The School: The School is the state's sole ALA-accredited program and reports to the Vice Chancellor for Academic Affairs. SIS has been identified by the University as one of 35 academic programs of excellence (among 400 programs) to receive high priority for protection in times of lean budgets and priority for increased funding as resources permit. The School offers a Master of Science (Information Sciences) recognized as the fastest-growing master's degree program in the University. SIS shares a Ph.D. with the College of Communications, offers undergraduate service courses, and houses the Center for Information Studies. There are twelve full-time faculty members, five adjunct faculty, and five full-time staff.
Approximately 265 students are enrolled at Knoxville and eight distance education sites in Tennessee and Virginia. The School has an innovative curriculum and works closely with units such as the University Libraries, the College of Communications, and the College of Education. SIS is actively involved in the area's sizable community of information professionals. The value of SIS grants and contracts exceeds $6 million (FY96).

Duties and Responsibilities: The School's Director will be appointed at the rank of Professor and is responsible for providing the academic leadership and vision necessary for the development, implementation, and evaluation of its educational, research, and service programs. This includes commitment to technological advancement, fundraising, and developing stronger relationships with the University, the State, and the international community of information professionals. Some teaching and/or research is expected. This is a full-time, twelve-month, tenure-track administrative appointment. The salary is negotiable based on relevant experience.

Required qualifications include: a doctorate in one of the information sciences or a closely related field; a positive record of research and teaching at a nationally recognized university; evidence of leadership experience or potential; demonstrated organizational skills, creativity, and flexibility; effective communication and collaborative skills; a strong commitment to interdisciplinary research and relations both on and off campus; the energy and strategic vision critical to leading the School in its interdisciplinary approach to the education of information professionals; and commitment to faculty governance and to distance and continuing education.

Desired qualifications include: successful administrative experience in higher education; knowledge of emerging information technologies; experience in curriculum planning and evaluation; and success in developing grants, contracts, and/or sponsored projects.
The successful candidate will have an understanding of and demonstrated commitment to equal employment opportunity and affirmative action.

Review of applications will begin on April 10, 1998 and continue until the position is filled. Applicants should submit: 1) a letter of interest relating the applicant's professional qualifications to the requirements and responsibilities of the position, 2) a full curriculum vitae, 3) the names, titles/positions, addresses, and telephone numbers of at least three professional references, and 4) supporting materials related to scholarship, teaching, service, and administrative experience.

Send application materials to:
Professor Fred D. Tompkins, Chair
Search Committee for Director of SIS
College of Engineering
101 Perkins Hall
The University of Tennessee, Knoxville
Knoxville, TN 37996-2000
Phone: (423) 974-3609
Fax: (423) 974-8890
Email: fred-tompkins@utk.edu
http://www.utk.edu (UTK site)
http://www.sis.utk.edu (SIS site)

The University of Tennessee, Knoxville does not discriminate on the basis of race, sex, color, religion, national origin, age, disability, or veteran status in provision of educational programs and services or employment opportunities and benefits. This policy extends to both employment and admission to the University. The University does not discriminate on the basis of race, sex, or disability in its education programs and activities pursuant to the requirements of Title VI of the Civil Rights Act of 1964, Title IX of the Education Amendments of 1972, Section 504 of the Rehabilitation Act of 1973, and the Americans with Disabilities Act (ADA) of 1990. Inquiries and charges of violation concerning Title VI, Title IX, Section 504, the ADA, the Age Discrimination in Employment Act (ADEA), or any of the other above-referenced policies should be directed to the Office of Diversity Resources & Educational Services (DRES), 1818 Lake Avenue, Knoxville, TN 37996-3560, telephone (423) 874-2498 (TTY available).
Requests for accommodation of a disability should be directed to the ADA Coordinator at the Office of Human Resources Management, 600 Henley Street, Knoxville, TN 37996-4125.
******************************************************************
III. NOTICES

III.A.1. Fr: Katharine Sharp Review
Re: Katharine Sharp Review Call for Papers

Call For Papers
Katharine Sharp Review
GSLIS, University of Illinois
ISSN 1083-5261
(http://edfu.lis.uiuc.edu/review)

This is the first call for submissions to the Summer 1998 issue of the Katharine Sharp Review, the peer-reviewed e-journal devoted to student scholarship and research within library and information science. Articles can be on any topic that is relevant to LIS -- from children's literature to electronic database manipulation to library marketing. Please take a look at previous issues for a sample of what is possible -- but do not let that be your only guide! If you care passionately about some facet of LIS or have produced a research paper of which you are proud, consider submitting it to KSR.

All submissions should be received by Monday, May 11, 1998. Although it is not required for submission, we would appreciate an advance abstract (of 150-200 words) or an indication of intention to submit. Submitted articles must be accompanied by an abstract of no more than 200 words.

For more information, including instructions for authors, please see the KSR webpage at either http://edfu.lis.uiuc.edu/review/call.html or http://mirrored.ukoln.ac.uk/lis-journals/review/review/, or you can email us at sharp-review@edfu.lis.uiuc.edu.

Kevin Ward
Editor, Katharine Sharp Review
review@edfu.lis.uiuc.edu
http://edfu.lis.uiuc.edu/review

**********

III.A.2. Fr: Pedro Hipola
Re: IWE -- El Profesional de la Informacion (formerly Information World en Espanol)

The next special topic issue of IWE is scheduled to come out in June 1998, on "Legal databases". The IWE editors will be pleased to receive contributions.
Further information for contributors is available upon request.

El Profesional de la Informacion (formerly Information World en Espanol) is a monthly journal addressed to Spanish-language information professionals. Launched in 1992 by Learned Information (Oxford, UK), it is now published by Swets & Zeitlinger Publishers (Lisse, The Netherlands). The IWE team also created, in 1993, IweTel, the main email list in Spanish for information professionals (more than 1,500 subscribers). http://www.rediris.es/list/info/iwetel.html

Tomas Baiget and Pedro Hipola
IWE editors: iwe@sarenet.es
IWE subscriptions: orders@swets.nl
Advertising in IWE: akeefer@arrakis.es

**********

III.A.3. Fr: K.L. Kwok
Re: Information Retrieval: Special Issue

Journal: Information Retrieval
Editors: Paul Kantor and Stephen Robertson
Call for Papers, Special Issue: "Connectionism, Genetic Algorithms and Regression Techniques for IR"
Guest Editors: Norbert Fuhr and Kui Lam Kwok

In Artificial Intelligence, the competition between 'hard' symbolic, first-order-logic-based methods and 'soft' connectionist approaches to problems involving intelligent behavior is well known. To a lesser extent, such a scenario has also been played out in the field of Information Retrieval (IR) between the classical, well-founded models and more heuristic ranking strategies. Soft approaches such as neural networks, genetic algorithms, regression techniques, etc. have strengths of flexibility, robustness, and tolerance of imprecision that are well recognized. In real-world commercial web searching and TREC large-scale IR experiments, statistical approaches are also found to be preferable. It is therefore of interest to see how state-of-the-art soft computing techniques may be applicable to the concept formation and matching problems in IR. Such, then, is the essence of this special issue call.
Soft computing is a rapidly expanding field encompassing not only the disciplines mentioned above but also important topics such as fuzzy logic and approximate reasoning, machine learning, belief propagation, and others. In order to limit the scope of this particular issue, however, we have decided to focus on the topics given in the title. Other related approaches may well be the subjects of future issues. To borrow a page from the objectives of this journal, we list some soft computing methodologies that may be applicable to some IR tasks. We seek original papers -- theoretical, experimental, or practical -- that deal with the intersection of these non-exhaustive lists of topics.

Neural network methodologies include but are not limited to: network models and architecture; feedforward or recurrent propagation modes; network learning, supervised or unsupervised, with varied learning algorithms; objective functions, gradient descent, and other optimization.

Genetic algorithm methodologies include but are not limited to: coding schemes; genetic operators; fitness functions and their derivative-free optimization, including other methods such as simulated annealing, random searching, and evolutionary strategies.

Regression methodologies include but are not limited to: model functions, parametrization, interpolation, and error minimization.

We prefer contributions that are applicable to at least reasonably sized collections of texts. In addition, methods of scaling up these various techniques, including but not limited to parallel, distributed computation to handle very large, real-world IR environments, will be of particular interest.
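As one concrete instance of the regression methodologies above, a relevance model can be fit by gradient descent. The sketch below trains a small logistic model on term-feature vectors; the features, data, and learning parameters are all invented for illustration, and real IR experiments would of course use judged test collections.

```python
# Illustrative sketch: fitting a logistic-regression relevance model
# to document feature vectors by gradient ascent on the log-likelihood.
# Pure-stdlib toy example; features and labels are invented.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, lr=0.5, epochs=200):
    """examples: list of (feature_vector, relevance_label) pairs."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] += lr * (y - p) * xi  # gradient step per example
    return w

# Toy features: [query-term overlap, off-topic term density]
data = [([1.0, 0.2], 1), ([0.9, 0.1], 1), ([0.1, 0.9], 0), ([0.0, 0.8], 0)]
w = train(data)
score = sigmoid(sum(wi * xi for wi, xi in zip(w, [0.8, 0.3])))
print(round(score, 2))  # relevance score for a query-like document (> 0.5)
```

The learned weights then rank unseen documents by their predicted probability of relevance, which is the general shape of the regression approaches the call solicits.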
IR tasks of interest include but are not limited to: retrieval modes such as ad hoc, routing, filtering, and document classification; representation issues such as optimal feature selection and weighting; issues in training such as learning from judged (relevance feedback) or unjudged items, positive or negative, sampling of items for training, and generalization problems; optimal combination of representations or retrieval; correspondence of evaluation measures in IR and objective functions; and term and document clustering or categorization, collection structural discovery for retrieval, display, interaction, or summarization.

Submission Deadline: June 30, 1998

COMPLETE submission information can be found at http://www.wkap.nl/journals/ir

**********

III.B.1. Fr: James Pustejovsky
Re: Reminder of COLING-ACL'98 Workshop Deadline (CVIR'98)

CALL FOR PAPERS
COLING-ACL 1998 Workshop
Content Visualization and Intermedia Representations (CVIR'98)
August 15, 1998
University of Montreal, Montreal, Quebec, Canada

WORKSHOP DESCRIPTION: In the last few years, multimedia systems have become available which integrate text, graphics, sound (speech and non-speech audio), as well as animation. There are many different communities working on such systems (e.g., hypermedia, human-machine interaction, information retrieval, scientific visualization, content extraction, dialog tracking), each with distinct concerns and goals, and often the communities are not aware of each other's research and methods. This workshop aims to bring together these communities to examine the questions of the visual presentation of diverse content through multiple media. The major goal is to explore common intermedia representation languages which are expressive enough to cover diverse modalities yet suitably appropriate for the individual media. With increasing amounts of data, information, and knowledge available to the user, the effective use of visualization is increasingly important in applications.
Examples include: * visualization of data in scientific literature, including support for interactive information retrieval; * business and finance data visualization (data profiling); * automated or assisted map, graph, diagram, or image construction from text or data; * event, process, and knowledge editing and visualization tools; * and knowledge navigation over databases, texts, and search results. The specific issues addressed by the workshop include but are not limited to: 1. Definition of Content: different disciplines and applications have distinct perspectives on what content is, e.g., of text, video, graphics, collections of interactions or correspondences. 2. Knowledge Representation: i.e., what it is, how to represent it, reason about it, and present it. 3. Taxonomies of content representations, tasks, and visualization artifacts. 4. Representations for content and how these relate to and/or facilitate visualization tasks. 5. Selection and Organization of Content: Deciding what to present and how to organize the presentation of selected content and why (i.e., effect). 6. Deciding how to coordinate the presentation of content through several media. 7. The relationship of cognitive task to visualization content and style (e.g., visualization structure, properties, form, coherency, interpretability, and accuracy of displays). 8. Deciding how to accept and integrate input from several media. 9. Medium-specific encoding of content. 10. Presentation and interaction techniques of generated results. 11. Tailoring visualizations to specific user and user-group characteristics, knowledge, and interests. 12. Content visualization evaluation metrics and methods. We encourage submissions of demonstrations and/or videos of working visualizations pertaining to the above topics.
The organizers will produce a workshop report and, provided there is sufficient interest and adequate results reported, will consider a special edited journal issue and/or state-of-the-art collection. Authors are encouraged to submit their workshop papers simultaneously for public discussion to the area Intelligent User Interfaces of the Electronic Transactions on Artificial Intelligence (ETAI). The ETAI is a new kind of electronic journal using open and a posteriori reviewing. Formally, the rules work as follows. In the ETAI, you first have the article discussed for three months, then you have a chance to revise it based on the feedback, and then you decide whether to submit it for refereeing in the ETAI or in some other journal. For more information, see: http://www.ida.liu.se/ext/etai/. REQUIREMENTS FOR SUBMISSION: Papers are invited that address any of the topics listed above. Maximum length is 8 pages including figures and references. Please use US letter or A4 format and set margins so that the text lies within a rectangle of 6.5 x 9 inches (16.5 x 23 cm). Use classical fonts such as Times Roman or Computer Modern, 11 to 12 points for text, 14 to 16 points for headings and title. LaTeX users are encouraged to use the ACL style file for LaTeX. MS-Word users should use the ACL style file for MS-Word. Submissions can be made either as hardcopies or electronically in ASCII, PostScript, HTML, or MS-Word format. They should be sent to: James Pustejovsky CVIR'98 Computer Science Department 258 Volen Brandeis University Waltham, MA 02254-9110 voice: 1-781-736-2709 fax: 1-781-736-2741 email: jamesp@cs.brandeis.edu More detailed information on the workshop can be found at: http://www.cs.brandeis.edu/~jamesp/CVIR/ TIMETABLE: * Deadline for electronic submissions: March 11, 1998 * Deadline for hardcopy submissions: March 13 (arrival date) * Notification of acceptance: May 1, 1998 * Final manuscripts due: June 12, 1998 ORGANIZERS: MARK T.
MAYBURY, Director Advanced Information Systems Center The MITRE Corporation (MS K308) 202 Burlington Road Bedford, MA 01730 Tel: 1-781-271-7230 Fax: 1-781-271-2780 maybury@mitre.org James Pustejovsky Associate Professor Computer Science Department and Volen Center for Complex Systems Brandeis University Waltham, MA 02254-9110 USA voice: 1-781-736-2709 fax: 1-781-736-2741 jamesp@cs.brandeis.edu http://www.cs.brandeis.edu/~jamesp http://www.cs.brandeis.edu/~rllc ********** III.B.2. Fr: Kevin D Ashley RE: AAAI-98 Workshop Call for Participation AAAI-98 Workshop "Textual Case-Based Reasoning" DESCRIPTION OF WORKSHOP: In recent years, there has been growing interest among CBR researchers in dealing with textual representations of cases. In particular, many CBR applications now require handling semi-structured or even full-text cases rather than the highly structured cases of more traditional CBR systems. This workshop aims to bring together the research groups active in this area in order to identify major problems to be solved, alternative approaches to this task, and specific properties which distinguish "Textual CBR" from other areas, such as Information Retrieval. TOPICS: The overall theme of the workshop will be the handling of textual documents within CBR systems. In particular, possible topics include (but are not limited to): * Representation: - How should texts be mapped into cases? - What types of documents exist that contain useful information that should be reused? - How should the different pieces of text in a document be distinguished? - How can non-textual document information be retrieved and reused? * System development and maintenance: - How can domain knowledge be used in a way that might lend CBR an advantage over other technologies? - What kind of knowledge is required to build a textual CBR system? - How should this knowledge be acquired and maintained? * Evaluation: - How should textual CBR systems be evaluated?
* Integration issues: - How does "Textual CBR" relate to other technologies? - What can CBR provide for these, and what should be learned? * Case Studies: - Applications built and lessons learned. FORMAT OF WORKSHOP: The format of the workshop will combine an invited talk, short presentations, and group discussion. As discussed below, each potential participant is asked to submit a "Position Paper" dealing with one or more of the above topics or related ones. From the papers submitted, the Workshop Chairs will select for oral presentation a relatively small number of papers staking out interesting positions. After the oral presentations, the Workshop will break down into a number of small discussion groups on important topics suggested by the "Position Papers" and oral presentations. After the discussions, the small groups will report back to the workshop as a whole, followed by general discussion. David L. Waltz, Vice President, Computer Science Research, NEC Research Institute, and President of the American Association for Artificial Intelligence, will deliver an invited talk on a topic related to the Workshop (title to be announced). ATTENDANCE: Each potential participant should submit a "Position Paper" dealing with one or more of the above topics. Based on these, the Workshop Chairs will select participants. Papers that stake out positions and make recommendations will facilitate more interesting small-group discussions. SUBMISSION REQUIREMENTS: Potential participants are invited to submit "Position Papers" of at most 2500 words (five pages) in length. The accepted papers will be made available to the workshop participants as either AAAI Workshop Notes or AAAI Technical Reports. Submissions should preferably be made via electronic mail as UNIX printable PostScript. If this causes problems, please contact Mario Lenz to clarify what other formats are possible.
Only if electronic submission is not possible at all should 3 hard copies be sent to the address below. ADDITIONAL INFORMATION: Up-to-date workshop information can be found at http://www.informatik.hu-berlin.de/~lenz/AAAI98-WS/workshop.html SUBMISSION DEADLINE: March 11, 1998 NOTIFICATION DATE: April 1, 1998 FINAL DATE FOR CAMERA-READY COPIES TO ORGANIZERS: April 22, 1998 SUBMIT TO: Mario Lenz Dept. of Computer Science Humboldt University Berlin Unter den Linden 6 D-10099 Berlin Germany Tel. +49 30 20181-212 Fax +49 30 20181-221 lenz@informatik.hu-berlin.de ********** III.C.1. Fr: Philip A. Bralich Re: Java Speech API The following will be of interest to all those working in speech and/or NLP. You may have heard that Sun is about to release its Java Speech API, which can work with all the major speech recognition programs, but you may not be aware that there are already test versions available at Sun's web site, and that Sun has established an email discussion group for those who are interested. WEB SITE http://www.javasoft.com/marketing/collateral/speech.html EMAIL LIST address: javamedia-request@sun.com message: subscribe javaspeech-interest or subscribe javaspeech-announce Here is a brief quote from that site: Speech interfaces will give Java developers the opportunity to implement distinct and engaging personalities for their applications and to differentiate their products. Java developers will be able to access the capabilities of state-of-the-art speech technology from leading speech vendors. With a standard API for speech, users will be able to choose the speech products which best meet their needs and their budget. The Java Speech API will leverage the audio capabilities of other Java Media APIs, and when combined with the Java Telephony API, will support advanced computer telephony integration.
On desktop systems, the widespread availability of audio input/output capabilities, the increasing power of CPUs, and the growing availability of telephony devices all enable the use of speech technology. Philip A. Bralich, Ph.D. President and CEO Ergo Linguistic Technologies 2800 Woodlawn Drive, Suite 175 Honolulu, HI 96822 Tel: (808)539-3920 Fax: (808)539-3924 ****************************************************************** IV. PROJECTS IV.D.1. Fr: Robert Roseth Re: Study of Information Seeking Behavior on the Web Pilot study explores how people seek information on the Web A pilot study to be conducted by University of Washington faculty could help them learn how individuals seek information on the World Wide Web. The research will be one of the first in-depth studies of how people use the Web. A grant from The Boeing Co. will enable a team from the UW Graduate School of Library and Information Science to gather detailed information, from interviews and observations, about the "information-seeking" behavior of a group of Boeing engineers using Boeing's intranet, the company's own storehouse of information on its own segment of the Web. The researchers, three UW faculty and four graduate students, will be gathering information as a baseline for what they hope will be a larger-scale study of how Boeing employees use the company's intranet. "Boeing has a great deal of information on its intranet," says Efthimis Efthimiadis, UW associate professor of library and information science. "And, to the company's credit, it has recognized that organizing complex information for diverse audiences can be a challenging problem. This study is a pioneering effort to improve the effectiveness of information retrieval, which can have important implications for worker productivity."
The study will consist of first-hand observation and interviews with a group of Boeing engineers, to learn more about how they go about seeking information and using the company intranet to find answers to questions. The researchers hope that the conclusions from this pilot project can be used to frame questions for a future larger study of Boeing employees and their use of the intranet. Boeing solicited proposals from seven universities. The UW response was unusual, according to Efthimiadis, in suggesting a theoretical framework for analyzing Web use that was developed by Annelise Mark Pejtersen and colleagues at Risoe Labs, Denmark, specifically for a work environment. The study is led by Associate Professor Raya Fidel and includes, in addition to Efthimiadis, Assistant Professor Sam Oh, and Risoe Senior Scientist Annelise Mark Pejtersen as a consultant. Results are expected later this spring. Note: Prof. Efthimiadis can be reached at 206-616-6077 or by email at efthimis@u.washington.edu. Bob Roseth Director, News and Information University of Washington Box 351207 Gerberding Hall Phone: (206) 543-2580, Fax: 685-0658 Email: roseth@u.washington.edu http://www.washington.edu/newsroom/ ****************************************************************** IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA 94612-3550. Send subscription requests and submissions to: nancy.gusack@ucop.edu Editorial Staff: Nancy Gusack nancy.gusack@ucop.edu Cliff Lynch (emeritus) cliff@cni.org The IRLIST Archives are set up for anonymous FTP. Using anonymous FTP via the host ftp.dla.ucop.edu, the files will be found in the directory /data/ftp/pub/irl, stored in subdirectories by year (e.g., /data/ftp/pub/irl/1993). Search or browse archived IR-L Digest issues on the Web at: http://www.dcs.gla.ac.uk/idom/irlist/ These files are not to be sold or used for commercial purposes.
Contact Nancy Gusack for more information on IRLIST. THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THEIR MATERIAL.