Information Retrieval List Digest 394 (February 23, 1998)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-394.txt

IRLIST Digest   ISSN 1064-6965
February 23, 1998   Volume XV, Number 8   Issue 394

I. QUERIES
   1. Comparisons of IR Results
II. JOBS
   1. California State U.- Sacramento: Library System Director
III. NOTICES
   A. Publications
      1. EdBytes Issue 2
      2. FARNET'S Washington Update, February 2, 1998
      3. The Impact of the Internet at Public Libraries
   B. Meetings
      1. Final CFP: COLING-ACL'98 COMPUTERM Workshop
      2. Hypermedia & WWW, Information Systems, Software Engineering/SAD, and Organizational Memory
      3. Canadian Association for Information Science Annual Conference
      4. UKSG 21st Annual Conference
IV. PROJECTS
   C. Awards, Fellowships, Grants, & Scholarships
      1. Digital Libraries Initiative - Phase 2
   E. Miscellaneous
      1. RFC: Proposed Metadata Text Retrieval Conference (METTREC)

******************************************************************

I. QUERIES

I.1.
Fr: Philip A. Bralich, Ph.D.
Re: Comparisons of IR Results

I am a little new to the area of IR coming from a background in theoretical syntax. I would like to find a list of references concerning comparative results in IR. That is, I am looking for papers, chapters in books, web pages, and so on that give a good critical review of the sorts of results that are being obtained in IR these days.

I will post a summary to the list; please mention if you do not want your name mentioned in the summary.

Phil Bralich

Philip A. Bralich, Ph.D.
President and CEO
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822
Tel: (808)539-3920
Fax: (808)539-3924

******************************************************************

II. JOBS

II.1.
Fr: William Budge Re: California State U.- Sacramento: Library System Director DIRECTOR, LIBRARY INFORMATION SYSTEMS: The Director of Library Information Systems guides the planning, design and development of library-based information and knowledge systems and services for the Library of California State University, Sacramento. The Director manages the library system. PRIMARY DUTIES AND RESPONSIBILITIES: Leadership and Consultative Role: Works closely with the Library's faculty and staff to plan and implement information systems and library automation projects which enhance and support the Library's mission and strategic goals. Provides advice to the Dean and other department heads on the application of technology to the management of information resources Assists the Dean of the Library in developing a library technology plan which identifies long range budget needs. Serves as a resource and advisor to library staff and faculty in the selection and installation of software and hardware Systems Development and Management: Oversees design, development, testing and maintenance of databases and required software components for assigned projects and ensures the development tasks are adequately specified and proceed according to plan. Ensures that the public systems are reliable and operational during all defined hours of public access; determines and controls priorities for service and ensures that Systems Department staff work effectively with other Library staff and vendors to p Provides for Novell systems administration, networked and stand-alone database applications programming, hardware acquisition, network facilities installation and management, and end-user technical support for networked and stand-alone computer systems (e.g., specialized databases and CD-ROM applications), OCLC cataloging and interlibrary loan system, and Library administrative systems. 
Ensures proper management and planning for the Library's network and its connection to the campus network, and Library Internet use; serves as resource for and participates in computerized development projects. Ensures that all Library and computerized systems and data are secure from all hazards and that functional operations can be maintained throughout the Library in all but the most extreme circumstances; maintains a formal contingency and disaster recov Recommends major hardware and software to be purchased for the University Library. Systems Department Management: Hires, trains, and manages the library systems team. Provides evaluative performance reviews for System Department Staff to ensure quality; provides and supports development opportunities for staff. Prepares department External Relationships, Entrepreneurship, and Environmental Scanning: Develops and maintains an effective, collaborative working relationship with University Computing Services. Works closely with campus peers and departments to ensure effective in Seeks out opportunities for the University Library to become a partner with other University departments, other libraries, or vendors in the development of new information systems and resources Develops and maintains excellent relationships with vendors of databases, software, and hardware Maintains awareness of California State University system-wide issues regarding computerized management and information resource systems and contributes to identification and resolution of those issues which have implications for library information. MINIMUM qualifications: 1. ALA accredited MLS with a concentration in library systems and information technologies; or an advanced degree in computer or information science. 2. A record of progressively responsible experience in the application of information technology in an academic setting. 3. Ability to manage and provide leadership in a changing environment. 4. 
Strong interpersonal skills, and the ability to work with faculty, staff, students, and vendors. 5. Demonstrated interest or ability to work with a diverse faculty, student and staff population. 6. Successful experience in staff management and project and operational system management activities in a library setting that makes use of computerized resources. 7. Ability to organize and establish priorities for multiple tasks of various complexities and delegate as appropriate. 8. Demonstrated ability to communicate effectively both orally and in writing. 9. Demonstrated ability to work under the responsibilities associated with sustaining superior support of complex systems which serve large numbers of people. 10. Demonstrated understanding of the computing needs of a complex university library environment. 11. Demonstrated experience in recognizing the need for specifying, designing, and implementing complex and innovative electronic information systems. 12. Demonstrated understanding of systems analysis and programming. 13. Demonstrated understanding of the internal workings of the hardware and software involved in complex computer systems. 14. Knowledge of library processes, procedures, and operations and the requirements for their current and future automation. DESIRABLE QUALIFICATIONS: 1. A record of progressively responsible experience in the application of information technology in an academic library setting. 2. Experience in planning, implementing, and managing a Novell Network. 3. Experience in planning, implementing, and managing a CD-ROM Local Area Network. 4. Experience in planning, implementing, and managing an INNOPAC integrated library system. ANTICIPATED STARTING DATE: May 1, 1998 APPOINTMENT: This position is defined in the Management Personnel Plan of The California State University. It is excluded from the collective bargaining process and does not gain permanent status, nor is it eligible for a faculty appointment. 
APPLICATION PROCEDURES: Applicants are asked to submit a current resume and cover letter addressing the qualifications for the position, and the names, addresses and phone numbers of three professional references. Send to: Patricia Larsen Director and Dean of the Library The Library California State University, Sacramento 2000 State University Drive East Sacramento, CA 95819-6039 FILING DEADLINE: Applications received by March 15, 1998 will receive first consideration. The position will remain open until filled. California State University, Sacramento is an Affirmative Action/Equal Opportunity Employer, and has a strong institutional commitment to the principle of diversity in all areas. ****************************************************************** III. NOTICES III.A.1. Fr: Marc Mirish Re: EdBytes Issue 2 This is "EdBytes," the bi-monthly e-mail publication of the higher education policy website: EducationCommunity.com. Issue no. 2 2/9/98. http://EducationCommunity.com This Issue's ToC: Online Higher Education Bibliographies Next Issue's ToC: "Transforming Higher Education" _____How to stay current when WWW changes daily?______ THE EDUCATIONCOMMUNITY.COM SUBJECT INDEX! http://EducationCommunity.com/u/library/card_catalog/subject_frameset.htm _____What Does the Future Hold for Your Profession?____ ONE THING WE KNOW--TECH. WILL TRANSFORM ACADEMIA http://EducationCommunity.com/u/library/card_catalog/subject/ed_tech_frame.htm ______From the Knowledge Workers Toolbox_______ BUILD YOUR OWN BIBLIOGRAPHY - IN LESS THAN 60 SECONDS! 
http://educationcommunity.com/u/search_frameset.htm

_____________Look Under the Hood_______________
GOVERNANCE A MODEL FOR ONLINE BIBLIOGRAPHIES
http://educationcommunity.com/u/library/card_catalog/Subject/governance_frame.htm

_________Policy Gems -- Our Favorite Pages___________
INDEX OF SITES CATALOGING FULL-TEXT WWW RESOURCES
http://educationcommunity.com/u/library/card_catalog/card_cat_frameset.htm

To subscribe you can go to:
http://educationcommunity.com/u/administration/auto_update/auto-update.htm

To make comments or suggestions, go to:
http://educationcommunity.com/u/feedback.htm

**********

III.A.2.
Fr: Garret Sern
RE: FARNET'S Washington Update, February 2, 1998

FARNET'S WASHINGTON UPDATE --- FEBRUARY 9, 1998

FARNET (http://www.farnet.org) is a non-profit public interest Internetworking organization with a primary focus on the education, research and related communities.

IN THIS ISSUE:
NSF releases 1999 budget request; expresses optimism for NGI funding despite recent court injunction.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Written from FARNET's Washington office, "FARNET's Washington Update" is a service to FARNET members and other interested subscribers. We gratefully acknowledge EDUCOM's NTTF and the Coalition for Networked Information (CNI) for additional support. If you would like more information about the Update or would like to offer comments or suggestions, please contact Garret Sern at garret@farnet.org.

**********

III.A.3.
Fr: Terry Kuny
Re: The Impact of the Internet at Public Libraries

"Linking People to the Global Networked Society -- Evaluation of the OnLine at PA Libraries Project: Public Access to the Internet Through Public Libraries"
by Charles R. McClure and John Carlo Bertot
October 15, 1997

The report on the impact of the Internet in Pennsylvania public libraries is now available on the World Wide Web at:
URL: http://research.umbc.edu/~bertot/OnLinePA.html

**********

III.B.1.
Fr: Christian Jacquemin
Re: Final CFP: COLING-ACL'98 COMPUTERM Workshop

Final CFP : DEADLINE March 23, 1998

First Workshop on Computational Terminology
COMPUTERM'98
August 15, 1998 (immediately following ACL/COLING-98)
University of Montreal, Montreal (Quebec, Canada)

DESCRIPTION:
The workshop will provide a forum to bring together researchers from the fields of computational linguistics, terminology, automated translation, information retrieval and lexicography who share an interest in computational aspects of terminology processing: acquisition, extraction, indexing, machine-aided thesaurus building, dictionary construction, etc. The aim of the workshop is to stimulate the exchange of innovative ideas and results on diverse aspects of automatic term processing in order to bridge the gap between these fields.

TOPICS:
The topics of the workshop include (but are not limited to):
* Construction of terminology resources
* Semi- or automatic acquisition of terms
* Semi- or automatic acquisition of conceptual knowledge
* Thesaurus construction and maintenance
* Use of terminology resources (term banks, thesauri, specialized lexicons,...)
* Terms in information retrieval (stemming, automatic indexing, query expansion, ...)
* Multi-lingual terminological resources for cross-language IR
* Terminology management in machine-aided translation
* Terminology and NLP (parsing, tagging, text understanding, generation,...)
* Terminology processing for other applications.

SUBMISSIONS:
Only hard-copy submissions will be accepted. Authors should submit six (6) copies of their full-length paper (3500-5000 words). Submissions should be sent to:

Didier Bourigault
Laboratoire de Linguistique Informatique
Universite Paris XIII
Avenue J.-B. Clement
F-93430 Villetaneuse
France

Style Files and Templates for Preparing Submissions:
http://coling-acl98.iro.umontreal.ca/Styles.html

The official language of the Conference is English. However, papers can also be submitted in French.
The final version of the papers will be accompanied by two long abstracts in two different languages. All presentations at the workshop will be given in English.

IMPORTANT DEADLINES
Submission Deadline: March 23, 1998
Notification Date: May 15, 1998
Camera ready copy due: June 15, 1998

EMAIL CONTACT
db@lli.univ-paris13.fr, Christian.Jacquemin@iut-nantes.univ-nantes.fr, lhommem@ere.umontreal.ca

WEBSITES
COMPUTERM WORKSHOP: http://tornade.ERE.UMontreal.CA:80/~lhommem/coling/computerm.html
COLING-ACL'98: http://coling-acl98.iro.umontreal.ca

**********

III.B.2.
Fr: Carolyn Watters
Re: Hypermedia & WWW, Information Systems, Software Engineering/SAD, and Organizational Memory

Call for Participation: Workshops and Tutorials in Hypermedia & WWW, Information Systems, Software Engineering/SAD, and Organizational Memory

WORKSHOPS

* HTF4: HYPERTEXT FUNCTIONALITY AND THE WWW
  at WWW7 Conference, Brisbane, Australia, April 14, 1998
  http://www.ep.cs.nott.ac.uk/HTF/HTFIV/cfp.html

* HTF5: ENGINEERING HYPERTEXT FUNCTIONALITY INTO FUTURE INFORMATION SYSTEMS
  at ICSE'98 Conference, Kyoto, Japan, April 20, 1998
  http://www.ics.uci.edu/pub/kanderso/htf5/cfp.html

* HTF6: INCORPORATING HYPERTEXT FUNCTIONALITY INTO SOFTWARE SYSTEMS
  at ACM Hypertext'98 Conference, Pittsburgh, April 20, 1998
  http://www.ep.cs.nott.ac.uk/HTF/HTFVI/Proposal.html

* HTF7: ORGANIZATIONAL MEMORY SYSTEMS & HYPERTEXT FUNCTIONALITY
  at ICIS'98 Conference, Helsinki, Finland, December 12-13, 1998
  http://rieska.oulu.fi/~hok/htf7/cfp.htm

TUTORIALS

* Tutorial: APPLYING HYPERMEDIA TO THE WORLD WIDE WEB
  at WWW7 Conference, Brisbane, Australia, April 14, 1998
  by Michael Bieber (New Jersey Institute of Technology)
  http://www2.slac.stanford.edu/bebo/www7/tutsdesc.html#bieber
  and at 4th Hong Kong Symposium, April 8, 1998
  http://www.ssrc.hku.hk/sym/98/program.html

* Tutorial: STRUCTURED DESIGN OF WWW AND INTRANET APPLICATIONS
  at WWW7 Conference, Brisbane, Australia, April 14, 1998
  by Marios Koufaris (New York U.)
  & Tomas Isakowitz (U. of Pennsylvania)
  http://www2.slac.stanford.edu/bebo/www7/tutsdesc.html#koufaris

* Tutorial: WHAT EVERY SOFTWARE ENGINEER SHOULD KNOW ABOUT HYPERMEDIA FOR DESIGNING WORLD WIDE WEB APPLICATIONS
  at ICSE'98 Conference, Kyoto, Japan, April 21, 1998
  by Michael Bieber (New Jersey Institute of Technology)
  http://icse98.aist-nara.ac.jp/tutorial.html#TH14

CONFERENCE URLs

- 4th Hong Kong WWW Symposium, April 9-12, 1998
  http://www.ssrc.hku.hk/sym/98/DEFAULT1.HTML
- 7th International WWW Conference, Brisbane, Australia, April 14-18, 1998
  http://www7.conf.au
- 20th Int'l Conference on Software Engineering, Kyoto, Japan, April 19-25, 1998
  http://icse98.aist-nara.ac.jp/
- ACM Hypertext'98, Pittsburgh, USA, June 20-24, 1998
  http://www.ks.com/ht98
- ICIS'98, Helsinki, Finland, December 13-16, 1998
  http://cs-nt.jyu.fi/index.htm

**********

III.B.3.
Fr: Elaine G. Toms
Re: Canadian Association for Information Science Annual Conference

** Final ** Call for Participation [NEW DEADLINE]

26th Annual Conference of the
Canadian Association for Information Science/
Association canadienne des sciences de l'information
3-5 June 1998
Universite d'Ottawa, Ottawa, Ontario

Information Science at the Dawn of the Millennium
http://www.mgmt.dal.ca/slis/cais98

For more than a quarter of a century Canadian information scientists have met to discuss the access, retrieval, production, organization, distribution, value, use and management of information. From those early days of examining computational ways of manipulating information through to investigations of information as communication, CAIS has provided a forum for presentation, discussion and debate. CAIS/ACSI '98 continues this noteworthy tradition.

CAIS/ACSI '98 will be held at the Universite d'Ottawa, in Canada's national capital, Ottawa, Ontario.
CAIS will be meeting with the 1998 Congress of Social Sciences and Humanities, which will offer exceptional opportunities for creative and fruitful contacts between CAIS delegates and members of the other 80 learned societies that will meet concurrently.

We seek submissions related to any aspect of information science, particularly those which exemplify the leading edge of our discipline. Submissions must include a 500 word extended abstract of the proposed paper. Each author's name, complete address, phone, fax and email should be included on a separate sheet. Abstracts will be refereed; final papers will be published in the proceedings and presented at the conference. Preference will be given to papers that report research or debate underlying methodological/philosophical issues, rather than those that report on plans yet to be implemented.

Deadlines
>>> for abstracts: January 30, 1998. {New Date}
>>> notification of acceptance: February 20, 1998.
>>> for final papers (3,000-4,000 words) in electronic form: April 15, 1998.

Doctoral candidates are especially invited to submit to the conference. CAIS will be awarding a full conference registration and one year membership to the best student submission. Student submissions must be single-authored. Please indicate student status on your submission.

Initial submissions in print or electronic form (ASCII, Word or Wordperfect) should be sent to:

Elaine Toms
CAIS '98 Program Chair
School of Library and Information Studies
Dalhousie University
Halifax, NS B3H 3J5
Voice: (902)494-2452
Fax: (902)494-2451
E-Mail: etoms@is.dal.ca

**********

III.B.4.
Fr: GEDYE Richard
Re: UKSG 21st Annual Conference

21st Annual Conference and Exhibition & Annual General Meeting
30 March - 1 April 1998
University of Exeter

Whether we are on the brink of a revolution in the way serials are produced and disseminated or simply witnessing a period of gentle transition and evolution no one can say.
What UKSG can do, via the Annual Conference, is provide a forum for the discussion of the major issues surrounding this period of transition.

A full and varied programme has now been finalised. Some excellent speakers and industry experts will be giving papers and running workshops, and the exhibition will include all of the organisations involved in the information chain.

Please take a moment to view the Conference Programme below. Further details and a booking form can be found at .

For further information, please contact:

Jill Tolson
UK Serials Group Business Manager
114 Woodstock Road
Witney OX8 6DY
UK
Tel: +44 (0)1993 703466
Fax: +44 (0)1993 778879
E-mail: uksg@dial.pipex.com

Monday 30 March

Opening of Conference: Welcome, Richard Hodson, Chair, UKSG and Alasdair Paterson, University Librarian, University of Exeter

Keynote session: Mapping the futures
Serials happenings: the information industry in transition: James T Stephens, President, EBSCO Industries Inc, USA
The next five years: a publisher's ambition: Robert Kiernan, Chairman and Chief Executive, Routledge Publishers Holdings Ltd
Signposts to the future: the librarian's direction: Alan MacDougall, Director of Library Services, Dublin City University, Ireland

Knowledge management
Managing information as a corporate asset: Nigel Horne, Director, KPMG IMPACT Programme
Sharing expertise in practice: the way forward for knowledge management: Jacqueline Cropley, Consultant, formerly of Clifford Chance
The long road to information integration: suggestions for the way forward: Suzie Alexander, European Sales Manager, Ovid Technologies Ltd

Workshops

Tuesday 31 March

Product reviews, Newman Lecture Theatre
Acquiring electronic products in the hybrid library: prices, licences, platforms and users: Peter Leggate, Keeper of Scientific Books, Radcliffe Science Library, University of Oxford
Dataset purchasing options: united we save, divided we pay: Mike Johnson, Director of CHEST & NISS
Developments in the UK Pilot
Site Licence: John Fielden, Director, CHEMS Consortial purchasing: the US experience with electronic products: Julia Gammon, Head, Acquisitions Department, University of Akron, USA Switching on serials: the British Library's Electronic Serials in Public Libraries project: Margaret Evans, Loughborough University MagNET and EARL: Internet access to newspapers and journals in public libraries: Hugh Marks, Technical Services Manager, Westminster Libraries & Archives, and EARL Serials Task Group convenor Workshops Scientific publication and the UK Research Assessment Exercise: an assessor's view: W F Vinen, University of Birmingham and Chair of the Physics Assessment Panel Journals: what makes the added value: Griffith Edwards, Editor-in-Chief, 'Addiction' and Emeritus Professor of Addiction Behaviour, University of London AGM, Newman Lecture Theatre including reports from Claus Pedersen, Chair, European Federation of Serials Groups, and Susan Davies, President, NASIG Wednesday 1 April Product reviews Exhibition viewing SuperJournal: the publishers' perspective: Michael Mabe, Director, Material Science Publishing, Elsevier Science Ltd Session chair: Richard Hodson, Blackwell's Information Services HEDS: accessing for the future, preserving the past: Simon Tanner, Digitisation Consultant, Higher Education Digitisation Service Hanging on to what we have got: economic and management issues in providing perpetual access in an electronic environment: Malcolm Smith, Director, British Library Bibliographic Services & Document Supply The world of 'Hello!' Sally Cartwright, Publishing Director, 'Hello!' Magazine Workshops 1. Serials pricing issues 2. A beginner's guide to electronic library formats 3. What next for organisational libraries? 4. Managing the electronic journal 5. Document delivery options 6. Bibliographic control of serials 7. Understanding licensing agreements 8. Evaluating and measuring usage of e-journals 9. Tendering for library services and supplies 10. 
Web design, structure and management 11. Electronic copyright permissions 12. Outsourcing 13. Linking quality information resources on the Web ****************************************************************** IV. PROJECTS IV.C.1. Fr: Maria Zemankova Re: Digital Libraries Initiative - Phase 2 Digital Libraries Initiative - Phase 2 Announcement Number NSF 98-63 (NEW) See: http://www.nsf.gov/pubs/1998/nsf9863/nsf9863.htm DUE DATES: FY 1998 Competition Letters of Intent: April 15, 1998 Full Proposals: July 15, 1998 FY 1999 Competition Letters of Intent: February 15, 1999 Full Proposals: May 17, 1999 INTRODUCTION: Innovative digital libraries research and applications will be jointly supported by the National Science Foundation (NSF), the Defense Advanced Research Projects Agency (DARPA), the National Library of Medicine (NLM), the Library of Congress (LoC), the National Aeronautics and Space Administration (NASA), the National Endowment for the Humanities (NEH) and others. This announcement describes the goals and features of Digital Libraries Initiative - Phase 2 (DLI-2), with particular attention on NSF interests and requirements. More detailed information on the domain-specific interests of the partnering agencies may be obtained from them. Within NSF, DLI-2 is administered by the Division of Information and Intelligent Systems (IIS) of the Directorate for Computer and Information Science and Engineering (CISE). Supporting Directorates include the Directorate for Education and Human Resources and the Directorate for Social, Behavioral and Economic Sciences. Contacts for these and related activities at other agencies are referenced at the end of this announcement. The current effort extends the joint NSF/DARPA/NASA "Research on Digital Libraries Initiative". 
Since announcement of that initiative, digital libraries research and applications efforts have proliferated; new communities of researchers, information providers and users have become engaged; the definition of a digital library has evolved; technologies have advanced; stores of digital content have increased dramatically; and new research directions have emerged. These advances point to a future in which vast amounts of digital information will be easily accessible to and usable by large segments of the world's population. To help achieve this, the Digital Libraries Initiative - Phase 2 plans to: Selectively build on and extend research and testbed activities in promising digital libraries areas; Accelerate development, management and accessibility of digital content and collections; Create new capabilities and opportunities for digital libraries to serve existing and new user communities, including all levels of education; Encourage the study of interactions between humans and digital libraries in various social and organizational contexts. Electronic information is being created by many people and data gathering instruments in many forms and formats, stored in many repositories around the world, and becoming increasingly interconnected via electronic networks. Digital libraries research is faced with the challenge of applying increasing computational capacity and network bandwidth to manage and bring coherence, usability, and accessibility to very large amounts of distributed complex data and transform it into information and knowledge. Since digital libraries are meant to provide intellectual access to stores of information, research in this initiative is concerned with developing concepts, technologies and tools to gain use of the fuller knowledge and meaning inherent in digital collections. 
For example, for users this means intelligent search, retrieval, organization and presentation tools and interfaces; for content and collections providers this means new information types, structures, document encoding and metadata for enhancing context; for system builders this means designing hardware and software systems capable of interpreting and implementing users' requests by locating, federating and querying collections to provide the user with the structured information sought. PROGRAM GOALS: The primary purposes of this initiative are to provide leadership in research fundamental to the development of the next generation of digital libraries, to advance the use and usability of globally distributed, networked information resources, and to encourage existing and new communities to focus on innovative applications areas. Since digital libraries can serve as intellectual infrastructure, this Initiative looks to stimulate partnering arrangements necessary to create next-generation operational systems in such areas as education, engineering and design, earth and space sciences, biosciences, geography, economics, and the arts and humanities. It will address the digital libraries life cycle from information creation, access and use, to archiving and preservation. Research to gain a better understanding of the long term social, behavioral and economic implications of and effects of new digital libraries capabilities in such areas of human activity as research, education, commerce, defense, health services and recreation is an important part of this initiative. Collaboration between academic, industry, non-profit and other organizations is strongly encouraged to establish better linkages between fundamental science and technologies development and use, through partnerships among researchers, applications developers and users. 
CATEGORIES OF SUPPORT:
All awards for this announcement made by NSF will be as grants or cooperative agreements to academic institutions and qualified non-profit research organizations. Partnership arrangements with other groups are encouraged, including subcontracts with the single proposing organization. NSF expects to fund two general types of projects under this initiative:

1. Individual investigator research grants. Awards will not exceed $200,000 per year, for 1 to 3 years.
2. Multi-disciplinary group research projects. Awards will not exceed $1,200,000 per year, for 1 to 5 years.

The number of awards will depend on the quality of proposals received, the availability of funds, and considerations for creating a balanced overall program. Total support for the initiative from federal sponsors is projected to be $40-$50 million over the 5 year Initiative. Awards will not exceed $1,200,000 per year, except in exceptional circumstances. Ideas for projects requiring support above this level should be discussed with the NSF program officer before proposal preparation.

Please see the full announcement for additional information.

NOTES:

1. We are seeking CREATIVE proposals at ALL levels that will significantly advance digital libraries research. I would like to encourage the Information and Data Management Program community to play an active role in the conception of innovative DLI proposals.

2. Although there is overlap between Knowledge and Distributed Intelligence (KDI), Program Announcement NSF 98-55 (http://www.nsf.gov/cgi-bin/getpub?nsf9855), in particular its Knowledge Networking component, and Digital Libraries, note that the Digital Libraries Initiative (DLI) is strongly interested in collections and users. Proposals for DLI should involve people making use of information (or make it clear that users' needs are the driving motivation of the proposed research). KDI is a fundamental research support program which does not stress collections of information.
All KDI proposals, however, must be interdisciplinary. If the focus of a proposal is on information or user communities which now exist, it may be better in DLI; if the focus is on the creation of new information or communities, it is perhaps better in KDI.

3. Inquiries:
   Stephen M. Griffin
   Division of Information and Intelligent Systems (IIS)
   Program Director: Special Projects
   Digital Libraries Initiative
   Mail:   National Science Foundation
           4201 Wilson Boulevard, Room 1115
           Arlington, VA 22230
   e-mail: sgriffin@nsf.gov
   phone:  (703) 306-1930
   fax:    (703) 306-0599

4. For information on DLI - Phase I projects, see: http://www.cise.nsf.gov/iris/DLHome.html

**********

IV.E.1.
Fr: Mike Garris
Re: RFC: Proposed Metadata Text Retrieval Conference (METTREC)

REQUEST FOR COMMENT
February 18, 1998

PROPOSED METADATA TEXT RETRIEVAL CONFERENCE (METTREC)

This is a request for comments pertaining to possible participation in an upcoming METTREC conference. METTREC (Metadata Text Retrieval Conference) is a technology evaluation project created to examine the interfacing of Document Recognition and Information Retrieval technologies. The project is cosponsored by the National Institute of Standards and Technology (NIST) and the Department of Defense (DoD).

PART I - PROPOSED METTREC CONFERENCE

This section proposes a set of tasks and the schedule necessary to initiate METTREC evaluations. The conference will comprise two tracks:

Track 1.) Text-based recognition and retrieval evaluation
Track 2.) Metadata demonstration

TRACK 1. Text-based Recognition and Retrieval Evaluation

The purpose of this track is to evaluate the impact of Optical Character Recognition (OCR) errors on Information Retrieval (IR). The OCR and IR systems participating in this track are not limited to commercially available products. However, participants must agree to the public release of their resulting scores and performance.

This track can be described in terms of 1. data, 2. evaluation methodology, 3.
design of experiment, 4. proposed size of data sets, and 5. schedule.

1. DATA

A portion of the 1994 Federal Register (FR94) will be used in Track 1. FR94 documents are hierarchically structured, and FR94 pages are primarily 3-column text mixed with occasional figures, graphs, tables, maps, etc. The FR94 has been scanned at 400 dpi binary, totaling 249 daily issues, accounting for more than 67,000 pages. Images from one issue are available via anonymous FTP at "sequoyah.nist.gov" under subdirectory "pub/mettrec." There will be ground truth files associated with each FR94 page. This ground truth will include the page's text in reading order and various metadata tags. A multi-lingual set of documents is also being prepared for potential use in this and future evaluations.

2. EVALUATION METHODOLOGY

In order for these types of experiments to be economically feasible and able to scale up, automated scoring methods will be used.

2a. Evaluating OCR: A Scoring Package developed by NIST for METTREC will be used to align OCR text results with a page's ground truth text and compute word error rates.

2b. Evaluating IR: Information Retrieval performance will be evaluated using "known-item search." A set of queries will be used that are carefully constructed to reference a specific FR94 page. That specific page will be judged to be most relevant (of rank 1). By controlling query composition and which pages should be retrieved, various IR and OCR factors may be isolated for analysis. IR performance will be analyzed by computing statistics on how distant a query's known page is ranked from position 1. This type of analysis was used in the Spoken Document Retrieval Track in the TREC-6 conference.

3. DESIGN OF EXPERIMENT

Proposed tasks do not limit participation to those with both OCR and IR capabilities. Furthermore, the tasks are being designed so as not to require "teaming" between participants.
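As an illustration of the automated scoring described in sections 2a and 2b above, the following is a minimal Python sketch, not the NIST Scoring Package (which this notice does not describe in detail). It assumes whitespace tokenization and a plain word-level edit-distance alignment for the OCR word error rate, and it assumes mean rank, mean reciprocal rank, and the fraction of queries answered at rank 1 as the known-item statistics. All function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def word_error_rate(truth: str, ocr: str) -> float:
    """Word-level edit distance from truth to OCR text, divided by truth length.

    Assumption: words are whitespace-separated and compared exactly.
    """
    ref, hyp = truth.split(), ocr.split()
    if not ref:
        raise ValueError("ground truth text is empty")
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # truth word missing from OCR output
                            curr[j - 1] + 1,      # extra word in OCR output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def known_item_rank(ranked_pages: List[str], known_page: str) -> Optional[int]:
    """1-based rank of the known page in a result list, or None if not retrieved."""
    try:
        return ranked_pages.index(known_page) + 1
    except ValueError:
        return None


def rank_statistics(ranks: List[Optional[int]]) -> Dict[str, float]:
    """Summarize how far the known pages were ranked from position 1."""
    if not ranks:
        raise ValueError("no queries to summarize")
    found = [r for r in ranks if r is not None]
    return {
        "mean_rank_when_found": mean(found) if found else float("nan"),
        # A query whose known page was not retrieved contributes 0.
        "mean_reciprocal_rank": mean(1 / r if r else 0.0 for r in ranks),
        "found_at_rank_1": sum(r == 1 for r in ranks) / len(ranks),
    }


if __name__ == "__main__":
    # Hypothetical page identifiers and text, for illustration only.
    print(word_error_rate("the quick brown fox", "the qu1ck brown"))
    ranks = [
        known_item_rank(["p3", "p1", "p2"], "p1"),
        known_item_rank(["p7", "p8"], "p7"),
        known_item_rank(["p4"], "p9"),
    ]
    print(rank_statistics(ranks))
```

Running the same queries once against ground truth text and once against OCR output, then comparing the two sets of rank statistics, is one way to relate word error rate to retrieval degradation.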
Instead, the interface between OCR and IR will be open for analysis and refinement (a primary aim of the project). OCR results will be reported by multiple participants and subsequently disseminated to multiple IR participants. Participants who can conduct the retrieval tasks by means other than traditional OCR are also encouraged to take part. A subset of queries will be crafted to represent specific IR-related issues, and another subset will be used to study the impact of document image quality on OCR and subsequently on IR.

A training set (including a defined evaluation subset) containing FR94 page images, corresponding ground truth files, and a number of queries (with top-ranked pages identified) will be disseminated to all participants. Upon commencement of a testing period, testing material containing a new set of FR94 page images will be disseminated to OCR participants. OCR results reported back by a specified deadline will be disseminated to the IR participants along with a new set of test queries and the ground truth text for the page images. IR results will then be required by a specified deadline.

All OCR and IR results will be scored and analyzed and a conference will be hosted by NIST. During the conference each participant will be given the opportunity to discuss their experience and provide a description of their system. Participants are free to determine the level of system disclosure, but a discussion related to approaches, methods, and algorithms is strongly encouraged.

4. PROPOSED SIZE OF DATA SETS

Training Set:
   2,000 FR94 page images
   2,000 FR94 ground truth files
   100 of the 2,000 designated for evaluation
   5 known-item queries

Testing Set:
   10,000 FR94 page images
   10,000 FR94 ground truth files (IR participants only)
   100 known-item queries

5. SCHEDULE

The first METTREC conference is scheduled for the end of September.
Working back from there, the following schedule is proposed:

February  - Request for comment
March     - Call for participation
April     - Training data disseminated
June      - Testing data disseminated for OCR
July      - Testing data and OCR results disseminated for IR
August    - Results reported to NIST
September - Conference hosted by NIST

TRACK 2. Metadata Demonstration

In order to explore the use of metadata in METTREC, a secondary demonstration track is proposed. Track 2 will include a number of additional known-item queries along with their top-ranked pages taken from the FR94 pages disseminated for Track 1. As an example, a query might be: "Find me the table containing the EPA guidelines on safe drinking water." where both "table" and "EPA" (the latter being found in an agency heading) are metadata.

Participants who are able will be encouraged to demonstrate their ability to detect and/or utilize the metadata included in this track. Participants who are not able to detect or utilize this metadata will be asked to comment on how the metadata might be utilized by their systems and what steps would need to be taken to utilize the metadata. Information gathered from this demonstration track will be used to organize future metadata-based evaluations.

PART II - QUESTIONS

Responses are solicited to the following questions. Comments received will be used to determine the level of participatory interest within the Document Recognition and Information Retrieval communities, and they will be used to assess the feasibility of the proposed tasks and schedule. Please return responses within two weeks of the posting/receipt of this notice. All comments and any subsequent questions related to METTREC should be directed to:

Michael D. Garris
NIST 225/A216
Gaithersburg, MD 20899
mgarris@nist.gov

1. REGARDING YOUR PARTICIPATION

1a. Are you able/interested in participating in this evaluation? (Why or why not?)

1b. Can you recommend someone else who may be able/interested in participating?
(If so, who?) Note: Please redistribute this notice liberally.

1c. Are you able to participate in the proposed OCR tasks, IR tasks, or both?

1d. If you are capable of OCR tasks, but not IR tasks, do you require "teaming" with a specific IR provider or will you consent to having your results disseminated to many IR participants? (Why, and to what extent?)

1e. Are you able to do the IR tasks in Track 1 (especially on very low quality document images) with input from other than traditional text-based OCR? (If so, are you able to demonstrate this in the evaluation?)

2. REGARDING THE PROPOSED EVALUATION SCHEDULE AND SCALE

2a. Are you willing to comply with the proposed schedule culminating with a conference at the end of September? (Why or why not?)

2b. If unable to comply with the proposed schedule, in what time frame might you be ready to participate?

2c. Will you have difficulty with the scale of the evaluation? (If so, why?)

2d. Are 10,000 pages too many for OCR participants to process? (If yes, how many pages are you willing to process?)

2e. Are 10,000 pages too few for IR experiments? (If yes, how many pages at a minimum should be used?)

2f. Are 50-100 known-item test queries sufficient? (Why or why not?)

2g. Is the size of the proposed training sets sufficient? (If not, how many pages or queries would you prefer to have?)

2h. Are 100 pages of the training set sufficient for an evaluation set? (If not, how many pages?)

2i. What factors would you like to see analyzed in these types of evaluations?

3. REGARDING THE USE OF METADATA

3a. What metadata are you able to automatically detect? (What metadata are you able to detect in the FR94 pages?)

3b. If metadata were provided, how would you use it in your IR system?

3c. If you have the capability of using metadata in an IR system, what metadata can you use? (What metadata can you use from the FR94 pages?)

4. REGARDING LANGUAGE CAPABILITIES

4a. Does your OCR technology handle languages other than English?
(If so, what languages?)

4b. Does your IR technology handle languages other than English? (If so, what languages?)

5. REGARDING THE USE OF MULTIMEDIA DOCUMENTS

5a. What multimedia capabilities does your technology support?

5b. What multimedia capabilities might your technology support in the future?

6. GENERAL COMMENTS AND SUGGESTIONS

6a. Do you have any other general comments or suggestions?

******************************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.

Send subscription requests and submissions to: nancy.gusack@ucop.edu

Editorial Staff:
Nancy Gusack            nancy.gusack@ucop.edu
Cliff Lynch (emeritus)  cliff@cni.org

The IRLIST Archives is set up for anonymous FTP. Using anonymous FTP via the host ftp.dla.ucop.edu, the files will be found in the directory /data/ftp/pub/irl, stored in subdirectories by year (e.g., data/ftp/pub/irl/1993).

Search or browse archived IR-L Digest issues on the Web at: http://www.dcs.gla.ac.uk/idom/irlist/

These files are not to be sold or used for commercial purposes. Contact Nancy Gusack for more information on IRLIST.

THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THEIR MATERIAL.