Information Retrieval List Digest 365 (July 28, 1997)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-365

IRLIST Digest ISSN 1064-6965
July 28, 1997
Volume XIV, Number 27
Issue 365

**********************************************************

I. QUERIES
   1. _Data Mining and Knowledge Discovery in MARC Databases_
   2. Candidates for _The Big Picture: Visual Browsing in Web and non-Web Databases_
   3. 'Neo-Conventional' Thesauri
   4. NLP for LCSH
   5. As You See It: Visualization of Thesauri Structure, Term Associations, and Relationships

II. JOBS
   1. Medical College of Ohio: Digital Services Librarian

III. NOTICES
   A. Publications
      1. JASIS Books Awaiting Reviewers
      2. UNESCO's World Information Report
      3. CORE Project Final Report
   B. Meetings
      1. 1st CfP: ECML-98
   C. Miscellaneous
      1. The International Meanings of Color

IV. PROJECTS
   E. Miscellaneous
      1. AMICO Update

**********************************************************

I. QUERIES

I.1.
Fr: Gerry McKiernan
Re: _Data Mining and Knowledge Discovery in MARC Databases_

_Data Mining and Knowledge Discovery in MARC Databases_

I am in the process of preparing a review article on the application of data mining and knowledge discovery in databases (KDD) to MARC record databases. These techniques are efforts to identify 'hidden' information within large data sets. It is my belief that there exist important, yet overlooked, relationships within MARC records, created through the descriptive and subject cataloging process, that have not been as fully exploited as they might be. A good example would be to identify significant works on a subject based upon associations within records of a given publisher, author(s), subject heading and call number.

I am particularly interested in the application of Data Mining and KDD as a potential enhancement to future online public access systems (e.g., OPACs).

For a description of an associated project, folk are invited to review my 4T9R(sm) URL.
It contains not only a project description but also links to an excellent review article from DBMS magazine and to the outstanding _KDNuggets Data Mining and Knowledge Discovery Resource Center_ at its new URL.

The URL for 4T9R(sm) is

http://www.public.iastate.edu/~CYBERSTACKS/4T9R.htm

As always, any and all suggestions, leads, critiques, opinions and/or positive (or negative) thoughts will be much appreciated.

**********

I.2.
Fr: Gerry McKiernan
Re: Candidates for _The Big Picture: Visual Browsing in Web and non-Web Databases_

Candidates for _The Big Picture: Visual Browsing in Web and non-Web Databases_

With the Web publication of a news item in the Wall Street Journal late last month on Information Visualization, as well as the Web publication of a review article on Info Viz I wrote last year for the South African Internet magazine _Intelligence_, and the excellent paper prepared by Renee Davis for her _Internet Information Services_, I believe the time has come to solicit additional nominations of projects, services, products and/or research devoted to Information Visualization technologies that have been applied to enhance access to and navigation of Information Spaces in both Web and non-Web databases, particularly MARC and other bibliographic formats.

For a good overview, interested folk are invited to visit _The Big Picture_ at URL:

http://www.public.iastate.edu/~CYBERSTACKS/BigPic.htm

For a broad view on the application of Information Visualization for enhanced subject access, interested folk may also wish to read the summaries of the presentations given at this year's University of Illinois Clinic on Data Processing in Libraries published in the May 1997 issue of _Library Hi Tech News_ (no.
142).

I would appreciate learning about _any_ additional relevant efforts not currently profiled in _The Big Picture_ or under consideration and listed at the following related URL:

http://www.public.iastate.edu/~CYBERSTACKS/BigPic1.htm

To facilitate communication about Information Visualization and its potential application to enhancing access to subject as well as structure within Web and non-Web databases, I will be establishing a Majordomo listserv later this summer. Once established and tested, I will let the universe know of its existence.

**********

I.3.
Fr: Gerry McKiernan
Re: 'Neo-Conventional' Thesauri

_'Neo-Conventional' Thesauri_

In my recent posting on structured browsing for 'user-controlled' information retrieval [Is Precision Too Precise?] and a follow-up on the application of Natural Language Processing (NLP) for Library of Congress Subject Headings (LCSH), I make note of the (potential) usefulness of such 'neo-conventional' approaches for facilitating browsing of Information Spaces, most notably with a database of MARC records (e.g., an OPAC).

In considering the extensive use of thesauri for bibliographic databases, it also seems relevant to consider the potential usefulness of 'navigating subjects' by applying NLP to controlled vocabularies. The hope here is to reveal [i.e., to permit users to discover] 'neo-conventional' relationships among terms and phrases within this structured vocabulary that are not offered by the syndetic structure of the cross references.

Certainly Data Mining and Knowledge Discovery in Databases technologies could also identify possible relationships within these controlled vocabularies. Latent Semantic Indexing (LSI) might also permit one to discover other kinds of associations not unearthed by NLP or DM or KDD.

For my never-ending review of Data Mining and Knowledge Discovery in Databases (KDD), I would very much appreciate learning of efforts or systems that have applied such methods as NLP, LSI, DM/KDD, etc. to thesauri.
I am aware of the highly-innovative work of Harter (Indiana), Jones et al. (City University, UK), Johnson (Illinois) and, of course, Doszkocs at NLM.

As always, any leads, suggestions, citations, opinions, comments, critiques, criticisms, door prizes [:-], etc. are most welcome!

**********

I.4.
Fr: Gerry McKiernan
Re: NLP for LCSH

_Natural Language Processing (NLP) and Library of Congress Subject Headings_

In a recent posting on Structured Browsing for 'user-controlled' information retrieval in Web and non-Web databases, I briefly sketched an alternative approach to information access that would make use of the subject headings associated with a given Library of Congress Subject Heading. One form of this 'neo-conventional' functionality certainly is the 'Related' records and 'Sort' listing provided in the Library of Congress Experimental Search System [accessible via OnionPatch(sm) at:

http://www.public.iastate.edu/~CYBERSTACKS/Onion.htm ]

Yesterday I learned about a highly innovative information system developed by folk at EOS International. Their systems, Information Quest and their Q series OPAC, have among the most sophisticated 'neo-conventional functionality' of which I am aware [Of course, there are others profiled in Onion Patch(sm) [:-]].

The URL for EOS International is:

http://www.eosintl.com/

Details on their Information Quest (IQ) and Q series systems are accessible directly from

http://www.eosintl.com/htdocs/products-services.html

or from the Products and Services link on the base homepage.

Among the 'robust searching' technologies used in their IQ system are Natural Language Processing (NLP), a Word Expansion feature that uses NLP to search for 'concepts, word relationships, and semantic meaning', a 'Related terms' function, and 'Query-By-Example'.
In considering the obvious benefit of NLP for identifying and providing navigation of conceptual and semantic information spaces, it occurred to me that NLP would be the ideal method by which one could create the kind of non-syndetic associations of LC subject headings that I seek in a structured browsing environment. [BTW: I'm calling this function 'Explore'. It will permit users to explore the relationships of subjects, authors, publishers, series, etc. that are not facilitated with conventional search options and which are not pre-defined by conventional systems (e.g., narrower and broader terms).]

I would greatly appreciate learning of other systems that have applied (or are considering applying) Natural Language Processing to MARC and other bibliographic databases, particularly in providing enhanced 'subject navigation' by NLP of LC subject headings.

As always, any leads, suggestions, citations, opinions, comments, criticisms, critiques, etc., etc., etc. are most welcome.

**********

I.5.
Fr: Gerry McKiernan
Re: As You See It: Visualization of Thesauri Structure, Term Associations, and Relationships

_As You See It: Visualization of Thesauri Structure, Term Associations, and Relationships_

In considering alternatives to the current syndetic relationships provided within conventional thesauri (including the LCSH), it has occurred to me that visualization of both the conventional structure of a thesaurus and its 'neo-conventional' structure would greatly enhance the understanding and use of the thesaurus in either mode.

For my never-ending review [Yes, it's still never-ending], I'd very much appreciate learning about efforts that have applied conventional visualization techniques [whatever they may be?] to thesauri, as well as your ideas about the potential value of applying the Information Visualization technologies profiled in _The Big Picture_ to conventional or 'neo-conventional' thesauri structures, or both.
_The Big Picture_ is accessible at:

http://www.public.iastate.edu/~CYBERSTACKS/BigPic.htm

By 'neo-conventional' here I mean the non-explicit relationships that exist between thesauri descriptors/terms that are _not_ offered by the thesaurus itself. Harter's and Cheng's (Indiana) work on co-linked descriptors is a good example of what I would consider 'neo-conventional'. For details, see their article in JASIS 47 (1996):311-325 and/or an abstract at

http://ezinfo.ucs.indiana.edu/~harter/colinked.html

IMHO this is quite an important work for two reasons:

1) It confirms my feeling (how scientific [:-]) and belief (how non-scientific [:-]) that users don't make use of the syndetic structure as much as they could/should/might. The study documents user preference for 'associated' terms or phrases that in some way relate to their worldview of Information Space. [This is very good because it supports my belief in neo-conventional structured browsing [:-]]

and

2) It provides a good general critique of the highly-subjective and (shall we say it? [Yes, Gerry, say it!]) highly-idiosyncratic and highly-inconsistent nature of what defines the scope of a Broader Term, a Narrower Term and the ever-mysterious [:-] Related Term structure in many (but certainly not all) widely-used and/or applied thesauri. [I'd very much appreciate learning about other critiques of thesauri structure; any and all relevant citations (particularly review articles) are most welcome.]

In my first phase literature review I've identified one (perhaps three) key articles on visualization of thesauri. It's:

Arents, Hans C. and Bogaerts, Walter F.L. Concept-based retrieval of hypermedia information: from term indexing to semantic hyperindexing. _Information Processing and Management_ 29(3) (1993): 373-86.

As always, any leads, citations, suggestions, comments, critiques, criticisms, campaign contributions (oops! [:-]!) would be very much appreciated.
Certainly, any work relating to the visualization of LCSH in OPACs would be of great interest!

Regards,

Gerry McKiernan
Curator, CyberStacks(sm)
Iowa State University
Ames IA 50011
gerrymck@iastate.edu
http://www.public.iastate.edu/~CYBERSTACKS/

**********************************************************

II. JOBS

II.1.
Fr: BAMCNAMEE@MAGNUM.MCO.EDU
Re: Medical College of Ohio: Digital Services Librarian

Medical College of Ohio
Raymon H. Mulford Library

POSITION: Digital Services Librarian

DESCRIPTION: The Raymon H. Mulford Library seeks a team-oriented, flexible, innovative individual with a strong commitment to customer service and keeping pace with rapid technological change. Provides technical development and support for the Library's digital resources and services. This includes work as lead implementor for the Library's participation in OhioLINK and serving as principal system administrator for the Library's Innovative Interfaces integrated library system. Works with colleagues to design and implement programming solutions to support digital services. Works with faculty on informatics projects. Serves as the Library's Webmaster. Builds collaborative project partnerships with the College's Academic Informatics and Information Systems departments. Troubleshoots software and hardware problems in user service areas. Maintains awareness of user access issues by assisting in reference services. This position is slated for future faculty appointment.

ENVIRONMENT: The Mulford Library has a Library Services division with a staff of 19 FTE and an Educational Technology Services division with a staff of 15 FTE. The Mulford Library is the home of "Instructions to Authors" on the WWW. The Medical College of Ohio is a free-standing, state-supported institution with schools of medicine, nursing, allied health and graduate studies, a hospital and five Area Health Education Centers. It is located in a metropolitan area with a population of over 800,000.
QUALIFICATIONS:

Required:
-Master's degree in library and information science from ALA accredited library school
-Working knowledge of integrated library systems
-Familiarity with Internet architecture
-Knowledge of HTML
-Willingness to work with multiple platforms, e.g. UNIX, PC & Mac; Novell networks
-Strong client service focus
-Demonstrated ability to work well with colleagues, faculty, students and staff
-Programming skills, e.g. Visual Basic, C++
-Ability to train both library staff at all skill levels
-Effective written and oral communication skills
-Project leadership or proven ability to initiate, plan and complete projects.

Preferred but not required:
-Experience with OhioLINK and/or Innovative Interfaces systems
-Knowledge of SGML, Perl/CGI or Java/JavaScript
-Familiarity with digital image databases
-2 or 3 years professional experience in an academic library
-Experience in a health sciences environment.

COMPENSATION: Salary range is $41,600 - $47,600

APPLICATIONS: Review of applications will begin September 2, 1997 and will continue until the position is filled. Send application letter, resume and the names, addresses and phone numbers of three references to: Barbara McNamee, Assistant Director of Library Services; R.H. Mulford Library; Medical College of Ohio; 3045 Arlington Avenue; Toledo, Ohio 43614-5805.

AA/EEO

**********************************************************

III. NOTICES

III.A.1.
Fr: Terrence Brooks
Re: JASIS Books Awaiting Reviewers

Potential book reviewers for JASIS - Journal of the American Society for Information Science are invited to visit

http://weber.u.washington.edu/~tabrooks/review.html

Terrence A. Brooks
Graduate School of Library and Information Science
University of Washington
Box 352930
Seattle, WA 98195-2930
Voice: (206) 543-2646
Fax: (206) 616-3152
Email: tabrooks@u.washington.edu
WWW: http://weber.u.washington.edu/~tabrooks/

**********

III.A.2.
Fr: Michel Menou
Re: UNESCO's World Information Report

The first World Information Report 1997/1998, published by UNESCO, has just been released. This 390-page A4 book is meant to provide a _comprehensive and topical worldwide picture of archive, library and information services on the five continents_.

In Part 1, libraries and information services on the one hand and archives on the other are presented in 13 chapters, each devoted to a particular region; audiovisual archives are the subject of one worldwide chapter.

Part 2 reviews the infrastructures for information work with 5 chapters devoted to Computer developments, Multimedia technologies, Telecommunication technologies, The Internet, and Design criteria for large library buildings.

Part 3 offers 8 chapters in which a number of issues and trends are discussed: The information society, Information highways, Economic intelligence, Book publishing, Access to archival holdings and unique library materials, Presentation of archival holdings and unique library materials, Copyright in the electronic age, International co-operation and assistance.

The report is edited by Yves Courrier (UNESCO) and ASIS member Andrew Large (GSLIS, McGill). The 32 authors were drawn from a variety of countries, but most authors in Parts 2 and 3 are from the industrialized countries. As one may expect, ASIS is well represented with Josephine Sison, Ching-chih Chen, Blaise Cronin, Geoffrey McKim and Charles Oppenheim.

The report is also available in French and soon in Spanish. A few chapters in English may be read at

http://www.unesco.org/cii/wirerpt/vers-web.htm

Orders through UNESCO Publishing (http://www.unesco.org/publishing) or through local distributors of UNESCO publications (ISBN 92-3-103341-7; price 275FF plus 30FF surface mail).

Dr. Michel J.
Menou
CIDEGI
Conseil et Formation en gestion de l'information
Consulting and Training in information management
Mail: 13 rue Nationale, F-49350 Les Rosiers sur Loire
Email: mmenou@imaginet.fr

**********

III.A.3.
Fr: Richard Entlich
Re: CORE Project Final Report

The final report of the CORE (Chemistry Online Retrieval Experiment) Project has been published in the form of a complementary pair of articles. Though CORE preceded the Web, its findings should be of more than just historical interest. CORE addressed many of the technical problems still being faced by today's digital libraries, especially libraries of retrospectively digitized materials. Also, the CORE system collected individual user data of a type and on a level of detail not usually possible with Web-based systems.

Note: Despite the cover dates of the journals mentioned below, both articles were actually published last month. The "later article" referred to in the first abstract is in fact the article described in the second abstract, which nevertheless was published in a journal bearing an earlier cover date.

Citations and abstracts follow:

1) "Making a Digital Library: The Contents of the CORE Project," ACM Transactions on Information Systems, Vol. 15, No. 2, April 1997, pages 103-123.

Abstract: The CORE (Chemical Online Retrieval Experiment) project is a library of primary journal articles in chemistry. Any library has an inside and an outside; in this article we describe the inside of the library and the methods for building the system and accumulating the database. A later article will describe the outside (user experiences). Among electronic-library projects, the CORE project is unusual in that it has both ASCII derived from typesetting and image data for all its pages, and among experimental-library projects, it is unusually large.
We describe here (a) the process of scanning and analyzing about 400,000 pages of primary journal material, (b) the conversion of a similar amount of textual database material, (c) the linking of these two data sources, and (d) the indexing of the text material.

2) "Testing a Digital Library: User Response to the CORE Project," Library Hi Tech, Vol. 14, No. 4, consecutive issue #56, 1996, pages 99-118.

Abstract: The CORE Project was one of the first large-scale electronic journal efforts to use the Internet to deliver articles with graphics and complex typography directly to end-users. Utilizing journals of the American Chemical Society (ACS), project collaborators ACS, Bellcore, OCLC, Chemical Abstracts Service and Mann Library at Cornell University developed a proprietary system which allowed users to view and print articles as well as search the full text from over four years of 20 journals. Analysis of transaction logs and other user data reveals behavioral trends and problems which have relevance for the developers of today's World Wide Web based scholarly electronic journals.

Both articles were published under the authorship of Richard Entlich, Lorrin Garson, Michael Lesk, Lorraine Normore, Jan Olsen and Stuart Weibel.

Comments and criticism are welcome. Please send to:

Richard Entlich
Mann Library, Cornell University
Ithaca, NY 14853-4301
rge1@cornell.edu

**********

III.B.1.
Fr: Johannes Fuernkranz
Re: 1st CfP: ECML-98

First announcement

TENTH EUROPEAN CONFERENCE ON MACHINE LEARNING (ECML-98)
Chemnitz, Germany, April 21-24 1998

GENERAL INFORMATION:

The 10th European Conference on Machine Learning (ECML-98) will be held in Chemnitz (ex-Karl-Marx-Stadt, near Dresden), Germany, from April 21st to 24th, 1998. Submissions are invited that describe empirical, theoretical research in all areas of machine learning.
In addition, papers from related disciplines (for instance, information retrieval, pattern recognition, cognitive modeling, evolutionary computation, artificial neural networks, grammatical inference, reinforcement learning, etc.) that deal with adaptive intelligence, (semi-)automated knowledge acquisition, or (semi-)automated knowledge organization are welcome. Submissions that describe the application of machine learning methods to real-world problems are encouraged (for instance, natural language processing, robotics, data mining, etc.), but such submissions should speak to general issues of machine learning, perhaps illustrating novel learning methods or demonstrating the utility of established methods in previously unexplored settings.

IMPORTANT DATES:

Submission deadline: 31 October 1997
Conference: 21-24 April 1998

IMPORTANT ADDRESS:

Submitted papers should be sent to:

Claire Nedellec and Celine Rouveirol
LRI, Bat 490
Universite Paris-Sud
F-91405 Orsay
FRANCE

e-mail: cn/celine@lri.fr
Tel: +33 (0)1 69 15 66 26
Fax: +33 (0)1 69 15 65 86

**********

III.C.1.
Fr: Surya Vanka
Re: The International Meanings of Color

I thought my research might be of interest to your publication. Allow me to introduce myself and my research in a little more detail. I teach industrial design at the University of Illinois at Urbana-Champaign, and practice as a color design consultant.

Over years of practicing and teaching design in North America, Europe, Asia and Australia, I have been struck by the dramatically different reactions that consumers and markets in different countries can have to the same color. This is hardly surprising because colors have very different meanings in different cultures, and these meanings are often transferred on to products. Yet designers, who are increasingly designing for culturally diverse markets, have little access to information on culture-specific color meanings, and virtually no methods or tools to assist in this complex task.
In response, for the last several years, I have been researching the relationship between design and cross-cultural meanings of color, and developing tools to assist designers in addressing color issues in an informed manner. First, I have documented case studies from around the world where color has played a primary role in product success or failure in the international marketplace. Next, drawing heavily from research methods and data from anthropology and ethnography, I have developed a database of color semantics in 35 major cultures. Further, I have translated this information into an interactive multimedia reference software called 'ColorTool: The International Meanings of Color'. Finally, based on studies of corporate and consulting design offices, I have developed methods that color designers can use in conjunction with ColorTool to effectively select colors for globally marketed products.

Regarding ColorTool itself: its role is primarily that of a handy CD-ROM based reference tool that designers can use in a few ways: browsing for inspiration, doing rapid directed searches in a number of ways, and evaluating color decisions. This software is currently in pre-release form, and is being beta tested in educational settings around the world. Some of the larger corporations and design consultancies using this software in beta form are Fitch (Columbus), IDEO (Palo Alto), ITT (Indianapolis), Microsoft (Seattle), and Samsung (Seoul).

BIOGRAPHICAL SKETCH:

Professor Surya Vanka teaches in the Department of Industrial Design at the University of Illinois at Urbana-Champaign, and is a Fellow at the Center for Advanced Study, University of Illinois. His professional history includes commercial success as a designer of numerous software and hardware products. Over the last number of years, he has devoted himself to developing methods and software tools that assist designers in developing globally competitive products.
He has published widely in international publications of the Human Factors Society, Industrial Designers Society of America, Environmental Design and Research Association, ACM SIGGRAPH, Aspen Design Conference, and ICSID InterDesign, and has lectured in Australia, Canada, Finland, India, Korea, Taiwan, United Kingdom, and the United States. His research has received major grants from the Alfred Sloan Foundation, Apple Education, and IBM Educational Grants. His work has appeared in Form, ID, WIRED, Popular Science, Futurist, USA Today magazine, Interactions, Instructional Microcomputing, BBC Radio, National Public Radio, and Channel 15 Television.

Regards,
Surya Vanka

Professor Surya Vanka
Department of Industrial Design
University of Illinois at Urbana-Champaign
143 Art & Design Building
Champaign IL 61820 USA
Office: +217.333.1796
Studio: +217.359.3459
Fax: +217.244.7688
E-mail: s-vanka1@staff.uiuc.edu

**********************************************************

IV. PROJECTS

IV.E.1.
Fr: Craig A Summerhill
Re: AMICO Update

ART MUSEUM IMAGE CONSORTIUM (AMICO) UPDATE

The members of the Association of Art Museum Directors are investigating the formation of a consortium to make their digital documentation collectively available to the educational community. Representatives of major North American Museums are meeting to define the terms of their collaboration and to outline the nature of a common digital library of text, image and multimedia data.

Details about this emerging organisation and the formation of the consortium can be found at http://www.amn.org/AMICO. The report of the group's most recent meeting is also available at this site.
Questions regarding AMICO can be directed to:

Maxwell Anderson
Liaison for Information Technology
Association of Art Museum Directors
max_anderson@ago.net

or

Jennifer Trant or David Bearman
Archives & Museum Informatics
jtrant@archimuse.com or dbear@archimuse.com

**********************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA 94612-3550.

Send subscription requests and submissions to: NCGUR@UCCMVSA.UCOP.EDU

Editorial Staff: Nancy Gusack ncgur@uccmvsa.ucop.edu

The IRLIST Archives is set up for anonymous FTP. Using anonymous FTP via the host ftp.dla.ucop.edu, the files will be found in the directory /data/ftp/pub/irl, stored in subdirectories by year (e.g., /data/ftp/pub/irl/1993).

Search or browse archived IR-L Digest issues on the Web at: http://www.dcs.gla.ac.uk/idom/irlist/

These files are not to be sold or used for commercial purposes. Contact Nancy Gusack for more information on IRLIST.

THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THEIR MATERIAL.