Information Retrieval List Digest 116 (June 15, 1992)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-116
=========================================================================
Date: Mon, 15 Jun 1992 16:49:51 PST
Reply-To: "Information Retrieval List"
Sender: "Information Retrieval List"
From: IRLIST
Subject: IR-L Digest, Vol.IX,No.20, Issue 116

IRLIST Digest   June 15, 1992   Volume IX, Number 20   Issue 116

**********************************************************

I. NOTICES
   B. Publications Announcements
      1. Book on RLIN Database Available
      2. The Global Jewish Database
   C. Miscellaneous
      1. Demonstrations and Exhibits at ACL-92
II. QUERIES
   A. Questions and Answers
      1. Question about IR-System Performance
   B. Requests for Information
      1. Generalized Optimal Binary Search Tree
IV. PROJECT WORK
   C. Miscellaneous
      1. Refereeing Request

**********************************************************

I. NOTICES

I.B.1.
Fr: Jennifer Porro
Re: Book on RLIN Database Available

NEW ILLUSTRATED BOOK ON RLIN DATABASE AVAILABLE

April 29, 1992 -- The Research Libraries Group (RLG) has just published DISCOVERING RLIN, a 52-page illustrated booklet designed to introduce reference librarians and scholars to contents of the Research Libraries Information Network (RLIN) database. The booklet includes sections on materials ranging from current journal citations to incunables and rare books. It covers unique archival materials cited in RLIN (such as letters and manuscripts), photographs and other still images, film and videotape, sound recordings, non-Roman script material, computer files, and realia.

For more information, contact the Distribution Services Center, The Research Libraries Group, Inc., 1200 Villa Street, Mountain View, CA 94041-1100.

**********

I.B.2.
Fr: Schild Uri
Re: The Global Jewish Database

BAR-ILAN UNIVERSITY
The Global Jewish Database (The Responsa Project)

Bar-Ilan University has begun the worldwide marketing of its Global Jewish Database on CD-ROM with an advanced retrieval program which runs on a PC.

The "Taklit-Torah" contains the Tanach, Midrash, Babylonian Talmud with Rashi's Commentary, Jerusalemite Talmud and Rambam.

The "Taklit-Shoot" contains the Tanach, Midrash, Babylonian Talmud with Rashi's Commentary, Rambam and 253 books of Responsa covering a period of over 1,000 years.

For complete information, our marketing representative in North and South America is:

Ofrer Inc.
1 Executive Dr.
Fort Lee, NJ 07024
Tel: 201-947-5090
Fax: 201-947-1780 / 516-295-4196
e-mail: 0005332241@MCIMAIL.COM

Marketing in the rest of the world (including Israel):

The Responsa Project
Bar-Ilan University
Ramat-Gan 52900
Israel
Tel: 972-3-5318411 (24 hours)
Fax: 972-3-344-622
e-mail: R70018@BARILAN.BITNET

**********

I.C.1.
Fr: Don Walker
Re: Demonstrations and Exhibits at ACL-92

We are encouraging exhibits and demonstrations at ACL-92, the 30th Annual Meeting of the Association for Computational Linguistics, which will be held from 28 June through 2 July in Newark, Delaware, in the United States. Authors of papers or academics without grants or contract support may present their demonstrations without making a donation, but universities and research labs that demo research rather than commercial software, and small entrepreneurs, are requested to donate $125 to cover expenses. Commercial software and hardware enterprises are requested to donate $350. If you are interested or can suggest names to contact, please send email addresses and/or phone numbers to:

Daniel Chester
University of Delaware
Computer and Information Sciences
Newark, DE 19716, USA
chester@dewey.udel.edu
+1-302 831-1955

**********************************************************

II. QUERIES

II.A.1.
Fr: PAAI@KUB.NL
Re: Question about IR-System Performance

We are currently preparing tests that should say something about the relative performance of some IR-systems. One of the systems under scrutiny is TOPIC from Verity Inc. Verity is very vague about matters, not to say downright unhelpful, but we have found a really close resemblance between TOPIC and the well-published RUBRIC system. What we would like to know is whether TOPIC is just a 'nom de guerre' for RUBRIC, or whether the Verity crowd copied RUBRIC for commercial use without the commitment of the people that developed RUBRIC. In other words: does anybody out there know what exactly are the connections between the two systems?

Thanks in advance,
Hans Paijmans
PAAI@KUB.NL
Tilburg

**********

II.B.1.
Fr: USENET News System
Re: Generalized Optimal Binary Search Tree

I'm looking for references on a generalization of Knuth's optimal binary search tree; namely, he mentions on p. 450, Exercise 33, in Vol. 3 that the optimal tree can be generalized to different costs on left and right branches of the tree. He refers to [Stanfel, 1970, JACM, p508-517]. Does anyone have any more recent references?

Thanks in advance
David Spuler
James Cook University of North Queensland, Australia

**********************************************************

IV. PROJECT WORK

IV.C.1.
Fr: Stevan Harnad
Re: Electronic Archiving of Raw Scientific Data

J. Skoyles on Public Electronic Archiving and Retrieval of Raw Scientific Data

The article below has just been published in PSYCOLOQUY. Commentary is now invited. Commentaries should not exceed 100 lines. Each should have a keyword-indexable title and the commentator's full name and affiliation. Please submit commentaries to sci.psychology.digest or to: psyc@pucc.bitnet or psyc@pucc.princeton.edu

-----------------------------------------------------------------------
psycoloquy.92.3.29.data-archive.1.skoyles   Friday May 29 1992
ISSN 1055-0143 (21 paragraphs, 2 references, 182 lines)
Copyright 1992 John R.
Skoyles

FTP INTERNET DATA ARCHIVING: A Cousin for PSYCOLOQUY

John R. Skoyles
Department of Psychology
University College London WC1E 6BT, UK
ucjtprs@ucl.ac.uk

1.0 ABSTRACT: American Psychological Association (APA) journals do not publish raw data, hence data are effectively inaccessible. I propose that authors of research papers should transfer their data to an Internet site so it can be accessed over Internet by anonymous ftp. I suggest that such data archiving would (1) make fraud easier to detect, (2) encourage scientific criticism and (3) aid the scientific process in general. Nor should it be difficult to implement.

KEYWORDS: data archiving, deception, electronic retrieval, error detection, ftp, fraud, meta-analysis, statistics

1.1 Experimental data are rarely published. Usually we are happy with their author's own statistical treatment. But not always. Researchers do not always fully analyse their data; sometimes editors restrict their publication space; and sometimes we have an idea we would like to try out on those data. It would be nice if the experimental data we read about were easy to access. I suggest that the approaching-universal use of computers and the Internet mail and file transfer system have made this possible. PSYCOLOQUY is archived and easily accessed through anonymous ftp: There is no reason why archived research data should not be equally accessible. Though there are several potential problems with ftp archiving of published data, the benefits would, I believe, vastly outweigh them.

2.1 Here follows a case for the ftp archiving of data published in APA (American Psychological Association) journals. I raise a few objections and last consider how it might be implemented. Note that when I refer to ftp this also applies to other forms of electronic data transfer.

3.1 First, electronic data archiving should be easy to implement and will become increasingly so.
Most researchers now (unlike, say, even two years ago) would have little trouble archiving their data upon publication. Most Results sections are based upon computer-analyzed ASCII data files (usually by a statistical package such as SPSS or BMDP). Most researchers should have their raw data stored in a form (i.e. file and subdirectory names) which makes it easy for other researchers to use. The commands and procedures for transferring it to a central data archive will be familiar to most psychologists (if not, most departments have people who will help). Of course, all the details about the research will be contained in the published paper, so these need not be stored. Indeed, the names of journals, their volume and issue numbers, make a convenient directory and subdirectory structure for organising the archive. There is something self-evident about what data are contained in /JEPHPP/18/1/SMITH/EXP1. And just as it is easy to MSEND data to an archive so it is easy to MGET them for reanalysis.

3.2.1 Second, the scientific ethic is to make error correction as easy as possible. Scientists are not always entirely competent or honest. Numerous cases of fraud and intellectual dishonesty have occurred in psychology (as elsewhere in science). Researchers are subject to enormous pressures to publish but unfortunately this normally requires positive findings. This puts pressure on researchers to rerun analyses (changing criteria for categorising data, excluding subjects, treating missing data, etc.) when only negative findings turn up. It is not clear how many researchers resist these pressures on the integrity of data analysis. At present, it is difficult to check. In a recent case reported in *Science*, two psychologists were only able to check the data analysis of another psychologist through the intervention of lawyers (Palca 1991).
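
The directory convention suggested in paragraph 3.1 can be sketched in a few lines of Python. This is only an illustration under assumptions: the article gives a single example path, /JEPHPP/18/1/SMITH/EXP1, so the reading of its levels as journal, volume, issue, first author and experiment is inferred, and the function names archive_path and parse_archive_path are hypothetical and belong to no existing archive software.

```python
from pathlib import PurePosixPath


def archive_path(journal, volume, issue, author, experiment):
    """Build an archive directory such as /JEPHPP/18/1/SMITH/EXP1.

    Hypothetical helper: the level order is inferred from the one
    example path given in the article, not from any real archive.
    """
    parts = [journal, str(volume), str(issue), author, f"EXP{experiment}"]
    # The article's example is upper-case throughout, so normalise to that.
    return PurePosixPath("/", *(part.upper() for part in parts))


def parse_archive_path(path):
    """Split an archive directory back into its (assumed) five levels."""
    # parts[0] is the root "/", so skip it; a path with any other number
    # of levels raises ValueError on unpacking.
    journal, volume, issue, author, experiment = PurePosixPath(path).parts[1:]
    return {
        "journal": journal,
        "volume": int(volume),
        "issue": int(issue),
        "author": author,
        "experiment": experiment,
    }


if __name__ == "__main__":
    print(archive_path("JEPHPP", 18, 1, "Smith", 1))
    print(parse_archive_path("/JEPHPP/18/1/SMITH/EXP1"))
```

Because the path alone identifies the journal, volume, issue and author, someone reading the printed paper could work out where its data live without consulting a separate index.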

3.2.2 There is public disquiet in the US Congress (notably, on the part of Congressman John Dingell) concerning fraud and intellectual dishonesty in science. Research on published fraudulent papers has revealed many defects (Stewart & Feder 1987). It is likely that any archived data would contain even more accessible and noticeable defects (in their data distributions, treatment and analysis). Archiving data would thus make it easier to detect both fraud and intellectual dishonesty.

3.3 Third, much honestly obtained and analyzed data is incompetently handled, yet many legitimate criticisms never arise because of difficulties accessing data. At present, if you suspect that a researcher's own analysis gives only part of the story or is misleading, you face an involved process of contacting them for the original data (something inconvenient to all concerned). Archiving data would increase the opportunities for legitimate criticism of published work.

3.4 Fourth, researchers ask different questions. Sometimes a researcher may wish to reanalyse data to answer questions the original authors ignored. People carrying out meta-analyses will often want to check the quality of the work they are using. At present this is not possible.

3.5 Fifth, students could gain much by examining real research papers and then "playing around" with their data, seeing the effects of different data-analytic strategies. They might even find things overlooked by their authors.

3.6 Sixth, much data is accidentally lost (despite APA's requirement that authors retain their data for a number of years). An ftp archive would make a convenient data backup.

3.7 Seventh, scientific papers are printed on paper -- this, not the nature of science, is the reason data are not normally made accessible at this time. Science is about open communication that maximally exposes ideas and arguments to criticism (one legitimate criticism of an idea is the way its data are handled).
Printed paper is a convenient means for opening written ideas to criticism, but it is unsuitable for making data accessible to criticism (it limits the quantity which can be published and communicates in a form that is inconvenient for computer reanalysis). Print has until recently been the only means for disseminating scientific ideas and data. Hence the tradition has arisen of limiting the dissemination of data. We should recognise the opportunity that electronic archives provide for breaking with this.

4.0 There are some reasons against ftp archiving:

4.1 Certain classes of data (e.g., clinical data) may have to be excluded to preserve the confidentiality and privacy of those from whom it is collected. This constraint does not apply to large portions of psychology, however, such as research on animals, reaction time studies on student subjects, or computer simulations.

4.2 Researchers certainly have the right to the "first go" at their data. However, the fact of publication, unless contrary notice is given, usually signifies that the data have already been substantially analyzed, and frequently no further analysis is intended.

4.3 There is another entirely invalid objection. Many researchers will be uncomfortable with their data being ftp archived because none of us are perfect. If our data can be reanalyzed we may be shown to have carried out, quite unintentionally, inappropriate or misleading analysis. To some extent the present state of affairs is quite convenient for hiding the fact that many researchers could be better statisticians and could keep better records.

5.0 Since impracticability may be an objection, I describe how an ftp archive might work:

5.1 The archive would have to be moderated by an archivist. Journal editors, for example, could contact the archivist, who would in turn contact the paper's chief author, providing a password and a temporary directory into which raw data files could be transferred.
Researchers would be free to create the subdirectories they felt best organised the data and to write a brief contents file. The archivist would transfer the files to a permanent directory. A standard note on the front page of the published paper would state whether its data had been archived.

5.2 I suggest that not only the raw data be stored but also the statistical and data analysis programs (SPSS or BMDP; or uncompiled Basic, Pascal or C) used to analyse them. Without these programs, tracing the transformation of the raw data into the reported statistical findings would be much more difficult.

5.3 Parallel to the archive there should be a directory for comments by people who have accessed the data, to record their findings. Anyone wanting to reexamine anyone's data would be interested in any previous reanalyses, good and bad.

5.4 There is no reason such a data archive could not grow to cover non-APA journals, theses, and nonpublished data (for example, unpublished negative findings).

5.5 Such a system would of course involve some cost and effort, perhaps even some inconvenience. However, with the public and congressional concern about whether scientists are maximally ensuring the integrity of their data, an ftp archive would show a commitment from the psychological community to ensuring honesty in published psychological research.

REFERENCES.

Palca, J. (1991). News and Comment: Get-the-lead-out guru challenged. Science 253: 842-844.

Stewart, W. W. & Feder, N. (1987). The integrity of the scientific literature. Nature 325: 207-214.

------------------------------------------------------

PSYCOLOQUY is a refereed electronic journal (ISSN 1055-0143) sponsored on an experimental basis by the American Psychological Association and currently estimated to reach a readership of 20,000.
PSYCOLOQUY publishes brief reports of ideas and findings on which the author wishes to solicit rapid peer feedback, international and interdisciplinary ("Scholarly Skywriting"), in all areas of psychology and its related fields (biobehavioral, cognitive, neural, social, etc.). All contributions are refereed by members of PSYCOLOQUY's Editorial Board.

Target articles should normally not exceed 500 lines in length; commentaries and responses should not exceed 200 lines. All target articles must have (1) a short abstract (<100 words), (2) an indexable title, (3) 6-8 indexable keywords, and (4) the author's full name and institutional address. The submission should be accompanied by (5) a rationale for soliciting commentary (e.g., why would commentary be useful and of interest to the field? what kind of commentary do you expect to elicit?) and (6) a list of potential commentators (with their email addresses). Commentaries must have indexable titles and the commentator's full name and institutional address (abstract is optional). PSYCOLOQUY also publishes reviews of books in any of the above fields; these should normally be the same length as commentaries, but longer reviews will be considered as well.

Authors of accepted manuscripts assign to PSYCOLOQUY the right to distribute their text electronically and to archive and make it permanently retrievable electronically. However, they retain the copyright, and after it has appeared in PSYCOLOQUY authors may republish their text any way they wish -- electronic or print -- as long as they clearly acknowledge PSYCOLOQUY as its original locus of publication.
However, except in very special cases, agreed upon in advance, contributions that have already been published or are being considered for publication elsewhere are not eligible to be considered for publication in PSYCOLOQUY.

Please submit all material to psyc@pucc.bitnet or psyc@pucc.princeton.edu

Stevan Harnad
Department of Psychology
Princeton University
harnad@clarity.princeton.edu / harnad@pucc.bitnet / srh@flash.bellcore.com
harnad@learning.siemens.com / harnad@elbereth.rutgers.edu / (609)-921-7771

**********************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.

Send subscription requests to: LISTSERV@UCCVMA.BITNET
Send submissions to IRLIST to: IR-L@UCCVMA.BITNET

Editorial Staff:
Clifford Lynch   lynch@uccmvsa.ucop.edu or calur@uccmvsa.bitnet
Nancy Gusack     ncgur@uccmvsa.bitnet
Mary Engle       engle@cmsa.berkeley.edu or meeur@uccmvsa.bitnet

The IRLIST Archives will be set up for anonymous FTP, and the address will be announced in future issues.

To access back issues presently, send the message INDEX IR-L to LISTSERV@UCCVMA.BITNET. To get a specific issue listed in the Index, send the message GET IR-L LOG ***, where *** is the month and day on which the issue was mailed, to LISTSERV@UCCVMA.BITNET.

These files are not to be sold or used for commercial purposes. Contact Nancy Gusack or Mary Engle for more information on IRLIST.

The opinions expressed in IRLIST do not represent those of the editors or the University of California. Authors assume full responsibility for the contents of their submissions to IRLIST.