Information Retrieval List Digest 008 (January 23, 1990)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-008

IRLIST Digest    January 23, 1990    Volume VII Number 2 Issue 8
***************************************************************
Continued from Volume VII Number 2, Issue 7
***************************************************************

I. NOTICES: Meeting announcements/Calls for papers
   (Last minute urgent additions)
   A.13 CHI '90 Workshop on Multimedia and Multimodal Interface Design
III. JOB ANNOUNCEMENTS (last minute urgent additions)
   1. ETH, Zurich, Switzerland
IV. PROJECTS: Initiatives and proposals / Bibliographies Abstracts / Miscellaneous
   D.4. Survey Corpora

***************************************************************
I. NOTICES
I.A.13.
Fr: Paul (P.M.) Brennan
Re: CHI '90 Workshop on Multimedia and Multimodal Interface Design

Call for Participation
CHI '90 Workshop on Multimedia and Multimodal Interface Design

The workshop on multimedia and multimodal interface design is scheduled as part of the CHI '90 Conference in Seattle, Washington, and will be held on Sunday and Monday, April 1-2, 1990.

A large number of disparate technologies contribute to an understanding of how to design a multimedia, multimodal interface. These vary from the use of vision, gesture and sound to applications, tools for interface construction, and enabling technologies, such as video disk, and image processing. The primary objective of the workshop is to integrate what we presently know of these technologies and consider the areas that need to be explored for future development of multimedia in the computer interface.

The following topics will be explored by the participants: the integrated use of vision, gesture, and sound to interact in an organized dialogue (modality integration); the use of metaphor and composite screen paradigms in a multimedia environment; applications that lend themselves to the use of multimedia, such as electronic books, data bases, and aids for users with special needs. Five problems are to be addressed by the participants in connection with the above topic areas: current problems, future directions, backgrounds and specialties of those in multimedia R & D, and tools and methodologies.

Attendance at the workshop is by invitation. It is intended that the workshop will bring together approximately twenty people who have a background in one or more of the topic areas. Individuals wishing to participate are requested to submit five copies of a position paper presenting their views on one of the topic areas. The position paper should be two to five pages long, double-spaced, and should include a brief description of the individual's interest in the area. It would be helpful to the Program Committee to enclose a previously published paper or technical report authored by the applicant on the topic of the position paper; however, position papers will be judged on their own merits. Position papers must be received before February 6, 1990. Position papers of invited attendees will be distributed to the attendees before the workshop.

The workshop program committee consists of: Meera Blattner (University of California, Davis, and Lawrence Livermore National Lab); William Buxton (University of Toronto); Roger Dannenberg (Carnegie-Mellon); Alistair Edwards (University of York); and Simon Holland (Aberdeen University). The invitations will be issued by the Program Committee. The invitees will be expected to actively participate as speakers, discussants, and summary authors.
A workshop report will be prepared for the SIGCHI Bulletin, and invitees are expected to participate in drafting the report. A $50 workshop fee will be charged to each participant to help defray the costs of coffee breaks and A/V equipment.

Important dates:
Five copies of the position paper due by February 6, 1990.
Invitations sent out by February 19, 1990.
Agenda and invitees' position papers sent by February 26, 1990.

Send position papers to:
Meera Blattner, L-540
Lawrence Livermore National Laboratory
P.O. Box 808
Livermore, CA 94551
Phone: (415) 422-3505
FAX: (415) 422-8681
Netmail: blattner@lll-crg.llnl.gov
------------------------------------------------

Call for Position Papers
Workshop on Context: What Does it Mean to Application Design?

This second workshop on context is scheduled as part of the CHI'90 Conference in Seattle, Washington and will be held on April 1st - 2nd, 1990. The objective of the two-day workshop is to explore the user interface issues associated with building applications which take advantage of context. This workshop continues on from the Context workshop at CHI '89. The primary goal for this workshop will be to examine the impact of maintaining context on user interface design.

Context is a subjective concept. In the first workshop, context was found to be an extremely complex subject, capable of being subjected to study from many different viewpoints. Three areas were studied in detail - the dynamics of context (how it develops and changes, the role of 'conversation' between human and computer), the roles of conceptual models and information theory in understanding context, and the implementation issues (hardware and software) related to facilitating the building of common context between the computer and the user. In the second workshop, the ideas and definitions from the first workshop will be reviewed.
The workshop will then focus on how the user interface of applications can be structured to develop and use context that meets user expectations.

Attendance at the workshop will be by invitation. It is intended that the workshop will bring together approximately twenty people who have relevant user interface design or research experience in areas related to complex, multi-tasking and/or multi-windowing systems; expert systems; users' mental models; or adaptive systems. Individuals who would be able to comment on the architectural implications of the workshop's discussions will also be welcomed.

Individuals wishing to participate are requested to submit four copies of a position paper presenting their views on the workshop topic. This position paper should be one to two pages long and should contain a brief summary of experience in the areas listed above and a brief description of the individual's interest in the subject of context. The workshop committee (Helen Maskery, Gord Hopkins and Tim Dudley (Cognos)) will issue invitations based on the position papers and the timeliness of their submission. All invitees will be expected to actively participate as speakers, discussants, and brainstormers. They will also be expected to contribute to a workshop report for the SIGCHI Bulletin. A $50 workshop fee will be collected from each participant to help defray costs associated with coffee breaks and A/V equipment.

Important dates:
Four copies of the position paper due by: February 6, 1990
Invitations (with agenda) sent out by: February 19, 1990
Workshop to be held: April 1-2, 1990

Send position papers to:
Helen Maskery (for more information, call (613) 763 2386)
BNR Ltd, 9Y12
P.O.
Box 3511, Station C,
Ottawa, Ontario, K1Y 4H7
CANADA
maskery@bnr.ca
----------------------------------------------------

Call for Participation
CHI'90 Workshop
Modeling the User Interface
A comparison of approaches towards the "Mental Model"

Can we describe a user interface without engaging users? Can we predict how difficult an interface will be to learn, to understand, and to use? Can we prescribe how an interface should be designed? In the HCI community there exist several approaches to answering these questions by modeling different aspects of the interface and of HCI.

A limited attendance, invitational workshop on Modeling the User Interface is being organized for the CHI'90 Conference in Seattle. The workshop will be held on April 1st and 2nd.

This workshop aims at comparing, analyzing, and structuring some different modeling approaches, for instance CLG, GOMS, Cognitive Complexity Theory, TAG, and ETAG. The following aspects will be discussed:
(1) aspect(s) of HCI covered
(2) general theoretical background
(3) descriptive power
(4) predictive power
(5) engineering applicability

The expected results consist of evaluations of the different types of models for different purposes. Their merits and drawbacks will be illuminated and an overview picture drawn of when to use which model. Possible requirements for new approaches will also be identified. The results will be published in the SIGCHI Bulletin.

Participants in the workshop should have experience in modeling interfaces, either theoretically or empirically. People who have already attempted comparisons and analyses of different models are also invited to participate.

Please send a description of yourself (experience, expertise and interest) and a position statement (about 2 pages) to the workshop organizers not later than February 6th, 1990 (preferably by electronic mail). The workshop organizers (Gerrit C.
van der Veer, Free University, Amsterdam, The Netherlands, and Yvonne Waern, Stockholm University, Sweden) will invite participants on the basis of the position statements (not later than February 20, 1990). The position papers and a final agenda will be mailed to all participants, upon notification of acceptance. A workshop fee of $50 will be collected from each participant to help defray the costs of coffee breaks and A/V equipment.

Correspondence address:
Gerrit C. van der Veer
Free University, Dept. of Psych.
De Boelelaan 1111, Prov.I, C. 102
1081 HV Amsterdam
The Netherlands
Email: gerrit@psy.vu.nl (uucp)
Telephone: + 31-20-5484405, telefax: + 31-20-5484443
-------------------------------------

Call for Participation
CHI'90 Workshop on Structure Editors

A limited attendance, invitational workshop on Structure Editors is being organized for the CHI'90 Conference in Seattle, Washington. The workshop will be held on April 1-2, 1990.

Syntax-directed editors and graphical structure editors vary widely in the mechanisms they provide for entry, modification, and display. Some structure editors are special purpose or oriented towards novice programmers; some support multiple phases of the software lifecycle. Editor generators allow more rapid specification of application-specific or user-specific structure editors.

The workshop will address the following questions:
1) What are the benefits and drawbacks of structure editors?
2) To what extent do they provide the benefits that their designers attribute to them, such as increasing the ease with which novices learn to program?
3) Can they be redesigned to remove their deficiencies, extended to provide additional capabilities, or diluted to provide partial functionality such as prettyprinting?
4) What are the future directions/uses for these tools? For instance, are they particularly well-suited for designing direct manipulation interfaces or certain types of applications?
Likewise, are they beneficial for certain classes of users only, and, if so, why?

Participation in the workshop will be limited to twenty people. Individuals wishing to participate are requested to submit four copies of a position paper (and an additional copy by email, if possible) presenting their views on structure editors and their experience with the design or use of structure editors. The position paper should also include a brief discussion of the above questions. Papers should not exceed three pages in length.

The workshop organizers (Lisa Neal, Harvard University and Gerd Szwillus, Universitaet Dortmund) will issue invitations based on the position papers. Papers will be distributed to all participants, along with an agenda, upon notification of acceptance. The workshop results will be reported in the SIGCHI Bulletin. A $50 workshop fee will be charged to each participant to help defray the costs associated with coffee breaks and A/V equipment.

Position papers are due no later than February 6, 1990. Send position papers to:
Lisa Neal
Aiken Computation Laboratory
Harvard University
Cambridge, MA 02138
lisa@harvard.harvard.edu
617-495-8848

Invited participants will be notified by February 20, 1990.
------------------------------------------------------------

Call for Participation
CHI'90 Workshop on Taking Design Seriously: Exploring Techniques Useful in HCI Design

A limited attendance, invitational workshop on techniques for improving HCI Design is being organized for the CHI'90 conference in Seattle. The workshop will be held on Sunday, April 1, 1990. The primary goal of the workshop is to bring together individuals who feel that they have developed or used techniques which have proven useful in producing quality systems. The workshop is particularly aimed at discussing techniques for encouraging a user-centered focus in the design of complex software systems.
Design of useful and usable computer systems is a complex activity involving many decisions on many levels, ranging from questions of what function to include in the system to details of how to present output. This is a problem solving activity (generally done by a group) requiring consideration of a number of views of the objectives. While there has been a great deal of research conducted in the field of Human-Computer Interaction, it is not clear what the impact of this work has been on supporting actual design practice. There seems to be a shared notion that 'things could be better', but little agreement on how to proceed. This workshop is intended to bring together individuals who feel that they have thoughts or experiences to share that may contribute to improvements in actual design practice. The focus of the workshop will be on the complete design process, with particular attention to the earliest stages of design. Of particular interest is how current software engineering practice may (or may not) adequately describe techniques for maintaining a sufficient user-centered focus. Contributions may focus on individual techniques or tools found useful (e.g., techniques for task and requirements analysis, tools for rapid prototyping, ways to support group problem solving in design, use of theoretical models, or techniques for user involvement in design), but should also consider the design process in-the-large (i.e., how the technique fits into the group and organizational work activity of designing a system). The workshop will include brief presentations by each of the participants, followed by a discussion within the group and an attempt to develop a report outlining a 'Program to improve HCI design practice'. Producing the outline report will serve to focus the activity of the workshop, and provide a summary to the SIGCHI community. 
Individuals interested in participation are requested to submit four copies of a position paper outlining their views on a technique useful in the design process. This paper should be a brief summary of no more than three pages in length. Participants will be charged a $25 workshop fee to help defray the costs associated with coffee breaks and A/V equipment. Approximately 20 participants will be selected from those submitting position papers, based on quality of papers received. Please send position papers along with a brief statement of your background to: John Karat User Interface Institute IBM Watson Research Center Box 704 Yorktown Heights, NY, 10598 (914) 789-7832, jkarat@ibm.com Papers must be received by February 6, 1990. Invited participants will be notified by February 19, 1990. Copies of accepted position papers, along with a workshop agenda will be mailed to participants by February 26, 1990. -------------------------------------------------- Call for Position Papers CHI '90 Workshop: Computer-Human Interaction in Aerospace Systems A limited attendance, invitational workshop on Computer-Human Interaction in Aerospace Systems is being organized for the CHI '90 conference in Seattle, Washington, 1-5 April, 1990. The workshop will be 1-2 April 1990. The purpose of the workshop is to explore issues facing designers and users of decision support systems in real-time, high-risk aerospace systems. 
The workshop will focus on the following issues: how do we model the user/operator in complex time-critical environments; how do we design the human-computer interaction (displays, controls, and aids) to ensure that the user is integrated into the decision process; what comprises an intelligent computer interface for a supervisory controller; how do we provide timely, context-sensitive information in real time without overloading or distracting the human operator; how do we design operator aids/tutors using knowledge-based technology that enhances the human-computer interaction and overall system effectiveness rather than replacing the human decision maker. Application domains include the design of human-computer interaction (e.g., displays, controls, aids, tutors) in satellite ground control systems, shuttle and space station control systems, and advanced aviation in both the cockpit and air traffic control.

Workshop participation is limited to twenty people. Individuals wishing to attend the workshop may request an invitation by submitting a 3-5 page position paper discussing their experiences or research in the area of human-computer interaction in real-time complex dynamic systems (particularly aerospace applications). We encourage both human factors and systems designers, as well as other knowledgeable individuals, to participate. We wish to obtain a balanced university/government/industry representation and interaction.

The position papers of the invited attendees will be reproduced and distributed to the attendees prior to the workshop. Selected papers may be submitted for publication in the SIGCHI Bulletin. Also, workshop results will be reported in the Bulletin. Participants will be charged a $50 workshop fee to help defray the costs of coffee breaks and A/V equipment.

Position papers are due no later than January 30, 1990. Three copies, double-spaced, should be sent to:
CHI '90 Workshop on Computer-Human Interaction in Aerospace Systems
Christine M. Mitchell
Center for Human-Machine Systems Research
School of Industrial & Systems Engineering
Georgia Institute of Technology
Atlanta, GA 30332-0205
Phone: (404) 894-4321
E-mail: mitchell@chmsr.gatech.edu
Fax: (404) 894-2301

Invited participants will be notified by March 1, 1990, and will also be sent copies of the selected position papers together with a final agenda for the workshop.
------------------------------------------------

Call for Participation
Workshop on Visual Interfaces to Geometry
in conjunction with CHI'90
Seattle, Washington, April 1-2, 1990

A two-day workshop on Visual Interfaces to Geometry will be held as part of the CHI'90 Conference. The objective is to explore and integrate advanced approaches to the visual representation and interactive manipulation of geometric information. The primary goals are to (1) define desirable properties of interfaces to geometric data in different areas, (2) identify problems in achieving these properties, and (3) discuss generally applicable approaches to these problems from an interdisciplinary perspective.

Geometric data are an important resource in many areas: Computer Aided Engineering (CAE), Geographic Information Systems (GIS), Computer Graphics, Graphic Arts, Electronic Publishing, etc. Data structures for their representation, and algorithms for their manipulation, have received ample attention in the domain of computational geometry, with the object being an optimal use of computer memory and processing time. This focus on internal representations and manipulations contrasts with a lack of emphasis on corresponding external concerns: how can geometric information be presented and manipulated at the user interface, optimizing "memory load" and "execution speed" (among other criteria) for users? Considerable expertise on various aspects of this question has been developed in a number of fields; however, little effort has been made to combine this expertise.
The workshop will take a multidisciplinary approach to the problem of improving the visual presentation and interactive manipulation of geometric information. It should bring together approximately fifteen people from Computer Science, Engineering, Cognitive Science and Linguistics, Geography and Cartography, and other related areas.

Attendance at the workshop will be by invitation. Individuals wishing to attend are requested to submit four copies of a short position paper (500 - 1500 words) to the organizers of the workshop. They should present their view of issues and approaches to be discussed as well as a brief summary of their background and interest in the subject.

The position papers of the participants will be distributed with the invitation to the workshop. The participants will also receive a catalogue of related researchable questions which has been compiled at a workshop on "Languages of Spatial Relations", organized by the National Center for Geographic Information and Analysis, as well as an extended bibliography on the same topic. A $50 fee will be collected from each participant to help defray the costs of coffee breaks and audio-visual equipment.

Important Dates:
4 copies of position paper due by February 1, 1990
Invitations sent out by February 19, 1990
Workshop held at CHI'90 April 1-2, 1990

Organizers:
Werner Kuhn and Max J. Egenhofer
National Center for Geographic Information and Analysis
University of Maine
Orono, ME 04469
kuhn@mecan1.bitnet and max@mecan1.bitnet
Phone: (207) 581 2118 or 2149
Fax: (207) 581 2206

***************************************************************
III. JOB ANNOUNCEMENTS
III.1.
Fr: wyle@inf.ethz.ch (Mitchell F. Wyle)
Re: ETH, Zurich, Switzerland

The information retrieval research group in the computer science department at the ETH is looking for teaching/research assistants to join our team.

The document and information processing group is developing an experimental information server which filters information contained in wide area networks (WANs) according to user profiles. Information is delivered to subscribers by electronic mail. The project goals include developing new methods of text comparison, new performance measures, user profile specification, and new ideas in user / system E-mail dialogue.

Another project is the application and integration of modern Information Retrieval methods into a unified system connected to a commercial database service (Data Star). The project goals include developing novel user interfaces, exploiting information structures, and interpreting relevance feedback and including it in the retrieval algorithms.

The positions entail approximately 15 hours per week of teaching; the rest is research. Facilities include Sun workstations, Mac IIs and a well-equipped computing center with, among other machines, a Cray. Salary is quite competitive considering that the research you perform can be for your PhD (a Master's degree is required for a PhD). Zurich is one of the most beautiful cities in Europe, with very little crime and quite a bit of local culture.

Highly qualified individuals should send applications and resumes to:
Prof. H. P. Frei
Institut fuer Informationssysteme
ETH Zentrum
8092 Zurich, Switzerland

Questions should be addressed to me electronically at wyle@inf.ethz.ch

-Mitchell F. Wyle
Institut fuer Informationssysteme    wyle@inf.ethz.ch
ETH Zentrum / 8092 Zurich, Switzerland    +41 1 256 5237

***************************************************************
IV. PROJECTS
IV.D.4.
Fr: FAFSRV%NOBERGEN.BITNET@CUNYVM.CUNY.EDU Re: SURVEY CORPORA LANCASTER PRELIMINARY SURVEY OF MACHINE-READABLE LANGUAGE CORPORA *************************************************************** Lita Taylor & Geoffrey Leech (February 1989) INTRODUCTION This descriptive list of machine-readable text corpora is the result of a preliminary survey undertaken by Lita Taylor at the University of Lancaster in January-February 1989. We acknowledge with gratitude the help of Longman Group U.K. Limited, in providing the financial support for the survey. Because of the limitations of time and the availability of information, the list is incomplete, omitting corpora and corpus details of which we were unaware or which we were unable to track down during the requisite period. A particular limitation is that the survey is heavily biased towards modern English: details of corpora in other languages were included only where information could be readily obtained at the time. Since this survey was undertaken primarily to serve the purpose of linguistic research, it does not, on the whole, duplicate information given by the catalogue of holdings of the Oxford Text Archive, especially where corpora held by the Archive are primarily of literary or philological interest. For further details of such corpora, contact: Oxford Text Archive Oxford University Computing Service 13 Banbury Road, Oxford OX2 6NN In spite of these limitations, we hope the survey will be a useful source of information to corpus users, and that it will be possible to update the list periodically, and to increase its coverage. [We apologise for the absence of accents, umlauts, and other diacritics from the character set used in producing the list.] Geoffrey Leech. This document can be obtained from: The Norwegian Computing Centre for the Humanities P.O. Box 53, Universitet, N-5027 Bergen, Norway. 
If you are able either to add new information, or to update existing information, in this survey, would you kindly send details to:

Survey of Machine-Readable Language Corpora,
c/o G. Leech, UCREL,
Bowland College,
University of Lancaster,
Lancaster LA1 4YT, England.
email: G.N. Leech@uk.ac.lancs.cent1

Please relate your information to the set of headings we have used in specifying the characteristics of each corpus entry.

List of corpora described:

A Language bank of modern Swedish
A corpus for dialectometry
A corpus of dramatic texts in Scots
A corpus of spoken Northern Ireland English
American Heritage Intermediate corpus
American News Stories
Augustan prose sample
Berkeley corpus
Birmingham University corpus
Bonner Zeitungskorpus Teil 1
Brown corpus
CHILDES database
Corpus of English-Canadian writing
Corpus of Portuguese
Dialogstrukturenkorpus
Freiburger corpus
Gothenburg corpus
Guangzhou Petroleum English corpus (GPEC)
Handbuchkorpora
Helsinki corpus
International corpus of English
JDEST corpus
Kolhapur corpus of Indian English
LIMAS corpus
Lancaster Parsed Corpus
Lancaster-Leeds Treebank
Lancaster-Oslo/Bergen (LOB) corpus
London-Lund corpus of spoken English
Longman/Lancaster English Language corpus
Macquarie (University) corpus
Mannheim corpora
Melbourne-Surrey corpus
Nijmegen corpus
PoW corpus
SEC corpus
Survey of English Usage
Susanne corpus
Thomas Mann corpus
Warwick corpus

The descriptions are organised into two sections: English machine-readable corpora and non-English machine-readable corpora, and are ordered alphabetically within the sections.

*********************************************************************
1.
ENGLISH MACHINE-READABLE CORPORA

---------------------------------------------------------------------
A CORPUS FOR DIALECTOMETRY
---------------------------------------------------------------------
Compiled by: John M Kirk
Compiled at: The Queen's University of Belfast
Date of compilation:
Sampling period: Mid-late 1950's
Language (variety): Scots English
Spoken/written: Written
Size: c. 38000 words so far.
Details of material: Formal written questionnaire responses.
Organisation: Organised by counties, localities, and responses.
How transcribed: Nonstandard orthography, reflecting pronunciation.
How analysed: -
Use of corpus: For dialectometrical analysis. Development of a dialectometrical methodology.
Availability: Not available.
Storage details: Mainframe computer, but others possible.
Other: Cocoa symbols used (for use with OCP). Based on "The Linguistic Atlas of Scotland": Scots section, Vol 2. So far the corpus comprises only nine mainland counties and Northern Ireland.

---------------------------------------------------------------------
A CORPUS OF DRAMATIC TEXTS IN SCOTS
---------------------------------------------------------------------
Compiled by: J. M. Kirk
Compiled at: The Queen's University of Belfast
Date of compilation:
Sampling period: Mid-twentieth century
Language (variety): English - Traditional Scots & Glasgow Scots
Spoken/written: Written
Size: 101,000 words
Details of material: Dramatic texts
Organisation: Six dramatic texts, 5 in Glasgow Scots, 1 in traditional Scots.
How transcribed:
How analysed: Primary and modal auxiliary verbs have been given syntactic and semantic word tags.
Use of corpus: The study of the grammar of Scots using written material which was presumed to reflect a high degree of speech realism. As a source of data for new book on auxiliary verbs, and for a future grammar of Scots.
Availability: Not available. Three of the Glasgow Scots texts are lodged with the Oxford Text Archive.
Storage details: Every way - magnetic tape, mainframe directory, hard disk, floppy disk.
Other: Contains cocoa tags for use with OCP.

---------------------------------------------------------------------
A CORPUS OF SPOKEN NORTHERN IRELAND ENGLISH - under development
---------------------------------------------------------------------
Compiled by: John M Kirk
Compiled at: The Queen's University of Belfast
Date of compilation:
Language (variety): Northern Ireland English, i.e. Ulster Scots, Mid-Ulster English, and South-Ulster English.
Spoken/written: Spoken
Size: c. 400,000 words
Details of material: Material taken from 42 grid-referenced localities in Northern Ireland, comprising three age ranges for each locality: children, middle-aged, and elderly. The style is informal conversational, esp. narrative.
Organisation: Numbered by locality; within locality, by speaker.
How transcribed: Orthographically.
How analysed: Not analysed.
Use of corpus: To make available a machine-readable corpus of spoken N.I. English, for syntactic analysis, for comparison with two similar corpora from the south of Ireland, and for use by the Ulster Folk Museum.
Availability: Will be available on completion.
Storage details: Audio tapes, eventually floppy disks.
Other: Completion date of the corpus is May 1990. Contains cocoa references for use with OCP.

---------------------------------------------------------------------
AMERICAN HERITAGE INTERMEDIATE CORPUS
---------------------------------------------------------------------
Compiled by: American Heritage Dictionary Division
Compiled at:
Date of compilation: November 1969
Language (variety): American English
Spoken/written: Written
Size: Over 5 million words
Details of material: Published texts most likely to be encountered by school children of grade 3-9. 500-word samples were extracted from 1,045 published texts.
Organisation: Divided into categories:
    Reading
    English and Grammar
    Composition
    Literature
    Mathematics
    Social Studies
    Spelling
    Science
    Music
    Art
    Home Economics
    Shop
    Library fiction
    Library nonfiction
    Library reference
    Magazine
    Religion
How transcribed: Ordinary written text
How analysed: Word frequency lists produced.
Use of corpus: As a database for the "American Heritage School Dictionary".
Availability:
Storage details: Requires 15 reels of tape.
Other: Word frequency lists reproduced in "The American Heritage Word Frequency Book", Carroll, Davies & Richman (1971), Houghton Mifflin Company, American Heritage Publishing Co., Inc.

---------------------------------------------------------------------
AMERICAN NEWS STORIES
---------------------------------------------------------------------
Compiled by: Unknown - deposited at OTA by G Akers in 1979.
Compiled at: Unknown
Date of compilation: Unknown
Language (variety): American English
Spoken/written: Written
Size: Approximately 250,000 words
Details of material: News stories extracted from the Associated Press network during December, 1979.
Organisation: Divided into two files, but not categorised.
How transcribed: Ordinary written text.
How analysed: Not analysed.
Availability: Distributed through Oxford Text Archive.
Storage details: Requires 2.5 Mb storage.

---------------------------------------------------------------------
AUGUSTAN PROSE SAMPLE
---------------------------------------------------------------------
Compiled by: Louis T. Milic
Compiled at: Dept of English, Cleveland State University
Date of compilation:
Language (variety): English
Spoken/written: Written
Size: Approximately 80,000 words.
Details of material: Samples of Augustan prose. 52 selections by 51 authors, published 1675 - 1725 in England. Average length of a sample is 1522 words.
Organisation: Each file consists of one text.
How transcribed: Spellings have been regularized to the American standard, but an original spelling version is available. Punctuation has been slightly simplified, and the whole is in upper-case letters with dollar signs to indicate proper names.
How analysed:
Use of corpus:
Availability: Distributed through Oxford Text Archive, and Louis T. Milic, Department of English, Cleveland State University, Cleveland, OH 44115.
Storage details: Available on tape or diskette.
Other: Full documentation, including the entire text, selected statistics, and instructions, is available.

---------------------------------------------------------------------
BERKELEY CORPUS
---------------------------------------------------------------------
Compiled by: Wallace Chafe, Gunnel Tottie
Compiled at: University of California, Uppsala University
Date of compilation: -
Language (variety): American English
Spoken/written: Spoken and written
Size: -
Details of material: Mainly spoken and written American English. Recordings made by John Gumperz and Susan Ervin-Tripp, and fifty hours of dinner-table conversation collected by Wallace Chafe.
Organisation:
How transcribed:
How analysed:
Use of corpus:
Availability:
Storage details:
Other:

---------------------------------------------------------------------
BIRMINGHAM UNIVERSITY CORPUS
---------------------------------------------------------------------
Compiled by: J. Sinclair and others
Compiled at: University of Birmingham, England
Date of compilation:
Language (variety): Predominantly British English, but including American and other varieties.
Spoken/written: Spoken and written
Size: c. 20 million words (plus an additional c. 20 million words of more specialised material).
Details of material: See A.J. Renouf, 'Corpus Development at Birmingham University', in J. Aarts and W. Meijs (eds.), Corpus Linguistics: Recent Developments in the Use of Computer Corpora in English Language Research, Amsterdam: Rodopi, 1984. Also: A.J.
Renouf, 'Corpus Development', Chapter 1 of J.M. Sinclair (ed.), Looking Up: An Account of the COBUILD Project in Lexical Computing. London and Glasgow: Collins ELT, 1987.
Organisation: See references above.
How transcribed: Orthographically.
How analysed: Primarily KWIC concordances, lexical database.
Use of corpus: Primarily lexicography. Used for producing the Collins COBUILD English Language Dictionary.
Availability:
Storage details:

---------------------------------------------------------------------
BROWN CORPUS
---------------------------------------------------------------------
Compiled by: W. Nelson Francis & Henry Kucera
Compiled at: Brown University, Providence, Rhode Island
Date of compilation:
Language (variety): American English
Spoken/written: Written
Size: 1,014,294 words
Sampling period: 1961
Details of material:
Organisation: Divided into categories:
    A  Press: reportage
    B  Press: editorial
    C  Press: reviews
    D  Religion
    E  Skills & hobbies
    F  Popular Lore
    G  Belles lettres, biography, essays
    H  Miscellaneous
    J  Learned
    K  General fiction
    L  Mystery & detective fiction
    M  Science fiction
    N  Adventure & western fiction
    P  Romance & love story
    R  Humor
  500 samples of c. 2000 words each.
How transcribed: Orthographically
How analysed: Word-tagged using the "TAGGIT" program, but this version unavailable.
Availability: Distributed through ICAME and Oxford Text Archives - untagged version.
Storage details: See ICAME distribution entry
Other: Part of the Brown corpus has been syntactically analysed - see the entry for the "Gothenburg Corpus".

---------------------------------------------------------------------
CHILDES DATABASE
---------------------------------------------------------------------
Compiled by: Brian MacWhinney
Compiled at: Carnegie-Mellon University, Pennsylvania.
Date of compilation:
Language (variety): English
Spoken/written:
Size:
Details of material: 21 corpora of parent-child interactions from English-speaking children.
There are also corpora from several other languages.
Organisation:
How transcribed:
How analysed:
Use of corpus:
Availability: There are plans to circulate data (120 Mb) on CD-ROM.
Storage details:
Other:

---------------------------------------------------------------------
CORPUS OF ENGLISH-CANADIAN WRITING
---------------------------------------------------------------------
Compiled by: W. C. Lougheed
Compiled at: Strathy Language Unit, Queen's University, Kingston, Ontario.
Date of compilation:
Language (variety): English Canadian
Spoken/written: Written
Size: 2.5 million words
Details of material: Material taken from magazines, books, and newspapers.
Organisation:
How transcribed:
How analysed:
Use of corpus:
Availability: For research purposes, contact Margery Fee, Director, Strathy Language Unit, 207 Stuart Street, Room 316, Rideau Building, Queen's University, Kingston, Ontario K7L 3N6.
Storage details:
Other:

---------------------------------------------------------------------
GOTHENBURG CORPUS
---------------------------------------------------------------------
Compiled by: Alvar Ellegard
Compiled at: University of Gothenburg
Date of compilation:
Language (variety): American English
Spoken/written: Written
Size: 128,000 words
Details of material: A subset of the BROWN corpus - comprises 64 of the 500 text extracts in the Brown corpus, including 16 each from the categories A - press reportage; G - belles lettres, biography; J - learned and scientific; N - adventure and western fiction.
Organisation:
How transcribed:
How analysed: A form of dependency-tree analysis. Codes functional as well as formal properties, and includes some limited indications of logical or "underlying" structure where this differs from surface grammatical structure.
Use of corpus:
Availability: Contact:
    Gudrun Magnusdottir
    Sprakdata
    Goteborgs Universitet
    S-412 98 Goteborg
    Sweden
Storage details:
Other:

---------------------------------------------------------------------
GUANGZHOU PETROLEUM ENGLISH CORPUS (GPEC)
---------------------------------------------------------------------
Compiled by: Zhu Qi-bo
Compiled at: Guangzhou Training College of the Chinese Petroleum University
Date of compilation: 1986 - 1987
Sampling period: Mostly 1975 - 1986
Language (variety): British and American English
Spoken/written: Mainly written
Size: 411,612 words
Details of material: The sampled materials represent exclusively petroleum English (PE) texts. The corpus consists of 700 texts of about 500-600 words each.
Organisation: Divided into categories:
    A  1  Petroleum geology and prospecting
    B  2  Petroleum refinery and petrochemistry
    C  3  Drilling
    D  4  Offshore exploration
    E  5  Petroleum pipeline
How transcribed: Ordinary written text
How analysed: Concordances and frequency lists produced. A package of concordance programs for the corpus has also been developed.
Use of corpus: To study the lexicon of Petroleum English. To provide Petroleum English teachers and text-compilers with first-hand information such as word lists, grammatical structures, etc. To provide information for comparative language analysis.
Availability: Restricted by the compiler's grant, certain commitments, and some other specifications. Distributed by the compiler.
Storage details: On floppy disks.

***************************************************************
Continued in Volume VII Number 4, Issue 9
***************************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.
Send subscription requests to: LISTSERV@UCCVMA.BITNET
Send submissions to IRLIST to: IR-L@UCCVMA.BITNET

Editorial Staff:
    Clifford Lynch   lynch@postgres.berkeley.edu   calur@uccmvsa.bitnet
    Mary Engle       engle@cmsa.berkeley.edu       meeur@uccmvsa.bitnet
    Nancy Gusack     ncgur@uccmvsa.bitnet

The IRLIST Archives will be set up for anonymous FTP, and the address will be announced in future issues.

These files are not to be sold or used for commercial purposes. Contact Mary Engle or Nancy Gusack for more information on IRLIST.

The opinions expressed in IRLIST do not represent those of the editors or the University of California. Authors assume full responsibility for the contents of their submissions to IRLIST.