Information Retrieval List Digest 145 (January 5, 1992)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-145

IRLIST Digest                               ISSN 1064-6965
January 5, 1992
Volume X, Number 1
Issue 145
**********************************************************
  I. NOTICES
     A. Meeting Announcements/Calls for Papers
        1. 5th Message Understanding System Evaluation/
           Message Understanding Conference
 II. QUERIES
     B. Requests for Information
        1. Encryption/Decryption on Top of Fulltext Retrieval Software
        2. Request for Information in Archives
III. JOB ANNOUNCEMENTS
        1. Researcher, Siemens Corporate Research, Princeton, New Jersey
**********************************************************
I. NOTICES
I.A.1.
Fr: Beth M. Sundheim <sundheim@cod.nosc.mil>
Re: 5th Message Understanding Conference--Call for Participation

                      * * * CALL FOR PARTICIPATION * * *
                 FIFTH MESSAGE UNDERSTANDING SYSTEM EVALUATION
                  AND MESSAGE UNDERSTANDING CONFERENCE (MUC-5)
                         1 MARCH - 27 AUGUST, 1993
                    Preparation: 1 March - 23 May
                                 29 May - 25 July
                    Evaluations: 24-28 May (dry run)
                                 26-30 July (formal run)
                     Conference:  25-27 August
                             Sponsored by:
                Defense Advanced Research Projects Agency
           Software and Intelligent Systems Technology Office
                             (DARPA/SISTO)

The Message Understanding Conferences have provided an ongoing
forum for assessing the state of the art and practice in text
analysis technology and for exchanging information on innovative
computational techniques. They have also encouraged
experimentation in the context of fully implemented systems that
perform the realistic task of extracting factual information from
free text. The first two conferences focused on short naval
messages; the two most recent conferences challenged the systems
with longer and stylistically varied terrorism news stories. The
four conferences have seen the application of a wide variety of
approaches to the information extraction task.

ATTENDANCE AT THE CONFERENCE IS LIMITED TO EVALUATION
PARTICIPANTS AND TO GUESTS INVITED BY DARPA. A conference
proceedings, including all test results, will be published.

Modest amounts of financial support will be made available to
selected participants in an effort to maximize the number of
participants and to attract the widest possible variety of
technical approaches and system architectures. This funding is
intended only as a supplement to other support. Both U.S. and
non-U.S. participants are eligible for this funding.

SCHEDULE:
     3 January 1993    Deadline for applications that include funding
                       requests (PAST)
     15 January 1993   Final application deadline (no funding requests)
     1 February 1993   Notification of acceptance and funding
     1 March 1993      Release of system development corpus and
                       evaluation software
     24-28 May 1993    Performance evaluation (dry run) on test corpus
     26-30 July 1993   Performance evaluation (formal run) on new test
                       corpus
     25-27 August 1993 Fifth Message Understanding Conference

DATA AND TASK DESCRIPTION: Subject to successful completion of
negotiations to obtain proper permissions concerning the data,
the data and task to be used for MUC-5 will be the same as those
already in use for the data extraction portion of the DARPA/SISTO
TIPSTER Text program. There are two languages, English and
Japanese, and two domains, joint ventures and microelectronic
chip fabrication. These form four separate corpora. The texts are
newswire articles selected to produce the desired mix of relevant
and nonrelevant texts, and they were blindly divided into pools
of development (training) and test data.

The task is to extract information about the nature and status of
activities in the domain, the entities involved, etc. Analysts
have been doing software-assisted manual generation of the "key"
templates against which the system-generated templates will be
evaluated. The template design is object oriented, and each slot
in the template has its own fill specifications for data type,
valency, etc. The fill specifications in each domain vary
slightly between English and Japanese, reflecting differences in
language usage; however, the general design of the template is
the same for both languages.
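
As a rough illustration of what a per-slot fill specification
could look like, here is a minimal Python sketch. The class, the
field names, the slot name PARTNER_ENTITY and its limits are all
hypothetical; the actual MUC-5 template definitions are not given
in this announcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotSpec:
    """Fill specification for one template slot (hypothetical fields)."""
    name: str
    fill_type: type   # data type every fill must have
    min_fills: int    # valency: fewest fills allowed
    max_fills: int    # valency: most fills allowed


def check_fills(spec, fills):
    """Return a list of problems found in the fills for one slot."""
    problems = []
    if not spec.min_fills <= len(fills) <= spec.max_fills:
        problems.append(
            f"{spec.name}: expected {spec.min_fills}-{spec.max_fills} "
            f"fills, got {len(fills)}"
        )
    for fill in fills:
        if not isinstance(fill, spec.fill_type):
            problems.append(
                f"{spec.name}: {fill!r} is not {spec.fill_type.__name__}"
            )
    return problems


# Hypothetical slot, invented for this example only.
partners = SlotSpec("PARTNER_ENTITY", str, 2, 10)
print(check_fills(partners, ["Company A", "Company B"]))
print(check_fills(partners, ["Company A"]))
```

Keeping the specification separate from the fills means one
checker can serve every slot, which is one plausible reading of
"each slot has its own fill specifications."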

An English and a Japanese sample text and corresponding template
in the joint ventures domain are available from the program chair
(address at end of this announcement). Please specify which
language(s) you are interested in. A microelectronics example may
be available shortly. The total amount of data that will be
available in March to support system development is expected to
be between 200 and 1,000 templates and corresponding texts. This
number will vary according to the corpus and the data rights that
are obtained. To receive the data, participants will be required
to acknowledge its copyright status by signing agreements to
safeguard the data and to use it for research purposes only.

TEST PROTOCOL AND EVALUATION CRITERIA: MUC-5 participants may
elect to do either language or both languages; they are limited
to selecting just one domain. Participants will have access to
TIPSTER Government-Furnished Information and shared resources
such as the training texts and templates, task documentation,
gazetteers, and evaluation software. TIPSTER data extraction
contractors will be participating in MUC-5, for which previously
unseen test data will be used.

Each test set will consist of 100-300 texts, depending on
language and domain. A dry-run test will be conducted about three
months after the release of the training data; the formal test
will be conducted about two and one-half months after the dry
run. Each test will be carried out by the participants at their
own sites in accordance with a prepared test procedure and the
results submitted to NRaD for official scoring by domain
analysts.

Systems will be evaluated using the criteria applied to the
TIPSTER Text data extraction systems. These criteria, which are
still under development, are likely to use the scoring categories
(correct, partially correct, incorrect, spurious, missing, and
noncommittal) to support not only the measures used for MUC-4
(recall, precision, overgeneration, fallout, and F-measure) but
also new measures (probability of detection, probability of false
alarm, and a measure that combines them). MUC-5 participants will
be able to familiarize themselves with the evaluation criteria
through usage of the evaluation software, which will be released
along with the training data.
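
For orientation, the sketch below computes recall, precision,
overgeneration, and F-measure from the scoring categories. It
assumes the commonly cited MUC-4 definitions, in which a
partially correct fill earns half credit; the MUC-5 criteria are
still under development and may differ. Fallout is left out
because it needs an additional count (the number of possible
incorrect fills), and noncommittal responses are not counted.
The function name and the example counts are made up.

```python
def muc_scores(correct, partial, incorrect, spurious, missing, beta=1.0):
    """Compute MUC-4 style measures from scoring-category counts."""
    possible = correct + partial + incorrect + missing  # fills in the key
    actual = correct + partial + incorrect + spurious   # fills generated
    credit = correct + 0.5 * partial                    # half credit (assumed)

    recall = credit / possible if possible else 0.0
    precision = credit / actual if actual else 0.0
    overgeneration = spurious / actual if actual else 0.0

    # F-measure: weighted combination of precision and recall.
    denom = beta ** 2 * precision + recall
    f_measure = (beta ** 2 + 1) * precision * recall / denom if denom else 0.0

    return {
        "recall": recall,
        "precision": precision,
        "overgeneration": overgeneration,
        "f_measure": f_measure,
    }


# Made-up counts for a single slot.
print(muc_scores(correct=6, partial=2, incorrect=1, spurious=1, missing=1))
```

With beta equal to 1, precision and recall are weighted equally;
a beta above 1 weights recall more heavily.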

INSTRUCTIONS FOR RESPONDING TO THE CALL FOR PARTICIPATION:
Organizations within and outside the U.S. are invited to respond
to this call for participation. Minimal requirements include
development before the dry-run test of a system that can accept
texts without manual preprocessing, process them without human
intervention, and output templates in the expected format.
Organizations should plan on allocating at least three
person-months of effort for participation in the evaluation and
conference; a substantially greater level of effort is likely to
be needed in order to achieve relatively high performance. It is
understood that organizations will vary with respect to
experience with information extraction, domain
expertise/engineering, resources, contractual
demands/expectations, etc. Recognition of such factors will be
made in any analyses of the results.

Organizations wishing to participate in the evaluation and
conference must respond by submitting a summary of their text
analysis approach and a system architecture description, not to
exceed five pages in total. The summary should include the
strengths of the approach and highlight its innovative aspects.
Acceptance or rejection of each application will be determined on
the basis of a technical assessment by the program committee. The
body of the application will serve as the basis for an article in
the conference proceedings. Participants will have the
opportunity to make revisions prior to publication.

The application must also include the following information:
     1. Domain (choose only one)
        a. Joint ventures
        b. Microelectronics
     2. Language (choose one or two)
        a. English
        b. Japanese
     3. An estimate of the degree of coverage and/or length of
        time under development of existing software to be applied
        to the MUC-5 task in the selected language(s) and domain.
     4. Primary point of contact for notification of
        acceptance/rejection of application. Please include name,
        surface and email addresses, and phone and fax numbers.

Those organizations wishing to request funding to supplement
their own resources must provide a second statement, not to
exceed two pages. This statement should include an estimate of
the amount of funding available from other sources to support
participation in this work and a specification of the amount of
funding desired and the minimal acceptable amount. In addition,
it should describe any software to be used for MUC-5 that the
organization is willing to deliver to NRaD and MUC participants
for possible redistribution. Please indicate clearly whether the
organization is interested in participating in MUC-5 even if no
funding is available. Evaluators of funding requests will not
include any MUC system developers.

THE DEADLINE FOR OTHER RESPONSES IS JANUARY 15, 1993. All
participants are expected to have Internet access and to be able
to do electronic file transfer via anonymous FTP. All responses
should be submitted to the program chair via email to
sundheim@nosc.mil. If Internet access is currently unavailable,
responses may be sent via surface mail to Beth Sundheim,
NCCOSC/NRaD, Code 444, San Diego, CA 92152-5000, and if a quick
reply to questions is needed, the program chair may be reached by
phone at 619/553-4145.

REFERENCE:
_Proceedings_of_the_Fourth_Message_Understanding_Conference_
(MUC-4)_, Morgan Kaufmann, June, 1992. To order, call
(800)745-7323 (toll free in North America) or (415)578-9928
(direct), send fax to (415)578-0672 or email to
morgan@unix.sri.com. Please refer to ISBN 1-55860-273-9.
**********************************************************
II. QUERIES
II.B.1.
Fr: Chaim Manaster <manaster@yu1.yu.edu>
Re: Encryption/Decryption on Top of Fulltext Retrieval Software

I need to locate people who have or are working on encryption
software that will work on top of text retrieval software of the
type usually found on CDROM databases. While I am not a technical
sophisticate, let me attempt to elaborate to clarify my needs.

As I understand things, typically the retrieval software will
first create an inverted index file of the textual database, and
then when the user inputs his search terms the retrieval engine
will quickly search the index, obtain the associated pointer to
the original text in the database and retrieve that text.
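
The index-then-retrieve flow described above can be sketched in a
few lines of Python. This is only an illustration: the function
names and sample texts are invented, and real retrieval packages
store their indexes on disk in their own formats.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in text.lower().split():
            # Strip surrounding punctuation so "short." matches "short".
            index[token.strip(".,;:!?\"'()")].add(doc_id)
    index.pop("", None)  # drop tokens that were punctuation only
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, documents, term):
    """Look the term up in the index, then fetch the matching texts."""
    return [documents[doc_id] for doc_id in index.get(term.lower(), [])]


docs = [
    "Naval messages are short.",
    "News stories are longer.",
    "Short stories vary.",
]
index = build_inverted_index(docs)
print(search(index, docs, "short"))
```

The search never scans the texts themselves; it reads only the
index entry and the documents that entry points to.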

When encryption is introduced into the picture on top of the
above retrieval software (without modifying the retrieval
software), it would seem that in order for the user to enter his
query terms in an unencoded fashion, the encryption software must
first encrypt his search terms, then enter the encrypted**
inverted index, locate the correct entry, decrypt it, locate the
associated encrypted text, decrypt it and present that to the end
user, ALL DONE ON THE FLY.

** The inverted index must be encrypted as well since it is
fairly easy to rebuild the original text solely from the
unencrypted inverted index files.

Thus it seems to me that any encryption method must be able to
use a decryption method that can start to decrypt at any random
point in the encrypted file (or a large number of points as an
approximation) to pick out some small portion of either the
encrypted index or text files without the need to decrypt the
entire file, which is usually huge, just to get at a search term
in the index or the retrieved textual paragraph from the large
database.
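
To make the random-access requirement concrete, here is a toy
Python sketch in which the keystream for each block depends only
on the key and the block number, so any byte range can be
decrypted without reading what precedes it. The construction
(SHA-256 of key plus block counter, XORed with the data) is an
assumption chosen for brevity, not a vetted cipher and not
something taken from an existing product; a real design would use
a reviewed algorithm and a per-file nonce.

```python
import hashlib

BLOCK = 32  # bytes of keystream produced per SHA-256 call


def keystream_block(key, block_index):
    """Keystream for one block, derived only from key and block number."""
    return hashlib.sha256(key + block_index.to_bytes(8, "big")).digest()


def xor_range(key, data, offset):
    """Encrypt or decrypt `data` located at byte `offset` in the file."""
    out = bytearray(len(data))
    cached_index, cached_block = -1, b""
    for i, byte in enumerate(data):
        pos = offset + i
        if pos // BLOCK != cached_index:
            # Entering a new block: derive its keystream on demand.
            cached_index = pos // BLOCK
            cached_block = keystream_block(key, cached_index)
        out[i] = byte ^ cached_block[pos % BLOCK]
    return bytes(out)


key = b"demo key, not a real secret"
plaintext = b"The quick brown fox jumps over the lazy dog. " * 4
ciphertext = xor_range(key, plaintext, 0)

# Decrypt only bytes 50..69, without touching the rest of the file.
fragment = xor_range(key, ciphertext[50:70], 50)
print(fragment == plaintext[50:70])
```

Because XOR is its own inverse, the same function serves for both
encryption and decryption; only the starting offset must be known.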

Would such an encryption scheme, of necessity, merely be some
form of substitution cypher and therefore not worth writing (too
easy to break)? What kind of encryption schemes would be worth
considering and is there any software out there (shareware,
public domain or commercial) that will do the trick, or is anyone
working on a similar project at the moment?

One of my main concerns is that the encryption be transparent to
the retrieval engine (or at the very least require minor
modification if impossible otherwise).

Please Email responses; I am not sure if I can access relevant
newsgroups with my site's newsfeed. If I posted this to groups of
marginal relevance, forgive me, but please at the very least
suggest the appropriate newsgroups I should post to that don't
require special access.

P.S. I cross-posted this to sci.crypt, comp.compression, and
comp.compression.research.

Thank you in advance for the help.

Henry Manaster

        Henry Manaster          *     EMail: manaster@yu1.yu.edu
        Brooklyn, NY            *
                                *
        Disclaimer: The above is not necessarily MY opinion nor that
                                of anyone else :-)  ????!
**********
II.B.2.
Fr: Ed Haupt <haupt@pilot.njin.net>
Re: Request for Information in Archives

I am seeking addresses, preferably e-mail, for the archive that
holds the following records.

Records for G.E. Mueller, who was Ordinarius for Philosophie for
1 year (1880-1881) at Tchernowsky (Cernauti, Chernovtsy;
Austro-Hungarian name: Czernowitz), now part of the Ukraine but
at that time part of Austria-Hungary, in particular the
independent area of Bukovina.  Where are the archives? Are they
in Wien (because the central government took them back in 1918?),
or Budapest (because Bukovina was then considered part of
Hungary?), or Bucharest (because Cernauti was part of Romania
from 1918 to 1944?), or Kiev (because after 1944, it was part of
the Ukraine?), or Moscow (because everything was moved to
Moscow), or just simply in Chernovtsy?

Please reply to
haupt@pilot.njin.net
since I am not a member of this group.

Thanks in advance...

Edward J. Haupt
snail:    Department of Psychology
          Montclair State
          1 Normal Ave.
          Upper Montclair, NJ 07043-1624 USA
voice:    1(201) 893-4327
internet: haupt@pilot.njin.net
bitnet:   haupt@njin
fax:      1(201) 893-5455
d 1/16
**********************************************************
III. JOB ANNOUNCEMENTS
III.1.
Fr: Ellen Voorhees <ellen@sol.siemens.com>
Re: Job announcement

Siemens Corporate Research in Princeton, New Jersey is looking to
hire an additional researcher for its information retrieval
project in the Learning Systems Department. The position requires
either a PhD in computer science (information retrieval,
knowledge representation, etc.), computational linguistics, or a
similar field (preferred) or a master's degree with some
experience in a related field. The main responsibility of the
successful candidate will be to conduct research in automatic
information retrieval and (statistical) natural language
processing. Tasks include setting up and running experiments,
programming, etc.

People interested in the position should send a PLAIN ASCII
resume to ellen@learning.siemens.com or a hardcopy of the resume
to:
        Human Services
        Department EV
        Siemens Corporate Research, Inc.
        755 College Road East
        Princeton, NJ 08540
Siemens is an equal opportunity employer.
**********************************************************
IRLIST Digest is distributed from the University of California,
Division of Library Automation, 300 Lakeside Drive, Oakland, CA
94612-3550.

Send subscription requests to: LISTSERV@UCCVMA.BITNET
Send submissions to IRLIST to: IR-L@UCCVMA.BITNET

Editorial Staff:
 Clifford Lynch calur@uccmvsa.ucop.edu or calur@uccmvsa.bitnet
 Nancy Gusack ncgur@uccmvsa.bitnet
 Mary Engle meeur@uccmvsa.bitnet

The IRLIST Archives will be set up for anonymous FTP, and the
address will be announced in future issues. To access back issues
presently, send the message INDEX IR-L to LISTSERV@UCCVMA.BITNET.
To get a specific issue listed in the Index, send the message GET
IR-L LOGYYMM, where YY is the year and MM is the numeric month in
which the issue was mailed, to LISTSERV@UCCVMA (Bitnet) or
LISTSERV@UCCVMA.UCOP.EDU. You will receive the issues for the
entire month you have requested.
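
For anyone scripting such requests, the LOGYYMM naming can be
generated as in the Python sketch below. The helper name is
hypothetical, and the sketch only builds the message text; it
does not send mail.

```python
def listserv_get_command(year, month):
    """Build the GET command for the IR-L log of a given year and month."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # YY is the two-digit year, MM the zero-padded numeric month.
    return f"GET IR-L LOG{year % 100:02d}{month:02d}"


print(listserv_get_command(1993, 1))  # GET IR-L LOG9301
```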

These files are not to be sold or used for commercial purposes.
Contact Nancy Gusack or Mary Engle for more information on
IRLIST.  THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE
OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME
FULL RESPONSIBILITY FOR THE CONTENTS OF THEIR SUBMISSIONS TO
IRLIST.