Information Retrieval List Digest 004 (December 1989)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-004

IRLIST Digest   December 1989   Volume VI   Number 4   Issue 4

***************
This is the last issue to be distributed in 1989. After the Christmas holidays are over, IRList Digest will resume distribution. At that time five more issues will complete the backlog of information received during the Digest's hiatus. Information received since we assumed monitorship of the Digest will either be incorporated into those remaining issues (if timeliness is crucial) or will be distributed in issues following the backlog. We hope to be completely up to date by February. Thanks for your patience.
***************

***************************************************************
Continued from Volume VI Number 3, Issue 3
***************************************************************

IV. PROJECTS: Initiatives and proposals / Bibliographies / Abstracts / Miscellaneous
  A.1. EXPRES note (more)
  A.6. CLARIT Project
  B.1. References on Matching of KR Structures

***************************************************************

IV. PROJECTS

IV.A.1. Fr: Edward A. Fox
Re: EXPRES project (see VI:3, 3)

Hi! The note in #3 about EXPRES should be updated. I have heard that there has been a change in policy about EXPRES and suggest you quickly update the announcement with a follow-on based on current official information from NSF. Thanks, Ed

**********

IV.A.6. Fr: Thomas M. Kuhn
Re: Description of the CLARIT project

The following is a description, based on our original proposal to DEC, of the CLARIT project's philosophy and major goals. Please let me know if you have any questions, corrections, etc.
Thanks,
Thomas Kuhn
Administrative Assistant, CLARIT Project
kuhn@horatio.lcl.cmu.edu (inet address 128.2.229.27)

Computational-Linguistic Approaches to Retrieval and Indexing of Text: The CLARIT Project

A Proposal to the Digital Equipment Corporation
From the Laboratory for Computational Linguistics, Carnegie Mellon University

Principal Investigators:
========================
David A. Evans, Associate Professor, Linguistics and Computer Science
Dana S. Scott, University Professor, Computer Science, Mathematical Logic, and Philosophy

August 1988

EXECUTIVE SUMMARY

The management of information in computer-held textual databases -- in science and industry and in education and business -- continues to be a fast-growing problem. New, faster machines and new means of delivering information have only exacerbated the situation. The difficulties of searching and organizing information can no longer be solved using traditional (e.g., keyword-based) indexing and retrieval technologies. Effective techniques, we assert, will increasingly depend on the ability to identify concepts as they occur in natural-language expressions, and future technologies will be based on computational-linguistic approaches to the analysis of free text. While very large-scale, full-text natural-language processing remains an elusive and remote possibility, we are convinced that a variety of existing computational-linguistic techniques can be applied effectively at this time to information-management tasks to improve present technology both powerfully and simply. We therefore propose to undertake a one-year pilot project to explore along these lines the augmenting of current information-management techniques to enhance both information retrieval and indexing.
In particular, we propose to develop lexical resources (i.e., computational lexicons and thesauri) and natural-language processing software to control for morphological variation in the forms of expressions; to map target `words' to canonical and related concepts to increase semantic coverage; and to consider restricted natural-language permutations in the syntax of expressions. This proposal continues past work on several projects, with the aims both of improving and consolidating resources and software and of developing new applications in conjunction with other research groups at DEC and OCLC under the general guidance of the MERCURY Project.

The specific twelve-month objectives of the CLARIT Project include:

1) A review of current indexing and retrieval technologies, to identify candidate systems susceptible to enhancement with computational-linguistic techniques (15% of our total effort).

2) The development of a prototype information workbench to serve as an interface to textual databases, in which the facilities associated with control of morphological, semantic, and syntactic variation could be tested (70% of our total effort).

3) The design of a longer-term project aimed at the comprehensive treatment of one or more specific subject domains, using computational-linguistic information processing (15% of our total effort).

MANAGING TEXTUAL INFORMATION

The ability to manage problems and to develop innovative solutions is principally a function of the ability to manage information, which, in turn, is limited by the ability to organize and access all data relevant to a (typically) precise topic. This information is often in the form of written records, reports, and research literature -- and we assume in this proposal that such information is already in computer-readable form.
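The morphological control proposed above -- mapping surface word forms back to canonical lexicon entries -- can be illustrated with a minimal sketch. This is not the CLARIT MORPH program itself; the suffix rules and the tiny lexicon below are invented purely for illustration:

```python
# Minimal illustration of morphological normalization: reduce
# inflectional variants to a canonical lexicon form, so that
# "indexes", "indexing", and "indexed" all match "index".
# The rules and the lexicon are invented for this sketch.

LEXICON = {"index", "retrieve", "retrieval", "query", "document"}

# Ordered (suffix, replacement) rules; the first rule that yields
# a known lexicon entry wins.
SUFFIX_RULES = [
    ("ies", "y"),
    ("ing", ""),
    ("ing", "e"),
    ("ed", ""),
    ("ed", "e"),
    ("es", ""),
    ("s", ""),
]

def normalize(word: str) -> str:
    """Map a surface word form to its canonical lexicon entry."""
    w = word.lower()
    if w in LEXICON:
        return w
    for suffix, repl in SUFFIX_RULES:
        if w.endswith(suffix):
            candidate = w[: len(w) - len(suffix)] + repl
            if candidate in LEXICON:
                return candidate
    return w  # unknown form: leave as-is

if __name__ == "__main__":
    for form in ["Indexes", "indexing", "retrieved", "queries"]:
        print(form, "->", normalize(form))
```

A realistic system would of course need exception lists, derivational morphology, and phrasal entries, as the proposal's later description of MORPH makes clear; this sketch shows only the inflectional case.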
Increasingly in practice, this management task translates into the problem of indexing and retrieving selected items; however, while our ability to store documents in electronic form has expanded tremendously with the advent of media such as CD-ROM and other optical-storage devices -- and while access to remote databases has become commonplace -- our ability to identify and retrieve the information we require has not improved commensurately.

One reason the automation of information management has been frustrated by the limitations of traditional indexing and retrieval technology is that -- in its most primitive realization -- this technology is based on `keyword' searches over the overt character strings that comprise the bulk of textual documents. More sophisticated approaches include the ability to search on Boolean combinations and regular-expression sets of keywords. (Cf. Salton & McGill, 1983, for a discussion of such approaches.) Speed is often increased by utilizing inverted files as indexes of occurrences of words and terms.

The theory underlying such technology is that the information content of documents is contained in the `words' of the document, and that for information indexing and retrieval it is sufficient to identify a subset of words that both typically occur in documents associated with specific domains of information and also capture the sense of users' questions. The best current systems go so far as to establish statistical measures of the appropriateness of keywords in specific domains and to augment actual searches with keyword variants (e.g., `synonyms') that have a high expected co-occurrence or co-variation with the keywords that a user offers in the statement of a particular topic.
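The traditional technology described above -- keyword search over an inverted file, with Boolean combination of terms -- can be sketched as follows; the toy corpus and the function names are invented for illustration:

```python
# Sketch of a traditional inverted file: a map from each word to the
# set of document identifiers in which it occurs, supporting Boolean
# AND/OR queries over keywords. The toy corpus is invented.
from collections import defaultdict

docs = {
    1: "stomach pain after eating",
    2: "postprandial abdominal discomfort",
    3: "treatment of stomach ulcers",
}

# Build the inverted index: word -> set of document ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search_and(*terms):
    """Documents containing every term (Boolean AND)."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_or(*terms):
    """Documents containing any term (Boolean OR)."""
    return set().union(*(index.get(t, set()) for t in terms))

print(search_and("stomach", "pain"))      # -> {1}
print(search_or("stomach", "abdominal"))  # -> {1, 2, 3}
```

Note that the sketch also exhibits exactly the limitation the proposal goes on to identify: a query on "stomach" AND "pain" cannot retrieve document 2, even though it expresses the same concepts in different words.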
However, such approaches are limited in principle because they are wedded to the use of `words' instead of `concepts' and because they are unable to control for the linguistic variations -- in morphology, in semantics, and in syntax -- that complicate the surface forms found in natural-language texts. In short, our claim is that natural-language texts are not just collections of character strings with probabilistic associations to topics; rather, they are the precisely coded realizations of concepts; and information processing that ignores the properties of natural-language code and the structure of concepts in a domain will have only haphazard success -- inversely proportional to the size of databases.

In our past research on the development of tools and techniques for the improvement of natural-language information management in the biomedical domain in the MEDSORT-I, MEDSORT-II, and UMLS Projects (for reports on these projects, see Carbonell, Evans, Scott & Thomason, 1985; Evans, 1987; Evans & Miller, 1987), we have argued that the only sound method for identifying the information content of textual records depends on using natural language as an index to concepts (cf. Evans & Scott, 1987, for discussion of `concepts' in this context), supported by standardized representations of concepts in a domain. (Cf. Carbonell, Evans, Scott & Thomason, 1986; Evans, 1988, for concrete proposals on the structuring of domain-specific concepts for purposes of natural-language processing.) And we have adumbrated frequently-occurring features of natural-language expressions that play havoc with string-based, `keyword' approaches to text processing, including the following:

1) Different Words -- Same Meaning: `Word'-based processing fails to capture the variation in expression associated with synonymy, general/technical usage distinctions, and domain-specific idiom. Example: "stomach pain after eating" versus "postprandial abdominal discomfort".
2) Same Words -- Different Meaning: `Word'-based processing cannot readily accommodate the differences signaled by natural-language syntax. Example: "Venetian blind" versus "blind Venetian" (we thank Michael Lesk for this fine example); "juvenile victims of crime" versus "victims of juvenile crime".

3) Pragmatic Perspective -- Circumlocution: `Word'-based processing does not control for differences in perspective that different users may bring to an information-searching task. Example: lawyers on opposite sides of a damage case may regard an event -- and refer to it consistently -- from orthogonal points of view: "the accident" versus "the unfortunate incident". (Cf. Blair & Maron, 1985, for a discussion of the difficulties created by just such effects.)

4) Domain Specificity: `Word'-based processing cannot handle the sense restrictions typical of language in domain-specific usage. Example: "floating" has different senses in banking and in swimming; "sharp" is principally a `pain-sensation' modifier in clinical medicine, secondarily a measure of mental acuity, and unrelated to cutting-tool quality.

The ultimate solution to the problem of indexing and retrieving information from textual databases is to process natural language for the concepts that `lie behind' the surface expressions we encounter in free text. Such rendering of expressions into representations of concepts is the common goal of virtually all approaches to natural-language processing (NLP). But full-scale NLP depends on resources (such as extensive knowledge bases) and algorithms (such as methods for resolving reference across clauses) that are not available in current technology. In part because of the general recalcitrance of free-text NLP, approaches to traditional text processing have eschewed computational-linguistic techniques in developing indexing and search strategies.
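The `Different Words -- Same Meaning' problem above is what concept-based indexing addresses: surface phrases are mapped through a thesaurus to canonical concept identifiers before indexing, so that synonymous expressions match. A minimal sketch follows; the concept identifiers and thesaurus entries are invented here, and the CLARIT lexicons and thesauri are of course far larger:

```python
# Sketch of concept-based normalization: surface phrases are mapped
# through a small thesaurus to canonical concept identifiers, so that
# synonymous expressions index and retrieve the same documents.
# The concept IDs and entries below are invented for this sketch.

THESAURUS = {
    "stomach pain": "C1-ABDOMINAL-PAIN",
    "abdominal discomfort": "C1-ABDOMINAL-PAIN",
    "after eating": "C2-POSTPRANDIAL",
    "postprandial": "C2-POSTPRANDIAL",
}

def concepts(text: str) -> set:
    """Return the set of canonical concepts expressed in a phrase."""
    t = text.lower()
    return {cid for phrase, cid in THESAURUS.items() if phrase in t}

d1 = concepts("stomach pain after eating")
d2 = concepts("postprandial abdominal discomfort")
print(d1 == d2)  # -> True: both phrases map to the same two concepts
```

A real system would need the morphological and syntactic control discussed elsewhere in the proposal (naive substring matching would confuse "blind Venetian" with "Venetian blind"); the sketch isolates only the word-to-concept mapping step.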
We believe, however, that much of what is difficult in free-text NLP is also irrelevant to the identification of concepts in natural-language expressions; and that we can overcome the apparent NLP impasse with a selective application of computational-linguistic intelligence -- or, more precisely, a dose of good computational-linguistic common sense. (A primitive illustration of the efficacy of even minor linguistic enhancement to conventional indexing strategies is offered in Vries, Shoval, Evans, et al., 1986.)

The middle ground of information is semantic structure -- `words' are imprecise; intentions and intelligent inference, too difficult to capture. Semantic structure can be identified through a combination of relatively simple processes involving (1) control of lexical form (morphology), (2) screening for relevant modification relations (syntax), and (3) reference to standardized representations (knowledge representation). It is essentially such semantic structure that users attempt to capture when coining `keyword' phrases; and it is essentially such structure -- in the conceptual clustering found in domain-specific phrases and in `topic' clauses -- that carries the information content in texts.

In sum, we believe it is possible to base information-management technology (e.g., indexing and retrieval) on natural-language semantic structure, and that the required techniques will involve combinations of computational-linguistic processing and more traditional database management.

REFERENCES

Blair, D.C. & Maron, M.E. (1985). An evaluation of retrieval effectiveness for a full-text document-retrieval system. Communications of the ACM, 28, 289-299.

Carbonell, J.G., Evans, D.A., Scott, D.S. & Thomason, R.H. (1985). Final Report on the Automated Classification and Retrieval Project (MedSORT-I). Technical Report No. CMU-LCL-85-1, Laboratory for Computational Linguistics, Carnegie Mellon University.

Carbonell, J.G., Evans, D.A., Scott, D.S. & Thomason, R.H. (1986).
On the design of biomedical knowledge bases. In R. Salamon, B. Blum & M. Jorgensen (eds.), Medinfo 86. Amsterdam, The Netherlands: Elsevier Science Publishers B.V. (North Holland), 37-41.

Evans, D.A. (1987). Final Report on the MedSORT-II Project: Developing and Managing Medical Thesauri. Technical Report No. CMU-LCL-87-3, Laboratory for Computational Linguistics, Carnegie Mellon University.

Evans, D.A. (1988). Pragmatically-structured, lexical-semantic knowledge bases for unified medical language systems. Proceedings of the Twelfth Annual Symposium on Computer Applications in Medical Care (SCAMC), Washington, DC: IEEE Computer Society, 1988.

Evans, D.A. & Miller, R.A. (1987). Final Task Report (Task 2) -- Unified Medical Language System (UMLS) Project: Initial Phase in Developing Representations for Mapping Medical Knowledge: INTERNIST-I/QMR, HELP, and MeSH. Technical Report No. CMU-LCL-87-1, Laboratory for Computational Linguistics, Carnegie Mellon University.

Evans, D.A. & Scott, D.S. (1987). Concepts as procedures. In F. Marshall, A. Miller & Z. Zhang (eds.), ESCOL '86, Proceedings of the Third Eastern States Conference on Linguistics (Pittsburgh, Pennsylvania, October 10-12, 1986), Department of Linguistics, The Ohio State University, Columbus, OH, 533-543.

Salton, G. & McGill, M.J. (1983). Introduction to Modern Information Retrieval. McGraw-Hill Book Company.

Vries, J.K., Shoval, P., Evans, D.A., Moossy, J., Banks, G. & Latchaw, R. (1986). An expert system for indexing and retrieving medical information. Working Paper, Decision Systems Laboratory, University of Pittsburgh, Pittsburgh, PA.

COMPUTATIONAL-LINGUISTIC ENHANCEMENTS TO INDEXING AND RETRIEVAL OF TEXT: A PILOT PROJECT

Project Timing: To begin Sept. 1, 1988; end August 31, 1989.

Goal: Research designed to achieve improvements in the quality of information indexing and retrieval.
Method: Computational-linguistic (CL) techniques to facilitate the identification of semantic structures under morphological and syntactic variation; exploitation of domain-specific semantic networks.

Philosophy: Emphasis on `simple' solutions; straightforward enhancements of current approaches.

Specific Phase-1 Objectives (coordinated with DEC):

1) Survey existing technologies in information management -- to identify candidate systems that would benefit from CL enhancements.

2) Develop a prototype `information workbench' that would serve as a (partial) natural-language interface to textual databases. This activity would include:

   + Experimentation with a set of `simple' but sophisticated CL technologies to improve indexing and retrieval incrementally, in a designated domain, using a moderately large corpus; possibly combining CL and more traditional approaches to indexing/retrieval.

   + Exploring the possibility of providing CL resources for a `multiply-expert' workstation for information processing.

   + Identification in a large general corpus (to be chosen in consultation with DEC) of the ``standard and stable structures'' of language, to be used in creating a basic CL knowledge base (lexicon/thesaurus).

   + Development of specific, general-language computational-linguistic resources (e.g., lexicons, thesauri, parsers) to supplement existing resources in the LCL (e.g., MORPH, the MEDSORT-II/UMLS Lexicon/Thesaurus).

3) Define a longer-term project and application, possibly in collaboration with groups other than DEC and CMU.

RESEARCH ENVIRONMENT AND PROJECT PERSONNEL

Carnegie Mellon University

Carnegie Mellon University (CMU) is located in Pittsburgh, Pennsylvania. It currently has 4,320 undergraduate and 2,218 graduate students, and 503 full-time and 40 adjunct faculty, in seven college-level units, including the College of Humanities and Social Sciences and the Department of Computer Science, both represented in this proposal.
The university is internationally recognized as a leader in Computer Science and applied computational technologies. It is strongly committed to the development of resources for undergraduate and graduate education, faculty research, and general academic computing. Among CMU's initiatives in this area are a campus-wide communications network, the Andrew system, and the initial steps of the MERCURY Project for the creation of an electronic library system.

The Laboratory for Computational Linguistics

The CMU Program in Computational Linguistics and Laboratory for Computational Linguistics (LCL) are unique nationally, and the faculty associated with the project are experts in Cognitive Science, Computer Science, and Linguistics. The Laboratory for Computational Linguistics (LCL) is directed by David A. Evans and is based in the Philosophy Department at CMU, which administers the Joint Graduate Program in Computational Linguistics. (The Joint Program is a collaborative academic program between CMU and the University of Pittsburgh.) The LCL is the best-equipped academic research facility for Computational Linguistics in the world, having both a high-capacity super-mini central computer and twenty high-powered artificial-intelligence workstations, along with many other kinds of support equipment. The LCL houses several large research projects and is committed to the development of applications for CL techniques; and it will make its resources freely available to the CLARIT Project. In particular, the CLARIT Project will utilize, expand, and refine the following resources:

1) MORPH -- a natural-language morphology program which, in conjunction with a lexicon, can identify and analyze word-form variants (e.g., as resulting from inflectional morphology), derived forms (e.g., as resulting from compounding, prefixing, suffixing, and other combinations of morphemes), phrasal lexical expressions (e.g., idioms, domain-specific phrases, etc.), and selected neologisms.
2) The COMP-LEX Lexicon -- a computational lexicon currently consisting of a basic-English core vocabulary (approximately 8,000 terms), soon to be augmented with the domain-specific lexicons developed under the MEDSORT-II and UMLS projects (representing an additional, approximately 25,000 terms in clinical and general medicine).

3) NEWCAT -- a natural-language parser (developed by Roland Hausser) with special facilities for processing free text.

4) The MEDSORT-II/UMLS Knowledge Base -- an extensive knowledge base, especially for `findings' in clinical medicine, but also offering thesaurus-like classification of basic concepts and providing a design for selective natural-language processing of general-language texts.

5) On-Line Dictionaries -- Webster's Seventh (and other dictionaries).

**********

IV.B.1. Fr: LEWIS@cs.umass.EDU
Re: For IRLIST: References on matching of KR structures

This message contains a list of references I received in response to my request to AILIST and NL-KR in November 1988 for information on finding matches between knowledge representation structures. It includes all references received by Roland Zito-Wolf, and previously posted, in response to a similar query in 1987. Roland was particularly interested in approximate matching algorithms, while I was particularly interested in matching when some transformation of knowledge structures had to be done, so you know our biases. The papers are broken down into broad classes -- each paper is present in only one class, though many are appropriate to several of the classes. Also, I have not actually seen a number of the papers below, so some may have been placed in inappropriate classes. Finally, absolutely no claims of thoroughness are made for the following listing. Some of the listed areas have huge literatures and are represented here by only one or two references.
The following people submitted references to Roland or me, or otherwise aided in the creation of this list: Len Friedman, Philippe Dugerdil, Bob MacGregor, Peter Szolovits, Len Moskowitz, Terry Winograd, Austin Tate, Roy Rada, Mike Shafto, William J. Rapaport, Mike Tanner, Lisa Rau, Robert Levinson, akbari@CS.COLUMBIA.EDU, INDUGERD%CNEDCU51.BITNET@wiscvm.wisc.edu, greene@m.cs.uiuc.edu, walt%cs.hw.ac.uk@cs.ucl.ac.uk, Dave Barrington, Karen Sparck-Jones, Dave Stallard, Nehru Bhandaru, Penni Sibun, Richard Korf, Arny Rosenberg, Robert Krovetz, John Brolio, Philip Johnson, Dan Suthers, Adele Howe, Ed Hovy, Thad Polk, Lee Spector, Dave Forster, rpg@cs.brown.EDU, Brian Quinn, Ingemar Hulthage, Carole Hafner, Gary Berg-Cross, Graeme Hirst.

Many thanks to them, and more thanks and apologies to anyone accidentally omitted.

What I've found most interesting in looking into this subject is the difficulty of characterizing just what it means when one finds a match between parts of, say, two semantic network structures. If one takes the knowledge representation to be an alternate notation for a set of clauses in some logical language, then it would seem that matching is identifying subtheories which two larger theories have in common. For my particular application, text retrieval, this would seem to mean that a user query specifies a set of clauses from which the information the user is interested in can be inferred, and the system is obliged to retrieve both documents that contain the same set of clauses and documents that contain other sets of clauses that allow the same or similar conclusions to be drawn. This leads to a very different view of matching than thinking of semantic networks as colored graphs does. I'd be interested in hearing of references (here we go again!) related to this interpretation of matching.

Best, Dave

David D. Lewis                         ph. 413-545-0728
Computer and Information Science (COINS) Dept.
University of Massachusetts, Amherst
Amherst, MA 01003 USA
BITNET: lewis@umass
CS/INTERnet: lewis@cs.umass.edu
UUCP: ...!uunet!cs.umass.edu!lewis@uunet.uu.net

REFERENCES RELATED TO MATCHING OF KR STRUCTURES

(As can be seen, the references are in a variety of formats, as provided by the senders. Some effort has been made to make abbreviations consistent.)

1. String matching. Matching of DNA sequences.

Wagner and Fischer, "The String-to-String Correction Problem," JACM, Jan 1974.
Lowrance and Wagner, "An Extension to ...," JACM, April 1975.
Various references to quick string-search algorithms, such as Boyer-Moore.
Hall & Dowling, "Approximate String Matching," Computing Surveys, Dec 1980.

2. Hamming distance, distance in metric spaces, nearest-neighbor algorithms, bit pattern similarity measures, classical information retrieval, etc.

Kanerva, Pentti, Self-Propagating Search, CSLI Report 84-7 (now a book).
Geoffrey Hinton, Distributed Representations, CMU-CS-84-157 (also in PDP?).
Lots of work on information retrieval (Salton, Croft, etc.).
Connection-Machine implementation described in Stanfill and Kahle, "Parallel Free-Text Search...," Comm. ACM, Dec 1986.

3. Matching and retrieval algorithms for unlabeled trees and graphs

Fowler, Haralick, et al., "Efficient Graph Automorphism by Vertex Partitioning," AIJ 21 (1983), 245-269.
McGregor 1982, "Backtrack Search Algorithms and the Maximal Common Subgraph Problem," Software--Practice and Experience, v. 12, pp. 23-34, 1982.
Arnborg, et al., "Complexity of Finding Embeddings in a k-tree," SIAM J. Alg. Disc. Methods, v. 8, no. 2, 1987.
Preparata and Shamos, Computational Geometry.
Garey & Johnson, Computers & Intractability, p. 202.
Articles by K.S. Fu et al. (especially Eshera).

4. Misc. matching and retrieval of labeled trees and graphs. (The more explicitly semantic-network-oriented work is in category 9.)

Hayes-Roth & Mostow, "An Automatically Compilable Recognition Network for Structured Patterns," IJCAI-75.
K-D trees and other divide-and-conquer methods for speeding up searches, e.g., Omohundro, Efficient Algorithms with Neural Network Behavior, U. Ill. Report UIUCDCS-R-87-1131.
Finding patterns in networks, e.g., for simplifying constraint networks: Gosling, Algebraic Constraints, CMU CS-83(?)-132.
Robert Levinson, "A Self-Organizing Retrieval System for Graphs," in Proc. AAAI-84. Also his thesis of the same title, available for free from the AI Lab at the University of Texas at Austin as tech report AI-85-05.
Spencer, Weighted Matching Algorithms, Stanford CS-87-1162.
The RETE algorithm for speedily finding productions whose conditions are satisfied -- see any good algorithms text, or Forgy, "RETE: A Fast Algorithm...," AI, vol. 19, 1982.

5. Transformation of knowledge representation structures, or the need for transformation of knowledge representation structures. Issues of knowledge representation in NL DB interfaces, NL generation systems.

Hajicova and Hnatkova, "Inferencing on Linguistically Based Semantic Structures," COLING-84.
Hobbs, Stickel, et al., "Interpretation as Abduction," ACL-88.
Hobbs and Martin, "Local Pragmatics," IJCAI-87.
Rau, "Spontaneous Retrieval in a Conceptual Information System," IJCAI-87.
Wilks, "Understanding Without Proofs," IJCAI-73.
Sparck Jones, K., "Shifting Meaning Representations," IJCAI-83.
Moore, "Natural-Language Access to Databases -- Theoretical/Technical Issues," ACL-82.
Petrick, "Theoretical/Technical Issues in Natural Language Access to Databases," ACL-82.
Nagao and Tsujii, "Mechanism of Deduction in a Question Answering System with Natural Language Input," IJCAI-73.
Scha, "English Words and Data Bases: How to Bridge the Gap," ACL-82.
Sangster, "On the Automatic Transformation of Class Membership Criteria," ACL-79.
Stallard, "A Terminological Transformation for Natural Language Question-Answering Systems," ACL-86.
Tomabechi, "Direct Memory Access Translation," IJCAI-87.
Euzenat, Normier, Ognowski, Zarri,
"SAPHIR+RESEDA, A New Approach to Intelligent Data Base Access". IJCAI-85. Hovy, "Interpretation in Generation" AAAI-87. 6. Graph matching in vision Spector, L.; Hendler, J.; Canning, J. Rosenfeld, A. "Symbolic Model/Image Matching in Expert Vision Systems". CS-TR-370, Computer Vision Laboratory, Center for Automation Research, University of Maryland, College Park, MD, July 1988. 7. Matching of syntactic structures @book{SALTON68 , author = "Gerald Salton" , title = "Automatic Information Organization and Retrieval" , publisher = "McGraw-Hill Book Company" , year = 1968 , address = "New York" } Sumita, E. and Tsutsumi, Y. "A Translation Aid System Using Flexible Text Retrieval Based on Syntax-Matching." TR87-1019, IBM Tokyo Research Laboratory, 5-19 Sanbancho, Chiyoda-ku, Tokyo 102. May, 1988. 8. Constraint Satisfaction (DDL: I include some samples from the constraint satisfaction literature, because graph matching problems are often formulated in terms of constraint satisfaction.) Nadel, B. A. "The General Consistent Labeling (Or Constraint Satisfaction) Problem" DCS-TR-170. Department of Computer Science, Rutgers University, New Brunswick, NJ, 08903. 1986. Shapiro & Haralick "Structural Descriptions and Inexact Matching" IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-3, no. 5, September, 1981. Haralick & Elliott "Increasing Tree Search Efficiency for Constraint Satisfaction Problems" AIJ 14 (1980), 263-313. McGregor 1979 "Relational Consistency Algorithms and Their Application in Finding Subgraph and Graph Isomorphisms." Information Sciences 19, 229-250 (1979). Dechter & Pearl, " Network-Based Heuristics for Constraint-Satisfaction Problems" AIJ 34 (1987), 1-38. Mackworth 1977 "Consistnecy in Networks of Relations" AIJ 8 (1977), 99-118. 9. Semantic networks and frame-based knowledge representations, especially knowledge base classifiers, KR structure matching algorithms, and frame retrieval algorithms. 
Hayes-Roth, "The Role of Partial and Best Matches in Knowledge Systems" in Pattern-Directed Inference Systems. D.A. Waterman and Frederick Hayes-Roth, eds. Academic Press, 1978. Patel-Schneider, Brachman, Levesque 1984 "ARGON: Knowledge Representation meets Information Retireval" In First IEEE Conference on AI Applications. KRL (Bobrow and WInograd, in Cog Sci 1980) Brachman R.J. and Schmolze J.G. An overview of the KL-ONE Knowledge Representation System. Cognitive Science vol.9 pp. 171-216, 1985 Schmolze J.G. and Lipkis T.A. Classification in the KL-ONE Knowledge Representation System. Proc IJCAI-83, Karlsruhe, W-Germany. Sowa, John F. _Conceptual Structure: Information Processing in Mind and Machine_. The Systems Programming Series. Addison-Wesley Publishing Company. Reading, MA, 1984. You might look into Janet Kolodner's work on CYRUS (I think she's currently at Georgia Tech), and Michael Lebowitz's (at Columbia) work on UNIMEM and IPP. There was one in the Prospector system, see Reboh, Knowledge Engineering Tools in Prospector..., SRI TN243, 1981 theres a desdcription of KODIAK and operations on its structures in Norvig, Unified Theory of Text Understanding, UCB CSD 87-339 lots of the work at Yale (or extending out of it) deals implicitly with the need to recognize patterns in large semantice networks BORIS, IPP, UNIMEM, etc. Kolodners CYRUS work (see Cog Sci, 1981?) Patil, Causal Repr. of Acid-Base Diagnosis, MIT LCS TR-267, 1981 deals with the issues for translating between alternate network representations (representing different levels of causal explanation) or a generally neat article, Pople, Heuristic Methods for imposing Structure..., in Szolovitz, ed, AI in Medicine, 1982 Shapiro, Stuart C., & Rapaport, William J. (1987), "SNePS Considered as a Fully Intensional Propositional Semantic Network," in G. McCalla and N. 
Cercone (eds.), The Knowledge Frontier: Essays in the Representation of Knowledge (New York: Springer-Verlag): 262-315; earlier version preprinted as Technical Report No. 85-15 (Buffalo: SUNY Buffalo Dept. of Computer Science, 1985).
Saks, Victor (1985), "A Matcher of Intensional Semantic Networks," SNeRG Technical Note No. 12 (Buffalo: SUNY Buffalo Dept. of Computer Science).
Chin 1983, "A Case Study of Knowledge Representation in UC," IJCAI-83.

10. Classification algorithms, theories of classification

Shepard, "Toward a Universal Law of Generalization...," Science, 11 Sept 87.
Bobick, Natural Object Categorization, MIT AI TR 1001, 1987.
Eleanor Rosch's work on how classification works in people.

11. Analogy recognition, analogical reasoning, metaphor

Gentner 1983, "Structure Mapping: A Theoretical Framework for Analogy," Cognitive Science 7, 155-170 (1983).
Martin, "Understanding New Metaphors," IJCAI-87.
Carbonell, "Metaphor -- A Key to Extensible Semantic Analysis," ACL-80.
Dan Brotsky's SM thesis from a few years ago at the [MIT] AI Lab. He did a parser for net grammars, to use in Winston-analogy reasoning.

AUTHOR: Falkenhainer, Brian; Forbus, Kenneth D.; Gentner, Dedre.
ORGANIZATION: University of Illinois at Urbana-Champaign, Dept. of Computer Science. Report UIUCDCS; 1361.
TITLE: The structure-mapping engine: algorithm and examples / by Brian Falkenhainer, Kenneth D. Forbus, Dedre Gentner. Report UIUCDCS-R, University of Illinois at Urbana-Champaign, Dept. of Computer Science; 1361.
CITATION: Urbana, Ill.: University of Illinois, Dept. of Computer Science, 1987. 54 p.: ill.; 28 cm.
NOTES: Bibliography: p. 42-54. Funding: Supported by the Office of Naval Research, Personnel and Training Research Programs. N00014-85-K-0559.
SUBJECT: Artificial intelligence. Analogy.
ANNOTE: "Reasoning by analogy between two domains, e.g., the flow of water is analogous to the flow of heat.
Analogy within a domain is usually of the literal-similarity type; a third type is called mere-appearance. The latter two types are only discussed briefly, but algorithms are given for all three."

12. Machine learning

--Winston's work on concept learning by subgraph isomorphism.

13. Partial matching in retrieval, especially using hashing (database retrieval on secondary keys, retrieval of PROLOG clauses, etc.)

--Work done by John Lloyd and his team at: Department of Computer Science, University of Melbourne, Melbourne, Victoria, Australia. Ask them for reports such as:
   Partial-match retrieval for dynamic files. TR 81/5
   Dynamic hashing schemes. TR 86/6
   Partial-match retrieval using hashing and descriptors. TR 82/1

14. Language-oriented information (text) retrieval. Knowledge-based information retrieval.

@article{LEWIS88b
, author = "David D. Lewis and W. Bruce Croft and Nehru Bhandaru"
, title = "Language-Oriented Information Retrieval"
, journal = "International Journal of Intelligent Systems"
, year = 1989
, note = "To appear."}

(DDL: The above paper contains a survey and a large number of references from this area. A slightly earlier version is available as Technical Report 88-36; Dept. of Computer and Information Science; Univ. of Massachusetts; Amherst, MA 01003. Below I list those references most directly related to knowledge structure matching, plus those that don't appear in the bibliography of LOIR.)

@article{COHEN87
, author = "Paul R. Cohen and Rick Kjeldsen"
, title = "Information Retrieval by Constrained Spreading Activation in Semantic Networks"
, journal = "Information Processing and Management"
, year = 1987
, volume = 23
, number = 4
, pages = "255-268"}

@article{RAU87
, author = "Lisa F.
Rau"
, title = "Knowledge Organization and Access in a Conceptual Information System"
, journal = "Information Processing and Management"
, year = 1987
, volume = 23
, number = 4
, pages = "269-283"}

Cohen, Stanhope, Kjeldsen 1986. "Classification by Semantic Matching."

%A Roy Rada
%T Knowledge-Sparse and Knowledge-Rich Learning in Information Retrieval
%J Information Processing and Management
%D 1987
%P 195-210

%A Richard Forsyth
%A Roy Rada
%T Machine Learning: Expert Systems and Information Retrieval
%I Ellis Horwood
%C London
%D 1986

%A Roy Rada
%T Gradualness Facilitates Knowledge Refinement
%J IEEE Transactions on Pattern Analysis and Machine Intelligence, 7, 5
%D September 1985
%P 523-530

%A Hafedh Mili
%A Roy Rada
%T A Statistically Built Knowledge Base
%J Proceedings Expert Systems in Government Conference
%D Oct 1985
%I IEEE Computer Society Press
%P 457-463

15. Matching for lexical choice in generation

Miezitis, Mara Anita. "Generating Lexical Options by Matching in a Knowledge Base." Technical Report CSRI-217, Computer Systems Research Institute, University of Toronto, Toronto, Canada, M5S 1A1. September 1988.

16. Misc.

Cohen, A Powerful and Efficient Structural Pattern-Recognition System, Art. Intell. 9, 1978.

Purdom & Brown, Tree Matching and Simplification, Software Practice & Experience, Feb 1987.

--look up the papers Nehru got from Mostow

Allemang, Tanner, Bylander, and Josephson. "On the Computational Complexity of Hypothesis Assembly." IJCAI-87.

***************************************************************
Continued in Volume VI Number 5, Issue 5
***************************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA 94612-3550.
Send subscription requests to: LISTSERV@UCCVMA.BITNET
Send submissions to IRLIST to: IR-L@UCCVMA.BITNET

Editorial Staff:
  Clifford Lynch   lynch@postgres.berkeley.edu   calur@uccmvsa.bitnet
  Mary Engle       engle@cmsa.berkeley.edu       meeur@uccmvsa.bitnet
  Nancy Gusack     ncgur@uccmvsa.bitnet

The IRLIST Archives will be set up for anonymous FTP, and the address will be announced in future issues. These files are not to be sold or used for commercial purposes. Contact Mary Engle or Nancy Gusack for more information on IRLIST. The opinions expressed in IRLIST do not represent those of the editors or the University of California. Authors assume full responsibility for the contents of their submissions to IRLIST.