Information Retrieval List Digest 004 (December 1989)
URL = http://hegel.lib.ncsu.edu/stacks/serials/irld/irld-004

IRLIST Digest   December 1989   Volume VI   Number 4   Issue 4

***************
This is the last issue to be distributed in 1989. After the Christmas holidays are over, IRList Digest will resume distribution. At that time five more issues will complete the backlog of information received during the Digest's hiatus. Information received since we assumed monitorship of the Digest will either be incorporated into those remaining issues (if timeliness is crucial) or will be distributed in issues following the backlog. We hope to be completely up to date by February. Thanks for your patience.
***************

***************************************************************
Continued from Volume VI Number 3, Issue 3
***************************************************************

IV. PROJECTS: Initiatives and proposals / Bibliographies / Abstracts / Miscellaneous
  A.1. EXPRES note (more)
  A.6. CLARIT Project
  B.1. References on Matching of KR Structures

***************************************************************

IV. PROJECTS

IV.A.1. Fr: Edward A. Fox
Re: EXPRES project (see VI:3, 3)

Hi! The note in #3 about EXPRES should be updated. I have heard that there has been a change in policy about EXPRES and suggest you quickly update the announcement with a follow-on based on current official information from NSF. Thanks, Ed

**********

IV.A.6. Fr: Thomas M. Kuhn
Re: Description of the CLARIT project

The following is a description, based on our original proposal to DEC, of the CLARIT project's philosophy and major goals. Please let me know if you have any questions, corrections, etc.
Thanks,
Thomas Kuhn
Administrative Assistant, CLARIT Project
kuhn@horatio.lcl.cmu.edu (inet address 128.2.229.27)

Computational-Linguistic Approaches to Retrieval and Indexing of Text: The CLARIT Project

A Proposal to the Digital Equipment Corporation
From the Laboratory for Computational Linguistics, Carnegie Mellon University

Principal Investigators:
========================
David A. Evans, Associate Professor, Linguistics and Computer Science
Dana S. Scott, University Professor, Computer Science, Mathematical Logic, and Philosophy

August 1988

EXECUTIVE SUMMARY

The management of information in computer-held textual databases -- in science and industry and in education and business -- continues to be a fast-growing problem. New, faster machines and new means of delivering information have only exacerbated the situation. The difficulties of searching and organizing information can no longer be solved using traditional (e.g., keyword-based) indexing and retrieval technologies. Effective techniques, we assert, will increasingly depend on the ability to identify concepts as they occur in natural-language expressions, and future technologies will be based on computational-linguistic approaches to the analysis of free text. While very large-scale, full-text natural-language processing remains an elusive and remote possibility, we are convinced that a variety of existing computational-linguistic techniques can be applied effectively at this time to information-management tasks to improve present technology both powerfully and simply. We therefore propose to undertake a one-year pilot project to explore along these lines the augmenting of current information-management techniques to enhance both information retrieval and indexing.
In particular, we propose to develop lexical resources (i.e., computational lexicons and thesauri) and natural-language processing software to control for morphological variation in the forms of expressions; to map target `words' to canonical and related concepts to increase semantic coverage; and to consider restricted natural-language permutations in the syntax of expressions. This proposal continues past work on several projects, with the aims both of improving and consolidating resources and software and of developing new applications in conjunction with other research groups at DEC and OCLC under the general guidance of the MERCURY Project.

The specific twelve-month objectives of the CLARIT Project include:

1) A review of current indexing and retrieval technologies, to identify candidate systems susceptible to enhancement with computational-linguistic techniques (15% of our total effort).

2) The development of a prototype information workbench to serve as an interface to textual databases, in which the facilities associated with control of morphological, semantic, and syntactic variation could be tested (70% of our total effort).

3) The design of a longer-term project aimed at the comprehensive treatment of one or more specific subject domains, using computational-linguistic information processing (15% of our total effort).

MANAGING TEXTUAL INFORMATION

The ability to manage problems and to develop innovative solutions is principally a function of the ability to manage information, which, in turn, is limited by the ability to organize and access all data relevant to a (typically) precise topic. This information is often in the form of written records, reports, and research literature -- and we assume in this proposal that such information is already in computer-readable form.
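The morphological control proposed above -- mapping surface word forms back to canonical lexicon entries -- can be illustrated with a minimal sketch. This is not the CLARIT MORPH program itself; the suffix rules and the tiny lexicon below are invented purely for illustration:

```python
# Minimal illustration of morphological normalization: reduce
# inflectional variants to a canonical lexicon form, so that
# "indexes", "indexing", and "indexed" all match "index".
# The rules and the lexicon are invented for this sketch.

LEXICON = {"index", "retrieve", "retrieval", "query", "document"}

# Ordered (suffix, replacement) rules; the first rule that yields
# a known lexicon entry wins.
SUFFIX_RULES = [
    ("ies", "y"),
    ("ing", ""),
    ("ing", "e"),
    ("ed", ""),
    ("ed", "e"),
    ("es", ""),
    ("s", ""),
]

def normalize(word: str) -> str:
    """Map a surface word form to its canonical lexicon entry."""
    w = word.lower()
    if w in LEXICON:
        return w
    for suffix, repl in SUFFIX_RULES:
        if w.endswith(suffix):
            candidate = w[: len(w) - len(suffix)] + repl
            if candidate in LEXICON:
                return candidate
    return w  # unknown form: leave as-is

if __name__ == "__main__":
    for form in ["Indexes", "indexing", "retrieved", "queries"]:
        print(form, "->", normalize(form))
```

A realistic system would of course need exception lists, derivational morphology, and phrasal entries, as the proposal's later description of MORPH makes clear; this sketch shows only the inflectional case.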
Increasingly in practice, this management task translates into the problem of indexing and retrieving selected items; however, while our ability to store documents in electronic form has expanded tremendously with the advent of media such as CD-ROM and other optical-storage devices -- and while access to remote databases has become commonplace -- our ability to identify and retrieve the information we require has not improved commensurately.

One reason the automation of information management has been frustrated by the limitations of traditional indexing and retrieval technology is that -- in its most primitive realization -- this technology is based on `keyword' searches over the overt character strings that comprise the bulk of textual documents. More sophisticated approaches include the ability to search on Boolean combinations and regular-expression sets of keywords. (Cf. Salton & McGill, 1983, for a discussion of such approaches.) Speed is often increased by utilizing inverted files as indexes of occurrences of words and terms.

The theory underlying such technology is that the information content of documents is contained in the `words' of the document, and that for information indexing and retrieval it is sufficient to identify a subset of words that both typically occur in documents associated with specific domains of information and also capture the sense of users' questions. The best current systems go so far as to establish statistical measures of the appropriateness of keywords in specific domains and to augment actual searches with keyword variants (e.g., `synonyms') that have a high expected co-occurrence or co-variation with the keywords that a user offers in the statement of a particular topic.
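The traditional technology described above -- keyword search over an inverted file, with Boolean combination of terms -- can be sketched as follows; the toy corpus and the function names are invented for illustration:

```python
# Sketch of a traditional inverted file: a map from each word to the
# set of document identifiers in which it occurs, supporting Boolean
# AND/OR queries over keywords. The toy corpus is invented.
from collections import defaultdict

docs = {
    1: "stomach pain after eating",
    2: "postprandial abdominal discomfort",
    3: "treatment of stomach ulcers",
}

# Build the inverted index: word -> set of document ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search_and(*terms):
    """Documents containing every term (Boolean AND)."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_or(*terms):
    """Documents containing any term (Boolean OR)."""
    return set().union(*(index.get(t, set()) for t in terms))

print(search_and("stomach", "pain"))      # -> {1}
print(search_or("stomach", "abdominal"))  # -> {1, 2, 3}
```

Note that the sketch also exhibits exactly the limitation the proposal goes on to identify: a query on "stomach" AND "pain" cannot retrieve document 2, even though it expresses the same concepts in different words.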
However, such approaches are limited in principle because they are wedded to the use of `words' instead of `concepts' and because they are unable to control for the linguistic variations -- in morphology, in semantics, and in syntax -- that complicate the surface forms found in natural-language texts. In short, our claim is that natural-language texts are not just collections of character strings with probabilistic associations to topics; rather, they are the precisely coded realizations of concepts; and information processing that ignores the properties of natural-language code and the structure of concepts in a domain will have only haphazard success -- inversely proportional to the size of databases.

In our past research on the development of tools and techniques for the improvement of natural-language information management in the biomedical domain in the MEDSORT-I, MEDSORT-II, and UMLS Projects (for reports on these projects, see Carbonell, Evans, Scott & Thomason, 1985; Evans, 1987; Evans & Miller, 1987), we have argued that the only sound method for identifying the information content of textual records depends on using natural language as an index to concepts (cf. Evans & Scott, 1987, for discussion of `concepts' in this context), supported by standardized representations of concepts in a domain. (Cf. Carbonell, Evans, Scott & Thomason, 1986; Evans, 1988, for concrete proposals on the structuring of domain-specific concepts for purposes of natural-language processing.) And we have adumbrated frequently-occurring features of natural-language expressions that play havoc with string-based, `keyword' approaches to text processing, including the following:

1) Different Words -- Same Meaning: `Word'-based processing fails to capture the variation in expression associated with synonymy, general/technical usage distinctions, and domain-specific idiom. Example: "stomach pain after eating" versus "postprandial abdominal discomfort".
2) Same Words -- Different Meaning: `Word'-based processing cannot readily accommodate the differences signaled by natural-language syntax. Example: "Venetian blind" versus "blind Venetian" (we thank Michael Lesk for this fine example); "juvenile victims of crime" versus "victims of juvenile crime".

3) Pragmatic Perspective -- Circumlocution: `Word'-based processing does not control for differences in perspective that different users may bring to an information-searching task. Example: lawyers on opposite sides of a damage case may regard an event -- and refer to it consistently -- from orthogonal points of view: "the accident" versus "the unfortunate incident". (Cf. Blair & Maron, 1985, for a discussion of the difficulties created by just such effects.)

4) Domain Specificity: `Word'-based processing cannot handle the sense restrictions typical of language in domain-specific usage. Example: "floating" has different senses in banking and in swimming; "sharp" is principally a `pain-sensation' modifier in clinical medicine, secondarily a measure of mental acuity, and unrelated to cutting-tool quality.

The ultimate solution to the problem of indexing and retrieving information from textual databases is to process natural language for the concepts that `lie behind' the surface expressions we encounter in free text. Such rendering of expressions into representations of concepts is the common goal of virtually all approaches to natural-language processing (NLP). But full-scale NLP depends on resources (such as extensive knowledge bases) and algorithms (such as methods for resolving reference across clauses) that are not available in current technology. In part because of the general recalcitrance of free-text NLP, approaches to traditional text processing have eschewed computational-linguistic techniques in developing indexing and search strategies.
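The `Different Words -- Same Meaning' problem above is what concept-based indexing addresses: surface phrases are mapped through a thesaurus to canonical concept identifiers before indexing, so that synonymous expressions match. A minimal sketch follows; the concept identifiers and thesaurus entries are invented here, and the CLARIT lexicons and thesauri are of course far larger:

```python
# Sketch of concept-based normalization: surface phrases are mapped
# through a small thesaurus to canonical concept identifiers, so that
# synonymous expressions index and retrieve the same documents.
# The concept IDs and entries below are invented for this sketch.

THESAURUS = {
    "stomach pain": "C1-ABDOMINAL-PAIN",
    "abdominal discomfort": "C1-ABDOMINAL-PAIN",
    "after eating": "C2-POSTPRANDIAL",
    "postprandial": "C2-POSTPRANDIAL",
}

def concepts(text: str) -> set:
    """Return the set of canonical concepts expressed in a phrase."""
    t = text.lower()
    return {cid for phrase, cid in THESAURUS.items() if phrase in t}

d1 = concepts("stomach pain after eating")
d2 = concepts("postprandial abdominal discomfort")
print(d1 == d2)  # -> True: both phrases map to the same two concepts
```

A real system would need the morphological and syntactic control discussed elsewhere in the proposal (naive substring matching would confuse "blind Venetian" with "Venetian blind"); the sketch isolates only the word-to-concept mapping step.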
We believe, however, that much of what is difficult in free-text NLP is also irrelevant to the identification of concepts in natural-language expressions; and that we can overcome the apparent NLP impasse with a selective application of computational-linguistic intelligence -- or, more precisely, a dose of good computational-linguistic common sense. (A primitive illustration of the efficacy of even minor linguistic enhancement to conventional indexing strategies is offered in Vries, Shoval, Evans, et al., 1986.)

The middle ground of information is semantic structure -- `words' are imprecise; intentions and intelligent inference, too difficult to capture. Semantic structure can be identified through a combination of relatively simple processes involving (1) control of lexical form (morphology), (2) screening for relevant modification relations (syntax), and (3) reference to standardized representations (knowledge representation). It is essentially such semantic structure that users attempt to capture when coining `keyword' phrases; and it is essentially such structure -- in the conceptual clustering found in domain-specific phrases and in `topic' clauses -- that carries the information content in texts.

In sum, we believe it is possible to base information-management technology (e.g., indexing and retrieval) on natural-language semantic structure, and that the required techniques will involve combinations of computational-linguistic processing and more traditional database management.

REFERENCES

Blair, D.C. & Maron, M.E. (1985). An evaluation of retrieval effectiveness for a full-text document-retrieval system. Communications of the ACM, 28, 289-299.

Carbonell, J.G., Evans, D.A., Scott, D.S. & Thomason, R.H. (1985). Final Report on the Automated Classification and Retrieval Project (MedSORT-I). Technical Report No. CMU-LCL-85-1, Laboratory for Computational Linguistics, Carnegie Mellon University.

Carbonell, J.G., Evans, D.A., Scott, D.S. & Thomason, R.H. (1986).
On the design of biomedical knowledge bases. In R. Salamon, B. Blum & M. Jorgensen (eds.), Medinfo 86. Amsterdam, The Netherlands: Elsevier Science Publishers B.V. (North Holland), 37-41.

Evans, D.A. (1987). Final Report on the MedSORT-II Project: Developing and Managing Medical Thesauri. Technical Report No. CMU-LCL-87-3, Laboratory for Computational Linguistics, Carnegie Mellon University.

Evans, D.A. (1988). Pragmatically-structured, lexical-semantic knowledge bases for unified medical language systems. Proceedings of the Twelfth Annual Symposium on Computer Applications in Medical Care (SCAMC), Washington, DC: IEEE Computer Society, 1988.

Evans, D.A. & Miller, R.A. (1987). Final Task Report (Task 2) -- Unified Medical Language System (UMLS) Project: Initial Phase in Developing Representations for Mapping Medical Knowledge: INTERNIST-I/QMR, HELP, and MeSH. Technical Report No. CMU-LCL-87-1, Laboratory for Computational Linguistics, Carnegie Mellon University.

Evans, D.A. & Scott, D.S. (1987). Concepts as procedures. In F. Marshall, A. Miller & Z. Zhang (eds.), ESCOL '86, Proceedings of the Third Eastern States Conference on Linguistics (Pittsburgh, Pennsylvania, October 10-12, 1986), Department of Linguistics, The Ohio State University, Columbus, OH, 533-543.

Salton, G. & McGill, M.J. (1983). Introduction to Modern Information Retrieval. McGraw-Hill Book Company.

Vries, J.K., Shoval, P., Evans, D.A., Moossy, J., Banks, G. & Latchaw, R. (1986). An expert system for indexing and retrieving medical information. Working Paper, Decision Systems Laboratory, University of Pittsburgh, Pittsburgh, PA.

COMPUTATIONAL-LINGUISTIC ENHANCEMENTS TO INDEXING AND RETRIEVAL OF TEXT: A PILOT PROJECT

Project Timing: To begin Sept. 1, 1988; end August 31, 1989.

Goal: Research designed to achieve improvements in the quality of information indexing and retrieval.
Method: Computational-linguistic (CL) techniques to facilitate the identification of semantic structures under morphological and syntactic variation; exploitation of domain-specific semantic networks.

Philosophy: Emphasis on `simple' solutions; straightforward enhancements of current approaches.

Specific Phase-1 Objectives (coordinated with DEC):

1) Survey existing technologies in information management -- to identify candidate systems that would benefit from CL enhancements.

2) Develop a prototype `information workbench' that would serve as a (partial) natural-language interface to textual databases. This activity would include:

   + Experimentation with a set of `simple' but sophisticated CL technologies to improve indexing and retrieval incrementally, in a designated domain, using a moderately large corpus; possibly combining CL and more traditional approaches to indexing/retrieval.

   + Exploring the possibility of providing CL resources for a `multiply-expert' workstation for information processing.

   + Identification in a large general corpus (to be chosen in consultation with DEC) of the ``standard and stable structures'' of language, to be used in creating a basic CL knowledge base (lexicon/thesaurus).

   + Development of specific, general-language computational-linguistic resources (e.g., lexicons, thesauri, parsers) to supplement existing resources in the LCL (e.g., MORPH, the MEDSORT-II/UMLS Lexicon/Thesaurus).

3) Define a longer-term project and application, possibly in collaboration with groups other than DEC and CMU.

RESEARCH ENVIRONMENT AND PROJECT PERSONNEL

Carnegie Mellon University

Carnegie Mellon University (CMU) is located in Pittsburgh, Pennsylvania. It currently has 4,320 undergraduate and 2,218 graduate students, and 503 full-time and 40 adjunct faculty, in seven college-level units, including the College of Humanities and Social Sciences and the Department of Computer Science, both represented in this proposal.
The university is internationally recognized as a leader in Computer Science and applied computational technologies. It is strongly committed to the development of resources for undergraduate and graduate education, faculty research, and general academic computing. Among CMU's initiatives in this area are a campus-wide communications network, the Andrew system, and the initial steps of the MERCURY Project for the creation of an electronic library system.

The Laboratory for Computational Linguistics

The CMU Program in Computational Linguistics and Laboratory for Computational Linguistics (LCL) are unique nationally, and the faculty associated with the project are experts in Cognitive Science, Computer Science, and Linguistics. The Laboratory for Computational Linguistics (LCL) is directed by David A. Evans and is based in the Philosophy Department at CMU, which administers the Joint Graduate Program in Computational Linguistics. (The Joint Program is a collaborative academic program between CMU and the University of Pittsburgh.) The LCL is the best-equipped academic research facility for Computational Linguistics in the world, having both a high-capacity super-mini central computer and twenty high-powered artificial-intelligence workstations, along with many other kinds of support equipment. The LCL houses several large research projects and is committed to the development of applications for CL techniques; and it will make its resources freely available to the CLARIT Project. In particular, the CLARIT Project will utilize, expand, and refine the following resources:

1) MORPH -- a natural-language morphology program which, in conjunction with a lexicon, can identify and analyze word-form variants (e.g., as resulting from inflectional morphology), derived forms (e.g., as resulting from compounding, prefixing, suffixing, and other combinations of morphemes), phrasal lexical expressions (e.g., idioms, domain-specific phrases, etc.), and selected neologisms.
2) The COMP-LEX Lexicon -- a computational lexicon currently consisting of a basic-English core vocabulary (approximately 8,000 terms), soon to be augmented with the domain-specific lexicons developed under the MEDSORT-II and UMLS projects (representing an additional, approximately 25,000 terms in clinical and general medicine).

3) NEWCAT -- a natural-language parser (developed by Roland Hausser) with special facilities for processing free text.

4) The MEDSORT-II/UMLS Knowledge Base -- an extensive knowledge base, especially for `findings' in clinical medicine, but also offering thesaurus-like classification of basic concepts and providing a design for selective natural-language processing of general-language texts.

5) On-Line Dictionaries -- Webster's Seventh (and other dictionaries).

**********

IV.B.1. Fr: LEWIS@cs.umass.EDU
Re: For IRLIST: References on matching of KR structures

This message contains a list of references I received in response to my request to AILIST and NL-KR in November 1988 for information on finding matches between knowledge representation structures. It includes all references received by Roland Zito-Wolf, and previously posted, in response to a similar query in 1987. Roland was particularly interested in approximate matching algorithms, while I was particularly interested in matching when some transformation of knowledge structures had to be done, so you know our biases. The papers are broken down into broad classes -- each paper is present in only one class, though many are appropriate to several of the classes. Also, I have not actually seen a number of the papers below, so some may have been placed in inappropriate classes. Finally, absolutely no claims of thoroughness are made for the following listing. Some of the listed areas have huge literatures and are represented here by only one or two references.
The following people submitted references to Roland or me, or otherwise aided in the creation of this list: Len Friedman, Philippe Dugerdil, Bob MacGregor, Peter Szolovits, Len Moskowitz, Terry Winograd, Austin Tate, Roy Rada, Mike Shafto, William J. Rapaport, Mike Tanner, Lisa Rau, Robert Levinson, akbari@CS.COLUMBIA.EDU, INDUGERD%CNEDCU51.BITNET@wiscvm.wisc.edu, greene@m.cs.uiuc.edu, walt%cs.hw.ac.uk@cs.ucl.ac.uk, Dave Barrington, Karen Sparck-Jones, Dave Stallard, Nehru Bhandaru, Penni Sibun, Richard Korf, Arny Rosenberg, Robert Krovetz, John Brolio, Philip Johnson, Dan Suthers, Adele Howe, Ed Hovy, Thad Polk, Lee Spector, Dave Forster, rpg@cs.brown.EDU, Brian Quinn, Ingemar Hulthage, Carole Hafner, Gary Berg-Cross, Graeme Hirst.

Many thanks to them, and more thanks and apologies to anyone accidentally omitted.

What I've found most interesting in looking into this subject is the difficulty of characterizing just what it means when one finds a match between parts of, say, two semantic network structures. If one takes the knowledge representation to be an alternate notation for a set of clauses in some logical language, then it would seem that matching is identifying subtheories which two larger theories have in common. For my particular application, text retrieval, this would seem to mean that a user query specifies a set of clauses from which the information the user is interested in can be inferred, and the system is obliged to retrieve both documents that contain the same set of clauses and documents that contain other sets of clauses that allow the same or similar conclusions to be drawn. This leads to a very different view of matching than thinking of semantic networks as colored graphs does. I'd be interested in hearing of references (here we go again!) related to this interpretation of matching.

Best, Dave

David D. Lewis                         ph. 413-545-0728
Computer and Information Science (COINS) Dept.
University of Massachusetts, Amherst
Amherst, MA 01003 USA
BITNET: lewis@umass
CS/INTERnet: lewis@cs.umass.edu
UUCP: ...!uunet!cs.umass.edu!lewis@uunet.uu.net

REFERENCES RELATED TO MATCHING OF KR STRUCTURES

(As can be seen, the references are in a variety of formats, as provided by the senders. Some effort has been made to make abbreviations consistent.)

1. String matching. Matching of DNA sequences.

Wagner and Fischer, "The String-to-String Correction Problem," JACM, Jan 1974.
Lowrance and Wagner, "An Extension to ...," JACM, April 1975.
Various references to quick string-search algorithms, such as Boyer-Moore.
Hall & Dowling, "Approximate String Matching," Computing Surveys, Dec 1980.

2. Hamming distance, distance in metric spaces, nearest-neighbor algorithms, bit pattern similarity measures, classical information retrieval, etc.

Kanerva, Pentti, Self-Propagating Search, CSLI Report 84-7 (now a book).
Geoffrey Hinton, Distributed Representations, CMU-CS-84-157 (also in PDP?).
Lots of work on information retrieval (Salton, Croft, etc.).
Connection-Machine implementation described in Stanfill and Kahle, "Parallel Free-Text Search...," Comm. ACM, Dec 1986.

3. Matching and retrieval algorithms for unlabeled trees and graphs

Fowler, Haralick, et al., "Efficient Graph Automorphism by Vertex Partitioning," AIJ 21 (1983), 245-269.
McGregor 1982, "Backtrack Search Algorithms and the Maximal Common Subgraph Problem," Software--Practice and Experience, v. 12, pp. 23-34, 1982.
Arnborg, et al., "Complexity of Finding Embeddings in a k-tree," SIAM J. Alg. Disc. Methods, v. 8, no. 2, 1987.
Preparata and Shamos, Computational Geometry.
Garey & Johnson, Computers & Intractability, p. 202.
Articles by K.S. Fu et al. (especially Eshera).

4. Misc. matching and retrieval of labeled trees and graphs. (The more explicitly semantic-network-oriented work is in category 9.)

Hayes-Roth & Mostow, "An Automatically Compilable Recognition Network for Structured Patterns," IJCAI-75.
K-D trees and other divide-and-conquer methods for speeding up searches, e.g., Omohundro, Efficient Algorithms with Neural Network Behavior, U. Ill. Report UIUCDCS-R-87-1131.
Finding patterns in networks, e.g., for simplifying constraint networks: Gosling, Algebraic Constraints, CMU CS-83(?)-132.
Robert Levinson, "A Self-Organizing Retrieval System for Graphs," in Proc. AAAI-84. Also his thesis of the same title, available for free from the AI Lab at the University of Texas at Austin as tech report AI-85-05.
Spencer, Weighted Matching Algorithms, Stanford CS-87-1162.
The RETE algorithm for speedily finding productions whose conditions are satisfied -- see any good algorithms text, or Forgy, "RETE: A Fast Algorithm...," AI, vol. 19, 1982.

5. Transformation of knowledge representation structures, or the need for transformation of knowledge representation structures. Issues of knowledge representation in NL DB interfaces, NL generation systems.

Hajicova and Hnatkova, "Inferencing on Linguistically Based Semantic Structures," COLING-84.
Hobbs, Stickel, et al., "Interpretation as Abduction," ACL-88.
Hobbs and Martin, "Local Pragmatics," IJCAI-87.
Rau, "Spontaneous Retrieval in a Conceptual Information System," IJCAI-87.
Wilks, "Understanding Without Proofs," IJCAI-73.
Sparck Jones, K., "Shifting Meaning Representations," IJCAI-83.
Moore, "Natural-Language Access to Databases -- Theoretical/Technical Issues," ACL-82.
Petrick, "Theoretical/Technical Issues in Natural Language Access to Databases," ACL-82.
Nagao and Tsujii, "Mechanism of Deduction in a Question Answering System with Natural Language Input," IJCAI-73.
Scha, "English Words and Data Bases: How to Bridge the Gap," ACL-82.
Sangster, "On the Automatic Transformation of Class Membership Criteria," ACL-79.
Stallard, "A Terminological Transformation for Natural Language Question-Answering Systems," ACL-86.
Tomabechi, "Direct Memory Access Translation," IJCAI-87.
Euzenat, Normier, Ognowski, Zarri,
"SAPHIR+RESEDA, A New Approach to Intelligent Data Base Access". IJCAI-85. Hovy, "Interpretation in Generation" AAAI-87. 6. Graph matching in vision Spector, L.; Hendler, J.; Canning, J. Rosenfeld, A. "Symbolic Model/Image Matching in Expert Vision Systems". CS-TR-370, Computer Vision Laboratory, Center for Automation Research, University of Maryland, College Park, MD, July 1988. 7. Matching of syntactic structures @book{SALTON68 , author = "Gerald Salton" , title = "Automatic Information Organization and Retrieval" , publisher = "McGraw-Hill Book Company" , year = 1968 , address = "New York" } Sumita, E. and Tsutsumi, Y. "A Translation Aid System Using Flexible Text Retrieval Based on Syntax-Matching." TR87-1019, IBM Tokyo Research Laboratory, 5-19 Sanbancho, Chiyoda-ku, Tokyo 102. May, 1988. 8. Constraint Satisfaction (DDL: I include some samples from the constraint satisfaction literature, because graph matching problems are often formulated in terms of constraint satisfaction.) Nadel, B. A. "The General Consistent Labeling (Or Constraint Satisfaction) Problem" DCS-TR-170. Department of Computer Science, Rutgers University, New Brunswick, NJ, 08903. 1986. Shapiro & Haralick "Structural Descriptions and Inexact Matching" IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-3, no. 5, September, 1981. Haralick & Elliott "Increasing Tree Search Efficiency for Constraint Satisfaction Problems" AIJ 14 (1980), 263-313. McGregor 1979 "Relational Consistency Algorithms and Their Application in Finding Subgraph and Graph Isomorphisms." Information Sciences 19, 229-250 (1979). Dechter & Pearl, " Network-Based Heuristics for Constraint-Satisfaction Problems" AIJ 34 (1987), 1-38. Mackworth 1977 "Consistnecy in Networks of Relations" AIJ 8 (1977), 99-118. 9. Semantic networks and frame-based knowledge representations, especially knowledge base classifiers, KR structure matching algorithms, and frame retrieval algorithms. 
Hayes-Roth, "The Role of Partial and Best Matches in Knowledge Systems" in Pattern-Directed Inference Systems. D.A. Waterman and Frederick Hayes-Roth, eds. Academic Press, 1978. Patel-Schneider, Brachman, Levesque 1984 "ARGON: Knowledge Representation meets Information Retireval" In First IEEE Conference on AI Applications. KRL (Bobrow and WInograd, in Cog Sci 1980) Brachman R.J. and Schmolze J.G. An overview of the KL-ONE Knowledge Representation System. Cognitive Science vol.9 pp. 171-216, 1985 Schmolze J.G. and Lipkis T.A. Classification in the KL-ONE Knowledge Representation System. Proc IJCAI-83, Karlsruhe, W-Germany. Sowa, John F. _Conceptual Structure: Information Processing in Mind and Machine_. The Systems Programming Series. Addison-Wesley Publishing Company. Reading, MA, 1984. You might look into Janet Kolodner's work on CYRUS (I think she's currently at Georgia Tech), and Michael Lebowitz's (at Columbia) work on UNIMEM and IPP. There was one in the Prospector system, see Reboh, Knowledge Engineering Tools in Prospector..., SRI TN243, 1981 theres a desdcription of KODIAK and operations on its structures in Norvig, Unified Theory of Text Understanding, UCB CSD 87-339 lots of the work at Yale (or extending out of it) deals implicitly with the need to recognize patterns in large semantice networks BORIS, IPP, UNIMEM, etc. Kolodners CYRUS work (see Cog Sci, 1981?) Patil, Causal Repr. of Acid-Base Diagnosis, MIT LCS TR-267, 1981 deals with the issues for translating between alternate network representations (representing different levels of causal explanation) or a generally neat article, Pople, Heuristic Methods for imposing Structure..., in Szolovitz, ed, AI in Medicine, 1982 Shapiro, Stuart C., & Rapaport, William J. (1987), "SNePS Considered as a Fully Intensional Propositional Semantic Network," in G. McCalla and N. 
Cercone (eds.), The Knowledge Frontier: Essays in the Representation of Knowledge (New York: Springer-Verlag): 262-315; earlier version preprinted as Technical Report No. 85-15 (Buffalo: SUNY Buffalo Dept. of Computer Science, 1985).
Saks, Victor (1985), "A Matcher of Intensional Semantic Networks," SNeRG Technical Note No. 12 (Buffalo: SUNY Buffalo Dept. of Computer Science).
Chin 1983, "A Case Study of Knowledge Representation in UC," IJCAI-83.

10. Classification algorithms, theories of classification

Shepard, "Toward a Universal Law of Generalization...," Science, 11 Sept 87.
Bobick, Natural Object Categorization, MIT AI TR 1001, 1987.
Eleanor Rosch's work on how classification works in people.

11. Analogy recognition, analogical reasoning, metaphor

Gentner 1983, "Structure Mapping: A Theoretical Framework for Analogy," Cognitive Science 7, 155-170 (1983).
Martin, "Understanding New Metaphors," IJCAI-87.
Carbonell, "Metaphor -- A Key to Extensible Semantic Analysis," ACL-80.
Dan Brotsky's SM thesis from a few years ago at the [MIT] AI Lab. He did a parser for net grammars, to use in Winston-analogy reasoning.

AUTHOR: Falkenhainer, Brian; Forbus, Kenneth D.; Gentner, Dedre.
ORGANIZATION: University of Illinois at Urbana-Champaign, Dept. of Computer Science. Report UIUCDCS; 1361.
TITLE: The structure-mapping engine: algorithm and examples / by Brian Falkenhainer, Kenneth D. Forbus, Dedre Gentner. Report UIUCDCS-R, University of Illinois at Urbana-Champaign, Dept. of Computer Science; 1361.
CITATION: Urbana, Ill.: University of Illinois, Dept. of Computer Science, 1987. 54 p.: ill.; 28 cm.
NOTES: Bibliography: p. 42-54. Funding: Supported by the Office of Naval Research, Personnel and Training Research Programs. N00014-85-K-0559.
SUBJECT: Artificial intelligence. Analogy.
ANNOTE: "Reasoning by analogy between two domains, e.g., the flow of water is analogous to the flow of heat.
Analogy within a domain is usually of the literal-similarity type; a third type is called mere-appearance. The latter two types are only discussed briefly, but algorithms are given for all three."

12. Machine learning

--Winston's work on concept learning by subgraph isomorphism.

13. Partial matching in retrieval, especially using hashing (database retrieval on secondary keys, retrieval of PROLOG clauses, etc.)

--Work done by John Lloyd and his team at: Department of Computer Science, University of Melbourne, Melbourne, Victoria, Australia. Ask them for reports such as:
   Partial-match retrieval for dynamic files. TR 81/5
   Dynamic hashing schemes. TR 86/6
   Partial-match retrieval using hashing and descriptors. TR 82/1

14. Language-oriented information (text) retrieval. Knowledge-based information retrieval.

@article{LEWIS88b
, author = "David D. Lewis and W. Bruce Croft and Nehru Bhandaru"
, title = "Language-Oriented Information Retrieval"
, journal = "International Journal of Intelligent Systems"
, year = 1989
, note = "To appear."}

(DDL: The above paper contains a survey and a large number of references from this area. A slightly earlier version is available as Technical Report 88-36; Dept. of Computer and Information Science; Univ. of Massachusetts; Amherst, MA 01003. Below I list those references most directly related to knowledge structure matching, plus those that don't appear in the bibliography of LOIR.)

@article{COHEN87
, author = "Paul R. Cohen and Rick Kjeldsen"
, title = "Information Retrieval by Constrained Spreading Activation in Semantic Networks"
, journal = "Information Processing and Management"
, year = 1987
, volume = 23
, number = 4
, pages = "255-268"}

@article{RAU87
, author = "Lisa F.
Rau"
, title = "Knowledge Organization and Access in a Conceptual Information System"
, journal = "Information Processing and Management"
, year = 1987
, volume = 23
, number = 4
, pages = "269-283"}

Cohen, Stanhope, Kjeldsen 1986. "Classification by Semantic Matching."

%A Roy Rada
%T Knowledge-Sparse and Knowledge-Rich Learning in Information Retrieval
%J Information Processing and Management
%D 1987
%P 195-210

%A Richard Forsyth
%A Roy Rada
%T Machine Learning: Expert Systems and Information Retrieval
%I Ellis Horwood
%C London
%D 1986

%A Roy Rada
%T Gradualness Facilitates Knowledge Refinement
%J IEEE Transactions on Pattern Analysis and Machine Intelligence, 7, 5
%D September 1985
%P 523-530

%A Hafedh Mili
%A Roy Rada
%T A Statistically Built Knowledge Base
%J Proceedings Expert Systems in Government Conference
%D Oct 1985
%I IEEE Computer Society Press
%P 457-463

15. Matching for lexical choice in generation

Miezitis, Mara Anita. "Generating Lexical Options by Matching in a Knowledge Base." Technical Report CSRI-217, Computer Systems Research Institute, University of Toronto, Toronto, Canada, M5S 1A1. September 1988.

16. Misc.

Cohen, A Powerful and Efficient Structural Pattern-Recognition System, Art. Intell. 9, 1978.

Purdom & Brown, Tree Matching and Simplification, Software Practice & Experience, Feb 1987.

--look up the papers Nehru got from Mostow

Allemang, Tanner, Bylander, and Josephson. "On the Computational Complexity of Hypothesis Assembly." IJCAI-87.

***************************************************************
Continued in Volume VI Number 5, Issue 5
***************************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA 94612-3550.
Send subscription requests to: LISTSERV@UCCVMA.BITNET
Send submissions to IRLIST to: IR-L@UCCVMA.BITNET

Editorial Staff:
  Clifford Lynch   lynch@postgres.berkeley.edu   calur@uccmvsa.bitnet
  Mary Engle       engle@cmsa.berkeley.edu       meeur@uccmvsa.bitnet
  Nancy Gusack     ncgur@uccmvsa.bitnet

The IRLIST Archives will be set up for anonymous FTP, and the address will be announced in future issues. These files are not to be sold or used for commercial purposes. Contact Mary Engle or Nancy Gusack for more information on IRLIST. The opinions expressed in IRLIST do not represent those of the editors or the University of California. Authors assume full responsibility for the contents of their submissions to IRLIST.