O'Connor, 'Concepts and Techniques for Use-Specific Representations of Documents', LIBRES v4n04 (December 31, 1994)
URL = http://hegel.lib.ncsu.edu/stacks/serials/libres/libres-v4n04-o'connor-concepts

LIBRES: Library and Information Science Research Electronic Journal
ISSN 1058-6768
December 31, 1994 Volume 4 Issue 4
Quarterly LIBRE4N4 OCONNOR
_____________________________________________________________

Concepts and Techniques for Use-Specific Representations of Documents

Authors:
Brian O'Connor, PhD
School of Library and Information Management
Emporia State University
Emporia, KS 66801, USA
oconnorb@esuvm.BITNET

*ABSTRACT*

The results of an experiment in automatic selection of key frames of moving image documents are summarized, then used as the basis for reconsidering the representation of documents in any medium. Failures of access tools such as abstracts are considered as inadequacies of a representation system. Points of change in a document's physical data presented without intermediary conceptual tagging are proposed for the construction of representations which are formed at the time of an individual use and which accommodate a user's idiosyncratic requirements.

*BACKGROUND*

*machine augmented representation of image documents*

The Image Structure Information System (ISIS) project grew out of conversations with film theorist Bertrand Augst at the University of California, Berkeley. The early intent was to devise a computerized system for detecting shot changes in narrative films. The project, now based at the School of Library and Information Management at Emporia State University, has become an ongoing series of exploratory research projects concerned with machine-augmented representation of image-based documents. These range from detection and analysis of the physical data to collection of user-generated aboutness judgments.
The projects provide insights into the use and representation of a document type for which the standard access tools have been demonstrably inadequate. (O'Connor, 1984) Typical systems of access to moving image documents have attempted to apply representation techniques developed for verbal documents. The consequence of inadequate representation has been inadequate access capability.

Identifying Key Frames in a manner analogous to the computer identification of Key Words in print documents was the first ISIS project devoted to bibliographic access concerns. It used the physical data of the document without labelling concepts to accomplish this identification of Key Frames. In so doing, it demonstrated the possibility of a computer making representations of moving image documents (MIDs) which could distinguish between two documents on the same topic but with different extratopical attributes. Expanding the concept of using only the physical document data from moving image documents to using the physical data of documents in general lies at the heart of these considerations of document representations. (O'Connor, 1991)

*EXPERIMENT: DISCRIMINATION BASED ON PHYSICAL DATA*

*background*

The absence of any formulaic relationship between images and words limited the precision and subtlety of representation of MIDs by words. Patterns of light are neither governed by rules of content, nor easily described by verbal means. Film scholars and catalogers alike have been compelled to use vague terms to describe elements within a frame and across frames. For example, the Anglo American Cataloguing Rules, sections 7.0B1, 7.7B11, 7.7B18 which deal with description of documentary and unedited film, make use of verbal descriptors (evidently at the cataloger's discretion) of the objects recorded, as well as imprecise terminology (e.g., shot, CU, MS, LS) to represent movement and stylistic (extra-topical) aspects of the material.
In addition, the very term "shot" presents considerable difficulty. There are no defining bounds on the content, appearance, or length of a "shot". (Arnheim, 1969; Nichols, 1981; Novitz, 1977; O'Connor, 1984; Pryluck, 1982)

Word-based documents could always be described in terms of letters and groups of letters existing at a precise address (e.g., number of spaces from beginning of the text) and subsequent comparisons of groupings and locations could easily be made. This is not to say that digital environments have not greatly enhanced (particularly in terms of time) the description of verbal documents nor that strings of characters always present unambiguous meaning. (Paice, 1990) However, moving image documents, while they are comprised of frames which can be counted, are also comprised of many types of data within each frame. The MID text has, at most, few, and arguably no, rules for what can or must be present in any frame or group of frames. The old phrase "a picture is worth a thousand words" simply does not hold.

If there is any element of the MID analogous to the letter, it is the pixel, the individual picture element. This can be visualized as a cell in a grid laid over the picture, typically between 100,000 and 400,000 cells in each frame (at 1800 frames per minute of running time). Each pixel has a color and brightness value. While the physical characteristics of the individual recording or display system will determine the range of values for each pixel, there are no rules on what should or can be at each location. The digital environment enables determination of values for each and every pixel, as well as tracking of locations and values across the length of the document.
This, in turn, allows a three-dimensional, multi-stream model of moving image documents for precise description within and across frames. (O'Connor, 1991; Rorvig, 1993)

The guiding concept for the ISIS project was Samuel Johnson's definition of an abstract as "a smaller quantity containing the virtue and power of a greater." (Simpson, 1989) That is, the system did not seek any predetermined percentage of total document length or data from any particular portion of the moving image document. A distinction was assumed between the physically present text and the conceptual text -- the ideas stimulated in the viewer by the encoding presented by the physical text. (Yamaguchi, 1982) The project sought to avoid any concept of "abstract" which was necessarily analogous to the definitions developed for print documents beyond the general concept that it be a smaller package than the original.

*ISIS experiment*

The primary hypothesis for the ISIS experiment was: numeric descriptions of physical characteristics of moving image documents within a digital environment would discriminate between two documents described with the same topic heading.

Video tapes of two women's marathons were used as the first test set. Document 1 was a segment of Bud Greenspan's documentary of the 1984 Olympic Games _Sixteen Days of Glory_. (Greenspan, 1986) The footage was obtained from multiple cameras and edited after the event. Document 2 was a segment of the live feed from Moscow TV of the 1986 Goodwill Games. (Goodwill Games, 1986) The footage was obtained from multiple cameras (though only approximately one quarter the number of cameras used for the Greenspan film) and was edited in real time. Both documents presented approximately four dozen women running 26 miles in urban environments.
Reviews of the two productions, as well as the comments of several viewers questioned during the design of the experiment, indicated that Document 1 was generally regarded as "exciting," "thrilling," "a real thrill," "great." Document 2 was generally regarded as "boring," "a real snoozer," "dull." It should be noted, though, that a subset of viewers made some distinctions; marathon runners and coaches agreed that Document 1 was exciting, but noted that they could not really tell what was happening in terms of strategy and pacing, whereas in the "dull" Document 2, they could determine these. In an operational sense, the task of the ISIS system was to describe those physical characteristics of the two documents which would account for the different perceptions in a value-neutral manner. The ISIS experiment used only two attributes of the MIDs: percentage of screen area occupied by the primary object on the screen; screen location (x,y coordinates) of that object. It did not account for sound or for changes in color from frame to frame or any determinable aspects of the recording process such as lens length. It should also be noted that some simplifying assumptions were used in this first exploration. Selection of the primary object on the screen was made by the experimenter, rather than by any algorithm calculating dominant presence (perhaps by comparison of size and location across frames). Since the topic of each film was women running, it was assumed that images of women running (or objects related to the topic, such as starter's gun, track, beverage containers, etc.) could be taken as a primary topic and, thus, as a primary subject in each frame. This in no way suggests that other elements could not be considered important to some users. 
A sociologist might want to look at backgrounds to see clothing styles or group dynamics of the audience; an urban planner might be interested in the layout of streets in the two documents; a kinesiologist might isolate just one muscle group across the time of the document. Indeed, one reason for enriched access tools is precisely to serve users interested in 'secondary' topics.

*ISIS Results*

Analysis of document structure was achieved by seeking points of discontinuity based on changes in the size of the target object within the frame. The ISIS system supports a variable threshold for percentage of change; in this instance it was set at 15%. Such a figure eliminates the small variations which take place in documentaries as a result of minor fluctuations such as dips in the road, changes of speed of the runners, or the natural movements imparted to a hand-held camera. The stream of data was sampled at each frame (1/30th second) and was marked at each point where a subsequent frame presented a 15% or more change. The segment of data between any two points was termed an 'image set.' This unit is roughly analogous to the term "shot" but is more closely defined.

table 1:

_measure_                    _doc1_    _doc2_
longest image set (secs)      19.08     61.88
shortest image set (secs)      0.68      1.38
mean seconds/image set         4.99     14.98
variance                      10.12    174.14
standard deviation             3.18     13.20

These figures represent a significant difference in the document structures (roughly, the "style"). The Moscow broadcast (doc2) presents many fewer changes of data sets than the Greenspan film (doc1). There is, essentially, a 3:1 ratio between doc2 and doc1 for longest and mean image set length, and a 2:1 ratio for shortest image set length. Frequency data for number of image sets at one second intervals show that nearly 90% of the image sets in doc1 lie at 10 seconds or less, while nearly 30% of the image sets in doc2 lie beyond the longest image set of doc1.
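The segmentation described above can be sketched in a few lines. This is a minimal illustration, not the ISIS implementation: the function names, the toy data, the choice to measure change relative to the preceding frame, and the use of population (rather than sample) variance are all assumptions made here for the example.

```python
from statistics import mean, pstdev, pvariance

FRAME_RATE = 30  # frames per second (one sample every 1/30th second)


def segment_image_sets(areas, threshold=0.15):
    """Split a per-frame stream of object areas into image sets.

    A boundary is marked wherever a frame's area differs from the
    preceding frame's area by `threshold` (a fraction) or more.
    Returns (start, end) frame indices, with `end` exclusive.
    """
    if not areas:
        return []
    boundaries = [0]
    for i in range(1, len(areas)):
        prev, cur = areas[i - 1], areas[i]
        # Assumption: change is relative to the preceding frame; a jump
        # from zero area to any non-zero area counts as a boundary.
        if prev == 0:
            changed = cur != 0
        else:
            changed = abs(cur - prev) / prev >= threshold
        if changed:
            boundaries.append(i)
    boundaries.append(len(areas))
    return list(zip(boundaries[:-1], boundaries[1:]))


def summarize(image_sets, frame_rate=FRAME_RATE):
    """Report the measures of table 1 for a list of image sets (seconds)."""
    durations = [(end - start) / frame_rate for start, end in image_sets]
    return {
        "longest": max(durations),
        "shortest": min(durations),
        "mean": mean(durations),
        "variance": pvariance(durations),  # assumption: population variance
        "std_dev": pstdev(durations),
    }


# Hypothetical stream: a 10% change (ignored at 15%) and a large jump.
areas = [1000] * 60 + [1100] * 30 + [2000] * 90
default_sets = segment_image_sets(areas)        # threshold of 15%
finer_sets = segment_image_sets(areas, 0.05)    # user-lowered threshold
stats = summarize(default_sets)
```

Lowering the threshold yields more, shorter image sets from the same physical data, which is the sense in which the user rather than the system sets the level of representation.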
table 2:

_measure_                   _doc1_    _doc2_
mean area (pixels) obj1      55767     28394
mean area (pixels) obj2      10336      5795

Difficulty in seeing the women because of the small amount of screen space they generally occupied was an aspect of the Goodwill Games document which was mentioned frequently by viewers. This reaction was especially pronounced when the viewers had already seen the Los Angeles video with its many screen-filling images of the women. In order to quantify this perception, two measurements were made in each frame: the area occupied by all the women in the frame and the area occupied by a single woman. If there was only one woman in the frame, that area was used for both measures. In table two, the area of all women in the frame is represented by 'obj1' and the area of the single woman within a group by 'obj2.' In both cases, the mean areas show that women in the Los Angeles video occupied roughly twice the on-screen area that those in the Moscow video did.

*DISCUSSION*

*representation of moving image documents*

Being able to identify points of change in the physical data of a moving image document enables precise description, representing the documents using native elements. Numerical and graphical representations can be constructed on data of the sort demonstrated in the ISIS work. Also, the data can be used to present to each user the images found at the point of difference. Thus, the representation could well be a screen or screens with still images or short full motion clips. The representation algorithm could set some threshold for the point of difference (for whatever attribute might be considered) and so establish a default representation of the document. The algorithm could also allow user setting of the threshold, either initially or after examination of the default set of representative images. A user might wish to examine the default set for each of several works, then set a different level of representation for some subset.
Similarly, a user might set a threshold for representing each member of a collection, then set another level for examining the subset of documents with a likelihood of being relevant. The possibility of using different thresholds of selection allows for different uses, each requiring different levels of penetration. Some users might be able to make a relevance judgment on the basis of a single image, where others might require several images per document just to identify candidates for relevance judgment. Some might require a histogram or other frequency data to situate some images, while others might just skim images at random until one "catches the eye," then say to the system "find other images with the same physical data." Using sets of images, together with graphs and charts of statistics, a user can make decisions on less data and time than would be required for examination of the complete document and with greater precision than that provided by brief verbal descriptions. The user is in control of the level of representation, the number of images or precision of other data. If the user has (or, during a search, comes upon) a pixel level criterion such as "I want to see any images that have as much or more of the color in the grass of this image," it is possible to conduct such a search. On the other hand, if the searcher has only a vague idea, rapid examination of the images at the points of discontinuity in the data makes use of the pattern recognition capabilities of the human brain. Just as most users of verbal documents are interested in words or phrases rather than individual letters or even the shapes of parts of certain letters (though close textual criticism would certainly require such concern), MID users are not going to be interested in the individual pixel in a video. 
However, just as the analysis of an abstractor (human or machine) is conducted at the character level (for it is here that the words are constructed and differentiated), so too the analysis of images is based on the pixel. In a sense, while most users are unlikely to seek at the letter or pixel levels, the representation must be conducted at a level so deep and familiar as to go unnoticed by most in order to make sense of the larger constructions -- words, sentences, pictures.

An analogy would be the topographic map. Such a map presents the physical data of a geographic area. One does not actually see trees, water, hills, etc., but one can see contour lines and determine the slope and location of hills and valleys. One can consult a large area map for an overview, then go to a detail map. Or one can simply start looking at detail maps for details without concern for what region one is examining until a desired detail is found.

*representation of documents in general*

Extrapolation of the ISIS moving image project to documents in general may provide an approach to a serious difficulty with indexing and abstracting. The concept of using the physical data, looking for points of discontinuity, avoiding conceptual tagging, and presenting the user with various tools for control and analysis of selected data might be a means of avoiding failures due to the lack of coordination between a user and the representation system of a bibliographical tool.

It is important here to clarify the use of certain terms. "REPRESENTATION" is used here to include both indexing and abstracting, not only to generalize the two terms, but because the two are not distinct. "INDEXING" is taken to be a pointing function, directing a user to a certain part of a collection or document; "ABSTRACTING" is taken to be the essence of a document, providing a means of analysis with a smaller amount of data.
Since different users have differing requirements and, as suggested above, may make different sets of the same representation, the function served determines what is an abstract and what is an index. It is neither necessary nor appropriate to assume that an index must have a certain form or size, or that an abstract must be a short narrative rendering of an original. In the considerations which follow, the term "abstract" has been used because most of the access issues discussed are typically associated with what is commonly termed abstracting. In a system in which the user controls depth of representation and makes functional use of any available representation, "abstract" would be generalized to include both the pointing and the summarizing functions.

*smaller quantity of physical data*

Returning again to the idea of an abstract as a smaller quantity with the virtue and power of a greater, one might say that the smaller quantity is a function of the physical document and that the virtue and power are results of the conceptual document. One could then say that achieving just a smaller quantity is a process presenting little challenge; however, achieving a smaller quantity with the virtue and power of a greater requires a fundamental understanding of the question state and coding/decoding abilities of users.

The physical text, whether a printed document, an audio tape, a photograph, or a computer disc, presents a (typically) stable set of data. This data can be said to be diachronic, remaining stable across time. Any user of the document will see the same pattern of squiggles on paper, or be stimulated by the same patterns of color and brightness from a videotape, or have air waves vibrated in the same manner by an audio compact disc. Interpretations of this data are dependent on time of use and the cognitive models of the user; they are the synchronic attributes of the document. As an extreme example, one might think of picking up a paperback of Homer's _Iliad_.
Allowing for the minor differences between typeset letters and handwritten letters, the squiggles on the page are the same that an Athenian bard might have used in a competition in the second century BCE or a Roman schoolboy might have studied in the third century AD. Today, the typical reaction might well be "It's Greek to me!"; the code is not familiar to most readers. Subtler differences can also be troubling. Movies which seemed compelling two decades ago seem quaint or even boring, both because of changes in physical appearance of dress, homes, cars, etc., and because of stylistic differences. Similarly, numerous books written up to the 1970s, despite their close considerations and compelling arguments, cause hesitations in many readers because of their use of masculine instead of gender-neutral pronouns.

*differences and user determination of concepts*

Bateson posits that information is a "difference that makes a difference." (Bateson, 1979) The first difference can be linked to the physical document and the changes of data it presents at different points. Each and every user, including the person (or machine) who acts as the representing agency, has the same data set and, therefore, the same set of differences. The individual user, however, determines which of those differences make a difference "for this use." For example, reviewers of an early draft of this paper had in front of them the same words in the same order; yet they made comments reflecting significantly different conceptual renderings.

   * ...enjoyed both the clear, lucid style and the contents
   * ...interesting ...but the author's vague style and diversions obscure the point

*abstracts as search tools*

Abstracts constitute a critical tool for users of information systems because they reduce search and evaluation time and, in some instances, serve as adequate responses to information requirements in their own right.
Each abstract accomplishes this by being, once again in Johnson's terms, "a smaller quantity containing the virtue and power of a greater." In the general case, an abstract is a verbal representation of a verbal document, though, of course, other sorts of documents are represented by both verbal and other means.

A representation of any sort of document selects certain elements of the original for highlighting and presentation. (Marr, 1982) The purpose of the representation will likely influence or even necessitate both the method of selection of elements and the mode of presentation. (Hayes, 1993) The representation may or may not be comprised of native elements of the document. For example, a movie might be represented by words such as a card catalog entry or a review in a magazine or it might be represented by a set of images and sounds directly extracted from the original. Additionally, generalization and translation may take place. That is, an article about photographing landscapes, sculpting in glass, and writing poetry might be represented with a generalized term such as "arts." Translation might be writing an English abstract of a Japanese article, or it might be the simplification of technical terminology into lay user terms.

*failures*

While the general concept of abstracts as access tools holds sound, two vexing sorts of failure raise critical questions:

   a retrieved document does not meet expectations raised by the abstract
   appropriate documents are never found because a useful concept was not included in the abstract

Such failures may be addressed by consideration of the nature of representation and by using a digital environment to operationalize a user-centered model of representation. The components of a representation provide a structure for the examination of abstracts:

   what is the purpose?
   what are the native elements of the original?
   what elements are to be highlighted?
   what is the process of selection?
   what are the elements of the representation?
   who makes the decisions?
   are the users aware of the selection process?
   are the users aware of what is excluded?

As Boyce notes: "...abstracting activities are not ends in themselves, and cannot be evaluated as if they were." (Boyce, 1992) Craven speaks to the automated "production of different summaries of a single document to suit various user needs," noting that little work has been done on computer-assisted techniques. (Craven, 1991)

Frequently, representations are generated by an agency other than the author or the user. This means that the selection of important elements and the representation style are established outside the document/user relationship. (O'Connor, 1988) The agency establishes just which characteristics of the documents are to be available for consideration and just what characteristics of the user are acceptable. Clearly, in the majority of cases, there is sufficient overlap between user needs and document representations for user satisfaction. Yet failures do occur. These are not the failures of an abstractor misspelling a word, incorrectly copying a name, or even misunderstanding a document. Rather, these are systemic failures, instances of the representation system not providing what a user requires and, possibly, not making it obvious that this is the case. This results from assuming that the subject of a document is an independent, stable entity, rather than a relationship between a user and a document. (DeMay, 1980; Robertson, 1982)

Before the availability of digital systems, it would have been difficult, if not unthinkable, to wait until a user came to the system to generate the abstract or to customize the abstract to the particular user at each instance of use, though, of course, the good reference encounter is a user-specific representation. "It is neither practical nor possible to index every concept under every possible user approach."
(Yerkey, 1991) Thus, the majority of abstractors have used general rules for selection of document elements based on quantities (e.g., abstract = 1/10th to 1/20th the length of the original) and locations (e.g., select subtitle, first and last sentences; or select the sentences in which the most frequently occurring words appear) to produce single abstracts. These are intended to serve the needs of all potential users of a document regardless of their requirements or their abilities.

Mapping the physical data of the documents and assuming that the conceptual aspects are embodied in the physical data enable the construction of algorithms which can present the user with different, more appropriate:

   selection criteria for the representation
   levels of generality
   levels of penetration into the text

Abstracting systems which present the user with one pre-constructed representation must assume:

   an abstractor will always choose all the concepts in each and every document that would be of interest to any user of the system
   any translation of elements will be accomplished in terms comprehensible and obvious to any user
   any generalizations will follow part/whole or exemplar/class paths which are obvious and useful to any user of the system
   each and every user is aware of the rules of selection and presentation by which the abstract was made

The general problem of failures in representation of documents is addressed by, among others, Blair as subject indeterminacy (Blair, 1986); by Wilson in his discussion of catalog as access mechanism (Wilson, 1983); by Liddy in her discussion of the discourse-level structure of abstracts (Liddy, 1991); and by Weinberg in her consideration of "Why Indexing Fails the Researcher" (Weinberg, 1987). In each of these, the general problem is one of disjunction between an agency's method of representing documents and the requirements of the users.
One approach to resolving these difficulties is simply to avoid using representations as access tools completely. In fact, this is an approach often used by research scholars. (O'Connor, 1993) Especially in their case, there is little likelihood that representations will be adequate (how is one to ask for "those works which will stimulate new knowledge?"). Fortunately, research scholars are (at times) in the position of having both the motivation and the resources of time and ability to engage in direct contact with the collection -- browsing. Browsing can be seen as any of a number of idiosyncratic search strategies in which the user makes up the rules of representation of collection and of each individual document and the searcher's knowledge state. Browsing can also be seen as representation without an intermediary and searching without a specified target. Important as browsing is and as logical a response as it may be to resolving the inadequacies of system-generated representations, it is clearly not useful for the scholar scanning journal contents, finding large numbers of citations on a database utility, or even trying to make a decision about documents found during browsing. Nor is browsing the answer for anybody with a clear idea of a useful topic or set of general topic areas. Clearly, there is often a need for search tools. Abstracts reduce the time required to distinguish between candidate items located by any means or in determining if a seemingly useful document warrants deeper attention; they also make it possible to stay abreast of developments without lengthy engagement with an original document. They warrant our attention precisely because they are such useful tools and, conversely, because their ubiquity and their very utility may mask any shortcomings resulting from systems of representation which do not adequately account for idiosyncratic needs of each user. 
Insofar as a user resembles the profile assumed by the abstractor and seeks material within the topical bounds obvious to the abstractor, there is a high likelihood of success -- of the abstractor achieving virtue and power relevant to the user. To whatever degree this assumed overlap or common ground does not hold, it is necessary to consider the distinction between the physical and the conceptual documents. The operational consequence of such a distinction is to use algorithms which pinpoint changes in data sets, present the points of difference and perhaps the immediate neighbor data sets, and leave the conceptualizing to the user. Just what is meant by points of difference in the data will depend on the medium of the document. The presentation may be enhanced by graphical representation of quantities and relationships of data elements. Such a system for representation of documents could present a default set of points of difference which would be analogous to an abstractor making a decision about concepts; additionally, it could allow the user to set the threshold. In a system which depended on word frequency counts, the system could set a default level of, for example, only words which are found four or more times; the user might be quite satisfied with that, or might wish to see only a very few terms and so set the level at ten, or might wish to see everything not on a stop list and set the level at one. (Maron, 1982). In the ISIS system a user can specify any percentage of change in image size or image location on the screen from 1% upward. Images can also be sought by percentage of total frame comprised of a particular color. Ideally, a user could pick size, location, color, texture, indeed, elements in any combination and any percentage. 
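The word-frequency example above (a default level of four, adjustable by the user up to ten or down to one) can be sketched as follows. This is an illustration only: the function name, the tiny stop list, and the sample text are hypothetical and are not drawn from any system described in this paper.

```python
import re
from collections import Counter

# Assumption: a deliberately tiny stop list, for illustration only.
STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "the", "to"})


def representative_terms(text, min_count=4, stop_words=STOP_WORDS):
    """Return the words occurring at least `min_count` times.

    `min_count` plays the role of the threshold: the system supplies a
    default (four), and the user may raise or lower it.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return {word: n for word, n in counts.items() if n >= min_count}


# Hypothetical document text.
text = "the runner and the camera " * 4 + "the road"

default_view = representative_terms(text)                # system default
sparse_view = representative_terms(text, min_count=10)   # only a very few terms
full_view = representative_terms(text, min_count=1)      # everything not stopped
```

The same physical data supports all three views; only the user-chosen threshold differs.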
Using just the size of the primary object in a frame and the representation mode of one frame (1/30th sec) on each side of a transition boundary (last frame of one image set and first frame of subsequent image set) yields significant reduction in amount of data. From the ISIS project with the Greenspan (doc1) and the Moscow Television (doc2) documents data reduction might look like this: at a threshold of 15%, the mean number of frames per image data set is:

   doc1: 30 frames/sec * 4.99 sec = 150 (approx.)
   doc2: 30 frames/sec * 14.98 sec = 450 (approx.)

By definition, an image data set is a stream of physical data in which there is no change equal to or greater than the stated threshold (in this case 15%). The points of change can, then, be represented by one frame (say, the last one for convenience, though any would do) of the preceding image data set, and one from the current image data set. Thus, each point of change is represented by two frames and each image data set can be situated by three frames. Therefore, the amount of data to be examined in the two test documents is reduced significantly:

   doc1: 2 frames, not 150; 1.3% of total
   doc2: 2 frames, not 450; 0.4% of total

Of course, the actual reduction of time required to examine the reduced data and make any determinations will be user dependent. Issues of type of display -- how many images on the screen at once; screen resolution; cut & paste abilities; graphical representation of the mean image set lengths from which the representative images are derived -- may also contribute to the actual time of engagement.
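The data-reduction arithmetic can be checked with a short sketch. The function name is hypothetical; the inputs are the mean image set lengths from table 1 and the frame rate of 30 frames per second. The sketch works with the unrounded frame counts (149.7 and 449.4), which the text rounds to 150 and 450.

```python
def reduction_percent(mean_seconds, frame_rate=30, frames_kept=2):
    """Percentage of a mean image set's frames that remain to be examined."""
    frames_per_set = mean_seconds * frame_rate
    return 100 * frames_kept / frames_per_set


doc1_pct = round(reduction_percent(4.99), 1)    # Greenspan film
doc2_pct = round(reduction_percent(14.98), 1)   # Moscow broadcast
print(f"doc1: {doc1_pct}% of total; doc2: {doc2_pct}% of total")
```

Raising `frames_kept` (for instance, to show a few neighboring frames on each side of a boundary) shows how quickly the reduction erodes as the representation is enriched.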
*CONCLUSION*

One might generalize the results of the ISIS project and the subsequent considerations of representation by redefining the access tool "abstract":

   set of significant differences between subsequent units of meaning which is less than the total number of units of meaning

   "significant" is here taken to be user determination of which data is meaningful and what algorithms for data reduction are appropriate

One could add to this a recognition that not all users will feel a need to describe themselves in terms different from system assumptions (or that abstractors often do a good job of user analysis and do construct their works according to the profile of typical use). That is, one could say that the abstracts typically constructed at present are not outside the representation model, rather they are a system-defined default set of element selection procedures and coding methods. So the preceding consideration of "abstract" should be modified to account for current practice:

   the representation system may present a default set of reduced data, but will comply with the definition of a representation only if it makes known to the users the system of selection and presentation

Suggesting that units of meaning and significance depend on individual users' concerns, their idiosyncratic notions of which differences will make a difference, also suggests an administrative liberation. Dedicating system resources to identification of points of difference in the data, methods of presentation of resultant smaller quantities, and the evaluation of user satisfaction does step away from the traditional construction of secondary documents. However, it relieves the system of having to make the "right" determination of important concepts and the "right" method of presentation.

*REFERENCES*

Bateson, G. (1979). _Mind and nature: A Necessary unity_. New York: E. P. Dutton.

Blair, D. C. (1986). Indeterminacy in the subject access to documents.
Information Processing and Management, 22 (2), 229-241.
Boyce, B. (1992). Review of the book _Indexing and Abstracting in Theory and Practice_, by W. F. Lancaster. Journal of the American Society for Information Science, 41, 456.
DeMay, M. (1980). The relevance of the cognitive paradigm for information science. In: Harbo, O. & L. Kajberg (Eds.), _Theory and application of information research: Proceedings of the 2nd International Research Forum on Information Science_. London: Mansell.
Goodwill Games. (1986). Atlanta, GA: Turner Broadcasting.
Greenspan, B. (1986). _Sixteen days of glory_ [Film]. Burbank, CA: Paramount Home Video.
Hayes, R. M. (1993). Measurement of information. Information Processing and Management, 29 (1), 1-11.
Hjorland, B. (1992). Concept of "subject" in information science. Journal of Documentation, 48 (2), 172-200.
Liddy, E. (1991). The Discourse-level structure of empirical abstracts: An Exploratory study. Information Processing & Management, 27 (1), 55-81.
Maron, M. E. (1977). On indexing, retrieval and the meaning of about. Journal of the American Society for Information Science, 28 (1), 38-43.
Marr, D. (1982). _Vision_. San Francisco: Freeman.
Nichols, W. (1981). _Ideology of the image: Social representation in the cinema and other media_. Bloomington: Indiana University Press, p. 13.
Novitz, D. (1977). _Pictures and their use in communication_. (pp 86-93). The Hague: Martinus Nijhoff.
O'Connor, B. (1985). Access to moving image documents: background concepts and proposals for surrogates for moving image documents. Journal of Documentation, 4, 212-214.
O'Connor, B. (1988). Fostering creativity: Enhancing the browsing environment. International Journal of Information Management, 8, 203-210.
O'Connor, B. (1991). Selecting key frames of moving image documents: A Digital environment for analysis and navigation. Microcomputers for Information Management, 8 (2), 119-133.
O'Connor, B. (1993). Browsing: A Framework for seeking functional information.
KNOWLEDGE, 15 (2), 211-231.
Overhage, C. F. J. & Harman, R. J. (Eds.). (1965). _INTREX: Report of a planning conference on information transfer experiments_. Cambridge, MA: MIT Press.
Paice, C. (1990). Constructing literature abstracts by computer: Techniques and prospects. Information Processing & Management, 26 (1), 171-186.
Robertson, S. E., Maron, M. E., & Cooper, W. S. (1982). Probability of relevance: A Unification of two models for document retrieval. Information Technology: Research and Development, 1 (1).
Rorvig, M. E. (1993). A Method for automatically abstracting visual documents. Journal of the American Society for Information Science, 44 (1), 40-56.
Simpson, J. A. & Weiner, E. (Eds.). (1989). _Oxford English dictionary_ (2nd ed., Vols. 1-20). Oxford: Clarendon Press.
Weinberg, B. H. (1987). Why indexing fails the researcher. Proceedings of the 50th Annual Meeting of the American Society for Information Science, 24, 241-244.
Yamaguchi, K. & Kunii, T. (1982). PICCOLO logic for a picture database computer and its implementation. IEEE Transactions on Computers, C-31 (10), 983-996.
Yerkey, A. N. (1991). Review of the book _Introduction to Indexing and Abstracting_ (2nd ed.) by C. B. Cleveland and A. D. Cleveland. Information Processing & Management, 27 (4), 391-392.