+ Page 5 +

+---------------------------------------------------------------------
| Stigleman, Sue. "Text Management Software." Public-Access
| Computer Systems Review 1, no. 1 (1990): 5-22.
+---------------------------------------------------------------------

I. Introduction

Library users have access to an abundance of electronic text. Hundreds of electronic databases can be searched and information copied from them to a user's microcomputer. [1] Word processors are everywhere, being used to create notes, articles, and books, and to transfer documents such as letters and journals into electronic form. Scanners can easily copy text from print to disk.

A rapidly growing collection of software is now available to help manage electronic text. This paper presents a taxonomy of the software designed for retrieving and manipulating text. Text management software can be divided into five categories: text retrieval, text database managers, bibliography formatting, hypertext, and text analysis. [2] The paper concludes with a discussion of the possible roles that libraries and librarians can play in fostering the utilization of this software by their users.

A variety of names appear in the literature to describe the different categories of text management software. To help translate between this article and other articles or advertising literature, additional names are given for each category at the end of the section that discusses that category. A few representative microcomputer programs are also listed for each category.

One category may be conspicuous by its absence from the list above. Personal information managers (PIMs) have gotten a lot of press in the last few years, beginning with the release of Lotus' Agenda. Initially, PIMs seemed to be a new category of text management software. However, a closer look at the text handling of PIMs reveals that it falls into three types: text retrieval, text database management, and hypertext, three of the five categories above.
PIMs' uniqueness lies not in their text handling, but in the integration of text management with one or more of the following: calendaring, outlining, client management, personal project management, or desktop organizing.

+ Page 6 +

II. Why Text Management Software? Why Not dBASE?

While it is true that text can be stored and manipulated in various types of software, text management software is specifically designed to accommodate some of the particular characteristics of text.

First, text has variable length values. Journal titles in citations can vary from short (Gut) to long (Transactions of the Section on Obstetrics, Gynecology and Abdominal Surgery of the American Medical Association). One oral history transcript may be 10 pages, another 50. A program which uses fixed length storage will force a user either to truncate long pieces of text or waste disk space on short ones. Text management software typically uses variable length storage.

Second, text often has repeating values. A typical citation has multiple authors and multiple keywords. Research notes may each have multiple keywords. Generally, these authors or keywords should be treated equally in searching. Most text management software supports repeating values.

Third, text files can be large. Conventional (i.e., non-text) file or database management programs often expect text to be short and distinct, such as part names or addresses. However, text as it is normally written or spoken is far from compact, which can result in files that would burst a program like dBASE at the seams. Text managers typically have large size limits, and are beginning to add support for media such as CD-ROMs, which can be used to store large volumes of text.

Fourth, citations, notes, letters, transcripts, and other text may be in a variety of languages. Some text managers provide extensive support for a variety of foreign language alphabets.

Fifth, text has an intricacy and complexity which places great demands on software.
Text is filled with synonyms and variations in capitalization, spelling, and word forms. The searching features in text management software are more suited to text than those found in other types of software.

Finally, searching is the heart of text management software. Before getting into the taxonomy of text managers, I'd like to give a fast overview of some of the searching features which can be found in various text managers.

+ Page 7 +

Text Management Software's Searching Capabilities

Text management software can employ a variety of term searching techniques:

1. Word or exact phrase searching.

2. Truncation (right, left, and internal).

3. Case insensitivity (often with case sensitivity as an option in a particular search).

4. Proximity searching: specifying how close words are to each other.

5. Field specification: in software that divides information into fields, being able to specify which field(s) the search term should appear in.

6. Boolean operators (AND, OR, and NOT).

7. Parentheses and nesting of Boolean operators.

Several system capabilities can save the user time:

1. Building and manipulating multiple search statements.

2. Saving searches for later reuse.

3. Hedges or macros: storing multiple words which can be used in a search by entering the name of the hedge or macro.

4. Exploding sections of a hierarchical thesaurus.

A variety of methods can be used to increase searching consistency:

1. Use of a thesaurus for data entry, editing, searching.

2. Data validation when data is input.

3. Mapping from abbreviations or codes to full terms.

+ Page 8 +

These searching features are familiar to users of the typical bibliographic and nonbibliographic text databases commonly used in libraries. However, underlying these searching features are certain assumptions:

1. The user knows what words are used in the text.

2. The user knows how to spell.

3. The user knows how to type.

In text searching, these assumptions are often not true.
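Several of the term searching techniques listed in this section (case insensitivity, right truncation, Boolean operators with nesting, and proximity) can be illustrated with a short Python sketch. It is not taken from any program named in this article; the sample records, the function names, and the use of a trailing "*" to mark truncation are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words (case-insensitive searching)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_term(words, term):
    """Match one term; a trailing '*' (an assumed convention) requests right truncation."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return any(w.startswith(stem) for w in words)
    return term.lower() in words

def within(words, first, second, distance):
    """Proximity: True if the two words occur within `distance` words of each other."""
    pos_a = [i for i, w in enumerate(words) if w == first.lower()]
    pos_b = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(a - b) <= distance for a in pos_a for b in pos_b)

# Hypothetical sample records, invented for illustration.
RECORDS = {
    1: "Minutes: holiday hours for the library were approved.",
    2: "Travel requests must be filed before the holidays.",
    3: "The library will extend its hours during exams.",
}

def search(predicate):
    """Return the sorted ids of records whose word list satisfies the predicate."""
    return sorted(rid for rid, text in RECORDS.items() if predicate(tokenize(text)))

# Boolean AND with right truncation: holiday* AND hours
and_hits = search(lambda w: matches_term(w, "holiday*") and matches_term(w, "hours"))

# Nested Boolean operators: (travel OR hours) NOT library
not_hits = search(
    lambda w: (matches_term(w, "travel") or matches_term(w, "hours"))
    and not matches_term(w, "library")
)

# Proximity: "library" within 4 words of "hours"
near_hits = search(lambda w: within(w, "library", "hours", 4))

print(and_hits, not_hits, near_hits)
```

Each search is written as an ordinary Python expression, so parentheses and nesting come for free; a real text manager would instead parse a search statement typed by the user.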
While some searches may be for known items (e.g., a particular citation, note, or paragraph), more typically the search is for an idea, which may be expressed in the text in a variety of different words and word forms. To help users find the text they want, some programs are adding more flexible searching features, such as the following:

1. Spelling checkers.

2. Automatic plurals.

3. Sound-alike searching (useful for finding spelling variations, particularly in names).

4. Fuzzy searching: searching for variations in a word or phrase. For example, the search "full text database" could retrieve "full text data file," "free text data," and "full text searching."

5. Weighted searching: assigning weights to each search term to indicate its relative importance.

6. Ranked output: displaying search results in order of relevancy, rather than the typical alphabetical or last-in-first-out orders. There are various ways to determine relevancy, such as the number of times the search term(s) appear in the text or the presence of the search term(s) in titles or section headings.

7. Profile: displaying a profile of the most common words in a document found using other searching techniques, thereby suggesting additional search terms to consider.

8. Similarity searching: "this record/document is what I want -- go find others like it."

+ Page 9 +

Unfortunately, no single software program in any of the five categories offers all of these searching features. However, most commercially available text management programs have at least several of them, and the overall trend in all of the categories is a steady increase in searching power.

III. Text Retrieval Software

Text retrieval software searches files to find ones that match a search request. For example, text retrieval software can search the minutes of meetings that were created with a word processor, and identify all of the minutes which contain a particular word or phrase, such as "holiday hours" or "travel."
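The request just described, finding every set of minutes that contains a given phrase, can be sketched in a few lines of Python. This is a minimal illustration and not the method of any particular product: the file names and contents are hypothetical, the "files" are held in memory rather than read from disk, and every file is scanned in full on each search.

```python
import re

# Hypothetical stand-ins for word processor files; a real program would read them from disk.
FILES = {
    "minutes_jan.txt": "The committee discussed holiday hours and staffing.",
    "minutes_feb.txt": "Travel funds were approved for two conferences.",
    "minutes_mar.txt": "No change to Holiday Hours was proposed.",
}

def find_files(files, phrase):
    """Scan every file for the phrase (ignoring case) and return the matching names."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return sorted(name for name, text in files.items() if pattern.search(text))

def highlight(text, phrase, marker="*"):
    """Surround each occurrence of the phrase with a marker so it stands out when displayed."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return pattern.sub(lambda m: marker + m.group(0) + marker, text)

hits = find_files(FILES, "holiday hours")
shown = highlight(FILES["minutes_mar.txt"], "holiday hours")

print(hits)
print(shown)
```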
Most text retrieval programs can then display the file(s) for browsing, highlighting the terms in the search request. A common feature is the ability to copy segments of the files to create a new file (a feature which led Burton Alperson to call this software "search and squirt" software).

Text retrieval software comes in two general types: those that create indexes and those that don't. Programs that create indexes require additional time for indexing and additional disk space for the indexes, but search much more quickly. Non-indexing programs don't require the additional indexing time or space, but search more slowly because the program has to "read" each file every time it does a search. The most common type of index is the inverted index, although some programs use special proprietary methods to create smaller, space-saving indexes.

Another way of dividing this software category is by the format of the files to be searched. Most text retrieval software can search files in common word processor formats, while the less powerful programs can search only through ASCII text. Some text retrieval programs are now branching out, searching through database records, spreadsheets, and computer programs.

Text Retrieval Software Trends

One of the most natural roles for text retrieval software is as a word processor "accessory." It will be interesting to see whether word processors evolve more sophisticated text search and retrieval powers of their own. For example, WordPerfect offers a "word search" command, which does have simple Boolean capability. However, displaying the text requires retrieving each file and then using the "search" command to find the desired character strings.

+ Page 10 +

Some of the newest text retrieval programs not only provide browsing of files, they also operate as shells to call up the application that created the file.

Uses of Text Retrieval Software

Text retrieval software can be used for numerous applications.
Since files stored on computer disks proliferate more quickly than the files in an average filing cabinet, text retrieval programs are very useful utilities for managing disks. Text retrieval programs can also enhance the use of administrative records such as manuals, minutes, and letters by making it easier to find particular topics. Other sample uses include managing the avalanches of paper created for legal trials, studying transcripts of interviews, analyzing collections of historical letters, and organizing reams of material downloaded from online databases.

Other Names for Text Retrieval Software

Text retrieval software can be called:

   Disk hunting software
   Full-text search and retrieval software
   Full-text retrieval software
   Indexers
   Indexing software
   Indexing and retrieval software
   Search and squirt software
   Search software
   Textual information management systems (TIMS)
   Text search software

Representative Text Retrieval Software Programs

Example text retrieval software programs include Gofer, Magellan, Text Collector, Total Recall, and ZyIndex.

+ Page 11 +

IV. Text Database Managers

Text database managers are designed for creating and searching databases of textual material (sometimes called textbases or lexical databases). The database can be created either from the keyboard, using the data entry features of the text database manager, or by importing text created in other programs or downloaded from other databases. Searches are performed on records in the database, typically only on one database at a time. Most text database managers can display the records retrieved by a search, highlighting the terms in the search request.

Text database managers can be subdivided into free-form text database managers, which place no restrictions on the format of the text, and programs that require text to be formatted in a particular way, generally into fields. Some programs support a mix of formatted and unformatted text.

Text database managers come in a variety of sizes.
At the low end are the note programs, designed to substitute for the yellow stickies plastered on a person's desk, telephone, and door. The note variety of text database manager typically will hold fairly small amounts of text, and is often memory resident, allowing the program to be popped up whenever there is a sudden need to read or write a note. At the other end are the industrial-strength text database managers which can handle very large databases, and which are typically not memory resident.

The uses of text database managers are infinite. They can be used for databases of reminders, research notes, citations, and case studies. A text database can be created from letters, interview transcripts, legal notes and transcripts, laboratory notes, diaries, or reports, to name a few. The database can be used to organize notes for writing, for faster retrieval of desired texts, for studying and analyzing the text itself, or for creating indexes to other collections such as reprint files, record or photograph collections, and laboratory specimens.

Text database managers in some respects are quite similar to text retrieval software since both search text and can usually display retrieved text for browsing. However, text retrieval software searches files that were created by another program, typically a word processor, while text database managers search through text which has been stored in a text database. Text retrievers typically have no data entry module--they are primarily searching machines. Text database managers, on the other hand, have data entry and editing modules for creating and maintaining the text database.

+ Page 12 +

For many applications, either a text retrieval program or a text database manager could be used. However, when the individual text items are very small (e.g., citations), using a text database manager to combine them into a text database makes more sense than cluttering up a disk with hundreds of tiny files.
On the other hand, a text retrieval program would be preferred when the text files have a primary purpose other than searching. For example, my department creates numerous handouts which we use in the classes we teach. If the National Library of Medicine decided to stop publishing Index Medicus, our major journal index, a text retrieval program could tell us which handouts had the phrase "Index Medicus" in them and would need to be revised. Using a text database manager and merging all of these handouts into a textbase would have the disadvantage of stripping out all of the printer formatting codes, making it more difficult to produce the printed handouts.

Text Database Managers Trends

Many non-text file and database management programs are slowly becoming more friendly to text, which may eventually reduce the need for specialized text database management software. At the same time, some text database managers are adding features typically associated with file and database managers, such as security and programming languages. The line between the text and non-text file and database managers may eventually disappear.

Other Names for Text Database Managers Software

Text database managers software can be called:

   Archivers
   Full-text retrieval software
   Indexing software
   Information storage and retrieval software
   Information management software
   Lexical database management software
   Note managers
   Text retrieval software
   Text-oriented file management software
   Text-based database managers
   Text-based management systems (TBMS)
   Text-oriented database managers

+ Page 13 +

Representative Text Database Managers Programs

Example text database managers programs include Agenda, askSam, FYI 3000, INMAGIC, IZE, Marcon, Memory Mate, Nota Bene, Notebook II, SquareNote, and Textbank.

V. Bibliography Formatting Software

Bibliography formatting software lets you take a record that looks like this:

   AU Reid DC//Burnham RS//Saboe LA//Kushner SF
   TI Lower extremity flexibility patterns in classical ballet dancers and their correlation to lateral hip and knee injuries
   JR Am J Sports Med
   YR 1987
   VO 14
   IS 4
   PG 347-52

and turn it into a citation that looks like this:

   Reid DC, Burnham RS, Saboe LA and Kushner SF. 1987. "Lower extremity flexibility patterns in classical ballet dancers and their correlation to lateral hip and knee injuries." Am J Sports Med 14(4):347-52.

and then easily turn it into a citation that looks like this:

   Reid DC; Burnham RS; Saboe LA; Kushner SF. Lower extremity flexibility patterns in classical ballet dancers and their correlation to lateral hip and knee injuries. Am J Sports Med; 1987; 14(4): 347-52.

Information from a citation needs to be entered only once, and it can then be formatted and reformatted into a variety of citation styles. Many bibliography formatting programs also can automatically assemble a bibliography from the references cited in a word-processed manuscript.

+ Page 14 +

The classic use of bibliography formatters, besides formatting printed bibliographies, is to create an index to the contents of a personal or departmental filing cabinet or bookcase. The programs usually have space for storing notes for each citation, sometimes quite extensive ones. At the Health Sciences Library, we have used a bibliography formatter to create a database of sources of health statistics information, a common but particularly tricky area of reference work.

Bibliography formatters can be regarded as text database managers which are set up to handle a particular type of text database--the citation database. Record formats for various types of citations are already defined, as are output formats for properly arranging the pieces of the citations into various citation styles.
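The idea of entering a citation once and formatting it many ways can be sketched in Python using the sample record from this section. This is only an illustration under stated assumptions: the one-field-per-line layout, the "//" separator for repeating authors, and the two function names are mine and do not describe how any particular bibliography formatter works.

```python
# The sample record from this section, assumed to be stored one tagged field per line.
RECORD = """\
AU Reid DC//Burnham RS//Saboe LA//Kushner SF
TI Lower extremity flexibility patterns in classical ballet dancers and their correlation to lateral hip and knee injuries
JR Am J Sports Med
YR 1987
VO 14
IS 4
PG 347-52
"""

def parse_record(text):
    """Split each line into a two-letter tag and its value; AU is a repeating field."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, value = line.split(" ", 1)
        fields[tag] = value.split("//") if tag == "AU" else value
    return fields

def author_date_style(f):
    """First output style: authors joined by commas and 'and', year after the authors."""
    authors = f["AU"]
    if len(authors) > 1:
        names = ", ".join(authors[:-1]) + " and " + authors[-1]
    else:
        names = authors[0]
    return f'{names}. {f["YR"]}. "{f["TI"]}." {f["JR"]} {f["VO"]}({f["IS"]}):{f["PG"]}.'

def semicolon_style(f):
    """Second output style: authors joined by semicolons, year after the journal."""
    names = "; ".join(f["AU"])
    return f'{names}. {f["TI"]}. {f["JR"]}; {f["YR"]}; {f["VO"]}({f["IS"]}): {f["PG"]}.'

fields = parse_record(RECORD)
first = author_date_style(fields)
second = semicolon_style(fields)

print(first)
print(second)
```

Because the record is parsed into fields only once, adding another citation style means writing one more small formatting function rather than retyping the citation.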
Text database managers can be used instead of bibliography formatters to set up databases of citations. The burden is usually on the user to design the record structures and citation formats, although some text database managers now come with bibliographic features. There also are some third party bibliography formatting add-ons for particular text database managers.

Other Names for Bibliography Formatting Software

Bibliography formatting software can be called:

   Bibliographic file management programs
   Bibliographic software
   Bibliography generators
   Citation managers
   Filing software
   Indexing software
   Literature retrieval systems
   Reprint software

Representative Bibliography Formatting Software Programs

Example bibliography formatting software programs include Bookends, Pro-Cite, Reference Manager, RefMaker, and RefMenu.

+ Page 15 +

VI. Hypertext Software

Hypertext software stores text in pieces called nodes, which are connected by links. The links allow movement from one node to another, following a conceptual path. Hypertext can be used to embed additional text, such as a glossary or commentary, into an existing text. It can also be used to link related parts of a single text or multiple texts, providing a visual cue to the reader that there is related material at the other end of the link.

The node/link structure of hypertext makes it an ideal platform for developing instructional software, a rapidly growing area of hypertext use. The user of the instructional program can travel through the program following links, rather than being forced to follow a single path from beginning to end. Hypertext can also be used for storing texts, such as manuals or encyclopedias, with links built in for users, or in an open system where users can add links for subsequent users.

Hypertext Software Trends

A major trend in hypertext use is the addition of "hypertext" or "links" to other software programs, such as text retrieval or text database managers.
Hypertext may become a feature of various categories of software, rather than a category of its own.

Other Names for Hypertext Software

If the software allows graphics, images, motion pictures, sound, or other media to be incorporated in the nodes, it is called "hypermedia."

Representative Hypertext Software Programs

Example hypertext software programs include Guide, Hypercard, Hyperpad, Hyperties, KnowledgePro, PC-Hypertext, and Textpro.

+ Page 16 +

VII. Text Analysis Software

Text analysis software is a loose collection of software that facilitates analyzing text by performing one or more of the following operations: concordancing, coding, or statistical analysis.

Concordancing is the generation of lists of the words used in a text, accompanied by the location of the word and often some surrounding text. A concordance program offers more flexibility than a printed concordance. Users can specify what should be "concorded" (e.g., all words, all nouns, or all prefixes) and also context for the words (e.g., only a location or the surrounding sentence). More sophisticated programs allow accompanying translations or annotations. Some examples of this type of "interlinear text" are phonetic transcriptions, grammatical categories, intonation, and rhythm.

Coding is the assignment of codes to specific sections of the text to allow retrieval of those sections of text. Coding is similar to assigning keywords, except that each coded segment has a specific beginning and ending point, and codes can be overlapped and even nested. A search on "marriage" might retrieve a two paragraph coded segment in an oral history transcript, while a search for "children" would retrieve only the two sentences within those two paragraphs which were coded for children.

Statistical analysis is counting various text components, such as the number of unique words, the number of times words appear, or the distribution of words in parts of the text.
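The three operations described above can be sketched together in a short Python example. It is a simplified illustration, not the design of any program named in this article: the sample text is invented, locations are character offsets, the context is a fixed number of characters on each side, and the function and variable names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample text, invented for illustration.
TEXT = "We married in June. The children came later. The children loved the farm."

def words_with_positions(text):
    """Return (lowercased word, character offset) pairs for every word in the text."""
    return [(m.group(0).lower(), m.start()) for m in re.finditer(r"[A-Za-z']+", text)]

def concordance(text, keyword, width=15):
    """Concordancing: list each occurrence of keyword with its offset and context."""
    entries = []
    for word, start in words_with_positions(text):
        if word == keyword.lower():
            end = start + len(word)
            left = text[max(0, start - width):start]
            right = text[end:end + width]
            entries.append((start, left + "[" + text[start:end] + "]" + right))
    return entries

def statistics(text):
    """Statistical analysis: total words, unique words, and a count for each word."""
    counts = Counter(word for word, _ in words_with_positions(text))
    return {"total": sum(counts.values()), "unique": len(counts), "counts": counts}

# Coding: each entry is (code, start offset, end offset). Segments may overlap or nest;
# here the "children" segment sits inside the "marriage" segment.
CODED_SEGMENTS = [
    ("marriage", 0, len(TEXT)),
    ("children", TEXT.index("The children"), len(TEXT)),
]

def retrieve(code):
    """Return the text of every segment carrying the given code."""
    return [TEXT[start:end] for name, start, end in CODED_SEGMENTS if name == code]

entries = concordance(TEXT, "children")
stats = statistics(TEXT)

for offset, context in entries:
    print(offset, context)
print(stats["total"], stats["unique"], stats["counts"]["the"])
print(retrieve("children"))
```

Storing each code with explicit start and end offsets is what allows segments to nest, which a simple list of keywords attached to a whole record cannot express.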
Two major uses for text analysis software are literary and linguistic analysis of text. Text analysis software can be used to examine themes in an author's works, to determine authorship of texts of unknown origin, or to analyze the grammatical structure of a language. Fields such as history, anthropology, sociology, psychology, nursing, education, and journalism use text analysis to discover themes in interview transcripts, a process called qualitative or content analysis.

+ Page 17 +

Text Analysis Software Trends

Concordancing programs serve a unique function and will probably continue to exist, particularly the ones designed for interlinear text manipulation. However, the future of coding and statistical analysis software is less certain. Unfortunately, coding programs, while providing retrieval of precisely defined segments of text, are often primitive in other respects. One popular coding program, for example, doesn't permit editing of the codes. To change one code, the entire text must be coded again. For this reason, text database managers or text retrieval software is sometimes used instead, even though keywords can't be assigned as precisely. If text database managers or text retrieval software added more sophisticated coding, particularly overlapped and nested coding, the rather primitive coding programs might disappear. Similarly, the addition of statistical analysis features to text database managers and text retrieval software might lessen the need for separate programs to do this analysis.

Other Names for Text Analysis Software

Text analysis software can be called:

   Concordance software
   Content analysis software
   Key-Word-in-Context (KWIC) programs
   Key-Word-Out-of-Context (KWOC) programs
   Qualitative analysis software

Representative Text Analysis Software Programs

Example text analysis software programs include the Ethnograph, Gator, IT, KWIC-MAGIC, KWICMERGE, Lbase, Micro-OCP, MTAS, TEXTPACK, and Wordcruncher.

+ Page 18 +

VIII. Roles for Libraries

Bibliography formatters and text database managers, the two types of text software that are particularly useful for citations, have found a natural home in libraries. Storing and retrieving citations has been the business of libraries for a long, long time. It is a fundamental area of expertise for most librarians, and users often think of the library as a natural place to ask for help.

Many libraries actively support bibliography formatting software. [3] The workshops these libraries offer on reprint file management now include (or have been totally converted to) computerized reprint file management. In preparing for the workshops, librarians evaluate software programs, enabling them to serve as consultants for individuals or groups who want advice on selecting or using a program. Expertise in the programs is also developed by using them within the library to maintain local databases or to produce bibliographies.

In a similar vein, some libraries evaluate and teach text database managers as substitutes for the more specialized (and usually more expensive) bibliography formatting software. (Some also give advice on how to use non-text database systems for storing text for those users who already use a non-text database program and don't want to invest time or money in an additional program.)

Hypertext has also found an enthusiastic home in libraries, although most of the activity seems to be in the use of hypertext to develop library-related CAI, rather than fostering its use for text storage and retrieval. [4]

Compared to the support offered for computerized citation files, there has been little formal activity in libraries to support non-citation text storage, retrieval, and analysis. However, interest in expanding into this area is implicit in the renaming of some bibliographic instruction programs to information management education.
Most of the scholar's workstation and the "library of the future" projects also go beyond citation information into accessing and manipulating full text of various kinds. Certainly, libraries' support for citation software serves as a good model for some aspects of what they can do: education, evaluation of software, and consultation on selection and use of software.

+ Page 19 +

Full-text storage will be a little more of a stretch for libraries than support for citations, although librarians are well aware of some of the pitfalls in full-text searching. (Users can be astonishingly naive about the number of ways a single concept can be expressed, spelled, or punctuated.) Developing the necessary expertise with full-text software will not only require taking advantage of ways to use it in our own work, but also increasing our understanding of textual research methods used by scholars.

Text analysis in particular is not an area of expertise for most librarians, and I haven't heard of any libraries studying or supporting this software. (At UNC-CH, the Institute for Research in Social Science has assumed responsibility for evaluating, promoting, and educating users in text analysis software.)

There is also a strong need for assisting with data transfer. Moving text from one source to another is far from being a seamless process. Even when translator or importing programs are available to "automatically" transfer text into particular software programs, the user must be careful to use particular print formats when copying the text to disk and must often do some tedious manual editing of the resulting file. Librarians may find themselves (dare I say it) helping with the development of standardized formats for text data interchange.

Conclusion

The various types of text management software are particularly suited for searching text, and each type has a particular strength. For searching through files created by other applications, text retrieval software is used.
Text database managers are used to build and search databases of text, ranging from small notes to collections of an author's writings. Bibliography formatters manage databases of citations and format citations into various styles. Building links between pieces of text is the strength of hypertext software. And finally, text analysis software generates online concordances, does coding of documents, and performs statistical analysis of text.

+ Page 20 +

Increasingly, libraries are teaching users about text management software, and they are assisting users in employing this software. There are a number of practical issues which will need to be resolved for libraries interested in moving farther into supporting and promoting text management software. Hardware and software must be acquired and staff need time to explore and learn, all during lean financial times for most libraries.

Many libraries are already struggling to meet the challenge of educating large numbers of people to search CD-ROM databases and online catalogs. However, the presence of those databases and catalogs in libraries provides librarians with an opportunity to demonstrate their expertise in citation management. It also opens a natural door into the broader world of text management.

To help those who want to explore, I've attempted to provide a road map through the rapidly growing world of software tools for storing, retrieving, and manipulating electronic text.

Notes

1. Whether information *may* be copied from a particular electronic database is of course an important issue, but a discussion of copyright of electronic media is beyond the scope of this paper.

2. The articles by Alperson, Badgett, Rupley, and Tenopir are useful overviews of the whole area of text management. Conklin's article is one of the classic overviews of hypertext. Matzkin and Puglia describe text database managers, while Melymuka describes text retrieval software.
Angus and Walkenbach attempt to make sense of the chaotic world of PIMs. I found no good overview of text analysis; the articles by Simons, Fetters, and Giordano are illustrations of particular projects and software programs.

3. Articles by Wanat and Wood describe two libraries' programs for citation management. For members of the Library Orientation Exchange (LOEX), a request for material on reprint filing will result in a huge envelope of handouts developed by numerous libraries. EDUCOM's recently published book, Campus Strategies for Libraries and Electronic Information, is reportedly an excellent source of information on roles of libraries in supporting bibliography formatting and other kinds of text management software.

4. For further information on use of hypertext in libraries, see the discussion in the Public-Access Computer Systems Forum, a computer conference on BITNET (PACS-L@UHUPVM1).

+ Page 21 +

Bibliography

Alperson, Burton L. "Order Out of Chaos: The RIPS Are Here." Andrew Seybold's OUTLOOK on Professional Computing 6 (March 28, 1988): 1, 3-9.

Angus, Jeff. "A Towering PIM Inferno: The Battle of Splitters vs. Lumpers." InfoWorld 11 (May 22, 1989): 45.

Badgett, Tom. "Where Is It? Searching Through Files With Database Software." PC Magazine 6 (October 27, 1987): 175-190.

Conklin, Jeff. "Hypertext: An Introduction and Survey." Computer 20 (September 1987): 17-41.

Fetters, Linda. "WordCruncher." Library Software Review 7 (July/August 1988): 294-297.

Giordano, Richard. "Text Retrieval on a Microcomputer." Perspectives in Computing 8 (Spring 1988): 52-60.

Matzkin, Jonathan and Catherine D. Miller. "Scratch Pads & Annotators: TSR Notes to Yourself." PC Magazine 6 (December 22, 1987): 185-198.

Melymuka, Kathleen. "Text-Retrieval Software." PC Week 3 (February 25, 1986): 57-59.

Miller, Michael J. "Personal Information Managers: The Next Big Application Category?" InfoWorld 10 (May 9, 1988): 75.

Puglia, Vincent. "TBMS: Database Power Unleashed." PC Magazine 5 (November 25, 1986): 211-230.

Rupley, Sebastian, Tracey Capen, and John Richey. "Quiet Please, Text Search In Progress." InfoWorld 11 (October 30, 1989): 55-72.

Simons, Gary F. "Multidimensional Text Glossing and Annotation." Notes on Linguistics 39 (July 1987): 53-60.

Tenopir, Carol. "Software Options for In-House Bibliographic Databases." Library Journal 112 (May 15, 1987): 54-55.

Tenopir, Carol and Gerald W. Lundeen. "Software Choices for In-house Databases." Database 11 (June 1988): 34-42.

Walkenbach, John. "Personal Information Managers." InfoWorld 10 (November 7, 1988): 57-79.

+ Page 22 +

Wanat, Camille. "Management Strategies for Personal Files: The Berkeley Seminar." Special Libraries 76 (Fall 1985): 253-60.

Wood, Elizabeth. "Teaching Computer Literacy: Helping Patrons to Help Themselves." Medical Reference Services Quarterly 7, no. 3 (1988): 45-57.

About the Author

Sue Stigleman
Information Management Education
Health Sciences Library
University of North Carolina at Chapel Hill
Chapel Hill, NC 27599
uncses@med.unc.edu
(919) 962-0700

+---------------------------------------------------------------------
| The Public-Access Computer Systems Review is an electronic
| journal. It is sent to participants of the Public-Access Computer
| Systems Forum, a computer conference on BITNET. To join the
| PACS Forum, send an electronic mail message to LISTSERV@UHUPVM1
| that says: SUBSCRIBE PACS-L Your Name. (Put your first and last
| name where it says "Your Name".)
|
| Copyright (C) 1990 by the University Libraries, University of
| Houston. All Rights Reserved. Copying is permitted for
| noncommercial use by computerized bulletin board/conference
| systems, individual scholars, and libraries. This message must
| appear on copied material. All commercial use requires
| permission.
+---------------------------------------------------------------------