+ Page 51 +

-----------------------------------------------------------------
Manojlovich, Slavko. "Mounting Commercial Databases Using the SPIRES DBMS." The Public-Access Computer Systems Review 1, No. 3 (1990): 51-57.
-----------------------------------------------------------------

1.0 Introduction

Commercial databases like ERIC, DISSERTATION ABSTRACTS, and INSPEC have been publicly accessible through the various online search services for over 20 years. A relatively small number of universities and other institutions have acquired and mounted some of these databases on their local database management system (DBMS) for at least as long.

A fairly recent phenomenon is the general belief, and often the demand, that universities should be mounting a variety of commercial databases locally. For institutions with integrated library systems, this demand goes one step further: access to these databases should somehow be integrated with access to the library's catalogue. Integration can mean either the use of a common interface for searching both the catalogue and other databases or the creation of a link between the commercial databases and the library's serial holdings as reflected in the catalogue.

The vendors of integrated library systems are beginning to respond to this new demand by offering their customers pre-loaded commercial databases which can reside along with the library's catalogue and be accessed using a common interface. Pre-loaded databases are similar to CD-ROM databases in that the data have been prepackaged by the vendor for consumer use. Issues surrounding the packaging of the data, such as the number and type of access points (i.e., indexing) and the data output formats, are important only when comparing databases from different vendors. The customer typically has no control over the manner in which a commercial database is accessible through a vendor's integrated system.
Commercial databases on CD-ROM or pre-loaded by a vendor may not be suitable for many institutions because of expensive licensing fees, limited access, or just poor packaging of the data. An alternative to acquiring commercial databases on CD-ROM or from an integrated library system vendor is to purchase the databases on magnetic tape and mount them using a DBMS such as SPIRES, BRS, or BASIS.

+ Page 52 +

Stanford University, Rensselaer Polytechnic Institute, and Memorial University of Newfoundland use the SPIRES DBMS (developed by Stanford University) to provide access to both the library catalogue and commercial databases. Princeton University, Syracuse University, University of British Columbia, Simon Fraser University, and other institutions use SPIRES to provide access to GPO, ERIC, COMPUSTAT, PSYCINFO, GROLIER ACADEMIC AMERICAN ENCYCLOPEDIA, and other commercial databases.

The remainder of this article describes various issues associated with the local mounting of commercial databases and how SPIRES addresses and accommodates these issues.

2.0 Analyzing and Loading a Commercial Database

Except for the U.S. MARC Communications Format, there are no existing standards for the dissemination of commercial databases. A survey of a small number of commercial databases reveals that databases distributed on magnetic tape are written using either the ASCII or EBCDIC character set. They may consist of fixed or variable length records, and they may or may not represent diacritics following the American Library Association's standard.

Given that these databases can be characterized as containing full-text, numeric, bibliographic, or other types of data, even the identification of a "record" or a "field" is not that straightforward. For example, what constitutes a record in ISI's CURRENT CONTENTS database? Is it the journal issue or the article within the issue?
In the GROLIER ACADEMIC AMERICAN ENCYCLOPEDIA database, a paragraph of an article, and not the article itself, constitutes a record.

The loading of the database is the transformation of the original data into a format required by the DBMS. During the initial examination of the data, the analyst is formulating a model of how the data will be represented in the DBMS. The primary factor determining how the data are stored is the DBMS's ability to accommodate the data. For example, MARC records contain the hexadecimal character code '1F' to indicate the start of a subfield and may contain hexadecimal characters representing diacritics. If the DBMS cannot store these characters, some form of data transformation must take place. The same is true of graphic images.

+ Page 53 +

Ideally, the DBMS should preserve the original content of the data as supplied by the database vendor. The SPIRES load procedure is designed to accommodate the broad spectrum of data types supplied by commercial database vendors. Following the creation of a description of the database for SPIRES (i.e., the "file definition"), there are two ways to "batch" load a database into SPIRES: writing a computer program to convert the data to the SPIRES input format or writing an input load procedure using the SPIRES formats language.

2.1 Writing a Computer Program to Convert the Data

The first method of loading data is to write a computer program that will convert the original data into SPIRES "input format." SPIRES input format identifies the start and end of a record, field, subfield, etc. A sample entry for the 245 MARC tag would be as follows:

     245 = (10 aGone with the Wind.);

In this example, "245" is the field name, the parentheses surround the value of the field, and the semi-colon is the end-of-field terminator. SPIRES will load anything found within the parentheses, including the hexadecimal code "1F," which is stored after the "0" in the above example.
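The conversion step described in section 2.1 might be sketched as follows. This is only an illustration in Python, not the author's conversion program: the function name `format_field` and its arguments are hypothetical, and the only input-format syntax relied on is the `tag = (value);` shape shown in the 245 example. How SPIRES treats parentheses or semicolons inside a value is not covered by the article and is not handled here.

```python
# Minimal sketch: render one MARC variable field in the "tag = (value);"
# shape shown in the article's 245 example. Names here are hypothetical.

SUBFIELD_DELIMITER = "\x1f"  # MARC subfield delimiter, hexadecimal 1F


def format_field(tag, indicators, subfields):
    """Return one input-format line for a MARC field.

    tag        -- the MARC tag, e.g. "245"
    indicators -- the two indicator characters, e.g. "10"
    subfields  -- list of (code, text) pairs, e.g. [("a", "Title.")]
    """
    # Each subfield is written as delimiter + code + text, so the 1F
    # byte is preserved inside the parentheses rather than transformed.
    value = indicators + "".join(
        SUBFIELD_DELIMITER + code + text for code, text in subfields
    )
    return f"{tag} = ({value});"


line = format_field("245", "10", [("a", "Gone with the Wind.")])
print(repr(line))  # repr() makes the non-printing 1F byte visible
```

Printing with `repr()` is a deliberate choice: the subfield delimiter is a non-printing character, so it is easy to lose when the converted data are inspected on a terminal.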
2.2 Writing a Load Procedure Using the SPIRES Formats Language

The second method of loading data is to write an input load procedure using the SPIRES formats language. This load procedure will read in data from an external file and parse it into records, fields, subfields, etc. For an application which requires a lot of coding or parsing (e.g., a MARC record), it is probably easier to write a computer program using PL/1 than to do the equivalent using the SPIRES formats language.

+ Page 54 +

3.0 Indexing

SPIRES provides the entire range of indexing options available in most DBMSs, including keyword, phrase, date, and coded indexes. SPIRES also provides a "personal name index" which is designed to accommodate simultaneously both a "first name surname" and a "surname, first name" name search. Searches for "John Smith" and "Smith, John" will both retrieve the same records in a personal name index search.

Index names can have aliases associated with them. For example, someone accustomed to always using "FIND NAME" to search for individuals in every database can have "NAME" added as an alias for a "FIND ARTIST" search in a fine arts slides database or as an alias for a "FIND FONDS" search in an archival and manuscripts database. ("FONDS" is the equivalent of "MAIN ENTRY" for archivists.)

In the creation of an index, you specify to SPIRES the fields which will be included in the index. You also specify, through actions called "PASSPROCS," how the index term will be created from the input data. For example, you can specify a list of stop words (terms which will not be indexed), or indicate that you don't want to include punctuation in the index term.

Another important feature of SPIRES is the ability to transform an index file into a separate database and associate additional information with each index record entry.
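The kind of index-term processing described above can be illustrated with a short sketch. It is written in Python and is not SPIRES PASSPROC syntax; the stop word list, the upper-casing of terms, and the function names `keyword_terms` and `name_key` are all assumptions made for the example.

```python
import string

# Illustrative stop word list only; a real list would be chosen per database.
STOP_WORDS = {"a", "an", "and", "of", "the", "with"}


def keyword_terms(value, stop_words=STOP_WORDS):
    """Turn a field value into keyword index terms.

    Punctuation is stripped, stop words are dropped, and the remaining
    words are upper-cased (the upper-casing is an assumption).
    """
    cleaned = value.translate(str.maketrans("", "", string.punctuation))
    return [w.upper() for w in cleaned.split() if w.lower() not in stop_words]


def name_key(name):
    """Reduce "Surname, First" and "First Surname" to the same key.

    A deliberately simple stand-in for a personal name index: the part
    before a comma is treated as the surname and moved to the end.
    """
    if "," in name:
        surname, _, forenames = name.partition(",")
        parts = forenames.split() + surname.split()
    else:
        parts = name.split()
    return tuple(p.upper() for p in parts)


print(keyword_terms("Gone with the Wind."))
print(name_key("John Smith") == name_key("Smith, John"))
```

Under these assumptions, both forms of the name reduce to one key, which is the behaviour the article attributes to a personal name index search.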
In addition, SPIRES uses action statements called SEARCHPROCS that allow you to take a search term and process it through, for example, a thesaurus file, to determine the proper form of the search term. The SPIRES $REPARSE SEARCHPROC will then take this converted search expression and execute it. The use of SEARCHPROCS and $REPARSE to process and transform search statements is one of the methods of creating database linkages in SPIRES. Database linkages result in the delivery of value-added packaging of information.

+ Page 55 +

Consider the following example of the implementation of the EXPLODE command on a sample MEDLINE file at Memorial University of Newfoundland. The EXPLODE command enables you to retrieve all the subordinate subject entries associated with a Medical Subject Heading (MeSH) term. MeSH terms are part of a hierarchical subject classification. An index is created from the MeSH database with the heading being the key of the record. Each index record also contains a concatenated list of MeSH tree numbers associated with the heading.

When a patron performs an EXPLODE search (e.g., "FIND EXPLODE ABO FACTOR") on the MEDLINE bibliographic database, SPIRES first looks up the heading in the MeSH heading index, retrieves a list of MeSH tree numbers, and appends a truncation character to each tree number. This OR'd list of tree numbers is passed back to SPIRES, which then executes a new search on the tree number index, which is built from the MEDLINE database.

The above model of database linkages can be applied to any commercial database which has an associated machine-readable thesaurus or classification system (e.g., ERIC and PSYCINFO). It is also useful in multilingual database applications where a multilingual dictionary could be used by SPIRES to transform a search term into an OR'd set of corresponding search terms for each language.
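The search-term rewriting described above can be sketched in a few lines. This is a Python illustration of the lookup-and-rewrite idea only, not SPIRES SEARCHPROC or $REPARSE syntax. The tree numbers in the table are invented placeholders rather than real MeSH data, and the truncation character "#" and the function name `explode` are assumptions.

```python
# Hypothetical stand-in for the MeSH heading index: heading -> tree numbers.
# The tree numbers are invented placeholders, NOT real MeSH values.
HEADING_INDEX = {
    "ABO FACTOR": ["X1.100.200", "X9.300"],
}

TRUNCATION = "#"  # assumed truncation character


def explode(heading, heading_index=HEADING_INDEX):
    """Rewrite a heading into an OR'd list of truncated tree numbers.

    Returns None when the heading is not in the index, so the caller
    can decide how to report an unknown term.
    """
    tree_numbers = heading_index.get(heading.upper())
    if not tree_numbers:
        return None
    # Appending the truncation character to each tree number makes the
    # follow-up search match the heading and everything filed beneath it.
    return " OR ".join(number + TRUNCATION for number in tree_numbers)


print(explode("ABO FACTOR"))
```

The rewritten expression would then be run against the tree number index. The same lookup step carries over to a multilingual dictionary, where the values looked up are equivalent terms in each language rather than tree numbers, and no truncation character is appended.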
For example, a "FIND SUBJECT SOCIAL SCIENCES" search in the MICROLOG (Canadian Research and Report Literature) database would also retrieve all of the French records with the term "SCIENCES SOCIALES."

4.0 Data Output

As with indexing and searching, SPIRES data output has an associated range of actions that enable you to transform the data to meet your requirements. SPIRES provides an almost unlimited variety of ways to output your data, including formatting reports with statistical calculations. Within the SPIRES FOLIO environment, the patron simply specifies the type of output by including a "format name" following the DISPLAY command.

+ Page 56 +

SPIRES formats can do much more than simply provide brief, full, or MARC output. If the patron's workstation on a network can accommodate the display of diacritics, the user can specify a format which includes these characters.

A format can also look up and display information from a database other than the one being searched. This ability provides the framework for linking journal holdings information to commercial databases. As part of displaying a citation, the format looks up the journal title, ISSN, or other key in a file containing a list of journals held by the library and adds a holdings status message.

The SPIRES SAVE command allows you to write the formatted results of a search to a file. The SAVE command enables a patron to search a numeric database (e.g., COMPUSTAT) and output the data for input to a statistical package. Similarly, it allows users of a full-text database to output a true reproduction of an article, in contrast to obtaining a copy of the article using the screen dump procedure. Finally, it can be used to output bibliographic records for input to a micro-based DBMS.

5.0 Conclusion

The SPIRES DBMS has served librarians for over a decade. It is now used primarily to create local databases and to mount commercial ones.
Because of SPIRES's ability to handle MARC records, institutions like Rensselaer Polytechnic Institute and Memorial University of Newfoundland are developing fully functional integrated library systems with linkages to commercial databases. SPIRES's functionality and versatility, as illustrated in this article, ensure that SPIRES will continue to meet the evolving needs of the library community.

+ Page 57 +

About the Author

Slavko Manojlovich
Assistant to the University Librarian for Systems and Planning
Memorial University of Newfoundland
St. John's, Newfoundland A1B 3Y1
Canada
BITNET Address: SLAVKO@KEAN.UCS.MUN.CA

-----------------------------------------------------------------
The Public-Access Computer Systems Review is an electronic journal. It is sent free of charge to participants of the Public-Access Computer Systems Forum (PACS-L), a computer conference on BITNET. To join PACS-L, send an electronic mail message to LISTSERV@UHUPVM1 that says: SUBSCRIBE PACS-L First Name Last Name.

This article is Copyright (C) 1990 by Slavko Manojlovich. All Rights Reserved.

The Public-Access Computer Systems Review is Copyright (C) 1990 by the University Libraries, University of Houston. All Rights Reserved.

Copying is permitted for noncommercial use by computer conferences, individual scholars, and libraries. Libraries are authorized to add the journal to their collection, in electronic or printed form, at no charge. This message must appear on all copied material. All commercial use requires permission.
-----------------------------------------------------------------