Re: standards

From: Houghton,Andrew <houghtoa_at_nyob>
Date: Tue, 20 Jun 2006 11:19:00 -0400
To: NGC4LIB_at_listserv.nd.edu
> From: Next generation catalogs for libraries
> [mailto:NGC4LIB_at_LISTSERV.ND.EDU] On Behalf Of Bernhard Eversberg
> Sent: 20 June, 2006 07:27
> To: NGC4LIB_at_LISTSERV.ND.EDU
> Subject: Re: [NGC4LIB] standards
>
> Eric Lease Morgan schrieb:
> >
> > Store the data in a relational database using standard SQL.
>
> It wasn't designed for the type of
> applications we have on our hands: semi-structured text data.

I disagree with this statement.  Many modern relational databases,
such as SQL Server, Oracle, etc. support XML columns and full text
indexing on columns.  Those two features can be used to effectively
handle semi-structured data.

In addition, any current SQL database can deal with semi-structured
data without the use of XML columns or full text indexing through
the use of stored procedures and/or business logic to shred the
semi-structured data into a relational form.

We need to be careful whether we are talking about the application
of a relation database to semi-structured data or SQL, the language
used to retrieve information from a relational database.  SQL, the
language, has been revised several times and will probably continue
to evolve.  OPENXML in SQL is a good example of such an evolution.

> To mention only two things: SQL doesn't support index browsing.
> The thing called "index" in SQL is basically invisible, you
> cannot display it to enable browsing.

You cannot display a SQL index since it is an internal mechanism
that is used in relation to the data stored in the tables in your
database.  This doesn't mean you cannot support "index browsing".

> Further, SQL doesn't support structured fields. A field
> content cannot have subfields, or rather, SQL has no tool to
> do anything with them.

This is just not true.  Lets take the MARC standard.  Each field
has a one to many relationship with its subfields.  In a relational
database this *could be* modeled with two tables.  The field table
contains a primary key, tag, indicator 1, and indicator 2 columns.
The subfield table contains a primary key, a foreign key to the
associated field table, subfield code, and subfield content columns.

This may not be the best database architecture for modeling MARC in
a relation database, but it shows that structured fields can be
implemented in a relational database by applying appropriate data
normalization techniques.

> It also means you cannot, within SQL,
> create a keyword index (and display that for browsing as
> well). By "browsing", I mean you can jump in at any point and
> go down and up as far as you want.

Again, a relational database can contain keyword indexes either
through full text indexing a column or using appropriate business
logic and/or a stored procedures to shred the information and
store the data in a table.

> The more recent concept of the "object-oriented" database, on
> the other hand, is not embodied in any ready-to-use industry
> standard or open-source software packages.

Which brings us back around to Eric's remark about using standard
relational database technology.  Some relational database engines
have incorporated object-oriented features and there are a number
of fully object-oriented database engines that exist, but no standards
have evolved.  With modern relational databases incorporating XML
columns, *one could*, implement objects in a relational architecture.
Not saying that would be the best approach... but it might invoke
that 80-20 rule where using a relational database is good enough.


Andy.
Received on Tue Jun 20 2006 - 11:26:16 EDT