> I also pointed out, without detail, that the trivial data
> normalization I applied might not be the best relational database
> architecture. Simply put, it will not scale. This doesn't mean, as you
> suggest, that a relational database is an inappropriate standard to
> use. For example, OCLC's WorldCat uses the Oracle relational database
> which comprises 60+ million records.
> Proper database architecture can go a long way...
But do you know how much of its inner workings is RDB and how much is
proprietary code piled on top of it?
The Pica software, dominant in Europe, also uses an RDBMS, but not for
searching. It just uses the engine for storing records as long strings,
with no attempt at tabulating the bib data. All manipulation is done by
non-SQL code, and searching is handled by separate software altogether.
So in that case, saying they use SQL doesn't mean a lot.
It needs to be made clear, IOW, for what kind of tasks SQL is to be
preferred and for which it is inappropriate. As you indicate, some
operations just won't scale, most notably joins.
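To make that concrete, here is a minimal sketch in generic SQL of the
two patterns described above. The schema is hypothetical (the table
and column names are mine, not Pica's or OCLC's); it is meant only to
show where the joins come from.

  -- Pattern 1: fully tabulated bib data. Reassembling even one
  -- record means a three-way join, and at tens of millions of
  -- records those joins are exactly where scaling hurts.
  CREATE TABLE bib      (bib_id      INTEGER PRIMARY KEY);
  CREATE TABLE field    (field_id    INTEGER PRIMARY KEY,
                         bib_id      INTEGER REFERENCES bib,
                         tag         CHAR(3));
  CREATE TABLE subfield (subfield_id INTEGER PRIMARY KEY,
                         field_id    INTEGER REFERENCES field,
                         code        CHAR(1),
                         value       VARCHAR(4000));

  SELECT f.tag, s.code, s.value
    FROM bib b
    JOIN field f    ON f.bib_id   = b.bib_id
    JOIN subfield s ON s.field_id = f.field_id
   WHERE b.bib_id = 12345
   ORDER BY f.tag, s.code;

  -- Pattern 2: the Pica-style record store. The engine holds each
  -- MARC record as one long string; searching is left to separate
  -- software entirely.
  CREATE TABLE record_store (bib_id INTEGER PRIMARY KEY,
                             marc   VARCHAR(32000));

  SELECT marc FROM record_store WHERE bib_id = 12345;  -- one row, no joins

The first pattern buys you SQL queries over the data at the cost of
join performance; the second buys you cheap storage and retrieval at
the cost of doing everything else outside SQL.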
It needs to be shown, in yet other words, for what tasks we may not now
find appropriate standard software or off-the-shelf tools.
---
Everyone,
This is a geeky, gearhead topic (see the sample conversation above). We
all know that the choice of system architecture, the potential
extensibility of that architecture, and platform (including design,
testing, documentation, coding, programming language(s), internationalization
support, operating system, hardware, and database manager, to name a few
things) are all "designer's choice." I have lived through more argument
over this stuff on the vendor side than I care to remember personally or
relate publicly. Let's remember, we build stuff 'cause it solves a
problem, right? Helping people find stuff? HELLO?!
A few examples will suffice:
CLSI's LIBS 100 Plus - A bazillion lines of DEC PDP-11 Macro
Assembler running under a Forth- or MUMPS-like monitor (NOT a full-blown
operating system) called FLIRT - File Language in Real Time, and a
proprietary file system they called the "Information Processing
Facility." This was the system to beat in the late 70s and early 80s,
and there were 300 customers on this thing. The whole shootin' match
was later scrapped in favor of a UNIX-based product built on the
PROGRESS programming language and RDBMS, but too late to save CLSI,
which was purchased by GEAC.
INLEX/3000 - A half-million lines of HP PASCAL that used a
combination of HP's TurboIMAGE network database management system and a
proprietary file system for MARC records. The product had a great
looking online catalog that featured "progressive disclosure" of
information to users and great context-sensitive help. However, the
product was foundationally network-unaware. The company could never
deliver a working serials control system. Acquisitions had what I'll
politely call "arithmetic problems" (I know because I, and others, had
to fix the damn thing) and the voice notification system was worthy of
Mordor itself. Eventually purchased by DRA.
Columbia Library System - Another half-million lines of Microsoft
Pascal, and some C and Intel x86 Macro-Assembler, and it was DOS-based.
Very popular in itty-bitty schools, it ran on any DOS-compatible
network. It used a network database manager by Peter Gulutzan called
Ocelot. There was a follow-on product called Ocelot2 - The SQL, but CLS
never used it. It was a pretty good non-MARC system, but MARC was never
cleanly implemented. It had Acq and Serials modules, but hey, Orcs,
Goblins and Mordor again - they were unusable, but sold and used
nonetheless. The trick here was that the follow-on system for Windows never
materialized, because the library product was yoked to a larger
development effort to create a common database platform for school
administration, testing, and libraries. It took 4+ years and $20
million to decide on an RDBMS, leaving only 6 months to design, test,
document, code and deliver a half-dozen, interrelated, enterprise
products for schools. Guess what, it didn't work. CASPR was left to
pick up the pieces.
All these products and their companies are gone, gone, gone. I know
about them because I lived with these things for close to 10 years, and
this is just the tip of the iceberg. I did the first 10 migrations from
INLEX/3000 and other systems to Taos; out of respect for the many, many
friends I have who pushed that rock uphill, I'll say "'nuff said."
Aside from the obvious (and not-so-obvious) flaws of these products,
they all allowed people to find stuff in libraries. They all checked
things in and out. They all had some semblance of reports. And all
were, eventually, cast aside. My point is we can't discuss the
next-generation catalog in a vacuum. Who does it serve? How long will
it live? Is it worth expanding, or should we just re-write new
products from scratch every 5 years, either because a) we can't
deliver working product on time or b) the world has changed so much
that no amount of wishful thinking will expand a product beyond its
original design scope?
You could call a product "Circus Monkey 3000" and make it out of cans
and string, as long as you can deliver an attractive, working product
that demonstrably solves a problem and that people can actually USE. I am
becoming less and less interested in the insides of this stuff. Having
seen the sausage made, I find I don't want any more.
Let's keep the focus on user needs in changing markets, where libraries
are reacting to technological change just like we, as individuals, do.
Tired of reacting? Then we have to PAY to be innovators and pay keen
attention to the real drivers of technological change in our industry
and the world over.
Mark
-----------------
Mark Andrews, MLS
Systems Librarian
DoIT Academic and eLearning Technologies
L 32 Reinert Memorial Alumni Library
402.280.3065
mja30807_at_creighton.edu
AIM: mja30807
-----------------