Re: Library Technologies and Library School (was Commercial Vendors and Open Source Software)

From: Alexander Johannesen <alexander.johannesen_at_nyob> Date: Mon, 29 Sep 2008 16:39:03 -0400 To: NGC4LIB_at_LISTSERV.ND.EDU

I said (more than 6 days ago; what's going on here? I wrote and sent it
6 days ago, and yesterday I got a message saying it had been approved?!
Am I on some kind of probation? Moderation?):

>> Where are the experts in cloud computing, in clustering, in
>> large-scale meta data management, in reducio indexing, in smart
>> spidering? Where are the geeks who enjoy semantic data modelling
>> across silos? Or the ones who know all there is to know about digital
>> identity management? Things that are actually damn, seriously
>> you'll-all-go-down-in-flames-without-it technologies? Where are they?
>> Where's the direction you need to take to get to them? Where are your
>> passionate people who understand all this and want to see it
>> through?
>
> While this is all very good, I would like to point out that it is still all experimental.

No, it is not. Now, I understand that statements thrown out like these
could perhaps benefit from some clarification, so, by the numbers:

1. Cloud computing is just a new "Web 2.0" name for what many in the
industry have been doing for the last 10-15 years (and no, 10 years is
not recent; that's *ancient* in Internet-time :), with a shift from
local to global (and networked) spaces for maintaining and extending
our applications (and even a change in the definition of what an
application is). In my previous industry (non-librarian, obviously :)
we were doing distributed load and application state globally with
CORBA, and that was, um, about 15 years ago? It is the very problem
libraries today struggle with in sharing and extending their meta data
(including the processes for distribution and quality assurance).
Those who know me will struggle to accept that I just wrote this, but
doing Z39.50 was exactly what you should have done; it was the right
thing to do. But - and here's the big but you know is coming! -
relying on its 1.0 version and letting it stagnate into this bizarre,
old, legacy, library-world-only technology is the real crime. You
could have taken the best of Z39.50, extended it and adapted it for
the future, gained outside adoption in the process, and would today
perhaps be a real player in distributed search and meta data retrieval
systems, but alas you're not. You're the opposite.

2. Clustering. You're relying on your vendors too much. Clustering is
one of those things that have been around for ages, that librarians
know nothing about, and that have nothing to do with what you learn at
library school. These are technologies and techniques that are
paramount for growing an infrastructure (mostly plain network traffic)
organically, instead of in leaps and bounds dictated by bolting a new
ILS server on at the end somewhere. Shouldn't there be clusters (and
they've been around for about 20 years now) of servers making sure
important systems stay up, sharing the load, making computing cheaper,
and growing and shrinking capacity on the cheap as our needs change?
Shouldn't library systems be able to utilize this technology?
Shouldn't every larger library that does a lot of meta data creation
invest in infrastructures that support the future? And shouldn't
technologists inside the library be experts at building alternative
networks as our needs change? Where are the cluster geeks? Where are
the virtualization experts who make clustering affordable and a lot
simpler?

3. Large-scale meta data management. You should already be kings in
this area, but you're not. The good: MARC. The bad: still keeping MARC
around. MARC was designed to be a simple container for meta data,
especially in the bibliographic area. And you did well; it was a good
starting point, but the *second* someone out there thought that
creating their own version of MARC was a good idea, alarm bells should
have gone off. Again you started something really good and let it slip
into maintainability hell and legacy obscurity. MARC could have
evolved into the de facto meta data format for the world's needs, but
instead it turned into the unmaintainable record format with a cult
following that it is today. Also, people who care about these issues
on a *large* scale know that complex records are the wrong way. We've
known this since the 80's, yet that was probably when MARC as it is
today was cemented as the standard of the library world. The reason?
Your MARC standard had no processes for change.

4. Reducio indexing. Sure, MapReduce is a fairly new and hip way to
deal with a very old problem, but that doesn't hide the problem
itself. Most techie librarians don't know much about recursive binary
b-trees in functional spaces (they leave all that voodoo to the RDBMS
vendors), let alone how to pull together a cross-join of indexes using
nothing but pointer magic. About 20 years ago I worked with a company
that was doing something fancy for the time: they could do cross-joins
over 10 indexes of over 10 MB each in less than 5 seconds. Those were
the days, but it wasn't in the library world; it was military. But I
did play in the community that was into this kind of large-scale
indexing. Today Lucene is all the rage, even in the library world, and
that's good, but there isn't a lot of geeky talk about the benefits of
anonymous indexes as middle tiers to multi-gigabyte indexes and so on.
Where are the indexing geeks? The bibliographic space is rife with
indexing problems that no one else is going to care about, least of
all your vendors.

5. Smart spidering. Who knows what to spider? We can't spider it all,
so someone needs to know better what to gobble up and what to ignore.
Isn't this what librarians already claim they're good at? You need
much better tooling than the current manual processes. Is there an
expert group that hacks up models, geeks who share a passion for
intelligent rules-based spidering? No, it's the usual rules-based,
Lucene-based fare, which amounts to a bad Google.

6. Semantic data modeling across silos. This is what the library world
tries to do, yet no one seems to be an expert in how to do it. The
mind boggles. Every effort in the library world in this area is the
odd prototype, the dead mailing list, the few initiatives that die a
slow death from a lack of understanding in the community at large.
And this technology (and even this type of thinking) has been around
for more than 30 years. 30 years! It is a field that librarians
*should* be the absolute best in, yet they're nowhere to be seen
(except for the aforementioned odd project or prototype). Even
old-timers understand the importance of flexible models. Heck, the
library world's combined information knowledge *is* a model of working
with meta data, yet there are no experts in data modeling. I can't
understand this one.

7. Digital identity management. Hoo-boy, this one dips its toes into
pretty much anything and everything the library world tries to do.
Your best efforts so far? Authority records. Wrapped in MARC. Oh dear,
I'm not even sure where to begin with this one. Maybe I just shouldn't,
as it is so depressing. Right now the best ID control you've got is
ISBNs, and we know how perfect they are. 'Nough said.

It's not my point here to disagree (much) with you, but none of the
things that (I feel) are essential to the library world's survival are
experimental, new, fancy or on the edge. It's stuff that keeps
evolving. I've talked at length here before about Artificial
Intelligence systems, which are something very different from what
people think (or used to think) they are. We're entering an age where
your problems are beyond human capacity to solve, and you *must*
utilize what you can from the technology to help you. Now, I'm of
course not saying technology is everything; in fact, those who know me
know that I speak warmly and passionately about librarians (especially
catalogers, bless their black cold hearts ...:), their culture, their
knowledge and the meeting point between them and technology to save
humanity from itself. The world needs people who understand both
things, humanity *and* technology, and the people best suited to do
this are you guys, which is why I'm so sad to see libraries lag behind
and leave the meta data tasks of the future in the hands of big
corporations.

> We still don't know which way to go, and it may turn out that
> the solution will be some method discovered six months from now.

The answer is to sit around and wait for others to do some magic for you?

> There is nothing surprising about this since we are in a time of major
> changes and attempts must be made to decide what works and
> what does not work.

No, no, no! There is no time that sits still. The world is always
evolving, always changing. Well, except perhaps inside the library
walls, but as you point out, change can and does happen within, in
leaps and bounds. Perhaps that's the problem, though; leaps and bounds
do not cultivate a future direction so much as make you reliant on
sneakers and flame-retardant underwear.

> Library administrators facing tight budgets can find it very difficult
> to justify experimentation which automatically means that there can
> be failure and resultant "waste" (at least in a strictly budgetary sense).

Lots of people talk about how this could be fixed by consortium
thinking. Come on, you guys *love* this kind of thinking. Why can't
each big library around the world dedicate a resource or two, send
everyone to a nice but cheap place for a year or so, and solve some
big problems? Why can't libraries fix some of the big problems and
shuffle them out of the way to make room for library geeks to
specialize in high-tech? I know the low-budget world is a tough one to
create stimulation for geeks in, especially when the jobs there are so
utterly, painfully boring, but the library world is so full of passion
for what's right and for making a difference to humanity that if you
create a place for that sort of humane / intellectual stimulation, you
could be on par with universities in attracting people who don't care
only about the big bucks. I mean, c'mon, who becomes a librarian for
the pay? And when the geek track in the library sector is boring to
boot, where do you think the good geeks will go?

> I would also like to point out that individual libraries
> have normally left these affairs up to the vendors since
> open-source solutions are relatively new.

I've been an open-source programmer for 15 years (no kidding). It's
not new in the outside world, even if traction in the library world is
only now building. You're just late to the party, that's all.

Anyways, we agree more than disagree. We share the goal and the
passion. Give us executive power now! :)

(...which reminds me that executive power in the library world is only
given to people who have proven themselves to be very, very careful,
very patient, and good at not rocking the boat. Good luck with all
that. :)

Alex
-- 
---------------------------------------------------------------------------
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
------------------------------------------ http://shelter.nu/blog/ --------