Re: information organization, systems, and terrorists

From: Joe Hourcle <oneiros_at_nyob>
Date: Mon, 4 Jan 2010 10:15:28 -0500
To: NGC4LIB_at_LISTSERV.ND.EDU

On Sun, 3 Jan 2010, Miksa, Shawne wrote:

> It’s Sunday morning and I’m listening to CNN’s State of the Union while 
> working on a chapter of my book.  I’m hearing mostly about the thwarted 
> December 25th terrorist attack on the plane from Amsterdam to Detroit 
> and the breakdown in the intelligence community as it concerns making 
> the connections between information they had concerning the individual 
> terrorist, his connection to Yemen, how and why things were missed that 
> may have prevented him from being able to get on the plane in the first 
> place, etc.  At the same time I’m trying to write a chapter on our 
> library information systems, what they are, what are library catalogs 
> today, objectives, FRBR, etc.  All of this brings to mind the complexity 
> of an information system and the effect of that complexity on making 
> those connections, the reliance and over-reliance on the technology 
> (i.e., one commentator asked why the computer technology didn’t make the 
> connection), and on human ability to make the connections, and so on. 
> If we shift that complexity to the kinds of information systems we in 
> LIS create, populate, manipulate, maintain, and cross-connect to other 
> systems---it begs the question of how much complexity can we expect to 
> be able to maintain or work within?

I'm going to take a different approach to the problem, because the systems 
I design aren't your typical "library" systems.

The catalogs that I deal with fall into two broad categories*.  They might 
go by other names in other science fields, but for solar physics:

 	data catalogs :
 		a record of the data collected and/or its processed forms

 	feature and event catalogs :
 		a science product, where a scientist (or software written
 		by a scientist) has identified something of interest

There is a *lot* of data.  For some missions, it's a few million 
observations, and there might be a few different forms of it on disk, so 
some scientists go to the event catalogs, pick out a region & time period 
of interest, and then study just that part.

... but there are people who want us to merge all of the event & feature 
information back into the data catalogs.  Not even getting into the issues 
of data normalization, we still have a lot of questions:

 	1. Was this event found by the PI team from the instrument?
 		(if yes, well, the PI team can add it to 'their' data
 		catalog)
 	2. Was this event found using data from this instrument?
 		(if yes, they might have a legitimate argument)
 	3. Does the community agree about the methods used to identify
 		the event?
 		(eg, has it been peer-reviewed?)
 	4. Could the instrument have even seen the event?
 		(if it was a magnetic event, should it be listed in
 		telescope data ... well, yes, if someone wanted to follow
 		up and try to determine if there was evidence in other
 		instruments, but it could confuse people)

I was at a science team meeting, and we had one of the scientists ask us 
if our system could tell them if two events were the "same" -- and quite 
simply, it can't, because that's for the scientists to decide.

We can tell them that two events were seen at the same time, in the same 
place, but we won't tell them that they are the same event.  Our system is 
there to make it easier for scientists to do their job, not to replace the 
scientists; there are so many edge cases, and the science changes so 
often, that it'd be impossible for us to keep up.
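
To make that distinction concrete, here is a minimal sketch in Python of a 
check that reports co-occurrence and nothing more.  The Event fields, the 
units, the 50-unit separation threshold and the function names are all 
assumptions made up for illustration; none of them describe the actual 
system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    catalog: str   # which feature/event catalog reported it (hypothetical)
    start: float   # start time, in any consistent unit
    end: float     # end time, same unit as start
    x: float       # position on the disk, any consistent unit
    y: float


def co_occurring(a, b, max_sep=50.0):
    """True if the two events overlap in time and lie within max_sep."""
    overlap = a.start <= b.end and b.start <= a.end
    separation = ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5
    return overlap and separation <= max_sep


def candidate_pairs(events, max_sep=50.0):
    """List cross-catalog pairs seen at the same time and place.

    This only reports candidates; whether a pair is the "same" event
    is left for a scientist to decide.
    """
    return [
        (a, b)
        for i, a in enumerate(events)
        for b in events[i + 1:]
        if a.catalog != b.catalog and co_occurring(a, b, max_sep)
    ]
```

The function hands back pairs worth a look and deliberately stops short of 
merging them into one record.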

...

So, you're asking how this relates to intelligence ...

Say for instance that a scientist is analyzing data from two instruments, 
A and B ... They see a strange reading in A ... they cross-check the data 
in B, and find that B looks perfectly normal.

Two weeks later, they realize that the calibration on one of the 
instruments is wrong.  Suddenly, we might have the case where the event in 
A is scrubbed out, or we have the case where B confirms the findings of A.

Likewise, in intelligence, you might have cases where informants are found 
to be unreliable, and you have to go and re-assess all of the decisions 
made using that bad information.  This can have major cascading effects 
... but I don't know if any systems actually do this -- they answer the 
questions that are asked now, so unless the question gets asked again, 
no one may ever know that the answer has changed.

You would have to design a system that not only responds to every question 
asked of it, but also checks, every time new information is added, whether 
it needs to re-process previous responses.  You'd run out of resources.
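
A cheaper middle ground is to record what each answer was derived from, and 
when an input turns out to be bad, only flag the answers that depended on 
it instead of recomputing everything.  The sketch below is a minimal 
illustration in Python; the ProvenanceLog class and its method names are 
hypothetical, not taken from any existing provenance system.

```python
from collections import defaultdict, deque


class ProvenanceLog:
    """Remembers which results were derived from which inputs."""

    def __init__(self):
        # input id -> ids of results derived directly from it
        self._dependents = defaultdict(set)

    def record(self, result_id, inputs):
        """Note that result_id was produced using the given inputs."""
        for source in inputs:
            self._dependents[source].add(result_id)

    def affected_by(self, bad_input):
        """Return every result that depends, directly or not, on bad_input."""
        seen = set()
        queue = deque([bad_input])
        while queue:
            current = queue.popleft()
            for dependent in self._dependents.get(current, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        return seen
```

Nothing gets re-run here; the output is just the list of answers someone 
would need to look at again, which covers the cascading case where one 
answer was used as input to another.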

...

You then get people who want to run the analysis using alternate inputs. 
In the case of science, it might be that they want to use a different 
scientific model for ground water flow, or assume a different propagation 
model for coronal mass ejections.  In intelligence, I could see someone 
wanting to re-run analysis assuming that some accepted 'truth' didn't 
exist.

... but those aren't for the system-builders to decide; we can provide the 
tools, but the scientists / analysts / whoever have to decide which 
analysis is the proper one, and actually run it.
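
One way a tool can stay out of that decision is to take the model as an 
input and let the caller choose.  The Python sketch below is a rough 
illustration only: both "models" are toy placeholders invented for this 
example (not real CME propagation physics), and the 400 km/s ambient wind 
speed and all function names are assumptions.

```python
AU_KM = 1.496e8  # approximate Sun-Earth distance in km


def constant_speed(speed_km_s, distance_km):
    """Toy model: the disturbance keeps its launch speed. Returns seconds."""
    return distance_km / speed_km_s


def relaxes_to_wind(speed_km_s, distance_km, wind_km_s=400.0):
    """Toy model: average the launch speed with an assumed ambient speed."""
    return distance_km / ((speed_km_s + wind_km_s) / 2.0)


def travel_hours(speed_km_s, model, distance_km=AU_KM):
    """Run whichever model the caller supplies; the tool picks nothing."""
    return model(speed_km_s, distance_km) / 3600.0


def compare(speed_km_s, models):
    """Run the same input through each named model, side by side."""
    return {name: travel_hours(speed_km_s, m) for name, m in models.items()}
```

Calling compare(800.0, {"constant": constant_speed, "toy": relaxes_to_wind}) 
returns one number per model; choosing between them is still up to the 
person running the analysis.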

...

As for why so much technology is brought in for intelligence: well, 
there are still a lot of humans involved**, but the cynic in me believes 
that it's easier for contractors to sell a pre-packaged technological 
"solution" than it is to convince some people to just hire more people, 
or more qualified people.

... which reminds me of an article from the other day 
discussing Israel's approach to airport security:

 	http://www.thestar.com/iphone/news/world/article/744199---israelification-high-security-little-bother

... yes, there's technology in there (bomb-proof areas, etc.), but the 
common-sense approach doesn't give someone an opportunity to sell more 
expensive bomb-sniffing gear, or full-body imagers, or whatever the 
security product of the week is.

-Joe

* We have other catalogs as well (eg, publications), but we don't 
currently link between publications and data, although we'd like to. 
There are proposals to require articles in Solar Physics (journal) to 
register an "event" so there'd be an unambiguous identifier to ... 
something, but that something has issues for long-lived, recurrent 
features and those that might be at varying heights & depths, or that 
don't track with the normal solar rotation.

ps.  I have no experience in the intelligence community, but I have talked 
with some people at what used to be NGA about how they deal with the 'best 
size' of images to return to their analysts, and I've talked to some of 
the folks dealing with provenance tracking to deal with tracking when bad 
data was used to answer analysis questions, and with some people building 
large-scale triple stores to deal with tracking 'quality' of the 
information.

** Okay, I said I had no experience, but I did work summers in high school 
at an annex to the Pentagon, and so was there twice a day with mail around 
the time of the first Gulf War, and recently, I went to a retirement event 
at a secured facility where they wouldn't tell me specifically what they 
did, but based on the size of the building, there were a lot of people 
working there.  And unlike the Pentagon, there was high security for the 
whole building.