On Sun, 3 Jan 2010, Miksa, Shawne wrote:
> It’s Sunday morning and I’m listening to CNN’s State of the Union while
> working on a chapter of my book. I’m hearing mostly about the thwarted
> December 25th terrorist attack on the plane from Amsterdam to Detroit
> and the breakdown in the intelligence community as it concerns making
> the connections between information they had concerning the individual
> terrorist, his connection to Yemen, how and why things were missed that
> may have prevented him from being able to get on the plane in the first
> place, etc. At the same time I’m trying to write a chapter on our
> library information systems, what they are, what are library catalogs
> today, objectives, FRBR, etc. All of this brings to mind the complexity
> of an information system and the effect of that complexity on making
> those connections, the reliance and over-reliance on the technology
> (i.e., one commentator asked why the computer technology didn’t make the
> connection), and on human ability to make the connections, and so on.
> If we shift that complexity to the kinds of information systems we in
> LIS create, populate, manipulate, maintain, and cross-connect to other
> systems---it begs the question of how much complexity can we expect to
> be able to maintain or work within?
I'm going to take a different approach to the problem, because the systems
I design aren't your typical "library" systems.
The catalogs that I deal with fall into two broad categories*. They might
go by other names in other science fields, but for solar physics:
  data catalogs:
      a record of the data collected and/or its processed forms
  feature and event catalogs:
      a science product, where a scientist (or software written
      by a scientist) has identified something of interest
There is a *lot* of data. For some missions, it's a few million
observations, and there might be a few different forms of it on disk, so
some scientists go to the event catalogs, pick out a region & time period
of interest, and then study just that part.
... but there are people who want us to merge all of the event & feature
information back into the data catalogs. Even setting aside the issues of
data normalization, we still have a lot of questions:
  1. Was this event found by the instrument's PI team?
     (if yes, well, the PI team can add it to 'their' data
     catalog)
  2. Was this event found using data from this instrument?
     (if yes, they might have a legitimate argument)
  3. Does the community agree about the methods used to identify
     the event?
     (e.g., has it been peer-reviewed?)
  4. Could the instrument have even seen the event?
     (if it was a magnetic event, should it be listed in
     telescope data? Well, yes, if someone wanted to follow
     up and try to determine whether there was evidence in other
     instruments, but it could confuse people)
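If events do get linked into a data catalog, one option is to keep the
answers to those four questions on each link, so users can choose what to
see instead of the system choosing for them. A hypothetical sketch: the
field names, flags, and default policy below are illustrative assumptions
only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventLink:
    """Hypothetical link between one event and one instrument's data
    catalog, keeping the answers to the four questions as fields."""
    event_id: str
    instrument: str
    found_by_pi_team: bool       # 1. found by the instrument's PI team?
    used_instrument_data: bool   # 2. found using this instrument's data?
                                 #    (recorded, unused by this policy)
    peer_reviewed: bool          # 3. method accepted by the community?
    instrument_could_see: bool   # 4. could this instrument observe it?


def links_to_show(links, instrument, pi_only=False, require_review=True,
                  include_unobservable=False):
    """Return the event ids to list alongside `instrument`'s data.

    The defaults are only an example policy; the point is that the answers
    stay queryable instead of being lost when events are merged in.
    """
    return [
        link.event_id
        for link in links
        if link.instrument == instrument
        and (link.found_by_pi_team or not pi_only)
        and (link.peer_reviewed or not require_review)
        and (link.instrument_could_see or include_unobservable)
    ]


# Hypothetical links for one (made-up) telescope.
links = [
    EventLink("flare-1", "telescope-X", True, True, True, True),
    EventLink("mag-3", "telescope-X", False, False, True, False),
    EventLink("cme-7", "telescope-X", False, True, False, True),
]
print(links_to_show(links, "telescope-X"))  # ['flare-1']
print(links_to_show(links, "telescope-X", include_unobservable=True))
```

Someone following up on a magnetic event could flip `include_unobservable`,
while the default view stays uncluttered.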
I was at a science team meeting where one of the scientists asked us
whether our system could tell them if two events were the "same". Quite
simply, it can't, because that's for the scientists to decide.
We can tell them that two events were seen at the same time, in the same
place, but we won't tell them that they are the same event. Our system is
there to make it easier for scientists to do their jobs, not to replace
them; there are so many edge cases, and the science keeps changing, so
it'd be impossible for us to keep up.
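To make the distinction concrete, here's a minimal sketch of the kind of
coincidence check a catalog system can offer, assuming each event record
carries a time range and a single (x, y) position in degrees. The `Event`
fields, tolerances, and catalog names are all hypothetical, and the
distance is a crude planar one. It reports pairs from different catalogs
that overlap in time and space, and deliberately stops short of saying
they're the same event.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations


@dataclass(frozen=True)
class Event:
    """A hypothetical feature/event catalog entry (fields are assumptions)."""
    event_id: str
    catalog: str      # which catalog / team reported it
    start: datetime
    end: datetime
    x: float          # position in degrees (assumed coordinate system)
    y: float


def overlaps_in_time(a, b, slack):
    # Intervals overlap if each starts before the other ends (plus slack).
    return a.start <= b.end + slack and b.start <= a.end + slack


def near_in_space(a, b, max_deg):
    # Crude planar distance; a real system would use spherical geometry.
    return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5 <= max_deg


def coincident_pairs(events, slack=timedelta(minutes=10), max_deg=5.0):
    """Pairs from different catalogs seen at about the same time and place.

    This does NOT claim a pair is the same event; that call is left to
    the scientist.
    """
    return [
        (a.event_id, b.event_id)
        for a, b in combinations(events, 2)
        if a.catalog != b.catalog
        and overlaps_in_time(a, b, slack)
        and near_in_space(a, b, max_deg)
    ]


t0 = datetime(2010, 1, 3, 12, 0)
events = [
    Event("flare-1", "teamA", t0, t0 + timedelta(minutes=30), 10.0, -20.0),
    Event("cme-7", "teamB", t0 + timedelta(minutes=20),
          t0 + timedelta(hours=2), 12.0, -18.0),
    Event("flare-2", "teamA", t0 + timedelta(hours=5),
          t0 + timedelta(hours=6), 10.0, -20.0),
]
print(coincident_pairs(events))  # [('flare-1', 'cme-7')]
```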
...
So, you're asking how this relates to intelligence ...
Say for instance that a scientist is analyzing data from two instruments,
A and B ... They see a strange reading in A ... they cross-check the data
in B, and find that B looks perfectly normal.
Two weeks later, they realize that the calibration on one of the
instruments is wrong. Suddenly, depending on which instrument was off, the
event in A might be scrubbed out, or B might turn out to confirm the
findings of A.
Likewise, in intelligence, you might have cases where informants are found
to be unreliable, and you have to go back and re-assess all of the
decisions made using that bad information. This can have major cascading
effects ... but I don't know if any systems actually do this. They answer
the questions that are asked now, so unless a question gets asked again,
no one may ever know that the answer has changed.
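The provenance tracking mentioned in the ps. is aimed at this problem. A
minimal sketch, assuming every stored answer records which inputs it was
computed from (the class name and identifiers are hypothetical): when an
input such as a calibration file or a source is flagged as bad, walk the
dependency edges breadth-first to find every answer that needs
re-assessing. It only tells you what to revisit; actually re-doing the
analysis is still someone's job.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Hypothetical provenance store: records which inputs each result used."""

    def __init__(self):
        # input or result id -> ids of results derived directly from it
        self._derived_from = defaultdict(set)

    def record(self, result_id, input_ids):
        """Note that result_id was computed from the given inputs."""
        for input_id in input_ids:
            self._derived_from[input_id].add(result_id)

    def affected_by(self, bad_input_id):
        """Every result depending, directly or transitively, on bad_input_id."""
        affected, queue = set(), deque([bad_input_id])
        while queue:
            node = queue.popleft()
            for child in self._derived_from.get(node, ()):
                if child not in affected:
                    affected.add(child)
                    queue.append(child)
        return affected


# Hypothetical version of the A/B instrument example.
g = ProvenanceGraph()
g.record("A-calibrated", ["A-raw", "A-calibration-v1"])
g.record("event-A-1", ["A-calibrated"])
g.record("B-calibrated", ["B-raw", "B-calibration-v1"])
g.record("cross-check-A1-B", ["event-A-1", "B-calibrated"])
print(sorted(g.affected_by("A-calibration-v1")))
# ['A-calibrated', 'cross-check-A1-B', 'event-A-1']
```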
You would have to design a system that not only answers every question
asked of it, but also, every time new information is added, determines
whether it needs to re-process previous responses. You'd run out of
resources.
...
You then get people who want to run the analysis using alternate inputs.
In the case of science, it might be that they want to use a different
scientific model for groundwater flow, or assume a different propagation
model for coronal mass ejections. In intelligence, I could see someone
wanting to re-run an analysis assuming that some accepted 'truth' didn't
hold.
... but those aren't for the system-builders to decide; we can provide the
tools, but the scientists / analysts / whoever have to decide to actually
run the proper analysis.
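On the tools side, one way to support this is to keep assumptions as
explicit inputs rather than baking them in. A toy sketch, assuming the
analysis can be written as simple if-all-of-these-then-that rules applied
by forward chaining (the facts and rules below are hypothetical, loosely
based on the A/B instrument example; real groundwater or CME propagation
models are nothing this simple): run it once as-is, then re-run with one
accepted 'truth' removed and compare what drops out.

```python
def forward_chain(facts, rules):
    """Derive everything implied by `facts` under if-all-of-these-then-that
    rules, repeating until no new conclusion appears (a fixed point)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


def rerun_without(facts, rules, assumption):
    """Re-run the same analysis as if `assumption` did not hold."""
    return forward_chain(set(facts) - {assumption}, rules)


# Hypothetical facts and rules.
RULES = [
    (frozenset({"A-calibration-ok", "A-reading-high"}), "event-seen-by-A"),
    (frozenset({"event-seen-by-A", "B-reading-normal"}), "A-and-B-disagree"),
]
FACTS = {"A-calibration-ok", "A-reading-high", "B-reading-normal"}

baseline = forward_chain(FACTS, RULES)
alternate = rerun_without(FACTS, RULES, "A-calibration-ok")
print(sorted(baseline - alternate))
# ['A-and-B-disagree', 'A-calibration-ok', 'event-seen-by-A']
```

The system only makes the comparison cheap to run; deciding which
assumption deserves to be dropped is still up to the analyst.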
...
As for why so much technology is brought in for intelligence: well,
there are still a lot of humans involved**, but the cynic in me believes
that it's easier for contractors to sell a pre-packaged technological
"solution" than it is to convince some people to just hire more people,
or more qualified ones.
... which reminds me of an article from the other day
discussing Israel's approach to airport security:
http://www.thestar.com/iphone/news/world/article/744199---israelification-high-security-little-bother
... yes, there's technology in there (bomb-proof areas, etc.), but the
common-sense approach doesn't give anyone an opportunity to sell more
expensive bomb-sniffing gear, or full-body imagers, or whatever the
security product of the week is.
-Joe
* We have other catalogs as well (e.g., publications), but we don't
currently link between publications and data, although we'd like to.
There are proposals to require articles in Solar Physics (journal) to
register an "event" so there'd be an unambiguous identifier to ...
something, but that something has issues for long-lived, recurring
features, for those that might be at varying heights & depths, and for
those that don't track with the normal solar rotation.
ps. I have no experience in the intelligence community, but I have talked
with some people at what used to be NGA about how they decide on the 'best
size' of images to return to their analysts, with some of the folks
working on provenance tracking to record when bad data was used to answer
analysis questions, and with some people building large-scale triple
stores to track the 'quality' of the information.
** Okay, I said I had no experience, but I did work summers in high school
at an annex to the Pentagon, and so was there twice a day with the mail
around the time of the first Gulf War. And recently, I went to a
retirement event at a secured facility where they wouldn't tell me
specifically what they did, but based on the size of the building, there
were a lot of people working there. And unlike the Pentagon, there was
high security for the whole building.
Received on Mon Jan 04 2010 - 10:16:02 EST