Re: Cablegate from Wikileaks: a case study

From: Bernhard Eversberg <ev_at_nyob>
Date: Wed, 8 Dec 2010 07:57:18 +0100
To: NGC4LIB_at_LISTSERV.ND.EDU
On 07.12.2010 18:09, Jonathan Rochkind wrote:
>
> Are you suggesting that Google's code is not an 'algorithm'?
It depends on how you define the term. Purists would reject it,
and I'll try to explain why, below.

> That
> doesn't make any sense. ALL software yields predictable, reproducible
> results -- in a theoretical sense. Including Google.
>
In a theoretical sense, yes. But in reality, if you can never find
enough time to carry out the reproduction, and that is the case with
Google's code, this theoretical possibility is of no use.

> Some algorithms are so complicated that it may be hard for humans to
> predict exactly what they'll do in a specific given case.
But the distinction between deterministic algorithms (da) and
nondeterministic ones has nothing to do with their complexity. A "da"
can be hideously complex, an "nda" can be very simple. It suffices
that an external variable, say the time, enters into the algorithm to
make it non-deterministic. (And here's the point where purists would
deny it the name of "algorithm" because they hold it should be a
completely self-contained sequence of steps.) In Google's case, not
just one variable enters into the algorithmic processing (any sequence
of steps within the process is of course algorithmic!) but the content
of an entire database, which itself is unpredictable because it is
highly dynamic. That database contains statistical evaluations of
billions of queries and choices made by users. Only if you had a
snapshot copy of this database at a specific moment in time could you,
given a lot of time, reconstruct the result of one specific query
issued at exactly that moment. But the next moment, the result of
that same query may be different because of intermediate changes
in the database. Again, every sequence of steps in the process is
deterministic, but the external variables that enter it are volatile.
In addition, of course, humans adjust many internal settings all the
time, at no predictable points. These are not external variables,
though: they change the deterministic parts of the algorithm itself.
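
To make the snapshot argument concrete, here is a minimal sketch in
Python. Everything in it is an illustrative assumption: the function
rank(), the click_counts table standing in for the query evaluation
database, and the example host names. It says nothing about how Google
actually works; it only shows a deterministic sequence of steps fed by
volatile external data.

```python
from copy import deepcopy


def rank(query, candidates, click_counts):
    """Order candidates by recorded clicks for this query.

    The steps are fully deterministic: same inputs, same output.
    Ties are broken alphabetically so the order never depends on chance.
    """
    return sorted(
        candidates,
        key=lambda doc: (-click_counts.get((query, doc), 0), doc),
    )


# Hypothetical statistics store: (query, document) -> number of clicks.
click_counts = {
    ("cablegate", "a.example"): 5,
    ("cablegate", "b.example"): 3,
}
candidates = ["a.example", "b.example"]

# Freeze a copy of the store at one specific moment.
snapshot = deepcopy(click_counts)
before = rank("cablegate", candidates, click_counts)

# Users keep clicking, so the live store drifts away from the snapshot.
click_counts[("cablegate", "b.example")] += 10
after = rank("cablegate", candidates, click_counts)

# Only the snapshot lets us reconstruct the earlier result.
replayed = rank("cablegate", candidates, snapshot)

print(before, after, replayed)
```

The same query gives a different order a moment later, although not a
single step of rank() has changed; replaying it against the snapshot
reproduces the earlier result.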

One may call that query evaluation database in Google an automated
opinion poll with results updated all the time. A query result at one
particular moment may thus be influenced by an average opinion, but it
cannot be called an opinion itself; it is a very artificial construct
influenced by many decisions you cannot reconstruct in practice.

>
> If you insist only on simple algorithms, you will get only simple software.
>
That's far from what I said. Again, it is definitely not about simple
or complex, and I hope to have made it clearer why.

B.Eversberg
Received on Wed Dec 08 2010 - 01:59:34 EST