William Dembski and the Discovery Institute, Renewing Science and Culture by Re-Inventing the Wheel
Pete Dunkelberg wrote to tell me that William Dembski, senior fellow at the
Discovery Institute, the Mathematical Great White Hope of the "Intelligent
Design" school of creationism, had a new pre-print out on information theory.
So, for my sins, I downloaded it.
- William A. Dembski, "Information as a Measure of Variation" [PDF link]
- Abstract: Within information theory, information typically
measures the reduction of uncertainty that results from the knowledge that an
event has occurred. But what if the item of knowledge learned is not the
occurrence of an event but, rather, the change in probability distribution
associated with an ensemble of events? This paper takes the usual account of
information, which focuses on events, and generalizes it to probability
distributions/measures. In so doing, it encourages the assignment of
"generalized bits" to arbitrary state transitions of physical systems. In
particular, it provides a theoretical framework for characterizing the
informational continuity of evolving systems and for rigorously assessing the
degree to which such systems exhibit, or fail to exhibit, continuous
change.
Having now read this production in both the original (7 July 2004) and
lightly revised (23 July 2004) versions, my considered judgment is the same as
my first reaction: Sweet suffering Jesus.
First, two points for style, and then the substance.
- Ordinary information theory has a perfectly good way of measuring how
much we learn from a change in the distribution over an ensemble,
called the Kullback-Leibler divergence, or the relative entropy, or simply the
information gain. Dembski ought to know this, because what he talks about as
the "reduction in uncertainty that results from the knowledge that an event has
occurred" is the information gain in going from the unconditional distribution
to the distribution conditional on the event. Since he's read Cover
and Thomas's standard textbook on information theory, and this is made
perfectly clear in chapter 2, this should not be an issue.
- Similarly, physicists and dynamical systems theorists have long had
absolutely no problem with looking at the informational properties of quite
arbitrary dynamics. Dembski is supposedly a mathematician, so I can understand
if he finds books like Complexity,
Entropy and the Physics of Information, or journals like Open Systems and
Information Dynamics, insufficiently rigorous. (He'd be wrong, but
that's another story.) But he might have thought to look around the math
library and turn up Patrick Billingsley's wonderful 1965 book on Ergodic
Theory and Information, or some back issues of Ergodic Theory and
Dynamical Systems and Journal of Statistical Physics. (Incidentally,
Dembski's "generalized bits" are just bits.)
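To make the first point concrete, here is a minimal sketch in Python (mine, not anything taken from Dembski's paper or from Cover and Thomas). It checks, for a discrete ensemble, that the relative entropy of the conditional distribution with respect to the unconditional one equals the usual "surprise" of the event, -log2 P(E). The fair die, the event, and the function names are all hypothetical choices for illustration.

```python
import math

def kl_divergence_bits(q, p):
    """Relative entropy D(q || p) in bits; q and p are dicts outcome -> probability.

    Assumes q is absolutely continuous w.r.t. p (p[x] > 0 wherever q[x] > 0).
    """
    total = 0.0
    for outcome, q_x in q.items():
        if q_x > 0.0:
            total += q_x * math.log2(q_x / p[outcome])
    return total

def condition_on_event(p, event):
    """Return the distribution p(. | event), where event is a set of outcomes."""
    p_event = sum(p[x] for x in event)
    return {x: (p[x] / p_event if x in event else 0.0) for x in p}

# Illustrative ensemble: a fair six-sided die and the event "the roll is even".
die = {face: 1.0 / 6.0 for face in range(1, 7)}
even = {2, 4, 6}

posterior = condition_on_event(die, even)

# Information gain in going from the unconditional to the conditional distribution.
gain = kl_divergence_bits(posterior, die)

# The textbook "reduction of uncertainty" from learning that the event occurred.
surprisal = -math.log2(sum(die[x] for x in even))

print(f"information gain = {gain:.6f} bits, -log2 P(E) = {surprisal:.6f} bits")
```

The two numbers agree (up to floating-point error) for any event of positive probability, which is all the first point claims.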
- The mathematical core of the paper, such as it is, is the definition of an
information measure, which Dembski calls the "variational information", and
whose defining formula is as follows (I've slightly modified his notation,
replacing Greek letters with Roman):

$$ I(Q|P) = \log_2 \int \left( \frac{dQ}{dP} \right)^2 dP $$

where P is the old or reference measure, and Q the new
measure we get after some change, assumed to be absolutely continuous with
respect to P, so that the Radon-Nikodym
derivative dQ/dP is well-defined. If P is the
ordinary, uniform or Lebesgue measure, then dQ/dP is just the
probability density of Q, usually written
as q(x).
Now, this is a perfectly respectable generalization of the regular Shannon
information, and in fact one with many interesting properties; it will prove
very useful in connection with coding theory, hypothesis testing, and the study
of dynamical systems. I can say this with complete confidence because this
functional is in fact one of the Rényi informations, introduced by Alfred
Rényi in a famous 1960 paper, "On Measures of Entropy and
Information", in Proceedings of the Fourth Berkeley Symposium on
Mathematical Statistics and Probability, vol. I, pp. 547--561. (Was Dembski
even born in 1960?) In Dembski's notation, the Rényi information of
order a, for non-negative real a other than 1, is

$$ I_a(Q|P) = \frac{1}{a-1} \log_2 \int \left( \frac{dQ}{dP} \right)^a dP $$

which approaches the Shannon information in the limit as a goes to 1.
Dembski's "variational information" is clearly just the special case a
= 2. Dembski correctly derives some of the more basic properties of this
quantity, which Rényi established for arbitrary a in his
original paper. There does not seem to be any new mathematics in this section
whatsoever. (Compare this part of his paper with, e.g., Imre Varga and
János Pipek, "Rényi entropies characterizing the shape and the
extension of the phase space representation of quantum wave functions in
disordered systems", Physical Review E 68 (2003):
026202 [link].)
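For readers who want to poke at this numerically, here is a rough Python sketch of the discrete case, where the integral of (dQ/dP)^a with respect to P becomes a sum of p(x) (q(x)/p(x))^a. The function names and the two example distributions are my own inventions, not Dembski's or Rényi's. The sketch only checks the two claims made above: that order 2 reproduces the "variational information", and that orders near 1 approach the ordinary relative entropy.

```python
import math

def renyi_information_bits(q, p, a):
    """Rényi information of order a (a >= 0) of q relative to p, in bits.

    q and p are equal-length sequences of probabilities, with p[i] > 0
    wherever q[i] > 0 (absolute continuity). Order 1 is defined by its
    limit, the relative entropy.
    """
    if a < 0:
        raise ValueError("order a must be non-negative")
    pairs = [(qx, px) for qx, px in zip(q, p) if qx > 0.0]
    if a == 1.0:
        return sum(qx * math.log2(qx / px) for qx, px in pairs)
    # Discrete analogue of the integral of (dQ/dP)^a with respect to P.
    integral = sum(px * (qx / px) ** a for qx, px in pairs)
    return math.log2(integral) / (a - 1.0)

def variational_information_bits(q, p):
    """log2 of the sum of q(x)^2 / p(x): the order-2 formula written out directly."""
    return math.log2(sum(qx * qx / px for qx, px in zip(q, p) if qx > 0.0))

# Illustrative distributions: a uniform reference measure and a skewed new one.
p = [0.25, 0.25, 0.25, 0.25]
q = [0.5, 0.25, 0.125, 0.125]

order_two = renyi_information_bits(q, p, 2.0)
variational = variational_information_bits(q, p)
relative_entropy = renyi_information_bits(q, p, 1.0)
near_one = renyi_information_bits(q, p, 1.0001)

print(f"order 2: {order_two:.6f}  variational: {variational:.6f}")
print(f"order 1.0001: {near_one:.6f}  relative entropy: {relative_entropy:.6f}")
```

Nothing here depends on the particular numbers; any pair of distributions satisfying the absolute-continuity assumption should show the same agreement.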
One of the best reasons to study these information measures goes roughly as
follows. In 1953, the great Soviet probabilist A. I. Khinchin
published a list of four reasonable-looking axioms for a measure of
information, and proved that the Shannon information was the unique functional
satisfying the axioms (up to an over-all multiplicative constant). (I) The
information is a functional of the probability distribution (and not of other
properties of the ensemble). (II) The information is maximal for the
distribution where all events are equally probable. (III) The information is
unchanged by enlarging the probability space with events of zero probability.
The trickiest one is (IV) If the probability space is divided into two
sub-spaces, A and B, the total information is equal to the information content
of the marginal distribution of one sub-space, plus the mean information of the
conditional distribution of the other sub-space: I(A,B) = I(A) + E[I(B|A)].
(The paper is re-printed in his book on
Mathematical
Foundations of Information Theory.) If we relax axiom (IV) to
require only that I(A,B) = I(A) + I(B) when A and B are statistically
independent, then we get a continuous family of solutions, namely the
Rényi informations. This, along with their many applications, has led
to a great deal of attention being paid to them in the
information-theory literature. A quite crude search of the abstracts
of IEEE Transactions on Information Theory reveals an average of
at least five papers a year over the last ten years. It's even introduced,
though briefly, in Cover and Thomas's textbook (p. 499). Of particular note is
the well-established use of Rényi information in establishing results on
the error rates of hypothesis tests, a problem on which Dembski, notoriously,
claims to be an expert. (The locus classicus here is Imre
Csiszár, "Generalized cutoff rates and Rényi's information
measures", IEEE Transactions on Information
Theory 41 (1995): 26--34.) In nonlinear dynamics and
statistical physics, the Rényi informations play crucial roles in the
so-called "thermodynamic formalism", one of the essential tools of the rigorous
study of complex systems. See, in particular, the excellent and standard book
by Remo Badii and Antonio Politi, Complexity: Hierarchical Structures and
Scaling in Physics (reviewed here). Naturally
enough, Dembski also claims to be an expert on the measurement of
complexity.
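The difference between axiom (IV) and its relaxed form can be seen in a small numerical experiment. What follows is only a sketch under my own assumptions: I read "information" as the entropy of a discrete distribution, I pick order 2 for the Rényi case, and the joint distributions and function names are made up for illustration. The Shannon entropy satisfies I(A,B) = I(A) + E[I(B|A)] even when A and B are dependent; the order-2 Rényi entropy does not, but it is still additive when A and B are independent.

```python
import math
from collections import defaultdict

def shannon_entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def renyi_entropy(probs, a=2.0):
    """Rényi entropy of order a (a != 1) in bits of an iterable of probabilities."""
    return math.log2(sum(p ** a for p in probs if p > 0.0)) / (1.0 - a)

def marginal_of_a(joint):
    """Marginal distribution of A from a dict {(a_value, b_value): probability}."""
    marginal = defaultdict(float)
    for (a_value, _b_value), prob in joint.items():
        marginal[a_value] += prob
    return dict(marginal)

def chain_rule_gap(joint, entropy):
    """I(A,B) - (I(A) + E[I(B|A)]); zero exactly when axiom (IV) holds here."""
    p_a = marginal_of_a(joint)
    mean_conditional = 0.0
    for a_value, prob_a in p_a.items():
        conditional = [joint[key] / prob_a for key in joint if key[0] == a_value]
        mean_conditional += prob_a * entropy(conditional)
    return entropy(joint.values()) - (entropy(p_a.values()) + mean_conditional)

# Illustrative joint distribution in which B depends on A.
dependent = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

# Illustrative joint distribution in which A and B are independent.
p_a = {0: 0.75, 1: 0.25}
p_b = {0: 0.6, 1: 0.4}
independent = {(a, b): p_a[a] * p_b[b] for a in p_a for b in p_b}

shannon_gap = chain_rule_gap(dependent, shannon_entropy)
renyi_gap = chain_rule_gap(dependent, renyi_entropy)
renyi_joint = renyi_entropy(independent.values())
renyi_sum = renyi_entropy(p_a.values()) + renyi_entropy(p_b.values())

print(f"Shannon gap (dependent): {shannon_gap:.2e}")
print(f"Renyi-2 gap (dependent): {renyi_gap:.2e}")
print(f"Renyi-2 joint vs sum (independent): {renyi_joint:.6f} vs {renyi_sum:.6f}")
```

A single counterexample of this kind does not prove Khinchin's uniqueness theorem, of course; it only illustrates why weakening (IV) to independent sub-spaces lets the Rényi family in.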
- The so-called "continuity spectrum" seems to be nothing more than a
confused (and admittedly conjectural) grope towards the idea of distance and
divergence measures on manifolds of probability distributions, a topic
well-explored in information geometry,
which have perfectly respectable quantum versions (see chapter 7 of Amari and
Nagaoka's Methods
of Information Geometry, or this paper by R. F. Streater), without any of
the weirdness that Dembski conjectures. (Dembski's discussion of quantum
dynamics in any case is very confused; I can best rationalize it by supposing
he thinks of quantum time evolution as something like a combination of
classical diffusion and cadlag processes, with the cadlag jump-points
representing moments of wave-function collapse. This would be pretty bad
physics, but in any case the notion of "collapse of the wave function" is
very dubious. More modern treatments of quantum mechanics seem to manage to
eliminate it in favor of continuous processes of decoherence, as described in,
e.g., D. Giulini et al., Decoherence
and the Appearance of a Classical World in Quantum Theory, or at a
popular level, David Lindley's Where
Does the Weirdness Go?.)
Dembski's paper seriously mis-represents the nature and use of information
theory in a wide range of fields. What he puts forward as a new construction
is in fact a particular case of a far more general idea, which was published
forty-four years ago. That construction is extremely well-known and widely
used in a number of fields in which Dembski purports to be an expert, namely
information theory, hypothesis testing and the measurement of complexity. The
manuscript contains exactly no new mathematics. Such is the work of a man described on one of
his book jackets as "the Isaac Newton of information theory". His home page says this is the first in
a seven-part series on the "mathematical foundations of intelligent design"; I
can't wait. Or rather, I can.
Complexity; Enigmas of Chance; Creationism
Posted by crshalizi at August 10, 2004 16:45