My "Computing Science" column for American
Scientist, "The Bootstrap", is
available for your reading pleasure. Hopefully, this will assuage your
curiosity about how to use the same data set not just to fit a statistical
model but also to say how much uncertainty there is in the fit. (Hence
my recent musings about the cost of
bootstrapping.) And then the rest of the
issue looks pretty good, too.
I have been reading American Scientist since I started graduate
school, lo these many years ago, and throughout that time one of the highlights
for me has been the "Computing Science" column
by Brian Hayes; it was quite thrilling to
be asked about being one of the substitutes while he's on sabbatical, and I
hope I've come close to his standard.
After-notes to the column itself:
- Efron's original paper is now open access.
- Of course, the time series is serially dependent, so I should really use a
bootstrap which handles that.
Using either a moving block bootstrap or stationary bootstrap actually gave
almost the same confidence bands as the one in the article, obtained by
resampling consecutive pairs (perhaps because the optimal block length
was just 4). The original version of the column
went into that, but it had to be cut to fit the space.
- A bigger issue is that the data set is really not stationary. Like
everyone else, I pretend.
- Originally, I wanted to
use turbulent flow time series for
the example, since it turns out that they are
actually pretty well predicted by
linear models, once you allow for very non-Gaussian driving
noises. But I couldn't find any suitable data sets which wouldn't involve some
tricky work to get permissions; perhaps I just didn't look in the right places.
So I fell back on something which is publicly available, even though it's from
a domain which if anything has gotten too much attention from statisticians and
probabilists, and required some disclaimers.
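
For readers who want to see what the moving block bootstrap mentioned in the after-notes looks like in practice, here is a minimal sketch. It is not the code behind the column: the AR(1) series is synthetic, the statistic (lag-1 autocorrelation) is chosen only for illustration, and the helper names are hypothetical. The only detail taken from the notes above is the block length of 4.

```python
import random


def moving_block_bootstrap(series, block_length, rng):
    """Return one surrogate series, the same length as `series`, built by
    concatenating randomly chosen overlapping blocks of consecutive values."""
    n = len(series)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(series)")
    n_starts = n - block_length + 1      # number of possible block starts
    n_blocks = -(-n // block_length)     # ceil(n / block_length)
    surrogate = []
    for _ in range(n_blocks):
        start = rng.randrange(n_starts)  # uniform over block starts
        surrogate.extend(series[start:start + block_length])
    return surrogate[:n]                 # trim to the original length


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation (illustrative statistic only)."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def percentile_interval(values, alpha=0.05):
    """Crude percentile interval from a list of bootstrap replicates."""
    s = sorted(values)
    lower = s[int((alpha / 2) * (len(s) - 1))]
    upper = s[int((1 - alpha / 2) * (len(s) - 1))]
    return lower, upper


# Synthetic, serially dependent data: an AR(1) process (an assumption made
# for this sketch, not the data set used in the column).
rng = random.Random(0)
series = [0.0]
for _ in range(499):
    series.append(0.6 * series[-1] + rng.gauss(0, 1))

estimate = lag1_autocorrelation(series)
replicates = [
    lag1_autocorrelation(moving_block_bootstrap(series, 4, rng))
    for _ in range(500)
]
interval = percentile_interval(replicates)
print("estimate:", estimate)
print("95% percentile interval:", interval)
```

Resampling whole blocks keeps the dependence within each block intact, but breaks it at the joins between blocks, so short blocks tend to understate the dependence in the original series. The stationary bootstrap differs mainly in drawing block lengths at random rather than fixing them.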
Enigmas of Chance;
Posted by crshalizi at April 19, 2010 08:45