From the all-too-small Department of Unambiguously Good Things Happening to People Who Thoroughly Deserve Them, Judea Pearl has won the Turing Award for 2011. As a long-time admirer*, I could not be more pleased, and would like to take this opportunity to recommend his *Causality* again.

I realize it edges into "I liked Feynman before he joined the Manhattan Project; the Williamsburg Project was edgier" territory, but I have very vivid memories of reading *Probabilistic Reasoning in Intelligent Systems* in the winter months of early 1999, and being correspondingly excited to hear that the first edition of *Causality* was coming out...

Posted by crshalizi at March 30, 2012 15:30 | permanent link

Attention conservation notice: Only of interest if you (1) care about high-dimensional statistics and (2) will be in Pittsburgh over the next two weeks.

I am not sure how our distinguished speakers would feel at being called sorcerers, but since one of them is using sparsity to read minds, and the other to infer causation from correlation, it is hard to think of a more appropriate word.

- Bin Yu, "Sparse Modeling: Unified Theory and Movie Reconstruction Based on Brain Signal"
*Abstract*: Information technology has enabled the collection of massive amounts of data in science, engineering, social science, finance and beyond. Statistics is the science of data, and is indispensable for extracting useful information from high-dimensional data. After the broad successes of statistical machine learning on prediction through regularization, interpretability is gaining attention, and sparsity is used as its proxy. With the virtues of both regularization and sparsity, L1-penalized least squares (e.g. the Lasso) has been intensively studied by researchers from statistics, applied mathematics, and signal processing. The Lasso is a special case of sparse modeling and has also been the focus of compressive sensing lately.

In this talk, I would like to cover both the theory and practice of the Lasso and its extensions. First, I will present an insightful unified analysis of M-estimation with decomposable penalties under sparse high-dimensional statistical models. Second, I will present collaborative research with the Gallant Neuroscience Lab at Berkeley on understanding the human visual pathway. In particular, I will show how we use non-linear sparse models (SPAM) to improve encoding and decoding results for the visual cortex area V1, and I will explain how Lasso and ridge methods enter our algorithm for reconstructing movies from fMRI brain signals (dubbed by TIME Magazine as "mind-reading computers" and selected as one of its 50 Best Inventions of 2011).
*Time and place*: 4--5 pm on Monday, 2 April 2012, in Scaife Hall 125
- Peter Bühlmann, "Predicting Causal Effects in High-Dimensional Settings"
*Abstract*: Understanding cause-effect relationships between variables is of great interest in many fields of science. An ambitious but highly desirable goal is to infer causal effects from observational data, obtained by observing a system of interest without subjecting it to interventions. This would allow us to circumvent severe experimental constraints or to substantially lower experimental costs. Our main motivation to study this goal comes from applications in biology.

We present recent progress in predicting causal effects, with direct implications for designing new intervention experiments, particularly in high-dimensional, sparse settings with thousands of variables but only a few dozen observations. We highlight exciting possibilities and fundamental limitations. In view of the latter, statistical modeling needs to be complemented with experimental validation: we discuss this in the context of molecular biology for yeast (*Saccharomyces cerevisiae*) and the model plant *Arabidopsis thaliana*.
*Time and place*: 4--5 pm on Wednesday, 11 April 2012, in Scaife Hall 125

As always, the talks are free and open to the public; hecklers will, however, be turned into newts.

Posted by crshalizi at March 29, 2012 13:10 | permanent link

Attention conservation notice: Only of interest if you (1) care about statistical models of networks or collective information-processing, and (2) will be in Pittsburgh this week.

I am behind in posting my talk announcements:

- Andrew Thomas, "Marginal-Additive Models and Processes for Network-Correlated Outcomes"
*Abstract*: A key promise of social networks is the ability to detect and model the correlation of personal attributes along the structure of the network, in either static or dynamic settings. The basis for most of these models, the Markov random field on a lattice, has several assumptions that may not be reflected in real network data: namely, that the process is stationary on the lattice, and that the ties in the model are correctly specified. Additionally, it is less than clear how correlation over longer distances on networks can be adequately specified under the lattice mechanism, given the assumption of a stationary process at work.

Based on concepts from generalized additive models and spatial/geostatistical methods, I introduce a class of models that is more robust to the failure of these assumptions, more flexible to different definitions of network distance, and more generally applicable to large-scale studies of network phenomena. I apply this method to outcomes from two large-scale social network studies to demonstrate its use and versatility.
*Time and place*: 4--5 pm on Monday, 26 March 2012, in Scaife Hall 125
- Sewoong Oh, "Learning from the Wisdom of the Crowd: Efficient Algorithms and Fundamental Limits"
*Abstract*: This talk is on designing extremely efficient and provably order-optimal algorithms to extract meaningful information from societal data, the kind of data that comes from crowdsourcing platforms like Amazon Mechanical Turk, or recommendation systems like the Netflix Challenge dataset. Crowdsourcing systems, like Amazon Mechanical Turk, provide platforms where large-scale projects are broken into small tasks that are electronically distributed to numerous on-demand contributors. Because these low-paid workers can be unreliable, we need to devise schemes to increase confidence in our answers, typically by assigning each task multiple times and combining the answers in some way. I will present the first rigorous treatment of this problem, and provide both an optimal task assignment scheme (using a random graph) and an optimal inference algorithm (based on low-rank matrix approximation and belief propagation) for that task assignment. This approach significantly outperforms previous approaches and, in fact, is asymptotically order-optimal, which is established through comparisons to an oracle estimator.

Another important problem in learning from the wisdom of the crowd is how to make product recommendations based on past user ratings. A common and effective way to model these user ratings datasets is to use low-rank matrices. In order to make recommendations, we need to predict the unknown entries of a ratings matrix. A natural approach is to find a low-rank matrix that best explains the observed entries. Motivated by this recommendation problem, my approach is to provide a general framework for recovering a low-rank matrix from partial observations. I will introduce a novel, efficient and provably order-optimal algorithm for this matrix completion problem.
The optimality of this algorithm is established through a comparison to a minimax lower bound on what the best algorithm can do.
*Time and place*: 10--11 am on Wednesday, 28 March 2012, in Gates Hall 6115
- Lise Getoor, "Collective Graph Identification"
*Abstract*: The importance of network analysis is growing across many domains, and is fundamental in understanding online social interactions, biological processes, communication, ecological, financial, transportation networks, and more. In most of these domains, the networks of interest are not directly observed, but must be inferred from noisy and incomplete data, data that was often generated for purposes other than scientific analysis. In this talk, I will introduce the problem of graph identification, the process of inferring the hidden network from noisy observational data. I will describe some of the component steps involved, and then I will describe a collective approach to graph identification, which interleaves the necessary steps in the accurate reconstruction of the network. Time permitting, I will also survey some of the work in my group on probabilistic databases, privacy, visual analytics, and active learning.
*Time and place*: 4:30--5:30 pm on Thursday, 29 March 2012, in Gates Hall 6115

Enigmas of Chance; Networks; The Collective Use and Evolution of Concepts

Posted by crshalizi at March 26, 2012 10:00 | permanent link

You are a theoretical physicist, trying to do data analysis, and "Such a *Shande far de Goyim*!" is all I can think after reading your manuscript. Even if it turns out we are playing out this touching scene (which never fails to bring tears to my eyes) — no.

(SMBC via Lost in Transcription)

**Update**: Thanks to reader R.K. for correcting my Yiddish.

Posted by crshalizi at March 21, 2012 11:49 | permanent link

Homework 7: A little theory, a little methodology, a little data analysis: these keep growing young statisticians in healthy balance.

assignment, `n90_pol.csv` data

Posted by crshalizi at March 20, 2012 10:31 | permanent link

Simulation: implementing the story encoded in the model, step by step, to produce something data-like. Stochastic models have random components and so require some random steps. Stochastic models specified through conditional distributions are simulated by chaining together random variables. How to generate random variables with specified distributions. Simulation shows us what a model predicts (expectations, higher moments, correlations, regression functions, sampling distributions); analytical probability calculations are short-cuts for exhaustive simulation. Simulation lets us check aspects of the model: does the data look like typical simulation output? if we repeat our exploratory analysis on the simulation output, do we get the same results? Simulation-based estimation: the method of simulated moments.
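As an illustration of the "chaining together random variables" idea, here is a minimal sketch in plain Python (not the course's R); the model, its coefficients, and the variable names are invented for the example, with the exponential draw done by the inverse-transform method:

```python
import math
import random
import statistics

random.seed(42)

def rexp(rate):
    """Exponential draw via inverse transform: F^{-1}(u) = -log(1 - u) / rate."""
    return -math.log(1.0 - random.random()) / rate

def simulate_once(rate=2.0, a=1.0, b=3.0, sd=0.5):
    """One run of a toy model specified through conditional distributions:
    X ~ Exponential(rate), then Y | X ~ Normal(a + b*X, sd)."""
    x = rexp(rate)
    y = random.gauss(a + b * x, sd)
    return x, y

# Exhaustive simulation as a substitute for analytical calculation:
# E[Y] = a + b * E[X] = 1 + 3 * (1/2) = 2.5, and the simulated mean agrees.
draws = [simulate_once() for _ in range(100_000)]
mean_y = statistics.fmean(y for _, y in draws)
print(mean_y)
```

The same pattern — simulate many runs, then treat the simulated sample like data — is what makes the model-checking and simulated-moments ideas above mechanical.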

*Reading*: Notes, chapter 16; R

Posted by crshalizi at March 20, 2012 10:30 | permanent link

My paper with Aaron Clauset and Mark Newman on power laws has just passed 1000 citations on Google Scholar, slightly ahead of schedule. (Actually, the accuracy of Aaron's prediction is a little creepy.)

I am spending the day reading over my student Daniel McDonald's dissertation draft. The calendar tells me that I was in the middle of writing up my own dissertation in mid-March 2001. But this is impossible, since I could swear that was just a few months ago at most, not eleven years.

Most significant of all, one of my questions has been answered by Guillaume the adaptationist goat.

Posted by crshalizi at March 15, 2012 11:00 | permanent link

The desirability of estimating not just conditional means, variances, etc., but whole distribution functions. Parametric maximum likelihood is a solution, if the parametric model is right. Histograms and empirical cumulative distribution functions are non-parametric ways of estimating the distribution: do they work? The Glivenko-Cantelli law on the convergence of empirical distribution functions, a.k.a. "the fundamental theorem of statistics". More on histograms: they converge on the right density, if bins keep shrinking but the number of samples per bin keeps growing. Kernel density estimation and its properties: convergence on the true density if the bandwidth shrinks at the right rate; superior performance to histograms; the curse of dimensionality again. An example with cross-country economic data. Kernels for discrete variables. Estimating conditional densities; another example with the OECD data. Some issues with likelihood, maximum likelihood, and non-parametric estimation.
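A kernel density estimate is nothing more than an average of kernel bumps centered at the data points. A minimal hand-rolled sketch in Python (the standard-normal toy data and the rule-of-thumb bandwidth are my choices for illustration, not taken from the notes):

```python
import math
import random

random.seed(1)

def gaussian_kernel(u):
    """Standard Gaussian density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(x, sample, bandwidth):
    """Kernel density estimate at x: average of kernels centered at the data."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / bandwidth) for xi in sample) / (n * bandwidth)

# Toy data from a standard normal, whose true density at 0 is about 0.399.
sample = [random.gauss(0, 1) for _ in range(5000)]
h = 1.06 * len(sample) ** (-1 / 5)   # rule-of-thumb bandwidth for sigma = 1
print(kde(0.0, sample, h))
```

Shrinking the bandwidth h as n grows — but not too fast — is exactly the convergence condition mentioned above.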

*Reading*: Notes, chapter 15

Posted by crshalizi at March 08, 2012 10:30 | permanent link

Reminders about multivariate distributions. The multivariate Gaussian distribution: definition, relation to the univariate or scalar Gaussian distribution; effect of linear transformations on the parameters; plotting probability density contours in two dimensions; using eigenvalues and eigenvectors to understand the geometry of multivariate Gaussians; conditional distributions in multivariate Gaussians and linear regression; computational aspects, specifically in R. General methods for estimating parametric distributional models in arbitrary dimensions: moment-matching and maximum likelihood; asymptotics of maximum likelihood; bootstrapping; model comparison by cross-validation and by likelihood ratio tests; goodness of fit by the random projection trick.
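The effect of linear transformations on the parameters is also how multivariate Gaussians are simulated in practice: if z is a vector of independent standard Gaussians and L is a Cholesky factor with LL^T = Sigma, then x = Lz has covariance Sigma. A sketch for the 2x2 case in plain Python (the particular covariance matrix is an arbitrary choice for illustration):

```python
import math
import random

random.seed(7)

# Target: mean-zero bivariate Gaussian with covariance [[2.0, 0.6], [0.6, 1.0]].
s11, s12, s22 = 2.0, 0.6, 1.0

# Hand-rolled Cholesky factor L = [[a, 0], [b, c]] with L L^T = Sigma.
a = math.sqrt(s11)
b = s12 / a
c = math.sqrt(s22 - b * b)

def draw():
    """x = L z for independent standard Gaussians z1, z2."""
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return a * z1, b * z1 + c * z2

n = 200_000
xs = [draw() for _ in range(n)]
cov12 = sum(x1 * x2 for x1, x2 in xs) / n   # empirical cross-covariance
print(cov12)   # should be close to the target 0.6
```

The same factor-and-transform trick is what R's multivariate-normal samplers do internally, just in arbitrary dimension.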

*Reading*: Notes, chapter 14

Posted by crshalizi at March 06, 2012 09:25 | permanent link

Building a weather forecaster for Snoqualmie Falls, Wash., with logistic regression. Exploratory examination of the data. Predicting wet or dry days form the amount of precipitation the previous day. First logistic regression model. Finding predicted probabilities and confidence intervals for them. Comparison to spline smoothing and a generalized additive model. Model comparison test detects significant mis-specification. Re-specifying the model: dry days are special. The second logistic regression model and its comparison to the data. Checking the calibration of the second model.
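For readers without the notes, the core machinery can be sketched in a few lines: logistic regression fit by gradient ascent on the log-likelihood. The data below are a synthetic stand-in — the Snoqualmie record and the course's R code are not reproduced here, and the coefficients and sample size are invented so the fit can be checked against the truth:

```python
import math
import random

random.seed(0)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Synthetic stand-in: x is yesterday's precipitation, y indicates a wet day,
# generated from known coefficients b0 = -1, b1 = 2.
xs = [random.expovariate(1.0) for _ in range(2000)]
ys = [1 if random.random() < sigmoid(-1.0 + 2.0 * x) else 0 for x in xs]

# Maximum likelihood by gradient ascent: the score is the sum over
# observations of (y - p(x)) * (1, x).
b0, b1, rate = 0.0, 0.0, 1.0
for _ in range(1000):
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        err = y - sigmoid(b0 + b1 * x)
        g0 += err
        g1 += err * x
    b0 += rate * g0 / len(xs)
    b1 += rate * g1 / len(xs)

print(b0, b1)   # roughly the true (-1, 2), up to sampling noise
```

Predicted probabilities for new days are then just sigmoid(b0 + b1 * x); checking calibration, as in the lecture, asks whether days assigned probability p of rain are in fact wet about a fraction p of the time.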

*Reading*: Notes, second half of chapter 13; Faraway, chapters 6 and 7

Posted by crshalizi at March 01, 2012 10:30 | permanent link