September 25, 2009

Miniature Pearl

It would be wrong to say that Judea Pearl knows more about causal inference than anyone else — I can think of some rivals very close to where I'm writing this — but he certainly knows a lot, and has worked tirelessly to formulate and spread the modern way of thinking about the subject, centered around graphical models and their associated structural equations. I remember spending many happy hours with his book Causality when it came out in 2000, and look forward to spending more with the new edition, which is making its way to me through the mail now. In the meanwhile, however, there is what he describes as "A new survey paper, gently summarizing everything I know about causation (in only 43 pages)":

"Causal Inference in Statistics: An Overview", forthcoming in Statistics Surveys 3 (2009): 96–146 (free PDF)
Abstract: This review presents empirical researchers with recent advances in causal inference, and stresses the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions, (also called "causal effects" or "policy evaluation") (2) queries about probabilities of counterfactuals, (including assessment of "regret," "attribution" or "causes of effects") and (3) queries about direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both.
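For readers who haven't met the structural machinery before, a small example may help with query type (1). In a structural causal model, the intervention do(X = x) is computed by deleting X's own equation and setting X to the constant x, and the unit-level potential outcome Y_x(u) is simply the value Y takes in that modified model. What follows is a minimal Python sketch of that idea, assuming a made-up model in which a binary Z drives both X and Y; the equations and probabilities are hypothetical, chosen only for illustration, and are not taken from the paper. Because it enumerates the exogenous variables exactly (using `fractions.Fraction`), there is no sampling noise.

```python
from fractions import Fraction
from itertools import product

# Hypothetical exogenous distribution: three independent binary background variables.
P_UZ = Fraction(1, 2)  # P(u_z = 1)
P_UX = Fraction(1, 5)  # P(u_x = 1): chance that X departs from Z
P_UY = Fraction(1, 2)  # P(u_y = 1)


def bern(p, v):
    """Probability that a Bernoulli(p) variable takes value v."""
    return p if v == 1 else 1 - p


def units():
    """Yield every exogenous setting u = (u_z, u_x, u_y) with its probability."""
    for u_z, u_x, u_y in product((0, 1), repeat=3):
        yield (u_z, u_x, u_y), bern(P_UZ, u_z) * bern(P_UX, u_x) * bern(P_UY, u_y)


def solve(u, do_x=None):
    """Solve the (hypothetical) structural equations for unit u.

    If do_x is given, X's equation is replaced by the constant do_x,
    which is what the intervention do(X = do_x) means.
    Returns (z, x, y).
    """
    u_z, u_x, u_y = u
    z = u_z
    x = (z ^ u_x) if do_x is None else do_x
    y = x + 2 * z + u_y
    return z, x, y


def potential_outcome(u, x):
    """Y_x(u): the value of Y in the submodel where X is set to x."""
    return solve(u, do_x=x)[2]


def average_causal_effect():
    """E[Y_1 - Y_0], averaging unit-level potential outcomes over the exogenous distribution."""
    return sum(p * (potential_outcome(u, 1) - potential_outcome(u, 0)) for u, p in units())


def naive_difference():
    """E[Y | X=1] - E[Y | X=0], computed from the unmodified (observational) model."""
    num = {0: Fraction(0), 1: Fraction(0)}
    den = {0: Fraction(0), 1: Fraction(0)}
    for u, p in units():
        _, x, y = solve(u)
        num[x] += p * y
        den[x] += p
    return num[1] / den[1] - num[0] / den[0]


print(average_causal_effect())  # 1
print(naive_difference())       # 11/5
```

In this toy model the true average causal effect is 1, while the observational contrast E[Y | X=1] - E[Y | X=0] comes out at 11/5, because units with X = 1 are disproportionately ones with Z = 1.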

The paper assumes a reader who's reasonably well-grounded in statistics, though not necessarily in the causal-inference literature. (Of such readers, I imagine applied economists might have more unlearning to do than most, because they will keep asking "but when do I start estimating beta?") It's not ideally calibrated for a reader coming from, say, machine learning.

One theme running through the paper is the futility of trying to define causality in purely probabilistic terms, and the fact that cases where it looks like one can do so are really cases where causal assumptions have been smuggled in. Another is that once you realize counterfactual or mechanistic assumptions are needed, the graphical-models/structural equation framework makes it immensely easier to reason about them than does the rival "potential outcomes" framework. In fact, the objects which the potential outcomes framework takes as its primitives can be constructed within the structural framework, so the correct part of the former is a subset of the latter. And by reasoning on graphical models it is easy to see that confounding can be introduced by "controlling for" the wrong variables, something explicitly denied by leading members of the potential-outcomes school. (Pearl quotes them making this mistake, and manages to pull off a more-in-sorrow-than-in-glee tone while doing so.) Mostly, however, the paper is about showing off what can be done within the new framework, which is really pretty impressive, and ought to be part of the standard tool-kit of data analysis. If you are not already familiar with it, this is an excellent place to begin, and if you are, you will enjoy the elegant and comprehensive presentation.
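The point about "controlling for" the wrong variables is easy to check in a toy linear model. The sketch below assumes the so-called M-structure: an unobserved U1 affects X and a pre-treatment covariate M, an unobserved U2 affects M and Y, and X affects Y with coefficient 1. All noise terms are independent with unit variance. The graph and coefficients are hypothetical choices for illustration, not taken from the paper. Rather than simulating data, it computes the population regression slopes exactly from the covariances.

```python
from fractions import Fraction

# Each variable is a linear combination of independent, unit-variance noise terms,
# stored as {noise_name: coefficient}. The covariance of two variables is then
# the dot product of their coefficient dictionaries.


def combine(*terms):
    """Linear combination: combine((a, V), (b, W)) represents a*V + b*W."""
    out = {}
    for coef, var in terms:
        for k, v in var.items():
            out[k] = out.get(k, 0) + coef * v
    return out


def cov(a, b):
    """Population covariance of two linear combinations of independent unit-variance noises."""
    return sum(v * b.get(k, 0) for k, v in a.items())


B = Fraction(1)  # hypothetical true causal effect of X on Y

U1 = {"u1": 1}  # unobserved; affects X and M
U2 = {"u2": 1}  # unobserved; affects M and Y
X = combine((1, U1), (1, {"e_x": 1}))
M = combine((1, U1), (1, U2), (1, {"e_m": 1}))  # collider: U1 -> M <- U2
Y = combine((B, X), (1, U2), (1, {"e_y": 1}))


def slope_unadjusted():
    """Population coefficient on X when regressing Y on X alone."""
    return Fraction(cov(X, Y)) / cov(X, X)


def slope_adjusted_for_m():
    """Population coefficient on X when regressing Y on both X and M."""
    num = cov(X, Y) * cov(M, M) - cov(X, M) * cov(M, Y)
    den = cov(X, X) * cov(M, M) - cov(X, M) ** 2
    return Fraction(num) / den


print(slope_unadjusted())      # 1
print(slope_adjusted_for_m())  # 4/5
```

The unadjusted slope recovers the true effect of 1, but adding M to the regression yields 4/5. M is a collider on the path X <- U1 -> M <- U2 -> Y, so conditioning on it opens that path and creates confounding that was not there before, even though M is measured before treatment and looks like an innocuous covariate.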


Looking back over what I write in this blog, I feel like, on the one hand, there's too little of it lately, and on the other hand, it's too tilted towards negative, critical stuff. While not regretting at all being negative and critical about stupid ideas that need to be criticized (or, really, pulverized), I will try to expand and balance my output by posting at least once a week on some good science. We'll see how this goes.


Enigmas of Chance

Posted by crshalizi at September 25, 2009 10:12

Three-Toed Sloth:   Hosted, but not endorsed, by the Center for the Study of Complex Systems