## April 15, 2012

### Graphical Causal Models (Advanced Data Analysis from an Elementary Point of View)

Probabilistic prediction is about passively selecting a sub-ensemble, leaving all the mechanisms in place, and seeing what turns up after applying that filter. Causal prediction is about actively producing a new ensemble, and seeing what would happen if something were to change ("counterfactuals"). Graphical causal models are a way of reasoning about causal prediction; their algebraic counterparts are structural equation models (generally nonlinear and non-Gaussian). The causal Markov property. Faithfulness. Performing causal prediction by "surgery" on causal graphical models. The d-separation criterion. Path diagram rules for linear models.

Posted by crshalizi at April 15, 2012 20:03 | permanent link

### Exam: Is This Test Really Necessary? (Advanced Data Analysis from an Elementary Point of View)

In which the analysis of multivariate data is recursively applied.

Posted by crshalizi at April 15, 2012 20:02 | permanent link

### Graphical Models (Advanced Data Analysis from an Elementary Point of View)

Conditional independence and dependence properties in factor models. The generalization to graphical models. Directed acyclic graphs. DAG models. Factor, mixture, and Markov models as DAGs. The graphical Markov property. Reading conditional independence properties from a DAG. Creating conditional dependence properties from a DAG. Statistical aspects of DAGs. Reasoning with DAGs; does asbestos whiten teeth?

Posted by crshalizi at April 15, 2012 20:01 | permanent link

### Mixture Models (Advanced Data Analysis from an Elementary Point of View)

From factor analysis to mixture models by allowing the latent variable to be discrete. From kernel density estimation to mixture models by reducing the number of points with copies of the kernel. Probabilistic formulation of mixture models. Geometry: planes again. Probabilistic clustering. Estimation of mixture models by maximum likelihood, and why it leads to a vicious circle. The expectation-maximization (EM, Baum-Welch) algorithm replaces the vicious circle with iterative approximation. More on the EM algorithm: convexity, Jensen's inequality, optimizing a lower bound, proving that each step of EM increases the likelihood. Mixtures of regressions. Other extensions.

Extended example: Precipitation in Snoqualmie Falls revisited. Fitting a two-component Gaussian mixture; examining the fitted distribution; checking calibration. Using cross-validation to select the number of components to use. Examination of the selected mixture model. Suspicious patterns in the parameters of the selected model. Approximating complicated distributions vs. revealing hidden structure. Using bootstrap hypothesis testing to select the number of mixture components.