April 25, 2013
Discovering Causal Structure from Observations (Advanced Data Analysis from an Elementary Point of View)
How do we get our causal graph? Comparing rival DAGs by testing selected conditional independence relations (or dependencies). Equivalence classes of graphs. Causal arrows never go away no matter what you condition on ("no causation without association"). The crucial difference between common causes and common effects: conditioning on common causes makes their effects independent, conditioning on common effects makes their causes dependent. Identifying colliders, and using them to orient arrows. Inducing orientation to enforce consistency. The SGS algorithm for discovering causal graphs; why it works. The PC algorithm: the SGS algorithm for lazy people. What about latent variables? Software: TETRAD and pcalg; examples of working with pcalg. Limits to observational causal discovery: universal consistency is possible (and achieved), but uniform consistency is not.
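The collider behavior described above is easy to see in simulation. Here is an illustrative sketch in Python with numpy (the course's actual software was R, via TETRAD and pcalg; this is not from the notes), using partial correlation as the measure of conditional dependence for jointly Gaussian variables:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A collider: X -> Z <- Y, with X and Y generated independently.
x = rng.normal(size=n)
y = rng.normal(size=n)
z = x + y + 0.5 * rng.normal(size=n)

# Marginally, X and Y are (nearly) uncorrelated...
r_marginal = np.corrcoef(x, y)[0, 1]

# ...but conditioning on the common effect Z induces dependence.
# For Gaussians, conditional dependence shows up as partial correlation:
rxz = np.corrcoef(x, z)[0, 1]
ryz = np.corrcoef(y, z)[0, 1]
r_partial = (r_marginal - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))
# r_marginal is about 0, r_partial is strongly negative (about -0.8)
```

This asymmetry — dependence created, not destroyed, by conditioning on a common effect — is exactly what lets constraint-based algorithms like SGS and PC identify colliders and orient edges.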
Reading: Notes, chapter 24
Posted by crshalizi at April 25, 2013 10:30 | permanent link
April 24, 2013
"Going Online: Should we do it? How? Why? What do we gain? What do we lose?" (Next Week, Instead of the Statistics Seminar)
Next week, instead of the regular seminar, the CMU statistics department will be hosting a panel on experience with online statistics education, including massive open online courses:
Posted by crshalizi at April 24, 2013 22:54 | permanent link
April 23, 2013
Growth and Debt (Advanced Data Analysis from an Elementary Point of View)
In which the relationship (if any) between GDP growth and government debt forms a bridge between causal inference and time series analysis.
Posted by crshalizi at April 23, 2013 11:50 | permanent link
Estimating Causal Effects from Observations (Advanced Data Analysis from an Elementary Point of View)
Estimating graphical models: substituting consistent estimators into the formulas for front- and back-door identification; average effects and regression; tricks to avoid estimating marginal distributions; propensity scores; matching, and propensity-score matching, as computational short-cuts in back-door adjustment. Instrumental variables estimation: the Wald estimator, two-stage least squares. Summary recommendations for estimating causal effects.
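The Wald estimator is short enough to demonstrate in a few lines. A hedged sketch with simulated data (numpy; hypothetical parameters, not an example from the notes), where an unobserved confounder biases the naive regression but a binary instrument recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

u = rng.normal(size=n)             # unobserved confounder of X and Y
w = rng.binomial(1, 0.5, size=n)   # binary instrument: affects X, not Y directly
x = 1.0 * w + u + rng.normal(size=n)
y = 2.0 * x + u + rng.normal(size=n)   # true causal effect of X on Y is 2

# The naive regression slope Cov(Y, X) / Var(X) is confounded by U:
c = np.cov(x, y)
ols = c[0, 1] / c[0, 0]            # noticeably above 2

# The Wald estimator uses only the exogenous variation induced by W:
wald = (y[w == 1].mean() - y[w == 0].mean()) / \
       (x[w == 1].mean() - x[w == 0].mean())   # close to 2
```

Two-stage least squares generalizes this to continuous and multiple instruments; with a single binary instrument and no covariates it reduces to the Wald ratio above.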
Reading: Notes, chapter 23
Posted by crshalizi at April 23, 2013 10:30 | permanent link
April 16, 2013
Brought to You by the Letters D, A, and G (Advanced Data Analysis from an Elementary Point of View)
In which the arts of estimating causal effects from observational data are practiced on Sesame Street.
Posted by crshalizi at April 16, 2013 11:50 | permanent link
Identifying Causal Effects from Observations (Advanced Data Analysis from an Elementary Point of View)
Reprise of causal effects vs. probabilistic conditioning. "Why think, when you can do the experiment?" Experimentation by controlling everything (Galileo) and by randomizing (Fisher). Confounding and identifiability. The back-door criterion for identifying causal effects: condition on covariates which block undesired paths. The front-door criterion for identification: find isolated and exhaustive causal mechanisms. Deciding how many black boxes to open up. Instrumental variables for identification: finding some exogenous source of variation and tracing its effects. Critique of instrumental variables: vital role of theory, its fragility, consequences of weak instruments. Irremovable confounding: an example with the detection of social influence; the possibility of bounding unidentifiable effects. Summary recommendations for identifying causal effects.
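The back-door adjustment can be checked numerically. A minimal simulated example (numpy; the structure and numbers are hypothetical, chosen for illustration), in which conditioning on the confounder S and averaging over its distribution recovers the causal effect that naive conditioning overstates:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Hypothetical DAG: S -> X, S -> Y, X -> Y, all variables binary.
s = rng.binomial(1, 0.5, size=n)
x = rng.binomial(1, 0.2 + 0.6 * s)
y = rng.binomial(1, 0.1 + 0.3 * x + 0.4 * s)   # true effect of X on Y: +0.3

# Naive conditioning confounds the effect of X with that of S:
naive = y[x == 1].mean() - y[x == 0].mean()     # much larger than 0.3

# Back-door adjustment: S blocks the path X <- S -> Y, so
# P(Y=1 | do(X=x)) = sum over s of P(Y=1 | X=x, S=s) * P(S=s)
def p_do(xval):
    return sum(y[(x == xval) & (s == sval)].mean() * (s == sval).mean()
               for sval in (0, 1))

adjusted = p_do(1) - p_do(0)                    # close to the true 0.3
```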
Reading: Notes, chapter 22
Optional reading: Pearl, "Causal Inference in Statistics", sections 3.3--3.5, 4, and 5.1
Posted by crshalizi at April 16, 2013 10:30 | permanent link
April 11, 2013
Graphical Causal Models (Advanced Data Analysis from an Elementary Point of View)
Probabilistic prediction is about passively selecting a sub-ensemble, leaving all the mechanisms in place, and seeing what turns up after applying that filter. Causal prediction is about actively producing a new ensemble, and seeing what would happen if something were to change ("counterfactuals"). Graphical causal models are a way of reasoning about causal prediction; their algebraic counterparts are structural equation models (generally nonlinear and non-Gaussian). The causal Markov property. Faithfulness. Performing causal prediction by "surgery" on causal graphical models. The d-separation criterion. Path diagram rules for linear models.
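The contrast between passive selection and surgery is vivid in a simulation. A sketch of a (hypothetical, linear-Gaussian) structural equation model with a common cause U of X and Y: conditioning on X selects a sub-ensemble where U is informative, while surgery replaces X's equation and cuts that path:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500_000

# Hypothetical SEM: U -> X, U -> Y, X -> Y with unit causal effect.
def simulate(n, do_x=None):
    u = rng.normal(size=n)
    if do_x is None:
        x = u + rng.normal(size=n)   # X's own structural equation
    else:
        x = np.full(n, do_x)         # surgery: X set exogenously
    y = 1.0 * x + 2.0 * u + rng.normal(size=n)
    return x, y

# Probabilistic conditioning: filter the observational ensemble to X near 1.
x, y = simulate(n)
conditional = y[np.abs(x - 1.0) < 0.05].mean()   # about 2: X near 1 suggests U > 0

# Causal prediction: re-run the mechanism with X's equation replaced by X = 1.
_, y_do = simulate(n, do_x=1.0)
interventional = y_do.mean()                     # about 1: only the X -> Y path acts
```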
Reading: Notes, chapter 21
Optional reading: Cox and Donnelly, chapter 9; Pearl, "Causal Inference in Statistics", sections 1, 2, and 3 through 3.2
Posted by crshalizi at April 11, 2013 10:30 | permanent link
April 09, 2013
Choosing a Better History (Advanced Data Analysis from an Elementary Point of View)
Exam 2: in which we examine how the citizens of ex-communist country X look at history and human rights, as a way of practicing multivariate data analysis.
Assignment; the data set is still confidential and so not public.
Posted by crshalizi at April 09, 2013 11:50 | permanent link
Graphical Models (Advanced Data Analysis from an Elementary Point of View)
Conditional independence and dependence properties in factor models. The generalization to graphical models. Directed acyclic graphs. DAG models. Factor, mixture, and Markov models as DAGs. The graphical Markov property. Reading conditional independence properties from a DAG. Creating conditional dependence properties from a DAG. Statistical aspects of DAGs. Reasoning with DAGs; does asbestos whiten teeth?
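The asbestos question yields to a few lines of code. Here is a compact sketch (hypothetical, not the course's software) that reads conditional independence off a DAG via the standard ancestral-moral-graph reduction: restrict to ancestors of the variables involved, marry co-parents, drop edge directions, delete the conditioning set, and test connectivity:

```python
from itertools import combinations

def d_separated(dag, x, y, z):
    """Decide whether x and y are d-separated given the set z.

    dag maps each node to the set of its parents."""
    # Collect the ancestors of {x, y} union z.
    anc, stack = set(), [x, y, *z]
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(dag.get(v, ()))
    # Moralize: connect each node to its parents, and co-parents to each other.
    adj = {v: set() for v in anc}
    for v in anc:
        ps = [p for p in dag.get(v, ()) if p in anc]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)
    # Remove z and search for any remaining path from x to y.
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v in seen or v in z:
            continue
        seen.add(v)
        stack.extend(adj[v] - seen)
    return True

# Smoking causes both yellow teeth and cancer; asbestos also causes cancer.
dag = {"yellow_teeth": {"smoking"}, "cancer": {"smoking", "asbestos"}}
print(d_separated(dag, "asbestos", "yellow_teeth", set()))        # True
print(d_separated(dag, "asbestos", "yellow_teeth", {"cancer"}))   # False
```

Marginally, asbestos exposure says nothing about tooth color; condition on the common effect (cancer) and the two become informative about each other, which is why asbestos will not, in fact, whiten anyone's teeth.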
Reading: Notes, chapter 20
Posted by crshalizi at April 09, 2013 10:30 | permanent link
April 04, 2013
"Prediction in Complex Networks" (Next Week at the Statistics Seminar)
All of the statistics department's seminars are, of course, fascinating presentations of important work, but next week's could hardly be more relevant to my interests if I had arranged it myself.
Posted by crshalizi at April 04, 2013 13:29 | permanent link
April 02, 2013
Mixture Models (Advanced Data Analysis from an Elementary Point of View)
From factor analysis to mixture models by allowing the latent variable to be discrete. From kernel density estimation to mixture models by using far fewer copies of the kernel than there are data points. Probabilistic formulation of mixture models. Geometry: planes again. Probabilistic clustering. Estimation of mixture models by maximum likelihood, and why it leads to a vicious circle. The expectation-maximization (EM, Baum-Welch) algorithm replaces the vicious circle with iterative approximation. More on the EM algorithm: convexity, Jensen's inequality, optimizing a lower bound, proving that each step of EM increases the likelihood. Mixtures of regressions. Other extensions.
Extended example: Precipitation in Snoqualmie Falls revisited. Fitting a two-component Gaussian mixture; examining the fitted distribution; checking calibration. Using cross-validation to select the number of components to use. Examination of the selected mixture model. Suspicious patterns in the parameters of the selected model. Approximating complicated distributions vs. revealing hidden structure. Using bootstrap hypothesis testing to select the number of mixture components.
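The E and M steps for a two-component Gaussian mixture fit in a screenful. A bare-bones sketch (numpy, simulated data with hypothetical parameters; the notes work in R), showing how each pass replaces the vicious circle with an iteration:

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulated data: 30% from N(-2, 1), 70% from N(3, 1).
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

def em_two_gaussians(x, mu=(-1.0, 1.0), sigma=(1.0, 1.0), pi=0.5, steps=200):
    mu, sigma = list(mu), list(sigma)
    for _ in range(steps):
        # E step: posterior probability ("responsibility") that each point
        # came from component 0, given current parameter guesses.
        d0 = pi * np.exp(-0.5 * ((x - mu[0]) / sigma[0]) ** 2) / sigma[0]
        d1 = (1 - pi) * np.exp(-0.5 * ((x - mu[1]) / sigma[1]) ** 2) / sigma[1]
        r = d0 / (d0 + d1)
        # M step: weighted maximum-likelihood estimates given those
        # responsibilities; each full pass cannot decrease the likelihood.
        pi = r.mean()
        mu[0] = np.average(x, weights=r)
        mu[1] = np.average(x, weights=1 - r)
        sigma[0] = np.sqrt(np.average((x - mu[0]) ** 2, weights=r))
        sigma[1] = np.sqrt(np.average((x - mu[1]) ** 2, weights=1 - r))
    return pi, mu, sigma

pi, mu, sigma = em_two_gaussians(x)   # means near -2 and 3, weight near 0.3
```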
Posted by crshalizi at April 02, 2013 10:30 | permanent link
March 31, 2013
Books to Read While the Algae Grow in Your Fur, March 2013
Attention conservation notice: I have no taste.
Posted by crshalizi at March 31, 2013 23:59 | permanent link
March 28, 2013
Factor Analysis (Advanced Data Analysis from an Elementary Point of View)
Adding noise to PCA to get a statistical model. The factor model, or linear regression with unobserved independent variables. Assumptions of the factor model. Implications of the model: observable variables are correlated only through shared factors; "tetrad equations" for one-factor models, more general correlation patterns for multiple factors. Our first look at latent variables and conditional independence. Geometrically, the factor model says the data cluster on some low-dimensional plane, plus noise moving them off the plane. Estimation by heroic linear algebra; estimation by maximum likelihood. The rotation problem, and why it is unwise to reify factors. Other models which produce the same correlation patterns as factor models.
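The tetrad equations can be seen directly in simulation. A quick sketch (numpy, hypothetical loadings): under a one-factor model with standardized variables, the correlation between observables i and j is the product of their loadings, so products of correlations across disjoint pairs must all agree:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
f = rng.normal(size=n)                  # the latent factor
w = [0.9, 0.7, 0.8, 0.6]                # hypothetical factor loadings
# Each observable = loading * factor + noise, scaled to unit variance.
xs = [wi * f + np.sqrt(1 - wi**2) * rng.normal(size=n) for wi in w]
r = np.corrcoef(xs)

# Since rho_ij = w_i * w_j, all three tetrads equal w1*w2*w3*w4:
t1 = r[0, 1] * r[2, 3]
t2 = r[0, 2] * r[1, 3]
t3 = r[0, 3] * r[1, 2]                  # t1, t2, t3 all close to 0.3024
```

Checking whether observed correlation matrices satisfy (or violate) these constraints is one way to test a one-factor hypothesis against the data.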
Posted by crshalizi at March 28, 2013 10:30 | permanent link
March 26, 2013
How the Recent Mammals Got Their Size Distribution (Advanced Data Analysis from an Elementary Point of View)
Homework 8: in which returning to paleontology gives us an excuse to work with simulations, and to compare distributions.
Posted by crshalizi at March 26, 2013 11:50 | permanent link
Principal Components Analysis (Advanced Data Analysis from an Elementary Point of View)
Principal components analysis is the simplest, oldest, and most robust of dimensionality-reduction techniques. It works by finding the line (plane, hyperplane) which passes closest, on average, to all of the data points. This is equivalent to maximizing the variance of the projection of the data onto the line/plane/hyperplane. Actually finding those principal components reduces to finding eigenvalues and eigenvectors of the sample covariance matrix. Why PCA is a data-analytic technique, and not a form of statistical inference. An example with cars. PCA with words: "latent semantic analysis"; an example with real newspaper articles. Visualization with PCA and multidimensional scaling. Cautions about PCA; the perils of reification; illustration with genetic maps.
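The eigendecomposition route is only a few lines. A minimal sketch (numpy, simulated two-dimensional data with a made-up dominant direction, not the cars example from the notes):

```python
import numpy as np

rng = np.random.default_rng(5)
# Data stretched along the direction (1, 0.5), plus small isotropic noise.
t = rng.normal(size=1000)
X = np.column_stack([t, 0.5 * t]) + 0.1 * rng.normal(size=(1000, 2))

Xc = X - X.mean(axis=0)               # center the data
cov = np.cov(Xc, rowvar=False)        # sample covariance matrix
evals, evecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
pc1 = evecs[:, -1]                    # first PC = top eigenvector

scores = Xc @ pc1                     # projections onto the first PC
# The variance of the scores equals the largest eigenvalue, and pc1
# points (up to sign) along the direction of maximum variance.
```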
Posted by crshalizi at March 26, 2013 10:30 | permanent link