April 25, 2013

Discovering Causal Structure from Observations (Advanced Data Analysis from an Elementary Point of View)

How do we get our causal graph? Comparing rival DAGs by testing selected conditional independence relations (or dependencies). Equivalence classes of graphs. Causal arrows never go away no matter what you condition on ("no causation without association"). The crucial difference between common causes and common effects: conditioning on common causes makes their effects independent, conditioning on common effects makes their causes dependent. Identifying colliders, and using them to orient arrows. Inducing orientation to enforce consistency. The SGS algorithm for discovering causal graphs; why it works. The PC algorithm: the SGS algorithm for lazy people. What about latent variables? Software: TETRAD and pcalg; examples of working with pcalg. Limits to observational causal discovery: universal consistency is possible (and achieved), but uniform consistency is not.
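
The contrast between common causes and common effects can be checked on simulated data. What follows is a minimal sketch, not anything from the notes: the linear-Gaussian structural equations, the sample size and the helper names `corr` and `partial_corr` are all assumptions made for illustration. In the linear-Gaussian case a vanishing partial correlation stands in for conditional independence.

```python
import math
import random

def corr(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly adjusting both for c."""
    rab, rac, rbc = corr(a, b), corr(a, c), corr(b, c)
    return (rab - rac * rbc) / math.sqrt((1 - rac ** 2) * (1 - rbc ** 2))

rng = random.Random(24)  # arbitrary seed
n = 20000

# Common cause (assumed model): C -> A and C -> B.
C = [rng.gauss(0, 1) for _ in range(n)]
A = [c + rng.gauss(0, 1) for c in C]
B = [c + rng.gauss(0, 1) for c in C]

# Common effect, i.e. a collider (assumed model): X -> Z <- Y.
X = [rng.gauss(0, 1) for _ in range(n)]
Y = [rng.gauss(0, 1) for _ in range(n)]
Z = [x + y + rng.gauss(0, 1) for x, y in zip(X, Y)]

fork_marginal = corr(A, B)                   # effects of C are associated
fork_conditional = partial_corr(A, B, C)     # association vanishes given C
collider_marginal = corr(X, Y)               # causes of Z start out unassociated
collider_conditional = partial_corr(X, Y, Z) # association appears given Z

print(fork_marginal, fork_conditional)
print(collider_marginal, collider_conditional)
```

The asymmetry between the two printed rows is what lets a collider be identified from data, and hence what lets some arrows be oriented.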

Reading: Notes, chapter 24

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 25, 2013 10:30 | permanent link

April 24, 2013

"Going Online: Should we do it? How? Why? What do we gain? What do we lose?" (Next Week, Instead of the Statistics Seminar)

Next week, instead of the regular seminar, the CMU statistics department will be hosting a panel on experience with online statistics education, including massive open online courses:

"Going Online: Should we do it? How? Why? What do we gain? What do we lose?"
Panelists: Emma Brunskill; Brian Caffo; Jeff Leek; Marsha Lovett; Roger Peng; Chad Schafer
Moderators: Rebecca Nugent and Ryan Tibshirani
Time and place: 3:30--5:00 pm on Monday, 29 April 2013, in Baker Hall A51 ("Giant Eagle Auditorium")
I look forward to this very much, even if I do plan to channel Tim Burke and/or Adam Kotsko.

Corrupting the Young

Posted by crshalizi at April 24, 2013 22:54 | permanent link

April 23, 2013

Growth and Debt (Advanced Data Analysis from an Elementary Point of View)

In which the relationship (if any) between GDP growth and government debt forms a bridge between causal inference and time series analysis.

Assignment, debt.csv

Posted by crshalizi at April 23, 2013 11:50 | permanent link

Estimating Causal Effects from Observations (Advanced Data Analysis from an Elementary Point of View)

Estimating graphical models: substituting consistent estimators into the formulas for front and back door identification; average effects and regression; tricks to avoid estimating marginal distributions; matching and propensity scores as computational short-cuts in back-door adjustment. Instrumental variables estimation: the Wald estimator, two-stage least-squares. Summary recommendations for estimating causal effects.
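
As a hedged illustration of instrumental-variables estimation, here is a small simulation using the ratio-of-covariances form of the Wald estimator (with a single instrument and a single regressor, two-stage least squares gives the same number). The structural equations, the true coefficient `beta` and the helper `cov` are assumptions invented for this sketch, not taken from the notes.

```python
import random

def cov(a, b):
    """Sample covariance (1/n normalization) of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

rng = random.Random(23)  # arbitrary seed
n = 20000
beta = 2.0  # true causal effect of X on Y, chosen for the simulation

Z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: affects Y only through X
U = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder of X and Y
X = [z + u + rng.gauss(0, 1) for z, u in zip(Z, U)]
Y = [beta * x + u + rng.gauss(0, 1) for x, u in zip(X, U)]

# Ordinary regression slope of Y on X: biased upward by the confounder U.
ols_slope = cov(X, Y) / cov(X, X)

# Wald / IV estimate: effect of Z on Y divided by effect of Z on X.
wald = cov(Z, Y) / cov(Z, X)

print(ols_slope, wald)
```

Under these assumptions the regression slope settles near 7/3 while the instrumental-variables estimate settles near the true value of 2; with a weaker instrument (a smaller coefficient on `z`) the denominator shrinks and the estimate becomes much noisier.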

Reading: Notes, chapter 23

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 23, 2013 10:30 | permanent link

April 16, 2013

Brought to You by the Letters D, A, and G (Advanced Data Analysis from an Elementary Point of View)

In which the arts of estimating causal effects from observational data are practiced on Sesame Street.

Assignment, sesame.csv

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 16, 2013 11:50 | permanent link

Identifying Causal Effects from Observations (Advanced Data Analysis from an Elementary Point of View)

Reprise of causal effects vs. probabilistic conditioning. "Why think, when you can do the experiment?" Experimentation by controlling everything (Galileo) and by randomizing (Fisher). Confounding and identifiability. The back-door criterion for identifying causal effects: condition on covariates which block undesired paths. The front-door criterion for identification: find isolated and exhaustive causal mechanisms. Deciding how many black boxes to open up. Instrumental variables for identification: finding some exogenous source of variation and tracing its effects. Critique of instrumental variables: vital role of theory, its fragility, consequences of weak instruments. Irremovable confounding: an example with the detection of social influence; the possibility of bounding unidentifiable effects. Summary recommendations for identifying causal effects.
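
The back-door adjustment can be sketched with three binary variables. Everything specific here is an assumption made for illustration: the graph (Z causes both X and Y, and X causes Y), the probabilities, and the helper `mean_y`. The adjustment formula used is the standard one, averaging the within-stratum contrasts over the marginal distribution of Z.

```python
import random

rng = random.Random(22)  # arbitrary seed
n = 40000

# Assumed model: Z -> X, Z -> Y, X -> Y, all binary.
rows = []
for _ in range(n):
    z = 1 if rng.random() < 0.5 else 0
    x = 1 if rng.random() < (0.8 if z else 0.2) else 0
    y = 1 if rng.random() < 0.1 + 0.3 * x + 0.4 * z else 0  # true effect of X: 0.3
    rows.append((z, x, y))

def mean_y(x, z=None):
    """Average of Y among rows with X == x (and Z == z, if z is given)."""
    ys = [yy for (zz, xx, yy) in rows if xx == x and (z is None or zz == z)]
    return sum(ys) / len(ys)

# Naive contrast: mixes the effect of X with the effect of Z.
naive = mean_y(1) - mean_y(0)

# Back-door adjustment: condition on Z, then average over P(Z).
p_z = {z: sum(1 for r in rows if r[0] == z) / n for z in (0, 1)}
adjusted = sum(p_z[z] * (mean_y(1, z) - mean_y(0, z)) for z in (0, 1))

print(naive, adjusted)
```

Conditioning on Z blocks the only back-door path from X to Y in this graph, so the adjusted contrast should land near 0.3 while the naive one lands near 0.54.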

Reading: Notes, chapter 22

Optional reading: Pearl, "Causal Inference in Statistics", sections 3.3--3.5, 4, and 5.1

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 16, 2013 10:30 | permanent link

April 11, 2013

Graphical Causal Models (Advanced Data Analysis from an Elementary Point of View)

Probabilistic prediction is about passively selecting a sub-ensemble, leaving all the mechanisms in place, and seeing what turns up after applying that filter. Causal prediction is about actively producing a new ensemble, and seeing what would happen if something were to change ("counterfactuals"). Graphical causal models are a way of reasoning about causal prediction; their algebraic counterparts are structural equation models (generally nonlinear and non-Gaussian). The causal Markov property. Faithfulness. Performing causal prediction by "surgery" on causal graphical models. The d-separation criterion. Path diagram rules for linear models.
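
The difference between filtering an ensemble and producing a new one can be seen in a toy structural equation model. This is only a sketch: the equations, the coefficients and the `simulate` function with its `do_x` argument are hypothetical names and choices for illustration. "Surgery" here means deleting the equation for X and replacing it with a constant, while leaving every other mechanism alone.

```python
import random

def simulate(n, rng, do_x=None):
    """Draw n samples from an assumed model U -> X, U -> Y, X -> Y.

    If do_x is given, the equation for X is replaced by the constant do_x.
    """
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                      # common cause of X and Y
        x = u + rng.gauss(0, 1) if do_x is None else do_x
        y = x + 2 * u + rng.gauss(0, 1)          # mechanism for Y is untouched
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(21)  # arbitrary seed
n = 20000

# Probabilistic prediction: slope of Y on X in the undisturbed system.
xs, ys = simulate(n, rng)
mx, my = sum(xs) / n, sum(ys) / n
obs_slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))

# Causal prediction: change in mean Y when X is set to 1 rather than 0.
_, y_do1 = simulate(n, rng, do_x=1.0)
_, y_do0 = simulate(n, rng, do_x=0.0)
do_effect = sum(y_do1) / n - sum(y_do0) / n

print(obs_slope, do_effect)
```

With these coefficients the observational slope comes out near 2 and the interventional effect near 1; the gap is entirely due to the common cause U.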

Reading: Notes, chapter 21

Optional reading: Cox and Donnelly, chapter 9; Pearl, "Causal Inference in Statistics", sections 1, 2, and 3 through 3.2

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 11, 2013 10:30 | permanent link

April 09, 2013

Choosing a Better History (Advanced Data Analysis from an Elementary Point of View)

Exam 2: in which we examine how the citizens of ex-communist country X look at history and human rights, as a way of practicing multivariate data analysis.

Assignment; the data set is still confidential and so not public.

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 09, 2013 11:50 | permanent link

Graphical Models (Advanced Data Analysis from an Elementary Point of View)

Conditional independence and dependence properties in factor models. The generalization to graphical models. Directed acyclic graphs. DAG models. Factor, mixture, and Markov models as DAGs. The graphical Markov property. Reading conditional independence properties from a DAG. Creating conditional dependence properties from a DAG. Statistical aspects of DAGs. Reasoning with DAGs; does asbestos whiten teeth?
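
Reading conditional independence off a DAG can be mechanized. The sketch below checks d-separation by the standard moralization route: restrict to the ancestors of the variables involved, "marry" the parents of each node, drop directions, delete the conditioning set, and see whether the two variables are still connected. The function names and the four-node graph are assumptions for illustration; the graph is a guess at the flavor of the asbestos example, not a copy of it.

```python
from collections import deque
from itertools import combinations

def ancestral_set(parents, nodes):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated by `given` (x, y not in `given`)."""
    given = set(given)
    keep = ancestral_set(parents, {x, y} | given)
    nbrs = {v: set() for v in keep}
    for child in keep:
        pars = list(parents.get(child, ()))
        for p in pars:                        # undirected parent-child edges
            nbrs[child].add(p)
            nbrs[p].add(child)
        for p, q in combinations(pars, 2):    # marry co-parents
            nbrs[p].add(q)
            nbrs[q].add(p)
    reached, queue = {x}, deque([x])          # search that never enters `given`
    while queue:
        v = queue.popleft()
        for w in nbrs[v]:
            if w not in reached and w not in given:
                reached.add(w)
                queue.append(w)
    return y not in reached

# Hypothetical DAG, written as node -> list of parents.
dag = {
    "yellow_teeth": ["smoking"],
    "cancer": ["smoking", "asbestos"],
}

print(d_separated(dag, "yellow_teeth", "asbestos"))
print(d_separated(dag, "yellow_teeth", "asbestos", {"cancer"}))
print(d_separated(dag, "yellow_teeth", "asbestos", {"cancer", "smoking"}))
```

In this graph tooth color and asbestos are independent marginally, become dependent among those with known cancer status (conditioning on the common effect), and are independent again once smoking is also held fixed.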

Reading: Notes, chapter 20

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 09, 2013 10:30 | permanent link

April 04, 2013

"Prediction in Complex Networks" (Next Week at the Statistics Seminar)

All of the statistics department's seminars are, of course, fascinating presentations of important work, but next week's could hardly be more relevant to my interests if I had arranged it myself.

Jennifer Neville, "Prediction in complex networks: The impact of structure on learning and prediction"
Abstract: The recent popularity of online social networks and social media has increased the amount of information available about users' behavior — including current activities and interactions among friends and family. This rich relational information can be exploited to predict user interests and preferences even when individual data is sparse, as the relationships are a critical source of information that identify potential statistical dependencies among people. Although network data offer several opportunities to improve prediction, the characteristics of real world datasets present a number of challenges to accurately incorporate relational information into machine learning algorithms. In this talk, I will discuss the effects of sampling, parameter tying, and model roll-out on the properties of the resulting statistical models — which occurs through a complex interaction between local model properties, global network structure, and the availability of observed attributes. By understanding the impact of these interactions on algorithm performance (e.g., learning, inference, and evaluation), we can develop more accurate and efficient analysis methods for large, partially-observable social network and social media datasets.
Place and time: Scaife Hall 125, 4--5 pm on Monday, 8 April 2013

Enigmas of Chance; Networks

Posted by crshalizi at April 04, 2013 13:29 | permanent link

April 02, 2013

Mixture Models (Advanced Data Analysis from an Elementary Point of View)

From factor analysis to mixture models by allowing the latent variable to be discrete. From kernel density estimation to mixture models by reducing the number of points with copies of the kernel. Probabilistic formulation of mixture models. Geometry: planes again. Probabilistic clustering. Estimation of mixture models by maximum likelihood, and why it leads to a vicious circle. The expectation-maximization (EM, Baum-Welch) algorithm replaces the vicious circle with iterative approximation. More on the EM algorithm: convexity, Jensen's inequality, optimizing a lower bound, proving that each step of EM increases the likelihood. Mixtures of regressions. Other extensions.
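
A bare-bones version of the EM iteration for a two-component Gaussian mixture in one dimension might look like the following. It is a sketch under stated assumptions: the simulated data, the crude initialization at the sample extremes, and the names `normal_pdf` and `em_two_gaussians` are all made up for illustration, and none of the safeguards a real implementation needs (restarts, variance floors, convergence checks) are included.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(data, n_iter=50):
    """EM for w*N(mu1, s1^2) + (1-w)*N(mu2, s2^2); returns parameters and log-likelihoods."""
    n = len(data)
    mean = sum(data) / n
    w, mu1, mu2 = 0.5, min(data), max(data)          # crude starting guesses
    s1 = s2 = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    loglik = []
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 1.
        d1 = [w * normal_pdf(x, mu1, s1) for x in data]
        d2 = [(1 - w) * normal_pdf(x, mu2, s2) for x in data]
        loglik.append(sum(math.log(a + b) for a, b in zip(d1, d2)))
        resp = [a / (a + b) for a, b in zip(d1, d2)]
        # M-step: weighted maximum-likelihood estimates given those probabilities.
        n1 = sum(resp)
        n2 = n - n1
        w = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / n2
        s1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1)
        s2 = math.sqrt(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2)
    return (w, mu1, s1, mu2, s2), loglik

rng = random.Random(19)  # arbitrary seed
data = ([rng.gauss(-2, 1) for _ in range(300)]
        + [rng.gauss(3, 1) for _ in range(300)])

params, loglik = em_two_gaussians(data)
print(params)
print(loglik[0], loglik[-1])
```

The recorded log-likelihoods should never decrease from one iteration to the next, which is the property the Jensen's-inequality argument establishes.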

Extended example: Precipitation in Snoqualmie Falls revisited. Fitting a two-component Gaussian mixture; examining the fitted distribution; checking calibration. Using cross-validation to select the number of components to use. Examination of the selected mixture model. Suspicious patterns in the parameters of the selected model. Approximating complicated distributions vs. revealing hidden structure. Using bootstrap hypothesis testing to select the number of mixture components.

Reading: Notes, chapter 19; mixture-examples.R

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 02, 2013 10:30 | permanent link

March 31, 2013

Books to Read While the Algae Grow in Your Fur, March 2013

Attention conservation notice: I have no taste.

John Levi Martin, Social Structures
The best approach to a theory of social networks I have ever seen from the hands of a sociologist. Specifically, it is about relating the content of different social relationships to the form of the larger network to which they give rise, and how both form and content are linked to the ways participants think about the relationship, though not just to how they think. Martin draws very deeply on a huge range of scholarship, everything from studies of American teenagers at summer camps through the ethology of dominance hierarchies (including the origins of the concept "pecking order") to the history of European militaries and the development of party politics in colonial New York and Virginia. Astonishingly, he really pulls it all together, and writes much more than decently.
To over-simplify a lot, Martin wants to identify what allows some types of relationships to ramify into very large social structures, where the participants can nonetheless have some grasp of the structure and how it is organized. (It's not enough if the organization only becomes evident to an outsider after detailed sociometric analysis.) Principles like homophily don't work, because they would lead merely to cliques, or at best to dense clusters, and people are simply unable to handle dense social networks at any large scale; real networks are, and must be, sparse. "Balance" (the transitive closure of the idea that "the enemy of my enemy is my friend, and the friend of my enemy is my enemy") fails because it's strategic suicide. Relationships of exchange (of gifts, children in marriage, etc.) are more promising, but are also necessarily fragile, and hard to expand. Human beings simply don't do pecking orders, unless they are confined without possibilities of escape (like schoolchildren), and even then sorting a truly large pool of people by peck-ability doesn't scale.
The most robust possibility for creating large-scale, comprehensible social structures is an anti-symmetric patron-client relationship, grounded in some pre-existing inequality of status or resources (or ideally both), where clients offer services to patrons and patrons protect clients from other members of the patron class. If patrons can be clients of more important patrons, such ties can be concatenated into vast pyramids. In doing so, they do not lose comprehensibility (everyone remains a client of a single patron, who is superior on some recognized dimension), or require dense networks, or a carefully balanced flow of resources, or vast efforts on the part of participants. Patronage is not a transitive relationship (my patron's patron is not my patron, whom I must serve), but patronage pyramids can as it were harden into command hierarchies, where subordination is transitive; Martin explores the role of this process in creating the modern corporation, army, political party, and state. (On the corporation, he is particularly good [pp. 236--241] on the fact that many people like being in control, and technical or economic efficiency be damned.)
To some extent, I feel that he leaves off just where things are getting interesting, by mentioning that in the case of parties, people create transitive relationships with each other by imagining that they have a binary relationship with the party, or perhaps with its ideology. Parties and ideologies, in this sense, are, to use the poet's words, "consensual hallucinations", but nonetheless very important to how any really large social organization happens. I wish this book said more about how they worked, but Martin seems to treat them as black boxes. (The phrase "imagined community" does not, to the best of my recollection, ever appear in the text.) I hope that this will be dealt with in a later book.
(Read on Kieran Healy's recommendation)
Update: "JLM" has a page of replies to reviews; the tone of the book itself is rather more staid.
Elizabeth Bear, Shattered Pillars
Sequel to the magnificent Range of Ghosts, continuing the action where that book left off. And there is a lot of action: escapes, betrayals, ambushes, storms, fires, eruptions, un-natural plagues, miraculous births; also cities (variously thriving, burning, seething with discontent, rebuilding and ruined), wondrous beasts, ghouls, forbidden books, slave poetesses, the intersection of chemical thermodynamics with wizardry, and cramped tunnels and vast skies. As an act of story-telling, what strikes me most, having just put the book down, is how much of the novel is told from perspectives of those who are (not to put too fine a point on it) villains, yet in such a way that the reader is invited to comprehend and even sympathize, though not approve. (Imagine key parts of The Two Towers being told from the perspective of Grima Wormtongue, who thought he was shouldering a burden by doing unpleasant things for the good of Rohan.) At the same time, Temur and Samarkar only grow on me as heroes. There will be a third book, for which I can hardly wait.
Elizabeth Bear, Bone and Jewel Creatures
Mind candy about dueling necromancers, but candy of a high grade. (How often, in mind candy or elsewhere, is the protagonist an old woman with arthritis?) In the same world as Range of Ghosts and Shattered Pillars, but thematically distinct.

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Networks; Commit a Social Science

Posted by crshalizi at March 31, 2013 23:59 | permanent link

March 28, 2013

Factor Analysis (Advanced Data Analysis from an Elementary Point of View)

Adding noise to PCA to get a statistical model. The factor model, or linear regression with unobserved independent variables. Assumptions of the factor model. Implications of the model: observable variables are correlated only through shared factors; "tetrad equations" for one factor models, more general correlation patterns for multiple factors. Our first look at latent variables and conditional independence. Geometrically, the factor model says the data cluster on some low-dimensional plane, plus noise moving them off the plane. Estimation by heroic linear algebra; estimation by maximum likelihood. The rotation problem, and why it is unwise to reify factors. Other models which produce the same correlation patterns as factor models.
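
The "tetrad equations" for a one-factor model can be checked numerically. In the sketch below, four observables each load on a single latent factor plus independent noise, so the covariance between observables i and j is the product of their loadings, and the three ways of pairing up four variables give the same product of covariances. The loadings, sample size and helper `cov` are assumptions chosen for illustration.

```python
import random

def cov(a, b):
    """Sample covariance (1/n normalization) of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

rng = random.Random(18)  # arbitrary seed
n = 20000
loadings = [0.9, 0.7, 0.5, 0.8]  # hypothetical factor loadings

# One-factor model: X_i = w_i * F + independent unit-variance noise.
cols = [[] for _ in loadings]
for _ in range(n):
    f = rng.gauss(0, 1)
    for col, w in zip(cols, loadings):
        col.append(w * f + rng.gauss(0, 1))

def c(i, j):
    return cov(cols[i], cols[j])

# The three tetrad products; each estimates w1*w2*w3*w4 = 0.252.
t1 = c(0, 1) * c(2, 3)
t2 = c(0, 2) * c(1, 3)
t3 = c(0, 3) * c(1, 2)

print(t1, t2, t3)
```

Agreement of the three products is an implication of the one-factor model, not a proof of it; other models can produce the same correlation pattern.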

Reading: Notes, chapter 18; factors.R and sleep.txt

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at March 28, 2013 10:30 | permanent link

March 26, 2013

How the Recent Mammals Got Their Size Distribution (Advanced Data Analysis from an Elementary Point of View)

Homework 8: in which returning to paleontology gives us an excuse to work with simulations, and to compare distributions.

Assignment; MOM_data_full.txt

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at March 26, 2013 11:50 | permanent link

Principal Components Analysis (Advanced Data Analysis from an Elementary Point of View)

Principal components is the simplest, oldest and most robust of dimensionality-reduction techniques. It works by finding the line (plane, hyperplane) which passes closest, on average, to all of the data points. This is equivalent to maximizing the variance of the projection of the data on to the line/plane/hyperplane. Actually finding those principal components reduces to finding eigenvalues and eigenvectors of the sample covariance matrix. Why PCA is a data-analytic technique, and not a form of statistical inference. An example with cars. PCA with words: "latent semantic analysis"; an example with real newspaper articles. Visualization with PCA and multidimensional scaling. Cautions about PCA; the perils of reification; illustration with genetic maps.
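
The reduction of PCA to an eigenproblem can be sketched without any linear-algebra library for two-dimensional data, using power iteration on the sample covariance matrix to find the leading eigenvector. The simulated data (elongated along the 45-degree line), the number of iterations and the variable names are assumptions made for this illustration; real work would call a proper eigen-solver.

```python
import math
import random

rng = random.Random(17)  # arbitrary seed
n = 5000
r2 = math.sqrt(2)

# Assumed data: large spread along (1, 1)/sqrt(2), small spread across it.
data = []
for _ in range(n):
    t = rng.gauss(0, 3)      # coordinate along the main direction
    e = rng.gauss(0, 0.5)    # coordinate along the orthogonal direction
    data.append(((t - e) / r2, (t + e) / r2))

# Center the data and form the 2x2 sample covariance matrix.
mx = sum(p[0] for p in data) / n
my = sum(p[1] for p in data) / n
centered = [(px - mx, py - my) for px, py in data]
sxx = sum(a * a for a, _ in centered) / n
sxy = sum(a * b for a, b in centered) / n
syy = sum(b * b for _, b in centered) / n

# Power iteration: repeatedly apply the covariance matrix and renormalize.
v = (1.0, 0.0)
for _ in range(100):
    w = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
    norm = math.hypot(w[0], w[1])
    v = (w[0] / norm, w[1] / norm)

# Leading eigenvalue, and the variance of the data projected on v.
eigenvalue = v[0] * (sxx * v[0] + sxy * v[1]) + v[1] * (sxy * v[0] + syy * v[1])
proj_var = sum((a * v[0] + b * v[1]) ** 2 for a, b in centered) / n

print(v, eigenvalue, proj_var)
```

The variance of the projection equals the leading eigenvalue and exceeds the variance along either coordinate axis, which is the maximum-variance characterization of the first principal component.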

Reading: Notes, chapter 17; pca.R, pca-examples.Rdata, and cars-fixed04.dat

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at March 26, 2013 10:30 | permanent link

Three-Toed Sloth:   Hosted, but not endorsed, by the Center for the Study of Complex Systems