December 20, 2011

Self-Evaluation and Lessons Learned (Introduction to Statistical Computing)

Attention conservation notice: Academic statistico-algorithmic navel-gazing.

With the grading done, but grades not yet posted while we wait for the students to fill out faculty evaluations, it's time to reflect on the class just finished. (Since this is the third time I've done a post like this, I guess it's now one of my traditions.)

Overall, it went a lot better than my worst fears, especially considering this was the first time the class was offered. There was a lot of attrition initially, both from students who had taken a lot of programming, and from students who had done no programming at all. (I was truly surprised by how many students had never used a command-line before.) The ones who stuck around all (I think) learned a lot --- more for those who knew less about programming to start with, naturally. Most of the credit for this goes to Vince, of course.

Some stuff didn't work well:

Stuff that worked well:

Stuff I'd try to do next time:

Over-all assessment: B; promising, but with definite areas for improvement.

Obligatory disclaimer: Don't blame Vince, or anyone else, for what I say here.

Introduction to Statistical Computing

Posted by crshalizi at December 20, 2011 09:35 | permanent link

December 18, 2011

Homework: Baseball Salaries (Introduction to Statistical Computing)

Assignment, database (large!)

Introduction to Statistical Computing

Posted by crshalizi at December 18, 2011 16:35 | permanent link

Databases II (Introduction to Statistical Computing)

Lecture 26: Aggregation in databases is like split/apply/combine. Joining tables: what it is and how to do it. Examples of joinery. Accessing databases from R with the DBI package.
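(By way of illustration, not the lecture's own code: a minimal sketch of the DBI workflow, using an in-memory SQLite database via the RSQLite package; the tables, columns, and numbers are invented.)

    library(DBI)        # generic database interface
    # RSQLite is assumed to be installed; it supplies the SQLite driver
    con <- dbConnect(RSQLite::SQLite(), ":memory:")   # throwaway in-memory database

    # Two made-up tables: players, and their salaries by year
    dbWriteTable(con, "players", data.frame(player_id = 1:3,
                                            name = c("Ruth", "Mays", "Aaron")))
    dbWriteTable(con, "salaries", data.frame(player_id = c(1, 1, 2, 3),
                                             year = c(1927, 1928, 1954, 1957),
                                             salary = c(70, 70, 12.5, 22.5)))

    # A join plus aggregation (the database's version of split/apply/combine),
    # all done by the server rather than in R:
    dbGetQuery(con, "
      SELECT p.name, AVG(s.salary) AS mean_salary
      FROM players AS p
      JOIN salaries AS s ON p.player_id = s.player_id
      GROUP BY p.name")

    dbDisconnect(con)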

Introduction to Statistical Computing

Posted by crshalizi at December 18, 2011 16:34 | permanent link

Databases I (Introduction to Statistical Computing)

Lecture 25: The idea of a relational database. Tables, fields, keys, normalization. Server-client model. Example of working with a database server. Intro to SQL, especially SELECT.

Introduction to Statistical Computing

Posted by crshalizi at December 18, 2011 16:33 | permanent link

Homework: Get (the 400) Rich(est list) Quick (Introduction to Statistical Computing)

Assignment; solutions (R)

Introduction to Statistical Computing

Posted by crshalizi at December 18, 2011 16:32 | permanent link

Importing Data from Webpages II (Introduction to Statistical Computing)

Lecture 24: Scraping by constructing and debugging regular expressions. R

Introduction to Statistical Computing

Posted by crshalizi at December 18, 2011 16:31 | permanent link

Importing Data from Webpages I (Introduction to Statistical Computing)

Lecture 23: Importing data from webpages. Example: scraping weblinks. Using regular expressions again (with multiple capture groups). Example: how long does a random surfer take to get to Facebook? Exception handling. R
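(A minimal sketch of the flavor of this, not the lecture's actual code: pulling link targets out of a page's HTML with a capture group, with the failure-prone network step wrapped in exception handling; the function name and commented-out URL are my own.)

    # Rough sketch: extract href targets from a page with a regular expression.
    # (Regexes are a crude tool for HTML, but fine for a quick scrape.)
    scrape.links <- function(url) {
      html <- tryCatch(readLines(url, warn = FALSE),      # exception handling:
                       error = function(e) character(0))  # return nothing on failure
      html <- paste(html, collapse = " ")
      # Capture group: the part between the quotes after href=
      matches <- gregexpr("href=\"([^\"]+)\"", html)
      links <- regmatches(html, matches)[[1]]
      # Strip the surrounding href="..." to leave just the URL
      sub("href=\"([^\"]+)\"", "\\1", links)
    }

    # Example use (any publicly reachable page would do):
    # scrape.links("http://www.example.com/")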

Introduction to Statistical Computing

Posted by crshalizi at December 18, 2011 16:30 | permanent link

December 07, 2011

My Work Here Is Done (Introduction to Statistical Computing)

One of the final projects was to build first- and second-order Markov models based on the text of Heart of Darkness. I present their last slide:

(Whatever merit this might have is due to the students: Jason Capehart, Seung Su Han, Alexander Murray-Watters, and Elizabeth Silver.)

Update, 18 December: Of course, what I should have titled this post is "I'm now becoming my own self-fulfilled prophecy". (I'm really not very good at quotation-capping.)

Introduction to Statistical Computing

Posted by crshalizi at December 07, 2011 11:56 | permanent link

November 30, 2011

Books to Read While the Algae Grow in Your Fur, November 2011

Attention conservation notice: I have no taste.

F For Fake
Watched after Jessa Crispin's recommendation, which I cannot improve upon. An utterly delightful movie.
(There is an essay, if not a dissertation, to be written about the male gaze in this movie. How much of this is due to Welles being taken with Ms. Kodar's [admittedly stunning] legs, how much was aimed at mere commercial sex-appeal, and how much was a deliberate manipulation and distraction of half or so of the audience? The way the spectators are made to look foolish in the hidden-camera sequence, and the plot of the last third or so, incline me towards thinking a lot of it was deliberate, but without much confidence.)
Richard Hofstadter, The Age of Reform: From Bryan to F.D.R.
A sympathetic, if lightly skeptical, look at the three great movements for political reform in America in the late 19th and early 20th centuries: the Populists, the Progressives, and the New Deal. Hofstadter presumes that the reader is already familiar with the narrative history, and is more interested in the history of ideas, and even more, of attitudes and moral values, than of practical political struggles. Particularly well-drawn is the contrast between the values of Progressive reformers and their urban middle-class supporters, and those of urban machine bosses and their immigrant supporters, in ch. V. (This part of the book was, of course, adapted for the movies as The Great McGinty.) He does, however, go into some detail about the economic background to the earlier two movements, and especially the Populists, dismissing the idea that it had anything to do with the closing of the frontier, instead emphasizing the world-wide distress inflicted on commercial agriculture, which included almost all American farmers, by decades-long deflation. (He does not, however, otherwise have much to say about the international context, unlike some people.) Hofstadter has no time for Populist conspiracy-mongering, but also leaves the reader in no doubt that the farmers were, in fact, getting screwed. Conservatives are ignored except as background figures (he quotes Lionel Trilling's quip about how American conservatives have not so much ideas as "irritable mental gestures which seek to resemble ideas" in his introduction), though he is quite good at bringing out how much all these movements saw themselves as restoring a republic which had become corrupted. On this basis, one might say that the default condition of the American dream is "betrayed".
All in all, it's both an impressive work of history, and an excellent piece of writing. I have no doubt that real historians consider it utterly obsolete — we are now further away from Hofstadter, writing in 1955, than he was from the Progressives — but I still found it worthwhile.
Mark A. R. Kleiman, When Brute Force Fails: How to Have Less Crime and Less Punishment
Kleiman's own precis in Washington Monthly gives all the highlights of the book in an admirably lucid way. If, after reading that, you are fascinated and want to see how he deals with the details, this book will be worth your time. Otherwise, you've read the Good Parts Version. (Also, what LizardBreath said.)
Philip Kitcher, The Ethical Project
This is a very substantial book which attempts to re-cast the nature and history of ethics as a form of "social technology", aimed at remedying "altruism failures", and generally moving humanity beyond the kind of social life endured by other primates — nasty, poor, and brutish, but not solitary. (Though he doesn't mention it, this is almost an inversion of Brecht's line "grub first, then ethics".) The guiding stars are Dewey (especially Human Nature and Conduct), John Stuart Mill (especially On Liberty and The Subjection of Women), and modern work on the evolution of cooperation. Kitcher builds from here to an examination of what counts as ethical progress, appropriate method and substance for meta-ethics, and appropriate method and recommendations for actual substantive ethics at the present day. The latter are strongly egalitarian, and not just founded on the "expanding circle" of empathy notion.
I have a lot of sympathy for Kitcher's over-all position, and even for many of the specifics. I have a very deep respect for his work in the philosophy of science (The Advancement of Science and The Nature of Mathematical Knowledge have both been actually useful to me in practice). Nonetheless, I found this book unsatisfying, and increasingly unsatisfying as it went on. He set himself too easy a task by showing that his "pragmatic naturalism" is no more hopeless than the approaches to ethics now dominant in academic philosophy in English-speaking countries; those same approaches have far too much influence on his ideas about how to think ethically now (as opposed to how our ancestors might have done so back in the day); his acceptance of population thinking is inadequate; and he did not really come to grips with anti-egalitarian positions in their strongest forms. All of these points, obviously, deserve fuller fleshing-out, but who knows when or if I'll get around to that.
A rather partial set of points of unhappiness: The reification of discrete societies (cf. Tilly). The assumption that each society has one, and only one, ethical code, which is explicit, or can be made so, and to which all subscribe. (He does not push population thinking far enough, despite his citation of Sperber. Also, cf. under Hofstadter above.) Running together a society's ethical codes with its institutions, and even with the consequences of its institutions. (An extreme example: sex traffic is an extremely institutionalized and organized crime, but not even those benefiting from it claim it's ethically justified.) The rather bizarre recommendation to think about ethics by carrying out imaginary conversations with people one does not know, by imagining what they would think and want, if their situation were very different, and big chunks of their identity (especially religion) were replaced by something more agreeable. (This seems like the pernicious influence of contemporary ethicists.) Broadly: fails to provide any convincing explanation of why his preferred function for a social technology of normative guidance, namely fixing failures of altruism, should over-ride any other conceivable function. (For instance: "the increase of man's power over nature and the abolition of man's power over man".) Even if he is right and his function came first, why should that have any influence over us now?
Relatedly: The consideration of how to answer the elitist "free spirit" (call him Fritz) doesn't allow Fritz enough imagination. Kitcher allows as how developing one's talents and potentialities is a good thing, but says that even if only an elite can do it, only equality of opportunity can recruit that elite, and regards this as settling the matter. But Fritz can reply that equality of opportunity means spreading resources so thin that everyone merely has the opportunity to be a yokel, and that for his fellow free spirits to truly develop and manifest their potential, inequality of resources and even domination are required. He could go on to claim that the making of those splendid, free-spirited lives is intrinsically valuable, and if it means exploiting the human herds, what of it? The latter are valuable only to the extent that they help the elite. If in some petty mathematical sense this is not the "optimal" elite, who cares? Let those people think how to renew their ranks for the next generation; Kitcher doesn't rate a say. (I don't believe any of what I'm putting in Fritz's mouth, but I did read Nietzsche as a teenager.)
Jeannine Hall Gailey, She Returns to the Floating World
Poetry; more mythology, fairy tales, science fiction, and fall-out from growing up in Oak Ridge and the shadow of the bomb. The whole forms a love letter to Japan. (Samples of the poetry; "Introduction to California Poetics" is not in this collection, but also very nice.)
Howard Andrew Jones, The Desert of Souls
Mind candy. Historical fantasy resulting from blending Robert Howard with The Arabian Nights; not noticeably orientalist in the bad way. Leaves open the door for sequels, which I'd read, but complete in itself.
Kate Beaton, Hark! A Vagrant
Because you are an intelligent being of taste and refinement with a working Internet connection, you already read Hark! A Vagrant. Wouldn't you like to support that most worthy artist by buying a handsome compilation of her work?

The Commonwealth of Letters; Scientifiction and Fantastica; Writing for Antiquity; The Progressive Forces; Philosophy; Natural Science of the Human Species; The Collective Use and Evolution of Concepts; Commit a Social Science; Linkage; The Beloved Republic

Posted by crshalizi at November 30, 2011 23:59 | permanent link

"Tidy Data" (Next Week at the Statistics Seminar)

Attention conservation notice: Only of interest if you (1) do statistical computing and (2) will be in Pittsburgh on Monday.

For those of us who use R all the time, next week's speaker needs no introduction. I can't make the kids in statistical computing attend next week's seminar, but I probably ought to.

Hadley Wickham, "Tidy Data"
Abstract: It's often said that 80 percent of the effort of analysis is spent just getting the data ready to analyze, the process of data cleaning. Data cleaning is not only a vital first step, but it is often repeated multiple times over the course of an analysis as new problems come to light. Despite the amount of time it takes up, there has been little research on how to clean data well. Part of the challenge is the breadth of activities that cleaning encompasses, from outlier checking to data parsing to missing value imputation. To get a handle on the problem, this talk focuses on a small, but important, subset of data cleaning that I call data "tidying": getting the data in a format that is easy to manipulate, model, and visualize.
In this talk you'll see some of the crazy data sets that I've struggled with over the years, and learn the basic tools for making messy data tidy. I'll also discuss tidy tools, tools that take tidy data as input and return tidy data as output. The idea of a tidy tool is useful for critiquing existing R functions, and will help to explain why some tasks that seem like they should be easy are in fact quite hard. This work ties together reshape2, plyr and ggplot2 with a consistent philosophy of data. Once you master this data format, you'll find it much easier to manipulate, model and visualize your data.
Time and place: 4--5 pm on Monday, 5 December 2011, in Doherty Hall A310

As always, the talk is free and open to the public. R groupies should however contain themselves while Prof. Wickham is speaking.
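(To give the flavor of "tidying" for those who can't make it: a small sketch of my own, not Prof. Wickham's, using the reshape2 package mentioned in the abstract on R's built-in airquality data.)

    library(reshape2)

    # airquality has one column per measured variable (Ozone, Solar.R, Wind, Temp)
    # and one row per day.  Melting it gives one row per (day, variable, value)
    # triple, the "long" form that plyr and ggplot2 find easiest to work with.
    aq.tidy <- melt(airquality, id.vars = c("Month", "Day"),
                    variable.name = "variable", value.name = "reading")
    head(aq.tidy)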

Enigmas of Chance

Posted by crshalizi at November 30, 2011 15:00 | permanent link

November 24, 2011

"They've traded more for cigarettes / than I've managed to express"; or, Dives, Lazarus, and Alice

Attention conservation notice: 1000+ words on the limits of welfare economics, in the form of a thought experiment or parable superficially tuned to the holiday (and brooding on my hard-disk for months). Gloomy, snarky, heavy-handed, academic, and obvious to anyone who knows enough about the subject to care. Have you no friends and family to whom you should be showing your love (perhaps in the form of food)?

Let us consider a simple economy with three individuals. Alice is a restaurateur; she has fed herself, and has just prepared a delicious turkey dinner, at some cost in materials, fuel, and her time.

Dives is a wealthy conceptual artist1, who has eaten and is not hungry, but would like to buy the turkey dinner so he can "feed" it to the transparent machine he has built, and film it being "digested" and eventually excreted2. To achieve this, he is willing and able to spend up to $5000. Dives does not care, at all, about what happens to anyone else; indeed, as an exponent of art for art's sake, he does not even care whether his film will have an audience.

Huddled miserably in a corner of the gate of Dives's condo is Lazarus, who is starving, on the brink of death, but could be kept alive for another day by eating the turkey. The sum total of Lazarus's worldly possessions consists of filthy rags, of no value to any one else, and one thin dime. Since, however, he is starving, there is no amount of money which could persuade Lazarus to part with the turkey, should he gain possession of it.

Assume that everyone is a rational agent, with these resources and preferences. What does economics tell us about this situation?

First, whatever Alice has spent preparing the turkey is a sunk cost, and irrelevant to deciding what to do next.

Second, Alice would be better off selling the turkey to either Dives or Lazarus than keeping it for herself, and either trade would also benefit the buyer, so that's a win-win. Either trade would be Pareto-improving. However, neither trade is strictly better for everyone than the other: if she sells to Lazarus, Dives is disappointed, and if she sells to Dives, Lazarus starves. Of course, if we are being exact, Lazarus starves to death whether Alice keeps the turkey or sells it to Dives, so that trade makes Lazarus no worse off.

Third, Lazarus can only offer ten cents. Since Dives would be willing to spend up to $5000, Alice will prefer to sell to Dives. Since Dives, being a rational agent, knows how much Lazarus can pay, he will offer 11 cents, which Alice will accept as the superior offer. (Alternately, we add in a Walrasian auctioneer, and reach this price by tatonnement.) [Update: See below.] The market clears, Alice is 11 cents better off, Dives enjoys a consumer surplus of $4999.89, and Lazarus starves to death in the street, clutching his dime. Nothing can be changed without making someone worse off, so this is Pareto optimal.

And so, in yet another triumph, the market mechanism has allocated a scarce resource, viz., the turkey, to its most efficient use, viz., being turned into artificial shit. What makes this the most efficient use of the scarce resource? Why, simply that it goes to the user who will pay the highest price for it. This is all that economic efficiency amounts to. It is not about meeting demand, but meeting effective demand, demand backed by purchasing power.

(Incidentally, nothing in this hinges on some failure of perfect competition arising from having only three agents in the market. If we had another copy of Alice, another copy of Dives, and another copy of Lazarus, both Alices will sell their turkeys to the Diveses, and both Lazaruses will starve. By induction, then, the same allocation will be replicated for any finite number of Alices, Diveses, and Lazaruses, so long as there are at least as many Diveses as there are Alices.)

You may be refusing to take this seriously, objecting that I have loaded the rhetorical deck pretty blatantly --- and I have! (Though not more than is customary in teaching economics.) But this is the core of Amartya Sen's model of famines, which grows from the observation that food is often exported, at a profit, from famine-stricken regions in which people are dying of hunger. This occurs not just in cases like the USSR in the 1930s, but in impeccably capitalist situations, like British India. This happens, as Sen shows, because the hungry, while they have a very great need for food, do not have the money to buy it, or, more precisely, people elsewhere will pay more. It is thus not economically efficient to feed the hungry, so the market starves them to death.

I do not, however, want to end this on a completely gloomy note. As Sen said, the same market would feed the hungry if they could afford it, so the way to combat famines is to make sure they have money or paying work or both. (If in this country we don't have to worry about famine, it's because we've arranged things so that most of us do have those resources; we still have a hunger problem because our arrangements are imperfect.) The larger point is that while what is technologically efficient depends on facts of nature, what is economically efficient is a function of our social arrangements, of who owns how much of what. Economic efficiency may be a good tool, but it is perverse to serve your own tools, and monstrous to be ruled by them. Let us be thankful for the extent to which we escape perversion and monstrosity.

Update, 27 November: Yes, I was presuming an ascending-price auction to get a price of 11 cents. If the auctioneer uses a descending-price auction, Alice could extract up to $5000 from Dives, driving his consumer surplus to zero; Lazarus, of course, starves at any price which clears the market. No, I did not say (and do not think) that we should abolish the market and replace it with a National Turkey Allocation Board. No, Dives having orders of magnitude more money than Lazarus is not essential; Dives just needs to be willing and able to spend 11 cents.

Also, further to the theme of delicious food and the invisible hand.

Manual trackback: Quomodocumque, MetaFilter; The Edge of the American West; The Browser; Aluation; Nanopolitan; Crooked Timber; I Got Here on My Bike; Oook; Siris; Slacktivist; Wolfgang Beirl; Andrew Gelman

1: It's a thought experiment.

2: I actually saw such a machine at the modern art museum in Lyon in 2003, fed in turn by the city's leading restaurants, but I cannot now remember the artist's name. Perhaps this is just as well. Update: Cris Moore, with whom I saw it, reminds me that the work in question was "Cloaca", by Wim Delvoye.

The Dismal Science; Modest Proposals

Posted by crshalizi at November 24, 2011 10:48 | permanent link

November 22, 2011

"What Do We Want?" "Quantitative Data!" " When Do We Want It?" "Soon Would Be Good!"

Someone, somewhere, has assembled a fairly reliable, comprehensive and machine-readable data set on contentious politics in the United States over the 20th century, or some large part of it. A detailed event catalog would be ideal, but I would settle for an annual index-number time series if need be. Who has done this, where are the results, and how can I get them? Leads will be rewarded with acknowledgments and/or citations, as appropriate.

In the meanwhile:

Commit a Social Science; Writing for Antiquity; The Beloved Republic

Posted by crshalizi at November 22, 2011 14:00 | permanent link

November 15, 2011

Why Think, When You Can Do the Experiment?

Attention conservation notice: Puffery about a paper in statistical learning theory. Again, if you care, why not let the referees sort it out, and check back later?

Now this one I really do not have time to expound on (see: talk in Chicago on Thursday), but since it is related to the subject of that talk, I tell myself that doing so will help me with my patter.

Daniel J. McDonald, CRS, and Mark Schervish, "Estimated VC dimension for risk bounds", arxiv:1111.3404
Abstract: Vapnik-Chervonenkis (VC) dimension is a fundamental measure of the generalization capacity of learning algorithms. However, apart from a few special cases, it is hard or impossible to calculate analytically. Vapnik et al. [10] proposed a technique for estimating the VC dimension empirically. While their approach behaves well in simulations, it could not be used to bound the generalization risk of classifiers, because there were no bounds for the estimation error of the VC dimension itself. We rectify this omission, providing high probability concentration results for the proposed estimator and deriving corresponding generalization bounds.

VC dimension is one of the many ways of measuring how flexible a class of models is, or its capacity to match data. Specifically, it is the largest number of data-points which the class can always (seem to) match perfectly, no matter how the observations turn out, by judiciously picking one model or another from the class. It is called a "dimension" because, through some clever combinatorics, this turns out to control the rate at which the number of distinguishable models grows with the number of observations, just as Euclidean dimension governs the rate at which the measure of a geometrical body grows as its length expands. Knowing the number of effectively-distinct models, in turn, tells us about over-fitting. The true risk of a fitted model will generally be higher than its in-sample risk, precisely because it was fitted to the data and so tuned to take advantage of noise. High-capacity model classes can do more such tuning. One can actually bound the true risk in terms of the in-sample risk and the effective number of models, and so in terms of the VC dimension.
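For concreteness, one textbook form of such a bound (not the sharper bound from the paper) says that, with probability at least \(1-\eta\), every classifier \(f\) in a class of VC dimension \(h\), fitted to \(n\) samples, satisfies
\[ R(f) \;\leq\; \hat{R}_n(f) + \sqrt{\frac{h\left(\ln\tfrac{2n}{h}+1\right)+\ln\tfrac{4}{\eta}}{n}}, \]
where \(R\) is the true risk and \(\hat{R}_n\) the in-sample risk; the penalty grows with the capacity \(h\) and shrinks with the sample size \(n\).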

How then does one find the VC dimension? Well, the traditional route was through yet more clever combinatorics. As someone who has never quite gotten the point of the birthday problem, I find this unappealing, especially when the models are awkward and fractious, as the interesting ones generally are.

An alternative, due to Vapnik, Levin and LeCun, is to replace math with experiment. Roughly, the idea is this: make up simulated data, fit the model class to it, see how variable the fit is from run to run, and then plug this average discrepancy into a formula relating it to the VC dimension and the simulated sample size. Simulating at a couple of sample sizes and doing some nonlinear least squares then yields an estimate of the VC dimension, which is consistent in the right limits. (If you really want details, see the papers.)
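(To give the flavor, and only the flavor: a loose sketch of my own of the experimental step, patterned on the procedure as I understand it, not the estimator from the paper; the model class, sample sizes, and replicate counts are arbitrary, and logistic regression stands in for strict empirical risk minimization.)

    # For each sample size n: draw 2n points, give them pure-noise labels, flip
    # the labels on the first half, fit the classifier to the whole flipped set,
    # and record the resulting discrepancy between the two halves.  The average
    # discrepancy shrinks as n grows, at a rate governed by the class's capacity;
    # the final step (omitted here) fits the theoretical curve to these averages
    # with nls() and reads off the estimated VC dimension.
    one.discrepancy <- function(n) {
      x <- data.frame(x1 = rnorm(2 * n), x2 = rnorm(2 * n))
      y <- rbinom(2 * n, 1, 0.5)                    # pure-noise labels
      y.flipped <- y
      y.flipped[1:n] <- 1 - y.flipped[1:n]          # flip the first half
      m <- glm(y.flipped ~ x1 + x2,
               data = data.frame(y.flipped = y.flipped, x), family = binomial)
      yhat <- as.numeric(predict(m) > 0)            # fitted classifications
      err.flipped <- mean(yhat != y.flipped)        # training error on flipped data
      1 - 2 * err.flipped                           # discrepancy between the halves
    }

    sizes <- c(50, 100, 200, 500, 1000)
    avg.disc <- sapply(sizes, function(n) mean(replicate(30, one.discrepancy(n))))
    cbind(n = sizes, mean.discrepancy = avg.disc)   # decreasing in n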

The problem with the experimental approach is that it doesn't tell you how to use the estimated VC dimension in a risk bound, which is after all what you want it for. The estimate is not perfectly precise, and how is one to account for that imprecision in the bound?

This turns out to be an eminently solvable problem. One can use the estimated VC dimension, plus a comparatively-small-and-shrinking safety margin, and plug it into the usual risk bound, with just a small hit to the confidence level. Showing this qualitatively relies on the results in van de Geer's Empirical Processes in M-Estimation, which, pleasingly, was one of the first books I read on statistical learning theory lo these many years ago. Less pleasingly, getting everything needed for a bound we can calculate (and weakening some assumptions) meant re-proving many of those results, excavating awkward constants previously buried in big C's and K's and little oP's.

In the end, however, all is as one would hope: estimated VC dimension concentrates (at a calculable rate) around the true VC dimension, and the ultimate risk bound is very close to the standard one. As someone who likes learning theory and experimental mathematics, I find this pleasing. It is also a step in a Cunning Plan, which I will not belabor, as it will be obvious to readers who go all the way through the paper.

Update, 22 November: The title is a catch-phrase of my mother's; I believe she got it from one of her biochemistry teachers.

Self-Centered; Enigmas of Chance

Posted by crshalizi at November 15, 2011 21:25 | permanent link

November 14, 2011

A Nudge Is as Good as a Wink to a Blind Bat

Attention conservation notice: The only thing more pathetic than a writer whining about editorial decisions is a writer whining about negative reviews and being misunderstood. Also, nothing which is both so geeky and so careless as to begin with a mis-quotation of Monty Python can end well.

So, Henry Farrell and I have an opinion piece in New Scientist about how the "libertarian paternalism" of Sunstein and Thaler, and policy-making by "nudging" more generally, are Bad Ideas. The reason we think they are Bad Ideas is that they try to do good by stealth, and thereby break the feedback mechanisms which (1) keep policy-makers accountable to those over whom they exercise power, and (2) allow policy-makers to tell whether what they are doing is working, and revise their initial policies and plans in light of experience. (And by this we very much include the experience of getting something you think you want, and discovering that it is no good for you at all.) Granting the best will in the world on the part of the nudgers, it is putting a very high value on one's own conjectures to deliberately break the most important mechanism for improving them.

In short, I thought we were making a Popperian point about how democracy is best understood not in terms of "the people's will" or the like, but accountability and rational policy revision. I also thought we were making a Popperian point about the dangers of top-down social engineering. Indeed, I was strongly tempted to quote chapter (10 and 9, respectively) and verse from The Open Society and Its Enemies for both points, but the constraints of space, and of not sounding like complete pedants, prevailed. It would, I thought, be tolerably plain what our objections were.

I had not counted on two things. First, we were, evidently, nowhere near as clear in our writing as I thought. (I take full blame for this.) Second, whoever is in charge of such matters at New Scientist gave us the headline "Nudge Policies Are Another Name for Coercion". This was so far from being our objection that we rather deliberately did not use the word "coercion" (or "coerce", etc.) at all. Everyone who is not a complete anarchist, after all, believes that some coercion is legitimate, and so the question is what sorts, to what ends, under what conditions, etc. And I regard the usual right-libertarian attempt to claim that deploying coercion only and always in favor of the interests of the rich is somehow minimizing it as simply confused, when it is not deliberate sophistry. (As Sunstein put it in a good book with Stephen Holmes, "liberty depends on taxes".) It is, I suppose, a testimony to the hegemony of right-wing ideas that when we said something which amounted to a paraphrase and dilution of the third thesis on Feuerbach, the headline writer heard Milton Friedman, or perhaps Ayn Rand. This did not help get our point across, and the only two responses I've seen which obviously got it are two comments at Crooked Timber, by Scott Martens and by Salient.

I would also like to add that I had no idea New Scientist would syndicate our piece to Slate. The latter changed the headline to the less actively-misleading "Nudge No More", but added a gratuitous cheesecake photo, and provoked some ribbing on the part of friends who recalled my stated views about the magazine. Those views, for the record, remain unchanged; if anything, I am disturbed that Slate thought we fit either their editorial line or their tone. I console myself with thoughts of Dahlia Lithwick and Jordan Ellenberg.

Morals:

  1. None of the reactions persuade me that we're wrong in our basic points. (Though Scott's remarks are well-taken.)
  2. Many of the reactions persuade me that this kind of writing is much harder than I thought.
  3. I wish to apologize to the journalists of the world, for having entertained uncharitable thoughts when they tried to dissociate themselves from their headlines.

Disclaimer: Henry is not responsible for this post.

Self-Centered; The Progressive Forces; Commit a Social Science

Posted by crshalizi at November 14, 2011 23:30 | permanent link

Projection as a Defense Mechanism for Social Network Models

Attention conservation notice: Puffery about a new manuscript on the statistical theory of some mathematical models of networks. In the staggeringly unlikely event this is actually of interest to you, why not check back later, and see if peer review has exposed it all as a tissue of fallacies?

A new paper, which I flatter myself is of some interest to those who care about network models, or exponential families, or especially about exponential families of network models:

CRS and Alessandro Rinaldo, "Consistency under Sampling of Exponential Random Graph Models", arxiv:1111.3054
Abstract: The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGM's expressive power. These results are actually special cases of more general ones about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.

Obligatory Disclaimer: Ale didn't approve this post.

This started because Ale and I shared an interest in exponential family random graph models (ERGMs), whose basic idea is sheer elegance in its simplicity. You want to establish some distribution over graphs or networks; you decree some set of functions of the graph to be the sufficient statistics; and then you make the log probability of any given graph proportional to a weighted sum of these statistics. The weights are the parameters, and this is an exponential family1. They inherit all of the wonderfully convenient mathematical and statistical properties of exponential families in general, e.g., finding the maximum likelihood estimator by equating expected and observed values of the sufficient statistics. (This is also the maximum entropy distribution, though I set little store by that.) They are also, with judicious choices of the statistics, quite spiffy-looking network models. This paper by Goodreau et al., for instance, is exemplary in using them to investigate teenage friendship networks and what they can tell us about general social mechanisms, and deserves a post of its own. (Indeed, a half-written post sits in my drafts folder.) This is probably the best class of statistical models of networks now going, which I have happily taught and recommended to students, with a special push for statnet.
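(For concreteness, a minimal sketch of what fitting such a model looks like with the ergm package from statnet, on its built-in Florentine marriage network; the choice of statistics here is purely illustrative.)

    library(ergm)            # part of the statnet suite
    data(florentine)         # loads the classic Florentine marriage network, flomarriage

    # An ERGM whose sufficient statistics are the number of edges and the number
    # of triangles; the fitted coefficients are the natural (exponential-family)
    # parameters conjugate to those statistics.
    fit <- ergm(flomarriage ~ edges + triangle)
    summary(fit)

    # Simulating new networks from the fitted distribution:
    sims <- simulate(fit, nsim = 5)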

What Ale and I wanted to do was to find conditions under which maximum likelihood estimation would be consistent --- when we saw more and more data from the same source, our estimates of the parameters would come closer and closer to each other, and to the truth. The consistency of maximum likelihood estimates for independent observations is classical, but networks, of course, are full of dependent data. People have proved the consistency of maximum likelihood for some kinds of models of time series and of spatial data, but those proofs (at least the ones we know) mostly turned on ordering or screening-off properties of time and space, lacking in arbitrary graphs. Those which didn't turned on the "blocking" trick, where one argues that widely-separated events are nearly independent, and so approximates the dependent data by independent surrogates, plus weak corrections. This can work with random fields on networks, as in this excellent paper by Xiang and Neville, but it doesn't seem to work for models of networks, where distance itself is endogenous.

I remember very distinctly sitting in Ale's office on a sunny October afternoon just over a year ago2, trying to find some way of making the blocking trick work, when it occurred to us that maybe the reason we couldn't show that estimates converged as we got more and more data from the same ERGM was that the very idea of "more and more data from the same ERGM" did not, in general, make sense. What exactly prompted this thought I do not recall, though I dare say the fact that we had both recently read Lauritzen's book on sufficiency, with its emphasis on repetitive and projective structures, had something to do with it.

The basic point is this. Suppose we observe a social network among (say) a sample of 500 students at a high school, but know there are 2000 students in all. We might think that the whole network should be described by some ERGM or other. How, however, are we to estimate it from the mere sample? Any graph for the whole network implies a graph for the sampled 500 students, so the toilsome and infeasible, but correct, approach would be to enumerate all whole-network graphs compatible with the observed sample graph, and take the likelihood to be the sum of their probabilities in the whole-network ERGM. (If you do not strictly know how large the whole network is, then I believe you are strictly out of luck.) This is not, of course, what people actually do. Rather, guided by experience with problems of survey sampling, regression, time series, etc., they have assumed that the same ERGM, with the same sufficient statistics and the same parameter values, applies to both the whole network and to the sample. They have assumed, in other words, that the ERGMs form a projective family.

Once you recognize this, it turns out to be straightforward to show that projectibility imposes very strong restrictions on the sufficient statistics --- they have to obey a condition about how they "add up" across sub-graphs which we called3 "having separable increments". This condition is "physically" reasonable but not automatic, and I will not attempt to write it out in HTML. (Read the paper!) Conversely, so long as the statistics have such "separable increments", the exponential family is projectible. (Pinning down the converse was the tricky bit.) Once we have this, conditions for consistency of maximum likelihood turn out to be straightforward, as all the stuff about projectibility implies the change to the statistics when adding new data must be unpredictable from the old data. The sufficient statistics themselves form a stochastic process with independent increments, something for which there is a lot of convergence theory. (This does not mean the data must be independent, as we show by example.) All of these results prove to be perfectly general facts about exponential families of dependent variables, with no special connection to networks.

The punch-line, though, is that the most commonly used specifications for ERGMs all include — for good reasons! — statistics which break projectibility. Models with "dyadic independence", including the models implicit or explicit in a lot of community discovery work, turn out to be spared. Anything more sophisticated, however, has got a very real, though admittedly somewhat subtle, mathematical pathology. Consistency of estimation doesn't even make sense, because there is no consistency under sampling.

We have some thoughts on where this leaves statistical models of networks, and especially about how to actually move forward constructively, but I will let you read about them in the paper.

Update, next day: fixed typos, clarified a sentence and added a reference.

1: Or if, like me, you were brought up in statistical mechanics, a Boltzmann-Gibbs ensemble, with the statistics being the extensive thermodynamic variables (think "volume" or "number of oxygen molecules"), and the parameters their conjugate intensive variables (think "pressure" or "chemical potential of oxygen"). If this line of thought intrigues you, read Mandelbrot.

2: With merely a year between the idea and the submission, this project went forward with what is, for me, unseemly haste.

3: We couldn't find a name for the property the statistics needed to have, so we made one up. If you have encountered it before, please let me know.

Self-Centered; Networks; Enigmas of Chance

Posted by crshalizi at November 14, 2011 22:00 | permanent link

Lab: Regular Expressions I (Introduction to Statistical Computing)

Assignment, partial solutions.

Introduction to Statistical Computing

Posted by crshalizi at November 14, 2011 10:31 | permanent link

Regular Expressions II (Introduction to Statistical Computing)

Lecture 22: More regular expressions. Worked example from last time. Extracting matches. Worked example: scraping web links. Tagged expressions and capture groups. R for examples.

Introduction to Statistical Computing

Posted by crshalizi at November 14, 2011 10:30 | permanent link

November 11, 2011

American Traditions

Today seems like a good day to propose setting up a stela on the National Mall in Washington, with a celebratory inscription in Maya and a Long Count date, to be inaugurated on 21 December 2012, or rather 13.0.0.0.0.

Modest Proposals

Posted by crshalizi at November 11, 2011 11:11 | permanent link

Final Project Descriptions (Introduction to Statistical Computing)

Project options.

Introduction to Statistical Computing

Posted by crshalizi at November 11, 2011 10:38 | permanent link

Regular Expressions I (Introduction to Statistical Computing)

Lecture 21: Regular expressions are descriptions of patterns. Why we want to use them. Search, search and replace.
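(A toy illustration of my own, not the lecture's:)

    phrases <- c("the quick brown fox", "a slow red fox", "no canines here")

    grepl("fox$", phrases)                      # which strings end in "fox"?
    grep("[a-z]+ fox", phrases, value = TRUE)   # return the matching strings
    sub("fox", "vixen", phrases)                # replace the first match in each string
    gsub("[aeiou]", "", phrases)                # replace every match (here, all vowels)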

Introduction to Statistical Computing

Posted by crshalizi at November 11, 2011 10:37 | permanent link

Basics of Character Manipulation (Introduction to Statistical Computing)

Lecture 20: Overview of character data. Basic string operations: extract and concatenate.
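(Again, a toy illustration of my own:)

    x <- c("statistical", "computing")

    nchar(x)                  # how many characters in each string
    substr(x[1], 1, 4)        # extract characters 1 through 4: "stat"
    paste(x[1], x[2])         # concatenate with a space: "statistical computing"
    paste(x, collapse = "-")  # collapse a vector into a single string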

Introduction to Statistical Computing

Posted by crshalizi at November 11, 2011 10:36 | permanent link

Lab: Changing My Shape, I Feel Like an Accident (Introduction to Statistical Computing)

In which we practice simulating from Markov chains. (Solutions.)

Introduction to Statistical Computing

Posted by crshalizi at November 11, 2011 10:35 | permanent link

Homework: Sampling Accidents (Introduction to Statistical Computing)

In which we use Markov chain Monte Carlo to do statistical inference.

Introduction to Statistical Computing

Posted by crshalizi at November 11, 2011 10:34 | permanent link

Simulation III: Mixing and Markov Chain Monte Carlo (Introduction to Statistical Computing)

Lecture 19: Mixing times and correlation time. Continuous-valued Markov processes. The Metropolis algorithm for Markov chain Monte Carlo.
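(A minimal sketch of the Metropolis algorithm, with a standard Gaussian target and Gaussian random-walk proposals; this is my illustration, not the lecture code, and the tuning constants are arbitrary.)

    # Random-walk Metropolis sampler for a (possibly unnormalized) density
    metropolis <- function(target, x0, n, proposal.sd = 1) {
      x <- numeric(n)
      x[1] <- x0
      for (t in 2:n) {
        proposal <- x[t - 1] + rnorm(1, sd = proposal.sd)      # propose a move
        accept.prob <- min(1, target(proposal) / target(x[t - 1]))
        if (runif(1) < accept.prob) {
          x[t] <- proposal         # accept the proposal
        } else {
          x[t] <- x[t - 1]         # reject: stay put
        }
      }
      x
    }

    # Example: sample from a standard Gaussian (normalizing constant irrelevant)
    chain <- metropolis(function(x) exp(-x^2 / 2), x0 = 0, n = 1e4)
    mean(chain); var(chain)   # close to 0 and 1, up to Monte Carlo error
    # acf(chain)              # correlation time: how quickly the chain mixes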

Introduction to Statistical Computing

Posted by crshalizi at November 11, 2011 10:33 | permanent link

Homework: 'Tis the Season to Be Unemployed (Introduction to Statistical Computing)

Assignment, data.

Introduction to Statistical Computing

Posted by crshalizi at November 11, 2011 10:32 | permanent link

Lab: Split-Apply-Combine (Introduction to Statistical Computing)

In which we practice using plyr.

Introduction to Statistical Computing

Posted by crshalizi at November 11, 2011 10:31 | permanent link

Simulation II: Monte Carlo and Markov Chains (Introduction to Statistical Computing)

Lecture 18: the Monte Carlo method for numerical integration; Monte Carlo for expectation values; importance sampling. Markov chains: definition, the roots of the Markov property; asymptotics of Markov chains via linear algebra; Markov chains and graphs; the law of large numbers (ergodic theorem) for Markov chains.
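(A small illustration of my own of the basic Monte Carlo idea: estimating the integral of exp(-x^2) over [0,1] by averaging the integrand at uniform draws.)

    # The integral of f over [0,1] equals E[f(U)] for U ~ Unif(0,1),
    # so an average of f at uniform draws estimates it.
    f <- function(x) exp(-x^2)
    n <- 1e5
    u <- runif(n)
    mc.estimate <- mean(f(u))
    mc.se <- sd(f(u)) / sqrt(n)          # Monte Carlo standard error
    c(estimate = mc.estimate, std.error = mc.se)
    # For comparison, deterministic quadrature:
    integrate(f, 0, 1)                   # about 0.7468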

Introduction to Statistical Computing

Posted by crshalizi at November 11, 2011 10:30 | permanent link

November 02, 2011

First and Second City Sloth

I mentioned trips for upcoming talks, didn't I?

"When Can We Learn Exponential Random Graph Models from Samples?"
Abstract: Typically, statistical models of network structure are models for the entire network, while the data is only a sampled sub-network. Parameters for the whole network, which are what we care about, are estimated by fitting the model on the sub-network. This assumes that the model is "consistent under sampling", or, in terms of the theory of stochastic processes, that it forms a projective family. For the deservedly-celebrated class of exponential random graph models (ERGMs), this apparently trivial condition is in fact violated by many popular and scientifically appealing models; satisfying it drastically limits ERGM's expressive power. These results are special cases of more general ones about exponential families of dependent variables, which we also prove. As a consolation prize, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses. (Joint work with Alessandro Rinaldo.)
Time and place: 4--5 pm on Thursday, 3 November 2011, in the "Interschool Lab", room 750 in the Schapiro Center for Engineering and Physical Science Research, Columbia University
"Nonparametric Bounds on Time Series Prediction Risk for Model Evaluation and Selection", University of Chicago Econometrics and Statistics Seminar
Abstract: Everyone wants their time series model to predict well. Since how well it did on the data you used to fit it exaggerates how well it can be expected to do in the future, and since penalties like AIC are only correct asymptotically (if then), controlling prediction risk with finite data needs something different. Combining tools from machine learning and ergodic theory lets us build a bound on prediction risk for state-space models in terms of historical performance, a measure of the model's capacity to fit arbitrary data, and a measure of how much information is actually in the time series. The result applies even at small samples, places minimal restrictions on the data source, and is agnostic about mis-specification. These bounds can then be used to evaluate and compare models. (Joint work with Daniel McDonald and Mark Schervish.)
Time and place: 1:30--2:50 pm on Thursday, 17 November 2011, in "HC 3B" (I don't know where that is but hopefully I will by then)

The Columbia talk is free and open to the public. I will be disillusioned unless Chicago not only charges admission, but uses a carefully optimized scheme of price discrimination.

Self-Centered

Posted by crshalizi at November 02, 2011 10:30 | permanent link

October 31, 2011

Books to Read While the Algae Grow in Your Fur, October 2011

Attention conservation notice: I have no taste.

Cristopher Moore and Stephan Mertens, The Nature of Computation [book website]
This is, simply put, the best-written book on the theory of computation I have ever read; one of the best-written mathematical books I have ever read, period. I am horribly biased in its favor, of course — Cris is a collaborator, and, even more, an old friend — but from beginning to end, and all the 900+ pages in between, this was lucid, insightful, just rigorous enough, alive to how technical problems relate to larger issues, and above all, passionate and human. (There were many pages where I could hear Cris, full of enthusiasm for the latest puzzle to catch his attention and wanting to share.) I will try to write a proper review later, but in the meanwhile, let me recommend this most strongly to anyone who remembers a little calculus and how vectors add, and finds this blog at all interesting, whether you think you care about computational complexity or not.
Ariana Franklin, The Serpent's Tale and Grave Goods
Mind candy; historical forensic mysteries in medieval England. Entertaining, though the heroine is rather too much a product of the Enlightenment to be believable for the period. (Previously.)
Brian Keene and Nick Mamatas, The Damned Highway: Fear and Loathing in Arkham: A Savage Journey into the Heart of the American Nightmare
Mind candy. A homage to both Hunter S. Thompson and H. P. Lovecraft, merged through the soul-destroying horror that was Richard M. Nixon. They do a good job at capturing not just Thompson's stylistic tics, but also his real themes, and melding them with both the squamous eldritch horrors and political satire.
Scott Westerfeld, Behemoth and Goliath
A pair of delusional and emotionally mangled child soldiers trace a circle of blood around the world. In an appalling lapse of taste on the part of the publisher, marketed as mind candy for teenagers.
(Previously.)
Nick Bostrom and Milan M. Cirkovic (eds.), Global Catastrophic Risks
An edited collection ranging over wrangling about "what counts as a global catastrophic risk to humanity?", a survey of such risks, some more fanciful than others, and general reflections on what our attitudes and policies towards them should be. Edited collections usually have a high variance, but it's perhaps appropriate that the distribution here is rather more weighted towards the extremes than your usual volume of academic papers. (Compare, for instance, the contribution by J. J. Hughes to the two chapters by Yudkowsky, and don't get me started on Richard Posner, Robin Hanson, or Bryan Caplan.) Even making appropriate allowances for this, it's full of fascinating information, and it's a very worthwhile effort to think through these issues.
Disclaimer: Dr. Cirkovic is an on-line acquaintance, and, in a somewhat odd turn of events, I critiqued drafts of pretty much every chapter of the manuscript. Not all of my suggestions were followed, which was probably for the best, but to some extent I had a hand in making this.
Emmanuel Farjoun and Moshé Machover, Laws of Chaos: A Probabilistic Approach to Political Economy
My brief notes grew out of control: A Marxian Econophysics.
Taylor Anderson, Firestorm
Mind candy. "What these lemurs need is a boat-load of vintage honkeys", continued At some point, the guilty pleasure of the series will no doubt pall, but not yet, not least because there are real setbacks for Our Heroes, and because the villains are becoming actual characters.
Michelle Sagara, Cast in Ruin
Mind candy. While I found this really quite unreasonably enjoyable, I can't help thinking that it would be a Good Thing for the series if at some point Kaylin had to try to understand the conflict from the Shadow's perspective. What does it want to break out of the dungeon dimensions (to use a Pratchettism) for?
(There is also an essay to be written about just how urban this fantasy series is, and how its vision of the city reflects a sort of early-21st-century multiculturalism which is quite different from the way that, say, Leiber imagined Lankhmar. But I will leave that for someone else.)
(Previously.)
John M. Chambers, Software for Data Analysis: Programming with R (errata)
The best thing I have encountered on real programming in R, and on why the language is the way it is. It's really quite elegant, even inspiring, but probably works best after some day-to-day acquaintance with R, and with general ideas of programming. Coming to it after the other books on R is like spending a long time reading about the comparative properties of different sorts of cement, and the specifications for various pipes, and then stumbling into a discussion of architecture. It's important that the walls stand up and the toilets flush, yes, but there needs to be a design, too, and that's where this comes in.
— An optional book for Introduction to Statistical Computing; if I've done my job, by the end of the semester some of The Kids will be able to appreciate it.
W. John Braun and Duncan J. Murdoch, A First Course in Statistical Programming with R (selected solutions, errata, etc.)
This is, indeed, a very first course in programming and in R, assuming no previous programming knowledge whatsoever. (In principle, it doesn't even assume prior use of a terminal, but that transition seems, empirically, bigger than they anticipate, and makes me remember In the Beginning was the Command Line more fondly than before.) Required for Introduction to Statistical Computing, where the first half or so of the course closely follows chapters 1--4 (the language, essential commands for numerical manipulation, graphics, writing and debugging functions). Later chapters cover distributions, random variables and simulation; numerical linear algebra; and optimization. I would have liked coverage of functions-as-objects, and of data manipulation, but we're providing that ourselves. It has the three great virtues of being short, selecting the most important points, and being adapted to the meanest understanding.
Paul Teetor, R Cookbook
Best thought of as a reverse index to R's help: instead of "how does this command work, and what can I do with it?", it answers "what commands do I need to do this?". Not suitable as an introduction to the language, but a handy reference. Required for Introduction to Statistical Computing.

Books to Read While the Algae Grow in Your Fur; Enigmas of Chance; Scientifiction and Fantastica; Statistical Computing; The Dismal Science; Physics; Kith and Kin; Cthulhiana; The Beloved Republic; Philosophy; The Natural Science of the Human Species; Complexity; Mathematics

Posted by crshalizi at October 31, 2011 23:59 | permanent link

October 28, 2011

Friday Cat Blogging (The Violence Inherent in the System Edition)

By now, you have probably heard about how the Washington Post decided to illustrate a news story about the Oakland police using tear gas on peaceful demonstrators, breaking skulls, etc., with a picture of a police officer "pet[ting] a cat that was left behind by protestors". There is now an Oakland Riot Cat tumblr, naturally (this is my favorite so far), and I wouldn't be surprised if the animal becomes a minor icon of the movement — and Scott Olsen, the veteran of our war in Iraq who got his head cracked open by the police, becomes a major icon.

What I keep thinking, though, is that somebody in Oakland must be very upset not just at having been assaulted by the cops while exercising their rights, but at losing their cat at the same time. How is that cat ever going to get home? Is anybody even trying to get it back to its owner?

— The Oakland Riot Cat tumblr is via Jon Wilkins, who has a rather more important message about helping Olsen pay his medical expenses (!), and who is himself doing his part back east.

Friday Cat Blogging; The Beloved Republic; The Progressive Forces

Posted by crshalizi at October 28, 2011 23:00 | permanent link

October 27, 2011

Nothing to See Here, Move Along

Between now and mid-December, I have a class to teach, a grant proposal to fabricate, and four trips to take and five talks to give. As for manuscripts to referee, letters of recommendation to write, and papers to finish, it would be futile to try counting them; only mass nouns are appropriate. Since it is unlikely that you will see much here other than teaching materials for the next few months, look elsewhere:

(These are some of what I happen to have been reading recently. I should really update my blog-roll, apparently last touched in 2006.)

Linkage

Posted by crshalizi at October 27, 2011 23:55 | permanent link

Bayesianism Not Banned in Britain

Attention conservation notice: 4600 words on a legal ruling in another country, from someone who knows nothing about the law even in his own country. Contains many long quotations from the ruling, plus unexplained statistical jargon; written while trapped in an airport trying to get home, and so probably excessively peevish.

Back at the beginning of the month, as constant readers will recall, there was a bit of a kerfuffle over newspaper reports — starting with this story in the Guardian, by one Angela Saini — to the effect that a judge had ruled the application of Bayes's rule was inadmissible in British courts. This included much wailing and gnashing of teeth over the innumeracy of lawyers and the courts, anti-scientific obscurantism and injustice, etc., etc. At the time, I was skeptical that anything like this had actually happened, but had no better information than the newspaper reports themselves. A reader kindly sent me a copy of the judgment by the court of appeals [PDF], and US Airlines kindly provided me with time to read it.

To sum up what follows, the news reports were thoroughly misleading: the issue in the case was the use not of Bayes's rule but of likelihood ratios; the panel of three judges (not one judge) affirmed existing law, rather than making new law; the existing law allows for the use of likelihood ratios and of Bayes's theorem when appropriate; and the court gave sound reasons for thinking that their use in cases like this one would be mere pseudo-science. We are, then, listening to the call-in show on Radio Yerevan:

Question to Radio Yerevan: Is it correct that Grigori Grigorievich Grigoriev won a luxury car at the All-Union Championship in Moscow?

Answer: In principle, yes. But first of all it was not Grigori Grigorievich Grigoriev, but Vassili Vassilievich Vassiliev; second, it was not at the All-Union Championship in Moscow, but at a Collective Farm Sports Festival in Smolensk; third, it was not a car, but a bicycle; and fourth he didn't win it, but rather it was stolen from him.

Taking advantage again of the generous opportunities provided to me by US Airlines, I will try to explain the case before the court, and what it decided and why. [Square brackets will indicate the numbered paragraphs of the judgment.] I will not fisk the news story (you can go back and read it for yourself), but I will offer some speculations about who found this eminently sensible ruling so upsetting that we got treated to this story, and why.

The Judgment

The case (Regina vs. T.) was an appeal of a murder conviction. The appeal apparently raised three issues, only one of which is not redacted in the public judgment: "the extent to which evaluative expert evidence of footwear marks is reliable and the way in which it was put before the jury" [1]. One of — and in fact it seems to be the main — pieces of evidence claimed to identify T. as the murderer was the match between shoe marks found at the scene of the murder and those of a pair of "trainers" (what I believe we'd call "sneakers") "found in the appellant's house after his arrest" [19]. A forensic technician, one Mr. Ryder, compared the prints and concluded, in a written report, that there was "a moderate degree of scientific evidence to support the view that the [Nike trainers recovered from the appellant] had made the footwear marks" [24]. This report was entirely qualitative and contained no statistical formulas or results of any kind. This, however, did not reflect how the conclusion was actually reached, as I will come to shortly.

Statistics were mentioned during the trial. T.'s lawyers (who seem rather hapless and were not retained on appeal) cross-examined Ryder about

figures in the UK over 7--8 years for the distribution of Nike trainers of the same model as that found in the appellant's house; some figures had been supplied to him by the defence lawyers the day before. Mr. Ryder gave evidence that there were 1,200 different sole patterns of Nike trainers; the pattern of Nike trainers that made the marks on the floor was encountered frequently and had been available since 1995; distribution figures for the pattern were only available from 1999. In the period 1996--2006 there would have been 786,000 pairs of trainers distributed by Nike. On those figures some 3% were size 11 [like those in question: CRS]. The pattern could also have been made by shoes distributed by Foot Locker and counterfeits of Nike shoes for which there were no figures. In answer to the suggestion that the pattern on the Nike trainers found at the appellant's house was of a common type, he said: "It is just one example of the vast number of different shoes that are available and to put the figures into context, there are around 42 million pairs of shoes sold every year so if you put that back over the previous 7 or 8 years, sports shoes alone, that multiplies up to nearly 300 million pairs of sports shoes so that particular number of shoes, produced which is a million, based on round numbers, is a very small proportion." [42]
These figures were repeated, with emphasis, by the trial judge in his instructions to the jury [44].

I said a moment ago that Ryder's written report, pre-trial, was entirely qualitative. This turns out to not really reflect what he did. In addition to looking at the shoes and the shoe-prints, he also worked through a likelihood ratio calculation, as follows [34--38]. The two hypotheses he considered were, as nearly as I can make out, "These prints were made by these shoes", and "These prints were made by some other shoe, randomly selected from all of the UK". (I will come back to these alternatives.) He considered that there were four variables he could work with: the pattern of the print, the size, the amount of wear, and the amount of damage to the shoe.

Pattern
The pattern of the marks at the scene matched that of the shoes recovered from T.'s house. Presumably this had probability (close to) 1 if those shoes left those prints. What probability did it have under the alternative? Ryder took this to be the frequency of that pattern in a database maintained by the Forensic Science Service (FSS), containing "shoes received by the FSS" [36i], and not intended to be a representative sample. This pattern was the most common one in the FSS database, and in fact had a frequency of 20%. So this gave a contribution to the likelihood ratio of 1/0.2 = 5.
Size
Both the shoes and the prints were size 11 (roughly, for the prints), and 3% of the shoes in a database run by a shoe trade association were of that size. (It is not clear to me if this was conditional on the pattern, or if Ryder assumed independence between pattern and size.) Ryder used a likelihood ratio of not 1/0.03 but merely 1/0.10, apparently to allow for imprecision in guessing the size of a shoe from a print.
Wear
"Ryder considered that the wear on the trainers meant that he could exclude half of the trainers of this pattern type and approximate size/configuration. He therefore calculated the likelihood ratio ... as 1/0.5" [36iii].
Damage
"He concluded that he could exclude very few pairs of shoes that could not previously have been excluded by the other factors" [36iv].
Putting this together, Ryder came up with a likelihood ratio of 5 × 10 × 2 = 100 in favor of the marks at the crime scene having been made by the shoes from T.'s house.
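For concreteness, here is that arithmetic as a minimal R sketch; the component ratios are just those summarized above from the judgment, and the variable names are mine.

    # Ryder's likelihood-ratio calculation, as summarized in [34--38].
    # Each factor is P(observation | these shoes) / P(observation | some other shoe).
    lr.pattern <- 1 / 0.20   # pattern frequency in the FSS database
    lr.size    <- 1 / 0.10   # size: 3% in the trade database, rounded up to 10%
    lr.wear    <- 1 / 0.50   # wear excludes about half the remaining shoes
    lr.damage  <- 1          # damage excludes essentially nothing further
    lr.pattern * lr.size * lr.wear * lr.damage   # = 100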

He then turned to a scale which had been plucked from the air (to put it politely) by some forensics policy entrepreneurs a few years before, which runs as follows [31]:

Likelihood ratio       Verbal equivalent
>1--10                 Weak or limited support
10--100                Moderate support
100--1,000             Moderately strong support
1,000--10,000          Strong support
10,000--1,000,000      Very strong support
>1,000,000             Extremely strong support
This is where Ryder's phrase "a moderate degree of scientific evidence" came from. Or, sort of:
In Mr Ryder's reports for the trial... there was no reference at all to any of these statistics, the formula [for the likelihood ratio], or to the use of a likelihood ratio or to the scale of numerical values set out [above]. The conclusion in his first report, which was supported by the statistics, formula, and resulting likelihood ratio, was expressed solely in terms of the verbal scale... this was dated one day after the notes in which he had recorded his calculations. Mr Ryder's explanation for the omission was that it was not standard practice for the detail relating to the statistics and likelihood ratios to be included in a report. He made clear that the data were not available to an exact and precise level and it was only used to confirm an opinion substantially based on his experience and so that it could be expressed in a standardised form. [38]

There are a couple of things to note about this, not all of which the court remarked on.

First, the numbers Ryder used were vastly different from those mentioned during the trial. "He made clear that the pattern was the one that was encountered most frequently in the laboratory, but he did not give the actual figures used by him... even though the figures in the database which he used in his formula were more favorable to the appellant". With those numbers, the likelihood ratio would be not 100:1 but 13,200:1 in favor of T.'s shoes having left the marks. But what's two orders of magnitude in a murder trial between friends?

Second, neither set of numbers is anything like a reliable basis for calculation:

It is evident from the way in which Mr Ryder identified the figures to be used in the formula for pattern and size that none has any degree of precision. The figure for pattern could never be accurately known. For example, there were only distribution figures for the UK of shoes distributed by Nike; these left out of account the Footlocker shoes and counterfeits. The figure for size again could not be any more than a rough approximation because of the factors specified by Mr Ryder. Indeed, as Mr Ryder accepted, there is no certainty as to the data for pattern and size.

More importantly, the purchase and use of footwear is also subject to numerous other factors such as fashion, counterfeiting, distribution, local availability and the length of time footwear is kept. A particular shoe might be very common in one area because a retailer has bought a large number or because the price is discounted or because of fashion or choice by a group of people in that area. There is no way in which the effect of these factors has presently been statistically measured; it would appear extremely difficult to do so, but it is an issue that can no doubt be explored for the future. [81--82]

(The Guardian, incidentally, glossed this as "The judge complained that he couldn't say exactly how many of one particular type of Nike trainer there are in the country", which is not the point at all.)

Third, the use of the likelihood ratio and statistical evidence is more than a bit of a bureaucratic fiction.

Mr Lewis [the "principal scientist at the FSS responsible for Case Assessment and Interpretation"] explained that in relation to footwear the first task of the examiner was to decide whether the mark could have been made by the shoe. If it could have been made, then what the FSS tried to do was to use the likelihood ratio to convey to the court the meaning of "could have been made" and how significant that was.

As Mr Lewis accepted, numbers were not put into reports because there was a concern about the accuracy and robustness of the data, given the small size of the data set and factors such as distribution, purchasing patterns and the like. It was therefore important that the emphasis on the use of a numerical approach was to achieve consistency; the judgment on likelihood was based on experience. [57--58]

Or, shorter: the examiners go by their trained judgments, but then work backwards to the desired numbers to satisfy bureaucratic mandates, even though everyone realizes the numbers don't bear scrutiny.

Fourth, to the extent that likelihood ratios and related statistics actually are part of the forensic process, they need to be presented during the trial, so that they can be assessed like any other evidence. Using them internally for the prosecution, but then sweeping them away, is a recipe for mischief. "It is simply wrong in principle for an expert to fail to set out the way in which he has reached his conclusion in his report.... [T]he practice of using a Bayesian approach and likelihood ratios to formulate opinions placed before a jury without that process being disclosed and debated in court is contrary to principles of open justice." [108] This, ultimately, was the reason for granting the appeal.

So where do we get to the point where (to quote The Guardian again) "a mathematical formula was thrown out of court"? Well, nowhere, because, to the extent that the court limited the use of Bayes's rule and likelihood ratios, it was re-affirming long-settled British law. As the judgment makes plain, "the Bayesian approach" and this sort of use of likelihood ratios were something "which this court had robustly rejected for non-DNA evidence in a number of cases" starting with R. vs. Dennis Adams in 1996 [46]. The basis for this "robust rejection" is also old, and in my view sound:

The principles for the admissibility of expert evidence [are that] the court will consider whether there is a sufficiently reliable scientific basis for the evidence to be admitted, but, if satisfied that there is a sufficiently reliable scientific basis for the evidence to be admitted, then it will leave the opposing views to be tested in the trial before the jury. [70]

In the case of DNA evidence, "there has been for some time a sufficient statistical basis that match probabilities can be given" [77]. But for footwear,

In accordance with the approach to expert evidence [laid down by previous judgments], we have concluded that there is not a sufficiently reliable basis for an expert to be able to express an opinion based on the use of a mathematical formula. There are no sufficiently reliable data on which an assessment based on data can properly be made... An attempt to assess the degrees of probability where footwear could have made a mark based on figures relating to distribution is inherently unreliable and gives rise to a verisimilitude of mathematical probability based on data where it is not possible to build that data in a way which enables this to be done; none in truth exists for the reasons we have explained. We are satisfied that in the area of footwear evidence, no attempt can realistically be made in the generality of cases to use a formula to calculate the probabilities. The practice has no sound basis.

It is of course regrettable that there are, at present, insufficient data for a more certain and objective basis for expert opinion on footwear marks, but it cannot be right to seek to achieve objectivity by reliance on data which does not enable this to be done. We entirely understand the desire of the experts to try and achieve the objectivity in relation to evidence of footwear marks, but the work done has never before, as we understand it, been subject to open scrutiny by a court. [86--87]

It is worth repeating that, despite the newspapers, this is not new law: "It is quite clear therefore that outside the field of DNA (and possibly other areas where there is a firm statistical base), this court has made it clear that Bayes theorem and likelihood ratios should not be used" [90]. Nonetheless, this does not amount to an obscurantist rejection of Bayes's theorem:

It is not necessary for us to consider ... how likelihood ratios and Bayes theorem should be used where there is a sufficient database. If there were a sufficient database in footwear cases an expert might be able to express a view reached through a statistical calculation of the probability of the mark being made by the footwear, very much in the same way as in the DNA cases subject to suitable qualification, but whether the expert should be permitted to go any further is, in our view, doubtful. [91]
The judgment goes on [91--95] to make clear that experts can have a sound scientific basis for their opinions even if these cannot be expressed as statistical calculations from a database. The objection rather is to spurious precision, and spurious claims to a scientific status [96].

There is a legitimate criticism to make of the court here, which is that it is not very specific about what would count as a "sufficient database", or "firm" statistics. It may be that the earlier cases cited fill this in; I haven't read them. This didn't matter for DNA, because people other than the police had other reasons for assembling the relevant data, but for something like shoes it's hard to see who would ever do it other than something like the FSS, and they are not likely to do so without guidance about what would be acceptable to the courts. On the other hand, the judges might have felt that articulating a specific standard simply went beyond what was needed to decide this case.

There is more in the judgment, including a discussion of what the court thought footwear examiners legitimately can and cannot generally say based on the evidence (drawing heavily on how this is done in the US). Rather than go into that, I will mention some more technical issues suggested by, but not discussed in, the judgment.

Some Statistical Commentary

  1. Nobody involved in the case used Bayes's rule. The unfortunate1 Mr. Ryder simply calculated a likelihood ratio. A properly Bayesian approach would have required at least the posterior odds, which would have meant putting a prior probability on the hypothesis that the shoes taken from T.'s house made the marks. (The prior probability of the alternative hypothesis would presumably have been one minus this.) What probability, though, would that have been?
    The current population of the UK is about 60 million. If we thus took the prior odds against T. being the murderer as 60 million to 1, then Ryder's likelihood ratio of 100 only brings the posterior odds against down to 600,000 to 1. If one instead calculates the likelihood ratio from the numbers mentioned at the trial, it comes to 13,200, pushing the posterior odds against all the way down to about 4,500 to 1 (see the sketch after this list). Presumably the prosecutors would say that the prior odds were a lot better than that, but that hardly helps the case for using Bayes's rule. Two Bayesians, seeing the same evidence and using the same likelihood function, can have posterior odds which are arbitrarily far apart, if their priors are sufficiently different.
    Without those prior probabilities, however, this use of the likelihood ratio is in fact a classic case of base-rate neglect, which is one of the things Bayes's rule is supposed to guard us against2. Of course, one can treat the prior as a testable part of the model, but doing so means giving up on the simple "probability that the hypothesis is true given the evidence" ideology at play here.
  2. Wishing this away, there is still an issue about specifying the alternatives whose likelihood ratio is to be calculated. In this case, the two hypotheses were that the marks were made by the pair of shoes from T.'s house, and that they were made by some other pair of shoes. This was the source of the 100:1 likelihood ratio. If the second hypothesis had been that the marks were made by some other, equally worn pair of shoes of the same pattern and size, the likelihood ratio would presumably have been pretty close to 1. (Close, because there might be differences due to damage, or people carving "for a good time, follow me" or "hah hah, coppers, you'll never prove it!" into their soles, etc.) In the terms used by American footwear examiners [65], the (semi-fictional) likelihood calculation would bear on "class" characteristics, not "identifying" characteristics. Yet one doubts, somehow, that any prosecutor would be inclined to state that "there is extremely weak scientific support for the print having been made by these shoes, rather than others of the same type", which is what the general formulas would entail. But perhaps that is unfair: "It is important to emphasise that the evidence [in DNA cases] is not directed to whether DNA came from the suspect, but the probability of obtaining a match that came from an unknown person who is unrelated to the suspect but has the same profile" [77].
  3. The issue the court correctly raises, about all the factors which could alter the local frequency of shoes, and the difficulty of measuring them, is related to the classic "reference class problem". This is a difficulty confronting simple relative-frequency theories of probability, namely, relative frequency in which "reference class" of instances: shoes sold this year in Britain? shoes sold over the last eight years in Britain? shoes in Bristol? Shoes within a mile of the Clifton Bridge? Shoes worn by respectable Cliftonians? By disreputable Cliftonians?3 Etc.
    Bayesians solve the reference class problem by fiat, that is, by modeling assumptions. As Aris Spanos points out, so do modern frequentists4. In both cases, though, one then has to justify the model. (Andy is right to keep saying that thinking the likelihood function is just given and beyond question is a serious mistake.) This is not impossible in principle — it's been pretty much done with DNA, for instance — but it would plainly be very hard, for all the reasons the judges list and more besides.
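To make the arithmetic in item 1 concrete, here is a minimal R sketch of the odds form of Bayes's rule; the flat prior over roughly 60 million people is purely illustrative, not a serious prior.

    # Odds form of Bayes's rule: posterior odds = prior odds * likelihood ratio.
    # The prior here (everyone in the UK equally suspect) is illustrative only.
    prior.odds.against <- 60e6                     # 60,000,000 to 1 against guilt
    lr <- c(report = 100, trial.figures = 13200)   # the two likelihood ratios above
    posterior.odds.against <- prior.odds.against / lr
    posterior.odds.against   # about 600,000 to 1 and 4,545 to 1 against, respectively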

So, we have a situation where the "Bayesian approach" supposedly being taken by the forensic specialists was not noticeably Bayesian, in addition to being based on hopelessly vague numbers and more than a bit of an administrative fiction.

Where Did This Story Come From?

The verbal scale for likelihoods I mentioned above was the brain-child of a trade organization of British forensic specialists [52--53] in the 2000s. It grew out of a movement to formalize the evaluation of forensic evidence through likelihood ratios, which participants described as "the Bayesian approach". "On the evidence before us this development occurred in the late 1990s and was based on the approach to expert evidence on DNA. It was thought appropriate to translate [that] approach... to other areas of forensic evidence" [49]. Several of the leading participants in this movement were evidently employees of the FSS, or otherwise closely affiliated with it. They seem to have been the ones responsible for insisting that all evaluative opinions be justified for internal consumption by a likelihood ratio calculation, and then expressed on that verbal scale.

That they started pushing for that just a few years after the British courts had ruled that such calculations were inadmissible when based on unreliable (or no) data might explain why these calculations were kept internal, rather than being exposed to scrutiny. That they pushed such calculations at all seems to be explained by a very dogmatic case of Bayesian ideology, expressed, e.g., in an extraordinary statement of subjectivism [75] that out-Savages Savage. Why they thought likelihood ratios were the Bayesian approach, though, I couldn't begin to tell you. (It would certainly be news to, say, Neyman and Pearson.) It would be extraordinary if these people were confusing likelihood ratios and Bayes factors, but that's the closest I can come to rationalizing this.

Sociologically considered, "forensic science", so called, is a relatively new field which is attempting to establish itself as a profession, with legitimate and recognized claims to authority over certain matters, what Abbott, in the book linked to just now, calls "jurisdiction". Part of professionalization is convincing outsiders that they really do need the specialized knowledge of the professionals, and it's very common, in attempts at this, for people to try to borrow authority from whatever forms of knowledge are currently prestigious. I suppose it's a good thing for us statisticians that Bayesian inference currently seems, to a would-be profession, like a handy club with which to beat down those who would claim its desired territory.

Still, if this aspect of professionalization often seems like aping the external forms of real science, while missing everything which gives those forms meaning, I think that's because it is. Forensics people making a fetish of the probability calculus when they have no basis for calculation is thus of a piece with attempts to turn cooking into applied biochemistry, or eliminate personality conflicts through item response theory. One has to hope that if a profession does manage to establish itself, it grows out of such things; sometimes they don't.

Naturally, being comprehensively smacked down by the court is going to smart for these people. I imagine prosecutors are unhappy as well, as this presumably creates grounds for appeals in lots of convictions. Expert witnesses (such as those quoted in the Guardian story) are probably not best pleased at having to admit that when they give precise probabilities, it is because their numbers are largely made up. I can sympathize with these people as human beings in an awkward and even, in some cases, deeply unenviable position, and certainly understand why they'd push back. (If I had to guess why a decision dated October 2010 got written up, in a thoroughly misleading way, in a newspaper in October 2011, it would be that it took them a while to find a journalist willing to spin it for them.) But this doesn't change the fact that they are wrong, and the judges were right. If they really want to use these formulas, they need to get better data, not complain that they're not allowed to give their testimony — in criminal trials, no less! — a false air of scientific authority.

Update, next day: Typo fixes, added name and link for the journalist.

Update, 29 October: Scott Martens points to a very relevant paper, strikingly titled "Is it a Crime to Belong to a Reference Class?" (Mark Colyvan, Helen M. Regan and Scott Ferson, Journal of Political Philosophy 9 (2001): 168--181; PDF via Prof. Colyvan). This concerns a US case (United States vs. Shonubi). There, the dispute was not about whether Shonubi was smuggling drugs (he was), or had been convicted fairly (he had), but about whether his sentence could be based on a statistical model of how much he might have smuggled on occasions when he was not caught. The appeals court ruled that this was not OK, leading to a parallel round of lamentations about "the legal system's failure to appreciate statistical evidence" and the like. The paper by Colyvan et al. is a defense of the appeals court's decision, largely on the grounds of the reference class problem, or, as they equivalently note (p. 179 n. 27), of model uncertainty (as well as crappy figures), though they also raise some interesting points about utilities.

Manual trackback: Abandoned Footnotes

1: I say "unfortunate", because, while the court makes clear he was just following standard procedure as set by his bosses and is not to be blamed in any way, he cannot be a popular man with those bosses after all this.

2: To drive home the difference between more likely and more probable, recall Kahneman and Tversky's famous example of Linda the feminist bank teller:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable? Linda is a bank teller, or Linda is a bank teller and is active in the feminist movement.
The trick is that while Linda is more likely to be as described if she is a feminist bank teller than if she is a bank teller with unknown views on feminism, it is nonetheless more probable that she is a bank teller than that she is a feminist bank teller. Of course in the legal case the alternatives are not nested (as here) but mutually exclusive.

3: I have no reason to think this murder case had anything to do with Bristol in general or Clifton in particular, both of which I remember fondly from a year ago.

4: I think one could do more with notions like ergodicity, and algorithmic, Martin-Löf randomness, than Spanos is inclined to, but in practice of course one simply uses a model.

Enigmas of Chance; Bayes, anti-Bayes

Posted by crshalizi at October 27, 2011 23:50 | permanent link

October 26, 2011

A Marxian Econophysics

Attention conservation notice: 2300 words about an odd, un-influential old book on radical political economy and statistical mechanics, plus gratuitous sniping at respectable mainstream economics.

Rendered insomniac by dental complications, naturally I read about a venture to claim econophysics for Marxism, before econophysics as we know it was even a glimmer in the eye of bourgeois ideology:

Emmanuel Farjoun and Moshé Machover, Laws of Chaos: A Probabilistic Approach to Political Economy. London: Verso, 1983 [Full text online]

I tracked this down because I somehow ran across a link to a conference devoted to it. It appears to have emerged from the debates provoked among heterodox economists by input-output analysis, especially as employed by Sraffa and his followers.

A word about input-output analysis. This is a technique, developed largely by the great economist Wassily Leontief, for analyzing the technological interdependencies of different sectors of the economy, and especially physical resource flows. Start with some good, say (because I am writing this as I do laundry) washing machines. Making a washing machine calls for certain inputs: so much steel, rubber, glass, wire, a motor, switches, tubing, paint, ball-bearings, etc.; also electric power for the factory, workers, wear and tear on assembly-line machinery. To provide each one of those inputs in turn requires other inputs. Ultimately, one can imagine (if not actually estimate) a gigantic matrix which shows, for each distinct good in the economy, the physical quantity of all other goods required to produce one unit of that commodity. (At least, in a linear approximation.) Given an initial vector of inputs, this defines the range of possibilities of production. Conversely, given a desired vector of outputs, this defines the minimum required inputs. (It is no coincidence that input-output analysis fits so well together with linear programming.)
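As a toy illustration of the bookkeeping (my invented numbers, nothing from Leontief or from Farjoun and Machover), here is the calculation in R: given the matrix of input requirements and a vector of final demand, the gross outputs follow from solving a linear system.

    # Toy Leontief input-output calculation with invented numbers.
    # A[i, j] = physical units of good i needed to produce one unit of good j.
    A <- matrix(c(0.1, 0.4,    # steel used per unit of steel, per washing machine
                  0.0, 0.0),   # washing machines used as inputs: none
                nrow = 2, byrow = TRUE,
                dimnames = list(c("steel", "washer"), c("steel", "washer")))
    d <- c(steel = 0, washer = 100)   # final demand: 100 washing machines
    # Gross output x must cover intermediate use plus final demand: x = A %*% x + d,
    # so x = (I - A)^{-1} d.
    x <- solve(diag(2) - A, d)
    x   # gross output of each good needed to deliver the final demand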

If one takes input-output analysis seriously, and assumes (following Marx, and indeed the whole tradition of classical economics back to Adam Smith at least) a uniform rate of profit across industries and even firms, then one runs into insuperable difficulties for the labor theory of value. Put simply, prices, at least equilibrium prices, are then determined by the uniform profit rate and the coefficients in the input-output matrix, with no real relation to how much labor goes into different commodities.

The authors — mathematicians who are, plainly, Marxian socialists, if not perhaps strictly Marxists — deny the premise that the rate of profit is uniform. ("Profit" here is defined as money received for goods sold, minus money paid for wages, raw materials, rent, and wear-and-tear on capital assets. It is thus before taxes, repayment of loans, and investment.) They agree that firms and industries where it is above average will tend to attract investment, and those where it is below average will tend to shed capital, and that these forces tend to equalize the profit rate. But they deny that there is any reason to think that this force should produce complete uniformity, or even very close uniformity. After all, there is a tendency for the speed of molecules in a gas to equalize, but that doesn't mean they all end up with the same speed. This is their main, driving analogy, and they think it so important that they devote chapter two [of eight] to accurately expounding the elements of the kinetic theory of gases and of statistical mechanics. They suggest that there should be a random distribution of profit rates, and that (on the analogy with statistical mechanics again, and no deeper reason that I noticed) it should be a gamma distribution. (Why not a beta? Why not a log-normal? Why any of the cookbook distributions?)

They then try to mesh this with something very much like a labor theory of value, though they are careful not to actually assert such a theory. Starting from the assumption that "labor" is a universal input into the production of all commodities, they define the "labor content" of a commodity as the total amount of labor needed to produce it using current technology (and summing over all the goods needed to produce that technology, all the goods needed to produce those goods, and so on). Because this is defined with respect to current technology, this is not the same as the amount of labor which, historically, happened to have gone into any one good. (By design, it is however reminiscent of Marx's attempt to define the value of a commodity as the quantity of "socially necessary" labor time which went into producing it.) They further claim that there will be a certain characteristic distribution of labor content over the commodities bought and sold in a given economy over a given span of time.
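On the same toy numbers (again mine, not the book's), the labor content of each good is the fixed point of "direct labor plus the labor content of the inputs", which is another linear solve:

    # Labor content v of each good satisfies v = l + v %*% A: direct labor per
    # unit plus the labor already embodied in the physical inputs.
    A <- matrix(c(0.1, 0.4,     # the same invented input-output matrix as above
                  0.0, 0.0), nrow = 2, byrow = TRUE)
    l <- c(2, 5)                # invented direct labor hours per unit of each good
    v <- l %*% solve(diag(2) - A)   # v = l (I - A)^{-1}
    v                           # total (direct plus indirect) labor per unit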

With these two distributions, they then argue as follows.

  1. The ratio of market prices to labor contents, both measured in appropriate units, should be a random variable with a small dispersion around 1. This is not quite a labor theory of value, but plainly very close.
  2. The efforts of capitalists to reduce costs, while not aimed at reducing labor content, will tend to do so with high probability, especially over many cycles of technological or organizational innovation. (Since, by (1), switching to lower-priced inputs will tend to mean switching to ones which also have less labor content.) This is a way of formalizing the idea that "labor becomes increasingly productive" under capitalism.
  3. The global rate of profit, averaged over the entire economy, will vary inversely with the amount of capital per worker, with the amount of capital measured in terms of its labor content, rather than market prices. (That is, they claim the right measure is "How much work would it be to replace all our capital assets?", and not "How many dollars would we have to spend to replace them?") They observe that there is no reason to think that capital per worker, so defined, tends to increase over time, and indeed much to doubt it (because of (2), the increasing productivity of labor). Thus the average rate of profit should not tend to fall, which is good because, pace Ricardo, Marx, Luxemburg, Lenin, etc., it empirically doesn't.
It should be noted that the whole discussion abstracts away from taxes, social-welfare expenditures by governments, savings by those who sell their labor-power, and rent on land or other natural resources. One could of course try to work through the complications which would result from adding in such "frictions".

It is only in the last chapter that they present any sort of empirical evidence whatsoever. This is scanty, and it is not clear that the compilations they found on profit rates really are using a definition of "profit", much less of "capital", which matches theirs, but the comparison between the histograms and their fitted gamma distributions isn't visually painful. It shows that realized profit rates, from firms which are large enough, and live long enough, to be included in directories of companies have a wide dispersion and are somewhat right-skewed. Even this does not quite settle the matter of uniformity of profit rates. Because investments must be made now for profit later, what the forces of competition should equalize are not these realized, ex-post profit rates, but rather predicted, ex ante rates. Even if everyone agreed in their predictions of profitability (obviously not the case), and even if ex ante rates were uniform, one would expect the ex-post profit rates to have non-trivial dispersion, though a stable distribution for the latter is another story.

To sum up what's gone so far: I am happy with the idea that there is no uniform rate of profit, though their case is hardly air-tight. I am utterly unpersuaded by the attempts to rehabilitate even a shadow of the labor theory of value on this basis. There seem to me to be two key points where it fails. One is the traditional problem that labor is not really a homogeneous commodity. The other is that labor does not have any unique role in their formal framework.

The traditional issue here is that they have to assume there is a single commodity called "labor" (or "labor-power" or "abstract labor"), and that producing one unit of this requires the same inputs, no matter where in the economic system the labor is applied, i.e., what type of work it is really doing. This has long been recognized as a huge problem with labor theories of value; they devote Appendix II to acknowledging it; and they wave it away. This seems to me to make no more economic sense than lumping together all the different fuels produced by an oil refinery, electricity from a wind-mill, and fields of beets as the commodity of "energy" (or even "abstract energy").

Granting, for the sake of argument, that we can treat all forms of labor as equivalent (including equality in what's needed to produce them), there is still another problem. They can define a labor-content for every commodity because labor is "universal", a direct or indirect input to the production of every other commodity. But this is the only feature of labor which they really use in their arguments. So any other universal input would do as well. Water, for instance, is an input into the production of labor, and so one could just as well go through everything in their analysis in terms of water-content rather than labor-content. Indeed, water and electricity, being much more nearly homogeneous physical substances than "labor", would seem to make an even better basis for the analysis. So to the extent that they have a basis for saying that the ratio between the prices of commodities and their labor content is nearly constant, I could equally say the same of the ratio between prices and water content, or electric content1. They were, I think, aware of this objection to at least some degree, since they single out labor on the grounds that economists should be interested in the metabolism of the social organism, which necessarily involves labor. But I fail to see why materialist economists, studying the social metabolism, should not be equally interested in water, or electricity, or indeed thermodynamic free energy in general.

At a deeper level, Farjoun and Machover think economics suffers from assuming economic variables have deterministic relationships, which we just measure imperfectly; they want to take stochastic models as basic. (They want to introduce noise into the dynamics, and not just into observations2.) I am, naturally, very sympathetic to this, but they fail to convince me that it really would make as much difference as they claim. Someone like Haavelmo could, I think, have accepted this postulate with no change at all in his econometric practice. On the other hand, something like John Sutton's approach of finding inequalities which hold across huge ranges of economic models actually seems to lead to real insights into how the economy is organized and evolves, and is a much bigger departure, methodologically, from the mainstream approach than what Farjoun and Machover advocated.

If you want to understand how capitalism works, I think you are no worse off spending your time reading Farjoun and Machover than, say, Kydland and Prescott3. The math is fine, and where sketchy could be elaborated endlessly by clever graduate students, but in neither case does it really support a valuable understanding of the mechanisms and processes of the real economy, because the mathematical structure is raised along lines laid down by a tradition which is irrelevant when not actively misguided. One might ask, then, why one of these efforts languishes in obscurity and the other does not; whether that is because one of them is very congenial both to right-wing politics and to a well-entrenched style of economics, and the other is not, is a question I will leave to the competence of the historians of social science.

Manual trackback: Mostly Not About Llamas; Blog Pra falar de coisas; Peter Frase; Abandoned Footnotes

1: This would imply a nearly-constant ratio between labor content and water content, which I suspect would be the ratio of the entries for labor and water in the dominant eigenvector of the input-output matrix. But that's just a guess based on the Frobenius-Perron theorem. (It does not seem worthwhile to pursue this to a definite answer.)

2: Note that in a dynamic stochastic general equilibrium model, the "stochastic" part comes solely from an unobserved, and generically unobservable, "shock" process. (This process may be vector-valued, and its projections along some preferred basis may be given suggestive names, like "technology".) The actions of the agents in such models are however deterministic functions of the state of the system facing them, which leads to the use of complicated, face-saving machinery for observational noise.

3: It may be worth noting that Kydland and Prescott, and their intellectual tradition, also assume homogeneous abstract labor. In fact, the Kydland-Prescott "real business cycle" model further assumes homogeneous abstract "capital", and a single homogeneous abstract consumption good. (One could even argue that it has an embedded labor theory of value.) In fairness, this was to some degree inherited from earlier approaches like Solow's growth model; in further fairness, Solow is too wise to mistake his model for a deep, "structural" description of the economy.

Furthermore, all models in the real business cycle/DSGE tradition have a huge, but generally ignored, measurement problem, since it is by no means obvious that the model variables called "output", "capital", "labor", etc., correspond exactly to standard statistics like GDP, market capitalization, and recorded hours worked (respectively), though almost all attempts to connect these models to data assume that they do. At most, typically, one allows for IID Gaussian measurement error. (Boivin and Giannoni's "DSGE Models in a Data-Rich Environment" is a notable exception, and even they handle this systematic mis-match between theoretical variables and empirical measurements through an ad hoc factor model.) The point being, while Farjoun and Machover's scheme has serious issues with the definition of its variables and their measurement, it is not as though such defects stop economists from adopting modeling approaches they otherwise find attractive, or even bother them very much.

The Dismal Science

Posted by crshalizi at October 26, 2011 09:45 | permanent link

October 24, 2011

Simulation I: Generating Random Variables (Introduction to Statistical Computing)

Lecture 16: Why simulate? Generating random variables as a first step. The built-in R commands: rnorm, runif, etc.; sample. Transforming uniformly-distributed random variables into other distributions: the quantile trick; the rejection method; illustration of the rejection method. Understanding pseudo-random number generators: irrational rotations; the Arnold cat map as a toy example of an unstable dynamical system; illustrations of the Arnold cat map. Controlling the random number seed.
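(As a hedged illustration of the "quantile trick", i.e. inverse-transform sampling, here is a sketch with an exponential target; the example is mine, not necessarily the one used in class.)

    # Quantile trick: if U ~ Unif(0,1) and Q is the target quantile function,
    # then Q(U) has the target distribution.
    rexp.by.quantiles <- function(n, rate = 1) {
      u <- runif(n)
      -log(1 - u) / rate      # the exponential quantile function, i.e. qexp(u, rate)
    }
    x <- rexp.by.quantiles(1e5, rate = 2)
    c(sample.mean = mean(x), true.mean = 1 / 2)   # these should nearly agree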

Introduction to Statistical Computing

Posted by crshalizi at October 24, 2011 13:54 | permanent link

Abstraction and Refactoring (Introduction to Statistical Computing)

Lecture 15: Abstraction as a way to make programming more friendly to human beings. Refactoring as a form of abstraction. The rectification of names. Consolidation of related values into objects. Extracting common operations. Defining general operations. Extended example with the jackknife. R.
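(Since the extended example is the jackknife, here is a minimal generic sketch of the idea, mine rather than the refactoring reached in lecture: the estimator is passed in as a function, so the same code serves for means, medians, regressions, and so on.)

    # Generic jackknife standard error: re-estimate with each observation left out.
    jackknife.se <- function(estimator, x) {
      n <- length(x)
      leave.one.out <- sapply(1:n, function(i) estimator(x[-i]))
      sqrt(((n - 1) / n) * sum((leave.one.out - mean(leave.one.out))^2))
    }
    x <- rnorm(100)
    jackknife.se(mean, x)      # should match sd(x) / sqrt(length(x)) for the mean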

Introduction to Statistical Computing

Posted by crshalizi at October 24, 2011 13:53 | permanent link

Split, Apply, Combine: Using plyr (Introduction to Statistical Computing)

Lecture 14: Implementing the split/apply/combine pattern with the plyr package. Advantages over implementations in base R. Drawbacks. Examples. Limitations of the split/apply/combine pattern. R and data.
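(A minimal sketch of the pattern with plyr, on toy data of my own rather than the examples from class: ddply splits a data frame by one or more columns, applies a function to each piece, and reassembles the pieces into a new data frame.)

    library(plyr)
    # Toy data: a few scores per player.
    d <- data.frame(player = rep(c("a", "b", "c"), each = 4),
                    score  = rnorm(12, mean = 72, sd = 3))
    # Split by player, apply a summary to each piece, combine into a data frame.
    ddply(d, "player", summarise, mean.score = mean(score), best.score = min(score))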

Introduction to Statistical Computing

Posted by crshalizi at October 24, 2011 13:52 | permanent link

Midterm Exam (Introduction to Statistical Computing)

Exam and solutions

Introduction to Statistical Computing

Posted by crshalizi at October 24, 2011 13:51 | permanent link

Split, Apply, Combine: Using Base R (Introduction to Statistical Computing)

Lecture 12: Design patterns and their benefits: clarity on what is to be done, flexibility about how to do it, ease of adapting others' solutions. The split/apply/combine pattern: divide big structured data sets up into smaller, related parts; apply the same analysis to each part independently; combine the results of the analyses. Trivial example: rowSums, colSums. Further examples. Iteration as a verbose, painful and clumsy implementation of split/apply/combine. Tools for split/apply/combine in basic R: the apply function for arrays, lapply for lists, mapply, etc.; split. Detailed example with a complicated data set: Masters 2011 Golf Tournament. R, data.
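(A minimal sketch of the same pattern in base R, again on toy data of my own rather than the Masters data from class.)

    # split/apply/combine in base R.
    d <- data.frame(player = rep(c("a", "b", "c"), each = 4),
                    score  = rnorm(12, mean = 72, sd = 3))
    pieces <- split(d$score, d$player)   # split: one vector of scores per player
    sapply(pieces, mean)                 # apply and combine: named vector of means
    # or, for a single grouping variable, in one call:
    tapply(d$score, d$player, mean)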

Introduction to Statistical Computing

Posted by crshalizi at October 24, 2011 13:50 | permanent link

October 08, 2011

Outlier-Robust Linear Regression (Introduction to Statistical Computing)

In which we estimate the parameters of a linear regression by minimizing the median absolute error, rather than the mean squared error, so as to reduce the influence of outliers (and to practice using functions as arguments and as return values).
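One way to do this (a sketch of the general idea, not necessarily the assignment's intended solution) is to write the median absolute error as a function of the coefficients and hand it to optim; the objective is not smooth, so treat the Nelder-Mead search here as illustrative rather than bullet-proof.

    # Fit y ~ a + b*x by minimizing the median absolute error of the residuals.
    median.abs.error <- function(coefs, x, y) {
      median(abs(y - (coefs[1] + coefs[2] * x)))
    }
    mae.regression <- function(x, y) {
      start <- coef(lm(y ~ x))            # least squares gives a starting point
      optim(start, median.abs.error, x = x, y = y)$par   # Nelder-Mead by default
    }
    # One gross outlier: compare the least-squares fit to the median-error fit.
    x <- runif(50); y <- 2 + 3 * x + rnorm(50, sd = 0.1); y[1] <- 50
    rbind(least.squares = coef(lm(y ~ x)), median.abs = mae.regression(x, y))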

Assignment (R)

Introduction to Statistical Computing

Posted by crshalizi at October 08, 2011 17:30 | permanent link

Lab: Likelihood (Introduction to Statistical Computing)

In which we made our cats a likelihood function, but we maximized it*.

Assignment, solutions.

*: I was tempted to title this lab "I can has likelihood surface?", but resisted.
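For the curious, here is a minimal sketch of the kind of thing the lab asks for, assuming for illustration a gamma model for the cats' heart weights in the MASS package; the actual lab may have used a different model or data. Working on the log scale is just a convenience that keeps the parameters positive without constrained optimization.

    # Write down a (negative log-) likelihood and maximize it numerically.
    library(MASS)                 # for the cats data (body and heart weights)
    data(cats)
    # Negative log-likelihood of a gamma model for heart weight; parameters are
    # kept on the log scale so the optimizer cannot wander into negative values.
    neg.loglike <- function(log.theta, x) {
      -sum(dgamma(x, shape = exp(log.theta[1]), rate = exp(log.theta[2]), log = TRUE))
    }
    fit <- optim(c(log.shape = 0, log.rate = 0), neg.loglike, x = cats$Hwt)
    exp(fit$par)                  # maximum-likelihood estimates of shape and rate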

Introduction to Statistical Computing

Posted by crshalizi at October 08, 2011 17:20 | permanent link

October 07, 2011

"Learning Richly Structured Representations from Weakly Annotated Data" (This Year at the DeGroot Lecture)

Attention conservation notice: Only of interest if you (1) care about learning complex stochastic models from limited data, and (2) are in Pittsburgh.

The CMU statistics department sponsors an annual distinguished lecture series in memory of our sainted founder, Morris H. DeGroot. This year, it comes at the end of the workshop on Case Studies in Bayesian Statistics and Machine Learning. We are very happy to have as the lecturer Daphne Koller.

"Learning Richly Structured Representations from Weakly Annotated Data"
Abstract: The solution to many complex problems require that we build up a representation that spans multiple levels of abstraction. For example, to obtain a semantic scene understanding from an image, we need to detect and identify objects and assign pixels to objects, understand scene geometry, derive object pose, and reconstruct the relationships between different objects.
Fully annotated data for learning richly structured models can be obtained in very limited quantities; hence, for such applications and many others, we need to learn models from data where many of the relevant variables are unobserved. I will describe novel machine learning methods that can train models using weakly labeled data, thereby making use of much larger amounts of available data, with diverse levels of annotation. These models are inspired by ideas from human learning, in which the complexity of the learned models and the difficulty of the training instances tackled changes over the course of the learning process. We will demonstrate the applicability of these ideas of various problems, focusing on the problem of holistic computer vision.
Time and place: 4:15 pm on Friday, 14 October 2011, in the McConomy Auditorium in the University Center

As always, the talk is free and open to the public.

Update, after the talk: We more than filled the auditorium; I had to sit on the stairs.

Enigmas of Chance

Posted by crshalizi at October 07, 2011 18:00 | permanent link

October 05, 2011

Functions as Return Values (Introduction to Statistical Computing)

Lecture 11: Functions in R are objects, just like everything else, and so can be returned by other functions, with no special machinery required. Examples from math (especially calculus) of operators, which turn one function into another. The importance of scoping when using functions as return values. Example: creating a linear predictor. Example: implementing the gradient operator (two different ways). Example: writing surface, as a two-dimensional analog to the standard curve. The use of eval and substitute to control when and in what context an expression is evaluated. Three increasingly refined versions of surface, employing eval. — R for examples.
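(A minimal sketch of the core idea, with a toy example of my own rather than the lecture's surface function: a function that takes a function and returns a new function, relying on lexical scoping to remember its ingredients.)

    # An "operator": takes a function f, returns a function approximating f'.
    make.derivative <- function(f, h = 1e-6) {
      # The returned function finds f and h in this enclosing environment.
      function(x) { (f(x + h) - f(x)) / h }
    }
    dsin <- make.derivative(sin)
    dsin(0)        # approximately cos(0) = 1
    dsin(pi / 3)   # approximately cos(pi/3) = 0.5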

Introduction to Statistical Computing

Posted by crshalizi at October 05, 2011 11:50 | permanent link

October 03, 2011

Functions as Arguments (Introduction to Statistical Computing)

Lecture 10: Functions in R are objects, just like everything else, and so can be both arguments to and return values of functions, with no special machinery required. Examples from math (especially calculus) of functions with other functions as arguments. Some R syntax relating to functions. Examples with curve. Using sapply to extend functions of single numbers to functions of vectors; its combination with curve. We write functions with lower-level functions as arguments to abstract out a common pattern of operations. Example: calculating a gradient. Numerical gradients by first differences, done two different ways. (Limitations of taking derivatives by first differences.) Incorporating this as a part of a larger algorithm, such as gradient descent. Using adapters, like wrapper functions and anonymous functions, to fit different functions together. — R for examples.
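(A minimal sketch of the pattern, again with my own toy example rather than the lecture's: a higher-order function that takes the target function as an argument and approximates its gradient by first differences.)

    # Numerical gradient of f at x by first differences, one coordinate at a time.
    gradient <- function(f, x, h = 1e-6) {
      sapply(seq_along(x), function(i) {
        x.step <- x
        x.step[i] <- x[i] + h
        (f(x.step) - f(x)) / h
      })
    }
    f <- function(x) { sum(x^2) }   # true gradient is 2*x
    gradient(f, c(1, -2, 3))        # approximately c(2, -4, 6)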

Introduction to Statistical Computing

Posted by crshalizi at October 03, 2011 10:30 | permanent link

Lab: Testing Our Way to Outliers (Introduction to Statistical Computing)

In which we use Tukey's rule for identifying outliers as an excuse to learn about debugging and testing.
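For reference, here is a minimal sketch of Tukey's rule itself (the point of the lab is the testing and debugging, so treat this as the target, not the lesson): flag a point if it lies more than 1.5 interquartile ranges beyond either quartile.

    # Tukey's rule: flag points more than k IQRs outside the first/third quartiles.
    tukey.outliers <- function(x, k = 1.5) {
      q <- quantile(x, c(0.25, 0.75))
      iqr <- q[2] - q[1]
      (x < q[1] - k * iqr) | (x > q[2] + k * iqr)   # logical vector, TRUE = outlier
    }
    x <- c(rnorm(100), 10)       # one planted outlier
    which(tukey.outliers(x))     # should include position 101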

Assignment, solutions (R)

Introduction to Statistical Computing

Posted by crshalizi at October 03, 2011 10:29 | permanent link

Is Bayesianism Legal in Britain?

Via Mason Porter, Danny Yee and others, I see a news story which my kith are glossing along the lines of "a judge has ruled that Bayes's Theorem does not apply in Britain". Leave to one side my "tolerate/hate" relationship with Bayesianism; there are certainly cases, and ones of legal application at that, where Bayes's rule amounts to a simple arithmetic statement about population counts, so it would be very remarkable indeed if these were inadmissible in court. While I enjoy disparaging the innumeracy of the legal profession as much as the next mathematically-trained person, this seems like a distortion.

Let me quote from the Guardian story Mason linked to. (I can't find the actual opinion, at least not without more work than it's worth before lecture.) The story

begins with a convicted killer, "T", who took his case to the court of appeal in 2010. Among the evidence against him was a shoeprint from a pair of Nike trainers, which seemed to match a pair found at his home. While appeals often unmask shaky evidence, this was different. This time, a mathematical formula was thrown out of court. The footwear expert made what the judge believed were poor calculations about the likelihood of the match, compounded by a bad explanation of how he reached his opinion. The conviction was quashed.

But more importantly, as far as mathematicians are concerned, the judge also ruled against using similar statistical analysis in the courts in future. ...

In the shoeprint murder case, for example, [applying Bayes's rule] meant figuring out the chance that the print at the crime scene came from the same pair of Nike trainers as those found at the suspect's house, given how common those kinds of shoes are, the size of the shoe, how the sole had been worn down and any damage to it. Between 1996 and 2006, for example, Nike distributed 786,000 pairs of trainers. This might suggest a match doesn't mean very much. But if you take into account that there are 1,200 different sole patterns of Nike trainers and around 42 million pairs of sports shoes sold every year, a matching pair becomes more significant.

The data needed to run these kinds of calculations, though, isn't always available. And this is where the expert in this case came under fire. The judge complained that he couldn't say exactly how many of one particular type of Nike trainer there are in the country. National sales figures for sports shoes are just rough estimates.

And so he decided that Bayes' theorem shouldn't again be used unless the underlying statistics are "firm". The decision could affect drug traces and fibre-matching from clothes, as well as footwear evidence, although not DNA.

What I take from this is that the judge was asking for reasons to believe that the numbers going into Bayes's rule are accurate. This is, of course, altogether the right reaction. Unless the component numbers in the calculation --- the base rates and the likelihoods --- are right, the posterior probability has no value as evidence, because it has no connection whatsoever to the truth. Unless those components are validated, the differences between a witness who says "My posterior probability is 0.99" and one who says "I'm, like, really sure" are:

  1. The former carries an air of precision, apt to impress juries and judges, which the latter lacks; while
  2. That air of precision is entirely spurious.

To reinforce just how badly wrong a simple-minded application of Bayes's rule can go, I invite you to consider the saga of the Phantom of Heilbronn. The combined police forces of Europe spent years searching for a criminal known from high-quality forensic evidence (DNA) left at more than 40 crime scenes across a wide swathe of Europe. In the end, it turned out that the reason all these different crime scenes turned up the same DNA is that the swabs used to collect the DNA from the scenes all came from the same factory, and had been contaminated by DNA from a worker there. (Presumably the contamination was accidental.) The case unraveled because while the common DNA was female, it was recovered from a male corpse. If it had been recovered from some unfortunate woman, it's very likely that this would now be regarded as a closed case. No doubt we would then be hearing Bayesian calculations about the odds against the suspect being anyone other than the Heilbronn serial killer --- who, recall, did not exist. (In fact, it's instructive to do a back-of-the-envelope version of the calculation, ignoring the contamination of the swabs.) If you say "Well, of course those calculations are off, the likelihood of the suspect matching a crime-scene in the test when the suspect wasn't really there is all wrong", I can only reply, "Exactly", and add that sensitivity analysis is no substitute for actually understanding where and how the data arise. This is related, of course, to the certainty of the Bayesian fortune-teller.
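Here is one version of that back-of-the-envelope calculation, with numbers that are invented but not crazy (the match probability and the population size are purely illustrative):

    # Back-of-the-envelope Bayes for the Phantom, ignoring the contaminated swabs.
    # Both numbers below are invented for illustration.
    p.random.match <- 1e-12        # advertised chance of a coincidental DNA match
    population     <- 1e8          # rough pool of people who could be the "Phantom"
    prior.odds     <- 1 / population                 # for a suspect picked at random
    posterior.odds <- prior.odds * (1 / p.random.match)
    posterior.odds   # 10,000 to 1 "in favor" of guilt, for a criminal who never
                     # existed; the calculation presumes the evidence process is sound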

It is never pleasant to have claims to professional authority checked, so I certainly feel where my learned British colleagues are coming from*. But I have to conclude that, in so far as the judge said that Bayes's rule "shouldn't ... be used unless the underlying statistics are 'firm'," he was being entirely reasonable. He may, of course, have gone on to establish unreasonable standards for what counts as "firm" statistics; the news stories don't say. Unless that can be shown, however, the most damning verdict we statisticians can return is (what else?) "not proven".

Update, later that day: A reader has kindly supplied me with a copy of the ruling. On a first scan, phrases like "Maths! Nasty, wicked, tricksy maths! We hates them, Precious, hates them forever!" are absent, but I will try to read it and report back.

Update, 27 October: More than you would ever want to know.

*: Let me remind them that one trick which is proven to help people use Bayes's rule rightly is to eschew talk of probabilities, and employ frequency formats. Since Gigerenzer and Hoffrage were able to get doctors — a tribe notorious for their mis-understanding and mis-use of inverse probability — to use Bayes's rule correctly this way, it would be rather surprising if lawyers weren't helped too.

Enigmas of Chance; Bayes, anti-Bayes

Posted by crshalizi at October 03, 2011 08:50 | permanent link

September 30, 2011

Books to Read While the Algae Grow in Your Fur, September 2011

Attention conservation notice: I have no taste.

(Out of sequence because I didn't get around to posting on the weekend.)

Despite how it looks, I actually put most of my reading time this month into a most wonderful mathematical book, but a review will have to wait until I am completely finished with it.

W. J. Cash, The Mind of the South
An intelligent and (in several senses) liberal white southerner's attempt, in 1940, to explain not just how his fellows think, but how they came to think that way; along the way, he engages in a lot of debunking of the then-received story of the South. It's persuasive in many respects, but I am utterly incompetent to speak to how much of its argument might have been superseded by later historians. (The link above is to a reprint with a recent introduction, but I read an edition that seems to be from about 1960.) Three things strike me about it, read at a distance of seventy years and a couple of cultural zones:
  1. The transformation of the South has been absolutely immense. Cash clearly did not expect lynching to end any time in the foreseeable future; something like the Civil Rights movement dragging the country, protesting, into a profound moral renovation wasn't even in contemplation. And, on a sheer material level, places like Atlanta, or the Research Triangle, never mind suburban northern Virginia, have been transformed out of all recognition since Cash wrote. Yet some of the cultural patterns persist, and not always just the most obvious ones.
  2. I'm tempted to say that Cash's descriptions of "Negroes", and, relatedly, his evaluation of Reconstruction, were enlightened for his time and background, but he strikes me as the kind of man who would have rejected that as condescending. So, I'll say instead that much of what he says about these matters, especially about the mentality of black southerners, is deeply disappointing, not just for its content but also its thoughtlessness, the way it flouts his own manifest intellectual standards. (Likewise, it grated, after a while, that when he spoke of "the South", without qualification, he meant its white inhabitants, though he knew perfectly well that numerous black people had been there the whole time, and were not just biologically but also culturally akin to the white people.) It's a sobering reminder of what, within living memory, was acceptable opinion, and this from someone whose contempt for the Klan, Jim Crow, and the whole pathology of "nigger-baiting and nigger-hazing" (his phrase) seems based equally on their stupidity and their wickedness. And I don't feel any intrinsic superiority to Cash, quite the contrary; I had the advantage of having grown up after a successful revolution, in an environment which took for granted that the Civil War was about treason in defense of slavery, and that Frederick Douglass was my home state's greatest son. This makes me wonder what I am blind to.
  3. Above everything else, the man could write.
Dana Priest and William Arkin, Top Secret America: The Rise of the New American Security State
An amplification of their outstanding series of articles, and databases, in the Washington Post. It's journalism of a very high order, requiring an immense amount of focus and time, about a very important and almost entirely negative development, viz., the creation of a vast, unaccountable, national surveillance state, complete with a social base in the form of a massive network of contracting firms, a military-intelligence-industrial complex. This serves no detectable, legitimate public purpose, and appears to be predicated on the proposition that there is no cost to pursuing false positives. It does, however, help erode civil liberties and democratic accountability here at home (going hand in glove with efforts to maintain irrational fear, and propagate racist idiocy), encourage counter-productive initiatives abroad, and contribute to the more-or-less-corrupt enrichment of private persons at the public expense. No good has come of it, and it's hard to see how any could.
— Something Priest and Arkin make clear in passing is that this sort of highly compartmented secrecy is really very bad for problem solving. It reduces the number of people who can contribute to solutions; it reduces their diversity; it makes it harder to learn from others. Perhaps most insidiously, it makes it harder to learn from one's own mistakes, by removing incentives for recognizing them.
One point Priest and Arkin do not press, though they could have, is that it is very dubious that such secrecy actually hides things from enemies, as opposed to the people supposedly being served. Anything reporters for the Post can find, and largely from public sources at that (e.g., job ads), could presumably be found by a foreign government's intelligence services. (Though they may find it more efficient to just buy members of the US apparatus.) Similarly, when the US launches drone attacks in other countries, these are not exactly secret in those countries, and they are certainly not secret from those attacked.
The biggest weakness I find in the book is that the national surveillance state is not just a post-9/11 growth. It is an extension of the national security state assembled for the Cold War, with many of the same organizations playing many of the same institutional games, both in the government and in private industry. This continuity is not explored, and I wish it had been.
It's idle to speculate about the future of this complex of organizations. The point is to either get rid of it, or re-direct it to ends which are actually worth achieving. But the political-economic obstacles to doing so look immense: no one participating in it now has any self-interested reason to want to shrink it or even reform it, and what politician wants to run on a platform of "doing less to keep us safe from terrorists"? (Perhaps the best that could be hoped for would be containment, and letting "Top Secret America" be gradually undermined by neglect, demoralization, Baumol's cost disease, and the attractive example of an open society in real America.) The situation would be even worse without the work of Priest and Arkin.
Patrick O'Brian, The Mauritius Command; Desolation Island; The Fortune of War; The Surgeon's Mate; The Ionian Mission
I am being very gluttonous with re-reading these books, yes.
"Elsinore itself? The very Elsinore? God bless my soul: and yours too, joy. A noble pile. I view it with reverence. I had supposed it to be merely ideal — hush, do not move. They come, they come!"

A flight of duck wheeled overhead, large powerful heavy swift-flying duck in files, and pitched between the castle and the ship.

"Eiders without a doubt," said Stephen, his telescope fixed upon them. "They are mostly young: but there on the right is a drake in full dress. He dives: I see his black belly. This is a day to mark with a white stone." A great jet of white water sprang from the surface of the sea. The eiders vanished. "Good God!" he cried, staring in amazement, "What was that?"

"They have opened on us with their mortars," said Jack. "That was what I was looking for." A puff of smoke appeared on the nearer terrace, and half a minute later a second fountain rose, two hundred yards short of the Ariel.

"The Goths," cried Stephen, glaring angrily at Elsinore. "They might have hit the birds. These Danes have always been a very froward people. Do you know, Jack, what they did at Clonmacnois? They burnt it, the thieves, and their queen sat on the high altar mother-naked, uttering oracles in a heathen frenzy. Ota was the strumpet's name. It is all of a piece: look at Hamlet's mother. I only wonder her behaviour caused any comment."

But also: What the Hell did I really understand about these characters when I was twenty and first read the books?
ObLinkage: Jo Walton on the series; for instance, on Desolation Island, leading (via P.N.H.) to a link excerpting the extraordinary passage about the sinking of the Waakzaamheid.
John Scalzi, The Ghost Brigades; The Last Colony; Zoe's Tale
Sequels to Old Man's War, but I read them with great enjoyment after a gap of four years, having forgotten the details of the first book. Primarily this enjoyment came from Scalzi having a reasonably clever story, and making me care much more about the characters and their fates than I thought I would. But he also handles changes in his characters' internal perspectives deftly (and they do change), and manages to be matter-of-fact about their post-humanity not because he hasn't thought it through, but precisely because he obviously has. The last two books pull off the neat trick of telling the same story from two points of view, without being redundant.
There is a nice essay to be written about how thoroughly this set of books subverts the premises of Starship Troopers, while at the same time having so clearly learned from Heinlein, somewhat as with Panshin's Rite of Passage. (To a lesser extent that's also true of the relation between these books, especially The Ghost Brigades, and the loathsome Ender's Game, though there Scalzi is not, thankfully, even trying to push the buttons at which Card pounds.) ROT-13'd for spoilers: Fpnymv unf perngrq n fpranevb va juvpu bayl gubfr jub cnegvpvcngr va gur zvyvgnel, svtugvat ntnvafg nyy znaare bs nyvraf sbe gur rkcnafvba bs uhznavgl, unir nal erny fnl va tbireazrag, naq ner gur bayl barf, fhccbfrqyl, jub pna snpr gur uneq gehguf bs gur fvghngvba naq znxr gur arprffnel qvssvphyg qrpvfvbaf. Naq ol gur raq bs gur frevrf, guvf cbyvpl unf yrq gb vzzrafr pngnfgebcurf naq irel arneyl gur rkgvapgvba bs gur fcrpvrf, which does tend to take the glow off the idea. (It also reflects paying some attention to the experience of the 20th and for that matter of the 19th century, unlike some people.) But, again like Panshin's book, this series is not a simple satire or deflation of Heinlein; I think it would be deeply enjoyable for someone who had never encountered or even heard of the latter.
Errata: In chapter seven of Ghost Brigades, for "heirarch", read "hierarch" throughout.

Books to Read While the Algae Grow in Your Fur; The Commonwealth of Letters; Scientifiction and Fantastica; The Beloved Republic; Writing for Antiquity; The Continuing Crises

Posted by crshalizi at September 30, 2011 23:59 | permanent link

September 28, 2011

Rancorous Testing (Introduction to Statistical Computing)

In which we practice debugging and testing, while learning about measures of nonlinear association.

Assignment (R)

Introduction to Statistical Computing

Posted by crshalizi at September 28, 2011 15:16 | permanent link

Testing (Introduction to Statistical Computing)

Lecture 9: Our code implements a method for solving problems we expect to encounter in the future; but why should we trust those solutions? We establish the reliability of the code by testing it. To respect the interfaces of the code, we test the substance of the answers, not the procedure used to obtain them, even though it is the reliability of the procedure we ultimately care about. We test both for the actual answer in particular cases and by cross-checking different uses of the same code which should lead to the same answer. Because we do not allow our tests to give us any false alarms, their power to detect errors is limited, and must be focused on particular kinds of errors. We make a virtue of necessity by using a diverse battery of tests, and shaping the tests so that they tell us where errors arise. The testing-programming cycle alternates between writing code and testing its correctness, adding new tests as new errors are discovered. The logical extreme of this is test-driven development, where tests represent the specification of the software's behavior in terms of practical consequences. Drawbacks of testing. Some pointers to more advanced tools for writing, maintaining and using tests in R.

(Why yes, this lecture was something of a lay sermon on epistemology.)
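
For readers following along at home, here is a minimal sketch of the style of test the lecture has in mind; the function and the cases are my own toy examples, not the course's code.

    # Toy function to be tested: sample variance computed by hand.
    my.var <- function(x) { sum((x - mean(x))^2) / (length(x) - 1) }

    # Test the substance of the answer in a particular case with a known value:
    stopifnot(all.equal(my.var(c(1, 1, 1)), 0))

    # Cross-check: a different route (R's built-in var) to the same answer.
    x <- rnorm(50)
    stopifnot(all.equal(my.var(x), var(x)))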

Introduction to Statistical Computing

Posted by crshalizi at September 28, 2011 15:15 | permanent link

September 27, 2011

New "data scientist" is but old "statistician" writ large

Attention conservation notice: Defense of professional territory (or jurisdiction) against potential rivals.

Cathy O'Neil has an interesting post up about "Why and how to hire a data scientist for your business". I confess that I have never been on the hiring end of such a decision, but everything she says sounds quite reasonable. What strikes me about it, though, is that the skills she's describing a good "data scientist" as having are a subset of the skills of a good statistician. At most, they are a subset of the skills of a good computationally competent statistician. These are even, at least here, undergraduate-level skills. Everyone who gets a bachelor's degree from our department has, after all, taken modern regression and advanced data analysis, and most of them respond to our promptings to take statistical graphics and visualization, data mining, and/or statistical computing. (IMHO, graphics and computing ought to be mandatory courses, but that's another story for another audience.) While I modestly admit to the unrivaled greatness of our undergrad program, I draw two conclusions:

  1. Other people re-inventing the job of statisticians under a new name is a sign that we really need to do better at spreading the word about what we know and what we can do.
  2. If you want a data scientist, get a CMU statistics major.

Obligatory disclaimer: I am, of course, speaking for myself and not for the department, much less the school.

Manual trackback: Mims's Bits

Enigmas of Chance; Corrupting the Young

Posted by crshalizi at September 27, 2011 13:31 | permanent link

September 26, 2011

Annual Call to the Adobe Tower (Dept. of Signal Amplification)

Yet again, the Santa Fe Institute is recruiting post-docs for three-year appointments. If the idea of having the freedom to pursue your own interdisciplinary research in a remarkably stimulating, genuinely collaborative, and physically beautiful environment is appealing, then I strongly encourage you to apply. (Even though more applications will mean more for me to read during the evaluations.) Follow the link for details.

Signal Amplification; Complexity

Posted by crshalizi at September 26, 2011 12:25 | permanent link

Debugging (Introduction to Statistical Computing)

Lecture 8: Debugging is an essential and perpetual part of programming. Debugging as differential diagnosis: characterize the bug, localize it in the code, try corrections. Tactics for characterizing the bug. Tactics for localizing the bug: traceback, print, warning, stopifnot. Test cases and dummy input generators. Interactive debuggers. Programming with an eye to debugging: writing code with comments and meaningful names; designing the code in a top-down, modular, functional manner. A hint at the exception-handling system.
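
A tiny illustration of the localization tactics named above, using a toy function of my own rather than anything from the lecture: stopifnot() turns a silent assumption into an immediate, located failure, and traceback() afterwards shows where in the call stack things went wrong.

    # Convert growth rates into doubling times, failing loudly on bad input.
    doubling.times <- function(rate) {
      stopifnot(is.numeric(rate), all(rate > 0))  # fail here, not three calls later
      log(2) / log(1 + rate)
    }
    report <- function(rates) { summary(doubling.times(rates)) }
    # report(c(0.02, 0.05, "0.1"))  # errors; traceback() then points into doubling.times()
    # debug(doubling.times)         # or step through the function interactively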

Introduction to Statistical Computing

Posted by crshalizi at September 26, 2011 12:20 | permanent link

Lab: Further Errors of the Cat Heart (Introduction to Statistical Computing)

In which we meet the jackknife, by way of seeing how much error there is in our estimates from the last lab.

Lab 4 (R), solutions (their R)

Introduction to Statistical Computing

Posted by crshalizi at September 26, 2011 09:32 | permanent link

Standard Errors of the Cat Heart (Introduction to Statistical Computing)

In which we meet the parametric bootstrap, traveling incognito, and probe the precision of our estimation method from the last lab by seeing how well it would work when the model is true and we know the parameters.

Assignment

Introduction to Statistical Computing

Posted by crshalizi at September 26, 2011 09:31 | permanent link

Scope (Introduction to Statistical Computing)

Lecture 7: R looks for the values of names in the current environment; if it cannot find a value, it looks for the name in the environment which spawned this one, and so on up the tree to the common, global environment. Assignment is modifying the name/value association list which represents the environment. The scope of a name is limited by the current environment. Implications: changes within the current scope do not propagate back to the larger environments; changes in the larger environment do propagate to all smaller ones which it encloses, unless over-ridden by local names. Subtlety: the larger environment for a function is the one in which it was defined, not the one in which it is called. Some implications for design. Examination of the last homework from this stance.
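
A small example of my own (not from the lecture notes) of the subtlety mentioned at the end: the enclosing environment of a function is the one where it was defined, not the one where it is called.

    y <- 10
    make.adder <- function() {
      y <- 1                     # local to make.adder's environment
      function(x) { x + y }      # will look up y where it was defined
    }
    add <- make.adder()
    add(5)    # 6, not 15: the y seen is the one in the defining environment
    y <- 100
    add(5)    # still 6; changing the global y has no effect here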

Introduction to Statistical Computing

Posted by crshalizi at September 26, 2011 09:30 | permanent link

September 24, 2011

Next Week at the Statistics Seminar; Week After Next at the Machine Learning Seminar

There's not much connection between the talks, other than that they should both be great, and I don't feel like writing two posts.

Ronald Coifman, "Analytic Organization of Observational Databases as a Tool for Learning and Inference"
Abstract: We describe a mathematical framework to learn and organize databases without incorporation of expert information. The database could be a matrix of a linear transformation for which the goal is to reorganize the matrix so as to achieve compression and fast algorithms. Or the database could be a collection of documents and their vocabulary, an array of sensor measurements such as EEG, or a financial time series or segments of recorded music. If we view the database as a questionnaire, we organize the population into a contextual demographic diffusion geometry and the questions into a conceptual geometry; this is an iterative process in which each organization informs the other, with the goal of entropy reduction of the whole data base.
This organization being totally data agnostic applies to the other examples thereby generating automatically a data driven conceptual/contextual pairing. We will describe the basic underlying tools from Harmonic Analysis for measuring success in extracting structure, tools which enable functional regression prediction and basically signal processing methodologies.
Time and Place: 4:30--5:30 pm on Monday, 26 September 2011 in Baker Hall, Giant Eagle Auditorium (A51)
Alex Smola, "Scaling Machine Learning to the Internet"
Abstract: In this talk I will give an overview over an array of highly scalable techniques for both observed and latent variable models. This makes them well suited for problems such as classification, recommendation systems, topic modeling and user profiling. I will present algorithms for batch and online distributed convex optimization to deal with large amounts of data, and hashing to address the issue of parameter storage for personalization and collaborative filtering. Furthermore, to deal with latent variable models I will discuss distributed sampling algorithms capable of dealing with tens of billions of latent variables on a cluster of 1000 machines.
The algorithms described are used for personalization, spam filtering, recommendation, document analysis, and advertising.
Time and Place: 3--4 pm on Thursday, 6 October in Gates Hall 8102

As always, both talks are free and open to the public.

Enigmas of Chance

Posted by crshalizi at September 24, 2011 16:06 | permanent link

September 19, 2011

Top-Down Design (Introduction to Statistical Computing)

Lecture 6: Top-down design is a recursive heuristic for solving problems by writing functions: start with a big-picture view of the problem; break it into a few big sub-problems; figure out how to integrate the solutions to each sub-problem; and then repeat for each part. The big-picture view: resources (mostly arguments), requirements (mostly return values), the steps which transform the one into the other. Breaking into parts: try not to use more than 5 sub-problems, each one a well-defined and nearly-independent calculation; this leads to code which is easy to understand and to modify. Synthesis: assume that a function can be written for each sub-problem; write code which integrates their outputs. Recursive step: repeat for each sub-problem, until you hit something which can be solved using the built-in functions alone. Top-down design forces you to think not just about the problem, but also about the method of solution, i.e., it forces you to think algorithmically; this is why it deserves to be part of your education in the liberal arts. Exemplification: how we could write the lm function for linear regression, if it did not exist and it were necessary to invent it.
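
To make the exemplification concrete, here is one possible top-down sketch of the lm example, with my own decomposition rather than the one worked through in class: the big picture is data in, coefficients out; the sub-problems are building a design matrix and solving the least-squares equations, each small enough for built-in functions.

    my.lm <- function(y, x) {
      X <- make.design.matrix(x)          # sub-problem 1
      beta <- solve.least.squares(X, y)   # sub-problem 2
      return(beta)
    }
    make.design.matrix <- function(x) {
      cbind(intercept = 1, as.matrix(x))  # prepend a column of 1s
    }
    solve.least.squares <- function(X, y) {
      solve(t(X) %*% X, t(X) %*% y)       # built-ins close the recursion
    }
    # Sanity check: my.lm(cars$dist, cars$speed) should match coef(lm(dist ~ speed, data = cars))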

Introduction to Statistical Computing

Posted by crshalizi at September 19, 2011 10:30 | permanent link

Lab: Of Big- and Small- Hearted Cats (Introduction to Statistical Computing)

In which we practice the arts of writing functions and of estimating distributions, while contemplating just how little room there is in the heart of a cat.

Lab; solutions

Introduction to Statistical Computing

Posted by crshalizi at September 19, 2011 10:29 | permanent link

September 18, 2011

"I was of three minds, / Like a tree / In which there are three blackbirds"

Attention conservation notice: 1900+ words of log-rolling promotion of an attempt by friends to stir up an academic controversy, in a matter where pedantic points of statistical theory intersect the artificial dilemmas of psychological experiments.

There's a growing interest among psychologists in modeling how people think as a process of Bayesian learning. Many of the papers that come from this are quite impressive as exercises in hypothetical engineering, in the Design for a Brain tradition, but long-time readers will be bored and unsurprised to hear that I don't buy them as psychology. Not only do I deny that Bayesianism is any sort of normative ideal (and so that Bayesian models are standards of rationality), but the obstacles to implementing Bayesian methods on the nervous system of the East African Plains Ape seem quite insurmountable, even invoking the computational power of the unconscious mind*. Nonetheless, there are all those experimental papers, and it's hard to argue with experimental results...

Unless, of course, the experimental results don't show what they seem to. This is the core message of a new paper, whose insight is completely correct and something I kick myself for not having realized.

Frederick Eberhardt and David Danks, "Confirmation in the Cognitive Sciences: The Problematic Case of Bayesian Models", Minds and Machines 21 (2011): 389--410, phil-sci/8778
Abstract: Bayesian models of human learning are becoming increasingly popular in cognitive science. We argue that their purported confirmation largely relies on a methodology that depends on premises that are inconsistent with the claim that people are Bayesian about learning and inference. Bayesian models in cognitive science derive their appeal from their normative claim that the modeled inference is in some sense rational. Standard accounts of the rationality of Bayesian inference imply predictions that an agent selects the option that maximizes the posterior expected utility. Experimental confirmation of the models, however, has been claimed because of groups of agents that "probability match" the posterior. Probability matching only constitutes support for the Bayesian claim if additional unobvious and untested (but testable) assumptions are invoked. The alternative strategy of weakening the underlying notion of rationality no longer distinguishes the Bayesian model uniquely. A new account of rationality — either for inference or for decision-making — is required to successfully confirm Bayesian models in cognitive science.

Let me give an extended quotation from the paper to unfold the logic.

In a standard experimental set-up used to confirm a Bayesian model, experimental participants are provided with a cover story about the evidence they are about to see. This cover story indicates (either implicitly or explicitly) the possible hypotheses that could explain the forthcoming data. Either the cover story or pre-training is used to induce in participants a prior probability distribution over this space. Eliciting participants' prior probabilities over various hypotheses is notoriously difficult, and so the use of a novel cover story or pre-training helps ensure that every participant has the same hypothesis space and nearly the same prior distribution. In addition, cover stories are almost always designed so that each hypothesis has equal utility for the participants, and so the participant should care only about the correctness of her answer. In many experiments, an initial set of questions elicits the participant's beliefs to check whether she has extracted the appropriate information from the cover story. Participants are then presented with evidence relevant to the hypotheses under consideration. Typically, in at least one condition of the experiment, the evidence is intended to make a subset of the hypotheses more likely than the remaining hypotheses. After, or sometimes even during, the presentation of the evidence, subjects are asked to identify the most likely hypothesis in light of the new evidence. This identification can take many forms, including binary or n-ary forced choice, free response (e.g., for situations with infinitely many hypotheses), or the elicitation of numerical ratings (for a close-to-continuous hypothesis space, such as causal strength, or to assess the participant's confidence in their judgment that a specific hypothesis is correct). Any change over time in the responses is taken to indicate learning in light of evidence, and those changes are exactly what the Bayesian model aims to capture.

These experiments must be carefully designed so that the experimenter controls the prior probability distribution, the likelihood functions, and the evidence. This level of control ensures that we can confirm the predictions of the Bayesian model by directly comparing the participants' belief changes (as measured by the various elicitation methods) with the mathematically computed posterior probability distribution predicted by the model. As is standard in experimental research, results are reported for a participant population (split over the experimental conditions) to control for any remaining individual variation. Since the model is supposed to provide an account of each participant in the population individually, experimental results must be compared to the predictions of an aggregate (or "population") of model predictions.

Here's the problem: in these experiments (at least the published ones...), there is a decent match between the distribution of choices made by the population, and the posterior distribution implied by plugging the experimenters' choices of prior distribution, likelihood, and data into Bayes's rule. This is however not what Bayesian decision theory predicts. After all, the optimal action should be a function of the posterior distribution (what a subject believes about the world) and the utility function (the subjects' preferences over various sorts of error or correctness). Having carefully ensured that the posterior distributions will be the same across the population, and having also (as Eberhardt and Danks say) made the utility function homogeneous across the population, Bayesian decision theory quite straightforwardly predicts that everyone should make the same choice, because the action with the highest (posterior) expected utility will be the same for everyone. Picking actions with frequencies proportional to the posterior probability is simply irrational by Bayesian lights ("incoherent"). It is all very well and good to say that each subject contains multitudes, but the experimenters have contrived it that each subject should contain the same multitude, and so should acclaim the same choice. Taking the distribution of choices across individuals to confirm the Bayesian model of a distribution within individuals then amounts to a fallacy of composition. It's as though the poet saw two of his three blackbirds fly east and one west, and concluded that each of them "was of three minds", two of said minds agreeing that it was best to go east.
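
An illustrative simulation, mine rather than Eberhardt and Danks's, of the contrast in the last paragraph: with a common posterior and common (right/wrong) utilities, expected-utility maximization sends every subject to the modal hypothesis, while probability matching reproduces the posterior across the population.

    posterior <- c(A = 0.6, B = 0.3, C = 0.1)   # assumed common to all subjects
    n.subjects <- 1000

    # What Bayesian decision theory predicts: everyone picks the same, modal, hypothesis.
    bayes.choices <- rep(names(which.max(posterior)), n.subjects)
    table(bayes.choices) / n.subjects           # all mass on "A"

    # What the experiments report: choice frequencies that mirror the posterior.
    matching.choices <- sample(names(posterior), n.subjects,
                               replace = TRUE, prob = posterior)
    table(matching.choices) / n.subjects        # roughly 0.6, 0.3, 0.1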

By hypothesis, then, the mind is going to great lengths to maintain and update a posterior distribution, but then doesn't use it in any sensible way. This hardly seems sensible, let alone rational or adaptive. Something has to give. One possibility, of course, is that this sort of cognition is not "Bayesian" in any strong or interesting sense, and this is certainly the view I'm most sympathetic to. But in fairness we should, as Eberhardt and Danks do, explore the branches of the escape tree for the Bayesians.

There are, of course, situations where the utility-maximizing strategy is randomized; but the conditions needed for that don't seem to hold for these sorts of experiments. The decision problem the experimentalists are trying to set up is one where the optimal decision is indeed a deterministic function of the posterior distribution. And even when a randomized strategy is optimal, it rarely just matches posterior probabilities. An alternative escape is to note that while the experimentalists try to make prior, likelihood, data and utility homogeneous across the subject population, they almost certainly don't succeed completely. One way this could be modeled is to actually include a random term in the decision model. This sort of technology has actually been fairly well developed by economists, who also try to match actual human behavior to (specious, over-precise) models of choice. Economists break this "curse of determinism" by adding a purely stochastic term to the utility being maximized, leading to a distribution of choices. Such random-utility models have not been applied to Bayesian cognition experiments, and, yet again, even granting that the individual-level noise terms could be adjusted just so as to get the distribution of individual choices to approximate the noise-free posterior distribution, why should they be?

Now, I do want to raise a possibility which goes beyond Eberhardt and Danks, which goes to the specificity of the distributional evidence. The dynamics of Bayesian updating is an example of the replicator dynamics from evolutionary theory, with hypotheses as replicators and fitness as likelihood. But not only is Bayes a very narrow special case of the replicator equations (no sources of variation analogous to mutation or sex; no interaction between replicators analogous to frequency dependence), lots of other adaptive processes approximately follow those equations as well. Evolutionary search processes (a la Holland et al.'s Induction) naturally do so, for instance, but so does mere reinforcement learning, as several authors have shown. At the level of changing probability distributions within an individual, all of these would look extremely similar to each other and to Bayesian updating. Even if Bayesian models find a way to link distributions within subjects to distributions across populations, supporting Bayesian models specifically would need evidence which differentially favored them over all other replicator-ish models. One way to provide such differential support would be to show that Bayesian models are not just rough matches to the data, but fit it in detail, and fit it better than non-Bayesian models could. Another kind of differential support would be showing that the Bayesian models account for other features of the data, beyond the dynamics of distributions, that their rivals do not. It's for the actual psychologists to say how much hope there is for any such approach; I will content myself by observing that it is very easy to tell an evolutionary-search or reinforcement-learning story that ends with the distribution of people's choices matching the global probability distribution**.

What is not secondary at all is the main point of this paper: Bayesian models of inference and decision do not predict that the population distribution of choices across individuals should mirror the posterior distribution of beliefs within each individual. The observed pattern is, rather, so far from the models' predictions as to refute the models. Perhaps, with a lot of technical work in redefining the decision problem and/or modeling experimental noise, the theories could be reconciled with the data. Unless that work is done, and done successfully, these theories are doomed as accounts of human cognition. Anyone who finds these issues interesting would do well to read the paper.

Disclaimer: Frederick is a friend, and David is on the faculty here, though in a different department. Neither of them is responsible for anything I'm saying here.

*: There are times when uninstructed people are quite good at using Bayes's rule: these are situations where they are presented with some population frequencies and need to come up with others. See Gerd Gigerenzer and Ulrich Hoffrage, "How to Improve Bayesian Reasoning without Instruction: Frequency Formats", Psychological Review 102 (1995): 684--704, and Leda Cosmides and John Tooby, "Are Humans Good Intuitive Statisticians After All? Rethinking Some Conclusions from the Literature on Judgement Under Uncertainty", Cognition 58 (1996): 1--73 [PDF]. In my supremely arrogant and unqualified opinion, this is one of those places where evolutionary psychology is not only completely appropriate, but where Cosmides and Tooby's specific ideas are also quite persuasive.

**: It is also very easy to tell an evolutionary-search story in which people have new ideas, while (as Andy and I discussed) it's impossible for a Bayesian agent to believe something it hasn't always already believed at least a little.

Bayes, Anti-Bayes; Minds, Brains, and Neurons; Enigmas of Chance; Kith and Kin

Posted by crshalizi at September 18, 2011 21:29 | permanent link

September 15, 2011

Improving Estimation by Nonlinear Least Squares (Introduction to Statistical Computing)

In which we see how to estimate both parameters of the West et al. model from lab, in the process learning about writing functions, decomposing problems into smaller steps, testing the solutions to the smaller steps, and minimization by gradient descent.

Assignment
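
For the curious, a bare-bones sketch of the kind of gradient descent the assignment asks for, under my own simplifying assumptions (a fixed step size and a numerical gradient); it is not the assignment's solution.

    # Central-difference approximation to the gradient of f at theta.
    numeric.grad <- function(f, theta, h = 1e-6) {
      sapply(seq_along(theta), function(i) {
        e <- rep(0, length(theta)); e[i] <- h
        (f(theta + e) - f(theta - e)) / (2 * h)
      })
    }
    # Take n.steps downhill steps of fixed size from a starting value.
    gradient.descent <- function(f, theta, step = 0.01, n.steps = 1000) {
      for (i in 1:n.steps) {
        theta <- theta - step * numeric.grad(f, theta)
      }
      theta
    }
    # e.g., with hypothetical vectors x and y from the lab's power-law model:
    # gradient.descent(function(p) mean((y - p[1] * x^p[2])^2), theta = c(1, 0.1))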

Introduction to Statistical Computing

Posted by crshalizi at September 15, 2011 11:06 | permanent link

Writing Multiple Functions (Introduction to Statistical Computing)

Lecture 5: Using multiple functions to solve multiple problems; to sub-divide awkward problems into more tractable ones; to re-use solutions to recurring problems. Value of consistent interfaces for functions working with the same object, or doing similar tasks. Examples: writing prediction and plotting functions for the model from the last lab. Advantages of splitting big problems into smaller ones with their own functions: understanding, modification, design, re-use of work. Trade-off between internal sub-functions and separate functions. Re-writing the plotting function to use the prediction function. Recursion. Example: re-writing the resource allocation code to be more modular and recursive. R for examples.

Introduction to Statistical Computing

Posted by crshalizi at September 15, 2011 11:05 | permanent link

Writing and Calling Functions (Introduction to Statistical Computing)

Lecture 4: Just as data structures tie related values together into objects, functions tie related commands together into objects. Declaring functions. Arguments (inputs) and return values (outputs). Named arguments, defaults, and calling functions. Interfaces: controlling what the function can see and do; first sketch of scoping rules. The importance of the interface. An example of writing and improving a function, for fitting the model from the last lab. R for examples.

Introduction to Statistical Computing

Posted by crshalizi at September 15, 2011 11:04 | permanent link

Lab: Flow Control and the Urban Economy (Introduction to Statistical Computing)

In which we use nonlinear least squares to fit the West et al. model.

Lab, R code; solutions.

Introduction to Statistical Computing

Posted by crshalizi at September 15, 2011 11:03 | permanent link

Tweaking Resource-Allocation-by-Tweaking (Introduction to Statistical Computing)

In which we make incremental improvements to our code for planning by incremental improvements.

Assignment, code.

Introduction to Statistical Computing

Posted by crshalizi at September 15, 2011 11:02 | permanent link

Flow Control, Looping, Vectorizing (Introduction to Statistical Computing)

Lecture 3: Conditioning the calculation on the data: if; what is truth?; Boolean operators again; switch. Iteration to repeat similar calculations: for and iterating over a vector; while and conditional iteration (reducing for to while); repeat and unconditional iteration, with break to exit loops (reducing while to repeat). Avoiding iteration with "vectorized" operations and functions: the advantages of the whole-object view; some examples and techniques: mathematical operators and functions, ifelse; generating arrays with repetitive structure.
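
A small example in the spirit of the lecture (my own, not from the notes): the same calculation written with an explicit loop and with the vectorized ifelse().

    x <- c(-2, -1, 0, 1, 2)

    # Loop version: condition checked one element at a time.
    signs <- numeric(length(x))
    for (i in seq_along(x)) {
      if (x[i] < 0) { signs[i] <- -1 } else { signs[i] <- 1 }
    }

    # Whole-object version: one vectorized call does the same work.
    signs.vec <- ifelse(x < 0, -1, 1)
    stopifnot(all(signs == signs.vec))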

Introduction to Statistical Computing

Posted by crshalizi at September 15, 2011 11:01 | permanent link

Lab: Basic Probability and Basic Data Structures (Introduction to Statistical Computing)

In which we play around with basic data structures and convince ourselves that the laws of probability are, in fact, right. (Or perhaps that R's random number generator is pretty good.)

Lab, solutions

Introduction to Statistical Computing

Posted by crshalizi at September 15, 2011 11:00 | permanent link

September 01, 2011

"Scalable Privacy-Preserving Record Linkage Using Similarity-based Indexing" (Next Week at the Statistics Seminar)

With great data comes great responsibility:

Peter Christen, "Scalable Privacy-Preserving Record Linkage Using Similarity-based Indexing"
Abstract: Privacy-preserving record linkage is concerned with the development of techniques that allow the scalable, automatic and accurate matching of individual records from disparate databases across organisations, such that no sensitive or confidential information needs to be revealed by the database owners, and the parties involved in the linkage only learn which records are classified as matches.
In this presentation I will provide some background on this topic, illustrate the challenges involved in privacy-preserving record linkage, and present novel scalable protocols for privacy-preserving record linkage that are based on the pre-calculation of similarity values by the database owners. Besides improved scalability compared to other approaches to privacy-preserving record linkage, the advantages of our protocols are that any similarity function (for strings, dates, ages, numbers, etc.) as well as any "blocking" function can be employed.
Place and time: Adamson Wing, Baker Hall, 4:30 to 5:30 pm on Thursday, 8 September 2011
As always, the talk is free and open to the public.

Enigmas of Chance

Posted by crshalizi at September 01, 2011 13:20 | permanent link

August 31, 2011

Books to Read While the Algae Grow in Your Fur, August 2011

Attention conservation notice: I have no taste.

Vladimir Vovk, Alex Gammerman and Glenn Shafer, Algorithmic Learning in a Random World
This is a badly-written book full of interesting results and ideas. The basic goal is simple: rather than making point forecasts, make predictions in the form of confidence sets, in such a way that the stated confidence level really does correspond to the actual probability of being right. An obvious approach would be to use Bayesian updating to form posterior-predictive sets, but those come with no guarantees of correct coverage, unless the prior is right, and indeed the Bayesian posterior probabilities can be arbitrarily bad (which is one reason why Bayesians need to test their models). Another tack would be to form a frequentist predictive distribution, but, while these exist, they're finicky and delicate.
The trick used in this book is wonderfully simple. Suppose data points are exchangeable (i.e., come from a "random world"), and we have a goodness-of-fit test which gives us a sensible (uniformly distributed) p-value. After observing a sequence of n data-points, consider all possible values for data-point n+1, and calculate their p-values. The ones which cannot be rejected at level a form the prediction set, with confidence level 1-a. All that is really needed for this to work is that we have some way of measuring the discrepancy or "conformity" of one data point with the others which gives uniformly-distributed ranks under the null hypothesis*. (This is why the authors call their scheme "conformal prediction"; it has nothing to do with conformal mappings in geometry, much less conformal field theory.) Actually calculating the prediction set in a reasonable way depends on the details of the conformity measure; they show that nearest-neighbor prediction, ridge regression, and some sorts of support vector machines are fairly easily handled. (A toy sketch of the basic recipe appears at the end of this entry.)
The basic idea can be elaborated into predicting distributions ("Venn predictors"), into conditional confidence levels, into rescuing Bayesian prediction intervals, and in some situations into handling dependent data. For the last, they consider a set-up they call "on-line compression modeling", which amounts to postulating what Lauritzen calls a "totally sufficient" statistic, i.e., one which not only is sufficient in the ordinary sense, but which can be updated recursively, and screens off past and future observations. (Actually, I think that all they really need is a predictive Markovian representation, which can be constructed in great generality; in continuous time and for non-stationary processes, even.)
The book is, as I said, badly written. Formally, it only requires knowledge of stochastic processes to the point of understanding exchangeability (and de Finetti's theorem), martingales and Markov processes (and there are appendices to refresh the reader on measure-theoretic probability), and of statistics as far as regression, goodness-of-fit testing and confidence intervals. In practice, readers will find acquaintance with standard machine learning ideas, as found in e.g. Hastie, Tibshirani and Friedman, essential. Even with this background, the brilliant clarity of the main ideas is obscured by a large mass of unnecessary detail, non-standard notation and terminology (e.g., refusing to consider sequences of observations in favor of multisets, a.k.a. "bags", indicated by extra symbols; or eschewing the idea of sufficiency in the chapters on "on-line compression modeling"), and some rather dubious philosophy. (The distinction between "inductive" and "transductive" learning is neither defensible** nor even fruitful, and I say this with very deep respect for Vladimir Naumovich.) The obvious connections to frequentist prediction intervals, and to Butler's predictive likelihood, go unexplored. This is all unfortunate, but until someone writes a cleaner and clearer account of the theory, I have little choice but to recommend this to anyone with a serious interest in machine learning or statistical prediction.
*: I am indebted to Larry Wasserman for pointing out the importance of uniform ranking, and for discussing his work on extending these results, which he really ought to publish.
**: Supposedly, "transduction" is reasoning directly from the properties of individual observed cases to those of individual unobserved cases, without first inducing a general rule, and then deducing specific instances from it. Clearly, any inductive procedure can be turned into a transductive one simply by composition of functions. Conversely, any transductive procedure can be turned into an inductive one, by considering hypothetical new unobserved cases so as to map out the general rule. This is thus a distinction without a difference in terms of capacities. At most there might be a difference in terms of algorithmic representations (and computational complexity), but that's not relevant to the probabilistic or statistical theory undertaken here.
Update, 1 September: Shiva Kaul writes me to remonstrate with me about transduction. I quote his letter (with permission):
I think transduction (in the modern sense of the word, perhaps not what Vovk et al discuss) is statistically distinct from induction. I'm not aware of any transductive sample complexity upper bounds that beat corresponding lower bounds for inductive sample complexity. However, transductive upper bounds often beat inductive ones, e.g., "Collaborative Filtering with the Trace Norm: Learning, Bounding, and Transducing".
The reduction you posted doesn't work for matrix completion. By considering a hypothetical new missing entry, one eliminates a present entry, which could change the predicted values for the other missing entries.
My superficial impression from the paper Shiva points me to is that it deals with a finite set of objects (entries in a matrix), and the difference between the "inductive" and "transductive" set-ups comes from the former sampling entries with replacement, which is kind of silly in this context, while the latter does not. But clearly I need to read and think more deeply before being entitled to an opinion. (This concludes this edition of Shalizi Smackdown Watch.)
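The toy sketch promised above, in R, with my own choice of conformity measure (distance from the mean); it is not code from the book, just the bare recipe: for each candidate value z, compute a p-value as the fraction of augmented points at least as nonconforming as z, and keep the candidates that cannot be rejected at level a.

    conformal.interval <- function(x, grid, alpha = 0.1) {
      p.value <- function(z) {
        aug <- c(x, z)
        scores <- abs(aug - mean(aug))        # nonconformity scores
        mean(scores >= scores[length(aug)])   # p-value for the candidate z
      }
      keep <- sapply(grid, p.value) > alpha
      range(grid[keep])                       # prediction set (contiguous for this score)
    }
    # e.g.: x <- rnorm(100); conformal.interval(x, grid = seq(-4, 4, by = 0.01))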
Tony Judt, Postwar: A History of Europe Since 1945
A massive, but utterly satisfying, total history of the European subcontinent since the close of the Second World War — which, of necessity, involves going back before the war for many things. Judt makes no secret of the fact that his sympathies lie with anti-Communist liberal social democracy. (He strives very hard to be fair, — his portrait of Thatcher, for instance, shows real respect, though no admiration — but I am clearly not best-placed to say if he succeeded.) Accordingly, to his mind the great and incredible accomplishment of western Europe is not just its recovery, but the construction, in the democratic welfare states, of one of the most free and most just forms of life humanity has yet known, intertwined with a new and uniquely peaceful form of international relations through the European Union. (He is very sound on the role the United States played in encouraging these developments, which we should be proud of.) That all these institutions were created with mixed motives, and are more or less flawed and corrupt, goes with their being human creations, and does not reduce their accomplishments. This story is contrasted, intelligently, with that of eastern Europe under Communist rule, ending with its remarkably peaceful dissolution, with due attention paid to Gorbachev's remarkable, if entirely un-intentional, achievements. (The one place where I find myself seriously questioning Judt's interpretations is his insistence that the Soviet economy could not be reformed without undermining Communist rule. Here he draws on local economists like János Kornai, and the argument even makes some sense, but how does it explain China and Vietnam?)
Judt does an outstanding and remarkable job of giving even coverage across space, across time, and across domestic and international politics, the economy, social life, popular and high culture, intellectual affairs, and connections and contrasts among all of these. (The only major area of endeavor he slights is the history of science and technology, for understandable reasons.) He moves seamlessly and illuminatingly from the economics of post-war reconstruction to criticism of films of the 1940s, and then to a [very characteristic] consideration of the content of collective memories of the war. Remarkably, he accomplishes all of this while not presuming that his readers already know the story. I recommend it most highly.
— Some of the passages here are recycled from essays collected in Reappraisals (or perhaps vice versa, considering how long he was working on this book).
Charlie Stross, The Fuller Memorandum
Mind-candy. Continuing Lovecraftian spy-fiction, with office comedy. These have never been quite as light-hearted as they first seem, but this one has some genuinely creepy and disturbing scenes and images. Enjoyable independently of previous books in the series, for certain values of "enjoyment".
(But it seems to me that Bob is unduly shaken in his atheism. [Since this all comes up in the first few pages, I don't count it as spoilers.] Yes, his universe has immensely powerful and ancient alien intelligences, some of whom take an interest in humanity. But that no more makes it a genuinely theistic universe than that of a Helicobacter living in a human gut. Ancient, powerful entities operating under weird-seeming rules of physics are not eternal, omnipotent supernatural beings. This is another expression of MacLeod's apophatic atheology.)
Margaret Maron, Storm Track; Slow Dollar; High Country Fall; Rituals of the Season; Winter's Child; Hard Row; Death's Half Acre
Why yes, I did basically spend a week in bed trying to distract myself from dental pain, how could you tell? These books go down like small, pleasant bits of candy, but like a lot of mystery stories they are also social fiction, the on-going theme here being the transformation of rural society in the South.
Benjamin I. Schwartz, The World of Thought in Ancient China
Fairly standard exposition of Chinese philosophy and some of its background through, roughly, the beginning of the Qin dynasty and the First Emperor, i.e., mostly the Hundred Schools of the Warring States period. I did not actually find it any more enlightening than, say, Fung Yu-lan's old book, let alone something like Graham's Disputers of the Tao. The main distinguishing features of Schwartz's book seem to be as follows. (1) Presuming the reader is already familiar with the broad outlines of the history, both political and intellectual. (2) Spending a lot of time disputing modern writers without bothering to fully expound their views (e.g., the argument with Fingarette in the chapter on Confucius, or with Needham in the chapter on cosmology*, both of which would have been impenetrable had I not read the other authors first), or even contrasting with more-or-less fashionable thinkers of the early 1980s (Geertz?!). The occasional stabs at, say, contrasting Confucius's ideas about ethics in public and private life with those of Plato and Aristotle are not sustained enough to really count as comparative history. Finally, (3) many very vague causal speculations, e.g., that the prevalence of ancestor worship made Chinese civilization more receptive to "universal monarchy" than other parts of the world. (I don't suppose that's impossible, but how on Earth could we tell?) In the end, I got a bit bored, and wouldn't really recommend this for non-specialists; try Disputers instead, or even Waley's vintage but engaging Three Ways of Thought in Ancient China. I am not, of course, qualified to say if it has any value for specialists in Chinese intellectual history.
Update: I am told, by someone who took Schwartz's classes at Harvard, that he was an inspiring teacher; I can well believe it. It's striking, and from my point of view a bit sad, how often great teaching fails to translate to the printed page, or for that matter vice versa.
*: To be clear, I think that Schwartz is right in his criticisms of Fingarette and Needham. The former's book on Confucius is a mere period piece from a now-abandoned phase of analytical philosophy; the latter engaged in a lot of speculation, wishful thinking, and sheer projection when writing about the "five elements" school. (This does not invalidate the scholarly value of Science and Civilisation in China.) But these hardly seem like the most important things to say about either school.
Megan Lindholm, Luck of the Wheels
Mind-candy fantasy novel; the fourth book in a series I haven't read, which I picked up because Lindholm's The Wizard of the Pigeons is a neglected classic of urban fantasy (from before that sub-genre got locked into its current formula), and I was curious about her other books. The first two-thirds or so of Luck of the Wheels is an amusing picaresque with some truly dreadful adolescents, followed by a blood-soaked revenge drama, finishing with what under the circumstances has to count as a happy ending, though from the viewpoint of the start of the novel it's an utter disaster. I am especially intrigued by the fact that every step in this transformation follows plausibly from the previous one. I will keep an eye peeled for the other books in this series.
— Incidentally, until looking up her website just now, I had no idea that Lindholm also writes lap-breaker fantasy epics as "Robin Hobb"; that answers my question about whatever happened to her...
Lois McMaster Bujold, Falling Free
Early and comparatively unpolished Bujold, which I had somehow never read before. It's not as masterful as her later works — in particular, the characters are not as richly developed. But even early, lesser Bujold is deeply entertaining. (The cover art of my old paperback copy is, as usual with this publisher, needlessly horrid; I am tempted to buy the NESFA Press edition simply to replace it with something bearable.)
Trey Shiels, The Dread Hammer
Mind-candy; fantasy full of the sort of no-good-can-come-of-this behavior you find in so many fairy tales, and for that matter epics. I will be reading the sequel. — "Shiels" is the open pen-name of Linda Nagata, who wrote some excellent hard science fiction novels in the 1990s and early 2000s, and then, well, went away for a while. This is not very much like her earlier books in theme or even style, but still good.
Patrick R. Laughlin, Group Problem Solving
A summary of research by experimental social psychologists on problem solving by groups of American college students, with special reference (not unreasonably!) to the contributions of one P. R. Laughlin and collaborators. These experiments are done on WEIRD subjects, and the problems are deliberately artificial, so there are the usual worries about generalizing to other contexts. (Is problem solving by, say, engineering designers really very much like cryptarithmetic?) But the experiments do nonetheless show some extremely interesting phenomena, and a general pattern of minimally-organized groups doing as well or better than the best individuals, under fairly careful controls. This book should really be taken more as an extended (158 pp.) review paper than a comprehensive treatise, and you have to brace yourself for a psychologist's idea of prose (and indeed a psychologist's idea of what constitutes a "theoretical model"; the online first chapter is representative in both respects), but it's a fast read, and full of useful information for anyone concerned with collective cognition. (The price for the hard-back edition is, however, outrageous.)
Duncan J. Watts, Everything Is Obvious, Once You Know the Answer
I'll actually try to give this a full write-up later, but in the meanwhile I will say: (1) this is great and recommended unreservedly; if you like this weblog at all you should definitely read it; (2) Tom Slee's review is very good; and (3) Duncan's been a friend since Santa Fe days, so feel free to discount my praise, but if I thought this was bad I'd just stay decently quiet about it.
Naomi Novik, Empire of Ivory
Mind-candy; enjoyable continuation of the series about dragons in the Napoleonic wars, in which Our Heroes venture to Africa, and the forces of European imperialism and the slave trade are righteously repelled. Of course, given the situation Novik has set up in her version of Africa, there is no way in Hell the trans-Atlantic slave trade could have begun in the first place; and no slave trade means astoundingly different European colonies in the Americas, if any at all, hence no French Revolution and no Napoleon. In short, the usual problem with alternate histories. (On examination, as so often, Timothy Burke said it first, and better.) But I will still get the sequel, because I want to know how she'll get her heroes out of the soup she lands them in at the end. — It's been long enough since I read the earlier installments that I found the catch-up parts welcome, and you could probably read this without the previous books, but I'd recommend starting the series at the beginning.
Karin Slaughter, Fallen
Absorbing, gruesome and wrenching as usual. I am not quite sure that it matches the past laid out in earlier books in the series, but this merely makes me want to go back and re-read them. (The coincidence of this book's title with one of Kathleen George's is I think due to the English language's sheer poverty of short, vaguely ominous phrases. But by this point, Slaughter could call a book Kittens and Flowers and it would fill me with apprehension.) — Previously.

Books to Read While the Algae Grow in Your Fur; Pleasures of Detection, Portraits of Crime; Scientifiction and Fantastica; Writing for Antiquity; Enigmas of Chance; Cthulhiana; The Collective Use and Evolution of Concepts; Minds, Brains, and Neurons; Complexity; Commit a Social Science; Kith and Kin; Philosophy; Networks

Posted by crshalizi at August 31, 2011 23:59 | permanent link

Rainfall and Data Structures (Introduction to Statistical Computing)

In which we practice working with data frames, and grapple with some of the subtleties of R's system of data types.

Assignment, due at the start of class, Wednesday, 7 September 2011

Introduction to Statistical Computing

Posted by crshalizi at August 31, 2011 10:31 | permanent link

More Data Structures (Introduction to Statistical Computing)

Matrices as a special type of array; functions for matrix arithmetic and algebra: multiplication, transpose, determinant, inversion, solving linear systems. Using names to make calculations clearer and safer: resource-allocation mini-example. Lists for combining multiple types of values; accessing sub-lists and individual elements; ways of adding and removing parts of lists. Lists as key-value pairs. Data frames: the data structure for classic tabular data, one column per variable, one row per unit; data frames as hybrids of matrices and lists. Structures of structures: using lists recursively to create complicated objects; example with eigen.
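
A few lines of my own, not from the slides, illustrating the structures listed above.

    m <- matrix(1:6, nrow = 2)                    # a 2x3 matrix
    m %*% t(m)                                    # matrix multiplication
    prices <- list(item = "book", cost = 12.5)    # a list as key-value pairs
    prices$cost                                   # access by name
    d <- data.frame(city = c("PGH", "NYC"), pop = c(0.31, 8.4))
    d[d$pop > 1, ]                                # one row per unit, one column per variable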

Slides

Introduction to Statistical Computing

Posted by crshalizi at August 31, 2011 10:30 | permanent link

August 29, 2011

Basic Data Types and Data Structures (Introduction to Statistical Computing)

Introduction to the course: statistical programming for autonomy, honesty, and clarity of thought. The functional programming idea: write code by building functions to transform input data into desired outputs. Basic data types: Booleans, integers, characters, floating-point numbers. Subtleties of floating point numbers. Operators as basic functions. Variables and names. An example with resource allocation. Related pieces of data are bundled into larger objects called data structures. Most basic data structures: vectors. Some vector manipulations. Functions of vectors. Naming of vectors. Continuing the resource-allocation example. Building more complicated data structures on top of vectors. Arrays as a first vector structure.
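
One standard illustration, not from the slides, of the floating-point subtlety mentioned above: test numeric equality with all.equal(), not with ==.

    0.1 + 0.2 == 0.3                     # FALSE, thanks to binary round-off
    isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE: equality up to a small tolerance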

Slides

Introduction to Statistical Computing

Posted by crshalizi at August 29, 2011 10:31 | permanent link

Introduction to Statistical Computing

At the intersection of Enigmas of Chance and Corrupting the Young.

Class homepage

Class announcement

    Lectures:
  1. Introduction to the class, basic data types, basic data structures
  2. More Data Structures: Matrices, Lists, Data Frames, Structures of Structures
  3. Flow Control, Looping, Vectorization
  4. Writing and Calling Functions
  5. Writing Multiple Functions
  6. Top-Down design
  7. The Scope of Names
  8. Debugging
  9. Testing
  10. Functions as Arguments
  11. Functions as Return Values
  12. Exam briefing
  13. Split, Apply, Combine: Using Base R
  14. Split, Apply, Combine: Using plyr
  15. Abstraction and Refactoring
  16. Simulation I: Random Variable Generation
  17. Exam debriefing
  18. Simulation II: Monte Carlo and Markov Chains
  19. Simulation III: Mixing and Markov Chain Monte Carlo
  20. Basic Character Manipulation
  21. Regular Expressions I
  22. Regular Expressions II
  23. Importing Data from Webpages I
  24. Importing Data from Webpages II
  25. Databases I
  26. Databases II
    Homework:
  1. Rainfall and Data Structures
  2. Tweaking Resource-Allocation-by-Tweaking
  3. Improving Estimation by Nonlinear Least Squares
  4. Standard Errors of the Cat Heart
  5. Rancorous Testing
  6. Outlier-Robust Linear Regression
  7. 'Tis the Season to Be Unemployed
  8. Sampling Accidents
  9. Get (the 400) Rich(est list) Quick
  10. Baseball Salaries
    Labs:
  1. Basic Probability and Basic Data Structures
  2. Flow Control and the Urban Economy
  3. Of Big- and Small- Hearted Cats
  4. Further Errors of the Cat Heart
  5. Testing Our Way to Outliers
  6. Likelihood
  7. Split-Apply-Combine
  8. Changing My Shape, I Feel Like an Accident
  9. Regular Expressions I
    Exams:
  1. Midterm
  2. Final Project Descriptions

My Work Here Is Done

Self-Evaluation and Lessons Learned

Posted by crshalizi at August 29, 2011 10:30 | permanent link

August 25, 2011

The Engineers of Human Sociality

There is a line in my review of Networks, Crowds, and Markets which I feel guilty about:

Nowadays, companies whose sole and explicit purpose is the formalization of social networks have hundreds of millions of active customers. (Although they are not often seen this way, these firms are massive exercises in centrally planned social engineering, inspired by sociological theories.)
The reason I feel guilty about this is that this is not my insight at all, but rather something I owed to a manuscript which Kieran Healy was kind enough to share with me some years ago. I have been bugging Kieran ever since to make it public, and he has now, for unrelated reasons, finally done so:
Kieran Healy, "The Performativity of Networks" [PDF]
Abstract: The "performativity thesis" is the claim that parts of contemporary economics and finance, when carried out into the world by professionals and popularizers, reformat and reorganize the phenomena they purport to describe, in ways that bring the world into line with theory. Practical technologies, calculative devices and portable algorithms give actors tools to implement particular models of action. I argue that social network analysis is performative in the same sense as the cases studied in this literature. Social network analysis and finance theory are similar in key aspects of their development and effects. For the case of economics, evidence for weaker versions of the performativity thesis in quite good, and the strong formulation is circumstantially supported. Network theory easily meets the evidential threshold for the weaker versions; I offer empirical examples that support the strong (or "Barnesian") formulation. Whether these parallels are a mark in favor of the thesis or a strike against it is an open question. I argue that the social network technologies and models now being "performed" build out systems of generalized reciprocity, connectivity, and commons-based production. This is in contrast both to an earlier network imagery that emphasized self-interest and entrepreneurial exploitation of structural opportunities, and to the model of action typically considered to be performed by economic technologies.
It is, I think, actually easier for social network theories to be performed (i.e., implemented on a distributed platform of East African Plains Apes) than for financial theories. The reason is that financial theories get performed, basically, when they produce Nash equilibria (not necessarily evolutionarily stable strategies), while the people running a social network service can just build their theories into the code. To give a simple example: If you think friends' friends are extra likely to be friends ("triadic closure"), you write your software to suggest friends accordingly. Even if triadic closure isn't really part of the pre-existing friendship network, it will now be hard to avoid in your formalized, recorded social network.
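
A toy version of how such a design choice gets baked into code, in R; the adjacency matrix and the ranking rule are invented for illustration, not a description of any actual service's recommendation algorithm.

    # Suggest new friends by counting friends already shared: a rule which,
    # once deployed, pushes the recorded network towards triadic closure.
    suggest.friends <- function(adj, person, how.many = 3) {
      # adj: symmetric 0/1 adjacency matrix of the recorded friendship network
      shared <- adj %*% adj[, person]    # common-friend counts with everyone
      candidates <- setdiff(which(adj[, person] == 0), person)  # non-friends, excluding yourself
      ranked <- candidates[order(shared[candidates], decreasing = TRUE)]
      head(ranked, how.many)
    }

    # Tiny example: friendships 1-2, 2-3, 3-4; persons 1 and 3 share friend 2,
    # so person 3 is suggested to person 1 ahead of person 4
    adj <- matrix(0, 4, 4)
    adj[cbind(1:3, 2:4)] <- 1
    adj <- adj + t(adj)
    suggest.friends(adj, person = 1)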

Kieran's paper is one of the most interesting things I've read about social networks in a long time. Even those of us who are interested in networks from the viewpoint of modeling complex systems should care about it, because it has implications for us: viz., we need to think about how much of what we're modeling reflects engineering decisions made in the South Bay or lower Manhattan, as opposed to (other) social processes. Go read.

Networks; Commit a Social Science; Kith and Kin

Posted by crshalizi at August 25, 2011 09:30 | permanent link

August 24, 2011

Unhappy the land that is in need of foresightful utility maximizers

Brad DeLong, contemplating the slide from bad things are happening because people do not act like economic models assume they do to people are irrational and deserve to be punished, remarks, sensibly enough, that "a system that for good outcomes requires that people act in ways people do not do is not a good system — and to blame the people rather than the system is to commit a major intellectual error." Somehow, this made me think of the following, which I offer with all due apologies to Brecht's memory:

Some economist decreed that the people
had lost the market's confidence
and could only regain it with redoubled effort.
If that is the case, would it not be simpler,
If the market simply dissolved the people
And purchased another?

(This is related, of course, to the way that hexapodia is the key insight into neutralizing the Dutch menace.)

Manual trackback: MetaFilter

The Dismal Science; Learned Folly

Posted by crshalizi at August 24, 2011 23:59 | permanent link

Course Announcement: 36-350, Statistical Computing

Since the semester begins on Monday, I might as well admit to myself that I am, in fact, teaching a new class:

36-350, Statistical Computing
Instructors: Cosma Shalizi and Vincent Vu
Description: Computational data analysis is an essential part of modern statistics. Competent statisticians must not just be able to run existing programs, but to understand the principles on which they work. They must also be able to read, modify and write code, so that they can assemble the computational tools needed to solve their data-analysis problems, rather than distorting problems to fit tools provided by others. This class is an introduction to programming, targeted at statistics majors with minimal programming knowledge, which will give them the skills to grasp how statistical software works, tweak it to suit their needs, recombine existing pieces of code, and when needed create their own programs.
Students will learn the core ideas of programming — functions, objects, data structures, flow control, input and output, debugging, logical design and abstraction — through writing code to assist in numerical and graphical statistical analyses. Students will in particular learn how to write maintainable code, and to test code for correctness. They will then learn how to set up stochastic simulations, how to parallelize data analyses, how to employ numerical optimization algorithms and diagnose their limitations, and how to work with and filter large data sets. Since code is also an important form of communication among scientists, students will learn how to comment and organize code.
The class will be taught in the R language.
Pre-requisites: This is an introduction to programming for statistics students. Prior exposure to statistical thinking, to data analysis, and to basic probability concepts is essential, as is some prior acquaintance with statistical software. Previous programming experience is not assumed, but familiarity with the computing system is. Formally, the pre-requisites are "Computing at Carnegie Mellon" (or consent of instructor), plus one of either 36-202 or 36-208, with 36-225 as either a pre-requisite (preferable) or co-requisite (if need be).
Further details, subject to change, at the class website. Teaching materials will definitely be posted there, and may be posted here.

(For tedious reasons, this class has the same number as the data-mining class I've taught previously; that course is now numbered 36-462, and will be taught in the spring by somebody else, while I'll be returning to 36-402, advanced data analysis.)

Corrupting the Young; Enigmas of Chance; Introduction to Statistical Computing

Posted by crshalizi at August 24, 2011 23:58 | permanent link

August 13, 2011

Idle Queries, August 2011 Edition

  1. Under current US law, could a credit card association (e.g., Visa) refuse to authorize donations to all political parties and organizations, just because they are political? Could they block donations to a particular political organization, e.g., the Democratic Senatorial Campaign Committee, because they do not like its politics?
  2. English has the (annoyingly un-parallel) words "chinoiserie" and "japonisme" to denote western art which tries to evoke that of China and Japan, or at least certain western ideas of that art. (We acquired these words in our usual way, obviously.) Are there analogous words for art which tries to evoke African art, or Indian? If not, what should they be?
  3. Which pairs of languages have the most mutual borrowing of words? Is there any word which has been borrowed from one language, transformed in the borrower, and then loaned back?
  4. How closely related do two species of cats have to be to respond to each other's territorial scent-markers? Is this always symmetric?
  5. Would it be wrong to put this print up in my office, across from the student chair?
  6. Would CSSR's performance improve if I made it worry that its life was meaningless?
  7. Is the presence of the Frankfurt School in increasingly-mainstreamed right-wing conspiracy theories simply revenge for The Authoritarian Personality, or is it also revenge for Prophets of Deceit: A Portrait of the American Agitator? (Which you should read; those people are now an influential faction of Congress. [See "increasingly-mainstreamed" above.])

I should perhaps add that there is no particular connection among these.

Update, later that day: John Kozak offers the example of French boeuf ("cow") -> English "beef" + "steak" -> French bifteck, and suggests that there are probably many more French -> English -> French loops.

Update, 16 August: The question about (what I learn from Scott Martens are properly called) re-borrowings seems to have struck a chord. Some suggestions from readers follow.

Ádám Tóth offers the chain froc (French) -> "frock" (English) -> frac (French); the mind boggles slightly at the idea of French borrowing an English word for clothing, but apparently so.

Continuing on the French-English-French loop, Matthieu Authier offers French pied de grue, "crane's foot", a drawing of which was apparently used to mark succession in family trees, hence English "pedigree", whence French pedigrée.

Shifting from going back and forth across the English Channel to going back and forth across the North Sea, Marius Nijhuis provides an interesting list for Dutch. I'll quote his e-mail (with permission) at some length:

In these cases the original word is still in use, mostly in roughly its original meaning, next to a reimported version with a clearly different meaning. They all involve French or English or both. German would be the obvious third language to look for loops. But German dialects and Dutch are historically too close, so words move back and forth too easily to get interesting changes in meaning.
  • Mannequin, currently used in Dutch for 'runway model'. From 'manneken', still the Flemish word for 'little guy'.
  • Boulevard, used in Dutch mostly for a road running parallel to a beach. The French word comes from 'bolwerk', Dutch for bastion.
  • Sketch, used in Dutch only for a short comedy act. The English word comes from 'schets', meaning sketch.
  • Drugs, used in Dutch only for narcotics. From English, through French 'drogue' meaning 'spices' in those days, from Dutch (or Old Dutch) 'droge waren', meaning 'dry goods' in shipping. Much earlier than 'drugs', 'drogue' had already returned as 'drogist', the Dutch word for 'drug store', from French 'droguiste', 'seller of spices'.
  • Etappe, nowadays mostly used in Dutch for a stage in a cycling contest. From French 'etape' in the Tour de France, through a chain of military uses from old French 'estaple', meaning trade depot, in turn derived from Old Dutch 'stapel'. And stapel also moved to English, leading to "staple foods", that returned to Dutch as 'stapelvoedsel'. In this case, 'stapel' has nearly disappeared from Dutch in its original meaning. Its current ordinary meaning is simply 'stack'.
  • Dock, written with a c, is in modern Dutch a device to put an ipod in. Dok is still the word for the maritime structure.
  • Cruise, used in Dutch for a luxury boat trip. From 'kruisen', tacking against the wind in a sailing boat. Cruisecontrol (one word) is the normal Dutch word for the speed control in cars. Cruising as done by aircraft got reattached to 'kruisen', eventually leading to 'kruisraket', cruise missile. That is a curious word Dutch might never have formed on its own, since it can also mean 'cross missile' or 'crotch missile'.

There are actually even more reader e-mails on this subject in the queue, but I don't have permission to quote from them yet.

Posted by crshalizi at August 13, 2011 11:00 | permanent link

August 04, 2011

Power Law as Poetic Justice

This graph,

a bad power law fit to the scientific output of A.-L. Barabási, may be the single most ironic image I have ever seen in a scientific communication. Sadly, however, its authors do not seem to appreciate the irony. (And why can't they leave Mark and Su Shi alone?)

Via Mason and Aaron.

Power Laws; Learned Folly

Posted by crshalizi at August 04, 2011 12:05 | permanent link

August 01, 2011

Extended Harmony, and Tiger Repellent Drumming

Attention conservation notice: Almost 1000 words of follow-up to a post on an inter-blog dispute, complete with graphs and leaden sarcasm.

Some follow-up to the last post, in response to e-mails, and discussion elsewhere.

  1. Non-capitalist societies can and do, of course, also have lots of inequality. So?

  2. I do not see why my aside about capitalism and state power should be the least bit controversial. Since my libertarian friends seem more comfortable with deduction from first principles than empirical observation, let's try it this way. Capitalism needs private property. Since there will be disputes about property, there must be an institution for settling those disputes effectively. That institution is the state. (Yes, I know about anarcho-capitalist ideas for private, market-driven courts and policing. These would last about a week: at best, they would re-sort themselves into territorial states, or more likely those doing the guard labor would ask themselves why what they guard should belong to someone else.) Capitalism thus presumes a state which decides who owns what, and is (almost) unquestioningly obeyed. Again: capitalism needs contracts, there will be disputes about contracts, there must be an institution which can settle those disputes. Thus capitalism presumes a state which can decide who owes what to whom, and force them to pay. Capitalism with intellectual property is even more demanding: copyright presumes a state exercising minute control over the content of all modes of communication; patents presume a state exercising minute control over processes of production and the use of knowledge. (And then we can get into what the state has to do by way of culture and education, i.e., reshaping the mentality of the population.) When such powers are used stupidly or capriciously, capitalism suffers, but when such powers are absent, capitalism does not exist.

    Now, continuing with the theme of harmonizing means and ends, if you want capitalism, but you find a state that powerful very scary (as you have every right to do), then you have a problem. You might, on reflection, favor some other economic system which does not require such a powerful state. (This is not a popular option, save among marginal advocates of rural poverty and idiocy.) You might, on reflection, decide that such power is perfectly A-OK, so long as it's used for ends you approve of and there's no danger of the people taking over. (Hence Hayek's anti-democratic political ideas, and viewing Pinochet's reign of terror as less damaging to [what he saw as] liberal values than the British National Health Service.) Or you might try to find ways of taming or domesticating state power, of civilizing it. (I think that has a pretty good track-record, but who knows how long we can keep it up?) What you cannot do, with any intellectual honesty or even hope of getting what you want, is pretend that capitalism can work without a powerful, competent and intrusive state. As Ernest Gellner once wrote, "Political control of economic life is not the consummation of world history, the fulfilment of destiny, or the imposition of righteousness; it is a painful necessity."

  3. I should have made clearer that the policies which have been sold over the last few decades as enhancing economic growth have little to show for themselves on that score. A crude but still illuminating way to see this is simply to plot GDP per person over time, adjusted for inflation (a rough R sketch for reproducing these figures appears after this list):
    Data from the St. Louis Federal Reserve Bank's FRED service: GDP from the GDPCA series, population from the POP series. The plot starts at 1952 because that's when the population series does.
    I've plotted this with a log scale on the vertical axis, so the slope indicates the (exponential) rate of growth. The slope does not change very much over time, though it seems to become less steep as time goes on. This visual impression is confirmed if we look at growth rates:
    Annual exponential growth rates from the previous figure: yearly values (dots) and an 11-year moving average (black line). (The correlation time of the growth rate series is about 3.5 years.)

    What these graphs bring to mind is the ancient joke where a new doctor is examining a patient in an insane asylum. The lunatic is compulsively banging on a drum, and the doctor asks why. "To repel the tigers, of course." Doctor: "Tigers? There are no tigers for thousands of miles!" Lunatic: "You see how well it works." If you want to say that the more deregulatory, capitalism-unleashed direction in which policy has moved since the 1980s (or even the late 1970s) is growth-enhancing, you can't point to an acceleration of growth, since there hasn't been any (pace Eugene Fama and John Taylor). Instead, you have to argue, counter-factually, that without those policy changes, growth would have slowed down (even more than it did); that pounding those drums drove away the tigers of stagnation. It's not completely illogical to try to explain a more-or-less constant growth rate with a variable policy regime, but you need some story about what other variables policy is compensating for, and indeed a story about why the compensation is so close.

    At a finer-grained level, you can look at performance over the business cycle, and again see that the new policy regime doesn't deliver any more aggregate growth. It certainly doesn't lead to faster productivity growth (again, despite claims to the contrary). But one thing which has changed is that aggregate growth does a lot less for most people than it used to. Again, you could tell counter-factual stories about how all of these would be much worse without those policies, but by this point you are claiming that your drumming repels not just tigers but also snow leopards, elephants, and Glyptodon.
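
The R sketch promised above, for reproducing figures like the two in point 3. It assumes the GDPCA (real GDP, annual) and POP (total population, monthly) series have been downloaded from FRED as CSV files with DATE and VALUE columns; the file names, the column names, and the averaging of the monthly population series into annual values are my assumptions for illustration, not a record of how the actual plots were made.

    gdp <- read.csv("GDPCA.csv")   # real GDP, annual, billions of chained dollars
    pop <- read.csv("POP.csv")     # total population, monthly, thousands
    gdp$year <- as.numeric(substr(gdp$DATE, 1, 4))
    pop$year <- as.numeric(substr(pop$DATE, 1, 4))
    pop.annual <- aggregate(VALUE ~ year, data = pop, FUN = mean)   # average over months
    d <- merge(gdp[, c("year", "VALUE")], pop.annual, by = "year")
    names(d) <- c("year", "gdp", "pop")
    d$gdp.per.capita <- (d$gdp * 1e9) / (d$pop * 1e3)   # dollars per person

    # GDP per capita on a log scale, so the slope shows the (exponential) growth rate
    plot(gdp.per.capita ~ year, data = d, log = "y", type = "l",
         ylab = "real GDP per capita (dollars)")

    # Annual growth rates (dots) with an 11-year moving average (line)
    growth <- diff(log(d$gdp.per.capita))
    plot(d$year[-1], growth, pch = 16, xlab = "year", ylab = "annual growth rate")
    lines(d$year[-1], filter(growth, rep(1/11, 11)), lwd = 2)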

Manual trackback: Agnostic Liberal

The Dismal Science

Posted by crshalizi at August 01, 2011 09:50 | permanent link

July 31, 2011

Books to Read While the Algae Grow in Your Fur, July 2011

Attention conservation notice: I have no taste.

Charles Sherrington, The Integrative Action of the Nervous System
This is from 1906; Ramón y Cajal had just established that the nervous system actually divided into discrete cells, i.e., neurons, and Sherrington himself had just named the synapse. (The part, in Lecture I, where Sherrington lays out the argument for localizing lots of the interesting properties of reflexes at synapses is a wonderful display of scientific reasoning.) One is staggered both by our ignorance of how the nervous system worked (nobody knew whether nerve impulses were electrical, chemical, mechanical, or something else) and by the sheer crudity of experimental methods.
Clearly, this work is only of historical interest now, but that interest is considerable, since the big, synoptic picture of the nervous system which it draws is still pretty much the one neuroscientists use. It goes (in somewhat modernized language) as follows. The nervous system exists in animals to control muscles, i.e., comparatively rapid motion. Some motions (like those of the heart, lungs, and bowels) are to be kept up continually in more or less steady rhythms; others are sporadic, adaptive responses to circumstances. The latter are triggered by sensory organs, and the nervous system provides both what we now call "pattern generators" for rhythms, and the links connecting sensory organs to effector organs, especially muscle fibers. The key to making all this work is that nerve cells transmit impulses to each other, which can, depending on the relations between the cells, either excite or inhibit further transmission on the part of the downstream cells; neurons are themselves excited by sensory cells, and can cause muscle cells to contract. The character of the interaction between neurons depends, somehow, on what goes on at the synapses between them, and it is asymmetry at the synapses which makes the propagation of nerve impulses go in only one direction. To produce useful, adaptive responses, each sensor generally must be able to control multiple effectors; conversely, each effector is generally under the partial control of multiple sensors. This many-to-many linkage means that the nervous system must be a network (a word Sherrington uses, e.g., towards the ends of Lectures IV and VI), where what we would now call functional and effective connectivity changes dynamically. The "integrative action" of his title consists of coordinating the effector responses to sensory stimuli, sometimes cooperatively, sometimes antagonistically (as when different reflexes would move the same body parts in different ways). Some of this can be handled in comparatively local and stereotyped ways, which is more or less what goes on in spinal-cord reflexes, but in an intact, healthy animal, the pre-eminent organ of integration is the brain.
The book seems to be out of print. There is a scan at archive.org, but I haven't looked at it to check its quality.
Charles Saunders, Imaro
Mind-candy. African-themed swords-and-sorcery fantasy, from someone who realizes that Africa is not all one thing.
Patrick O'Brian, H.M.S. Surprise
"Jack, you have debauched my sloth!" (It is characteristic that what sets up this adorable punch-line actually shows a great deal about the characters.)
Carlo Gaetan and Xavier Guyon, Spatial Statistics and Modeling [Prof. Gaetan's website for the book]
For readers who are reasonably comfortable with statistical theory and have some knowledge of stochastic processes. (Someone who had made it through Larry's All of Statistics, and perhaps the Markov chains chapters of Grimmett and Stirzaker, should be in good shape.) They consider three broad classes of spatial models: those defined by second-order moments (covariances or "variograms"), Gibbs-Markov random fields, and point processes. (Spatio-temporal processes are handled mostly by occasional asides about adding an extra coordinate for time, though the Gibbs-Markov chapter gives a little more attention to the fact that time is special.) Chapter four covers simulation methods, including various forms of Monte Carlo. The very long (~100 pp.) fifth chapter actually lays out statistical methods, more or less divided up according to the model type, and giving welcome attention to nonparametric estimates and heuristic checks on models. These are complemented by extensive appendices which state, but do not prove, the necessary ergodic theorems and central limit theorems for random fields, and general results about minimum contrast/quasi-likelihood estimation. The problems at the end of each chapter are a reasonable mixture of theory, calculation, computational mini-projects and data analysis.
Overall, the book is decent but unspectacular. There are a few places where the text is actively unclear (e.g., the definition of a Markov random field on pp. 55--56), and others where more explanation might have been useful (e.g., why the Propp-Wilson algorithm [p. 136] works). Against this, it is up to date, and over-all accurate and has sensible priorities. I am not altogether sure about recommending it for self-study, but would be fine with assigning it for a class.
Jay Lake, Green
The best new fantasy novel I have read these last six months or more. The world is vivid and complicated and mysterious, Green is a compelling character (and her changes in viewpoint are startling, natural, and sometimes heart-aching), the action thrilling and momentous and self-contained, and Lake actually appreciates economy of story-telling. (I have read whole series which just get up to the point where Green leaves Copper Downs.) Lake is now on my "buy on sight" list, despite his apparently having committed some steampunk novels.
My one complaint, and I realize this is in some ways petty, is the cover art: it's attractive, skillfully drawn and actually reflects a lot of important elements from the book, but makes Green oddly melanin-deficient for someone supposed to be from somewhere analogous to the Indian subcontinent and who repeatedly describes her skin as "brown".
Jon "Lost in Transcription" Wilkins, Transistor Rodeo
I see why Jon likes Agha Shahid Ali. There are some samples at his website, but perhaps I may quote one more:
Love Song

What can I offer you, lady?
A fig, perhaps? You are April
and morning and I would line

every street with blueberries
who would tip their tiny crowns
whenever you appeared,
border your life with trumpets
until your shadow was famous,

but I would still be filthy,
and you so starry and upturned,
so yes, perhaps a fig.

Disclaimer: Jon is a friend, but I have no stake in the success of this book, and paid for my own copy.
Carol O'Connell, Mallory's Oracle
Ariana Franklin, Mistress of the Art of Death
Mystery-flavored mind candy. I cannot decide which heroine --- the 12th century forensic pathologist or the late-20th-century 1337 haxxor policewoman --- is more implausible, but they both worked as the stories carried me along.
Albert Sánchez Piñol, Cold Skin
Tolerable in its own right, but I feel let down, because the extravagant praise on the cover led me to expect something more than a story worthy of a Star Trek episode and occasional asides about loneliness, hatred and fear which came across as more over-wrought than profound. (Perhaps they were better in Catalan?)
The story, and why I quite deliberately compare it to a Star Trek episode: A nameless man, disillusioned with Europe after the Great War (evidently; the date is vague) takes a year-long post as the weather monitor on an isolated and improbably warm Antarctic island. The only other person on the island is a strange man supposedly keeping the lighthouse (which is useless since no ships come). As soon as the ship sails away, the protagonist is attacked by scaly bipedal vertebrates which come out of the sea. The light-house keeper has been fending them off for at least a year, and keeping one as a pet/sex-toy. The protagonist is let into the fortified lighthouse to help with its defense. After a few months of this, he has sex with the lighthouse keeper's pet, which is really really good, because she's, like, all natural and uninhibited and doesn't have any, y'know, civilized hang-ups about it.* Then he realizes the creatures are actually intelligent beings, and that maybe they should try negotiating, what with being vastly out-numbered and having only a finite supply of bullets and all. (Kirk at least put recognition of sapience before getting it on with the green-skinned babes.) Some unconvincing sentimental scenes follow, but I spoil nothing by saying that the humans and the amphibians fail to find a satisfactory basis for mutual co-existence.
Over-all: short, some entertainment value, not scary, neither beautiful nor sublime. Worth reading if expectations are suitably low.
*: I mock, but "sexual liberation of exiled civilized person via uncorrupted native" is, if not older than dirt, then at least as old as Enlightenment-era fantasies about Tahiti and the South Seas. (And of course, as here, it's almost always a civilized male and a barbarous female.) If an artist is going to use a theme that worn-out and familiar, they need to either use it really well, or do something new with it, e.g., and off the top of my head, present it as the self-delusion of a rapist. (I offer that suggestion at random; Sánchez Piñol, so far as I can tell, wants us to take the trope at face value.)
Diana Rowland, My Life as a White Trash Zombie
Mind-candy: in which becoming one of the walking dead, dependent on a steady supply of fresh human brains to keep from rotting, turns a young woman's fortunes around, which gives you some idea of her life before. (In other words, it is everything the cover promises.) I found it quite hilarious, but might not have enjoyed it nearly so much in a different mood.
Kathleen George, Taken, Fallen and The Odds
Really excellent series of mystery novels, set in Pittsburgh but definitely worth reading even without the local connection. (Afterimage, which I read about a year ago, is the third book in the series, between Fallen and The Odds, but not too much is lost from reading out of order.) In most cases the reader finds it pretty plain, or even explicit, whodunnit very early on; the pleasure here is the character studies, as well as watching the detectives (and others) piece things together, and the criminals deal with their crimes. Remarkably enough, the books keep getting better.

Books to Read While the Algae Grow in Your Fur; Pleasures of Detection, Portraits of Crime; Scientifiction and Fantastica; The Commonwealth of Letters; Enigmas of Chance; Minds, Brains, and Neurons

Posted by crshalizi at July 31, 2011 23:59 | permanent link

July 25, 2011

Harmony of Means and Ends

Attention conservation notice: 1900 dry, pedantic, abstract words about "theory of politics", and why it might matter to bringing about progressive political changes, from someone who is completely ineffective at actual politics, and not notably engaged in it either. None of it is at all original, and much of it is painfully obvious. Plus, it was provoked by a squabble among bloggers, and reading anything responding to a literary controversy flame-war is usually a waste of time.

Some posts by Henry Farrell on "left neoliberalism" and "theory of politics" (1, 2, 3) provoked quite a huge response, which I will not try to catalog. (I will however point to a useful older post by Ben Alpers on the term "neoliberalism", and to Timothy Burke's reaction.) I tried to explain, in the comments at Unfogged, what I thought Henry was trying to say, and for want of other material I'll repost that here, with a few modifications. By way of disclaimer, Henry is a friend and collaborator, and we've spent a lot of time talking about related issues, but what follows is in no way endorsed by him.

With that out of the way: What Henry means when he talks about "a theory of politics" is a theory about how political change (or stasis) happens, not about what political ends are desirable, or just, or legitimate, which is much of what I take "political theory" to be. "What are the processes and mechanisms by which political change happens?" is, at least in part, a separate question from "What would a good polity look like?", and Henry is talking about the former, not the latter. Of course the answer to the first question will tend to be context-dependent, so specialize it to "in contemporary representative democracies", or even "America today" if you like.

The first importance of such a theory is instrumental: if you want to have policies that look like X, a good theory of politics would help you figure out how to achieve X-shaped policies. But the second importance is that the theory might change your evaluation of policies, because it would change your understanding of their effects. The U.S. tax deduction for mortgage interest is arguably economically inefficient, since it promotes buying housing over renting, for no very clear economic rationale. But in so doing it (along with massive government intervention in forming and sustaining the mortgage market, building roads, using zoning to limit the construction of rental property, etc.) helps create a large group of people who are, or think of themselves as, property owners, possessors of substantial capital assets and so with a stake in the system*. If the deduction were, for instance, means-tested, it would not be nearly so effective politically.

Or again, if, for instance, you like material prosperity, you might favor policy X because (you think) it promotes economic efficiency. (Some other time, we can and should have the conversation about "economic efficiency", and the difference between "allocating scarce resources to their most valuable uses" and "allocating resources to meet effective demand", i.e., about the injustice inherent in the market's social welfare function.) But if you are also egalitarian, and policy X would make it easier for a small group of already-privileged people to wield political influence, then you might decide that policy X is not, after all, worth it, because of its inegalitarian political effects. (At a guess, some, but not all**, of Brad DeLong's reaction to Henry's posts is explained by letting X = "Clinton-era financial deregulation".) If you value a certain kind of distribution of political power as such (democracy, aristocracy, the vanguard party, rule by philosopher kings central bankers, etc.), a theory of politics would be an important part of how you gauge the value of different policies, at least ones which you think would tend to change how much power different individuals, or groups of individuals, would have.

If you are more or less egalitarian about economic resources and political power, then you will want to see policies that not only contribute to material prosperity, and to distributing that prosperity, but also to making it easier and more feasible for those who are poorer and of lower social status to make their interests felt politically. (Rich, high-status people typically have little trouble on that score. Also, this presumes that interests are not completely homogeneous, but that's OK, because they're not.) Sometimes these goals will reinforce each other, sometimes they will conflict and one will need to make trade-offs. It is hard to make an intelligent trade-off, however, if you do not have any tools for recognizing they exist, or assessing what they are; this, again, is why Henry thinks achieving progressive goals needs a theory of politics.

Now, if I tried to back out a theory of politics from the practice of left neo-liberals, it would something like this: what matters most to the interest of voters is the over-all growth of the economy; as it grows, they will become more prosperous, and reward the political party which implemented those policies. They will also be willing to support unobtrusive welfare-state measures, especially if they look like they are run efficiently and go to the truly deserving, because prosperous people feel generous. So the most important thing is "the economy, stupid", and making sure the voters know who is responsible for good economic times.

I do not want to discount this completely, but, even if they're right about which policies will promote economic growth, it seems oddly naive about how any sort of representative democracy, yoked to capitalism, is going to work. We do indeed have lots of common interests (to give some innocuous ones: not being turned into smears of radioactive glass, not living amid pandemic or endemic communicable illness, having prosperous neighbors, etc.), but we also have diverging interests. Groups or classes of people often have systematically diverging interests. This is because whenever two or more parties have a positive-sum collaborative interaction, there is inevitably a zero-sum struggle over dividing the gains from cooperation. (Voluntary market exchange may be welfare-enhancing for everyone, but whenever you buy something and would still have done so for a penny more, your consumer surplus is the seller's failure of price discrimination.) In this struggle, as in all bargaining games, there is a natural advantage to the side which is already better off. Beyond and beside interests, there are of course also values, which may be unselfish but also diverge.

Capitalism seems to inevitably produce a small number of people who are extremely rich and command considerable economic power; this gives them very distinctive interests. (Often they will also identify themselves with their business enterprises, and their interests as on-going and growing bureaucracies.) Being human, many of them will try using that power to advance those interests and further enrich themselves, by dominating others and by bending the government to their will. (Capitalism needs a very high degree of internal peace and automatic obedience to uniform legal authority — when the courts decide whom disputed property belongs to, or what contracts require, it must stick — to say nothing of physical infrastructure and human resources, and so it always presumes a very powerful state.) They have the resources, and the incentives, to exert influence and to keep doing so. Rich and powerful people can be wrong about the effects of their actions, but when they are not, one should expect a positive feedback, with economic power being used to enhance political power, which in turn is exercised to enhance economic power.

Against this, there are the vast majority of ordinary people, who have their varying interests, but also pretty uniformly have interests which oppose those of the rich and powerful. (Again, they also have interests in common with the rich and powerful.) They are on the receiving, losing end of the feedback between wealth and political influence. Since they have fewer resources than the rich and powerful, it is simply harder for them to get the government to listen, or even to keep track of what it is doing that might affect them. If we want a society which is even close to equal politically and economically --- if we do not want the majestic equality of the law which forbids the rich and poor equally from stealing bread and sleeping under bridges --- then effective counter-vailing power must be organized, which means institutions for collective action. Of course, on the usual Logic of Collective Action grounds, this will be harder for large groups of people with few resources than for small, already advantaged classes...

I would also add --- and this is something Henry and I have been thinking about a lot --- that it is often not at all trivial to figure out what your interests are, or how to achieve them, and that (small-d) democrats should try to find ways to help people work that out. Actually having political clout is often going to depend on collective action, but this needs to be complemented by collective cognition, which is how people figure out what to want and how to achieve it. That, however, is part of a much larger and rather different story, for another time.

All of this can be boiled down to something much shorter (and perhaps should have been at the start): "When you tell us that (1) the important thing is to maximize economic growth, and never mind the distributional consequences because (2) we can always redistribute through progressive taxation and welfare payments, you are assuming a miracle in step 2." For where is the political power to enact that taxation and redistribution, and keep it going, going to come from? A sense of noblesse oblige is too much to hope for (especially given how many of our rich people have taken lots of economics courses), and, for better or worse, voluntary concessions will no longer come from fear of revolution***.

There are I think two reasonable defenses left neoliberals could make. One is to say that creating or strengthening any forms of countervailing power under modern American conditions would itself take a miracle. That goal would be futile and idle, but we could increase economic growth, which would at least benefit some people. The other would be to deny that anyone has a reliable theory of politics, in this sense, certainly none which could be used as a guide to action, and no hope of developing one; whereas we do know a bit about economics. I find neither of these convincing, but I've gone on long enough already. Have a cartoon:

Manual trackback: Crooked Timber; The Browser; Uncertain Principles; Three Quarks Daily; Crooked Timber (again); Jacobin

Update: See next post for some follow-up.

*: I owe this argument to my father, lo these many years ago.

**: I esteem DeLong's writings very highly, have learned much from them, and think he is on balance very much a force for good, but there are times when I simply cannot understand how his mind works, and do not particularly want to.

***: It would be fascinating to know to what extent the development and decline of the welfare state tracks, not fears of Communism as such, but fears of other people finding Communism attractive. (By 1980, the USSR was a powerful state, but also an obviously unappealing model.) I have no idea how to study this.

The Progressive Forces; Kith and Kin

Posted by crshalizi at July 25, 2011 13:20 | permanent link

July 13, 2011

"Q: Is Control controlled by its need to control? A: Yes."

Attention conservation notice: 1800+ words on yet more academic controversy over networks. (Why should those of us in causal inference have all the fun?) Contains equations, a plug for the work of a friend, and an unsatisfying, "it's more complicated than that" conclusion. Wouldn't you really rather listen to William Burroughs reading "Ah Pook Is Here"?

The following paper appeared a few months ago:

Yang-Yu Liu, Jean-Jacques Slotine and Albert-László Barabási, "Controllability of complex networks", Nature 473 (2011): 167--173 [PDF reprint courtesy of Prof. Barabási; the supplemental files, which contain all the actual math, are available from his publications page]
Abstract: The ultimate proof of our understanding of natural or technological systems is reflected in our ability to control them. Although control theory offers mathematical tools for steering engineered and natural systems towards a desired state, a framework to control complex self-organized systems is lacking. Here we develop analytical tools to study the controllability of an arbitrary complex directed network, identifying the set of driver nodes with time-dependent control that can guide the system's entire dynamics. We apply these tools to several real networks, finding that the number of driver nodes is determined mainly by the network's degree distribution. We show that sparse inhomogeneous networks, which emerge in many real complex systems, are the most difficult to control, but that dense and homogeneous networks can be controlled using a few driver nodes. Counterintuitively, we find that in both model and real systems the driver nodes tend to avoid the high-degree nodes.

(I will hold my tongue over the philosophy of science in the first sentence of the abstract.)

Liu et al. looked specifically at systems of linear differential equations, with one (scalar) variable per node, and some number of outside control signals. Numbering the nodes/variables from 1 to N, the equation for the ith node is

\[ 
\frac{dx_i(t)}{dt} = \sum_{k=1}^{N}{a_{ik}x_k(t)} + \sum_{j=1}^{P}{b_{ij}u_j(t)} 
 \]
Here the x variables are the internal variables of the system, and the u variables are the control signals. The coefficients a_{ik} encode the connections between nodes of the assemblage; non-zero coefficients indicate links in the network. The b_{ij} coefficients represent coupling of the network nodes to the input signals. If you can't, or won't, read the equation, an adequate English translation is "the change in the state of each node depends on the state of its neighbors in the network, and the outside inputs the node receives".

Following the engineers, we say that the system is controllable if it can be moved from any state vector x to any other state vector x', in a finite time, by applying the proper input signal u(t). (This abstracts from questions about deciding what state to put it in, or for that matter about how we know what state it starts in ["observability"].) Liu et al. asked how the graph --- the pattern of non-zero links between nodes --- affects controllability. It's easy to see that it has to matter some: to give a trivial example, imagine that the nodes form a simple feed-forward chain, x_1 -> x_2 -> ... -> x_{N-1} -> x_N, only the last of which gets input. This system cannot then be controlled, because there is no way for an input at the last node to alter the state at any earlier one. Liu et al. went through a very ingenious graph-theoretic argument to try to calculate how many distinct inputs such linear networks need, in order to be controlled.
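
For concreteness: the standard algebraic test here is the Kalman rank condition, which is textbook control theory rather than anything specific to this paper: the linear system dx/dt = Ax + Bu is controllable exactly when the matrix [B, AB, A^2 B, ..., A^(N-1) B] has rank N. A quick check in R on a five-node version of the feed-forward chain; the example size and coefficients are my own.

    # Kalman rank test: controllable iff [B, AB, ..., A^(N-1) B] has rank N
    controllability.rank <- function(A, B) {
      N <- nrow(A)
      C <- B
      power.B <- B
      for (k in seq_len(N - 1)) {
        power.B <- A %*% power.B
        C <- cbind(C, power.B)
      }
      qr(C)$rank
    }

    # Five-node chain x_1 -> x_2 -> ... -> x_5
    N <- 5
    A <- matrix(0, N, N)
    A[cbind(2:N, 1:(N - 1))] <- 1       # node i drives node i+1
    B.last <- matrix(0, N, 1); B.last[N, 1] <- 1
    controllability.rank(A, B.last)     # 1, far short of 5: not controllable
    B.first <- matrix(0, N, 1); B.first[1, 1] <- 1
    controllability.rank(A, B.first)    # 5: driving the head of the chain controls everything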

Their conclusions are telegraphed in their abstract, which however does not play up one of their claims very much: namely, the minimum number of inputs needed is usually, they say, very large, a substantial fraction of the number of nodes in the network. This is, needless to say, bad news for anyone who actually has a dynamical system on a complex network which they want to control.

Before we start making too much of this (I can already imagine the mangled David Brooks rendition, if it hasn't appeared already), it's worth pointing out a slight problem: the Liu et al. result is irrelevant to any real-world network.

Noah J. Cowan, Erick J. Chastain, Daril A. Vilhena, James S. Freudenberg, Carl T. Bergstrom, "Controllability of Real Networks", arxiv:1106.2573
Abstract: Liu et al. have forged new links between control theory and network dynamics by focusing on the structural controllability of networks (Liu et al., Nature:473(7346), 167-173, 2011). Two main results in the paper are that (1) the number of driver nodes, N_D, necessary to control a network is determined by the network's degree distribution and (2) N_D tends to represent a substantial fraction of the nodes in inhomogeneous networks such as the real-world examples considered therein. These conclusions hinge on a critical modeling assumption: the dynamical system at each node in the network is degenerate in the sense that it has an infinite time constant, implying that its value neither grows nor decays absent influence from inbound connections. However, the real networks considered in the paper---including food webs, power grids, electronic circuits, metabolic networks, and neuronal networks---manifest dynamics at each node that have finite time constants. Here we apply Liu et al.'s theoretical framework in the context of nondegenerate nodal dynamics and show that a single time-dependent input is all that is ever needed for structural controllability, irrespective of network topology. Thus for many if not all naturally occurring network systems, structural controllability does not depend on degree distribution and can always be conferred with a single independent control input.
(Disclaimer: Carl is a friend, and I've often plugged his work over the years [1, 2, 3, 4]. However, the opinions in this post are mine, not his.)

Look at the equation for the Liu et al. model: x, the state of the node in question, does not appear on the right-hand side. This means that, in their model, nodes have no internal dynamics --- they change only due to outside forces, otherwise they stay put wherever they happen to be. A more typical linear model, which does allow for internal dynamics, would be

\[ 
\frac{dx_i(t)}{dt} = -p_i x_i(t) + \sum_{k=1}^{N}{a_{ik}x_k(t)} + \sum_{j=1}^{P}{b_{ij}u_j(t)} 
 \]
In words, "the change in the state of each node depends on its present state, the state of its neighbors in the network, and the outside inputs the node receives". This is of course a far more typical situation than the current state of a node being irrelevant to how it will change.

This seems like a very small change, but it has profound consequences for these matters. As Cowan et al. say, one can actually bring this case within the mathematical framework of Liu et al. by treating the internal dynamics of each node as a loop from the node to itself. Doing so has the immediate consequence (Proposition 1 in Cowan et al.) that any directed network could be controlled with only one input signal. To give a very rough analogy, in the Liu et al. model, a node moves only as long as it is being actively pushed on; as soon as the outside force is released, it stops. In the more general situation, nodes can and will move even without outside forcing --- since it's a linear model, the natural motions are combinations of sinusoidal oscillations and exponential return to equilibrium --- and this actually makes it easier to drive the system to a desired state. It is a little surprising that this always reduces the number of input signals needed to 1, but that does indeed follow very directly from Liu et al.'s theorems.
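
Reusing the rank check from the sketch above, here is a toy contrast (the star network and the decay rates are my own invention, not an example from either paper): a hub driving three leaves cannot be controlled through the hub alone when the nodes have no internal dynamics, but give each node its own decay rate and a single input at the hub suffices.

    # Hub (node 1) drives three leaves with distinct weights
    A.star <- matrix(0, 4, 4)
    A.star[2:4, 1] <- c(0.7, 1.3, 2.1)
    B.hub <- matrix(c(1, 0, 0, 0), 4, 1)

    controllability.rank(A.star, B.hub)            # 2: not controllable (the degenerate, no-internal-dynamics case)
    p <- c(0.5, 1.0, 1.5, 2.0)                     # distinct internal time constants
    controllability.rank(A.star - diag(p), B.hub)  # 4: one input now suffices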

Now, constant readers may have been wondering about why I've not said anything about the linearity assumption. Despite appearances, I actually have nothing against linear models --- some of my best friends use nothing but linear models --- and it seemed perfectly reasonable to me that Liu et al. would work with a linear set-up, at least as a local approximation to the real nonlinear dynamics. Unfortunately, that turns out to be a really bad way to approximate this sort of qualitative property:

Wen-Xu Wang, Ying-Cheng Lai, Jie Ren, Baowen Li, Celso Grebogi, "Controllability of Complex Networks with Nonlinear Dynamics", arxiv:1107.2177
Abstract: The controllability of large linear network systems has been addressed recently [Liu et al. Nature (London), 473, 167 (2011)]. We investigate the controllability of complex-network systems with nonlinear dynamics by introducing and exploiting the concept of "local effective network" (LEN). We find that the minimum number of driver nodes to achieve full control of the system is determined by the structural properties of the LENs. Strikingly, nonlinear dynamics can significantly enhance the network controllability as compared with linear dynamics. Interestingly, for one-dimensional nonlinear nodal dynamics, any bidirectional network system can be fully controlled by a single driver node, regardless of the network topology. Our results imply that real-world networks may be more controllable than predicted for linear network systems, due to the ubiquity of nonlinear dynamics in nature.
The local effective network works just like you'd imagine: basically, take the nonlinear dynamics, linearize them by a Taylor series around your favorite operating point, and treat the non-zero-to-first-order couplings as edges. After that point, everything goes very much as in the Cowan et al. paper, which, oddly, is not cited. (Wang et al.'s language, drawn from nonlinear dynamics, is much more homely and familiar to me than is Cowan et al.'s, which comes from control systems, but under the words the stories are the same.)
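
A bare-bones numerical version of the "local effective network" idea, as I read it (the two-node nonlinear system is an invented toy, and the Jacobian is taken by central differences):

    # Linearize a nonlinear flow at an operating point; non-negligible Jacobian
    # entries become the edges of the local effective network
    flow <- function(x) c(-x[1] + tanh(x[2]),
                          x[1] * x[2] - x[2]^3)

    local.effective.network <- function(flow, x0, h = 1e-6, tol = 1e-8) {
      n <- length(x0)
      J <- matrix(0, n, n)
      for (j in 1:n) {
        dx <- rep(0, n); dx[j] <- h
        J[, j] <- (flow(x0 + dx) - flow(x0 - dx)) / (2 * h)   # central difference
      }
      list(jacobian = J, edges = which(abs(J) > tol, arr.ind = TRUE))
    }

    local.effective.network(flow, x0 = c(0.5, 0.5))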

As Cowan et al. go on to observe, being controllable is an entirely qualitative property --- it says "there exists a control signal", not "there exists a control signal you could ever hope to apply". There are several ways of quantifying how hard it is to control a technically-controllable system, and this seems unavoidably to depend on much more information than just that provided by the network's degree distribution, or even the full graph of the network. This would be particularly true of nonlinear systems, which of course are most of the interesting ones.

So, to sum up, there were two very striking and interesting claims in the Liu et al. paper: (i) that the degree distribution alone of a network gives us deep insight into a specific aspect of its dynamics, and (ii) this shows that most complex networks are very hard to control. What both the follow-up papers show is that (ii) is wrong, that with this sense of "control", you can, generically, control an arbitrarily complex network by manipulating just a single input signal. But this, together with the recognition that we need to get beyond this very qualitative notion of control, also undermines (i). That to me is rather disappointing. It would have been great if we could have inferred so much from just the degree distribution. (It would have given us a good reason to care about the degree distribution!) Instead we're back to the messy situation where ignoring the network leads us into error, but merely knowing the network doesn't tell us enough to be useful, and non-network details matter. Back, I suppose, to the science.

Aside I may regret later: Barabási really does not have a great track record when it comes to Nature cover-stories, does he? But, if past trends hold good, neither the Cowan et al. nor the Wang et al. paper has any chance of appearing in that journal.

Manual trackback: Resilience Science

Update, 29 July 2011: I should have been clearer above that the paper by Wang et al. is not written as a comment on the original Nature paper, unlike that by Cowan et al.

Update, 30 August 2011: I haven't had a chance to read it, but I thought it only right to note the appearance of "Comment on 'Controllability of Complex Networks with Nonlinear Dynamics'," by Jie Sun, Sean P. Cornelius, William L. Kath, and Adilson E. Motter (arxiv:1108.5739).

Networks; Complexity; Mathematics

Posted by crshalizi at July 13, 2011 19:35 | permanent link

July 11, 2011

Advanced Data Analysis from an Elementary Point of View: Self-Evaluation and Lessons Learned

Accidentally left in my drafts folder for two months. I still haven't looked at my student evaluations.

Now that most of the final exams are graded, but before I've gotten to see my student evaluations, it seems like a good time to reflect on the class. Also, I have had enough May wine, with woodruff from my garden, that the prospect of teaching it again next year can be greeted with equanimity.

First, and conditioning everything else, this was by far the largest class I've taught (70 students), and to the extent it went well it's entirely due to my teaching assistants, Gaia Bellone, Shuhei Okumura and Zachary Kurtz. I'd say I couldn't thank them enough, but clearly I'll have to do 30--40% better than that next year, when there will be between 90 and 100 students. (Memo to self: does the university allow me to pay bonuses to TAs in whiskey?)

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at July 11, 2011 17:15 | permanent link

Of the Identification of Parameters

Attention conservation notice: 1700-word Q-and-A on technical points of statistical theory, prompted by a tenuous connection to recent academic controversies.

Q: What is a statistical parameter?

A: The fundamental objects in statistical modeling are probability distributions, or random processes. A parameter is a (measurable) function of a probability distribution; if you want to be old-fashioned, a "functional" of the distribution. For instance, the magnitudes of various causal influences ("effects") are parameters of causal models.

Think of these distributions as being like geometrical figures, and the parameters as various aspects of the figures: their volume, or area in some cross-section, or a certain linear dimension.

Q: So I'm guessing that whether a parameter is "identifiable" has something to do with whether it actually makes a difference to the distribution?

A: Yes, specifically whether it makes a difference to the observable part of the distribution.

Q: How can a probability distribution have observable and unobservable parts?

A: We specify models involving the variables we think are physically (biologically, psychologically, socially...) important. We don't get to measure all of these. Fixing what we can observe, each underlying distribution induces a distribution on the variables we do measure, the observables. In the analogy, we might only get to see the shadows cast by the geometric figures, or see what volume they displace when submerged in water.

Q: And how does this relate to identifiability?

A: Every (measurable) functional of the observable distribution is identifiable, because, in principle, what we can observe gives us enough information to work it out, or identify it. Every parameter of the underlying distribution which is not also a parameter of the observable distribution is unidentifiable, or unidentified.

In the analogy, if we know all the figures are boxes (i.e., rectangular prisms), but we only get to see their displacement, then volume is identifiable, but breadth, height and width are not. It is not a matter of not having enough data (not measuring the displacement precisely enough); even knowing a box's volume exactly would not, by itself, tell us the height of the box.

Q: Are all identifiable parameters equally easy to estimate?

A: Not at all. For real-valued parameters, the natural quantification of identifiability is the Fisher information, i.e., the expected value of the negative second derivative of the log-likelihood with respect to the parameter. (The expected value of the first derivative is zero at the true parameter value.) But this seems like, precisely, a second-order issue, coming after identifiability as such. Of course, if a parameter is unidentifiable, the derivative of the log-likelihood with respect to it is zero, and so is its Fisher information. But at this point we are leaving the clear path of identifiability for the thickets of estimation theory, and had better get back on track.
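For reference, the textbook definition (nothing special to this dialogue) is

\[ I(\theta) \;=\; \mathbb{E}\!\left[ \left( \frac{\partial \log f(X;\theta)}{\partial \theta} \right)^{2} \right] \;=\; -\,\mathbb{E}\!\left[ \frac{\partial^{2} \log f(X;\theta)}{\partial \theta^{2}} \right] , \]

so a log-likelihood which is flat in the direction of an unidentifiable parameter carries zero information about it.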

Q: So is identifiability solely a function of what's observable?

A: No, it depends on the combination of what we can measure and what models we're willing to entertain. If we observe more, then we can identify more. Thus if we can measure the volume of a box and its area in horizontal cross-section, then we can identify its height (but not its breadth or width). But likewise, if we can rule out some possibilities a priori, then we can identify more. If we can only measure volume, but know the box is a cube, then we can find height (and all its other dimensions). Of course we could also identify height from volume and the assumption that the proportions are 1:4:9, like the monolith in 2001.
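Spelled out in symbols (just the box arithmetic, nothing deeper): if both the volume V = BHW and the horizontal cross-section A = BW are observed, then

\[ H = \frac{V}{A} = \frac{BHW}{BW} , \]

while B and W separately remain undetermined; and if we assume a cube, then B = H = W = V^{1/3} follows from the volume alone.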

Q: I get why expanding the observables lets you identify more parameters, but restricting the set of models to get identification seems to have "all the benefits of theft over honest toil". Do people really report such results with a straight face?

A: Identifying parameters by restricting the models we entertain is only as secure as those restrictions. If we have good actual reasons for the restrictions, then it would be silly not to take advantage of that. On the other hand, restricting models simply to get identifiability seems quite contrary to the goals of science, since it is as important to admit what we do not yet know as to mark out what we do. At the very least, these are the sorts of hypotheses which need to be checked — and which must be checked with other or different data, since, by non-identifiability, the data in question are silent about them. (If you are going to assume all boxes are cubes, you should check that; but looking at their volumes won't tell you whether or not they are cubes. Those data are indifferent between your sensible cubical hypothesis and the idle fancies of the monolith-maniac.)

Q: Couldn't we get around non-identifiability by Bayesian methods?

A: Expressing "soft" restrictions by a prior distribution over the unidentified parameters doesn't actually make those parameters identified. Suppose, for instance, that you have a prior distribution over the dimensions of boxes, p(B,H,W). The three parameters B,H,W completely characterize boxes, and are in this respect equivalent to the alternative parameterization by the volume V = BHW and the two proportions or ratios h = H/B and w = W/B. Thus the prior p(B,H,W) is equivalent to an unconditional prior on volume multiplied by a conditional prior on the proportions, p(V) p(h, w|V). Since the likelihood is a function of V alone, Bayesian updating will change the posterior distribution over volumes, but leave the (volume-conditional) distribution over proportions alone. This reasoning applies more generally: the prior can be divided into one part which refers to the identifiable parameters, and another which refers to the purely unidentifiable parameters, and learning only updates the former. (If a Bayesian agent's prior prejudices happen to link the identified parameters to the unidentified ones, its convictions about the latter will change, but strictly through those prior prejudices.) The prior over the identifiable parameters can and should be tested; that over the unidentified ones cannot. (Not with that data, anyway.)
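A minimal simulation sketch of that argument, assuming (purely for illustration) a prior under which volume is independent of the proportions, a noisy observation of volume, and importance weighting to get the posterior; none of these choices comes from the dialogue itself:

    # Sketch: updating on volume-only data moves the posterior for volume, but
    # leaves the (unidentified) proportion alone when the prior treats the two
    # as independent.  The prior and the "data" here are hypothetical.
    set.seed(1)
    n <- 1e5
    V <- rlnorm(n, meanlog = 0, sdlog = 1)      # prior draws for the volume
    h <- rlnorm(n, meanlog = 0, sdlog = 0.5)    # prior draws for H/B, independent of V
    v.obs <- 3                                  # a made-up observed displacement
    lik <- dnorm(v.obs, mean = V, sd = 0.1)     # likelihood depends on V alone
    wts <- lik / sum(lik)                       # normalized importance weights
    c(prior.mean.V = mean(V), post.mean.V = sum(wts * V))   # volume gets updated
    c(prior.mean.h = mean(h), post.mean.h = sum(wts * h))   # proportion does not

If the prior did link proportions to volume, the weights would drag the distribution of h along, but only through that prior link, exactly as said above.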

Q: If a parameter is unidentified, why bother with it at all? Why not just use Occam's Razor to shave it away?

A: That seems like an excess of positivism. (And I say this as someone who is sympathetic to positivism.) After all, which parameters are identifiable depends on what we can observe. It seems excessive to regard boxes as one-dimensional when we can only measure displaced volume, but then three-dimensional when we figure out how to use a ruler.

Q: Still, shouldn't there be a presumption against the existence or importance of unidentifiable parameters?

A: Not at all. It is very common in politics to simultaneously assert that the electorate leans towards certain parties in certain years; that people born in certain years have certain inclinations; and that people's political inclinations go through a certain sequence as they age. If we admit all three kinds of processes, we have to try to separate the effects on political opinions of people's age, of the year in which they are voting (the period), and of the year they were born (their cohort). The problem is that (e.g.) everyone who will be 45 years old in 2012 was born in 1967, so there is no way to separate the effect of being 45 years old and voting in 2012 (age plus period) from the effect of being born in 1967 (cohort). Any two of the effects of age, period and cohort are identifiable if we rule out the third a priori; if we allow that all three might matter, we are not able to identify their effects.
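A toy regression makes the collinearity concrete (my sketch, with invented coefficients; nothing here comes from the dialogue): since age = period - cohort exactly, R cannot estimate all three effects at once.

    # Age-period-cohort collinearity: one of the three coefficients comes back NA,
    # because age + cohort = period is an exact linear dependence.  Simulated data.
    set.seed(2)
    n <- 1000
    cohort <- sample(1940:1990, n, replace = TRUE)   # year of birth
    period <- sample(1992:2012, n, replace = TRUE)   # year of the election
    age <- period - cohort                           # exact linear dependence
    opinion <- 0.05 * age + 0.02 * (period - 2000) + rnorm(n)   # invented "truth"
    coef(lm(opinion ~ age + period + cohort))        # one coefficient is returned as NA

Dropping any one of the three a priori restores estimability, which is just the identification-by-restriction move discussed above.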

Q: I fail to see how this isn't actually an example in favor of my position — people think these are three different effects, but they're just wrong.

A: We can break this sort of impasse by specifying more detailed mechanisms (and hoping we get more data). For instance, suppose that people tend to become more politically conservative as they age, but that this is because they accumulate more property as they grow older. Then, with data on property holdings, we could separate the effects of cohort (were you born in 1967?) and age (are you 45?) from period (are you voting in 2012?), because aging influences political opinions not through a mysterious black box but through an observable mechanism. Or again, there are presumably mechanisms which lead to period effects, as in Hibbs's "Bread and Peace" election model. (Even if that model is wrong, it illustrates the kind of way a more elaborate theory can bring evidence to bear on otherwise-unidentifiable questions.) Of course these more elaborated, mechanistic theories need to be checked themselves, but that's science.

Q: So, what does all this have to do with the social-contagion debate?

A: What Andrew Thomas and I showed is that the distinction between the effects of homophily and those of social influence or contagion is unidentifiable in observational (as opposed to experimental) data. This, to my way of thinking, is a much more consequential problem for claims that such-and-such a trait is socially contagious than doubts about whether this-or-the-other significance test was really appropriate; it says that the observational data was all irrelevant to begin with. Instead, trying to attribute shares of the similarity between social-network neighbors to influence vs. pre-existing similarity is just like trying to say how much of the volume of a box is due to its height as opposed to its width — it's not really a question data could answer. It could be that we could use other evidence to show that most boxes are cubes, but that's a separate question. No amount of empirical evidence about the degree of similarity between network neighbors can tell us anything about whether the similarity comes from homophily or influence, just as no amount of measuring the volume of boxes can tell us about their proportions.

Q: Mightn't there be assumptions about how social influence works, or how social networks form, which let us estimate the relative strengths of social contagion and homophily?

A: There might indeed be; we hope to find them; and to find external checks on such assumptions. Discovering such cross-checks would be like finding ways of measuring the volume of a geometrical body and its horizontal cross-section. Andrew and I talk about some possibilities towards the end of our paper, and we're working on them. So, I am sure, are others.

Q: I find your ideas intriguing; how may I subscribe to your newsletter?

A: For more, see Partial Identification of Parametric Statistical Models; my review of Manski on identification for prediction and decision; and Manski's book itself.

Enigmas of Chance; Networks; Dialogues

Posted by crshalizi at July 11, 2011 13:27 | permanent link

July 06, 2011

Updates!

Some updates:

As you were.

Manual trackback: Brad DeLong; Azimuth; Three Quarks Daily

Networks; Self-Centered; Linkage

Posted by crshalizi at July 06, 2011 15:44 | permanent link

June 30, 2011

Books to Read While the Algae Grow in Your Fur, June 2011

Attention conservation notice: I have no taste.

Sarah Vowell, Assassination Vacation
Tourism inspired by the presidential assassinations of Lincoln, Garfield and McKinley. Naturally, especially since it was written during the worst of the Iraq War and Bush administration, the book is really about what America means, or should.
Patrick O'Brian, Master and Commander and Post Captain
I used to re-read these once a year. I should probably resume doing so.
ObLinkage: Jo Walton re-reading the whole series.
I. J. Parker, The Masuda Affair and The Fires of the Gods
Mind-candy. Continuing historical mystery novels set in Heian Japan. Akitada is kind of a jerk through the first half or more of The Masuda Affair, but that's sort of the point.
Sarah Zettel, The Quiet Invasion
Humans who have established a research settlement in the clouds on Venus meet aliens from a Venus-like world who want to colonize Venus; the situation develops in a manner not necessarily to anyone's advantage. (This is not a spoiler.) As always with Zettel, the story is engaging, the characters are well-developed and mostly do not line up neatly into good guys and bad guys, the science is harder than Larry Niven*, and a desirable future is attained without destroying most of the human race.
Scattered remarks: (1) Libertarianism seems like a very stupid revolutionary ideology for human colonists utterly dependent on each other and on supply lines to Earth. (Creating an industrial infrastructure on another planet would mean paying all the R&D costs of figuring out how to do everything Earth already does, in a vastly more hostile environment, and then the construction costs of building it, simply to save on shipping. Capitalism will only do this for things where shipping is very, very expensive.) (2) The aliens' economic system of "promises" is a nice conceit, and helps establish that they are not like us (they pretty much do live in an Arrow-Debreu economy), but I can't help wondering what will happen once some of their new friends introduce them to the concept of money.
*: Here, Zettel has interstellar teleportation portals, which is obviously not good from a scientific-accuracy point of view, but otherwise things are quite plausible. (Her Venus, in particular, draws heavily on Grinspoon's excellent Venus Revealed.) Niven's Known Space, on the other hand, had faster than light travel, human psionics, interspecies telepathic mind control, stasis fields, selective breeding for luck, an account of human origins and senescence more worthy of Madame Blavatsky than anyone who had ever heard of the sciences of genetics or comparative anatomy, and multiple physically impossible (but plot-necessary) materials. (I am indebted to James Nicoll and Carlos Yu for these examples.) This is giving him a pass on technologized microscopic black holes.
Lois McMaster Bujold, Shards of Honor
I hadn't re-read this in so long I'd forgotten just how good it was. It astonishes me that it was a first novel. (My favorite line, of many great lines: "I am an atheist, myself. A simple faith, but a great comfort to me, in these last days.")
Sarah Langan, The Missing
Mind-candy, extra dark. Like her earlier The Keeper, to which this is a loose sequel, this is structured as an outbreak narrative (the UK title is The Virus) set in a small town in Maine. In this as in many other ways, we are clearly in Stephen King country, and perhaps the best way to convey the impression it left on me is for you to imagine King minus the sentiment about children and the faith that there is Something opposing his horrors, and plus a background in environmental epidemiology and some serious anger at the patriarchy. These are, to be clear, good things, and her take on the infection-which-turns-people-into-monsters was genuinely creepy and scary. ROT-13'd spoiler-ish quibbling: fur qvq abg, ubjrire, znantr gb znxr vg pbaivapvat gung gur pbagntvba pbhyq trg bhg bs pbageby ba n angvbany fpnyr, ubjrire — abe qvq gung frrz gb or arprffnel sbe ure fgbel.
Diarmaid MacCulloch, Christianity: The First Three Thousand Years
Beginning (as the sub-title hints) with the background of ancient Israel and ancient Greece, and continuing down to the present day, covering every branch he can think of, with evident sympathy and at least the appearance of learning. Like his earlier book The Reformation (surprisingly little of which gets repeated here), this aims to be a truly global history, so he gives considerable space to groups like the Nestorians, to the Ethiopian Church, etc. During late antiquity and the medieval period, eastern and western orthodoxy get equal billing, and if the west takes up more room later, it is because the west expanded (and diversified) in ways the east didn't.
MacCulloch likes: toleration, pluralism and ecumenical outreach; personal humility and charity; churches which remember that the Kingdom is not of this world, but struggle against injustice anyway; mysticism (but not obscurantism); music, architecture and painting; and counterfactuals. (Even more than The Reformation, this book is full of remarks about the different paths that various branches of Christianity could have gone, but didn't. None of these claims seem crazy, but their epistemic basis is often unclear.) He dislikes: Biblical (pseudo-) literalism; religion in the service of nationalism, and of temporal power more generally; spiritual arrogance; iconoclasm; aggressive opposition to organized religion.
Lucy A. Snyder, Shotgun Sorceress
Mind-candy. In which our heroine, having previously harrowed the hells of Ohio, and acquired (ROT-13'd spoilers) n yvgreny wrjryrq rlr naq unaq bs synzr juvpu vf abg dhvgr haqre ure pbageby jura fur trgf, yrg hf fnl, rkpvgrq, confronts the forces of darkness in east Texas. It ends in medias res, and I have pre-ordered the sequel.
Christa Faust, Money Shot
Mind-candy. Hard-boiled thriller set in the porn-filming sub-culture of contemporary LA. Despite the setting, it has basically no sexually arousing content, which is part of the point. If a fast-paced story of violent, single-minded revenge without the least trace of personal redemption sounds appealing, you will like this. (Since the professional novelists who provided the blurbs for this book have already made every possible porn-related pun, I can skip that.)
Carrie Vaughn, Kitty Goes to War
Mind-candy. In which our heroine, and the city of Denver, must confront the consequences of U. S. Army planners in Afghanistan not having seen Dog Soldiers. Oh, and a chain of nefarious convenience stores. (Previously.)
Amanda Downum, The Drowning City and The Bone Palace
Mind-candy. It's hard out there for a necromancer. (The city of the first book is a bit like New Orleans, inhabited by Sumatrans. That of the second is Constantinople, in some favorable slice of Byzantine history which is probably not profitable to try to identify.)
Evan Dorkin and Jill Thompson, Beasts of Burden: Animal Rites
Julia Spencer-Fleming, One Was a Soldier
I got about half-way through this before realizing that I was reading, and engrossed by, a perfectly ordinary, perhaps even literary, novel about small-town life and post-traumatic stress. Then, fortunately for my genre-reading ways, some murders happened and got satisfyingly solved. (Previously.)
Hubert M. Blalock, Causal Inferences in Nonexperimental Research
Technically obsolete, of course --- it's from 1961! --- but full of good sense, in no small part because it draws so heavily on Herbert Simon (especially the Cowles Commission papers, "Causal Ordering and Identifiability" and "Spurious Correlation: A Causal Interpretation"), and writes out so many graphical models. The aggregation procedure Blalock talks about as a way of checking confounding sounds like a cross between instrumental variables and Pearl's front-door criterion, and it would be interesting to step through it to see what it does identify, if someone hasn't done so already. Recommended for those interested in the history of causal inference, and how ideas come to be behind their time.
Correction: Oddly, Blalock writes that if two variables, say, X and Y, are uncorrelated, but both make a positive contribution to a third, say Z, then the causes are positively correlated conditional on the effect. This is wrong; they are negatively correlated. The easiest way to see this is to imagine the limiting case where Z = X + Y, with no noise; then, conditional on Z=z, X = z - Y and the causes are perfectly negatively correlated. (Blalock seems to have been misled by the idea that a high value for the effect, Z, makes it more probable that both causes are large, which is true but not relevant to the conditional correlation between the causes.) I do not believe, however, that Blalock's arguments ever rely on the sign of the partial correlation between causes, merely that it is non-zero.
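A quick simulation (mine, not Blalock's) confirms the sign: independent causes of a common effect become negatively correlated once we condition on that effect.

    # Conditioning on a common effect induces negative correlation between
    # otherwise independent causes.  Near-noiseless limiting case Z = X + Y.
    set.seed(3)
    x <- rnorm(1e5); y <- rnorm(1e5)
    z <- x + y
    cor(x, y)                          # roughly 0 unconditionally
    keep <- abs(z - 1) < 0.05          # condition on a thin slice of z
    cor(x[keep], y[keep])              # close to -1 within the slice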

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; The Pleasures of Detection; Writing for Antiquity; The Commonwealth of Letters; The Beloved Republic; Enigmas of Chance

Posted by crshalizi at June 30, 2011 23:59 | permanent link

June 29, 2011

Knights, Muddy Boots, and Contagion; or, Social Influence Gets Medieval

Three papers have appeared recently, critiquing methods which people have been using to try to establish social influence or social contagion: "The Spread of Evidence-Poor Medicine through Flawed Social Network Analysis" (arxiv:1007.2876) by Russell Lyons; "The Unfriending Problem" by Hans Noel and Brendan Nyhan (arxiv:1009.3243); and "Homophily and Contagion Are Generically Confounded in Observational Social Network Studies" (arxiv:1004.4704, blogged about here) by Andrew Thomas and myself. All three were of course inspired by the works of Nicholas Christakis, James Fowler and collaborators. This has led to a certain amount of chatter online, including rash statements about how social influence may not exist after all. That last is silly: to revert to my favorite example of accent, there is a reason that my Pittsburgh-raised neighbors say "yard" differently than my friends from Cambridge, and it's not the difference between drinking from the Monongahela rather than the Charles. Similarly, the reason my first impulse when faced with a causal inference problem is to write out a graphical model and block indirect paths, rather than tattooing counterfactual numbers in invisible ink on my experimental subjects, is the influence of my teachers. (Said differently: culture happens.) So, since we know social influence exists and matters, the question is how best to study it.

Fortunately, one consequence of this recent outbreak of drama is a very long and thoughtful message from Tom Snijders to the SOCNET mailing list. Since there is a public archive, I do not think it is out of line to quote parts of it, though I would recommend anyone interested in the subject to (as the saying goes) read the whole thing:

What struck me most in the paper by Lyons ... are the following two points. The argument for social influence proposed by Christakis and Fowler (C&F) that earlier I used to find most impressive, i.e., the greater effect of incoming than of outgoing ties, was countered: the difference is not significant and there are other interpretations of such a difference, if it exists; and the model used for analysis is itself not coherent. This implies that C&F's claims of having found evidence for social influence on several outcome variables, which they already had toned down to some extent after earlier criticism, have to be still further attenuated. However, they do deserve a lot of credit for having put this topic on the agenda in an imaginative and innovative way. Science advances through trial and error and through discussion. Bravo for the imagination and braveness of Nick Christakis and James Fowler.

...Our everyday experience is that social influence is a strong and basic aspect of our social life. Economists have found it necessary to find proof of this through experimental means, arguing (Manski) that other proofs are impossible. Sociologists tend to take its existence for granted and are inclined to study the "how" rather than the "whether". The arguments for the confoundedness of influence and homophilous selection of social influence (Shalizi & Thomas Section 2.1) seem irrefutable. Studying social influence experimentally, so that homophily can be ruled out by design, therefore is very important and Sinan Aral has listed in his message a couple of great contributions made by him and others in this domain. However, I believe that we should not restrict ourselves here to experiments. Humans (but I do not wish to exclude animals or corporate actors) are purposive, wish to influence and to be influenced, and much of what we do is related to achieve positions in networks that enable us to influence and to be influenced in ways that seem desirable to us. Selecting our ties to others, changing our behaviour, and attempting to have an influence on what others do, all are inseparable parts of our daily life, and also of our attempts to be who we wish to be. This cannot be studied by experimental assignment of ties or of exchanges alone: such a restriction would amount to throwing away the child (purposeful selection of ties) with the bathwater (strict requirements of causal inference).

The logical consequence of this is that we are stuck with imperfect methods. Lyons argues as though only perfect methods are acceptable, and while applauding such lofty ideals I still believe that we should accept imperfection, in life as in science. Progress is made by discussion and improvement of imperfections, not by their eradication.

A weakness and limitation of the methods used by C&F for analysing social influence in the Framingham data was that, to say it briefly, these were methods and not generative models. Their methods had the aim to be sensitive to outcomes that would be unlikely if there were no influence at all (a sensitivity refuted by Lyons), but they did not propose credible models expressing the operation of influence and that could be used, e.g., to simulate influence processes. The telltale sign that their methods did not use generative models is that in their models for analysis the egos are independent, after conditioning on current and lagged covariates; whereas the definition of social influence is that individuals are not independent....

Snijders goes on, very properly, to talk about the models he and his collaborators have been developing for quite a few years now (e.g.), which can separate influence from homophily under certain assumptions, and to aptly cite Fisher's dictum that the way to get causal conclusions from observations studies is to "Make your theories elaborate" --- not give up. Lyons's counsels of perfection and despair are "words of a knight riding in shining armour high above the fray, not of somebody who honours the muddy boots of the practical researcher". (Again, if this sounds interesting, read the full message.) I agree with pretty much everything Snijders says, but feel like adding a few extra points.

  1. It is of course legitimate to make modeling assumptions, but one then needs to support those assumptions with considerations other than their convenience to the modeler. I see far too many papers where people say "we assume such and such", get results, and don't try to check whether their assumptions have any basis in reality (or, if not, how far astray that might be taking them). Of course the support for assumptions may be partial or imperfect, might have to derive in some measure from different data sources or even from analogy, etc., through all the usual complications of actual science. But if the assumptions are important enough to make, then it seems to me they are important enough to try to check. (And no, being a Bayesian doesn't get you out of this.)
  2. As we say in our paper, I suspect that much more could be done with the partial-identification or bounds approach Manski advocates. The bounds approach also seems more scientifically satisfying than many sensitivity analyses, which make almost as many restrictive and unchecked assumptions as the original models. Often it seems that this is all that scientists or policy-makers would actually want anyway, and so the fact that we cannot get complete identification would not be so very bad. I wish people smarter than myself would attack this for social influence.
  3. It would be very regrettable if people came away from this thinking that social network studies are somehow especially problematic. On the one hand, as shown in Sec. 3 of our paper, when social influence and homophily are both present, individual-level causal inference which ignores the network is itself confounded, perhaps massively. (I've been worrying about this for a while.) But the combination of social influence and homophily would seem to be the default condition for actual social assemblages, while individual-level studies from (e.g.) survey data have become the default mode of doing social science.
    On the other and more positive side, we have, it seems to me, lots of examples of successfully pursuing scientific, causal knowledge in fields where experimentation is even harder than in sociology, such as astronomy and geology. Perhaps explaining the clustering of behavior in social networks is fundamentally harder than explaining the clustering of earthquakes, but we're even more at the mercy of observation in seismology than sociology.

Manual trackback: Slate; A Fine Theorem

Networks; Enigmas of Chance

Posted by crshalizi at June 29, 2011 13:24 | permanent link

June 03, 2011

A Scientific Prediction, Uninfluenced by Sentiment or Wishful Thinking

Within a year, Kanazawa will have a fellowship at the American Enterprise Institute (where he'll fit right in); he will not have learned anything about factor models, or data analysis, or indeed anything else. I suspect he will also have a book under way about how the politically correct hordes drove him from England, in which he will compare himself to Galileo, but that is not so securely supported by my model.

Oh, and as for Henry's point, I feel like I should offer a back-handed defense of evolutionary psychology. It's true that a field where Kanazawa could get away with so much for so long is nothing to be proud of, but it's not at all clear that evolutionary psychology is actually worse in this regard than other branches of psychology, in some of which the mistakes are much more pernicious and much more entrenched. Or that psychology is any worse than other fields; I will plug, once again, Hamilton's The Social Misconstruction of Reality: Validity and Verification in the Scholarly Community. (Nonetheless I would not be surprised if standards really were lower in evolutionary psychology than elsewhere.)

Context; more context; yet more context.

Anticontrarianism; Learned Folly; The Natural Science of the Human Species; The Running-Dogs of Reaction

Posted by crshalizi at June 03, 2011 22:01 | permanent link

May 31, 2011

Books to Read While the Algae Grow in Your Fur, May 2011

Attention conservation notice: I have no taste.

K. A. Stewart, A Devil in the Details
Merline Lovelace, Now You See Her and Catch Her if You Can
Mind-candy, assorted. I imagine that having worked on a DARPA grant for several years probably made the Lovelace books more amusing to me.
Stephen King, The Gunslinger and The Drawing of the Three
"The man in black fled across the desert, and the gunslinger followed..." I do not care for the revisions — I see why King made them, but I think they are nonetheless mistakes — but I think the sory is still there, and still great.
Laura Bickle, Embers and Sparks
Mind-candy. Protecting Detroit from supernatural menaces, whether it wants it or not. (Arson investigation has long been a particularly thankless task in mundane Detroit.)
Greg Rucka and Matthew Southworth, Stumptown: The Case of the Girl Who Took her Shampoo (But Left her Mini)
Mind-candy: a love-letter to Portland, in the form of a comic-book detective story, which does not insult the reader's intelligence.
Thomas R. Rochon, Culture Moves: Ideas, Activism, and Changing Values [official blurb, with some previews]
A book about how intellectuals develop new values and social movements spread and entrench them, explicitly inspired by the great moral regeneration of this country in the second half of the twentieth century, at the hands of the civil rights movement, feminism and environmentalism, but also looking at other, more or less successful, progressive movements. (Various right-wing movements actually fit his schema fairly well, but are only mentioned in passing.) The basic idea is that "critical communities" come up with new ideas about values (mostly by changing the application of old values, sometimes by bringing new areas under the domain of valuation in the first place, as the environmentalists did), but that then these get taken up and spread by social movements, which transform their participants in the process.
The book is interesting, decently written for sociology, and reasonably convincing. I suspect that the scheme is too tidy --- it seems overly influenced by the French Enlightenment, and the idea of a vanguard party, and to give insufficient weight to how activists themselves develop and change ideas through struggling to realize them --- but not at all absurd.
The data analysis, however, annoys me. Why oh why do people insist on taking (Pearson) correlation coefficients among categorical variables? The one which killed me is when Rochon used what sounds like a fantastic longitudinal data set to show that participating in protests in the 1960s and 1970s had long-term impacts on the values and attitudes of then-young adults. The problem is that those who participated in the protests began as detectably different from non-protestors, so seeing that protestors and non-protestors later diverged doesn't get at the impact of being in the movement --- we could just be seeing two developmental trajectories, one more common among the protestors, the other less. The right comparison (for once) would be to match protestors with those who were similar before the protest, but did not, as it turned out, join the movement, and look at how they diverged. This may not make a big difference in the end, but it wouldn't've killed him to do it right, would it?
Noel Maurer and Carlos Yu, The Big Ditch: How America Took, Built, Ran, and Ultimately Gave Away the Panama Canal
An economic history of the construction and operation of the Canal, centering on how much it was worth to the US, and how much we paid for it. "Worth" here is evaluated by asking how much more it would have cost to move the same flows of goods by the next-cheapest alternative, which for the earlier part of the story meant a combination of trans-continental railroads, and sailing around South America. Initially, the Canal was a great deal for the US --- and, unsurprisingly, an even better deal on the terms we extracted from our puppet state of Panama than we could have gotten from Colombia. By the post-WWII era, however, it was no longer so important for us, in no small part because of the creation of the Interstate highway system. Giving it back made sense, but was hampered by the lobbying of the rent-capturing, and solidly reactionary, American "Zonians", as well as a wide-spread distaste for abandoning an imperial possession. As they put it in a nice phrase (which I cannot re-find right now for exact quotation), the Canal went from being a tool of American national defense, to a symbol of defensive American nationalism. America never really operated the Canal on a profit-maximizing basis, at first deliberately (the un-extracted surplus for users went into the American economy), and then by default (institutional capture by the Zonians). Panama, not being so constrained, has done very well on this score, at least post-Noriega.
No technical knowledge of economics is needed to read this book.
Disclaimer: Carlos is an on-line acquaintance, but I bought my own copy of the book and have no stake in its success.
David Easley and Jon Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World
I'm reviewing this for American Scientist, so I can't write much here now (but will update when the review comes out). I will just say that this is at once a very impressive achievement, and marred by the economist's uncontrollable urge to tell lies to children (and undergraduates, and lay-people...) about economics, without admitting that they are lies-told-to-children.
Update: and the review is out.
Scott Westerfeld, Leviathan
World War I re-imagined as a young adult fantasy epic, complete with a trans-continental quest, a Lost Heir and a plucky heroine disguised as a boy. — That sounds more dismissive than is really fair; it was a perfectly enjoyable distraction from exercise, and after a century (if not less), every tragedy drifts into the mythic background... The conceit here is that Darwin discovered DNA ("life chains") and genetic engineering, so that the technology of the "Darwinist" powers (Britain, France and Russia) is based on genetically modified organisms, or purpose-built ecosystems, while the "klanker" central powers (Germany, Austria-Hungary, the Ottomans) all use machinery. To descend to the level of geekish nit-picking, while Westerfeld clearly enjoys his "fabricated beasties", I don't think he's adequately thought through the premise. Metal's advantage over flesh and bone is that it is much harder, much sharper, and can tolerate much higher temperatures. This means metal plus combustion (or even batteries) can deliver much more power (energy per unit time) --- more acceleration, higher velocities, more firepower. A fabricated animal couldn't out-run or out-gun even an actual WWI-vintage tank, so if it was to have a comparative advantage, it would have to lie in being self-replicating; Darwinist tactics would have to rely completely on swarms, rather than huge hydrogen-breathing airships. But while I personally think that hybridizing The Glass Bees and Daedalus makes an awesome premise for a young-adult novel, I suspect that's very much a minority taste.
Sequels.
Lois McMaster Bujold, The Curse of Chalion, Paladin of Souls, and The Hallowed Hunt
Theological fantasy, with historical settings based on late-medieval Spain (the first two) and post-Carolingian Germany (the third). Re-read to celebrate the end of classes. Intensely satisfying, as Bujold almost always is. On re-reading, I particularly admire the way she handles a middle aged heroine in Paladin, though the romantic sub-plot feels less convincing than the rest.
Jack Campbell, The Lost Fleet: Beyond the Frontiers: Dreadnaught
Having wrapped up the Anabasis plot nicely with the last volume, I am not sure where this is headed, but happy to enjoy the ride.

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; The Dismal Science; Writing for Antiquity; Networks; Power Laws; Complexity; The Beloved Republic; Commit a Social Science; The Progressive Forces; The Collective Use and Evolution of Concepts

Posted by crshalizi at May 31, 2011 23:59 | permanent link

May 29, 2011

Silence = Content (Again, Hopefully)

Why I am so unresponsive lately:

Deadline for the NIPS 2011 conference
7:59 pm Pittsburgh time, Thursday 2 June.
Number of papers I am trying to submit with students and other collaborators
Somewhere between 5 and 7.
Deadline for the research and teaching statements part of my "case" for promotion to associate-professor-without-tenure
Wednesday 1 June.

Working on both of these undertakings reminds me uncomfortably of the gambler making sure the dice know how much he desires a shot at tenure down the road, er, a new pair of shoes.

Update, 3 June: My laptop entered a cataleptic state as of noon on the 31st. I still managed to submit the six papers with co-authors. Blogging will continue to be sparse in the immediate future.

Self-centered

Posted by crshalizi at May 29, 2011 10:30 | permanent link

May 08, 2011

Lyric Poetry in Pluto's Republic

Attention conservation notice: 3200 words on a silly academic paper about popular music and narcissism. Contains complaints about bad data analysis, firm statements about writing poetry from someone who can't, and largely unsupported gloomy reflections about the condition of the house of intellect.

Let me begin with a quotation from one of my favorite books:

A good many years ago a neighbour whose sex chivalry forbids me to disclose exclaimed upon learning of my interest in philosophy: `Don't you just adore Pluto's Republic?'

Pluto's Republic has remained in my mind ever since as a superlatively apt description of [the] intellectual underworld.... We each populate Pluto's Republic according to our own prejudices: for me its most prominent citizens are IQ psychologists.... Other prominent citizens include all practitioners of `scientism', especially those who apply what they mistakenly believe to be the methods of science to the investigation of matters upon which science has no bearing whatsoever...

— Peter Medawar, Pluto's Republic, p. 1

I have no taste, and so a large part of my reading consists of what is frankly mind candy, and a large part of the candy consists of mystery series in which murders are solved by amateur sleuths. It is part of the norms of this genre or tradition that the heroine (and it is almost always a heroine) is a not-too-old woman who pursues a more or less genteel occupation in a small town or not-too-large city, which has an (unremarked-on) rate of violent death comparable to post-invasion Iraq. It is equally a norm of the genre that the novels are narrated in the first person (singular). Two of my other addictions are secondary-world fantasies and space opera science fiction, which use first-person narration far more sparingly*. While one might come up with functional-rhetorical rationales for this contrast (perhaps based on some of the experiments in Bortolussi and Dixon), the proximate explanation is simply genre norms. To argue on this basis that writers, or readers, of amateur-sleuth mysteries are less narcissistic than writers or readers of space opera and lap-breaker fantasies would be stupid.

More specifically, it would be to ignore the fact that, while mind-candy genre novels are perhaps very humble works of art, they are works of art, and, as the poet says, all art is artifice. They are things which are made to achieve certain ends (which may be vague), employing skills and traditions and what one might call internal norms. Even when writers pour their hearts out on to the page, to treat works of art as direct, unmediated expressions of their makers' personalities trembles on the border between utter philistinism and not-safe-to-be-let-outdoors-without-grownup-supervision naivete. And this is true not just of popular novels, but also of poems which demand musical accompaniment and take about three minutes to recite, bringing us to today's reading.

C. Nathan DeWall, Richard S. Pond, Jr., W. Keith Campbell, and Jean M. Twenge, "Tuning in to Psychological Change: Linguistic Markers of Psychological Traits and Emotions Over Time in Popular U.S. Song Lyrics", Psychology of Aesthetics, Creativity, and the Arts online-first (2011)
Abstract: American culture is filled with cultural products. Yet few studies have investigated how changes in cultural products correspond to changes in psychological traits and emotions. The current research fills this gap by testing the hypothesis that one cultural product — word use in popular song lyrics — changes over time in harmony with cultural changes in individualistic traits. Linguistic analyses of the most popular songs from 1980--2007 demonstrated changes in word use that mirror psychological change. Over time, use of words related to self-focus and antisocial behavior increased, whereas words related to other-focus, social interactions, and positive emotion decreased. These findings offer novel evidence regarding the need to investigate how changes in the tangible artifacts of the sociocultural environment can provide a window into understanding cultural changes in psychological processes.

I want to add a few remarks to what Mark Liberman has already said ("Lyrical Narcissism?", "Vampirical Hypotheses", "Pop-culture narcissism again"), first about the methodological inadequacies, then about the statistics, and finally on the larger lessons.

The empirical basis for inferring narcissism from using first person singular pronouns appears to be Robert Raskin and Robert Shaw, "Narcissism and the Use of Personal Pronouns", Journal of Personality 56 (1988): 393--404. This shows that, over twenty years ago, there was a modest positive correlation (+0.26) between scores on a quiz intended to measure narcissism, and how often 48 UC Santa Cruz undergrads used first-person singular pronouns in extemporized five minute monologues. Top 100 songs are not spontaneous monologues by undergrads looking for a painless way to get $5 and/or check off a Psych. 1 requirement, and DeWall et al. offer no evidence that this correlation generalizes to any other context. In particular they offer no reason to think that differences over time, as language and culture changes, should be explained in the same way as these differences across people, at a single time and in a single school.

Let me sketch an analogy. You can measure the height of a building from the length of its shadow, using trigonometry. If you gather a data set of many building heights and shadow lengths taken at nearly the same time of day on the same day of the year, there will in fact be an excellent correlation between the two, and a genuinely linear relationship. (Indeed, the only reason the correlation would be even slightly less than 1 would be measurement noise.) But the relationship between the height of buildings and the length of their shadows depends on where the sun is in the sky. At a different time of day or a different day of the year, you will get a different linear relationship. If you just plug in to your formula blindly, you will get bad estimates of the height. If you were a morning person, and precisely operationalized your initial data as "length of shadow to the west of the building", you would get negative estimated heights in the afternoon, when shadows point to the east. (Sure, it's counterintuitive that buildings are actually sunk below the ground, but are you going to argue with the numbers?) On cloudy days, whatever you measured in place of shadows would just be noise.
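The trigonometry in question is just

\[ \text{height} = \text{shadow length} \times \tan(\text{solar elevation}) , \]

so the slope relating height to shadow length is a property of the measurement situation (where the sun happens to be), not of the buildings themselves.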

To draw the moral explicitly, even if there is such a thing as a one-dimensional personality trait of narcissism**, and even if that was correlated with pronoun use in one particular historical population, in one particular social/rhetorical context, that tells us nothing at all about the correlation in other situations. I don't assert that it can't be true, but there is no psychological or statistical reason to presume that it is true, and so it needs to be established. In more psychological terms, thinking otherwise is not so much slipping into the fundamental attribution error as wallowing in it.

Composing a popular song is not coming up with a five-minute off-the-cuff monologue. Lyrics are in fact composed. They are deliberately made to achieve certain effects on the audience, including meshing in certain ways with the music (which is also being composed), they are stylized, and their composition is guided by inherited traditions and formulas of the genre and by individual habits of writing. Those guides and constraints are at once cognitive — it is computationally necessary to cut down the search space (see e.g. Lord or Simon or Boden) — and aesthetic — they are norms (see e.g. Wellek and Warren). The persona of the song or poem is not the personality of the song-writer or poet. (David Byrne is not actually a psycho-killer.) This is true no matter how strong the emotions which motivate the song-writer are, or how lacking the writer may be in self-conscious artistry.

Commercially successful popular songs are artistic compositions which have been filtered through a rather byzantine industry of gate-keepers and intermediaries. The songs which survive this filtration must then be bought by many thousands of people, for their own reasons. The song might succeed by appealing to a single very popular taste; or simultaneously appealing to many different tastes; or, indeed, merely by already being popular.

If the question is whether musicians have become more narcissistic, we need to ask whether more narcissistic musicians compose songs which use first person singular pronouns more often, and, if so, whether this signal survives the filtering process of the music industry. If the question is whether audiences have become more narcissistic, we need to ask whether more narcissistic people prefer songs which use such pronouns more often, and, if so, whether this signal survives the filtering process of the music industry. (Anyone who thinks individual preferences simply translate into aggregate outcomes has simply not been paying attention, and for a very long time at that.) We are so far from the laboratory situation of Raskin and Shaw that it's not even funny.

Let me turn to more specific weaknesses in the logic of the paper.

  1. Should the analysis adjust for genres at all? If consumption or production of songs reflects personality traits, and some genres are inherently more narcissistic, then switching to those genres is itself indicative of changing distributions of personality.
  2. That said, p. 5 provided the following head-desk moment: "Because our multiple regression analysis controlled for genre, any changes in genre over time did not significantly account for our effects". I hope that any one of my undergraduate students in data analysis could, by this point in the semester, pick at least three holes in this without having to think too hard***.
  3. Whose narcissism is being measured here? Is it the consumers of the songs, or their producers? This point is never addressed, and it is hard to see how it could be from this data.
  4. Why does an increase in the use of first-person-singular pronouns in popular songs not indicate increasing empathy, and decreasing self-centeredness, on the part of the audience? (That is, they are better able and more willing to identify with the persona of the song's narrator.)
  5. Why attribute these changes to a modification in the distribution of personality types among the general population, rather than changes in the demographic characteristics of people who buy music?
  6. Why attribute changes to the general population, as opposed to changing personality types among producers? One might sub-divide this further into composers, performers, gate-keepers in music companies, etc.
  7. Why attribute these changes to changing personality types, as opposed to changing audience expectations about musicians, and suitable modes of expression for them? Maybe neither audiences nor musicians are a whit more narcissistic, but audiences expect their musicians to be.
  8. Why attribute these to an actual change in personality types, as opposed to a change in the perception by producers of the tastes of the commercially-effective audience? (Such a perception might or might not be right.)
  9. Salganik and Watts have shown, experimentally, that song popularity is incredibly stochastic and path-dependent. Musicians learn from each other, and the gate-keepers of the music industry try to copy what works, so there is a large amount of reinforcement or cascading added to an initially extremely noisy process. This is going to be very apt to produce cascades, and "hitchhiking", as well as changing distributions of songs when nothing at all about audience or musician personalities is changing. (Genre norms create an extra mechanism for hitchhiking.)
  10. The relationship between personality traits and language use may just have changed, either in the general culture or in the highly specific and formalized context of song lyrics. As Pennebaker will explain to you, there are many ways in which first-person singular pronouns can be used, carrying many different implications. (Pennebaker is actually cited in this paper, but not at all intelligently.)
For all I know, it's possible to rule out all of these alternative explanations, but they would have to first be recognized and then investigated. (Distinguishing changes in audiences' demand for narcissism from changes in musicians' ability to supply it will be especially tricky.) DeWall et al., however, are oblivious to any possibility other than lyrics reflecting personality, while being vague about whose personality is on display.

So, to sum up, we have basically no reason to think that changes in the use of first-person singular pronouns measure changes in narcissism (certainly not over time or in this context), and a slew of alternative explanations for any changes which might be found, other than "Americans are becoming narcissistic and this is reflected in their popular songs". One might, perhaps, write these off as the excessive scruples which come from over-indulging in skeptical philosophy. Let's have the courage to assume away all these inconvenient possibilities, and look at what the data show.

The centerpiece Figure 1 from DeWall et al. goes like so:

Many journalists seem to have found this very convincing. Fortunately, however, DeWall et al. also provide a table with the mean and standard deviation of the first person pronoun use for each year, and a 95% confidence interval. (They don't say how they calculated the latter, but I'll take them at their word and presume they did that properly.) This lets me plot the actual data, which looks like this:

(My code, in R.) The black dots, joined by lines to guide the eye, are the actual percentages. The dashed lines are the 95% confidence limits. The horizontal grey line is the over-all mean percentage, over the whole data set. The two colored lines are two smoothing spline fits, one (purple) giving extra weight to years with smaller standard deviations and one (blue) not. Making the smoothing splines requires a little knowledge of statistics; everything else just needs the ability to draw the numbers DeWall et al. provide.

The flat horizontal line is inside the confidence limits in 27 of the 28 years. This is exactly what we would expect if there was no signal here whatsoever, and all fluctuations from year to year were just noise****. (95% coverage per year and 28 years yields 1.4 expected non-coverage events.) There is nothing here to explain; the appearance that there is something in their Figure 1 is one part bad data analysis to one part How to Lie with Statistics-level bad graphing*****.
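For anyone who wants to reproduce this sort of picture, here is a sketch of the plotting and spline-fitting just described, with made-up yearly percentages and standard errors standing in for the table in DeWall et al. (my actual code and the real numbers are in the R file linked above):

    # Re-plot of the kind described: yearly means, 95% limits, overall mean,
    # and weighted vs. unweighted smoothing splines.  All numbers here are fake.
    years <- 1980:2007
    pct <- rnorm(length(years), mean = 20, sd = 0.8)    # stand-in yearly percentages
    se <- runif(length(years), 0.6, 1.2)                # stand-in standard errors
    plot(years, pct, type = "b", pch = 16,
         ylim = range(pct - 2.5 * se, pct + 2.5 * se),
         xlab = "Year", ylab = "% first-person singular pronouns")
    lines(years, pct + 1.96 * se, lty = 2)              # upper 95% limit
    lines(years, pct - 1.96 * se, lty = 2)              # lower 95% limit
    abline(h = mean(pct), col = "grey")                 # overall mean
    lines(smooth.spline(years, pct), col = "blue")      # unweighted spline
    lines(smooth.spline(years, pct, w = 1 / se^2), col = "purple")  # precision-weighted
    0.05 * length(years)   # expected years outside a 95% band under pure noise: 1.4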

While perhaps not a truly epic fail, this is not a creditable performance. The paper probes a hugely complex tangle of issues relating individual minds, communication, social norms, artistic expression, social change and cultural transformation. There is no shame in not unraveling the whole snarl at once, but between the incompetent data analysis, the failure of logical imagination, and the deep misunderstanding of how works of art are made and used, it does nothing to advance our knowledge of anything. I am not sure which is more needed here, remedial reading in Richard Berk and Denny Borsboom, or in Wellek and Warren and Erving Goffman, but they need something. Psychology of Aesthetics, Creativity, and the Arts does not seem to be a very highly ranked journal within psychology, but the authors of this paper have plenty of papers elsewhere, so this says something about the intellectual standards of the discipline.

Looking at the reception of the paper (see Liberman, again, for linkage), one finds dreary moralizing about how kids these days are selfish brutes and nobody makes decent music any more, given an unearned air of authority by the pretense to science. It should not, by this point, come as a surprise that many science journalists and pundits lack the numeracy, imagination and skepticism to avoid being taken in by such foolishness.

Public trust in scientists — that we generally know what we are talking about — is an extremely valuable resource for the scientific community. It is, I think, ultimately why people are willing to devote such vast resources to the scientific enterprise, to letting us gratify our curiosity. This trust has been painfully built up over many long years and generations and even centuries, by, among other things, taking great pains to be trustworthy. This trust is even a valuable resource for the public, when it is not misplaced. The more I see of this kind of thing, the more I wonder how well-founded that trust really is. This specific myth — that it has been scientifically proven that pop songs reflect increasing American narcissism — will persist as a minor vampirical hypothesis, occasionally draining the blood from graduate students in psychology. This kind of pointless myth-making and perversion of science will continue as long as the implicit goal of our institutions for cultivating knowledge is in fact to realize Pluto's Republic.

Update, later that day: Mark Liberman points out, by e-mail, that the famous 23rd psalm ("The Lord is my shepherd; I shall not want") clocks in at 14.3% first person pronouns in the King James Version, above the DeWall et al. confidence limits for all but seven years. I would add that "Rock of Ages" is a lower but still well-above-average 13.3%. On the other hand, "Rock of Ages" by Def Leppard (a top 100 song in 1983, and so part of the data) is between 4.6% and 6.2% first person singular pronouns (depending on how you want to count "gimme"). Clearly, the only thing saving American popular culture from epidemic narcissism in the early 1980s was preferring heavy metal to hymns.
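If you want to check this sort of arithmetic yourself, a rough pronoun-share function is easy to write (my sketch; the exact percentages obviously depend on how you tokenize and which forms you count):

    # Crude share of first-person singular pronouns in a text (hypothetical helper).
    fps.share <- function(text, pronouns = c("i", "me", "my", "mine", "myself")) {
      words <- tolower(unlist(strsplit(text, "[^[:alpha:]']+")))
      words <- words[nchar(words) > 0]
      mean(words %in% pronouns)
    }
    fps.share("The Lord is my shepherd; I shall not want")   # 2 of 9 words, about 22%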

Update, next day: John Emerson points out that "Like a Rolling Stone" contains plenty of instances of second person singular pronouns, and no first person singular pronouns, fully consistent with Bob Dylan's famed selflessness.

Manual trackback: Language Log (so I should probably really tag this post "incestuous amplification"); English, Jack; Permutations (with links to earlier posts)

*: Cue fannish nit-picking.

**: My skepticism about the "constructs" of correlational psychology is not limited to IQ/g, but that's another story for another time. For the present, I am willing to stipulate that narcissism exists and can be measured by the psychometric instruments which purport to do so.

***: Hole one: their four genre categories are incredibly crude, and have huge amounts of internal diversity. (Likewise, the genre norms of amateur-sleuth mysteries are rather different from private-eye detective stories, police procedurals, and serial-killer thrillers. Calling them all "mysteries", with one dummy variable, would not answer.) Hole two: "controlling for" variables this way only gets you an all-else-being-equal prediction if the regression model is actually well specified, which they hadn't the wit to check. Hole three: the counterfactual issue. (This is the only even slightly tricky one.) We have a certain distribution of the regressor variables in the training data, and so certain correlations among them. These correlations mean that each regressor can, to some extent, be linearly predicted from the others. The regression coefficients are basically the correlation between the response and the distinct, linearly-unpredictable part of each regressor. This means that when you change the distribution of regressors, the regression coefficients will, in general, change too. The regression coefficients can only be used to answer counterfactual questions ("what would the proportion of first-person pronouns be, if genre composition had stayed constant?") under very special assumptions, which we have no reason to think hold here. (See the notes for lectures 2, 22 and 23 for more.)
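
As a minimal illustration of hole three, not taken from any of the lecture notes: the same wrong linear model, fit to the same underlying relationship, gives different coefficients when the distribution of the regressor changes.

    # True relationship: y = x^2, so no linear model is correct.
    # The fitted slope then depends on where the x's happen to sit.
    set.seed(42)
    n <- 1e5
    x1 <- rnorm(n, mean = 0)         # regressor distribution, version 1
    x2 <- rnorm(n, mean = 1)         # regressor distribution, version 2
    coef(lm(I(x1^2) ~ x1))["x1"]     # slope is about 0
    coef(lm(I(x2^2) ~ x2))["x2"]     # slope is about 2
    # Same story, with more algebra, for "controlling for genre" in a
    # misspecified multiple regression.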

****: More exactly, this is what we would expect if the causes producing year-to-year shifts were so many, so various, and had such hard-to-describe inter-relations with each other that they cannot be effectively compressed or predicted from the past of the time series, and must simply be described in all their unique historical detail. As I tell my undergrads, "any signal distinguishable from noise is insufficiently complicated". If you want the full technical version of this idea, read Li and Vitanyi.

*****: Abbreviated scale on the vertical axis, visually exaggerating the change; inappropriate use of a linear model, which is guaranteed to give the impression of a steady and relentlessly one-directional march; no indication of uncertainty. I also can't figure out why they binned the values on the horizontal axis.

Learned Folly; Minds, Brains, and Neurons; Enigmas of Chance; The Commonwealth of Letters; Commit a Social Science

Posted by crshalizi at May 08, 2011 13:15 | permanent link

April 30, 2011

Books to Read While the Algae Grow in Your Fur, April 2011

Attention conservation notice: I have no taste.

Agha Shahid Ali, A Nostalgist's Map of America
Via Jon Wilkins.
Tamim Ansary, Destiny Disrupted: A History of the World through Islamic Eyes
An accurate, eloquent and humane work of popular history. It would be unfair to call it Marshall Hodgson's The Venture of Islam for Dummies, because it doesn't totally rely on Hodgson and it doesn't patronize the reader, but it has something of that flavor. Both writers emphasize the very deep historical currents which fed into the Islamic tradition, especially in what Hodgson called the "Nile-to-Oxus region" and Ansary dubs "the Middle World", and the larger world-historical context, the interaction of the Muslim story with others; both devote great attention to the various flavors of ethical conviction and spiritual and intellectual aspiration in Islamic civilization, more or less explicitly underscoring the variety of the ways in which people have been Muslim; both have nuanced understandings of how tradition works, and the way any major tradition has tremendous internal diversity, which is necessarily drawn on selectively. (Both also give Islam in southeast Asia, or even south Asia, less attention than its demographic weight warrants.) Ansary, however, is much more readable than Hodgson, even downright colloquial — and writing in the shadow of 9/11 and the global war on terror. His book is forthrightly aimed at promoting understanding of the Islamic world on the part of westerners, without attempting to paper over real differences, or turn history into a succession of mere Lessons For Our Time. I think it succeeds very well, and could well outlast our current troubles. I recommend it strongly to anyone looking for a popular introduction to Islamic history.
Disclaimer: Ansary's father and my grand-father were both sent as students from Afghanistan to the United States in the '30s, and were friends; he is a friend of the family.
Carrie Vaughn, After the Golden Age
Mind-candy, but very nearly perfect mind-candy. I read it in one sitting, and wished there were more.
Jason Shiga, Bookhunter
The adventures of the Oakland Public Library Police, confronting a triple locked-room book-theft mystery with the high-tech wizardry of 1973. What makes this so delightful is that it manages to be both a hilarious parody of a procedural and a perfectly-formed specimen of the genre.
Jeannine Hall Gailey, Becoming the Villainess
Poetry, re-telling classical myths, fairy-tales, comic books, etc., etc., with good imagery and a very distinct personality. (The samples on her webpage are pretty representative.) She should write more; fortunately she has another book coming out later this year. — And here it is.
Warren Ellis and Amanda Conner, Two-Step
Fun, but trifling; a light-hearted revisiting of some themes from Transmetropolitan as a pure comedy (with lots of amusing violence). For completist fans of Ellis and/or Conner's work.
Chris Roberson and Michael Allred, iZombie
Comic-book mind-candy.
Herbert A. Simon, An Empirically-based Microeconomics
A 2009 reprint of a 1997 book based on lectures delivered in 1993. Still current; economists are, thankfully, paying more attention to experiments, but still refusing to adjust their models of decision-making to accommodate experimental findings. (Fuller comments later.)
Patricia A. McKillip, The Bell at Sealey Head
At one level, this is fine McKillip, if not perhaps her most compelling story. At another, I wonder if there isn't some sort of meta-fictional auto-critique going on. Further comments are ROT-13'd for possible spoilers. Guvf vf gur svefg fgbel V'ir ernq ol ZpXvyyvc jurer bar bs gur punenpgref jnf, va snpg, n jevgre, naq n jevgre bs snagnfgvp fgbevrf ng gung. Gur rpubrf orgjrra Tjraqbyla'f fgbevrf naq gur bar ZpXvyyvc vf jevgvat srry yvxr gurl bhtug gb zrna fbzrguvat, ohg V pna'g dhvgr fnl jung. Naq V jbaqre vs rira gur evghnyf qrsvavat Lfnob'f irefvba bs Nvfyvaa Ubhfr ("rvgure nofheq be unhagvatyl ribpngvir", nf bar punenpgre fnlf) ner fhccbfrq gb, va fbzr jnl, rpub ZpXvyyvc'f bja pnerre-ybat hfr bs gur unhagvatyl ribpngvir?
Tony Judt, Reappraisals: Reflections on the Forgotten Twentieth Century
Book reviews (mostly with a heavy biographical element) and political essays, covering a huge range of topics related to the left in the 20th century; and what the hell we should do with ourselves now. Many of them are from the New York Review of Each Other's Books, with post-scripts about squabbles in the letters column. The title's air of "I know more than you, and will now sit in judgment" is in fact fully representative of the tone of the essays, but Judt did know more than most of us about the history of political ideas, and that certainly didn't make him less entitled to his opinions than anyone with a seat at a bar (or a blog).

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Writing for Antiquity; The Commonwealth of Letters; The Progressive Forces; Islam; The Dismal Science; Minds, Brains and Neurons; Pleasures of Detection, Portraits of Crime

Posted by crshalizi at April 30, 2011 23:59 | permanent link

April 29, 2011

Friday Educational Nightmares Blogging

Like many people, when I was a student, I used to have nightmares where I found I had to take a final exam in a class I did not remember being in. (One particularly vivid one also involved discovering a new building on the Berkeley campus, between Evans Hall and Le Conte.) When I became a teacher, this flipped around, and I had sporadic nightmares where I was giving an examination in a class I didn't remember teaching. Thankfully, it's been several years since that one last visited me.

After watching this xtranormal movie from my friend Cris Moore, I suspect that I will have nightmares about examining students in subjects they did not know they were taking:

I hasten to add that, to the best of my knowledge, nothing like this has ever happened in this department.

Linkage

Posted by crshalizi at April 29, 2011 15:49 | permanent link

April 27, 2011

Every Living Thing, Pushed into the Ring

DeLong asks for the best response to "My City Was Gone", itself a response to "Mountains beyond Mountains". Since the content of the game has clearly moved from "sprawl" to "one-upmanship", I claim that the best response is in fact "Stadium Love":

I actually prefer the official video, but concert footage seems to be an implicit rule of the game.

Had the conversation not strayed, "Nothing but Flowers", or "The Big Country" would have been admissible. (I am vexed by the fact that I cannot instantly call up high-quality video recordings of these songs performed when first released, several decades ago. Where's my jetpack Logic named Joe?)

Manual trackback: Grasping Reality with $Numerosity $Appendages

Linkage

Posted by crshalizi at April 27, 2011 20:00 | permanent link

April 21, 2011

"Optimal Nonparametric Prediction and Automated Pattern Recognition in Dynamical Space-Time Systems" (or, Our New Filtering Techniques are Unstoppable!, Part II: The Rise of the Austrian Machines)

My student Georg M. Goerg, who I co-advise with Larry Wasserman, has just defended his thesis proposal:

"Optimal Nonparametric Prediction and Automated Pattern Recognition in Dynamical Space-Time Systems"
Abstract: Many methods in statistics, machine learning, and signal processing, such as speech analysis or pattern recognition in images and videos, try to extract informative structures from a dynamic system and remove noisy uninformative parts. Although such methods and algorithms work well in practice, they often do so because they have been specifically tuned to work in a very particular setting, and thus may break down when conditions and properties of the data do not hold anymore.
It would be very useful to have an automated pattern recognition method for dynamic systems, which does not rely on any particular model or data structure, but gives informative patterns for any kind of system. Shalizi (2003) showed for discrete fields that an automated pattern discovery can be constructed by a characterization and classification of local conditional predictive distributions. The underlying idea is that statistically optimal predictors not only predict well but — for this very reason — also describe the data well, and therefore reveal informative structure inherent in the system.
In this thesis I extend previous work from Shalizi, Klinkner, and Haslinger (2004) to obtain a fully automated pattern recognition for continuous-valued space-time systems — such as videos — by means of optimal local prediction of the space-time field. Applications to simulated one-dimensional spatial dynamics and a real-world image pattern recognition demonstrate the usefulness and generality of the presented methods.
slides, full proposal (both somewhat large PDFs)

Very constant readers may recall having seen this line of research at various points down the years, most recently in "Our New Filtering Techniques Are Unstoppable!". Georg's goal is to make those methods work for continuous-valued fields, which was not needed for studying cellular automata but will be very handy for data analysis, and where he already has some preliminary results. Beyond that, the goal is to develop the statistical theory which would go along with it and let us get things like confidence intervals on statistical complexity.

I can say without any shame that I was quite pleased with Georg's presentation, because I really had no part in making it; all the credit goes to him in the first place, and to the feedback provided by Larry, Chris Genovese, Cris Moore and Chad Schafer. Based on this experience, and Georg's publication record, I imagine he will have all the problems polished off by the NIPS deadline, with a monograph or two to follow by the end of the summer.

I will, however, try not to read any omens into my first Austrian student commencing a dissertation on automatic pattern discovery on the day Skynet declares war on humanity.

Kith and Kin; Enigmas of Chance; Complexity

Posted by crshalizi at April 21, 2011 15:00 | permanent link

Discovering Causal Structure (Advanced Data Analysis from an Elementary Point of View)

How do we get our causal graph? Comparing rival DAGs by testing selected conditional independence relations (or dependencies). The crucial difference between common causes and common effects. Identifying colliders, and using them to orient arrows. Inducing orientation to enforce consistency. The SGS algorithm for discovering causal graphs; why it works. Refinements of the SGS algorithm (the PC algorithm). What about latent variables? Software: TETRAD and pcalg. Limits to observational causal discovery: universal consistency is possible (and achieved), but uniform consistency is not.
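
For readers who want to experiment beyond the notes, here is a toy run of the PC algorithm with the pcalg package mentioned above; the three-variable example and its parameters are mine, not the class's.

    library(pcalg)
    set.seed(1)
    n <- 2000
    z <- rnorm(n)                          # common cause
    x <- 0.8 * z + rnorm(n)                # z -> x
    y <- 0.5 * x + 0.8 * z + rnorm(n)      # x -> y <- z (a collider at y)
    d <- data.frame(x = x, y = y, z = z)
    suffStat <- list(C = cor(d), n = n)    # sufficient statistics for Gaussian CI tests
    fit <- pc(suffStat, indepTest = gaussCItest, alpha = 0.01, labels = colnames(d))
    fit                                    # prints the estimated equivalence class (CPDAG)
    # plot(fit) works too, if the Rgraphviz package is installed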

PDF notes

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 21, 2011 12:04 | permanent link

Estimating Causal Effects (Advanced Data Analysis from an Elementary Point of View)

Reprise of causal effects vs. probabilistic conditioning. "Why think, when you can do the experiment?" Experimentation by controlling everything (Galileo) and by randomizing (Fisher). Confounding and identifiability. The back-door criterion for identifying causal effects: condition on covariates which block undesired paths. The front-door criterion for identification: find isolated and exhaustive causal mechanisms. Deciding how many black boxes to open up. Instrumental variables for identification: finding some exogenous source of variation and tracing its effects. Critique of instrumental variables: vital role of theory, its fragility, consequences of weak instruments. Irremovable confounding: an example with the detection of social influence; the possibility of bounding unidentifiable effects. Matching and propensity scores as computational short-cuts in back-door adjustment. Summary recommendations for identifying and estimating causal effects.
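
As a made-up numerical illustration of the back-door idea (a sketch for this page, not part of the notes): with a single observed confounder Z, the marginal regression of Y on X is off, while conditioning on Z, which blocks the back-door path X <- Z -> Y, recovers the effect.

    set.seed(2)
    n <- 1e4
    z <- rnorm(n)                        # confounder
    x <- 1.5 * z + rnorm(n)              # z -> x
    y <- 2 * x + 3 * z + rnorm(n)        # true effect of x on y is 2
    coef(lm(y ~ x))["x"]                 # confounded estimate, well above 2
    coef(lm(y ~ x + z))["x"]             # back-door adjustment, close to 2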

PDF notes

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 21, 2011 12:03 | permanent link

Graphical Causal Models (Advanced Data Analysis from an Elementary Point of View)

Statistical dependence, counterfactuals, causation. Probabilistic prediction (selecting a sub-ensemble) vs. causal prediction (generating a new ensemble). Graphical causal models, structural equation models. The causal Markov property. Faithfulness. Counterfactual prediction by "surgery" on causal graphical models. The d-separation criterion. Path diagram rules.
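
The notes have their own examples; as a supplement (and an anachronism, since the package is more recent than these notes), the dagitty package will read implied conditional independencies and d-separation facts off a DAG for you:

    library(dagitty)
    g <- dagitty("dag { X -> Y ; Z -> X ; Z -> Y ; Y -> W }")
    impliedConditionalIndependencies(g)  # e.g. W is independent of X, and of Z, given Y
    dseparated(g, "X", "W", "Y")         # TRUE: conditioning on Y blocks every X--W path
    dseparated(g, "X", "W")              # FALSE: marginally, X and W are d-connected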

PDF notes

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 21, 2011 12:02 | permanent link

Mystery Multivariate Data (Advanced Data Analysis from an Elementary Point of View)

The second exam; In which we attempt to discover structure in ten-dimensional data of unknown origin.

Assignment; sample data set. Solutions, R for solutions.

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 21, 2011 12:01 | permanent link

Estimating with DAGs (Advanced Data Analysis from an Elementary Point of View)

In which we learn how to read and use diagrams full of circles and arrows and a paragraph on the back explaining what each one is.

Assignment, fake-smoke.csv data file (or rather pseudo-data). Solutions.

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 21, 2011 12:00 | permanent link

April 09, 2011

Patterns of Exchange (Advanced Data Analysis from an Elementary Point of View)

In which we practice the art of principal components analysis on currency exchange rates, in the process discovering the Pacific and the Americas.

Assignment, fx.csv data set; solutions

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 09, 2011 23:53 | permanent link

Graphical Models (Advanced Data Analysis from an Elementary Point of View)

Conditional independence and dependence properties in factor models. The generalization to graphical models. Directed acyclic graphs. DAG models. Factor, mixture, and Markov models as DAGs. The graphical Markov property. Reading conditional independence properties from a DAG. Creating conditional dependence properties from a DAG. Statistical aspects of DAGs. Reasoning with DAGs; does asbestos whiten teeth? Appendix: undirected graphical models, the Gibbs-Markov theorem; directed but cyclic graphical models. Appendix: Some basic notions of graph theory; Guthrie diagrams.

PDF handout

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 09, 2011 23:52 | permanent link

Mixture Model Examples and Complements (Advanced Data Analysis from an Elementary Point of View)

Precipitation in Snoqualmie Falls revisited. Fitting a two-component Gaussian mixture; examining the fitted distribution; checking calibration. Using cross-validation to select the number of components to use. Examination of the selected mixture model. Suspicious patterns in the parameters of the selected model. Approximating complicated distributions vs. revealing hidden structure. Using bootstrap hypothesis testing to select the number of mixture components. The multivariate Gaussian distribution: definition, relation to the univariate or scalar Gaussian distribution; effect of linear transformations on the parameters; plotting probability density contours in two dimensions; using eigenvalues and eigenvectors to understand the geometry of multivariate Gaussians; estimation by maximum likelihood; computational aspects, specifically in R.
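
For a self-contained sketch of the cross-validation step, here is one way to do it with the mixtools package; this is not the class's code, and the made-up numbers below merely stand in for the Snoqualmie precipitation values.

    library(mixtools)
    set.seed(3)
    fake.snoq <- c(rgamma(500, shape = 0.5, scale = 10), rgamma(500, shape = 5, scale = 20))
    train <- sample(length(fake.snoq), size = 500)
    held.out.loglik <- function(k) {
      fit <- normalmixEM(fake.snoq[train], k = k, maxit = 1000)
      mix.dens <- function(x)            # density of the fitted k-component mixture
        rowSums(sapply(1:k, function(j) fit$lambda[j] * dnorm(x, fit$mu[j], fit$sigma[j])))
      sum(log(mix.dens(fake.snoq[-train])))
    }
    sapply(2:4, held.out.loglik)         # pick the k with the largest held-out log-likelihood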

PDF, R; bootcomp.R (patch graciously provided by Dr. Derek Young)

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 09, 2011 23:51 | permanent link

Mixture Models (Advanced Data Analysis from an Elementary Point of View)

From factor analysis to finite mixture models by allowing the latent variable to be discrete. From kernel density estimation to mixture models by reducing the number of points with copies of the kernel. Probabilistic formulation of mixture models. Geometry: q+1 points define a q-dimensional plane. Clustering. Estimation of mixture models by maximum likelihood, and why it leads to a vicious circle. The expectation-maximization (EM, Baum-Welch) algorithm replaces the vicious circle with iterative approximation. More on the EM algorithm: convexity, Jensen's inequality, optimizing a lower bound, proving that each step of EM increases the likelihood. Mixtures of regressions. Other extensions.
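
To make the E and M steps concrete, here is a bare-bones implementation for a two-component Gaussian mixture in one dimension, written from scratch for illustration rather than taken from the handout.

    em.2gauss <- function(x, mu = quantile(x, c(0.25, 0.75)), sigma = rep(sd(x), 2),
                          lambda = c(0.5, 0.5), tol = 1e-8, maxit = 1000) {
      loglik <- -Inf
      for (it in 1:maxit) {
        # E step: posterior responsibility of component 1 for each point
        f1 <- lambda[1] * dnorm(x, mu[1], sigma[1])
        f2 <- lambda[2] * dnorm(x, mu[2], sigma[2])
        r <- f1 / (f1 + f2)
        # M step: weighted maximum-likelihood estimates of the parameters
        lambda <- c(mean(r), mean(1 - r))
        mu <- c(weighted.mean(x, r), weighted.mean(x, 1 - r))
        sigma <- c(sqrt(weighted.mean((x - mu[1])^2, r)),
                   sqrt(weighted.mean((x - mu[2])^2, 1 - r)))
        new.loglik <- sum(log(f1 + f2))
        if (new.loglik - loglik < tol) break   # likelihood never decreases, so stop when flat
        loglik <- new.loglik
      }
      list(lambda = lambda, mu = mu, sigma = sigma, loglik = new.loglik)
    }
    em.2gauss(c(rnorm(300, 0, 1), rnorm(300, 4, 2)))   # recovers the two components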

PDF handout

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 09, 2011 23:50 | permanent link

April 08, 2011

"Tie this to your lanyard, Billy Collins"

Let us cleanse the palate after that last unpleasant announcement. My brother Aryaman (the talented one) writes: "A colleague of mine who is interested in pursuing science education after her PhD was directed to a collection of (I think apocryphal) answers to science questions from 5th and 6th graders in Japan. I noticed many of them were almost little haikus. So I took the time to work some into form..."

Ichi.
Broke up molecules,
stuffed with atoms. Broke atoms,
stuffed with explosions

Ni.
People run around
in circles they are crazy.
But planets orbit.

San.
Our sun is a star.
But it changes back to a
Sun in the daytime.

Shi.
A vibration is
motion that cannot decide
which way it should go.

Go.
Some know the time by
looking at the sun. I can't
make out the numbers

Roku.
To some, solutions
are answers. To chemists they
are still all mixed up.

Shichi.
Humidity: when
you are looking for some air
but finding water.

Hachi.
It is so hot some
places that the people there
have to live elsewhere.

Kyuu.
Clouds circling the earth
round and round and round. There is
not much else to do.

Jyuu.
Evaporation
is blamed when people forget
to put the top on.

Jyuu ichi.
Vacuums are nothings.
We mention them to let them
know we know they're there.

Jyuu ni.
Some past animals
became fossils while others
prefer to be oil.

Jyuu san.
Cyanide is this
bad: one drop in a dog's tongue
kills the strongest man

Jyuu shi.
Law of gravity
says no jumping up unless
you will come back down.

Jyuu go.
Genetics explain
why you look like your father
or why you might not.

Jyuu roku.
South America
cold summers and hot winters;
somehow they manage.

What is it with biologists and poetry anyway?

The Commonwealth of Letters

Posted by crshalizi at April 08, 2011 19:25 | permanent link

April 07, 2011

Irrelevant Along Many Dimensions (Next Week at the Statistics Seminar)

Attention conservation notice: Of no use unless you care about mathematical statistics, and will be in Pittsburgh on Monday.

As I have had a number of occasions to tell the kids this semester, and will certainly repeat later, one of the most valuable things a data analyst can know is that some variables have nothing to do with each other. (Visions of the totality of interconnections making up the Cosmic All are for higher beings, like the Arisians, Marxist literary critics, and the Medium Lobster, not mere empiricists.) This is not at all easy when confronting high-dimensional data, and so I am especially pleased by the topic of next week's seminar.

Gabor Székely, National Science Foundation, "Testing Independence for High Dimensional Data with Application to Time Series"
Abstract: Brownian distance correlation was introduced about six years ago by G.J. Szekely. This correlation characterizes independence and determines a consistent test of multivariate independence for random vectors in arbitrary dimension. In this talk a modified Brownian distance correlation is proposed and applied to the problem of testing independence of random vectors in high dimension. The distribution of a simple transformation of the test statistic converges to Student t as dimension tends to infinity for any fixed sample size. Thus we obtain a distance correlation t test for independence of random vectors in arbitrarily high dimension, applicable under very general conditions. One of the important applications is testing independence of two time series.
Place and time: Scaife Hall 125, 4--5 pm on Monday, 11 April 2011

As always, the talk is free and open to the public.

Those of you wishing to follow along at home may find it enlightening to read "Brownian distance covariance" (arxiv:1010.0297) by Székely and Rizzo, along with the commentaries linked there — all published, I can't resist pointing out, in the Annals of Applied Statistics.
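
Following along at home is easy in R, since Rizzo and Székely's energy package implements distance covariance and correlation; here is a toy example of my own in which the dependence is invisible to ordinary correlation:

    library(energy)
    set.seed(4)
    n <- 200
    x <- matrix(rnorm(n * 5), n, 5)
    y <- x[, 1]^2 + matrix(rnorm(n * 3), n, 3)  # depends on x, but is uncorrelated with it
    cor(x[, 1], y[, 1])                         # ordinary correlation: near zero
    dcor(x, y)                                  # distance correlation picks up the dependence
    dcov.test(x, y, R = 199)                    # permutation test of independence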

Update, 8 April: Due to the looming uncertainty about whether we will have a functioning National Science Foundation, the talk has been canceled. So, this is another Bad Thing which I blame on the wingnuts' apocalyptic fear of poor women having contraceptives. (I do not of course speak for the statistics department, for CMU, or for Dr. Székely.)

Enigmas of Chance

Posted by crshalizi at April 07, 2011 20:30 | permanent link

April 03, 2011

"When Bayesians Can't Handle the Truth"

Attention conservation notice: Self-promotional; only of interest if you care about theoretical statistics and will be in the Boston area on Monday.

A talk, based on the Bayes < Darwin-Wallace paper.

"When Bayesians Can't Handle the Truth"
Harvard Statistics Colloquium
Abstract: When should a frequentist expect Bayesian updating to work? There are elegant results on the consistency of Bayesian updating for well-specified models facing IID or Markovian data, but both completely correct models and fully observed states are vanishingly rare. In this talk, I give conditions for posterior convergence that hold when the prior excludes the truth, which may have complex dependencies. The key dynamical assumption is the convergence of time-averaged log likelihoods (Shannon-McMillan-Breiman property). The main statistical assumption is building into the prior a form of capacity control related to the method of sieves. With these, I derive posterior convergence and a large deviations principle for the posterior, even in infinite-dimensional hypothesis spaces, extending in some cases to the rates of convergence; and clarify the role of the prior and of model averaging as regularization devices. (Paper)
Time and place: 4 pm on Monday, 4 April 2011, Science Center Room 309

Self-centered; Bayes, anti-Bayes

Posted by crshalizi at April 03, 2011 17:10 | permanent link

April 01, 2011

One Hand Washes the Other

Attention conservation notice: I was going to follow Andy's example, and write a statistical April Fool's post on why parametric models are superior to non-parametric ones in every way; but this popped out instead.

Proposition: God only ever judges a creature for an offense against another creature, not against God.

Proof: Every being is either created or not; but the only uncreated being is God. Every offense is therefore against another creature or against God. If God were to punish a creature for a sin against God, the deity would be at once plaintiff and judge (there being no division or disunity within God); yet even human justice recognizes that no one should decide their own case. Divine justice being perfect, the proposition follows.

Remark: this leaves open the possibility that a creature who sins against God could rightly be judged for this by another creature, if the latter could be impartial. This suggests that Satan's role has been misunderstood: the Adversary must be a disinterested gentleman with no love of his creator, because only such a being could judge impartially between God and God's creatures.

Rétrolien manuel: Anniceris

Modest Proposals

Posted by crshalizi at April 01, 2011 12:00 | permanent link

March 31, 2011

Books to Read While the Algae Grow in Your Fur, March 2011

Attention conservation notice: I have no taste.
Scott Snyder, Rafael Albuquerque and Stephen King, American Vampire
Comic-book mind-candy. Very nice art and a promising story which would have been better if sentiment had been more ruthlessly repressed.
Jeff Parker and Tom Fowler, Mysterius the Unfathomable
Comic-book mind-candy. I will never look at Dr. Seuss in quite the same way again.
Colin de la Higuera, Grammatical Inference: Learning Automata and Grammars
The best, and perhaps even the only, available textbook on grammatical inference. Unlike statistical language modeling, where we just aim at getting good probability estimates and the like, and where students are well-served by books like Charniak and Manning and Schutze, the goal here is, more ambitiously, to recover the language as such, or perhaps even a particular grammatical representation of the language. (Probabilistic finite state machines, on which de la Higuera has written some important papers, are however discussed at some length.) While this goes back almost as far as the study of formal languages as such, this is the best attempt I've seen at drawing the various scattered literatures together into a graspable shape. In principle the book is self-contained, given basic competence in computer science (not programming!), but some acquaintance with formal languages and automata, at the Lewis and Papadimitriou or Hopcroft and Ullman level would be a good idea.
There are an annoying number of typos, which unfortunately I did not note as I went along. Despite them, I now find myself entertaining fantasies of teaching a seminar on grammatical inference.
Disclaimer: I reviewed a partial draft manuscript for the publisher in 2007, and they sent me a free copy of the book when it came out.
Jennifer Crusie, Maybe This Time
Picked up on Jo Walton's recommendation. Enjoyable, though I had more trouble accepting the romance working as depicted than I did the ghosts.
Sara Creasy, Song of Scarabaeus
Mind-candy. I like the idea of terraforming by using retroviruses to re-work the planet's native metabolic pathways, but the time-scale seems unduly compressed.
Chad Orzel, How to Teach Physics to Your Dog
If you like "Many Worlds, Many Treats", you will like this; and I dare say if not, then not. As one would guess from the blog post, this is very nice popular science writing: clear, correct, gently funny, and informed by Orzel's experience as an experimental physicist. (Speaking as a recovering theorist, I appreciate this.) I even learned from it: quantum "teleportation" experiments had never made a lot of sense to me before. ("State duplication" might be a better name...) Strongly recommended, even for cat people.
Disclaimer: Chad's an occasional correspondent.
[With abundant thanks to "Uncle Jan" for this!]
Lauren Willig, The Orchid Affair
Kat Richardson, Labyrinth
Sarah Graves, The Face at the Window
Mind-candy in long-running series. Both the Graves and the Richardson are a lot darker than previous installments.
Shaun O'Boyle, Modern Ruins: Portraits of Place in the Mid-Atlantic Region
Photographs, almost entirely in monochrome, of abandoned industrial and institutional edifices in Pennsylvania and New York. Short essays by others on the historical context introduce each section of photographs --- which are the main source of interest here. These get their interest, I think, from the contrast between the composition and the content. The latter is, of course, abandonment and decay: these are places from which the world has moved on, and O'Boyle shows them as they peel, rust and crumble. But the composition containing them is elegant and geometrical; many of them bring to mind Renaissance studies in perspective and ideal form, especially with their vertical grids. The human figure is totally absent. (Unlike some Renaissance studies in perspective and grids.) What now-lost civilization, one is left wondering, built these enigmatic structures scattered across eastern North America? Surely not the degraded and disorganized people found wandering in their shadows...
(The introduction by Geoff Manaugh is recycled BLDGBLOG entries, which is not a bad thing unless one's been re-reading Manaugh.)
Disclaimer: I got a free copy of this book from the publisher through LibraryThing's "early reviewers" program.
W. E. B. DuBois, The Souls of Black Folk [Free Gutenberg versions]
What could I possibly add to all that has been said about this book already?

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; The Pleasures of Detection; The Beloved Republic; Enigmas of Chance; Physics

Posted by crshalizi at March 31, 2011 23:59 | permanent link

March 30, 2011

Factor Analysis (Advanced Data Analysis from an Elementary Point of View)

Adding noise to PCA to get a statistical model. The factor analysis model, or linear regression with unobserved independent variables. Assumptions of the factor analysis model. Implications of the model: observable variables are correlated only through shared factors; "tetrad equations" for one factor models, more general correlation patterns for multiple factors. (Our first look at latent variables and conditional independence.) Geometrically, the factor model says the data have a Gaussian distribution on some low-dimensional plane, plus noise moving them off the plane; and that is all. Estimation by heroic linear algebra; estimation by maximum likelihood. The rotation problem, and why it is unwise to reify factors. Other models which produce the same correlation patterns as factor models; in particular the Thomson sampling model, in which the appearance of factors arises from not knowing what the real variables are or how to measure them.
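
A small synthetic example, separate from the class handout, showing both the basic fit and the rotation problem with R's built-in factanal():

    set.seed(5)
    n <- 500
    f1 <- rnorm(n); f2 <- rnorm(n)                 # two unobserved factors
    L <- cbind(c(0.8, 0.7, 0.6, 0.1, 0.1, 0.1),    # made-up loading matrix
               c(0.1, 0.1, 0.1, 0.8, 0.7, 0.6))
    X <- cbind(f1, f2) %*% t(L) + matrix(rnorm(n * 6, sd = 0.5), n, 6)
    colnames(X) <- paste0("x", 1:6)
    fa.none <- factanal(X, factors = 2, rotation = "none")
    fa.vmax <- factanal(X, factors = 2, rotation = "varimax")
    all.equal(fa.none$uniquenesses, fa.vmax$uniquenesses)  # TRUE: the very same fit...
    round(cbind(fa.none$loadings, fa.vmax$loadings), 2)    # ...with different "factors"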

PDF handout; lecture-18.R computational examples you should step through (not done in class); correlates of sleep in mammals data set for those examples; thomson-model.R

Update, 9 April: A correspondent points me to this tweet, in what I can only call a "let's you and him fight" spirit. While the implicit charge against me by Adams is not without some justice, if you don't want this to happen, you really shouldn't brag about how many beauty pageants your child has won, or for that matter dress the poor beast in such funny clothes.

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at March 30, 2011 23:06 | permanent link

Principal Components Analysis (Advanced Data Analysis from an Elementary Point of View)

Principal components: the simplest, oldest and most robust of dimensionality-reduction techniques. PCA works by finding the line (plane, hyperplane) which passes closest, on average, to all of the data points. This is equivalent to maximizing the variance of the coordinates of projections on to the line/plane/hyperplane. Actually finding those principal components reduces to finding eigenvalues and eigenvectors of the sample covariance matrix. Why PCA is a data-analytic technique, and not a form of statistical inference. An example with cars. PCA with words: "latent semantic analysis"; an example with real newspaper articles. Visualization with PCA and multidimensional scaling. Cautions about PCA; the perils of reification; illustration with genetic maps.
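
For a quick taste with data that ships with R (not the cars or newspaper examples from the notes):

    pca <- prcomp(USArrests, scale. = TRUE)  # center and scale, then find the components
    summary(pca)                             # share of variance along each component
    pca$rotation[, 1:2]                      # loadings: how each variable enters PC1 and PC2
    head(pca$x[, 1:2])                       # the states' coordinates ("scores") on PC1 and PC2
    biplot(pca)                              # crude joint picture of scores and loadings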

PDF handout, pca.R for examples, cars data set, R workspace for the New York Times examples

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at March 30, 2011 23:05 | permanent link

Fair's Affairs, or, Adultery 1969 (Advanced Data Analysis from an Elementary Point of View)

Background on the 1969 Psychology Today survey and Fair's theory of optimal adultery. Logistic regression for counts vs. logistic regression for binary outcomes. Comparison of predictions on a qualitative level. Quantitative comparison of predicted probabilities of adultery across models. Checking calibration. Sanity-checking of the specification. Scientific evaluation of the models. Does it make sense to keep analyzing this data?

Assignment; Solutions

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at March 30, 2011 23:04 | permanent link

Diabetes among the Pima (Advanced Data Analysis from an Elementary Point of View)

Background on the long-term study of diabetes among the Pima in Arizona. Data-set cleaning. (Note that while about half the records contain physically impossible values, this is routinely used as an example of a data set with no missing values in computer science.) Fitting logistic regression for a binary outcome. Calculations with fitted logistic regression models. Separating associations between variables from significant predictors. Model-checking.
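
The assignment's data and solutions are linked below; for readers without them, the MASS package ships a cleaned-up subset of the same Pima records, which is enough to see the basic fitting and prediction steps. A sketch, not the solution set:

    library(MASS)                                   # provides Pima.tr and Pima.te
    fit <- glm(type ~ ., data = Pima.tr, family = binomial)
    summary(fit)$coefficients                       # estimates on the log-odds scale
    p.hat <- predict(fit, newdata = Pima.te, type = "response")
    head(p.hat)                                     # predicted probabilities of diabetes
    mean((p.hat > 0.5) == (Pima.te$type == "Yes"))  # crude classification accuracy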

Assignment, solutions

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at March 30, 2011 23:03 | permanent link

A Worked Generalized Linear Model Example (Advanced Data Analysis from an Elementary Point of View)

Building a weather forecaster for Snoqualmie Falls, Wash., with logistic regression. Exploratory examination of the data. Predicting wet or dry days from the amount of precipitation the previous day. First logistic regression model. Finding predicted probabilities and confidence intervals for them. Comparison to spline smoothing and a generalized additive model. Model comparison test detects significant mis-specification. Re-specifying the model: dry days are special. The second logistic regression model and its comparison to the data. Checking the calibration of the second model.
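
The linked R file does this for the real Snoqualmie data; purely as a generic illustration of the calibration check, here is the bin-and-compare recipe applied to simulated rain (everything below is made up):

    set.seed(6)
    n <- 5000
    yesterday <- rexp(n, rate = 1/10)             # fake precipitation amounts
    wet <- rbinom(n, 1, plogis(-1 + 0.15 * yesterday))
    fit <- glm(wet ~ yesterday, family = binomial)
    p.hat <- fitted(fit)
    bins <- cut(p.hat, breaks = quantile(p.hat, probs = seq(0, 1, by = 0.1)),
                include.lowest = TRUE)
    cbind(predicted = tapply(p.hat, bins, mean),  # average forecast in each bin
          observed  = tapply(wet, bins, mean))    # observed frequency of wet days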

PDF handout, snoqualmie.csv data set, R

Manual trackback: SnoValley Star (!)

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at March 30, 2011 23:02 | permanent link

Logistic Regression (Advanced Data Analysis from an Elementary Point of View)

Modeling conditional probabilities; using regression to model probabilities; transforming probabilities to work better with regression; the logistic regression model. Maximum likelihood for logistic regression; numerical maximum likelihood by Newton's method and by iteratively re-weighted least squares. Logistic-additive models as a non-parametric alternative (which you should probably use unless you have very definite reasons); bootstrap specification testing for logistic regression.
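
Since the notes dwell on iteratively re-weighted least squares, here is the idea written out in from-scratch R, a sketch for exposition rather than the notes' code, checked against glm():

    irls.logistic <- function(X, y, tol = 1e-10, maxit = 50) {
      X <- cbind(Intercept = 1, X)
      beta <- rep(0, ncol(X))
      for (it in 1:maxit) {
        eta <- drop(X %*% beta)
        p <- plogis(eta)
        w <- p * (1 - p)                       # weights from the current fit
        z <- eta + (y - p) / w                 # the "working response"
        beta.new <- solve(crossprod(X, w * X), crossprod(X, w * z))
        if (max(abs(beta.new - beta)) < tol) { beta <- beta.new; break }
        beta <- beta.new
      }
      drop(beta)
    }
    set.seed(7)
    x <- matrix(rnorm(500 * 2), 500, 2)
    y <- rbinom(500, 1, plogis(1 + x %*% c(2, -1)))
    rbind(irls = irls.logistic(x, y), glm = coef(glm(y ~ x, family = binomial)))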

PDF notes, incorporating R examples

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at March 30, 2011 23:01 | permanent link

"Effect of Influential Observations on Penalized Linear Regression Estimators" (This Week at the Statistics Seminar)

It is only appropriate that a talk about influential outliers be held on an unusual day, at an unusual time and place:

Karen Kafadar, "Effect of Influential Observations on Penalized Linear Regression Estimators"
Abstract: In current problems (e.g. microarrays, financial data) where the number of variables can greatly exceed the number of observations ("big p, small n"), penalized regression has been advocated as a way to identify informative variables by setting to zero a large subset of the regression coefficients. This approach to "model selection" aims for good fits to the data, but often attempts are made to interpret the resulting nonzero coefficients. With squared error loss and an L1 penalty (sum of the magnitudes of the regression coefficients, or "LASSO"), the resulting model can be highly sensitive to potential outliers, in either the response variable or the design space. In this study, we examine the effect of influential points (outliers and leverage points) on L1-penalized regression estimators, when the loss function is the usual L2 squared error loss, the biweight loss function, or the MM loss function, and show the advantages of a robust loss function to reduce the effect of influential points in the simple case of linear regression when the proportion of non-zero coefficients is less than 20 percent.
Joint work with Guilherme V. Rocha.
Time and place: 12:30--1:30 on Friday, 1 April 2011 in Rangos 2, University Center

As always, the talk is free and open to the public.

Enigmas of Chance

Posted by crshalizi at March 30, 2011 23:00 | permanent link

Just Another Hangnail on the Hidden Hand

It has long been one of my ambitions to be denounced as a tentacle of a mysterious and shadowy conspiracy bent on global domination. Reading this therefore fills me with conflicting emotions. On the one hand, satisfaction of a cherished dream; on the other, is that all there is to a conspiracy? But predominating at the moment is regret — for I turned down my invitation to the conference at Bretton Woods because it conflicted with my teaching schedule. In retrospect, this was dumb. Surely the kids could have taken care of themselves for a week, while I joined the rest of the Immense Shadowy Global Conspiracy in hatching nefarious schemes? How could I have passed up an opportunity to commune with sinister intellects in the wild hills of New England in order to teach? How, when I spent days on end this month reviewing grant proposals for them, did I fail to spot the question on the evaluation forms about "Potential to advance the sinister designs of Mr. Soros and his associates who must not be named"? How could I have asked for so little in my grant application, when clearly any proper subversive conspiracy could have paid for so, so much more?

I take comfort only in the fact that I will, after all, be lecturing that week on how to detect the hidden common causes linking apparently disparate events — and there's always next year to go and explain how combining ergodic theory and statistical learning methods will let us take over the world.

Running Dogs of Reaction; Psychoceramica; The Dismal Science; Self-centered

Posted by crshalizi at March 30, 2011 22:30 | permanent link

March 16, 2011

SocInfo 2011 (Dept. of Signal Amplification)

As part of the program committee, it behooves me to boost our own signal. I won't reproduce the whole of the call for papers, but just some of the key points:

3rd International Conference on Social Informatics [SocInfo 2011]
6--8 October 2011, Singapore
Abstracts due 7 April, full papers 15 April
The International Conference on Social Informatics (SocInfo) is an interdisciplinary venue for researchers from informatics and the social & management sciences to come together to share ideas and opinions, and present original research work. The goal is to create an opportunity for the dissemination of knowledge between the two communities, as well as to enable mutual critical discussion of current research.
The analysis, modeling and simulation of complex social phenomena, including Web2.0 applications, using well-established models such as social networks, is of particular interest to SocInfo2011. Enhancement of established social models may lead to breakthroughs in the design of algorithms or information systems that rely on social participation and social mechanisms. Behavioral game theory and realistic social simulation may be the road to the development of new social models for social information systems.
Web mining, text mining, natural language processing, opinion mining and sentiment analysis are the domains that attempt to exploit information available in the World Wide Web for a better understanding of social phenomena. Such research can benefit from the state-of-the-art knowledge in the social sciences, and is therefore solicited for SocInfo.
Applications of social concepts in concrete information systems such as web enterprises, enterprise management, e-governance and service sector, virtual environments such as multiplayer online games, multi-agent systems, e-commerce (including trust management and reputation systems) environments for support of teamwork, argumentation and debate, or creative work, will be of interest. The extensions of established knowledge in recommendation systems, collaborative applications (including collaborative filtering and tagging), distributed AI or multi-agent systems for the purpose of incorporating social mechanisms are also of interest to the conference.

Signal Amplification; Networks; The Collective Use and Evolution of Concepts

Posted by crshalizi at March 16, 2011 19:00 | permanent link

The Distribution of Library Book Circulation Is Not a Power Law, or, Gauss and Man at Huddersfield

Via Bill Tozier comes news of this blog post by Eric Hellman, which is part of a controversy over how libraries should pay publishers for electronic books. I have not thought or studied that enough to have any sort of opinion, though since it seems to go very, very far from marginal cost pricing, I am naturally suspicious. Be that as it may, Hellman suggests that the specific proposal of Harper Collins needs to be seen in the light of the "long tail" of the circulation distribution of library books. That is, most books circulate very little, while a few circulate an awful lot, accounting for a truly disproportionate share of the circulation, and (says Hellman) the Harper Collins proposal would, compared to the status quo, shift library funds from the publishers of the large mass of low-circulation books to the publishers of the tail of high-circulation books, Harper Collins prominent among them.

To support this, Hellman uses some data from a very rich data set released by the libraries of the University of Huddersfield in England. (I have been meaning to look this up since Magistra et Mater mentioned the cool stuff Huddersfield is doing with library data-mining.) Hellman's analysis went as follows. (See his post for details.) He made a cumulative histogram of how often each book had circulated (binning counts over 100 by tens); plotted it on a log-log scale; fit a straight line by least squares; and declared the distribution a power law because the R^2 was so high.

Constant readers can imagine my reaction.

Having gotten hold of the data, I plotted the cumulative distribution function (without binning), and including both Hellman's power law (purple), and a discrete log-normal distribution (red):

This is clearly heavy tailed and massively skewed to the right. It is equally clear that it is not a power law: there are simply orders of magnitude too few books which circulated 500 or 1000 or 2000 times. (Remember that the vertical axis here is on a log scale.) The difference in log-likelihoods is 200 in favor of log-normal, i.e., the data were e^200 times less likely under the power law. Applying the non-nested model comparison test from my paper with Aaron and Mark, the chance of this big a difference in likelihoods arising through fluctuations when the power law is actually as good or better than the log-normal is about 10^-35. I have not attempted to see whether the deviations from the log-normal curve are significant, but it does look quite good over almost the whole range of the data. There could be some systematic departures at the far right, but over-all it looks like Gauss is not mocked at Huddersfield.

I should say right away that Hellman was very gracious in our correspondence about this (I am after all a quibbling pedant quite unknown to him). More importantly, his analysis of the Harper Collins proposal does not, that I can see, depend at all on circulation following a power law; it just has to be strongly skewed to the right. That being the case, I hope this particular power law can be eradicated before it has a chance to become endemic in such permanent reservoirs of memetic infection as the business literature and Physica A.

Update, next day: In the interests of reproducibility, the circulation totals for the data (gzipped), and the R code for my figure. The latter needs the code from our paper, which I will turn into a proper R package Any Time Now.
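
For readers who come to this later and do not want to wait for that package: the independent (and more recent) poweRlaw package will do the same style of comparison. A hypothetical sketch, assuming the circulation totals have been saved under the made-up file name below:

    library(poweRlaw)
    circ <- scan("huddersfield-circulation.txt")  # hypothetical file name
    circ <- circ[circ > 0]                        # discrete fits need positive counts
    m.pl <- displ$new(circ)                       # discrete power law
    m.pl$setXmin(estimate_xmin(m.pl))             # estimate xmin and the scaling exponent
    m.ln <- dislnorm$new(circ)                    # discrete log-normal
    m.ln$setXmin(m.pl$getXmin())                  # compare the two fits over the same tail
    m.ln$setPars(estimate_pars(m.ln))
    compare_distributions(m.pl, m.ln)             # Vuong-style likelihood-ratio comparison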

Previously on "Those That Resemble Power Laws from a Distance": the link distribution of weblogs (and again); the distribution of time taken to reply to e-mail; the link distribution of biochemical networks; urban economies in the US.

Power Laws

Posted by crshalizi at March 16, 2011 18:45 | permanent link

ICCAI 2011 (Dept. of Signal Amplification)

A.k.a. the 10th International Conference on Complexity in Acute Illness, is happening in Bonn, 9--11 September. "Complexity and" or "Nonlinear dynamics and" conferences often have a lot of fluff, but one of the organizers of this is my friend Sven Zenker. Unsurprisingly, therefore, this actually looks interesting and substantive, and so perhaps of interest to readers. It also gives me an excuse to link to one of Sven's interesting papers about combining serious physiological modeling with modern statistical tools. Among other virtues, this is methodologically interesting in showing a way to learn not just from a non-identifiable model, but from the way in which it fails to be identified. I have been meaning to do this since first hearing him talk about it in July 2007...

Signal Amplification; Complexity

Posted by crshalizi at March 16, 2011 18:30 | permanent link

March 06, 2011

Aporias of the Efficiency Frontier (Why Oh Why Can't We Have a Better Academic Publishing System?)

Attention conservation notice: 350+ grumbled words about the price of academic papers, a topic of little moment even to scholars directly involved, let alone anyone with a sense of perspective.

One of the fundamental principles of economics is the virtue of marginal cost pricing: everything should be sold for just the cost of producing one extra unit. Prices below marginal cost are obviously bad for the sellers — you can't keep losing money on every sale and keep producing — but prices above marginal costs mean that the good will be consumed less than it ought to be, than its actual utility warrants.

With this in mind, let us look at the National Bureau of Economic Research, which is about as good an embodiment of the discipline's core as one could hope to find. One of its key outputs is working papers. To read these, you need either an institutional subscription, or you need to pay $5 per download. This price is orders of magnitude more than the marginal cost of serving a few hundred more kilobytes of PDF*. It is literally a textbook Econ. 1 result that NBER is ensuring its own economic research will be under-consumed. It isn't even recovering the fixed costs of production through average-cost pricing, since those costs are paid not by NBER (which is a non-profit largely funded by grants anyway) but by the authors of the papers. Rather, it confirms Healy's law, that each "discipline's organizational life inverts its core intellectual commitments".

(This rant was brought to you by wanting to read this paper, mentioned by Krugman. CMU has a subscription, so I can, but it is senseless that twenty years after the beginning of the arxiv, a central organization of the discipline of economics prices its preprints this way.)

Update, later that day: As several people have written me to point out, the authors of the paper in question have a free PDF version online. This does not make NBER's policy any more efficient or even sensible; quite the contrary. Fortunately, I am not the kind of man who goes around making revealed preference arguments.

Update, 23 March: Hopefully, the good example of the Brookings Institution will help establish a norm, and shame NBER into adopting marginal cost pricing.

*: Pair.com will sell you 240 GB/month for $50, which works out to something like 0.02 cents for a 1 MB paper, which would be quite large or graphics-heavy. I decline to believe that NBER is being ripped off by their Internet service provider by a factor of 25,000. (This is not an endorsement of Pair's hosting services, or even a claim that their prices are especially good.)

The Dismal Science; Learned Folly

Posted by crshalizi at March 06, 2011 13:40 | permanent link

March 04, 2011

Structured Sparsity: Learning and Inference (Dept. of Signal Amplification)

In statistics, we say that a high-dimensional model is "sparse" if most of the large numbers of variables do not actually contribute to the outcome --- the true set of relevant predictors is small compared to the number of covariates. Some of the most interesting work in statistics and machine learning over the last decade and a half has been about finding and using sparsity, often starting from ideas like the lasso, but becoming considerably more general and flexible, and connecting to ideas about compressed sensing. (I will probably never get around to writing a post about SpAM, but may yet turn it into a homework problem; I still have hopes about TESLA.) Exploiting sparsity is one of the principal ways of lifting the curse of dimensionality, which otherwise weighs on us more and more every year.
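
To make the sparsity idea concrete, here is the lasso on a toy problem with a thousand candidate predictors of which only five matter; my example, nothing to do with the workshop:

    library(glmnet)
    set.seed(8)
    n <- 200; p <- 1000
    X <- matrix(rnorm(n * p), n, p)
    beta <- c(rep(2, 5), rep(0, p - 5))            # only the first 5 coefficients are nonzero
    y <- drop(X %*% beta + rnorm(n))
    cvfit <- cv.glmnet(X, y)                       # lasso path plus cross-validation
    b.hat <- as.numeric(coef(cvfit, s = "lambda.min"))[-1]
    which(b.hat != 0)                              # the 5 real predictors, plus perhaps a few strays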

Now comes a great-looking workshop on structured sparsity at ICML, organized by Francis Bach, Mladen Kolar, Han Liu, Guillaume Obozinski and Eric Xing:

The aim of the workshop is to bring together theory and practice in modeling and exploring structure in high-dimensional data. Participation of researchers working on methodology, theory and applications, both from the frequentist and Bayesian point of view is strongly encouraged in order to discuss different approaches for tackling challenging high-dimensional problems. Furthermore, the workshop will link with the signal processing community, which has worked on similar topics and with whom exchanges of ideas will be very fruitful. We encourage genuine interaction between proponents of different approaches and hope to better understand possibilities for modeling of structure in high dimensional data. We invite submissions on various aspects of structured sparse modeling in high-dimensions. Here is an example of two key questions:
  • How can we automatically learn the hidden structure from the data?
  • Once the structure is learned or pre-given, how can we utilize the structure to conduct more effective inference?
See the full call for papers for more details and submission information.

(I remember when Han took stochastic processes from me --- how can he be organizing workshops?)

Enigmas of Chance; Signal Amplification

Posted by crshalizi at March 04, 2011 01:44 | permanent link

March 02, 2011

Midterm: Urban Scaling, Continued (Advanced Data Analysis from an Elementary Point of View)

In which we compare the power-law scaling model of urban economies due to Bettencourt et al. to an alternative in which city size is actually irrelevant.

This was a one-week take-home exam, intended to use more or less everything taught so far.

Assignment; master data file (each student got a different random perturbation of this).

Solution: PDF, R

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at March 02, 2011 17:10 | permanent link

Nice Demo City, But Will It Scale? (Advanced Data Analysis from an Elementary Point of View)

In which we estimate and test the power-law scaling model of urban economies due to Bettencourt et al.

Assignment; data files: gmp_2006.csv, pcgmp_2006.csv

Solution: PDF, R

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at March 02, 2011 17:00 | permanent link

February 28, 2011

Books to Read While the Algae Grow in Your Fur, February 2011

Attention conservation notice: I have no taste.

Jonathan Goodwin and John Holbo (eds.), Reading Graphs, Maps, and Trees: Responses to Franco Moretti
I suppose it's a bit odd for me to recommend this, since I'm a contributor. (Said contribution is this [ == that], with cosmetic alterations.) But apart from that lapse on the part of the organizers, it's very worthwhile. The full text is free online under a Creative Commons license (follow the link), but the printed version is quite handsome.
Robert V. S. Redick, The Red Wolf Conspiracy
Mind-candy. Yet another Big Fat Fantasy Epic; better than average for that sort of thing (good enough that I'll try the sequel), but not recommended for those who aren't already into the sub-genre.
Taylor Anderson, Rising Tides
Mind-candy. "What these lemurs need is a boat-load of vintage honkeys", continued. — It is a striking testament to the moral regeneration of this country in the second half of the twentieth century that, in order to make his white male American heroes sympathetic to a contemporary audience, Anderson gives them views on race and sex which would have marked them as extreme radicals in 1940. (Sequel.)
Walter Jon Williams, Deep State
In which the hero of This Is Not a Game returns to pit her skills against a military dictatorship. Connection to current events is entirely fortuitous. (But WJW would have been a good addition to the Blogs and Bullets workshop.)

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; The Commonwealth of Letters; Writing for Antiquity; Self-Centered

Posted by crshalizi at February 28, 2011 23:59 | permanent link

February 23, 2011

Supporting the Demonstrations in Wisconsin

When I was a student at Madison, I was happy to be part of our union, the Teaching Assistants' Association. They are, naturally, deeply involved in the events in Wisconsin, and I am very proud. If you want to help the demonstrators materially, the TAA will take your money and put it to good use. (It is characteristic, and in a good way, that there is a fund especially for cleaning up the state capitol building afterwards.) And if you're not sure why the fight in Wisconsin matters, well, there are lots of people explaining the many reasons.

To add my little bit, and repeat myself: the single biggest thing which has gone wrong with America during my lifetime has been the economic stagnation for most of the country, accompanied by shifting risk from those who have resources and large organizations to individuals who don't have much. And that has gone hand in hand with the decline --- the repression --- of organized labor. Unions are not perfect, but no human institutions are, and to condemn unions, specifically, because they are sometimes hide-bound or self-serving is either folly or deceit. Unions are the only organized force in this country which seriously advocates, which pushes, for the material interests and dignity of ordinary working people. The fight in Wisconsin is about whether there is, finally, a limit to how far the dismantling of American labor can be pushed.

Manual trackback: Lisa Schweitzer

The Continuing Crises; The Progressive Forces

Posted by crshalizi at February 23, 2011 00:12 | permanent link

February 22, 2011

Your City's a Sucker, My City's a Creep

I'll let the abstract speak for me on this one:

CRS, "Scaling and Hierarchy in Urban Economies", submitted to the Proceedings of the National Academy of Sciences (USA), arxiv:1102.4101
Abstract: In several recent publications, Bettencourt, West and collaborators claim that properties of cities such as gross economic production, personal income, numbers of patents filed, number of crimes committed, etc., show super-linear power-law scaling with total population, while measures of resource use show sub-linear power-law scaling. Re-analysis of the gross economic production and personal income for cities in the United States, however, shows that the data cannot distinguish between power laws and other functional forms, including logarithmic growth, and that size predicts relatively little of the variation between cities. The striking appearance of scaling in previous work is largely an artifact of using extensive quantities (city-wide totals) rather than intensive ones (per-capita rates). The remaining dependence of productivity on city size is explained by concentration of specialist service industries, with high value-added per worker, in larger cities, in accordance with the long-standing economic notion of the "hierarchy of central places".

Figures and calculations were done with this code and data. I realize that's not fully up to spec for reproducible computational science, but I'm getting there.
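
For concreteness, here is a toy version of the extensive-versus-intensive point from the abstract; this is a sketch with made-up numbers, not the code linked above:

    # Synthetic stand-in for the GMP data: per-capita output is (by construction)
    # unrelated to population, yet regressing city-wide totals on population
    # produces a spectacular-looking "scaling law"
    set.seed(1)
    n <- 366
    pop <- exp(runif(n, log(5e4), log(2e7)))    # city populations
    pcgmp <- 3e4 * exp(rnorm(n, 0, 0.3))        # per-capita output, independent of size
    gmp <- pop * pcgmp                          # city-wide totals

    summary(lm(log(pcgmp) ~ log(pop)))$r.squared  # intensive: R^2 near zero
    summary(lm(log(gmp) ~ log(pop)))$r.squared    # extensive: R^2 near one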

(Yes, this is the paper which I started because readers kept asking me questions, and yes, A Fermi Problem in Western Pennsylvania was spun off from the first draft, which was going to be just a blog post. It turns out that the journal is OK with putting submitted manuscripts on arxiv, or at least not too upset.)

Manual trackback: Metadatta; Chris Waggoner

Self-Centered; Enigmas of Chance; The Dismal Science

Posted by crshalizi at February 22, 2011 00:05 | permanent link

February 21, 2011

Reflections on the Revolutions in North Africa

Reading this interesting post on why protests can bring down authoritarian regimes, and a response distinguishing how long a regime happens to survive from how able it is to withstand crises, I can't help thinking of what Mr. Hume would say; or rather, had said:

NOTHING appears more surprizing to those, who consider human affairs with a philosophical eye, than the easiness with which the many are governed by the few; and the implicit submission, with which men resign their own sentiments and passions to those of their rulers. When we enquire by what means this wonder is effected, we shall find, that, as FORCE is always on the side of the governed, the governors have nothing to support them but opinion. It is therefore, on opinion only that government is founded; and this maxim extends to the most despotic and most military governments, as well as to the most free and most popular. The soldan of EGYPT, or the emperor of ROME, might drive his harmless subjects, like brute beasts, against their sentiments and inclination: But he must, at least, have led his mamalukes, or prætorian bands, like men, by their opinion.

Opinion is of two kinds, to wit, opinion of INTEREST, and opinion of RIGHT. By opinion of interest, I chiefly understand the sense of the general advantage which is reaped from government; together with the persuasion, that the particular government, which is established, is equally advantageous with any other that could easily be settled. When this opinion prevails among the generality of a state, or among those who have the force in their hands, it gives great security to any government.

Right is of two kinds, right to POWER and right to PROPERTY. What prevalence opinion of the first kind has over mankind, may easily be understood, by observing the attachment which all nations have to their ancient government, and even to those names, which have had the sanction of antiquity. Antiquity always begets the opinion of right; and whatever disadvantageous sentiments we may entertain of mankind, they are always found to be prodigal both of blood and treasure in the maintenance of public justice. There is, indeed, no particular, in which, at first sight, there may appear a greater contradiction in the frame of the human mind than the present. When men act in a faction, they are apt, without shame or remorse, to neglect all the ties of honour and morality, in order to serve their party; and yet, when a faction is formed upon a point of right or principle, there is no occasion, where men discover a greater obstinacy, and a more determined sense of justice and equity. The same social disposition of mankind is the cause of these contradictory appearances.

It is sufficiently understood, that the opinion of right to property is of moment in all matters of government. A noted author has made property the foundation of all government; and most of our political writers seem inclined to follow him in that particular. This is carrying the matter too far; but still it must be owned, that the opinion of right to property has a great influence in this subject.

Upon these three opinions, therefore, of public interest, of right to power, and of right to property, are all governments founded, and all authority of the few over the many. There are indeed other principles, which add force to these, and determine, limit, or alter their operation; such as self-interest, fear, and affection: But still we may assert, that these other principles can have no influence alone, but suppose the antecedent influence of those opinions above-mentioned. They are, therefore, to be esteemed the secondary, not the original principles of government.

For, first, as to self-interest, by which I mean the expectation of particular rewards, distinct from the general protection which we receive from government, it is evident that the magistrate's authority must be antecedently established, at least be hoped for, in order to produce this expectation. The prospect of reward may augment his authority with regard to some particular persons; but can never give birth to it, with regard to the public. Men naturally look for the greatest favours from their friends and acquaintance; and therefore, the hopes of any considerable number of the state would never center in any particular set of men, if these men had no other title to magistracy, and had no separate influence over the opinions of mankind. The same observation may be extended to the other two principles of fear and affection. No man would have any reason to fear the fury of a tyrant, if he had no authority over any but from fear; since, as a single man, his bodily force can reach but a small way, and all the farther power he possesses must be founded either on our own opinion, or on the presumed opinion of others. And though affection to wisdom and virtue in a sovereign extends very far, and has great influence; yet he must antecedently be supposed invested with a public character, otherwise the public esteem will serve him in no stead, nor will his virtue have any influence beyond a narrow sphere.

A Government may endure for several ages, though the balance of power, and the balance of property do not coincide. This chiefly happens, where any rank or order of the state has acquired a large share in the property; but from the original constitution of the government, has no share in the power. Under what pretence would any individual of that order assume authority in public affairs? As men are commonly much attached to their ancient government, it is not to be expected, that the public would ever favour such usurpations. But where the original constitution allows any share of power, though small, to an order of men, who possess a large share of the property, it is easy for them gradually to stretch their authority, and bring the balance of power to coincide with that of property.

This leaves open, of course, how anyone, subject or mamaluke, learns the opinions of their fellows regarding rights and interests; but this is one thing public political action is for.

Applications to other contemporary events, in which subjects cease to let themselves be led like brute beasts, will occur to my learned and sagacious readers, and so I will not belabor the obvious.

The Continuing Crises

Posted by crshalizi at February 21, 2011 17:13 | permanent link

February 17, 2011

Blogs and Bullets in Palo Alto

This coming Thursday (Feb. 24th), I'll be at the Blogs and Bullets 2011 conference at Stanford, being organized around the eponymous report for the United States Institute of Peace. (I imagine I'll have more to say about one than the other.) It's an invitation-only workshop, but if readers in the Palo Alto area would like to get in touch the next day, drop me a line; I will have free time but no car.

Commit a Social Science; Networks; Self-Centered

Posted by crshalizi at February 17, 2011 21:45 | permanent link

Additive Models (Advanced Data Analysis from an Elementary Point of View)

The curse of dimensionality limits the usefulness of fully non-parametric regression in problems with many variables: bias remains under control, but variance grows rapidly with dimensionality. (The number of points required to pin down a [hyper-]surface to within a given tolerance grows exponentially in the number of dimensions.) Parametric models do not have this problem, but have bias and do not let us discover anything about the true function. Structured or constrained non-parametric regression compromises, by adding some bias so as to reduce variance. Additive models are an example, where each input variable gets its own "partial response function", and these add together to give the total regression function; the partial response functions are otherwise arbitrary. Additive models include linear models as a special case, but still evade the curse of dimensionality. Visualization and interpretation of additive models by display of the partial response functions. Fitting additive models is done iteratively, starting with some initial guess about each partial response function and then doing one-dimensional smoothing, so that the guesses correct each other until a self-consistent solution is reached. Incorporation of parametric terms, and interactions by joint smoothing of subsets of variables. Examples in R using the California house-price data. Conclusion: there is hardly ever any reason to prefer linear models to additive ones, and the continued thoughtless use of linear regression is a scandal.
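
For those who want to play along at home, a minimal sketch of fitting an additive model with the mgcv package; the notes use the real house-price data, but the data frame and variable names here are invented:

    library(mgcv)   # one implementation of additive models, via penalized splines

    # Invented stand-in for a house-price data set
    set.seed(2)
    n <- 500
    houses <- data.frame(income = rlnorm(n, 3, 0.3), rooms = runif(n, 2, 10))
    houses$price <- 5e4 + 3e4 * log(houses$income) + 1e4 * sqrt(houses$rooms) +
      rnorm(n, 0, 2e4)

    # Each input gets its own smooth partial response function
    fit <- gam(price ~ s(income) + s(rooms), data = houses)
    plot(fit, pages = 1)   # display the estimated partial response functions
    summary(fit)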

PDF notes, incorporating R examples

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 17, 2011 21:30 | permanent link

February 16, 2011

Re-Writing Your Code (Advanced Data Analysis from an Elementary Point of View)

An extended example of re-writing code to make it more powerful, flexible, and clear, based on in-class discussion.

Calculating a standard error for the median of a particular Gaussian sample by repeated simulation, "manually" at the R console. Writing a function to automate this task, with everything hard-coded. Adjusting the function to let the number of simulation runs be an argument. Writing a parallel function to do the same job for an exponential distribution. Since this is almost entirely the same, why have two functions? Putting in a logical switch between hard-coded options. Better approach: abstract out the simulation into a separate function, and make the simulator an argument to the standard-error-in-median function. Example of applying the latter function to a much more complicated simulator. Advantages of the modular approach: flexibility, clarity, ease of adjustment. Example: removing a for loop in favor of replicate in the find-the-standard-error function, without having to change any of the simulators. Writing parallel functions to find the interquartile range of the median, or the standard error of the mean. Repeating the process of abstraction: the common element is taking a simulator, estimating some property of the simulation, and summarizing the simulated distribution. All three tasks are logically distinct and should be performed by separate functions. Reduction of bootstrapping to a two-line function taking other functions as arguments.
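
The end-point of the refactoring looks something like the following sketch (the function names are mine, not necessarily those from class):

    # Simulators: each returns one simulated data set, everything else fixed
    sim.gaussian <- function(n = 100) rnorm(n, mean = 0, sd = 1)
    sim.exponential <- function(n = 100) rexp(n, rate = 1)

    # Simulate the sampling distribution of any estimator under any simulator
    simulate.estimates <- function(simulator, estimator, B = 1000) {
      replicate(B, estimator(simulator()))
    }

    # Summaries of that distribution are just more functions
    se.of <- function(simulator, estimator, B = 1000) {
      sd(simulate.estimates(simulator, estimator, B))
    }
    iqr.of <- function(simulator, estimator, B = 1000) {
      IQR(simulate.estimates(simulator, estimator, B))
    }

    se.of(sim.gaussian, median)     # standard error of the median, Gaussian data
    se.of(sim.exponential, mean)    # standard error of the mean, exponential data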

PDF handout, incorporating R examples

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 16, 2011 01:48 | permanent link

Splines (Advanced Data Analysis from an Elementary Point of View)

Kernel regression controls the amount of smoothing indirectly by bandwidth; why not control the irregularity of the smoothed curve directly? The spline smoothing problem is a penalized least squares problem: minimize mean squared error, plus a penalty term proportional to average curvature of the function over space. The solution is always a continuous piecewise cubic polynomial, with continuous first and second derivatives. Altering the strength of the penalty moves along a bias-variance trade-off, from pure OLS at one extreme to pure interpolation at the other; changing the strength of the penalty is equivalent to minimizing the mean squared error under a constraint on the average curvature. To ensure consistency, the penalty/constraint should weaken as the data grows; the appropriate size is selected by cross-validation. An example with the data from homework 4, including confidence bands. Writing splines as basis functions, and fitting as least squares on transformations of the data, plus a regularization term. A brief look at splines in multiple dimensions. Splines versus kernel regression. Appendix: Lagrange multipliers and the correspondence between constrained and penalized optimization.
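
A minimal illustration with the built-in smooth.spline function, which picks the penalty by cross-validation (the notes do more, e.g. confidence bands):

    # Wiggly synthetic data
    set.seed(3)
    x <- sort(runif(200, 0, 10))
    y <- sin(x) + 0.3 * x + rnorm(200, 0, 0.4)

    # Penalized spline; cv = TRUE selects the penalty by leave-one-out CV
    # (the default, cv = FALSE, uses generalized cross-validation instead)
    fit <- smooth.spline(x, y, cv = TRUE)
    fit$lambda    # the selected penalty
    plot(x, y, col = "grey", pch = 16)
    lines(predict(fit, seq(0, 10, length.out = 300)), lwd = 2)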

PDF notes, incorporating R examples

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 16, 2011 01:47 | permanent link

Bootstrapping Will Continue Until Morale Improves (Advanced Data Analysis from an Elementary Point of View)

In which we attempt to weigh the heart of the cat.

Assignment

Solution

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 16, 2011 01:46 | permanent link

Testing Parametric Regression Models with Nonparametric Smoothers (Advanced Data Analysis from an Elementary Point of View)

Testing parametric model specifications against parametric alternatives imposes strong assumptions about how we can be wrong, and so is often dubious. Non-parametric smoothers can be used to test parametric models instead. Forms of tests: differences in in-sample performance; differences in generalization performance; whether the parametric model's residuals have expectation zero everywhere. Constructing a test statistic based on in-sample performance. Using bootstrapping from the parametric model to find the null distribution of the test statistic. An example where the parametric model is correctly specified, and one where it is not. Cautions on the interpretation of goodness-of-fit tests. Why use parametric models at all? Answers: speed of convergence when correctly specified; and the scientific interpretation of parameters, if the model actually comes from a scientific theory. Mis-specified parametric models can predict better, at small sample sizes, than either correctly-specified parametric models or non-parametric smoothers, because of their favorable bias-variance characteristics; an example.
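
Schematically, the in-sample version of the test runs like the following sketch; I use smooth.spline as the stand-in smoother, which is not necessarily what the notes do, and the data are synthetic:

    # Data where a straight line is mildly mis-specified
    set.seed(4)
    n <- 300
    x <- runif(n, 0, 3)
    y <- x + 0.3 * x^2 + rnorm(n, 0, 0.5)

    # Test statistic: how much the smoother improves on the linear fit in-sample
    improvement <- function(x, y) {
      mse.lm <- mean(residuals(lm(y ~ x))^2)
      mse.np <- mean((y - predict(smooth.spline(x, y), x)$y)^2)
      mse.lm - mse.np
    }
    t.obs <- improvement(x, y)

    # Null distribution: simulate from the fitted parametric model and recompute
    fit0 <- lm(y ~ x)
    sigma0 <- summary(fit0)$sigma
    t.null <- replicate(500, {
      y.sim <- fitted(fit0) + rnorm(n, 0, sigma0)
      improvement(x, y.sim)
    })
    mean(t.null >= t.obs)   # bootstrap p-value for the linear specification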

PDF notes, incorporating R examples

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 16, 2011 01:45 | permanent link

February 14, 2011

Durlauf on Social Interactions

Attention conservation notice: Only of interest if you are (1) in Pittsburgh and (2) want to spend seven and a half precious hours of your life hearing about the econometric analysis of group effects on individual behavior.

Steve Durlauf, visiting CMU from Madison, will be giving a series of workshops on the economics and econometrics of social interactions (i.e., ones not mediated through anonymous market exchange). The talks will be February 17--18 and 21--23, 9:00--10:20 am in Hamburg Hall 1502, except on the 18th when it will be in Hamburg Hall 2503. I strongly recommend this to anyone who finds things like this interesting; Durlauf has been thinking about these matters for a long time, and is a leading scholar in the field. I myself have cleverly arranged to have scheduling conflicts on every single one of those days, but will be re-reading Blume, Brock, Durlauf and Ioannides.

Disclaimer: Durlauf and I are both affiliated with Santa Fe, and I was friends with a couple of his students in graduate school.

Update: Prof. Durlauf will be giving the statistics department seminar, "On the Observational Implications of Taste-Based Discrimination in Racial Profiling", at 4 pm on Monday, 21 February, in Scaife Hall 125.

Enigmas of Chance; Commit a Social Science; Networks; Incestuous Amplification; The Dismal Science

Posted by crshalizi at February 14, 2011 12:25 | permanent link

February 13, 2011

ITA 2011: Favorite Talks

[Photo: UCSD campus during ITA 2011]

ITA was great, partly for the reasons visible at right, and partly for getting to enjoy the gracious hospitality of Doug White, but mostly for the scientific exchange. So, some links to my favorite talks. (Note "favorite" and not "best".) I will not attempt to explain any of these adequately, or to list everyone's co-authors. It's good that so many of the papers are on arxiv, but unfortunate that not all of them are.

Todd Coleman, "A relationship between information theory and stochastic control when beliefs are decision variables" (Abstract, arxiv:1102.0250)
Ali Jadbabaie, "Consensus, Social Learning, and Distributed Estimation" (Abstract [in the evil Word format], SSRN/1550809)
The talk focused on new results which are not included in the SSRN paper, but in the same set-up --- if I can trust my notes, replacing sufficient conditions from the old paper with weaker necessary-and-sufficient conditions.
Aryeh Kontorovich, "Efficient classification for metric data" (Abstract, PDF of COLT 2010 paper)
It would be nice to know how this relates to Laplacian regularization methods, as in Belkin and Niyogi (journal version).
Daniel Lee, "Learning metrics for nearest neighbor classification" (Abstract, PDF of NIPS 23 (2010) paper)
The results for nearest-neighbor classification are nice, but it's the extension to estimating f-divergences, including Kullback-Leibler divergences and total variation distance, that I find most interesting.
Maxim Raginsky, "Shannon meets Blackwell and Le Cam: Coding Theorems of Information Theory and Comparison of Experiments"
Sasha Rakhlin, "From statistical learning to online learning and games" (Abstract, arxiv:1011.3168, arxiv:1006.1138)
Irina Rish, "A greedy coordinate ascent method for learning sparse Gaussian MRFs" (PDF of paper)
Is it possible to remove the Gaussian assumption by combining this with the non-paranormal?
Aarti Singh, "Consistent recovery of high-dimensional graph-structured patterns" (Abstract, PDF of NIPS 23 (2010) paper)
Ramon van Handel, "On a question of Blackwell concerning hidden Markov chains" (Abstract, arxiv:0910.3603)
Ramon had just nine slides, including title and references, and they weren't over-packed either. I envy this skill.
Frank Wood, "The Sequence Memoizer" (Abstract, PDF of ICML 2009 paper, website)
Roughly speaking, the sequence memoizer is a nonparametric-Bayesian version of variable-length Markov chains, with prior distributions tuned to handle the large-number-of-rare-events issue ubiquitous in text. The compression performance of the sequence memoizer is really remarkable, as you can see for yourself, and it would be nice to understand it better. Likewise, Frank's work, with David Pfau and Nicholas Bartlett, on infinite mixtures of probabilistic deterministic finite automata, is like a nonparametric-Bayesian version of CSSR. It would be good to understand the conditions under which either the memoizer or the PDFA mixture are consistent estimators, but I would think that.
Serdar Yüksel, "Comparison and characterization of observation channels for stochastic stabilization of noisy linear systems" (Abstract, arxiv.org:1009.3824)
Brian Ziebart, "Behavior forecasting with the principle of maximum causal entropy" (Abstract; see webpage for related papers)
Actually, I didn't go to Brian's talk, but he was kind enough to explain it to me anyway. It doesn't make me reconsider my general skepticism about entropy maximization as a principle of inference, but does make me even more interested in what sorts of large deviations principles might apply to cabbies, and to games more generally.

In addition to the talks, and many enlightening conversations, Anand introduced Maxim and me to the Noble Experiment, surely the best cocktail lounge in which the wall opposite the bar is entirely covered in gilded skulls. At least one of the three of us should probably have done some memento mori blogging.

Manual trackback: Anand, "zombie-blogging" the workshop, which makes me fear for the future of ITA. (He says he was only sick with the flu, but by this point we all know how the rest of that story goes.)

Enigmas of Chance; Postcards; Incestuous Amplification

Posted by crshalizi at February 13, 2011 17:30 | permanent link

February 07, 2011

Writing R Functions (Advanced Data Analysis from an Elementary Point of View)

As in some of my previous classes, there is a wide range of programming skill among the students in 402. The following notes are mostly intended to help those at the lower end of the scale catch up, but may be of some interest to others. (They presume familiarity with using R from the command line.) The last section largely incorporates (rips off, really) Minimal Advice to Undergraduates on Programming.

Statisticians must be able to do basic programming; someone who only knows how to run canned routines is not a data analyst but a technician who tends a machine they do not understand. Programming in R is best organized around functions. Parts of a function and a function declaration. Writing functions to encapsulate repeated procedures. First example: calculating quantiles of Pareto distributions, by hand and by a function; checking the function. Extending the function. Writing functions which call other user-defined functions. Sanity-checking arguments, e.g., with stopifnot. More layering of functions: writing a Pareto random number generator. Our first bug. The debugging process; traceback as a useful utility. Checking the Pareto generator. Automating the checking process. Passing arguments from function to function with the ... pseudo-argument. More debugging. Contexts and "scope". Revising functions to work with each other. Avoiding iteration in R for speed and clarity. Returning lists and other complex data structures; writing a function to estimate a Gaussian. General programming advice: take a real programming class; comment your code; RTFM; start from the beginning and break it down; break your code into many short, meaningful functions; avoid writing the same thing twice; use meaningful names; check whether your code works; complain rather than giving up; avoid iteration.
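
To give the flavor, here is a compressed sketch of the Pareto example, not the version in the notes (I parameterize by the survival-function exponent here):

    # Quantile function of a Pareto distribution: P(X > x) = (x/x0)^(-a) for
    # x >= threshold x0, so the p-th quantile is x0 * (1-p)^(-1/a)
    qpareto <- function(p, exponent, threshold = 1) {
      stopifnot(p >= 0, p <= 1, exponent > 0, threshold > 0)  # sanity-check arguments
      threshold * (1 - p)^(-1 / exponent)
    }

    # A random number generator layered on top (inverse-transform sampling)
    rpareto <- function(n, exponent, threshold = 1) {
      qpareto(runif(n), exponent = exponent, threshold = threshold)
    }

    # Automated check: the sample median should be near the theoretical median
    x <- rpareto(1e5, exponent = 2.5)
    c(theoretical = qpareto(0.5, exponent = 2.5), simulated = median(x))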

PDF

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 07, 2011 22:40 | permanent link

San Diego or Bust

From (very, very late) Tuesday through the end of the week, I'll be at the Information Theory and Applications workshop at UCSD. Inexplicably, the organizers of the session in memory of the late, great David Blackwell on Thursday asked me to talk about Bayesian convergence under dependence and mis-specification; inexplicably, because the rest of the line-up is excellent. (And not just for that session.) Anand has already promised/threatened near-live-blogging.

(And considering my flights: "bust" is not an ignorable alternative.)

Update, 13 February: follow-up.

Enigmas of Chance; Self-Centered

Posted by crshalizi at February 07, 2011 22:20 | permanent link

February 04, 2011

Advanced Data Analysis from an Elementary Point of View (36-402, Spring 2011)

Attention conservation notice: I continue my efforts to make this unreadable by promising to post 15--30 pages of lecture notes twice a week, plus homework assignments.

My class this semester is 36-402, "Advanced Data Analysis", for 68 students, about half statistics majors, most in their junior year, and about half seniors from other majors. They've just come off 36-401, modern regression, as taught by the excellent Prof. Nugent, so there's nothing more about linear models which would actually be useful for me to teach them. Instead, I've decided to take the "advanced" part seriously, and present modern techniques and concepts in ways which, hopefully, well-prepared undergraduates can actually grasp. (By the time they get to me, our majors are very well-prepared — but they are still undergraduates.) On the theory that the course notes might be of more general interest, I'll be posting them here. When I've tried things like this in the past, I put them all together on a page I updated over the semester, but I've been told separate posts would be more convenient; though this page will point to them all.

Some of the lectures are drafts for sections of STACS.

(Many of the notes are revisions of those for my data mining course. I confess I originally intended "data analysis" to just be data mining with the serial numbers filed off, but by the end of the semester I imagine the overlap will be no more than 50%.)

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 04, 2011 01:42 | permanent link

An Insufficiently Random Walk Down Wall Street (Advanced Data Analysis from an Elementary Point of View, Homework 4)

Getting comfortable with simulations and the bootstrap; and, in the hidden curriculum, writing functions.

PDF assignment

SPhistory.short.csv

Solutions (PDF, incorporating R)

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 04, 2011 01:41 | permanent link

Old Heteroskedastic (Advanced Data Analysis from an Elementary Point of View, Homework 3)

Learning to estimate variances and conditional densities.

PDF assignment

PDF solutions, R for solutions

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 04, 2011 01:40 | permanent link

The Advantages of Backwardness (Advanced Data Analysis from an Elementary Point of View, Homework 2)

The "Get comfortable with cross-validation and kernels" problem set.

PDF assignment

R code for problem #2

PDF solutions, R for solutions

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 04, 2011 01:39 | permanent link

What's That Got to Do with the Price of Condos in California? (Advanced Data Analysis from an Elementary Point of View, Homework 1)

The "As you all learned in kindergarden last semester" problem set.

PDF assignment

Data set

Solutions

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 04, 2011 01:38 | permanent link

The Bootstrap (Advanced Data Analysis from an Elementary Point of View, Lecture 8)

Statisticians quantify uncertainty in inference from random data to parameters through the sampling distributions of statistical functionals. These distributions are inaccessible in all but the simplest and most implausible cases. The bootstrap principle: sampling distributions under a good estimate of the truth are close to the true sampling distributions. Parametric bootstrapping: methods for finding standard errors, biases and confidence intervals, and for performing hypothesis tests. Double-bootstraps. Examples of parametric bootstrapping with Pareto's law of income inequality. Non-parametric bootstrapping: using the empirical distribution itself as our model. The Pareto distribution continued. Bootstrapping regressions: resampling data-points versus resampling residuals; resampling of residuals under heteroskedasticity. Examples with homework data. Cautions on bootstrapping with dependent data. When does the bootstrap fail?
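
For orientation, the non-parametric version in miniature (not the pareto.R linked below, and with made-up data):

    # Non-parametric bootstrap: resample the data with replacement, recompute the
    # statistic, and treat the simulated distribution as a stand-in for the
    # true sampling distribution
    set.seed(5)
    wealth <- rlnorm(200, meanlog = 10, sdlog = 1)   # synthetic "income" data

    bootstrap <- function(data, statistic, B = 2000) {
      replicate(B, statistic(sample(data, size = length(data), replace = TRUE)))
    }

    boot.medians <- bootstrap(wealth, median)
    sd(boot.medians)                           # bootstrap standard error
    quantile(boot.medians, c(0.025, 0.975))    # crude percentile confidence interval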

Comment: The parts of this article which I didn't plagiarize for the lecture notes I used for the homework.

PDF

R

pareto.R, wealth.dat

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 04, 2011 01:37 | permanent link

Simulation (Advanced Data Analysis from an Elementary Point of View, Lecture 7)

"Simulation" means: implementing the story encoded in the model, step by step, to produce something data-like. Stochastic models have random components and so their simulation requires some random steps. Stochastic models specified through conditional distributions are simulated by chaining together random numbers; the importance of conditional independence structures. Methods of generating random numbers with specified distributions. Simulation shows us what a model predicts (expectations, higher moments, correlations, regression functions, sampling distributions); analytical probability calculations are short-cuts for exhaustive simulation. Simulation lets us check aspects of the model: does the data look like typical simulation output? if we repeat our exploratory analysis on the simulation output, do we get the same results? If not, how specifically does the model fail? Simulation-based estimation: the method of simulated moments. Indirect inference, left as an exercise for the reader.

PDF

R

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 04, 2011 01:36 | permanent link

Density Estimation (Advanced Data Analysis from an Elementary Point of View, Lecture 6)

The desirability of estimating not just conditional means, variances, etc., but whole distribution functions. Parametric maximum likelihood is a solution, if the parametric model is right. Histograms and empirical cumulative distribution functions are non-parametric ways of estimating the distribution: do they work? The Glivenko-Cantelli law on the convergence of empirical distribution functions, a.k.a. "the fundamental theorem of statistics". More on histograms: they converge on the right density, if bins keep shrinking but the number of samples per bin keeps growing. Kernel density estimation and its properties; some error analysis. An example with data from the homework. Estimating conditional densities; another example with homework data. Some issues with likelihood, maximum likelihood, and non-parametric estimation.
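
A minimal example of the kernel estimator, using R's built-in density function on synthetic bimodal data:

    # Histogram versus kernel density estimates on the same sample
    set.seed(7)
    x <- c(rnorm(300, mean = 0), rnorm(200, mean = 4, sd = 0.7))

    hist(x, breaks = 30, freq = FALSE, col = "grey", border = "white")
    lines(density(x), lwd = 2)                       # default bandwidth rule (bw.nrd0)
    lines(density(x, bw = "SJ"), lwd = 2, lty = 2)   # Sheather-Jones plug-in bandwidth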

PDF

R

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 04, 2011 01:35 | permanent link

Moving Beyond Conditional Expectations: Weighted Least Squares, Heteroskedasticity, Variance Functions (Advanced Data Analysis from an Elementary Point of View, Lecture 5)

Average predictive comparisons. Weighted least squares estimates. Heteroskedasticity and the problems it causes for inference. How weighted least squares gets around the problems of heteroskedasticity, if we know the variance function. Estimating the variance function from regression residuals. An iterative method for estimating the regression function and the variance function together. Locally constant and locally linear modeling. Lowess.
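
The iterative idea, in a minimal sketch with synthetic data (I use smooth.spline for the variance-function step, which is not necessarily what the notes do):

    # Heteroskedastic data: the noise level grows with x
    set.seed(8)
    n <- 400
    x <- runif(n, 0, 10)
    y <- 2 + 3 * x + rnorm(n, 0, 0.5 + 0.4 * x)

    fit.ols <- lm(y ~ x)   # point estimates fine, inference not
    # Estimate the variance function by smoothing log squared residuals
    # (working on the log scale keeps the fitted variance positive)
    var.fit <- smooth.spline(x, log(residuals(fit.ols)^2))
    var.hat <- exp(predict(var.fit, x)$y)
    # Weighted least squares, weights = 1 / estimated variance; one could now
    # iterate: re-estimate the variance from the new residuals, re-fit, and so on
    fit.wls <- lm(y ~ x, weights = 1 / var.hat)
    cbind(ols = coef(fit.ols), wls = coef(fit.wls))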

Comment: Predictive comparisons were really a held-over topic from the previous lecture, and I am not quite happy with putting local polynomials here.

PDF

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 04, 2011 01:34 | permanent link

Using Nonparametric Smoothing in Regression (Advanced Data Analysis from an Elementary Point of View, Lecture 4)

The bias-variance trade-off tells us how much we should smooth; introduction to the Oracle. Our ignorance of both bias and variance, now that the Oracles have fallen silent. Estimating the sum of bias and variance with cross-validation. Adaptation as a substitute for knowledge. Adapting to unknown roughness with cross-validation; detailed examples. Using kernel regression with multiple inputs: multivariate kernels, product kernels. Using smoothing to automatically discover interactions. Plots to help interpret multivariate smoothing results. Appendix: the multivariate Gaussian distribution.
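
A bare-bones version of bandwidth selection by cross-validation, using the base ksmooth function rather than whatever the notes use:

    # Pick a kernel regression bandwidth by 5-fold cross-validation
    set.seed(9)
    x <- runif(300, 0, 10)
    y <- sin(x) + rnorm(300, 0, 0.3)

    cv.mse <- function(h, x, y, nfolds = 5) {
      folds <- sample(rep(1:nfolds, length.out = length(x)))
      mse <- numeric(nfolds)
      for (k in 1:nfolds) {
        test <- folds == k
        fit <- ksmooth(x[!test], y[!test], kernel = "normal", bandwidth = h,
                       x.points = x[test])
        # ksmooth returns predictions sorted by x, so sort the test responses to match
        mse[k] <- mean((y[test][order(x[test])] - fit$y)^2, na.rm = TRUE)
      }
      mean(mse)
    }
    bandwidths <- seq(0.1, 3, by = 0.1)
    cv.error <- sapply(bandwidths, function(h) cv.mse(h, x, y))
    bandwidths[which.min(cv.error)]   # the bandwidth adapted to this data set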

PDF

R

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 04, 2011 01:33 | permanent link

Evaluating Statistical Models (Advanced Data Analysis from an Elementary Point of View, Lecture 3)

The three big uses of statistical models: as summaries of data; as predictive instruments; as scientific models. Evaluation depends on the use. Prediction is the goal which admits of the most definite evaluations; reducing the evaluation of scientific models to checking predictions (without necessarily becoming an instrumentalist). Evaluating predictions by their average errors: in-sample error distinguished from generalization error; the latter is what really needs to be controlled. A gesture in the direction of statistical learning theory. Over-fitting defined and illustrated. Cross-validation for estimating generalization error and for model selection. Forms of cross-validation; k-fold CV generally preferable to leave-one-out CV.
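
The k-fold recipe itself is short; a sketch, with synthetic data, comparing two regression specifications:

    # 5-fold cross-validation to compare a straight line with a cubic polynomial
    set.seed(10)
    d <- data.frame(x = runif(200, -2, 2))
    d$y <- d$x^3 - d$x + rnorm(200, 0, 1)

    kfold.mse <- function(formula, data, nfolds = 5) {
      folds <- sample(rep(1:nfolds, length.out = nrow(data)))
      mse <- numeric(nfolds)
      for (k in 1:nfolds) {
        fit <- lm(formula, data = data[folds != k, ])
        pred <- predict(fit, newdata = data[folds == k, ])
        mse[k] <- mean((data$y[folds == k] - pred)^2)
      }
      mean(mse)
    }
    kfold.mse(y ~ x, d)            # estimated generalization error, linear
    kfold.mse(y ~ poly(x, 3), d)   # estimated generalization error, cubic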

PDF

R

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 04, 2011 01:32 | permanent link

The Truth About Linear Regression (Advanced Data Analysis from an Elementary Point of View, Lecture 2)

Using Taylor's theorem to justify linear regression locally. Collinearity. Consistency of ordinary least squares estimates under weak conditions. Linear regression coefficients will change with the distribution of the input variables: examples. Why R^2 is usually a distraction. Linear regression coefficients will change with the distribution of unobserved variables (omitted variable effects). Errors in variables. Transformations of inputs and of outputs. Utility of probabilistic assumptions; the importance of looking at the residuals. What "controlled for in a linear regression" really means.
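
One of those examples in miniature: when the true regression function is not linear, the fitted slope depends on where the inputs live (synthetic data again):

    # The best linear approximation to log(1+x) has a different slope depending
    # on the distribution of x
    set.seed(11)
    true.f <- function(x) log(1 + x)
    x.narrow <- runif(1000, 0, 1)
    x.wide   <- runif(1000, 0, 10)
    y.narrow <- true.f(x.narrow) + rnorm(1000, 0, 0.1)
    y.wide   <- true.f(x.wide)   + rnorm(1000, 0, 0.1)

    coef(lm(y.narrow ~ x.narrow))[2]   # slope around 0.7
    coef(lm(y.wide ~ x.wide))[2]       # slope around 0.2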

PDF

R

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 04, 2011 01:31 | permanent link

Regression: Predicting and Relating Quantitative Features (Advanced Data Analysis from an Elementary Point of View, Lecture 1)

Statistics is the science which studies methods for learning from imperfect data. Regression is a statistical model of functional relationships between variables. Getting relationships right means being able to predict well. The least-squares optimal prediction is the expectation value; the conditional expectation function is the regression function. The regression function must be estimated from data; the bias-variance trade-off controls this estimation. Ordinary least squares revisited as a smoothing method. Other linear smoothers: nearest-neighbor averaging, kernel-weighted averaging.
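
The two non-OLS smoothers mentioned above are short enough to write out by hand; a sketch on synthetic data:

    # k-nearest-neighbor and kernel-weighted averaging, evaluated at a point x0
    knn.smooth <- function(x0, x, y, k = 10) {
      mean(y[order(abs(x - x0))[1:k]])    # average the k nearest neighbors
    }
    kernel.smooth <- function(x0, x, y, h = 0.5) {
      w <- dnorm(x - x0, sd = h)          # Gaussian kernel weights
      sum(w * y) / sum(w)
    }

    set.seed(12)
    x <- runif(200, 0, 10)
    y <- sqrt(x) + rnorm(200, 0, 0.3)
    grid <- seq(0, 10, length.out = 100)
    plot(x, y, col = "grey", pch = 16)
    lines(grid, sapply(grid, function(x0) knn.smooth(x0, x, y)), lwd = 2)
    lines(grid, sapply(grid, function(x0) kernel.smooth(x0, x, y)), lwd = 2, lty = 2)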

PDF

R, example data for notes

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at February 04, 2011 01:30 | permanent link

January 31, 2011

Books to Read While the Algae Grow in Your Fur, January 2011

Attention conservation notice: I have no taste.

I'm not sure why I read so many mysteries this month.

Virginia Swift, Brown-Eyed Girl, Bad Company, and Bye, Bye Love
Mind-candy. Mystery series set among the more aggressively eccentric inhabitants of Laramie. I wish there were more. (Earlier: the 4th book in the series.)
Eilen Jewell, Letters from Sinners and Strangers, Heartache Boulevard, Sea of Tears
Flying Lotus, Cosmogramma (no relation)
School of Seven Bells, Alpinisms and Disconnect from Desire
Clare Burson, Silver and Ash and Thieves
The idea of my writing about music is absurd; but I know what I like, and some people have actually asked what that is.
Kate Collins, Mum's the Word and Slay It with Flowers
Mind-candy. Filled an afternoon when I should have been writing a problem set.
Bill Pomidor, Murder by Prescription, Skeletons in the Closet, Ten Little Medicine Men, Mind over Murder
Mind-candy. Charming cozy medical mysteries, with local color for mid-1990s Cleveland. I read them in '98, and then re-read them as part of the great on-going book purge. I found them quite enjoyable on the re-read (it helped that I'd forgotten whodunnit in every case).
John Quiggin, Zombie Economics: How Dead Ideas Still Walk Among Us
A brisk, non-technical but sound debunking of five economic notions which ought, by all rights, to lie mouldering in the grave, but instead continue to prowl the landscape, devouring brains: the "great moderation" of the economy, the efficient market hypothesis, dynamic stochastic general equilibrium models in macroeconomics, trickle-down policy (and the general virtues of increasing inequality to promote growth), and the wonders of privatization. Obviously whole books could be written about any one of these (e.g., this one about the efficient market hypothesis), or even whole libraries, so one cannot expect a volume like this to cover every aspect, but it does a very good job of getting at the essentials, and guiding the reader to more information.
That these are all right-wing ideas, more or less frequently deployed as ideological weapons to promote the interests of the already well-off, is no accident. First of all, Quiggin himself is a social democrat, a partisan of Keynesian ideas and the mixed economy. (He is also a very able technical economist.) Secondly, of course, there are no left-wing zombie ideas in economics that matter. You could argue that (say) Parecon is a left-wing undead idea, but it's a ghost, not a zombie: it haunts some cold and drafty corners of the house of intellect, where it makes rattling noises, but nobody save a few inhabitants of a few university towns knows or cares about it. The zombie ideas Quiggin combats, on the other hand, are at once legitimating charter myths and technical operating instructions for vast industries and national and international policies.
I should perhaps say that Quiggin's presentation is quite sober, despite his title and the (hilarious, adorable) cover. He even resists the temptation to make fun of what he debunks.
Typos p. 83 note 5: for "as discussed below", read "discussed above" (specifically, pp. 20--21). I think there were some other mistakes in cross-references, but didn't make notes of the others.
Disclaimer: Quiggin blogs at Crooked Timber, and I'm friends with some of his co-bloggers and have guest-posted there myself. But he and I have never met or corresponded, and I even bought my own copy of his book.
ObLinkage: Emerson on Quiggin on zombies.
W. G. Runciman, The Theory of Cultural and Social Selection
Runciman's Treatise on Social Theory is one of the most important books on applying evolutionary ideas to the development of society, giving an interesting (and properly selectionist) account of how institutions form, persist and change through the differential reproduction of practices. In Runciman's view, it's practices of social interaction that evolve, not societies. (Or rather, societies evolve only in a derivative sense, like speaking of the "evolution of ecosystems".) His book articulates this idea with great sophistication and learning, drawing on a wide range of historical examples. His The Social Animal is a self-popularization, and quite excellent. He has many fine papers and essays applying his ideas to particular social phenomena. (I might particularly mention those on "The 'Triumph' of Capitalism" [New Left Review 210 (March-April 1995): 33--47] and "The Diffusion of Christianity" [European Journal of Sociology 45 (2004): 3--21].)
All of which is to say I would probably like this more if I hadn't looked forward to it quite so eagerly. It's not as good as the earlier books. In fact in many ways Runciman assumes an audience which has already read those books — how else to explain his free use of his neologism "systact", or the "three dimensions of social space" or "of social structure", without ever defining them? (For the record, Runciman's three dimensions are economic, ideological and coercive power. And a "systact" is a category of people who occupy a specific, persistent social role, and who, in virtue of this, have a similar location in the society's structure of power. "Systact" is supposed to be the genus of which "class", "caste", "rank", "order", "estate", etc., perhaps even "race" and "gender", are all species.) And there is what I can only call a smug tone which was lacking before, and makes me want to argue against the application of selectionist ideas to culture and society. (Hell, he makes me want to defend anti-reductionists and even post-modernists.) It is certainly not a systematic general theory of cultural and social selection.
He has also acquired an odd insistence that the social selection of practices comes after the cultural transmission of information, both logically and in time. This was not present in his earlier work, and it seems to oscillate between being either harmless or insupportable. Many animals have some degree of cultural transmission, apparently without social practices, so yes, social evolution is recent, if that means "since our last common ancestor with chimpanzees". But every human population I can think of does have transmitted social practices in this sense — at the very least related to kinship — and Runciman certainly doesn't give any counter-examples.
All of which said, I do think this is a worthwhile book for anyone seriously interested in applying evolutionary concepts to human culture and society: it's a restatement of leading ideas by a leading scholar in the area. (And if he is not so widely recognized as such, so much the worse for the area.) But you really need to have read at least one of his previous books, and be ready to argue back a lot.
Clark Glymour, Galileo in Pittsburgh
Glymour is an important contemporary philosopher of science for causality and causal inference; our philosophy department being what it is, this means he works on algorithms for consistently inferring causal structure from patterns of correlations. (Previously: the great big book of causal discovery [a huge influence on me], psychological aspects, not-exactly Great Thinkers from Frege to Carnap, and Theory and Evidence.) This book is a collection of non-technical essays on causal inference, philosophy of science, general philosophy, and living in America in the late 20th and early 21st century. This is some of his best writing — clear, intelligent, funny and winning — which is saying something. The essays are also generously spiced with arguments designed to infuriate a wide spectrum of readers; in some essays these are not so much the spice as the meat. Glymour is certainly not beyond enjoying provocation, but if this is trolling, it is Socratic trolling, the truly desired reaction being thought rather than outrage.
I read it all in one sitting back in April, but then somehow forgot to post about it!
Disclaimer: Clark's an acquaintance, and was kind enough to give me a copy of this book.
David De Jong and Chetan Dave, Structural Macroeconometrics
(Textbook website, with code and errata.)
Here "structural macroeconometrics" means fitting dynamic stochastic general equilibrium models to aggregate time series, after trying to make the time series stationary by removing trends. (They give a lot of space to de-trending, without discussing how much of the apparent predictive power of the models is due to the trends.) The general procedure is, once the data have been beaten into stationarity: find a linear approximation to your DSGE around its long-run average; this will be a linear state-space model or hidden Markov model. (They give a lot of space to various combinations of linear algebra and Taylor expansion.) Now estimate the model; they consider generalized method of moments, indirect inference, maximum likelihood, and Bayesian inference. (They also have a chapter on "calibration", which is way too respectful, though they cite and quote some of the important critical articles.) A final pair of chapters considers some ways of avoiding linearizing the system, combining familiar ideas from reinforcement learning with particle filtering.
There is some material here on goodness of fit, especially in chapter 6, and on parametric specification testing — more than in Christensen and Kiefer, but still not as much as I'd like. I was a bit surprised by the lack of material on identifiability (cf.), though they do talk a bit about how flat the likelihood functions often are. Even beyond that, though, a DSGE is only identifiable "up to" the state-space model, because it's the latter that's brought into contact with the data — two DSGEs leading to the same hidden Markov model are observationally indistinguishable. Indeed, it is indistinguishable from any economic model, general equilibrium or not, which leads to the same HMM.
The representative agent assumption is used very freely, and without much comment. (Come to think of it, does any model in the book not have a representative agent?) There is no discussion of what the representative agent actually represents, whether representative agent models really follow from aggregating multi-agent microeconomic models*, or even whether welfare calculations based on them make any sense**. It would, I suppose, be out of place for an econometrics book, especially a textbook, to critique the sorts of models macroeconomics has fixated on...
The implied reader is, to my eyes, curious: someone who knows not just economic jargon, but actually lots of utility theory --- but needs to have their hand held through basic numerical optimization, and taking Fourier transforms of time series. Assuming this is a fair guess at the actual preparation of economics students, it seems like a very reasonable textbook.
Disclaimer: De Jong is an external member of my student Daniel McDonald's thesis committee.
*: They do not, except under incredibly strong and fragile assumptions, such as every person in the economy having identical tastes, resources, and information. (It would be interesting to know how many macroeconomists who work with representative agent models also, in their teaching or public engagements, deploy the Hayekian commonplace that the wonderful thing about market economies is the way they make use of dispersed information, and coordinate divergent preferences.) There is not even, so far as I know, any result to the effect that general equilibrium with heterogeneous agents can typically be approximated by a representative agent model.
**: De Jong and Dave reproduce Lucas's calculations which assume a single agent representing the whole economy, getting utility from current consumption, and compare its utility under the actual history of the U.S. economy since WWII, including booms and busts, and a fictional history where consumption grew, without fluctuations, at the average historical rate. The difference in utility, under these assumptions, is very small, which is supposed to tell us how wonderfully optimal the economy is, and that we live in the "Republic of the central bankers", whom we should leave to do their jobs as they see fit.
But of course, when we have a recession, everyone does not just evenly reduce their consumption by 5%, possibly smoothed over time by perfectly-lubricated credit markets. Rather, many people are thrown out of work and suffer massive losses in income and wealth, to say nothing of humiliation and anxiety, degradation of skills, etc. Others keep their incomes, at least most of it, with more-than-usual fear about what would become of them and their families if they should lose their jobs for any reason. Everyone is in a worse bargaining position with employers. Lucas is correct, however, that recessions have not been a big deal for someone who lives off the dividends of a well-diversified portfolio.
Jane Langton, The Deserter: Murder at Gettysburg
17th (!) installment in Langton's consistently excellent mystery series; in which her heroes' latest enthusiasm is the Civil War. (Previously in the series.)
Victor Pelevin, The Sacred Book of the Werewolf
In which an underage Moscow prostitute who is actually a two-thousand-year old Chinese fox spirit recounts her adventures with assorted taxi drivers, portfolio investors, fellow foxes in Phuket and London, right-wing liberal humanists, simple foresters, eccentric English lords and werewolves working for the state security apparatus, allowing, in passing, for many improving conversations regarding, inter alia, the structure and function of the hypnotic organ in the fox's tail, the nature and meaning of post-Soviet life, the Russian soul (and how and to what extent it resembles a long-distance trucker indulging in sordid proclivities by picking up hitchhikers), the works of Vladimir Nabokov, westerns, In the Mood for Love, Buddhism, the role of amphetamines and cocaine in the formation of post-modern discourse, bewitchment by language and names, the sordid proclivities of portfolio investors, why no way which can be expressed in words can be the true way, how foxes hunt chickens, kundalini yoga, the philosophy of Berkeley, how foxes hunt English aristocrats, the advantages of living in tombs, the place of verse in prostitutes' self-advertisements, the role of ketamine in lycanthropy, whether there exist lower forms of life than Internet columnists and bloggers, and the liberating power of love (even when — this is no spoiler — it ends badly).
Shorter me: Oh, Victor Olegovich, where have you been all my life?
Rudolf and Margot Wittkower, Born Under Saturn: The Character and Conduct of Artists: A Documented History from Antiquity to the French Revolution
A wonderful, thought-provoking, anecdote-filled book which sets out to undermine readers' ideas about what artists are typically like, ideas derived from Romanticism (and its late off-shoot, psychoanalysis). Partly it does this by sensitively exploring Renaissance and early-modern European ideas about artists, e.g., their "melancholic" temperament, caused simultaneously by an excess of black bile, and, astrologically, by being "born under Saturn". Still more, though, the book is about the range of artists' actual behavior, as shown in period documents — and how much, if at all, that differed from the way their peers behaved. The Wittkowers are agreeable and skeptical authors, who wear a vast learning lightly, and have considerable, if somewhat amused, sympathy for their subjects (and less sympathy for fellow scholars). The imagined audience is the general educated public, not art historians, and I think anyone who enjoys the paintings and is curious about (but not reverent towards) the people who produced them should like the book.
(It's in print from the New York Review, and that's what the link above points to, but I haven't seen that edition; I read an old Norton paperback.)
Diana Rowland, Secrets of the Demon
Series mind-candy. (Previously: 1, 2.) ROT-13'd spoilers: V unq gur fhfcvpvba gung gur tbyrz jnf pbagebyyrq ol Zvpunry irel rneyl ba, ng yrnfg sebz gur fprar va gur ubhfr jurer ur'f urneq cynlvat gur cvnab. Ohg jul qvq V fhfcrpg gung?
Surveillance
Despite being warned by the cover-art about "twisted" and "depraved", I thought I was seeing a reasonably ordinary Rashomon homage for the first hour or so, and then was taken completely by surprise. Not sure I'd watch it again in retrospect.

Books to Read While the Algae Grow in Your Fur; Pleasures of Detection, Portraits of Crime; The Dismal Science; Scientifiction and Fantastica; Writing for Antiquity; Enigmas of Chance; The Collective Use and Evolution of Concepts; Commit a Social Science; Philosophy; The Running Dogs of Reaction

Posted by crshalizi at January 31, 2011 23:59 | permanent link

January 19, 2011

"The Universal Glivenko-Cantelli Property" (Next Week at the Statistics Seminar)

For the first statistics seminar of 2011, we are very happy to welcome —

Ramon van Handel, "The Universal Glivenko-Cantelli Property"
Abstract: Uniform laws of large numbers are basic tools in many problems in probability theory, statistics, and machine learning. On the other hand, the law of large numbers is ultimately "just" a special case of two fundamental probabilistic limit theorems: the reverse martingale convergence theorem and the pointwise ergodic theorem. What can one say about uniform convergence in the more general setting? Surprisingly, it turns out that for a given class of functions, universal uniform convergence in the law of large numbers, the reverse martingale convergence theorem, and the pointwise ergodic theorem are all equivalent. Moreover, such classes of functions (which are more general than the well-known Vapnik-Chervonenkis classes) can be characterized by certain geometric and combinatorial properties. As an application, I will discuss the pathwise optimality of sequential decisions under partial information.
Place and time: Scaife Hall 125, on Monday, 24 January 2011, 4--5 pm
As always, the seminar is free and open to the public.

Enigmas of Chance

Posted by crshalizi at January 19, 2011 14:50 | permanent link

January 18, 2011

Must Macroeconomic Theories Have Microfoundations?

Attention conservation notice: Obvious reflections on a tired question, written down to get them out of my head while I work on other stuff, and posted to amuse connoisseurs with their naive presumption.

1. Obviously, macroeconomic phenomena are the aggregated (or, if you like, the emergent) consequences of microeconomic interactions. What else could they be? Analogously, the macroscopic physical properties of condensed matter all ultimately emerge from molecular interactions.

2. Macroeconomic theories which do not derive such phenomena from microscopic interactions are thus incomplete, and intellectually unsatisfying. Analogously, theories of condensed matter which do not derive the phenomena from molecular interactions are incomplete.

So: the true and complete theory of macroeconomics must emerge from the true and complete theory of microeconomics.

3. Incomplete theories are not (necessarily) false, or even lacking in value. (A model of the bulk properties of steel, or plastic, or bone, which doesn't include a derivation from molecular dynamics can be accurate, precise and useful. It could even be more accurate than a micro-founded model, if, e.g., we lack a precise understanding of the microscopic structure, or we can only calculate macroscopic consequences through crude approximations.)

4. If a well-established macro-level theory does not, currently, have any micro-foundations, the scientific approach would seem to be to dig those foundations, not to pull down the theory.

5. If a good macro-level theory cannot be founded on our current micro-level theory, this could be due to: (a) defects or weaknesses in our techniques for calculating aggregate consequences of micro-level interactions; (b) specifying the wrong sort of initial/boundary conditions, or interaction structures, in the microscopic models; (c) errors in our understanding of micro-level interactions and dynamics; (d) errors in our formulation of the macro-level theory.

There will certainly be some situations where (d) is right, e.g., because there is no possible way to derive the macro theory from any micro-level one. But it is hard for me to see why (d) should always be the preferred option in economics. Some adjustment of our various theories, models and techniques is required, but it seems mere prejudice that it should always be macro which adjusts. Even if one thought that standard microeconomic theory was very securely established and successful (which is dubious), even well-established and successful theories can contain systematic mistakes, which might (e.g.) only be detectable when one looks at aggregate consequences. (If nothing else, statistical power grows with sample size!) Or again: perhaps there's nothing wrong with the specification of what individuals are like and how they interact, but the simplification of always solving for the equilibrium is wrong, and while it's not very wrong for any one market over a short period, the error accumulates when one goes to the level of whole economies over years and decades.

To continue the analogy, circa 1900 classical mechanics and electromagnetism were extremely well-confirmed theories, in much better shape than microeconomics is. Nonetheless, any attempt to explain condensed matter physics on that basis, starting from molecular interactions, was doomed. (For instance, classical physics predicts that matter should be unstable.)

6. Even if one has a true microscopic theory, the best way to develop theories of macroscopic phenomena is not, necessarily, to start from a microscopic model. Sometimes it will be, but there's no reason to think that's a general rule for all problems. The answer might even vary from theorist to theorist, depending on skills, experience, etc.

7. Micro-founded models would be more suitable for policy-making only if it is easier to develop an accurate causal model of how individuals and their interactions respond to policy changes, and to aggregate the results, than it is to develop a macro-level causal model. Why should we think this?

For "economics", read "sociology", "political science", "ecology", etc., etc., as appropriate.

Update, 24 January: Both J. W. Mason at the Slack Wire and Tom Slee at Whimsley have written substantial responses which deserve detailed replies, but are unlikely to get them soon. Since I can't stand to work on tomorrow's lecture any more tonight, I will just say a few words about Slee's, which is easier for me to reply to.

First, effective evolutionary explanations resolve themselves into causal ones. This is because they contain feedback mechanisms which make selection operative. This in turn means that the current properties of organisms and populations are the result of the causal interactions of their ancestors with past environments, and so causes do indeed precede effects. To use a distinction which I believe originated with Monod, these explanations are teleonomic, not teleological.

Second, the issue Slee raises, of when descriptions and explanations in terms of coarse-grained macro-variables are not just more familiar but actually in some sense more effective than ones in terms of fine-grained micro-variables, is a very deep one. (Mason also brings this up, but invokes additional considerations which I don't feel I have time to go into now.) To my mind this is the heart of emergence, at least in the sense in which I can make sense of the word and don't find it trivial. I have tried to tackle this at length elsewhere, by giving an information-theoretic account of when a set of macroscopic variables, or more precisely the states defined by them, emerge from microstates by enabling more efficient and self-contained prediction at the higher level. My paper with Cris Moore (here or here) has details, though we presumed some familiarity with stat. mech. (I also took a stab at connecting this to cognitive science; and you could always read the last chapter of my dissertation, if you want to chance death by boredom.) I am not, obviously, well-placed to judge my own efforts in this line, but if it's even roughly right, then there is, indeed, no contradiction between insisting on reductionist accounts for higher-level phenomena, and pursuing autonomous (or nearly autonomous) causal explanations in terms of higher-level variables. Whether the variables used in current macroeconomics have the right properties is, of course, a different question. I should also mention, in this connection, Clark Glymour's "When Is a Brain Like the Planet?"; I believe Clark's arguments mesh very well with those in my papers, though we've never hashed that out and he might disagree.

Third, I suspect anyone who likes Slee's examples will also enjoy Wolfgang Beirl's notes on the predictability of physicists, and vice versa.

Mason next time, inshallah.

Manual trackback: The Slack Wire; Beyond Microfoundations; Blake Riley; Grasping Reality with $Numerosity $Instrumentalities; Critiques of Libertarianism; Whimsley; Critiques of Collectivism; the blog formerly known as The Statistical Mechanic; D2 Digest (I agree; see point 5 above); Unfogged [I have explained why I do not find "supervenience" a useful notion elsewhere]

Complexity; The Dismal Science

Posted by crshalizi at January 18, 2011 20:45 | permanent link

January 17, 2011

Down the Stairs and to the Left (This Week at the Philosophy Colloquium)

I'm speaking at the CMU philosophy department's colloquium this week. I do not pretend to fully understand how this happened, but no doubt by the end of the day I will enjoy a simultaneously higher and more profound level of puzzlement about many matters.

"Praxis and Ideology in Bayesian Statistics"
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise of Bayesian statistics in applications. In this talk, I hope to persuade you that the most successful practices of Bayesian statistics do not actually support that philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. Drawing on the literature on the consistency of Bayesian updating and also on experience of applied work, I examine the actual role of prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. I argue that good Bayesian practice is very like good frequentist practice; that Bayesian methods are best understood as regularization devices; and that Bayesian inference is no more inductive than frequentist inference, i.e., not very. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not conform to their ideology.
Based on joint work with Andrew Gelman.
Date: Thursday, 20 January 2011
Time and place: Reception 4:00--4:35 in Doherty Hall, talk 4:45--6:00 in Baker Hall A53.

Self-Centered; Bayes, anti-Bayes; Philosophy

Posted by crshalizi at January 17, 2011 12:45 | permanent link

January 15, 2011

A Fermi Problem in Western Pennsylvania

Attention conservation notice: A 2000-word attempt to reduce decades of painstaking empirical work and careful theorizing in economic geography to a back-of-the-envelope calculation; includes a long quotation from a 19th century textbook of political economy. An outtake from a post that turned into a paper-in-progress, posted now because I'm stuck on a proof in another paper, and don't want to work on writing the next problem set for 402.

Physicists are fond of a kind of rough estimation exercise they call "Fermi problems", since our folklore attributes them to the great Enrico Fermi. A classic instance is the one I first encountered, as a physics undergrad at Berkeley: how many piano tuners are there in the East Bay? Well, there are about a million people living around the eastern shore of the San Francisco Bay, i.e., on the order of 10^6. How many people are there per piano? 10 per piano seems high, but 10,000 per piano seems low, say 10^3 per piano. How often does a piano need to be tuned? Clearly not every day, or even every week, but also not once a decade, so something like once a year. Thus the East Bay needs about 10^3 piano-tunings per year. How quickly can a piano be tuned? Probably in less than a week but more than an hour, so something like a day, or about 10^-2 years. So there should be about 10 piano tuners in the East Bay. The professor, having elicited these numbers, then told us to "look it up in the phone book"; having pulled the same stunt myself since then, I can tell you that any number between 5 and 50 will be declared "the right order of magnitude".
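
For concreteness, here is the same back-of-the-envelope calculation written out in R; the inputs are nothing but the order-of-magnitude guesses just made:

    population       <- 1e6    # people in the East Bay
    people.per.piano <- 1e3    # order-of-magnitude guess
    tunings.per.year <- 1      # per piano
    years.per.tuning <- 1e-2   # roughly a day of work
    pianos <- population / people.per.piano              # about 1e3 pianos
    pianos * tunings.per.year * years.per.tuning         # about 10 full-time piano tuners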

Suppose we were interested not in greater San Francisco but Stewart Township, Pennsylvania, the site of Fallingwater: how many piano tuners does it have? Stewart Township has a population of 7*10^2, and our reasoning above says it's got something like one piano, and so demands one day of piano-tuning per year. What does the piano tuner do the rest of the time? They could be an ordinary citizen, who only becomes a piano tuner once a year when it's called for. Or it could be that Stewart Township shares a specialist piano-tuner (or three) with the 6*10^5 other people of the Laurel Highlands. Since tuning a piano is a reasonably demanding skill, it's much more likely that it's done by a specialist.
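
Re-running the little calculation for these population figures (again, using only the guesses already made) puts numbers on the point: Stewart Township generates only a day or two of tuning work a year, while the Laurel Highlands as a whole can keep a few specialists busy full time.

    # Same Fermi estimate, wrapped in a function; defaults are the guesses from above
    fermi.tuners <- function(population, people.per.piano = 1e3,
                             tunings.per.year = 1, years.per.tuning = 1e-2) {
      population / people.per.piano * tunings.per.year * years.per.tuning
    }
    fermi.tuners(7e2)   # Stewart Township: ~0.007 tuner-years, a day or two of work per year
    fermi.tuners(6e5)   # Laurel Highlands: ~6 tuner-years, a handful of full-time specialists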

What goes for piano tuners goes for other specialists. Most people need their skills rarely, or need only a small fractional share of their output, or need it only indirectly. (You want to hear piano music, so the pianist needs to find a tuner.) Small settlements cannot keep them occupied full time. But there are fixed costs to specialist services --- tools, of course, but more essentially the time and effort needed to acquire, maintain and develop the specialist's skills. It is more efficient for one specialist to serve many people, thereby spreading the fixed costs over many customers, which rules out the part-time amateur in each village. (More exactly, since the local amateurs lack the skills to do the job well, they can only compete with the specialists by being much cheaper, or if customers can't tell the difference.) This will tend to divide up a dispersed population into regions served by one or another specialist; increasingly specialized skills will require increasingly large population bases.

It is not required by this argument that the specialists be located near each other; but it tends to happen. After all, they need each others' services, and being located near each other reduces transport costs for them, and there will often be economies of scope in setting up specialists near each other. (If everyone needs to take or make freight deliveries, they can share one set of loading docks, etc.) If demand is high enough to support multiple specialists, there can be "agglomeration economies": they can begin to benefit from each other by sharing information and knowledge, creating a local market for their specialist suppliers, etc. There is a famous passage from Alfred Marshall (in 1890) which is traditionally trotted out on these occasions, and far be it from me to break with tradition:

When an industry has thus chosen a locality for itself, it is likely to stay there long: so great are the advantages which people following the same skilled trade get from near neighbourhood to one another. The mysteries of the trade become no mysteries; but are as it were in the air, and children learn many of them unconsciously. Good work is rightly appreciated, inventions and improvements in machinery, in processes and the general organization of the business have their merits promptly discussed: if one man starts a new idea, it is taken up by others and combined with suggestions of their own; and thus it becomes the source of further new ideas. And presently subsidiary trades grow up in the neighbourhood, supplying it with implements and materials, organizing its traffic, and in many ways conducing to the economy of its material.

Again, the economic use of expensive machinery can sometimes be attained in a very high degree in a district in which there is a large aggregate production of the same kind, even though no individual capital employed in the trade be very large. For subsidiary industries devoting themselves each to one small branch of the process of production, and working it for a great many of their neighbours, are able to keep in constant use machinery of the most highly specialized character, and to make it pay its expenses, though its original cost may have been high, and its rate of depreciation very rapid.

Again, in all but the earliest stages of economic development a localized industry gains a great advantage from the fact that it offers a constant market for skill. Employers are apt to resort to any place where they are likely to find a good choice of workers with the special skill which they require; while men seeking employment naturally go to places where there are many employers who need such skill as theirs and where therefore it is likely to find a good market. The owner of an isolated factory, even if he has access to a plentiful supply of general labour, is often put to great shifts for want of some special skilled labour; and a skilled workman, when thrown out of employment in it, has no easy refuge. Social forces here co-operate with economic: there are often strong friendships between employers and employed: but neither side likes to feel that in case of any disagreeable incident happening between them, they must go on rubbing against one another: both sides like to be able easily to break off old associations should they become irksome. These difficulties are still a great obstacle to the success of any business in which special skill is needed, but which is not in the neighbourhood of others like it: they are however being diminished by the railway, the printing-press and the telegraph.

What we have argued ourselves into, on the basis of little more than a realization that comparatively high fixed costs matter, is to think that there should be spatial clumps of economic activity, where we find a lot of specialists, and that these clumps should come in grades, with more clumps containing less-specialized enterprises with less-increasing returns, and fewer clumps containing the more-specialized, more-increasing-returns enterprises. We call the clumps "towns" and "cities". (And indeed, if I can trust my searching, the nearest piano tuner to Fallingwater is located in the town of Connellsville, population 9*10^3.) The gradations of the clumps form the "hierarchy of urban places", an idea which has been familiar to economic geographers since at least the work of Christaller and Lösch in the 1930s. It implies that there isn't just quantitatively more economic activity in a bigger settlement, but generally different kinds of activity. Stewart Township is not a scaled-down version of Connellsville, which is not a scaled-down Pittsburgh, which is not a scaled-down Chicago or New York.

Moreover, the argument applies more generally than just to specialized services. It turns on having low marginal costs (a day of a heart surgeon's time to do an operation) compared to high fixed costs (ten years of training to become a heart surgeon). But the fixed costs don't have to be time, and similar logic will work for just about any industry with increasing returns, if transport costs are not prohibitive. So as we move up the hierarchy of urban places, we should find not only more, and more specialized, service providers, but also more industries with increasing returns, and, you should forgive the expression, increasingly increasing returns at that. One way industries come to have increasing returns is by being relatively capital- (as opposed to labor-) intensive, which will tend to increase the output per worker.
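
To see why high fixed costs plus low marginal costs amount to increasing returns, note that the average cost per job is fixed/n + marginal, which falls steadily as one provider spreads the fixed cost over more customers; the numbers below are invented purely for illustration:

    fixed.cost    <- 1e5                  # e.g., the cost of years of training
    marginal.cost <- 1e2                  # e.g., a day of the specialist's time per job
    jobs          <- c(1, 10, 100, 1000)
    average.cost  <- fixed.cost / jobs + marginal.cost
    rbind(jobs, average.cost)             # average cost per job: 100100, 10100, 1100, 200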

All of the above applies with great force to creating and disseminating new abstract, formalized, discursive knowledge. It is highly specialized, the fixed costs of entering are very high, economies of scope are important, the effects of agglomeration are important, and the cost of transporting the finished product is zero. All else being equal, we should expect knowledge production to be concentrated towards the top of the urban hierarchy.

All of this is, as I said, very standard stuff in economic geography and urban and regional economics. I learned much of it at (pretty literally) my father's knee, and it was old when he learned it from his teachers. (There is even a version of it in ibn Khaldun's Muqaddimah, from 1377: see ch. 5, sec. 15--22 [pp. 314--318 of the Rosenthal/Dawood translation] on the crafts, and again ch. 6, sec. 7--8 [pp. 340--343] on the sciences.) Of course the version I gave above was a bit of a cheat, in at least two ways. First, it was a story about how a certain outcome would be efficient, but that efficiency rested on a lot of unspoken or hinted-at premises about the relative sizes of different sorts of costs and values. (How many camel caravans are there in the East Bay?) Second, even granting the efficiency, would it really be brought about by the acts of interacting decision-makers, in the absence of a super-detailed coordinating plan?

Both of these questions, but especially the latter, have been the focus of a lot of very interesting work in economics over the last few decades. (Filial piety requires me to recommend this paper as an overview, but it's good, so that's easy to do.) One of those involved in this has been none other than Paul Krugman, who was one of the people who realized that new techniques for modeling imperfect competition with increasing returns could be used to attack the origin of cities and of industrial clusters. One of the things he also realized is that the problem of where the specialists should locate themselves is one of symmetry breaking, just like many kinds of pattern formation from physics — and named it as such, in a lovely little book from 1996, The Self-Organizing Economy. A later book, Fujita, Venables and Krugman's The Spatial Economy, elaborated on that analysis, showing how mixing increasing returns with the logic of comparative advantage leads naturally to spatial patterns of what can only be called combined and uneven development, again through symmetry breaking. (The nucleation of a high-productivity center not only inhibits the growth of other centers near it, it de-industrializes its periphery.) In my humble supremely arrogant opinion, this is one of the few places where interesting ideas from physics have been productively used in the social sciences.

Update, next day: typo fixed, thanks to Cris Moore.

The Dismal Science

Posted by crshalizi at January 15, 2011 17:45 | permanent link

January 14, 2011

Are the Stars Right?

As every schoolchild knows, the real 13th sign of the Zodiac is Arachne (May 13 to June 9). Please re-calibrate accordingly.

Psychoceramica

Posted by crshalizi at January 14, 2011 12:25 | permanent link

January 11, 2011

Advanced Data Analysis from an Elementary Point of View

At the intersection of Enigmas of Chance and Corrupting the Young.

Course announcement.

Course homepage.

Lectures
  1. Regression: Predicting and Relating Quantitative Features
  2. The Truth About Linear Regression
  3. Evaluating Statistical Models
  4. Using Nonparametric Smoothing in Regression
  5. Moving Beyond Conditional Expectations: Weighted Least Squares, Heteroskedasticity, Variance Functions
  6. Density Estimation
  7. Simulation
  8. The Bootstrap
  9. Re-capitulation and Q&A, no notes
  10. Testing Regression Models
  11. Splines
  12. Additive Models
  13. More on hypothesis testing
  14. Logistic Regression and Logistic-Additive Models
  15. Generalized Linear Models and Generalized Additive Models
  16. GLM Practicals
  17. Principal Components Analysis
  18. Factor Analysis
  19. Mixture Models
  20. Mixture Model Examples and Complements
  21. Graphical Models
  22. Graphical Causal Models
  23. Estimating Causal Effects
  24. Discovering Causal Structure
  25. Conclusion: Statistical Data Analysis
Homeworks
  1. What's That Got to Do with the Price of Condos in California?
  2. The Advantages of Backwardness
  3. Old Heteroskedastic
  4. An Insufficiently Random Walk Down Wall Street
  5. Bootstrapping Will Continue Until Morale Improves
  6. Nice Demo City, But Will It Scale?
  7. Diabetes among the Pima
  8. Fair's Affairs
  9. Patterns of Exchange
  10. Estimating with DAGs
Exams
  1. Midterm: Urban Scaling, Continued
  2. Second exam: Mystery Multivariate Data
Other Handouts
Writing R Functions
Re-Writing Your Code

Self-Evaluation and Lessons Learned

Posted by crshalizi at January 11, 2011 10:30 | permanent link

January 07, 2011

Why I Am So Uncannily Prescient

<pomposity level="more than usual"> As may be verified from the date-stamps (and confirmed, in case there should be any question, through the Wayback Machine), I posted my neutral model of scientific inquiry days and days before the appearance of the risible ESP paper, and the nearly equally risible New Yorker piece on something being wrong with "the scientific method". (I shall not dignify either with a direct link.) Before, yet not so long before! Clearly, this is no mere coincidence! A vulgar mind, bound to what it misleadingly regards as material "realities", might suggest that I was led to write about a long-standing pet idea when several acquaintances who had been exposed to the pre-publication publicity for the ESP paper asked me what I thought of it. Higher beings, on the contrary, will clearly perceive that I now possess powers of prediction which allow me to see through the mist of time itself as though it were clear mountain air. (As those links suggest, I attribute the development of my powers to rigorously following the secret ascetic practices transmitted to me by the ascended masters of the turquoise trail, whom I sought out in the demon-haunted western deserts many years ago.) While it is gratifying to have so many people bring these proofs of my pre-cognitive abilities to my attention (gratifying, yet, in the nature of the case, quite unsurprising), you may now cease to do so. My spiritual energies are currently fully devoted to helping my students achieve enlightenment, and I must leave the pre-refuted to their fates. </pomposity>

Manual trackback: An Ergodic Walk

Self-centered; Learned Folly; Enigmas of Chance

Posted by crshalizi at January 07, 2011 16:52 | permanent link

January 01, 2011

End of Year Inventory, 2010

Papers finished during 2010: 7
Papers written in response to "what's up with this?" e-mails from readers: 1
Papers accepted: 1
Papers in refereeing limbo: 2
Papers where I am grumbling about the third referee: 3
Papers which will be submitted next week, after giving typos a chance to ripen and become obvious: 1
Papers rejected: 0

Papers with co-authors waiting on my contributions: 2
"We should totally write a paper about this" conversations with non-trivial follow-up: 7

Manuscripts refereed: 34, for 14 journals and conferences
Manuscripts waiting for me to referee: 2
Manuscripts for which I was the responsible associate editor at Annals of Applied Statistics: 6

Grant proposals submitted: 4
Proposals funded: 1
Proposals in refereeing limbo: 2
Proposals rejected: 1

Students who finished and defended their dissertations: 1
Students who are now ABD: 1
Students at work on their dissertation proposals: 1
New classes taught or co-taught: 3 [i, ii, iii]

Talks given: 23, in 14 cities (or really 12 cities, one tourist village and one weird mansion in the middle of nowhere)

Book reviews published on dead trees: 3 [i, ii, iii]
Non-book-reviews published on dead trees: 1

Weblog posts written: 80
Substantive posts written: 34, counting algal growths

Books started: 202
Books finished: 186
Books bought: 366
Books sold: 304
Books donated: 750

Book manuscripts completed: 0

Major life changes: 1

Self-Centered

Posted by crshalizi at January 01, 2011 16:45 | permanent link

Three-Toed Sloth:   Hosted, but not endorsed, by the Center for the Study of Complex Systems