January 31, 2010

Books to Read While the Algae Grow in Your Fur, January 2010

Virginia Swift, Hello, Stranger
Mind-candy. Enjoyable mystery with eccentric academics, God-botherers and gentrification in present-day Laramie. 4th book in a series; I'll keep an eye out for the others. [Later: vols. 1--3]
Intelligence
Smart crime/spook drama set in one of the most attractive cities in the world (Vancouver), which could only be improved if it didn't end in the WORST CLIFFHANGER EVER. (Ahem.) Not, of course, as good as The Wire, but then nothing is.
Daniel Waley, The Italian City-Republics
Short, readable political-institutional history of the communes of northern and central Italy. He begins with the communes starting to take form in the towns and wrest control from their bishops, say around 1000, and ends by about 1400, by which point the towns had almost all, except for Venice, descended into some form of monarchy, generally under the domination of the local feudal land/war-lords. (Waley says little about Venice, which in retrospect seems odd, though it didn't strike me while reading it.) While Waley is good at describing this historical trajectory, he says little about why so many Italian cities followed it. I'd think it'd be natural to compare the Italian case to contemporary cities elsewhere, but I think there is exactly one sentence on them. (I imagine all kinds of interesting comparative work could be or has been done.) But within those limits, it's a nice book. Waley has also written studies on Siena and Orvieto, which sound interesting.
Terry Pratchett, Nation
You don't really need me to recommend Terry Pratchett to you, especially when he's writing about how people find ways to go on when their world has been pointlessly destroyed.
Richard Hofstadter, Anti-Intellectualism in American Life
Astonishingly, this still feels like it fits after a lapse of half a century. The whole "tax-raising, latte-drinking, sushi-eating, Volvo-driving, New-York-Times-reading, body-piercing, Hollywood-loving, left-wing freak-show" nonsense of the last thirty years now makes a lot more sense; and the chapters about the history of American education were frankly a revelation to me. (The chapter on Dewey and his pedagogical influence seems like a model of being respectfully but unrelentingly critical.) No doubt for real historians, this is all painfully outdated, and whatever's actually sound has long since been incorporated into other works, which don't provide such unintentional moments of amusement as including, in a list of the unfair accusations heaped on Jefferson, keeping a slave mistress and having children by her. (For that matter I don't care for the Beats very much, but they certainly contributed more to our literature than he thought they would.) Still: the man could write.
ObLinkage: Steve Laniel on AIiAL.
D. N. MacKenzie (trans.), Poems from the Divan of Khushâl Khân Khattak
The first significant body of poetry in Pashto; Khushal was a 17th-century warlord in what is now the Northwest Frontier, owing his position to a combination of tribal authority and appointment by the Mughals. This seems to be the most recent translation of a selection from his poetry in English, dating from 1965. It is arranged on no particular principles (some Pashto editions are, following tradition, arranged alphabetically by the first letter of the poem), which produces a rather odd effect that I might summarize as follows: Khushal is happily in love: wow is the beloved a hottie. Khushal is unhappily in love: separation is awful, especially if it's because the beloved doesn't want to see Khushal. Khushal is a fierce warrior who is also a keen hunter; falconry rules. Khushal has a remarkable capacity for drink. (Go ahead, try and tell me that's allegorical.) Aurangzeb sucks, especially in comparison to his father. (Well, he did, and sticking Khushal in jail can't have won him any points.) The Afghans should rally to Khushal and defeat Aurangzeb! Men are treacherous, false-faced bastards, but Afghans are really worse than the rest. (To be fair, having one of your own sons wage war on you in the name of Aurangzeb has got to be pretty embittering.) Khushal will withdraw from the sinful world and spend his days in pious penance. Khushal glorifies God. Repeat.
My grandfather's extemporized translations were better English poetry, but I will never hear those again.
Moez Draief and Laurent Massoulié, Epidemics and Rumors in Complex Networks
A nice short (< 120 pp.) account of the connections among stochastic network models, branching processes, and epidemic models, of the "susceptible-infectious-susceptible" or "susceptible-infectious-recovered" type, including epidemics on networks. ("Rumors" are assumed to fall under such models.)
They begin with the basic Galton-Watson branching process model, where each member of a population produces a random number of descendants (possibly zero), independently of everyone else, and this distribution is constant both within and across generations. Following over a century of tradition, they look at whether the population survives forever or goes extinct, how large it gets, how long it takes to go extinct if it does, etc. This then gets turned into a simple epidemic model ("member of population" = infected individual). It also maps on to the Erdős-Rényi network model, with "has an edge with" taking the place of "is a descendant of": pick your favorite node, and connect it to a random selection of other nodes, the number following a binomial distribution; connect each of them in turn to more random nodes. The size of the branching process's population corresponds to the size of the connected component in the graph. The mapping really only works in the limit of low-density graphs (the size of the component is roughly a sum of independent quantities when there are no loops), but it's enough to study the emergence of a giant component and the behavior of the diameter of the graph. As a prelude to more sophisticated models, they then prove a form of Kurtz's Theorem on the convergence of Markov chains to ordinary differential equations in the large-population limit. The second half of the book rehearses Watts-Strogatz small-world and Barabási-Albert scale-free networks (including mention of Yule but not, oddly, of Herbert Simon), before wrapping up with epidemic models on graphs, and the "viral marketing" problem of deciding where, on a known and fixed network, to start an epidemic for maximum impact.
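To make the branching-process picture concrete, here is a minimal sketch of my own (not taken from the book) for the binomial-offspring case that matches the random-graph mapping. It computes the extinction probability as the smallest fixed point of q = G(q), where G is the offspring generating function, and simulates the total population size. The function names, the parameters n and p, and the population cap are all illustrative assumptions.

```python
import random

def offspring_pgf(s, n, p):
    """Probability generating function of a Binomial(n, p) offspring count."""
    return (1.0 - p + p * s) ** n

def extinction_probability(n, p, iterations=10_000):
    """Smallest fixed point of q = G(q), found by iterating from q = 0.

    The k-th iterate is the probability of extinction by generation k,
    so the sequence increases to the ultimate extinction probability.
    """
    q = 0.0
    for _ in range(iterations):
        q = offspring_pgf(q, n, p)
    return q

def simulate_total_population(n, p, rng, cap=10_000):
    """Total number of individuals ever born, starting from one ancestor.

    Returns None once the total exceeds `cap`; treating that as survival
    is an assumption of this sketch, not a theorem.
    """
    alive, total = 1, 1
    while alive > 0:
        # Each living individual makes n independent attempts at offspring.
        children = sum(1 for _ in range(alive * n) if rng.random() < p)
        total += children
        if total > cap:
            return None
        alive = children
    return total

if __name__ == "__main__":
    rng = random.Random(1)
    for n, p in [(10, 0.05), (10, 0.2)]:
        print("mean offspring", n * p,
              "extinction probability", extinction_probability(n, p),
              "one simulated total", simulate_total_population(n, p, rng))
```

When the mean number of offspring n*p is at most 1 the iteration converges to 1 (certain extinction), which corresponds to the absence of a giant component in the sparse random graph; above 1 the fixed point drops below 1.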
Of course, since it's a mathematics book, the problem of how to link these models to data isn't even dismissed.
This isn't a ground-breaking work, but it's nice to have all this in a single book, and one a bit more accessible than, say, Durrett's Random Graph Dynamics (though by the same token less comprehensive). The implied reader is comfortable with stochastic processes at the level of something like Grimmett and Stirzaker; measure-theoretic issues are avoided, even when discussing Kurtz's Theorem. (Their version is thus much less precise and powerful than his, but vastly easier to understand.) Anyone comfortable with that level of probability could read it without much trouble, and I'd happily use it in a class.
Disclaimer: I read a draft of the manuscript for the publisher in 2007, and they sent me a free copy of the book, but I have no stake in its success.
Joseph L. Graves, Jr., The Emperor's New Clothes: Biological Theories of Race at the Millennium
There are places where he lapses into biological jargon, and others where I think lay readers would have benefited from more detailed rebuttals of the common counter-arguments, but over-all I recommend this very strongly. (Thanks to I.B. for lending me her copy.)
Pascal Massart, Concentration Inequalities and Model Selection
Using empirical process theory, and more specifically concentration of measure, to get finite-sample, i.e., non-asymptotic, risk bounds for various forms of model selection. The basic strategy is to find conditions under which every model in a reasonable class will, with high probability, perform about as well on sample data as it can be expected to do on new data; this involves constraining the richness or flexibility of the model class. A little extra work, and the addition of suitable penalties to the fit, gets bounds that extend over multiple classes of model, even over a countable infinity of classes. Among other highlights, Massart shows why the famous AIC heuristic is often definitely sub-optimal, and how to correct it; he also offers corrections to Vapnik's (much better) structural risk minimization, and a nice treatment of data-set splitting (= 1-fold cross-validation). All of this is for IID data, so the usual caveats apply. Formally self-contained, but realistically some previous exposure to empirical processes (at the level of Pollard's notes if not higher) will be needed. Available for free as a large PDF preprint, but I found it much more convenient to read a dead-tree copy.
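For readers who want to see the two selection strategies side by side, here is a toy sketch of my own: choosing the number of cells in a piecewise-constant regression either by penalized training error or by data-set splitting. The penalty is a plain Mallows-style 2 * noise_var * bins / n with the noise variance assumed known, which is a stand-in and not one of Massart's corrected penalties; the simulated sine-curve data and every function name are likewise assumptions made for illustration.

```python
import math
import random

def make_sample(n, noise_sd, rng):
    """Hypothetical IID data: x uniform on [0, 1), y = sin(2*pi*x) + noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def fit_regressogram(xs, ys, bins):
    """Least-squares piecewise-constant fit with `bins` equal-width cells."""
    sums, counts = [0.0] * bins, [0] * bins
    for x, y in zip(xs, ys):
        k = min(int(x * bins), bins - 1)
        sums[k] += y
        counts[k] += 1
    overall = sum(ys) / len(ys)
    # Empty cells fall back to the overall mean (an arbitrary choice).
    return [s / c if c else overall for s, c in zip(sums, counts)]

def mean_squared_error(levels, xs, ys):
    """Average squared error of a fitted regressogram on the given points."""
    bins = len(levels)
    return sum((y - levels[min(int(x * bins), bins - 1)]) ** 2
               for x, y in zip(xs, ys)) / len(ys)

def select_by_penalty(xs, ys, candidates, noise_var):
    """Minimize training error plus a Mallows-style complexity penalty."""
    n = len(ys)
    def criterion(bins):
        levels = fit_regressogram(xs, ys, bins)
        return mean_squared_error(levels, xs, ys) + 2.0 * noise_var * bins / n
    return min(candidates, key=criterion)

def select_by_holdout(xs, ys, candidates):
    """Fit on the first half of the sample, score on the second half."""
    half = len(ys) // 2
    def criterion(bins):
        levels = fit_regressogram(xs[:half], ys[:half], bins)
        return mean_squared_error(levels, xs[half:], ys[half:])
    return min(candidates, key=criterion)

if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = make_sample(400, 0.3, rng)
    candidates = [1, 2, 4, 8, 16, 32, 64, 128]
    print("penalized choice:", select_by_penalty(xs, ys, candidates, 0.3 ** 2))
    print("hold-out choice:", select_by_holdout(xs, ys, candidates))
```

Because the candidate partitions are nested, training error alone can only go down as the number of cells goes up, which is why some penalty or held-out data is needed at all.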
Elizabeth Bear, New Amsterdam
Alternate-history fantasy mystery stories. Owing something, perhaps, to Randall Garrett's "Lord Darcy" stories (the name of the heroine is distinctly suspicious), but without their complacency about the benevolence of the powers that be.
David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining
I've used this three times now in teaching 36-350, with about 75 students total over the years. I keep using it because it's the best textbook on data-mining I know. It covers the whole process, soup to nuts: data collection (and the importance of understanding what the data actually mean, if anything), cleaning, databases, model construction, model evaluation, optimization, visualization, etc. All of this is organized around four crucial questions: What kind of pattern are we looking for in the data, and how do we represent those patterns? How do we score representations against each other? How do we search for good representations? What do we need to do to implement that search efficiently? All of the basic methods (and many not so basic ones) are in here, all seen as different answers to these questions. I find its explanations extremely clear, and my students seem to as well. I regard it as a strength that it is not tied to pre-canned software, which would only encourage dependency and thoughtlessness.
The only real competition, to my mind, is Hastie, Tibshirani and Friedman. But the Stanford book is distinctly more about statistics, and has more statistical theory and math (though not, from my point of view, a lot of either), whereas this one is distinctly focused on data-mining and on computation. It would be nice if Hand &c. had material on support vector machines, and more on ensemble methods; perhaps it's time for a second edition?
Disclaimer: I almost took a post-doc under Smyth rather than coming to CMU, back in 2004; also, the MIT Press sent me a free review copy of this book (in 2001).

Books to Read While the Algae Grow in Your Fur; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; Scientifiction and Fantastica; Writing for Antiquity; Afghanistan and Central Asia; The Natural Science of the Human Species; Networks; The Beloved Republic; The Commonwealth of Letters; Learned Folly

Posted by crshalizi at January 31, 2010 23:59

Three-Toed Sloth:   Hosted, but not endorsed, by the Center for the Study of Complex Systems