July 06, 2006

Statistical Arbitrage in the Sky

Reading this piece in the New York Times about Farecast, Oren Etzioni & co.'s attempt to forecast airline ticket price movements (through the power of machine learning!) leaves me with three reactions.

First, it's a relief to read something about data-mining and airlines which isn't Yet Another Unconstitutional Step Towards a National Surveillance State. In fact, this would be cool if it works.

Second, I'm professionally curious about how well Farecast's predictions, with over 100 independent variables, would compare to simple alternatives, like low-memory hidden Markov models, or throwing out all the variables except the time remaining until departure. (Since I'm teaching data mining in the fall, I'm also professorially curious.) I suspect that, even if there is some real improvement, it is small. Come to that, the trivial predictor which always forecasts a price increase is going to set a pretty high baseline accuracy...
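To make the baseline point concrete, here is a minimal Python sketch. It assumes that the price quotes for one flight arrive as a plain list ordered in time, and that a "forecast" is just an up/down call about the next quote. The function names are hypothetical, and nothing here reflects how Farecast actually scores its own predictions.

```python
from typing import Sequence


def direction_labels(prices: Sequence[float]) -> list[bool]:
    """For each consecutive pair of quotes, True if the later price is higher.

    A flat price (no change) counts as "not an increase".
    """
    return [later > earlier for earlier, later in zip(prices, prices[1:])]


def hit_rate(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of up/down calls that match what actually happened."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def always_up_baseline(prices: Sequence[float]) -> float:
    """Hit rate of the trivial forecaster that always predicts a rise."""
    labels = direction_labels(prices)
    return hit_rate([True] * len(labels), labels)


def gain_over_baseline(predictions: Sequence[bool],
                       prices: Sequence[float]) -> float:
    """How far a (hypothetical) model's hit rate exceeds the always-up baseline."""
    labels = direction_labels(prices)
    return hit_rate(predictions, labels) - always_up_baseline(prices)
```

On a route where fares mostly climb toward departure, `always_up_baseline` is already high, so the number worth reporting for a 100-variable model is `gain_over_baseline`, not its raw accuracy.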

Third, I'll be surprised if it does work, not on technical grounds, but because I don't see how it makes sense for airlines to cooperate. The ideal pricing scheme, from the airline's point of view, is one which gouges you just enough that you're indifferent between taking the flight and not going at all (or taking some alternate mode of transport, etc.). This is why flights at ungodly hours are cheaper than those on the same route at decent times: by showing up at five in the morning to have your luggage prodded, etc., you signal to the airline that they really can't get any more money out of you. [1] Now suppose that you want to take a certain flight, and there's a maximum amount you'd pay for it. As things currently work, you look up the price and see what the airline is currently charging. If that price is less than the flight's value to you, you buy the ticket, and the difference is your "consumer surplus". Now Farecast comes along and says, in effect, "sure, you could do that, but the price is going to drop; hold on and you'll do even better." So you buy when the price hits its trough, and are better off thereby. (Yes, some people act like this now, though I'd guess not many, and not very successfully.) From the airlines' point of view, however, every dollar by which your consumer surplus grows is a dollar they could have had. ("And I would have gotten away with it, too, if it hadn't been for you meddling KDDs!") Consumers and airlines are engaged in a zero-sum competition over the potential surplus, and this doesn't help the airlines.
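The zero-sum claim comes down to a few lines of arithmetic. This is a minimal sketch, assuming a single traveller with a fixed maximum willingness to pay (`value`), a hypothetical sequence of quoted fares, and a per-passenger marginal cost that does not depend on the price (in the spirit of the footnote's point that the 101st passenger costs almost nothing). Whatever price is paid, consumer surplus plus airline margin always equals `value - marginal_cost`, so each dollar a perfect forecast saves the traveller comes straight out of the airline's margin.

```python
from typing import Optional, Sequence


def split_surplus(value: float, price: float,
                  marginal_cost: float = 0.0) -> Optional[tuple[float, float]]:
    """Split one ticket's gains from trade between traveller and airline.

    Returns (consumer_surplus, airline_margin), or None if the price exceeds
    what the ticket is worth to the traveller (no sale happens).
    """
    if price > value:
        return None
    return value - price, price - marginal_cost


def buy_at_trough(value: float, prices: Sequence[float],
                  marginal_cost: float = 0.0) -> Optional[tuple[float, float]]:
    """Idealised buyer with perfect foresight: wait for the lowest quote."""
    return split_surplus(value, min(prices), marginal_cost)
```

With `value=400` and hypothetical quotes `[300, 280, 250, 320, 380]`, buying at the first quote gives `(100, 300)` and buying at the trough gives `(150, 250)`; both split the same 400, and only the division changes.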

Which, again, makes me very puzzled about why they would cooperate with it. The smart move on their side, I think, would be to systematically undermine the reliability of Farecast. This could be done very simply, without even attempting to reverse-engineer the predictor: monitor its forecasts of your own flights, and, all else being equal, do the opposite. It's true that a reliable forecast of a price increase isn't so bad, for the airlines, as a forecast of a price decrease, but systematically jamming and confusing Farecast should be easier than selectively doing so. But I defer to real economists about the importance of this wrinkle.
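The jamming strategy is easy to simulate. A minimal sketch, under loud assumptions: the airline sees each up/down forecast before setting the next fare; `natural_moves` is a hypothetical stand-in for what fares would have done without interference (not real data); and with probability `jam_prob` the airline moves the fare against the forecast. The hypothetical `selective` flag models the variant where only forecasts of a drop, the ones that cost the airline money, get contradicted.

```python
import random
from typing import Sequence


def jammed_hit_rate(forecasts: Sequence[bool], natural_moves: Sequence[bool],
                    jam_prob: float, rng: random.Random,
                    selective: bool = False) -> float:
    """Hit rate of a forecaster whose calls the airline can see and counter.

    forecasts[i]     -- True if the forecaster predicts a price rise at step i
    natural_moves[i] -- True if the price would have risen absent any jamming
    jam_prob         -- chance the airline does the opposite of the forecast
    selective        -- if True, only predicted drops are eligible for jamming
    """
    if len(forecasts) != len(natural_moves) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    hits = 0
    for forecast_up, natural_up in zip(forecasts, natural_moves):
        eligible = (not selective) or (not forecast_up)
        if eligible and rng.random() < jam_prob:
            actual_up = not forecast_up  # airline deliberately contradicts the call
        else:
            actual_up = natural_up       # fare moves as it would have anyway
        hits += int(actual_up == forecast_up)
    return hits / len(forecasts)
```

With `jam_prob=1.0` and `selective=False` the hit rate is exactly zero however good the underlying model is, which is why the airline never needs to reverse-engineer the predictor. A forecaster that noticed could, of course, start inverting its own calls, and then both sides are adapting to each other.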

The larger moral ought to be a familiar one: in strategic interactions, you have to assume that the other side will adapt to you. This doesn't mean that statistical methods have no place in studying strategic interaction (see, e.g., the second paper here), but it does mean we should be very dubious about the ability of simple data mining to give us an advantage over an opponent as smart and determined as a commercial airline.

(Thanks to K. for sending me the article, and discussing it.)

1: If you want to understand the logic of airline pricing, among much else, a great read is Carl Shapiro and Hal Varian's Information Rules: A Strategic Guide to the Network Economy. [I have been sitting on a draft review for seven years now, and am not about to stop.] Despite the very late-1990s title, this is really about the general economic principles involved in any industry where high first-unit costs and low marginal costs give you positive economies of scale, or where there are strong positive network externalities. Airlines are in the first category, because the cost of getting a jet from New York to LA with 100 passengers is almost the same as getting it there with 101 passengers, and is mostly the cost of getting it there empty. Alas, appreciating the rational essence of the process does not help make the lived experience any more endurable.

Enigmas of Chance; The Dismal Science

Posted by crshalizi at July 06, 2006 22:00

Three-Toed Sloth: Hosted, but not endorsed, by the Center for the Study of Complex Systems