You may have heard about the recent paper “Physical and situational inequality on airplanes predicts air rage”. It claims that the existence of first-class seating on a plane causes air rage, especially if economy-class passengers have to walk past it on the way to their seats.
The paper received quite a lot of uncritical publicity from the likes of Science, who really ought to know better. I was looking forward to some thorough debunking, and clicked through to articles by Andrew Gelman and Runway Girl Network with great anticipation. But they were a bit of a let-down.
Both articles correctly pointed out that the paper merely showed correlation, not causation, and that all the speculation about inequality was just that – speculation. The RGN article also provided some useful background on the aviation industry, deducing that the airline in question was probably a US network carrier, whose only all-economy planes are short-haul regional jets (which turns out to be important!). RGN also made Science look pretty silly by pointing out that the air rage example Science provided was on an all-economy Air Asia flight.
But otherwise the complaints were fairly petty. Andrew Gelman mostly criticized the over-precision of the quoted coefficients in the paper, while RGN was disturbingly ad-hominem and got a bit hysterical about the paper’s failure to account for code-share flights. Sure, the paper should have acknowledged the existence of code-share flights in their data, but RGN couldn’t show how they would bias the results.
Now, my employer mostly flies me business class, so I’ll be damned if I’ll let this threat to my privilege go unchallenged. So here are the problems with the paper’s methodology.
The authors of the paper had access to flight data containing variables such as number of seats, flight distance, seat pitch, first-class configuration, delays, etc, as well as the number of air rage incidents that occurred.
They plugged all this data into a linear regression model, which generated a coefficient for each variable so as to best match the number of air rage incidents. A coefficient that is large relative to its uncertainty (its standard error) is deemed statistically significant, and that was the case with the variable relating to first-class.
Now, linear regression is a wonderful tool – I’ve used it myself many times – but if you read the fine print you’ll find that it’s only valid if the outcome is roughly linear in each of the variables involved. And I suspect that the most important variable for predicting air rage – namely, flight duration – has a non-linear relationship with it.
Another caveat with linear regression is that you need to pick variables that are independent. If two variables are highly correlated, the allocation of coefficients between the two of them can be somewhat arbitrary, which is a problem if one of the variables is highly predictive.
I can’t know for certain without seeing the (confidential) raw data, but I’d bet money that the relationship between flight distance and air rage incidents is upward-curving, maybe parabolic. In other words, people get angrier the longer they’ve been flying.
When you use linear regression to fit a relationship that is really parabolic, you get a line-of-best-fit that under-estimates incidents on long flights. The regression then compensates by loading spurious coefficients onto whatever other variables correlate with long flights.
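To make that concrete, here’s a minimal sketch with invented numbers (not the paper’s data). It assumes incidents grow with the square of flight duration and fits a straight line by ordinary least squares. The durations, the `0.01` scale factor and the helper `fit_line` are all my own assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical flights of 1 to 12 hours.
hours = [float(h) for h in range(1, 13)]
# Assumption: incidents per flight grow as duration squared.
incidents = [0.01 * h ** 2 for h in hours]

intercept, slope = fit_line(hours, incidents)
# Positive residual = the straight line under-predicts that flight.
residuals = [y - (intercept + slope * h) for h, y in zip(hours, incidents)]

for h, r in zip(hours, residuals):
    print(f"{h:4.0f} h  actual minus fitted = {r:+.3f}")
```

The residual on the 12-hour flight comes out positive, so the line under-predicts the longest flights. In this symmetric toy data it also under-predicts the very shortest ones, and over-predicts everything in the middle.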
And that’s probably what happened here. Because the airline in question only ran all-economy configurations on short-haul, the first-class variable becomes a proxy for long flights, and was assigned a large coefficient to correct for the non-linearity of air rage incidents vs flight distance. Thus it was assumed to be a significant predictor of air rage.
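Here’s a rough sketch of that mechanism, again with made-up data. The fleet mix (400 short all-economy hops, ten first-class flights of 1 to 10 hours), the `0.01` scale factor and the tiny `ols` solver are all assumptions of mine. By construction, first class has no effect on incidents at all; only duration squared matters.

```python
def ols(rows, ys):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Invented fleet: all-economy regional jets fly only short hops,
# while aircraft with first class cover everything from 1 to 10 hours.
economy_hours = [0.5, 1.0, 1.5, 2.0] * 100
first_hours = [float(h) for h in range(1, 11)]

hours = economy_hours + first_hours
has_first = [0.0] * len(economy_hours) + [1.0] * len(first_hours)
# True model: incidents depend on duration squared and NOT on first class.
incidents = [0.01 * h ** 2 for h in hours]

# Model 1: intercept + duration + first-class dummy (mis-specified).
linear_rows = [[1.0, h, f] for h, f in zip(hours, has_first)]
# Model 2: same, plus a duration-squared term (correctly specified).
quadratic_rows = [[1.0, h, h * h, f] for h, f in zip(hours, has_first)]

linear_fc = ols(linear_rows, incidents)[2]
quadratic_fc = ols(quadratic_rows, incidents)[3]
print(f"first-class coefficient, linear model:    {linear_fc:+.4f}")
print(f"first-class coefficient, quadratic model: {quadratic_fc:+.4f}")
```

In this layout the linear model hands first class a clearly positive coefficient even though it does nothing, and adding the squared term makes it vanish. With a different mix of flights the spurious coefficient can shrink or even change sign, so treat this as an illustration of the mechanism rather than a reconstruction of the paper’s result.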
It may not be a significant factor in this study, but it’s also problematic that the first-class variable probably correlates strongly with flight distance in the data set they used. That makes it difficult to disentangle the two effects.
There are two other problems with the paper’s methodology. They may not bias the result, but really should be addressed.
- Instead of using flight distance as a variable, they should have used flight duration. It’s time in the air that affects air rage, not distance traveled, and flight duration information is readily available.
- The variable being predicted was incidents per flight, which naturally rises with the number of people on board. The authors tried to correct for that by including number of seats as a variable, but that’s unreliable and could lead to a bias against larger aircraft. A better technique would be to divide the number of incidents by the number of seats, and predict the incidents per seat. Even better would be incidents per passenger, if passenger numbers are available, since that would correct for varying load factors on different routes.
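The second point is easy to see with a toy calculation. The two routes below are invented, as are the field names and the `rates` helper; they are set up so that both have exactly one incident per thousand passengers.

```python
# Invented yearly totals for two routes with the same true rate:
# one incident per 1,000 passengers carried.
routes = {
    "regional jet": {"flights": 1000, "seats": 50, "load_factor": 0.8, "incidents": 40},
    "wide-body": {"flights": 1000, "seats": 300, "load_factor": 0.9, "incidents": 270},
}


def rates(route):
    """Incident rates under three different denominators."""
    seats_flown = route["flights"] * route["seats"]
    passengers = seats_flown * route["load_factor"]
    return {
        "per_flight": route["incidents"] / route["flights"],
        "per_seat": route["incidents"] / seats_flown,
        "per_passenger": route["incidents"] / passengers,
    }


for name, route in routes.items():
    r = rates(route)
    print(f"{name:13s} per flight {r['per_flight']:.4f}  "
          f"per seat {r['per_seat']:.6f}  per passenger {r['per_passenger']:.6f}")
```

Per flight, the wide-body looks almost seven times worse. Per seat it still looks slightly worse, purely because it flies fuller. Only per passenger do the two routes come out identical, which is what you’d want given how the data was built.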
There are a few ways the authors could improve their analysis. For a start, they should use linear regression to predict incidents per passenger-hour instead of incidents per flight, while keeping flight duration as a variable. This will correct for the non-linearity if incidents per flight are merely quadratic in duration, because dividing a quadratic by the hours flown leaves a straight line.
However, if it’s true that all the all-economy flights are on small short-range aircraft, then you still need to extrapolate a long way out of a tight cluster to compare them with the big long-range aircraft. That’s statistically dubious, and the only way forward may be to discard all flights longer than, say, 90 minutes.
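A quick sanity check of that rescaling, under the assumption that incidents per flight really are proportional to passengers times duration squared. The constant `k`, the passenger count and the helper `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


k = 1e-5            # hypothetical incidents per passenger per hour squared
passengers = 150    # hypothetical passengers on every flight
hours = [float(h) for h in range(1, 13)]

# Assumed quadratic growth of incidents per flight with duration.
incidents_per_flight = [k * passengers * h ** 2 for h in hours]
# Rescale to incidents per passenger-hour.
per_passenger_hour = [inc / (passengers * h)
                      for inc, h in zip(incidents_per_flight, hours)]

intercept, slope = fit_line(hours, per_passenger_hour)
# Largest gap between the rescaled data and the fitted straight line.
worst_miss = max(abs(y - (intercept + slope * h))
                 for h, y in zip(hours, per_passenger_hour))
print(f"slope = {slope:.2e}, worst miss = {worst_miss:.2e}")
```

After rescaling, a straight line in duration fits the toy data essentially perfectly, so there is nothing left over for a proxy variable to soak up. If the real relationship is steeper than quadratic, this trick alone won’t be enough.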
Best of all would be to include a bunch of longer all-economy flights – such as those flown by Air Asia and Ryanair – in the analysis and see what coefficients are generated. Given the feral nature of those flights, I suspect the first-class effect would vanish, or even become negative.