An AA jet on the runway

“First class causes air rage” paper contains serious data flaws


A research paper by Katherine DeCelles and Michael Norton, published in the Proceedings of the National Academy of Sciences, has taken the media by storm, with the takeaway (initially endorsed on DeCelles’ Twitter account) that first class cabins cause air rage. Yet Runway Girl Network data analysis and information from DeCelles in interviews show that the data contain serious flaws both in quality and in the way they are presented, raising significant questions about the expertise, methodology, oversight and review of this paper.

The press this paper has received has been unsurprisingly breathless. “Air rage? Blame the first-class cabin,” says Science Magazine, leading with an air rage incident that happened on an all-economy AirAsia A320. “How to start a mid-air fight: Put a first class cabin on the plane,” says Mashable, captioning an A380 cabin with Emirates IFE clearly shown as ‘Qatar Airways’, and an Airbus concept cabin as Lufthansa first class. “First Class Cabins Set Off ‘Air Rage’, Study Finds” says Time. All three articles were retweeted by lead author DeCelles, who has markedly changed the tone of her response to media coverage since Runway Girl Network started asking questions on 5 May.

Runway Girl Network spoke to and emailed with DeCelles, who noted that the “predicts” in the title of the article has a specific meaning in statistical analysis. “It means that it is significantly predictive of variance in the dependent variable (air rage) in a statistical model — it is not causal, like ‘causes, triggers,’ etc. It simply means it is signify [significantly -ed] associated with it controlling for all the other factors we had data on and could include as covariates (controls).”

Yet before 5 May, as mentioned, DeCelles was seen retweeting and commenting positively on Twitter about general media articles that have taken the article to be determining causality. This raises new concerns about the extent to which academic articles, aimed at provoking “a new study says…” responses from the media, are written, researched and targeted responsibly.

This image, prominent on PNAS' website, is concerning — should media coverage be the aim of academic papers? Image: PNAS


In terms of the other factors DeCelles mentions as potentially contributory to air rage, the immediate question for many in the commercial aviation industry was to ask to see the data, or at least understand whether the data are of adequate quality and have been analysed by those with airline knowledge.

Both DeCelles and Norton work at management and business schools, and neither DeCelles’ nor Norton’s professional websites, nor their Google Scholar pages, nor any other source RGN could find, list a single publication regarding aviation or travel. RGN asked DeCelles for information about the commercial aviation industry’s input into the methodology. “I’ve been in extensive conversations with airline and security executives and ground personnel for years,” is all DeCelles said. The journal PNAS confirmed to RGN that no aviation expertise was used to review the article.

DeCelles refused to provide information about the identity of the airline used in the study, citing a confidentiality agreement, but RGN also learned from DeCelles that — in a key failing that makes many of the statements in the article factually incorrect — the study did not just look at the flights of one airline, but multiple airlines.

RGN asked DeCelles whether the data included codeshare flights that were operated by airlines other than subsidiary and contract airlines. “Yes although we do not know which ones were code shared versus not so we cannot look at the effect in the dataset,” DeCelles replied. DeCelles did not reply to follow-up questions from RGN about this highly problematic codeshare data.

Codeshare flights are simple to discard when using OAG data; without discarding them, the data are invalid

This journalist spent approximately eighteen months working on a daily basis with proprietary flight cabin and amenity data that was matched on a flight-by-flight basis to OAG’s scheduling data, and is extensively familiar with the complexities of matching cabins to filed airline three-character aircraft codes (73G, 76W, and so on). The matching is complex, but the skills required to do it are not unique.

Part of the OAG dataset includes indicators of whether or not a flight (say, coded as AA1234) is operated by the marketing airline (in this case American Airlines) or a codeshare partner.

Not to discard codeshare data means that the study is not comparing data from a single airline, and the study has not controlled for the existence of multiple airlines in its data. This is a colossal flaw that should have been picked up within minutes in an adequate, professional peer review.
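As a minimal sketch of the filtering step described above, assume each schedule record carries both a marketing carrier and an operating carrier. The Python below keeps only flights flown by the marketing airline itself, plus any subsidiary or contract operators the analyst chooses to allow. The field names (`marketing_carrier`, `operating_carrier`), the `extra_operators` parameter and the sample rows are hypothetical illustrations, not OAG’s actual schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleRecord:
    # Hypothetical fields for illustration; not OAG's real column names.
    flight_number: str
    marketing_carrier: str  # airline whose code appears on the ticket
    operating_carrier: str  # airline that actually flies the aircraft


def drop_codeshares(records, extra_operators=frozenset()):
    """Keep flights operated by the marketing airline or an allowed affiliate."""
    return [
        r
        for r in records
        if r.operating_carrier == r.marketing_carrier
        or r.operating_carrier in extra_operators
    ]


# Made-up sample rows, purely illustrative.
sample = [
    ScheduleRecord("AA1234", "AA", "AA"),  # flown by the marketing airline
    ScheduleRecord("AA6100", "AA", "BA"),  # codeshare flown by a partner
    ScheduleRecord("AA3300", "AA", "MQ"),  # flown by a regional affiliate
]

mainline_only = drop_codeshares(sample)
with_regionals = drop_codeshares(sample, extra_operators={"MQ"})
```

Making the allowed affiliates an explicit argument forces the analyst to state which operators count as “the airline”, which is the decision the study appears not to have made.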

This is a fundamental problem for the dataset, and calls every conclusion of the study into serious question. It also calls into question the amount of subject matter expert contribution, and the level of review that the paper was given by DeCelles and Norton, and by the PNAS itself. The PNAS editorial board did not answer RGN’s questions on this matter.

The authors describe their data set in ways that do not fit any North American airline, despite the flight profile necessarily matching a North American carrier

Putting aside for the moment the serious problems with codeshare data being included, the article and its supplementary information state that the data are from “circa 2010”, and are from a single large international airline, which operates one-, two- and three-class flights.

The study says the dataset contains flights on routes that always include North America (to/from Western Europe, North Asia, Caribbean, Central and South America, Middle East, Southwest Pacific), with the exception of some Caribbean-to-Caribbean and Lower South America to Lower South America flights.

This is a very limited subset of international carriers, with only American and United in that list within North America. Spokespeople from both United and American categorically denied to RGN that their data was provided for this study.

The study states that the airline has an “economy plus” extra-legroom economy product, and it has one longhaul aircraft type that operates in both a two-class and a three-class configuration. Just 0.03 percent of flights on this airline, DeCelles told RGN, are on three-class aircraft. The mean flight distance was 1445 miles, while 95% of flights ranged between 2700 and “a couple hundred” miles. The dataset contained “∼150–300 unique arrival and departure airports, and between 500 and 1,000 unique flight routes”.

The only airlines that this route and cabin profile would reasonably match were American and United, but United’s network would likely have included a significant amount of intra-Asia flying and intra-Pacific flying.

Only United and American offered three-class (first, business, economy) service in the “circa 2010” time period. Only United offered extra-legroom economy products in that period, since American introduced Main Cabin Extra in 2012, although RGN is not ruling American out because DeCelles and Norton may have stated “circa 2010” to disguise the airline’s identity.

There are many reasons why passenger experience on a large mainline jet might differ from a tiny regional 50-seater. Image: AA


Only American has had just one longhaul aircraft type operating in both two- and three-class configuration, the Boeing 777-200ER, but that has only been the case in recent years since a refit program started in 2014.

United has had both the 777-200/ER and 767-300ER in this category since the Continental merger. Of the two merged airlines that currently form United, only United had three-class service in the “circa 2010” time period, and United has denied its data were used.

The dataset as described by the authors cannot match any single US airline, and the two closest matches specifically deny involvement.

Even if the dataset did not include serious codeshare errors, the analysis has not excluded numerous potential factors

Let us for a moment conjecture that the codeshare information in the dataset is not utterly damaging to its validity, taking the example of American Airlines as an airline that might roughly match the profile of the airline data.

There are significant passenger experience differences, including many that would likely affect air rage, between the aircraft an airline like American operates in all-economy and multi-cabin configurations.

According to American’s website, the only all-economy aircraft operated for American Airlines are the Bombardier CRJ200 and DHC300, and the Embraer ERJ140 and ERJ145. (United’s also include the DHC200, and several similar aircraft have been included in both United and American’s fleet in the last 5-10 years.)

As a proportion of American’s fleet, and as a representative portion in comparison with every other flight that American and similarly sized carriers operate, this is a very specific part of its operations — and, importantly, not actually operated by American Airlines itself but by subsidiary and contract carriers under the American Eagle brand.

There are very real passenger experience differences between the smallest of regional jets and turboprops operated in the United States on ultra-short hops by American Eagle subsidiaries (the 15-minute hop between Columbus and Atlanta in Georgia, say), and some of the largest widebodies in the sky flown by mainline American on ultra-longhaul flights.

DeCelles insisted that several other factors — flight length, delay, seat pitch, seat width, cabin area, whether it was an international flight, number of seats on the aircraft — were taken into account and not significant. DeCelles would not explain how the research had taken them into account despite RGN providing her with multiple opportunities, stating only that they had been.

It is concerning if an author carrying out an academic study regarding an industry cannot explain the basics of their methodology to an informed journalist with extensive experience in that industry and a professional background in analysing precisely the kind of dataset that the study involves.

Given DeCelles’ extensive retweeting of general media articles of the work, it is hard to draw any other conclusion than that the authors desire media exposure for their work, but have not engaged adequately with providing non-statisticians with a correct understanding of terms like “predict”, which have specific meaning. DeCelles may have had “fun being interviewed by CNN”, as she tweeted, but with media exposure comes responsibility.

Pitch and width dismissed by the study, which used questionable sources for this data

DeCelles also told RGN that the pitch and width data came “largely from the airline/from the OAG data company which gets its information from the airline”. To the best of RGN’s knowledge, to the knowledge of this journalist who previously worked extensively with OAG data, and to the knowledge of an OAG sales associate with whom RGN spoke, OAG does not provide pitch and width data.

It is also unclear how — or even if — whichever airline provided DeCelles with the pitch and width data also provided the pitch and width data for its codeshare partners.

In the article’s supplementary information, DeCelles and Norton admitted using SeatGuru data, which the industry and knowledgeable travellers consider notoriously inaccurate, particularly on width but also on pitch.

In the study’s methodology, the authors explain that they matched aircraft type to OAG scheduling data. This requires a level of industry knowledge, since data filed to OAG by airlines does not always include information about subfleets, which can vary substantially within an aircraft type. Neither the article nor RGN’s interviews with DeCelles have shown that level of industry knowledge. Without confirmation of how many flights went unmatched, it may well be that significant data have not been included.
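The matching concern above can be made concrete with a short sketch: join each flight to a cabin layout by its filed aircraft code, and report how many flights fail to match. The lookup table, its cabin counts and the flight rows are invented for illustration. Only the style of aircraft code (73G, 76W) comes from the discussion above. Nothing here is the authors’ actual procedure.

```python
def match_cabins(flights, cabin_lookup):
    """Split flights into those with a known cabin layout and those without."""
    matched, unmatched = [], []
    for flight in flights:
        layout = cabin_lookup.get(flight["aircraft_code"])
        if layout is None:
            unmatched.append(flight)
        else:
            # Merge the layout fields into a copy of the flight record.
            matched.append({**flight, **layout})
    return matched, unmatched


# Hypothetical lookup keyed on the filed aircraft code; values are invented.
cabin_lookup = {
    "73G": {"cabins": 2},
    "76W": {"cabins": 3},
}

# Made-up flights. The last uses a generic type code with no subfleet detail.
flights = [
    {"flight": "XX1", "aircraft_code": "73G"},
    {"flight": "XX2", "aircraft_code": "76W"},
    {"flight": "XX3", "aircraft_code": "777"},
]

matched, unmatched = match_cabins(flights, cabin_lookup)
unmatched_share = len(unmatched) / len(flights)
print(f"{len(unmatched)} of {len(flights)} flights unmatched ({unmatched_share:.0%})")
```

Reporting the unmatched share alongside the results is what would let a reader judge whether significant data were left out.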

The study’s conclusions suggest a lack of industry understanding has affected its findings

At the end of the day, RGN’s concern is that academic data must be accurate and well-researched, and that academic studies reported in the media are represented in a way that encourages accurate reporting.

RGN asked DeCelles to explain, specifically, how these data suggest that air rage is linked to the presence of first class, rather than any of the other factors, flight length in particular.

DeCelles quoted the thesis of the piece rather than explaining how the data support that thesis: “We have a theory about how — drawing on decades of social science research that shows that inequality, such as when others have more income, resources, react with negative emotion and can be associated with violent crime and aggression. We do not have a measure of this ‘mechanism’ in the dataset, such as feelings of inequality of the passengers on board,” DeCelles said. “The data say that air rage is nearly 4x more common in coach on planes that have versus do not have a first class cabin, holding constant other factors like delays, departure and arrival areas, length of the flight in miles, etc.”

DeCelles admits failing to control for the most important factor: the airline concerned. Moreover, DeCelles has still not explained how flight length — clearly the most obvious difference between the tiniest regional jets or turboprops and mainline aircraft — has been held constant, despite RGN asking for such an explanation numerous times.

Further, RGN was, within ten minutes of starting to consider the factors that lead to air rage, able to come up with numerous reasons not controlled for in the report why a regional airline CRJ-200 or ERJ-145 might be less conducive to air rage than a mainline Boeing 737 or ten-abreast 777: no middle seats. Flights from smaller, less chaotic airports. A smaller length of each cabin segment. Faster, smoother boarding. Shorter lines. The presence or absence of complimentary and for-purchase alcohol. Crew more present in the cabin. A higher crew to economy passenger ratio. Rapport during the manual safety demonstration. Feeling closer to the flight deck door, the seat of authority. A sense that crew have more of an eye on them. Inflight entertainment. Connectivity.

Small regional jets (like this refitted CRJ-700) have no middle seats and sometimes only one seat on one side of the aircraft. Image: AA


That these factors and others have seemingly not been considered when it would be fairly simple for someone familiar with passenger experience to create controls for them is a matter of concern.
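To illustrate why the way a factor is controlled for matters, the sketch below compares a pooled incident-rate ratio with ratios computed separately within each flight-length band. Every count is invented purely for illustration and none comes from the study. The names `counts`, `rate` and `rate_ratio` are likewise hypothetical. The example makes no claim about what the study’s data would show. It only demonstrates that an uncontrolled factor can produce a large pooled ratio that vanishes once flights are compared like with like.

```python
# Invented counts, purely illustrative; they are NOT from the study.
# Key: (has_first_class, length_band) -> (flights, incidents)
counts = {
    (False, "short"): (9000, 9),
    (False, "long"): (1000, 4),
    (True, "short"): (2000, 2),
    (True, "long"): (8000, 32),
}


def rate(cells):
    """Incidents per flight across a list of (flights, incidents) cells."""
    total_flights = sum(flights for flights, _ in cells)
    total_incidents = sum(incidents for _, incidents in cells)
    return total_incidents / total_flights


def rate_ratio(counts, band=None):
    """Rate with first class divided by rate without, optionally in one band."""

    def cells(has_first_class):
        return [
            value
            for (first_class, length_band), value in counts.items()
            if first_class == has_first_class
            and (band is None or length_band == band)
        ]

    return rate(cells(True)) / rate(cells(False))


pooled = rate_ratio(counts)
by_band = {band: rate_ratio(counts, band) for band in ("short", "long")}
print(f"pooled ratio: {pooled:.2f}")
print(f"within-band ratios: {by_band}")
```

In these made-up numbers the incident rate is identical with and without first class inside each band, yet the pooled ratio is well above one, simply because the first-class flights are mostly long ones.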

RGN contacted PNAS, the journal in which this article was published, outlining our concerns around the quality of the data, the expertise of its processing and the engagement of aviation industry subject matter expertise. The journal forwarded all questions to the editor of the article, Dr Susan Fiske, a professor of psychology and public affairs, who replied only: “We treated this manuscript as a piece of social science research. As such, it was reviewed by experts in aggressive inclinations in face-to-face settings, the psychology of inequality, egocentrism/narcissism, and the socially situated self-concept.”

RGN asked specific questions about PNAS’ position regarding DeCelles’ and Norton’s failure to acknowledge and control for flights operated by different airlines within the data, the resulting impossibility of drawing many conclusions from these data, and the inappropriateness of making statements of salience and prediction from them. PNAS did not respond.

A further matter of concern is a seeming lack of understanding either of the commercial aviation industry or of trends within the airline cabin. “Class-based seating is both more prevalent and more unequal in recent years, with first class cabins claiming an increasingly large share of total space,” claims the study, citing in support only the Elizabeth Popp Berman blog post that has been covered in RGN previously, and which does not in fact claim that first class cabins take up an increasingly large share of total space.

As industry observers know, the trends of both premium economy and extra-legroom economy have reduced the size of first and business class cabins. In particular, most current-generation US cabins replacing older offerings (either via replacement or refit) have fewer, not more, first class seats on board.

Articles like DeCelles’ and Norton’s, and the media narrative that they both engender and are clearly written and publicised to spur, contribute unhelpfully to the ongoing discussion of air rage. There are real safety questions — which remain unstudied — around whether shrinking seat width and pitch are contributors to disruptive passengers. DeCelles and Norton dismissing this as a factor on the basis of faulty data is irresponsible.

It is concerning to see academic research relating to aviation produced on the basis of data that do not stand up to even a brief reading and a thirty-minute telephone interview by an aviation journalist. It is even more frustrating when the authors of that research bask in the spotlight of inaccurate pieces by non-expert media keen on eyeballs from negative stories about the airline industry, without correcting the record.

DeCelles stopped responding to RGN’s questions after we raised the more serious methodological flaws regarding codeshares, the source of the study’s pitch and width data, other contributory factors, and further questions.

DeCelles has also blocked this RGN journalist on Twitter.