A new paper, from a large team of researchers headed by Martin Schweinsberg, a psychologist at the European School of Management and Technology, in Berlin, helps shed some light on why. Dr Schweinsberg gathered 49 different researchers by advertising his project on social media. Each was handed a copy of a dataset consisting of 3.9m words of text from nearly 8,000 comments made on Edge.org, an online forum for chatty intellectuals.
Dr Schweinsberg asked his guinea pigs to explore two seemingly straightforward hypotheses. The first was that a woman’s tendency to participate would rise as the number of other women in a conversation increased. The second was that high-status participants would talk more than their low-status counterparts. Crucially, the researchers were asked to describe their analysis in detail by posting their methods and workflows to a website called DataExplained. That allowed Dr Schweinsberg to see exactly what they were up to.
In the end, 37 analyses were deemed sufficiently detailed to include. As it turned out, no two analysts employed exactly the same methods, and none got the same results. Some 29% of analysts reported that high-status participants were more likely to contribute. But 21% reported the opposite. (The remainder found no significant difference.) Things were less finely balanced with the first hypothesis, with 64% reporting that women do indeed participate more, if plenty of other women are present. But 21% concluded that the opposite was true.
The problem was not that any of the analyses were “wrong” in any objective sense. The differences arose because researchers chose different definitions of what they were studying, and applied different techniques. When it came to defining how much women spoke, for instance, some analysts plumped for the number of words in each woman’s comment. Others chose the number of characters. Still others defined it by the number of conversations that a woman participated in, irrespective of how much she actually said.
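To see how much hangs on the definition, here is a minimal sketch in Python of the three measures described above. The comments, speaker names and the `participation` helper are invented for illustration; they are not drawn from the Edge.org dataset or from any analyst's actual workflow.

```python
from collections import defaultdict

# Hypothetical toy data: (conversation_id, speaker, comment_text).
comments = [
    (1, "Ana", "I disagree."),
    (2, "Ana", "Yes."),
    (3, "Ana", "No."),
    (1, "Bea", "This is a considerably longer contribution to a single thread."),
]

def participation(comments):
    """Score each speaker under three competing definitions of 'how much she spoke'."""
    words = defaultdict(int)
    chars = defaultdict(int)
    convs = defaultdict(set)
    for conv_id, speaker, text in comments:
        words[speaker] += len(text.split())  # whitespace-separated words
        chars[speaker] += len(text)          # characters, including spaces
        convs[speaker].add(conv_id)          # distinct conversations joined
    return {
        s: {
            "words": words[s],
            "characters": chars[s],
            "conversations": len(convs[s]),
        }
        for s in words
    }

def most_active(scores, metric):
    """Return the speaker who ranks highest under the chosen metric."""
    return max(scores, key=lambda s: scores[s][metric])

scores = participation(comments)
for metric in ("words", "characters", "conversations"):
    print(metric, "->", most_active(scores, metric))
```

In this toy example the speaker who says the most by word or character count is not the one who joins the most conversations, so the "more active" participant depends entirely on which measure is picked.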
Academic status, meanwhile, was defined variously by job title, the number of citations a researcher had accrued, or their “h-index”, a number beloved by university managers which attempts to combine how much a researcher publishes with how often that work is cited: it is the largest number h such that the researcher has h papers each cited at least h times. The statistical techniques chosen also had an impact, though less than the choice of definitions. Some researchers chose linear-regression analysis; others went for logistic regression or a Kendall correlation.
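The choice among these definitions of status is not cosmetic either. Under the usual definition of the h-index (the largest h such that a researcher has h papers each cited at least h times), two researchers can swap places depending on whether status means raw citations or h-index. The sketch below, in Python, uses two invented citation records; the names and numbers are assumptions made purely for illustration.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # later papers have even fewer citations
    return h

# Hypothetical citation counts per paper for two made-up researchers.
one_hit_wonder = [100, 2, 1]
steady_worker = [5, 5, 5, 5, 5]

for name, record in (("one_hit_wonder", one_hit_wonder),
                     ("steady_worker", steady_worker)):
    print(name, "citations:", sum(record), "h-index:", h_index(record))
```

By total citations the first researcher has the higher status; by h-index the second does. An analyst who picks one measure over the other is, in effect, testing a slightly different hypothesis.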
Truth, in other words, can be a slippery customer, even for simple-sounding questions. What to do? One conclusion is that experimental design is critically important. Dr Schweinsberg hopes that platforms such as DataExplained can help solve the problem as well as revealing it, by allowing scientists to specify exactly how they chose to perform their analysis, allowing those decisions to be reviewed by others. It is probably not practical, he concedes, to check and re-check every result. But if many different analytical approaches point in the same direction, then scientists can be confident that their conclusion is the right one. ■
This article appeared in the Science & technology section of the print edition under the headline "Methods and madness"