Null results are 1.4-1.6 times underrepresented in social science literature
Three days ago, Annie Franco, Neil Malhotra, and Gabor Simonovits of Stanford published a paper in Science that discusses an issue that's been often covered on this blog:
TIME, SciAm, and others.
They looked at an NSF-funded body of social science research – TESS surveys among U.S. citizens – where one knows how much research was actually performed, how much was written up, and how much was published.
Their calculations imply that papers with "strong claims" are 1.6 times more likely to be written than papers about surveys that confirm the null hypothesis, and they are 1.4 times more likely to be published. Consequently, the composition of the literature isn't a good indicator of the probability that particular theories are correct or particular claims are true.
The authors essentially recommend that null results be published as often as new discoveries. Mr and Ms social scientists, tear down the wall hiding your file drawers.
This publication bias (or, almost equivalently, the "file drawer problem") has been discussed on this blog many times. Most recently, the whole controversy about the Sleeping Beauty Problem was closely related to this bias, too. To say the least, a close cousin of the publication bias is the frequentist-probability-based explanation of why the "halfers" are right. (I've described some mistakes of the "thirders" from a Bayesian viewpoint as well but those are not directly linked to this article.) More precisely, I would use the term "sampling bias" for the problem behind the wrong answer of the "thirders".
Recall that the Sleeping Beauty in the problem named after her is asked about the state of the coin two times more often (or more likely) if the coin shows "heads" than when it shows "tails", and the "thirders" incorrectly conclude that this rule implies that the probability that the coin ended up "heads" actually increases from 1/2 to 2/3.
Of course, it doesn't. If she were checking the actual state of the coin after she is asked, the observations of the "heads" coin would be overrepresented by a factor of two relative to the observations of "tails". This increase by a factor of two is an example of "sampling bias" and if she wants to interpret these observations as information about the coin itself, she must correct her results for this sampling bias. It is an indisputable mistake not to apply this correction and to directly interpret the frequency of the biased observations as probabilities of a property of the coin. If we were directly asking about the relative ratio of some observations by the Sleeping Beauty, those could be 2/3 vs 1/3 but this relative ratio of her acts cannot be directly interpreted as a probability that a purely external object (the coin) has a certain property. The "thirders" are just wrong and the "third way" defenders who try to make the situation (and the definition of probabilities) ambiguous and who say that the answer can go "both ways" are in between being right and being wrong – and they are even more wrong than the "thirders" when it comes to the question of whether the concept of probability is well-defined.
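The counting in the paragraph above can be checked with a short simulation. This is only a minimal sketch under the convention used in this post (she is asked twice when the coin shows "heads" and once when it shows "tails"); the function name `simulate_sleeping_beauty` and its parameters are made up for the illustration.

```python
import random


def simulate_sleeping_beauty(num_tosses: int, seed: int = 0):
    """Count coin tosses and interviews, assuming (as in this post)
    that "heads" leads to two interviews and "tails" to one."""
    rng = random.Random(seed)
    heads_tosses = 0
    heads_interviews = 0
    tails_interviews = 0
    for _ in range(num_tosses):
        if rng.random() < 0.5:  # fair coin: "heads"
            heads_tosses += 1
            heads_interviews += 2  # asked twice
        else:  # "tails"
            tails_interviews += 1  # asked once

    # Fraction of tosses that came up heads (a property of the coin).
    per_toss = heads_tosses / num_tosses
    # Fraction of interviews that see heads (the biased sample).
    per_interview = heads_interviews / (heads_interviews + tails_interviews)
    # Correct the biased sample: heads are 2 times overrepresented,
    # so divide the number of heads observations by two.
    corrected_heads = heads_interviews / 2
    corrected = corrected_heads / (corrected_heads + tails_interviews)
    return per_toss, per_interview, corrected


if __name__ == "__main__":
    per_toss, per_interview, corrected = simulate_sleeping_beauty(200_000)
    print(f"heads per toss:        {per_toss:.3f}")
    print(f"heads per interview:   {per_interview:.3f}")
    print(f"corrected for sampling: {corrected:.3f}")
```

With a large number of tosses, the per-interview fraction should come out near 2/3 while the per-toss fraction and the corrected fraction should both sit near 1/2. The sketch only illustrates the bookkeeping; it does not by itself settle which of the numbers deserves to be called "the probability".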
In a similar way, the Friday paper in Science shows that it is significantly less likely for null results to get published. (I am totally convinced that these TESS surveys had to be a special innocent segment of the social sciences. My estimate is that the "strong" or, more precisely, "politically correct" results of surveys are about 30 times and not 1.4 times more likely to be published than the "null results". Well, a huge chunk of that research is downright fraudulent and we would have to carefully decide whether we count fraudulent papers as being "strong" or "null".)
It means that one can't interpret the relative fractions of papers claiming that "some dramatic pattern exists in the society" and papers claiming that "it doesn't" as probabilities that the propositions are right.
A long-term fix is to publish the null, less interesting results with the same vigor and determination as the strong results. I find this recommendation unrealistic, however. Uninteresting papers that conclude that "no interesting correlation exists" or "nothing interesting is going on" are less likely to be published for a simple reason: people want to read things that are interesting and avoid wasting time with things that are not! You can't force everyone to spend most of their time reading uninteresting stuff. The folks have completely rational reasons to focus on the interesting things. For example, life is short!
So instead of publishing all null results of experiments whose "positive" results would otherwise be published, I would recommend that everyone refine his or her idea about the coefficient quantifying the publication bias – i.e. the number saying how many times more likely it is for a "positive" result of a certain type to get published than a negative result.
For example, it is about 200 times more likely for a paper claiming some "climate problem" or mankind's role in a problem to get published than a similar paper whose conclusion is the opposite one – null. This simply means that when you "sociologically" estimate the likelihood of the existence of such a problem by surveying the literature (and let's ignore the equally important fact that this "sociological" method is a very lousy one even after the fixes), you must assign the alarmist papers about 200 times lower weight than the papers with the null results. One doesn't actually have to print millions of papers with null results. It's enough to know that they would get printed if the bias were non-existent.
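The reweighting described above amounts to one division. Here is a minimal sketch, assuming that the bias coefficient is known and is the same for all papers of a given type; the function name `corrected_positive_share` and the paper counts used below are hypothetical and serve only as an illustration.

```python
def corrected_positive_share(positive_published: float,
                             null_published: float,
                             bias: float) -> float:
    """Share of "positive" results after correcting for publication bias.

    `bias` says how many times more likely a positive result is to get
    published than a null result, so each published positive paper is
    given the weight 1/bias relative to a published null paper.
    """
    if bias <= 0:
        raise ValueError("bias must be positive")
    weighted_positive = positive_published / bias
    total = weighted_positive + null_published
    if total == 0:
        raise ValueError("no papers to weigh")
    return weighted_positive / total


if __name__ == "__main__":
    # Hypothetical counts: 140 positive and 100 null papers, bias 1.4.
    print(round(corrected_positive_share(140, 100, 1.4), 3))
    # Hypothetical counts: 200 positive papers, 1 null paper, bias 200.
    print(round(corrected_positive_share(200, 1, 200), 3))
```

In both made-up examples the raw literature looks lopsided (140 vs 100, or 200 vs 1) but the corrected share of positive results is about one half. The hard part in practice is of course estimating the coefficient itself, not the division.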
In hard sciences, like particle physics, the bias is ideally non-existent. Both ATLAS and CMS are publishing the results of all their sufficiently completed searches. Of course, the discoveries (or one discovery so far, the Higgs boson) get much more attention. But the null results are actually not being hidden in the file drawers. If the high-energy physics experimenters suffer from some bias, it is conservativeness or reticence. Papers that would imply an extraordinary claim – a discovery, especially of some deviation from the Standard Model – are being delayed and postponed, to say the least. People spend some extra time looking for mistakes.
Again, this is a bias, too. But it is inevitable just like the "people prefer interesting reading" bias. I don't think that people should always spend exactly the same time looking for mistakes in a measurement or calculation that seems to agree with the expectation based on state-of-the-art theories and other experimenters. If many things agree well, chances are that it is because the research was done correctly. No need to spend a year looking for mistakes. On the other hand, if your neutrinos seem to be faster than light, which is almost certainly impossible, you may determine that there is probably some error in your experiment and you may want to search for it (check the optical cables, too).
The amount of time one spends looking for mistakes etc. may be correlated with the risk that there is a mistake. Extraordinary claims deserve extraordinary evidence.
But one should never forget that this fashionable rule is a source of a bias, too. And people should be aware of all the biases, try to quantify them – in many cases, as accurately as they can – and they should improve their estimates about objective quantities by correcting their results for the biases! In the case of the Sleeping Beauty Problem, the fix is easy. The "heads" are 2 times overrepresented, so the number of observations of (or thinking about) "heads" has to be divided by two. In other cases, the right mathematics needed to obtain the statistical correction canceling the bias may be more complex or subtle. But if you're aware of a bias that implies that the number of observations of a certain result can't be directly interpreted as a property of the observed object, you should never ignore the bias!