Tackling the Replication Crisis

By: Felipe Flores ‘19

We are in the midst of what has been dubbed the “replication crisis” of science. Recent retrospective analyses reveal that the results of several important experiments are inconclusive. We expect research results to be consistent. For this to occur, they must be unbiased, unaffected by conflicts of interest, and timeless, because others will build upon that knowledge in their own pursuit of truth. Then why do so many experiments fail the test of replication? Scientometrics, the analysis of science itself, reveals an exponential growth in the amount of data we produce, along with outside pressures and internal methodological flaws that have built up over time to culminate in this crisis [1]. This doesn’t imply that science is wrong; rather, the problem lies in the way experiments are performed and evaluated.

Since science is collective and cooperative, building on progress made by scientists in the past, individuals are not to blame. Instead of blaming those who have published irreplicable studies, or even worse, those who have committed academic fraud, we should look at the deeper flaws within our system. Most of these mistakes are made by people who are unaware of the methodological flaws in their own work.

The problem begins when scientists face systemic biases. Most published research is performed in the competitive environment of an academic setting. Publishing often in high-impact journals seems fundamental to advancing one’s career; this pressure can lead to less rigorous, less reliable research. Scientists are often trapped in a dilemma: perform extra experiments to improve the statistical reliability of their results, or rush to publish. In academic settings, pressure to publish is a real, tangible phenomenon encountered by undergraduates and tenured professors alike. In addition, the need for financial support from grants and fellowships adds extra outside pressure. While scientists are working hard to increase scientific knowledge and maintain their careers, it is the system that does the most harm, and it is the system that should change [2].

Even undergraduate students encounter reproducibility issues: statistics courses, for example, teach the “cutoff” for statistical significance as p ≤ 0.05. The reason for this number is merely historical; nothing specific about 0.05 makes it the standard cutoff other than convention. This number, and p-values in general, became highly popular only after Ronald Fisher promoted them in a work published in 1925. Nevertheless, Fisher is not to blame; he was trying to find a value that was simple, useful, and firmly grounded in mathematics as a way of improving research in general, and he succeeded. The framework of hypothesis testing and statistical significance remains fundamental to drawing conclusions from experiments, but he never anticipated the current, ongoing crisis. The present issue is a circular one: “We teach it because it’s what we do; we do it because it’s what we teach” [3]. Since incomplete approaches to statistical significance are still taught in college and graduate school, this issue is not going away soon.
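To make the meaning of the 0.05 cutoff concrete, here is a minimal sketch of a simulation in which there is no real effect at all, yet roughly one experiment in twenty still comes out “significant.” Everything in it is an illustrative assumption rather than something taken from the cited sources: the function names, the sample size of 30, the simple z-test with a known standard deviation of 1, and the fixed random seed.

```python
import math
import random


def two_sided_p_value(sample, sigma=1.0):
    """Two-sided z-test p-value for the hypothesis that the true mean is 0.

    Assumes the population standard deviation `sigma` is known.
    """
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(trials=10_000, sample_size=30, alpha=0.05, seed=0):
    """Fraction of no-effect experiments that still reach p <= alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Every sample is pure noise: the true mean really is 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        if two_sided_p_value(sample) <= alpha:
            hits += 1
    return hits / trials


rate = false_positive_rate()
print(f"Share of null experiments with p <= 0.05: {rate:.3f}")
```

The printed share should land close to 0.05. A single significant result is therefore weak evidence on its own, which is one reason repeating an experiment matters.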

We have to restructure the way we analyze data. A change of mentality and a change in education will, in due time, correct the misuse of hypothesis testing. Statistical significance should be specific to the field, to the experiment’s methods and, more importantly, to the discretion of the scientific community in the context of the study. Physicists, for example, use a 5-sigma threshold (a minute p-value of around 3 × 10^-7) because particle physics examines the building blocks of nature, and a discovery claim must leave almost no room for a chance fluctuation. Other fields, like medicine, can still provide useful insight at a less stringent threshold, such as whether a new drug is effective against cancer or infectious diseases. By applying these solutions, scientific journals will be more selective, and scientists will take the time to improve experimental methods and repeat their experiments. By holding research to a higher standard, science will escape the current replicability crisis and prevent a future one. This is not the time to criticize science, but the time to learn from past mistakes. This is not the time to give up on research, cut funding, or lose faith in the scientific community, but rather the time to recognize the community’s effort to make better, more rigorous, more truthful discoveries.
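For readers who want to check the 5-sigma figure quoted above, the following is a small sketch that converts a number of standard deviations into a one-sided p-value, assuming the measurement follows a normal distribution. The function name and the list of sigma levels are illustrative choices, not part of any cited source.

```python
import math


def one_sided_p_value(n_sigma: float) -> float:
    """Probability that a standard normal variable exceeds n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))


# Compare the familiar 0.05 cutoff (about 1.645 sigma, one-sided)
# with stricter thresholds up to the 5-sigma standard.
for n_sigma in (1.645, 2.0, 3.0, 5.0):
    print(f"{n_sigma:>5} sigma -> p = {one_sided_p_value(n_sigma):.2e}")
```

At 5 sigma the result is a little under 3 × 10^-7, while 1.645 sigma corresponds to roughly 0.05, which shows how far apart the thresholds of different fields can sit.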

Felipe Flores ‘19 is a freshman in Hollis Hall.


[1] Begley & Ioannidis. Reproducibility in Science: Improving the Standard for Basic and Preclinical Research. Circ Res., 2015; p. 116-126

[2] Chin, Jason M. Psychological Science’s Replicability Crisis and What it Means for Science in the Courtroom. Psychology, Public Policy, and Law, 2014; Vol. 20; p. 225-238

[3] Wasserstein & Lazar. The ASA’s Statement on p-Values: Context, Process, and Purpose. The American Statistician, 2016; Vol. 70; p. 129-133

Categories: fall 2016
