Opinion COVID-19

Caveat emptor: preprint servers in biomedical science

While the advantages of preprint servers are numerous, researchers need to be very clear about the fact that these findings have not been formally assessed by the scientific community.

David Kent

December 02, 2020

Posted in The Black Hole

I need to start this article with a disclaimer – I am a major supporter of open science and data sharing. Our longer-term readers will know this from previous articles on PreReview and journal clubs on preprint manuscripts. Our lab is also a participant in the European Research Council’s Open Research Data pilot program, in which we are committed to making our data and technologies as open as possible, as soon as possible. Preprint servers are one such tool for sharing data openly and rapidly. The advantages are numerous, including increased citations, broader pre-publication feedback, the ability to stake a claim to a finding and increased exposure for a piece of work. Just as with any other transaction (informational or otherwise), though, the perfect product rarely exists, and there are troubling aspects that will accompany the wide-scale usage of preprint servers in biomedical science.

When practice fails to live up to the ideal

You might ask what could possibly be wrong with scientists sharing their data faster and more openly. If so, you would be part of a steadily growing community of open science advocates. Preprint servers (e.g., arXiv, bioRxiv, and medRxiv) are places where scientists can upload their unpublished manuscripts to be viewed openly by readers across the globe. They have existed for decades, especially in the physical sciences, where arXiv launched in 1991. Near-instantaneous feedback is possible and precedes the sometimes long and painful process of formal peer review at an academic journal. Peer review, however, is an important component of the scientific process, and splashing out findings prior to rigorous assessment by other scientists can have its drawbacks.

A prescient piece from Tom Sheldon in 2018 suggested that we should be careful about the potential for misinformation on preprint servers being picked up, in particular by mainstream media. At the time he said:

“I’ll admit that we do not yet have examples of harm from such stories, but this is probably because — at the moment — only a tiny fraction of preprints cover health-related or controversial fields.”

Chew on those words for a while and then read this – an opinion piece on COVID-19 preprint manuscripts that suggests the very fabric of scientific integrity is at risk. When the healthcare community is desperate for information and steady streams of it (>10,000 articles at last count) hit preprint servers with potential or proposed solutions, it is not surprising that the credibility of preprint servers has been challenged, with some serious consequences, as noted in this commentary piece in the Lancet Global Health:

“despite the advantages of speedy information delivery, the lack of peer review can also translate into issues of credibility and misinformation, both intentional and unintentional. This particular drawback has been highlighted during the ongoing outbreak, especially after the high-profile withdrawal of a virology study from the preprint server bioRxiv, which erroneously claimed that COVID-19 contained HIV ‘insertions’.”

In this particular case, the scientific community responded by countering the claim. Numerous articles have since been published to clarify the confusion and set the record straight, but the idea that COVID-19 might have been “engineered” to infect humans had been planted and almost certainly fed into the broader political discussion. Other instances of preprint papers informing policy decisions and making media headlines are plentiful, with some showing the great promise of preprints (speed of information in times of crisis) and others showing the perils (confused policy discussions based on unverified claims).

While scientists in the exact areas of expertise will be able to discern fact from fiction, and scientists of more general expertise will be able to smell a particularly bogus conclusion, journalists need to be very careful. The pressure to be first to break a story may trump proper research and homework. Currently, press embargoes at scientific journals deal with this in a coordinated way – journalists are given a number of days to get their facts and their stories straight prior to publishing. Papers on preprint servers just “appear”, and then the race to find and write about them is on. COVID-19 taught us very quickly how dangerous this can be. There is an excellent perspective from Fiona Fox on the benefits of embargoes, in which she asserts:

“Surely we have to entertain the possibility of many more new scientific claims being reported in the news media before they have been subject to any peer review, without the fact-checking time provided by an embargo, without a measured and cautious press release from the journal or university, without the benefit of the third party comments gathered by the [Science Media Centre]. Does any of this worry you too?”

Some preprint servers have responded to the inundation of COVID-19 preprint submissions by setting up additional checks prior to posting papers (e.g., bioRxiv avoids posting solely computational papers related to COVID-19). bioRxiv has also gone one step further to address the specific issue of COVID-19 papers, placing the following statement at the top of each article page: “these are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.” The reality, however, is that a first-pass triage and a warning statement cannot possibly capture everything that might be dangerous to have publicly available. Adding too many checks and balances, or filtering submissions toward “mainstream” ideas, would also go against the point of preprint servers.

The future of preprints: knowing your audience

Despite these concerns, preprints must be viewed as an incredibly valuable way of communicating new scientific findings in a rapid manner. They are also not hidden behind any sort of protective paywall, making the most recent science accessible to everyone. There are clear benefits for the international scientific community that should outweigh the fear of having ideas or data pinched by others and may even outweigh the negative implications of an occasional rogue idea getting unjustified publicity.

That said, as a research community that populates preprint servers with content, we need to be responsible for how we share and promote this content. One quick example is that scientists now commonly promote their work on social media. While this is fantastic for sharing amongst other scientists, it can favour the work of good publicists and can also lead to premature (or incorrect) findings wriggling their way through to other sectors of society. If approached for comment on their preprint works, researchers should be very cautious and clear about the fact that these findings have not been formally assessed by the scientific community. With preprint servers coming into their own as a dominant mode of communicating scientific findings in the biological and medical sciences, researchers need to take note and engage so that the correct audiences are reached and preprint servers do not turn into a “bad deal” for science.