Altered datasets raise more questions about reliability of key studies on coronavirus origins


Revisions to genomic datasets associated with four key studies on coronavirus origins add further questions about the reliability of these studies, which provide foundational support for the hypothesis that SARS-CoV-2 originated in wildlife. The studies, Peng Zhou et al., Hong Zhou et al., Lam et al., and Xiao et al., discovered SARS-CoV-2-related coronaviruses in horseshoe bats and Malayan pangolins.

The studies’ authors deposited DNA sequence data called sequence reads, which they used to assemble bat- and pangolin-coronavirus genomes, in the National Center for Biotechnology Information (NCBI) sequence read archive (SRA). NCBI established the public database to assist independent verification of genomic analyses based on high-throughput sequencing technologies.

U.S. Right to Know obtained documents by a public records request that show revisions to these studies’ SRA data months after they were published. These revisions are odd because they occurred after publication, and without any rationale, explanation or validation.

For example, Peng Zhou et al. and Lam et al. updated their SRA data on the same two dates. The documents don’t explain why they altered their data, only that some changes were made. Xiao et al. made numerous changes to their SRA data, including the deletion of two datasets on March 10, the addition of a new dataset on June 19, a November 8 replacement of data first released on October 30, and a further data change on November 13, two days after Nature added an Editor’s “note of concern” about the study. Hong Zhou et al. have yet to share the full SRA dataset that would enable independent verification. Although journals like Nature require authors to make all data “promptly available” at the time of publication, SRA data can be released after publication. It is unusual, however, to make such changes months after publication.

These unusual alterations of SRA data do not automatically make the four studies and their associated datasets unreliable. However, the delays, gaps and changes in SRA data have hampered independent assembly and verification of the published genome sequences, and add to questions and concerns about the validity of the four studies, such as:

  1. What were the exact post-publication revisions to the SRA data? Why were they made? How did they affect the associated genomic analyses and results?
  2. Were these SRA revisions independently validated? If so, how? The NCBI’s only validation criterion for publishing an SRA BioProject, beyond basic information such as “organism name,” is that it cannot be a duplicate.

For more information: 

The National Center for Biotechnology Information (NCBI) documents can be found here: NCBI emails (63 pages)

U.S. Right to Know is posting documents from our public records requests for our biohazards investigation. See: FOI documents on origins of SARS-CoV-2, hazards of gain-of-function research and biosafety labs.

Background page on U.S. Right to Know’s investigation into the origins of SARS-CoV-2.

No peer review for addendum to prominent coronavirus origins study?


The journal Nature did not assess the reliability of important claims made in a November 17 addendum to a study on the bat origins of the novel coronavirus SARS-CoV-2, correspondence with Nature staff suggests.

On February 3, 2020, Wuhan Institute of Virology scientists reported discovering the closest known relative of SARS-CoV-2, a bat coronavirus called RaTG13. RaTG13 has become central to the hypothesis that SARS-CoV-2 originated in wildlife.

The addendum addresses unanswered questions about the provenance of RaTG13. The authors, Zhou et al., clarified they found RaTG13 in 2012-2013 “in an abandoned mineshaft in Mojiang County, Yunnan Province,” where six miners suffered acute respiratory distress syndrome after exposure to bat feces, and three died. Investigations of the symptoms of the sickened miners could provide important clues about the origins of SARS-CoV-2. Zhou et al. reported finding no SARS-related coronaviruses in stored serum samples of the sick miners, but they did not support their claims with data and methods about their assays and experimental controls.

The absence of key data in the addendum has raised further questions about the reliability of the Zhou et al. study. On November 27, U.S. Right to Know asked Nature questions about the addendum’s claims, and requested that Nature publish all supporting data that Zhou et al. may have provided.

On December 2, Nature Head of Communications Bex Walton replied that the original Zhou et al. study was “accurate but unclear,” and that the addendum was an appropriate post-publication platform for clarification. She added: “With regards to your questions, we would direct you to approach the authors of the paper for answers, as these questions pertain not to the research that we have published but to other research undertaken by the authors, upon which we cannot comment” (emphasis ours). Since our questions related to research described in the addendum, the Nature representative’s statement suggests Zhou et al.’s addendum was not evaluated as research.

We asked a follow-up question on December 2: “was this addendum subjected to any peer-review and/or editorial oversight by Nature?” Ms. Walton did not answer directly; she replied: “In general, our editors will assess comments or concerns that are raised with us in the first instance, consulting the authors, and seeking advice from peer reviewers and other external experts if we consider it necessary. Our confidentiality policy means we cannot comment on the specific handling of individual cases.”

Since Nature considers an addendum to be a post-publication update, and does not subject such post-publication addenda to the same peer review standards as original publications, it seems likely that the Zhou et al. addendum did not undergo peer review.

Authors Zhengli Shi and Peng Zhou did not respond to our questions about their Nature addendum.