Nature and PLoS Pathogens probe scientific veracity of key studies linking pangolin coronaviruses to origin of SARS-CoV-2

By Sainath Suryanarayanan, PhD

Here, we provide our email correspondence with the senior authors of Liu et al. and Xiao et al., and with the editors of PLoS Pathogens and Nature. We also present an in-depth discussion of the questions and concerns raised by these emails, which put in doubt the validity of these key studies on the origin of the novel coronavirus SARS-CoV-2 that causes COVID-19. See our reporting on these emails: Validity of key studies on origin of coronavirus in doubt; science journals investigating (11.9.20).

Email communications with Dr. Jinping Chen, senior author of Liu et al:

Email communications with Dr. Jinping Chen_1

Email communications with Dr. Jinping Chen_2

Email communications with Dr. Jinping Chen_3

Email communications with Dr. Jinping Chen_4

Email communications with Dr. Jinping Chen_5

Email communications with Dr. Jinping Chen_6

Email communications with Dr. Jinping Chen_8

Email communications with Dr. Jinping Chen_9

Email communications with Dr. Jinping Chen_10

Email communications with Dr. Jinping Chen_11

Email communications with Dr. Jinping Chen_12

Email communications with Dr. Jinping Chen_13

Dr. Jinping Chen’s emails raise a number of concerns and questions:

1– Liu et al. (2020) assembled their published pangolin coronavirus genome sequence based on coronaviruses sampled from three pangolins, two samples from a smuggled batch in March 2019, and one sample from a different batch intercepted in July 2019. The National Center for Biotechnology Information (NCBI) database, where scientists are required to deposit sequence data to ensure independent verification and reproducibility of published results, contains the sequence read archive (SRA) data for the two March 2019 samples but is missing data for the July 2019 sample. Upon being asked about this missing sample, which Dr. Jinping Chen identifies as F9, Dr. Jinping Chen stated: “The raw data of these three samples could be found under NCBI accession number PRJNA573298, and the BioSample ID were SAMN12809952, SAMN12809953, and SAMN12809954, moreover, individual (F9) from different batch was also positive, the raw data can be seen in NCBI SRA SUB 7661929, which will be released soon for we have another MS (under review)” (our emphasis).

It is concerning that Liu et al. have not published data corresponding to one of the three pangolin samples that they used to assemble their pangolin coronavirus genome sequence. Dr. Jinping Chen also did not share this data upon being asked. The norm in science is to publish and/or share all data that would allow others to independently verify and reproduce the results. How did PLoS Pathogens let Liu et al. evade publishing crucial sample data? Why is Dr. Jinping Chen not sharing data pertaining to this third pangolin sample? Why would Liu et al. want to release unpublished data pertaining to this third pangolin sample as part of another study that has been submitted to a different journal? The concern here is that scientists would misattribute the missing pangolin sample from Liu et al. to a different study, making it difficult for others to subsequently trace important details about this sample, such as the context in which it was collected.

2– Dr. Jinping Chen denied that Liu et al. have had any relationship with Xiao et al.’s (2020) Nature study. He wrote: “We submitted our PLOS Pathogens paper on Feb.14, 2020 before the Nature paper (the Reference 12 in our PLOS pathogens paper, they submitted on Feb.16, 2020 from their submit date in Nature), our PLOS pathogens paper explain that SARS-Cov-2 is not from pangolin coronavirus directly and pangolin not as intermediate host. We knew their work after their news briefing on Feb. 7, 2020, and we have different opinions with them, the other two papers (Viruses and Nature) have been listed in the PLOS Pathogen paper as reference papers (reference number 10 and 12), we are different research groups from Nature paper authors, and there is no relationship with each other, and we took samples with detail sample information from the Guangdong wildlife rescue center with helps from Jiejian Zou and Fanghui Hou as our co-authors and we don’t know where the samples of the Nature paper from.” (our emphases)

The following points raise doubts about Dr. Chen’s claims above:

a– Liu et al. (2020), Xiao et al. (2020) and Liu et al. (2019) shared the following authors: Ping Liu and Jinping Chen were authors on the 2019 Viruses paper and the 2020 PLoS Pathogens paper; Wu Chen, senior author on Xiao et al. (2020), was a co-author of the 2019 Viruses paper; and Jiejian Zou and Fanghui Hou were authors on both Xiao et al. and Liu et al.

b– Both manuscripts were deposited to the public preprint server bioRxiv on the same date: February 20, 2020.

c– Xiao et al. “renamed pangolin samples first published by Liu et al. [2019] Viruses without citing their study as the original article that described these samples, and used the metagenomic data from these samples in their analysis” ( Chan and Zhan ).

d– Liu et al.’s full pangolin coronavirus genome is 99.95% identical at the nucleotide level to the full pangolin coronavirus genome published by Xiao et al. How could Liu et al. have produced a whole genome that is 99.95% identical (only ~15 nucleotides difference) to Xiao et al. without sharing datasets and analyses?
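As a rough check on the "~15 nucleotides" figure in point d, the arithmetic can be sketched as follows. This is only an illustration, not the method used by either research group. The genome length of 30,000 nucleotides is our round-number assumption (coronavirus genomes are roughly 30 kb; the exact length is not stated above), and the function names are hypothetical.

```python
def expected_differences(genome_length: int, percent_identity: float) -> float:
    """Number of differing positions implied by a given percent identity."""
    return genome_length * (1.0 - percent_identity / 100.0)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Assumption: a round figure for genome length, not taken from either paper.
ASSUMED_GENOME_LENGTH = 30_000

if __name__ == "__main__":
    diffs = expected_differences(ASSUMED_GENOME_LENGTH, 99.95)
    print(f"Implied differing nucleotides: about {round(diffs)}")
```

With the raw reads for both genomes in hand, the same comparison could be made position by position with a function like `percent_identity`, which is why access to the underlying data matters for independent verification.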

When different research groups independently arrive at similar conclusions about a given research question, it significantly increases the likelihood that those conclusions are true. The concern here is that Liu et al. and Xiao et al. were not independently conducted studies as claimed by Dr. Chen. Was there any coordination between Liu et al. and Xiao et al. regarding their analysis and publications? If so, what was the extent and nature of that coordination?

3– Why did Liu et al. not make publicly available the raw amplicon sequencing data that they used to assemble their pangolin coronavirus genome? Without this raw data, others cannot independently verify and reproduce the pangolin coronavirus genome assembled by Liu et al. As mentioned earlier, the norm in science is to publish and/or share all data that would allow others to independently verify and reproduce the results. We asked Dr. Jinping Chen to share Liu et al.’s raw amplicon sequence data. He responded by sharing Liu et al.’s RT-PCR product sequence results, which are not the raw amplicon data used to assemble the pangolin coronavirus genome. Why is Dr. Jinping Chen reluctant to release the raw data that would allow others to independently verify Liu et al.’s analysis?

4– Liu et al. Viruses (2019) was published in October 2019 and its authors had deposited their pangolin coronavirus sequence read archive (SRA) data with NCBI on September 23, 2019, but waited until January 22, 2020 to make this data publicly accessible. Scientists typically release raw genomic sequence data on publicly accessible databases as soon as possible after the publication of their studies. This practice ensures that others can independently access, verify and utilize such data. Why did Liu et al. (2019) wait four months to make their SRA data publicly accessible? Dr. Jinping Chen chose not to answer this question directly in his response on November 9, 2020.

We also got in touch with Dr. Stanley Perlman, the PLoS Pathogens editor of Liu et al., and this is what he had to say.

Notably, Dr. Perlman acknowledged that:

“PLoS Pathogens is investigating this paper in more detail”
He “did not verify the veracity of the July 2019 sample during pre-publication peer review”
“[c]oncerns about similarity between the two studies [Liu et al. and Xiao et al.] came to light only after both studies had been published.”
He “did not see any amplicon data during peer review. The authors provided an accession number for the assembled genome…although after publication it came to light that the accession number listed in the article’s Data Availability Statement is incorrect. This error and questions around the raw contig sequencing data are currently being addressed as part of the post-publication case.”

When we contacted PLoS Pathogens with our concerns about Liu et al., we got the following response from the Senior Editor of the PLoS Publication Ethics team:

Emails from Xiao et al.

On October 28, the Chief Biological Sciences Editor of Nature replied (below) with the key phrase “we take these issues very seriously and will look into the matter you raise below very carefully.”

On October 30, Xiao et al. finally publicly released their raw amplicon sequence data. However, as of the publication of this piece, the amplicon sequence data submitted by Xiao et al. is missing the actual raw data files that would allow for others to assemble and verify their pangolin coronavirus genome sequence.

Important questions remain that need to be addressed:

Are the pangolin coronaviruses real? The caption for Figure 1e in Xiao et al. states: “Viral particles are seen in double-membrane vesicles in the transmission electron microscopy image taken from Vero E6 cell culture inoculated with supernatant of homogenized lung tissue from one pangolin, with morphology indicative of coronavirus.” If Xiao et al. isolated the pangolin coronavirus, would they share the isolated virus sample with researchers outside of China? This could go a long way toward verifying that this virus actually exists and came from pangolin tissue.
How early in 2020, or even 2019, were Liu et al., Xiao et al., Lam et al. and Zhang et al. aware that they would be publishing results based on the same dataset?
a. Was there any coordination considering that one was preprinted on February 18 and three were preprinted on February 20?
b. Why did Liu et al. (2019) not make their sequence read archive data publicly accessible on the date they deposited it on NCBI’s database? Why did they wait until January 22, 2020 to make this pangolin coronavirus sequence data public?
c. Before the Liu et al. 2019 Viruses data was released on NCBI on January 22, 2020, was this data accessible to other researchers in China? If so, what database was the pangolin coronavirus sequencing data stored on, who had access, and when was the data deposited and made accessible?
Will the authors cooperate in an independent investigation to track the source of these pangolin samples to see if more SARS-CoV-2-like viruses can be found in the March to July 2019 batches of smuggled animals—which could exist as frozen samples or be still alive in the Guangdong Wildlife Rescue Center?
And will the authors cooperate in an independent investigation to see if the smugglers (were they imprisoned? or fined and let go?) have SARS virus antibodies from regular exposure to these viruses?
