Thirteen genetic sequences – isolated from people infected with COVID-19 at the start of the pandemic in China – were mysteriously deleted from an online database last year but have now been recovered.
Jesse Bloom, a computational biologist and viral evolution specialist at the Fred Hutchinson Cancer Research Center in Seattle, found that the sequences had been removed from an online database at the request of scientists in Wuhan, China. But with some internet research, he was able to retrieve copies of the data stored on Google Cloud.
The sequences do not fundamentally change scientists’ understanding of the origins of COVID-19 – including the thorny question of whether the coronavirus spread naturally from animals to humans or escaped in a lab accident. But their removal adds to concerns that Chinese government secrecy has hampered international efforts to understand how COVID-19 emerged.
Bloom’s results were described in a preprint, not yet reviewed by other scientists, posted on Tuesday. “I think this is definitely an attempt to hide the sequences,” he told BuzzFeed News.
Bloom became aware of the deleted data after reading a paper by a team led by Carlos Farkas at the University of Manitoba in Canada on some of the early genetic sequences of SARS-CoV-2. Farkas’ paper described sequences sampled from hospital outpatients as part of a project by Wuhan researchers who were developing diagnostic tests for the virus. But when Bloom tried to download the sequences from the Sequence Read Archive (SRA), an online database maintained by the United States National Institutes of Health (NIH), he received error messages saying they had been deleted.
Bloom realized that copies of SRA data are also kept on servers managed by Google, and he was able to work out the URLs where the missing sequences could be found in the cloud. In this way, he recovered 13 genetic sequences that could help answer questions about the evolution of the coronavirus and where it came from.
Bloom found that the deleted sequences, like others collected at later dates outside the city, were more similar to bat coronaviruses – believed to be the ultimate ancestors of the virus that causes COVID-19 – than to the sequences linked to the Huanan seafood market in Wuhan. This adds to previous suggestions that the seafood market may have been an early victim of COVID-19, rather than the place where the coronavirus first passed from animals to humans.
“This is a very interesting study done by Dr Bloom, and in my opinion the analysis is absolutely correct,” Farkas told BuzzFeed News via email. Scott Gottlieb, former head of the Food and Drug Administration, also praised the results on Twitter.
But some scientists were less impressed. “It really doesn’t add anything to the origins debate,” Robert Garry of Tulane University in New Orleans told BuzzFeed News by email. Garry argued that the Huanan Market or other Wuhan markets could still be the source of COVID-19.
Bloom is one of 18 scientists who published a letter in May criticizing the WHO-China study on the origins of SARS-CoV-2. The scientists argued that the WHO-China report failed to adequately consider the competing ideas that the coronavirus spread naturally from animals to humans or escaped from a laboratory – a theory the report deemed “extremely unlikely.” After the publication of the WHO-China report, the United States and 13 other governments complained that the study “did not have access to complete and original data and samples.”
The deleted viral sequences were first uploaded to the SRA in early March 2020, around the time researchers led by Yan Li and Tiangang Liu at Wuhan University released a preprint describing their work using genetic sequencing to diagnose COVID-19. A few days earlier, the Chinese State Council had ordered that all papers related to COVID-19 be approved at the central level.
The sequences were then removed from the SRA in June, around the time the final version of the paper appeared in a scientific journal. According to the NIH, the authors requested the removal of the sequences. “The requester indicated that the sequence information had been updated, was submitted to another database, and wanted the data to be deleted from SRA to avoid version control issues,” NIH spokesperson Amanda Fine told BuzzFeed News via email.
However, it is not known whether the sequences have since been uploaded to another database.
“There is no plausible scientific reason for the deletion,” Bloom wrote in his preprint, arguing that the sequences were probably “deleted to obscure their existence.” This suggested, he wrote, “a less than sincere effort to trace the early spread of the epidemic.”
Although the sequences were removed, Garry pointed out that the key genetic mutations they contained were still published in a table in the Wuhan team’s paper. “Jesse Bloom hasn’t found anything new that isn’t already in the scientific literature,” Garry told BuzzFeed News, accusing Bloom of writing his preprint in “an inflammatory way that is unscientific and unnecessary.”
Bloom wrote to the researchers in Wuhan to ask why the sequences were removed but received no response. Li and Liu also did not immediately respond to a request for comment from BuzzFeed News.
This isn’t the first time scientists have worried about the deletion of data that could help answer questions about the origins of COVID-19. The main database with coronavirus sequence information held by the Wuhan Institute of Virology – which is the subject of speculation about a possible ‘lab leak’ of the virus – was taken offline in September 2019. When members of the WHO-China team who studied the origins of the pandemic visited the institute in February, they were told that the database, which reportedly included data on 22,000 coronavirus samples and records of sequences, had been deleted after repeated hacking attempts.