Coronavirus Origins: The Debate Flares Up, but the Evidence Remains Weak

The preprint claims that in the SARS-CoV-2 genome, the distribution of some restriction sites (the spots where the genome may have been cut and joined) is “anomalous” and compatible with the virus having been stitched together from multiple smaller fragments using type IIS enzymes called BsaI and BsmBI.
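To make "restriction sites" concrete, here is a minimal sketch of how one might locate BsaI and BsmBI recognition motifs in a nucleotide sequence. The motifs GGTCTC (BsaI) and CGTCTC (BsmBI) are the enzymes' standard recognition sequences. The function names and the toy sequence are illustrative assumptions and are not taken from the preprint.

```python
import re

# Recognition motifs (5'->3') of the two type IIS enzymes named in the text.
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(genome):
    """Return sorted (position, enzyme) pairs for motifs on either strand."""
    genome = genome.upper()
    hits = []
    for name, motif in SITES.items():
        # A motif on the opposite strand appears as its reverse complement.
        for pattern in {motif, reverse_complement(motif)}:
            # The lookahead lets overlapping matches be reported too.
            for match in re.finditer(f"(?={pattern})", genome):
                hits.append((match.start(), name))
    return sorted(hits)


# Toy sequence (hypothetical, not real viral data).
toy = "AAAAGGTCTCAAAACGTCTCAAAAGAGACCAAAA"
print(find_sites(toy))  # [(4, 'BsaI'), (14, 'BsmBI'), (24, 'BsaI')]
```

The positions returned are where each recognition motif starts. Type IIS enzymes cut a few bases outside the motif, so these are only approximate cut points.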

Notably, the restriction sites displayed an excess of silent mutations. These are nucleotide changes that do not alter the encoded protein, and so do not affect the characteristics of the virus. An excess of them can be a hallmark of genetic engineering.
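As a rough illustration of why silent mutations matter here, the sketch below checks whether a nucleotide change leaves the translated protein unchanged while removing a restriction motif. It uses the standard genetic code and assumes the sequence starts in the correct reading frame. The helper names and the six-base example are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate DNA codon by codon, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def is_silent(before, after):
    """True if the sequences differ but encode the same amino acids."""
    return before != after and translate(before) == translate(after)


# Hypothetical example: the BsaI motif GGTCTC read as two codons (Gly, Leu).
before = "GGTCTC"
after = "GGTCTT"  # third base of the second codon changed, still Leu
print(is_silent(before, after))  # True
print("GGTCTC" in after)         # False: the motif is gone
```

A single silent change can therefore add or remove a restriction site without changing the protein, which is why such changes are of interest in this kind of analysis.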

A Twist
When cutting and stitching together genomes using type IIS enzymes, scientists can seamlessly erase any footprints of restriction sites through a method called “golden gate assembly”.

So, for the distribution of type IIS restriction sites in SARS-CoV-2 to be interpreted as a signature of engineering, the sites would need to have been intentionally left in. Although not completely implausible, this isn’t standard practice, and scientists have questioned what the rationale would be for leaving these sites behind.

Questions have also been raised around some of the mathematical metrics on which the authors’ conclusions are based, in particular the presumed maximum length of the individual viral fragments. The analysis has additionally been criticized for considering only the two type IIS restriction enzymes commonly used in this context.
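For readers unfamiliar with the fragment-length metric, the following sketch shows one simple way such a quantity could be computed from a list of site positions. It treats each site position as a cut point, which is a simplification. The positions are invented for illustration, and this is not necessarily how the preprint defines its metric. The genome length of 29,903 bases is that of the SARS-CoV-2 reference sequence.

```python
def fragment_lengths(site_positions, genome_length):
    """Lengths of the pieces obtained by cutting at each site position."""
    cuts = [0] + sorted(set(site_positions)) + [genome_length]
    return [end - start for start, end in zip(cuts, cuts[1:])]


def longest_fragment(site_positions, genome_length):
    """Length of the longest piece between consecutive cut points."""
    return max(fragment_lengths(site_positions, genome_length))


# Hypothetical site positions, chosen only to show the arithmetic.
positions = [5000, 11000, 17000, 24000]
print(fragment_lengths(positions, 29903))  # [5000, 6000, 6000, 7000, 5903]
print(longest_fragment(positions, 29903))  # 7000
```

Any threshold placed on the longest fragment is a modelling choice, which is part of what critics have questioned.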

All of these extremely technical points of contention illustrate the difficulty of formulating satisfying, testable hypotheses for complex questions.

What Are the Chances?
The study also explored how easily the distribution pattern of restriction sites observed in SARS-CoV-2 could be generated by chance (as opposed to engineering). The researchers simulated a process of random mutations starting from two close relatives of SARS-CoV-2. The probability of generating the same pattern was low: 0.1% and 1.2% for the two starting genomes.
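The general idea of such a simulation can be sketched as a small Monte Carlo experiment: mutate a starting sequence at random many times and count how often the result shows the pattern of interest. The version below is a toy under stated assumptions. It uses uniform point mutations only, and it stands in for "the pattern" with a hypothetical criterion, namely that the longest fragment between sites is no longer than `max_len`. None of the function names, parameters or criteria come from the preprint.

```python
import random

# BsaI and BsmBI motifs together with their reverse complements.
MOTIFS = ("GGTCTC", "GAGACC", "CGTCTC", "GAGACG")


def site_positions(seq):
    """Start positions of any of the motifs in seq."""
    return [i for i in range(len(seq) - 5) if seq[i:i + 6] in MOTIFS]


def longest_fragment(seq):
    """Longest stretch between consecutive sites (or sequence ends)."""
    cuts = [0] + site_positions(seq) + [len(seq)]
    return max(end - start for start, end in zip(cuts, cuts[1:]))


def mutate(seq, n_mutations, rng):
    """Apply n_mutations random single-base substitutions."""
    bases = list(seq)
    for _ in range(n_mutations):
        i = rng.randrange(len(bases))
        bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])
    return "".join(bases)


def estimate_probability(start, n_mutations, max_len, trials, seed=0):
    """Fraction of simulated mutants whose longest fragment is <= max_len."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = sum(
        longest_fragment(mutate(start, n_mutations, rng)) <= max_len
        for _ in range(trials)
    )
    return hits / trials
```

A call such as `estimate_probability(relative_genome, n_mutations=1000, max_len=8000, trials=10000)` would return an estimated probability, where `relative_genome` and all three numbers are placeholders. A model of this kind leaves out recombination entirely.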

Again, this analysis has been criticized. Coronaviruses can naturally gain and lose restriction motifs by accumulating mutations, but also through different viral strains exchanging genetic material, a process called genetic recombination.

As coronaviruses undergo frequent genetic recombination, a simulation process using a mix of recombination and mutation events may arguably be better suited to address this question.

This criticism is fair, but partly overlooks the fact that unusual patterns can be informative even if the process that generated them remains unknown. A single black sheep in a flock of 1,000 stands out irrespective of whether its coat color was caused by an unusual genetic makeup or because it fell in a barrel of tar.

The evidence reported in the preprint is neither conclusive nor final. These findings may turn out to be a fluke, or generated by a flaw in the method. The authors have been largely open about some limitations of their work and have invited comments and criticism.

Even if the findings can be replicated by others, and stand up once additional data has been analyzed, this study is unlikely to sway many opinions. At best – or at worst, depending on one’s prior belief – those results will just contribute a speck of additional weak, circumstantial evidence to the debate.

The reception of the work raises difficult questions. Some experts feel it’s unwise to discuss any evidence supporting a lab leak, as this may fuel conspiracy theories. However, a public perception that existing evidence may be subject to censorship is even more likely to have this effect. Notably, China has been largely uncooperative in investigations into the origin of the virus.

The nightmare scenario to me would not be the eventual confirmation of an accidental lab leak, but confirmation of a lab leak whose evidence has been aggressively suppressed.

Francois Balloux is Chair Professor, Computational Biology, UCL. This article is published courtesy of The Conversation.