Genomic adaptation in coronaviruses following a host shift

By Vincent Montoya & Jeffrey Joy

Last month we and our co-authors published our paper “Variable routes to genomic and host adaptation among coronaviruses” in the Journal of Evolutionary Biology. Coronaviruses are known to have spilled over into the human population on at least seven different occasions. In our study we compare the genomic adaptations of each of these seven coronaviruses to identify patterns and distinguishing characteristics along their evolutionary paths to their human hosts.

Viruses are the most ubiquitous biological entities on Earth. Their mutation rates are often several orders of magnitude greater than those of their hosts, and as a result they are exquisitely adapted to hostile intracellular and extracellular environments. This accelerated evolution is revealed on a daily basis through antiviral drug resistance, persistent immune evasion, rapid changes in host cell receptor binding affinities, and, finally, host switching events.

Host switching events offer unique insights into viral evolution, because this accelerated evolution is stringently tested for its ability to adapt to a new cellular environment. The amount of viral adaptation necessary to establish endemic disease in the novel host is thought to depend on the extent of genetic overlap between the donor and recipient hosts, along with the frequency of contact between them [1]. Furthermore, the greater the evolutionary distance between hosts, the more adaptation the virus requires to overcome barriers to infection.

Figure 1. Mock-phylogenetic tree depicting host-switching events for the coronaviruses in this study. Each virus evolved from within its presumed primary, natural host (bats or rats), subsequently spilled over into an intermediate host, and finally jumped into humans. The estimated time since introduction into the human population is shown at the node preceding each human viral lineage (years ago: YA).

New host environments place enormous selective pressures on viruses to optimize host infection. Selection at a given site can be neutral, where no change is selected for; positive, where a particular mutation increases in frequency; or negative (i.e. purifying), where a mutation decreases in frequency or is purged.
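
The three categories above can be made concrete with a small sketch. The post does not say how selection was measured at each site. One widely used summary, assumed here purely for illustration, is the ratio ω = dN/dS of nonsynonymous to synonymous substitution rates: ω well above 1 points to positive selection, ω well below 1 to negative (purifying) selection, and ω near 1 to neutral evolution. The function name and the tolerance band are hypothetical.

```python
def classify_site(dn: float, ds: float, tolerance: float = 0.1) -> str:
    """Label a codon site from its nonsynonymous (dn) and synonymous (ds) rates.

    The `tolerance` band around omega == 1 is an illustrative assumption,
    not a statistical test; real analyses assess significance per site.
    """
    if ds <= 0:
        return "undetermined"  # omega is undefined without synonymous change
    omega = dn / ds
    if omega > 1 + tolerance:
        return "positive"
    if omega < 1 - tolerance:
        return "negative"
    return "neutral"


# Made-up rates for three sites.
for dn, ds in [(2.4, 0.8), (0.1, 1.5), (1.0, 1.0)]:
    print(dn, ds, classify_site(dn, ds))
```

A fixed cutoff like this is only a teaching device; it says nothing about whether a site's ω differs significantly from 1.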

Coronaviruses are known to have spilled over into the human population on at least seven different occasions [2]. These range from HCoV-NL63, which is thought to have diverged from its most recent common ancestor >500 years ago, to SARS-CoV-2, which is thought to have emerged in October 2019 [3] (Figure 1). A clear understanding of where, when, why, and how changes in coronavirus genomes occur following a host jump is crucial for understanding new variants of the virus, designing vaccines, guiding public health responses, and predicting future pandemics. In this study we compared the genomic adaptations of each of these seven coronaviruses to identify underlying patterns and distinguishing characteristics along their evolutionary paths to their human hosts.

To examine the selective pressures placed upon each virus separately for each host, we analyzed all available genomes with host annotations. First, we asked whether there were patterns in which portions of the genome were under selection. We found that, in general, the genes associated with viral replication and the gene encoding the protein involved in host cell receptor binding (the Spike protein) were most commonly under selection across the viruses (Figure 2).

Next, we compared selective pressures by host type (Figure 3). We found that viruses sampled from bats had significantly more positively selected sites in the Spike and replication-associated genes than those from human and intermediate hosts (civets, camels, and cows).

Figure 2. Number of sites under positive (“Positive Sel”) or negative selection (“Negative Sel”) for each gene/ORF. Counts of selected sites were normalized by the log10 of the number of sequences used in each analysis. Results for civet-, camel-, and bovine-derived viruses were combined into the “intermediate” host category.

Figure 3. Comparison of the prevalence of positively selected sites for orf1ab and spike for each respective host group. Since HCoV-HKU1 and HCoV-OC43 have no bat host, they were removed from this analysis. Positively selected sites for each gene were summed and divided by the log10 of the sequence count. A Kruskal–Wallis statistical test was performed to examine differences between the groups, and p-values are shown above the violin plots.
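
The normalization and test described in these captions can be sketched as follows. This is a minimal, standard-library illustration rather than the pipeline used in the paper: the function names and all numbers are made up, and the p-value helper relies on the chi-square approximation, which for exactly three groups (two degrees of freedom) reduces to exp(-H/2) and is rough for samples this small.

```python
import math
from itertools import groupby


def normalized_count(n_sites: int, n_sequences: int) -> float:
    """Sites under selection divided by log10 of the number of sequences."""
    if n_sequences <= 1:
        raise ValueError("need more than one sequence (log10 must be > 0)")
    return n_sites / math.log10(n_sequences)


def kruskal_wallis_h(groups: list[list[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Assign each distinct value the average of the ranks it occupies.
    rank, tie_term, start = {}, 0, 0
    for value, run in groupby(pooled):
        t = len(list(run))
        rank[value] = start + (t + 1) / 2
        tie_term += t**3 - t
        start += t
    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    rank_sums = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sums - 3 * (n + 1)
    return h / correction


def p_value_three_groups(h: float) -> float:
    """Chi-square tail probability with 2 degrees of freedom (3 groups only)."""
    return math.exp(-h / 2)


# Made-up (positively selected sites, sequences analysed) pairs per virus.
raw = {
    "bat": [(30, 50), (42, 120), (25, 40)],
    "intermediate": [(12, 200), (9, 80), (15, 300)],
    "human": [(18, 1000), (10, 500), (22, 5000)],
}
groups = [[normalized_count(s, n) for s, n in pairs] for pairs in raw.values()]
h = kruskal_wallis_h(groups)
print(f"H = {h:.3f}, p = {p_value_three_groups(h):.4f}")
```

Dividing by log10 of the sequence count dampens, but does not remove, the advantage that heavily sampled viruses have in detecting selected sites; a rank-based test is then a natural choice because it makes no assumption that the normalized counts are normally distributed.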

Finally, structural features of the Spike protein were examined to see whether they influenced the sites of selection for each of the coronaviruses. As the Spike protein is synthesized and subsequently shuttled to the cell surface, oligosaccharides are added and trimmed by host enzymes in a site-specific manner. One of the most common forms of these post-translational modifications is N-linked glycosylation. For example, each protomer of the trimeric SARS-CoV-2 spike protein is glycosylated at 22 different amino acid sites, which collectively shield ~40% of the protein surface (Figure 4, [4]). These glycan molecules have significant impacts on several aspects of viral infection, including protein stability, antigenicity, and host cell receptor binding.
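
As a rough illustration of where N-linked glycans can attach, the sketch below scans a protein sequence for the canonical sequon Asn-X-Ser/Thr, where X is any residue except proline. The post does not say how glycosylation sites were identified, so this is an assumption made for the example; the function name and the toy peptide are hypothetical, and a sequon is necessary but not sufficient for a site to be glycosylated in practice.

```python
import re

# N-X-S/T where X is any residue except proline; the lookahead lets
# overlapping sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Toy peptide, not a real spike fragment.
print(find_sequons("MNGTANPSLNNTSK"))  # prints [2, 10, 11]
```

In the toy peptide the asparagine at position 6 is skipped because it is followed by proline.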

Figure 4. Spike protein structure for SARS-CoV-2. Each Spike protomer is coloured a shade of red, whereas the receptor binding site is shaded blue. Positively selected sites are shaded green, and predicted glycan structures are shaded grey.

We sought to determine how these glycan molecules influence the selective pressures placed upon each coronavirus. To address this question, we compared the proximity of each glycan molecule to each site that was under positive selection. If sites under positive selection tend to lie farther from glycan molecules, this suggests that glycans are shielding nearby sites from selection (for example, antibody-driven evolution). Consistent with previous results, we found that these distances varied; however, a consistently higher fraction of sites under positive selection were in close proximity to glycan molecules for SARS-CoV-2. This suggests that selection on the SARS-CoV-2 Spike, while multifactorial, is perhaps driven less by antibody evasion and more by protein stabilization and/or host receptor binding.
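
The proximity comparison described above can be sketched as a nearest-neighbour distance calculation. This is a minimal illustration, not the method used in the paper: it assumes each selected site and each glycan is reduced to a single 3D coordinate (in ångströms), and the function names, coordinates, and 10 Å cutoff are all hypothetical.

```python
import math

Point = tuple[float, float, float]


def nearest_glycan_distance(site: Point, glycans: list[Point]) -> float:
    """Euclidean distance from one site to its closest glycan."""
    if not glycans:
        raise ValueError("at least one glycan coordinate is required")
    return min(math.dist(site, g) for g in glycans)


def fraction_near_glycans(
    sites: list[Point], glycans: list[Point], cutoff: float
) -> float:
    """Fraction of sites whose nearest glycan lies within `cutoff`."""
    if not sites:
        raise ValueError("at least one site coordinate is required")
    near = sum(nearest_glycan_distance(s, glycans) <= cutoff for s in sites)
    return near / len(sites)


# Made-up coordinates for illustration only.
selected_sites = [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0), (12.0, 5.0, 1.0)]
glycan_positions = [(3.0, 4.0, 0.0), (15.0, 5.0, 1.0)]
print(fraction_near_glycans(selected_sites, glycan_positions, cutoff=10.0))
```

Under the shielding interpretation, a low fraction would be expected if glycans block antibody-driven selection nearby; a high fraction, as reported for SARS-CoV-2, points toward other drivers.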

These results support the hypothesis that each coronavirus has traversed a different path along the way to colonizing humans and highlight the extraordinary capacity of evolution to solve problems in a myriad of ways. The selective regimes placed upon the genomes of each virus seem to reflect the delicate balance between viral genetics and the unique ecologies of both the viruses and their respective hosts.


[1] K. J. Olival, P. R. Hosseini, C. Zambrana-Torrelio, N. Ross, T. L. Bogich, and P. Daszak. (2017). Host and viral traits predict zoonotic spillover from mammals. Nature, 546, 646–650, doi: 10.1038/nature22975.

[2] J. Cui, F. Li, and Z. L. Shi. (2019). Origin and evolution of pathogenic coronaviruses. Nature Reviews Microbiology, 17, 181–192, doi: 10.1038/s41579-018-0118-9.

[3] J. Pekar, M. Worobey, N. Moshiri, K. Scheffler, and J. Wertheim. (2021). Timing the SARS-CoV-2 index case in Hubei province. Science, 372, 412–417, doi: 10.1126/science.abf8003.

[4] Y. Watanabe, J. D. Allen, D. Wrapp, J. S. McLellan, and M. Crispin. (2020). Site-specific glycan analysis of the SARS-CoV-2 spike. Science, 369, 330–333, doi: 10.1126/science.abb9983.