Proteins are built primarily from 20 common amino acids, which are encoded by 61 sense codons. Except for methionine and tryptophan, each amino acid is encoded by 2 to 6 different codons. This phenomenon, in which multiple codons encode the same amino acid, is known as codon degeneracy. Because synonymous mutations do not alter the protein sequence, they were long assumed to have no impact on cellular function or organismal fitness.
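The degeneracy figures above can be checked directly against the standard genetic code. The following is a minimal Python sketch, not part of the study: it builds the standard codon table (NCBI translation table 1, written in the DNA alphabet with T in place of U) and counts how many sense codons encode each amino acid. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with bases in the order T, C, A, G at each position; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of sense codons per amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
sense_codons = sum(degeneracy.values())

print(sense_codons)                       # 61
print(len(degeneracy))                    # 20
print(degeneracy["M"], degeneracy["W"])   # 1 1
print(sorted(set(degeneracy.values())))   # [1, 2, 3, 4, 6]
```

Methionine (M) and tryptophan (W) are the only amino acids with a single codon; leucine, serine, and arginine each have six.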

However, recent studies have indicated that many synonymous mutations are not neutral. Synonymous codons can affect various biological processes, including mRNA structure and stability, protein translation efficiency, and protein folding, thereby influencing gene expression. Since the outbreak of the COVID-19 pandemic, research on genomic sequence variation in SARS-CoV-2 has largely focused on non-synonymous mutations (mutations that alter the protein sequence), especially the impact of non-synonymous substitutions in the Spike (S) protein on virus transmissibility, pathogenicity, and immune evasion. Synonymous mutations, by contrast, have received relatively little attention.

Recently, a collaborative study titled “Optimization and deoptimization of codons in SARS-CoV-2 and related implications for vaccine development” was published in the journal Advanced Science by researchers from Peking University’s School of Life Sciences, led by Professor Lu Jian, and the Institute of Pathogen Biology at the Chinese Academy of Medical Sciences, led by Professor Qian Zhaohui.

This study revealed the patterns of codon usage in the SARS-CoV-2 genome since the COVID-19 outbreak and their potential mechanisms. The research found that the codon usage of SARS-CoV-2 differs significantly from that of humans: the virus tends to use codons that are non-preferred in the human genome.

As new variants of the virus emerge and evolve, there is some fluctuation in the codon adaptation index (CAI). However, overall, there is no clear trend towards the virus adopting codon usage that is more adapted to humans. Additionally, the study proposed a new strategy for optimizing the mRNA vaccine sequence for SARS-CoV-2 based on human codon usage. The researchers designed and validated candidate mRNA vaccine sequences expressing the Spike protein, providing technical support for the development of highly effective vaccines in the future.

To explore trends in codon usage during the COVID-19 pandemic, the study used the CAI to evaluate the cumulative effects of mutations on codon usage in 9,164,789 SARS-CoV-2 genome sequences. The results revealed that the virus’s CAI gradually decreased over time, with some fluctuations. The main driver of these fluctuations was the combined effect of non-synonymous and synonymous substitutions on codon usage; synonymous substitutions alone had a relatively minor impact.
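For readers unfamiliar with the CAI, the sketch below follows the standard definition by Sharp and Li: each codon receives a relative adaptiveness weight (its count in a reference set divided by the count of the most frequent synonymous codon), and the CAI of a sequence is the geometric mean of those weights. This is an illustration only. The study's reference set and exact implementation are not described here, the function names are hypothetical, and `TOY_COUNTS` contains made-up numbers rather than real human codon usage.

```python
import math
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def relative_adaptiveness(counts):
    """Map each codon to count / (count of its most used synonymous codon)."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    weights = {}
    for codons in families.values():
        if len(codons) == 1:
            continue  # Met and Trp carry no codon-choice information
        best = max(counts.get(c, 0) for c in codons)
        if best == 0:
            continue  # family absent from the reference counts
        for c in codons:
            # Simplification: unseen codons get no weight here; real tools
            # usually assign them a small pseudocount instead.
            if counts.get(c, 0) > 0:
                weights[c] = counts[c] / best
    return weights

def cai(seq, weights):
    """Geometric mean of the weights of all scored codons in seq."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    logs = [
        math.log(weights[seq[i:i + 3]])
        for i in range(0, usable, 3)
        if seq[i:i + 3] in weights
    ]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

# Made-up reference counts for alanine and lysine codons (illustrative only).
TOY_COUNTS = {"GCC": 40, "GCT": 20, "GCA": 10, "GCG": 10, "AAG": 30, "AAA": 15}
toy_weights = relative_adaptiveness(TOY_COUNTS)

print(round(cai("GCCAAG", toy_weights), 6))  # 1.0 (only the top codons)
print(round(cai("GCTAAA", toy_weights), 6))  # 0.5 (both weights are 0.5)
```

A CAI near 1 means a sequence uses mostly the reference set's preferred codons; lower values mean it relies more on rarely used ones.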

The study also discovered that throughout the evolution of the virus, there were numerous C>U mutation hotspots at synonymous substitution sites in the genome. These C>U synonymous mutations often shifted the virus toward codons that are less preferred in human cells, suggesting that they might result from C>U editing by the APOBEC enzyme family, a host mechanism that acts against the virus. On the other hand, other types of mutations occurring in each virus genome could counteract the impact of C>U mutations on optimized codon usage, thus maintaining the overall tendency of the SARS-CoV-2 genome to use non-preferred codons.
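To make concrete which C>U changes can be synonymous at all, the sketch below enumerates every single C>U substitution within a sense codon of the standard genetic code and checks whether the encoded amino acid changes. It is a property of the genetic code, not a reproduction of the study's analysis. Codons are written in the DNA alphabet (T for U), and the function name is illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def c_to_u_changes():
    """Yield (codon, position, mutant, is_synonymous) for each C>U change."""
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons contain no C, skipped for clarity
        for pos, base in enumerate(codon):
            if base == "C":
                mutant = codon[:pos] + "T" + codon[pos + 1:]
                yield codon, pos, mutant, CODON_TABLE[mutant] == aa

all_changes = list(c_to_u_changes())
synonymous = [(c, p, m) for c, p, m, syn in all_changes if syn]
third_position = [c for c, p, _ in synonymous if p == 2]
first_position = sorted(c for c, p, _ in synonymous if p == 0)

print(len(all_changes))      # 48 possible C>U changes in sense codons
print(len(synonymous))       # 18 of them are synonymous
print(len(third_position))   # 16: every codon ending in C
print(first_position)        # ['CTA', 'CTG'] (Leu to Leu)
```

Every third-position C>U change is synonymous, which is consistent with synonymous sites offering many opportunities for C>U hotspots; no second-position C>U change is synonymous.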

Using dual-luciferase reporter assays on 70 pairs of SARS-CoV-2 sequences, the study measured how synonymous mutations affect protein expression efficiency. The results showed that preferred codons tended to enhance protein expression efficiency, while non-preferred codons reduced it, implying that changes in codon usage might play a significant role in the virus’s adaptation process. This is the first large-scale demonstration of the influence of synonymous substitutions on viral protein expression efficiency. It suggests that the APOBEC family-mediated C>U editing system in human cells may combat the virus in part by lowering the translation efficiency of viral proteins, providing a fresh perspective on the interaction between the virus and host cells.

The translation efficiency of mRNA is a crucial factor in mRNA vaccine design. Higher translation efficiency can make immunization more effective while allowing smaller vaccine doses and fewer injections, which also makes it a cost-effective strategy. Considering the significance of codons in protein expression, this study proposes a novel strategy for codon optimization based on human cell translation preferences. The research team optimized the codons of the S gene of the Omicron BA.2.12.1, BA.4/5, and XBB.1.5 variants and confirmed through western blotting that the optimized mRNA sequences yielded higher protein expression levels, comparable to those of Moderna's commercial mRNA-1273 vaccine.
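The simplest form of codon optimization replaces each codon with the most frequently used synonymous codon in the host, which leaves the protein sequence unchanged. The sketch below shows that greedy approach only as an illustration; the study's actual optimization strategy is not detailed here and may differ, and practical designs typically weigh further factors such as mRNA structure. The names `optimize` and `translate` are hypothetical, and `TOY_COUNTS` holds made-up values, not real human codon usage.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid (or stop signal) they encode.
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)

def translate(seq):
    """Translate a coding sequence whose length is a multiple of three."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))

def optimize(seq, counts):
    """Swap each codon for the most used synonymous codon in `counts`."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of three")
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE[codon]  # KeyError on ambiguous bases such as N
        candidates = [c for c in SYNONYMS[aa] if counts.get(c, 0) > 0]
        # Keep the original codon when the usage table has no data for it.
        out.append(max(candidates, key=counts.get) if candidates else codon)
    return "".join(out)

# Made-up usage counts for alanine and lysine codons (illustrative only).
TOY_COUNTS = {"GCC": 40, "GCT": 20, "GCA": 10, "GCG": 10, "AAG": 30, "AAA": 15}

original = "ATGGCTAAAGCG"
optimized = optimize(original, TOY_COUNTS)
print(optimized)                                    # ATGGCCAAGGCC
print(translate(original) == translate(optimized))  # True
```

The final check confirms that every substitution is synonymous: both sequences translate to the same peptide.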

In summary, this study emphasizes the importance of codon usage in viral evolution and introduces strategies for codon optimization, providing theoretical guidance for mRNA and DNA vaccine development. The findings uncover the impact of synonymous mutations on codon usage and protein expression efficiency, offering a new perspective on understanding the interaction between the virus and host cells. Furthermore, the study holds significance for designing more efficient vaccines and optimizing vaccine sequences to enhance protein expression efficiency.