He Zhang, Liang Zhang, Ziyu Li, Kaibo Liu, Boxiang Liu, David H. Mathews, and Liang Huang, "LinearDesign: Efficient Algorithms for Optimized mRNA Sequence Design", arXiv.org 4/21/2020:
A messenger RNA (mRNA) vaccine has emerged as a promising direction to combat the current COVID-19 pandemic. This requires an mRNA sequence that is stable and highly productive in protein expression, features which have been shown to benefit from greater mRNA secondary structure folding stability and optimal codon usage. However, sequence design remains a hard problem due to the exponentially many synonymous mRNA sequences that encode the same protein. We show that this design problem can be reduced to a classical problem in formal language theory and computational linguistics that can be solved in O(n^3) time, where n is the mRNA sequence length. This algorithm could still be too slow for large n (e.g., n = 3,822 nucleotides for the spike protein of SARS-CoV-2), so we further developed a linear-time approximate version, LinearDesign, inspired by our recent work, LinearFold. This algorithm, LinearDesign, can compute the approximate minimum free energy mRNA sequence for this spike protein in just 11 minutes using beam size b = 1,000, with only 0.6% loss in free energy change compared to exact search (i.e., b = +∞, which costs 1 hour). We also develop two algorithms for incorporating the codon optimality into the design, one based on k-best parsing to find alternative sequences and one directly incorporating codon optimality into the dynamic programming. Our work provides efficient computational tools to speed up and improve mRNA vaccine development.
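To make the "exponentially many synonymous mRNA sequences" point concrete, here is a minimal Python sketch that counts how many mRNA coding sequences encode a given protein. It is an illustration only, not code from the paper: the names `CODON_COUNTS` and `count_synonymous` are hypothetical, the table assumes the standard genetic code, and the stop codon is ignored.

```python
from math import prod

# Number of codons encoding each amino acid in the standard genetic code
# (one-letter amino acid codes; the 20 values sum to the 61 sense codons).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous(protein: str) -> int:
    """Return the number of distinct mRNA coding sequences for `protein`.

    Each residue can be encoded independently, so the total is the product
    of the per-residue codon counts (stop codon not included).
    """
    return prod(CODON_COUNTS[aa] for aa in protein)


# A toy five-residue peptide: 1 * 2 * 4 * 2 * 6 choices.
print(count_synonymous("MFVFL"))  # 96
```

Because the count is a product over residues, it grows exponentially with protein length, which is why enumerating candidates is infeasible and a dynamic-programming formulation is needed.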