Language Log

Parsing RNA vaccines

January 17, 2024 @ 9:06 am · Filed by Mark Liberman under Computational linguistics

A recent LinkedIn post by Liang Huang lists some of his latest achievements, experiences, and honors. This work is all connected with the project of creating better algorithms for predicting the secondary structure of macromolecules, initially by analogy to algorithms developed for efficient parsing. The line of research began more than 20 years ago, building on work by Aravind Joshi; one of the first papers was Yasuo Uemura et al., "Tree adjoining grammars for RNA structure prediction", Theoretical Computer Science, 1999.

I discussed the history starting with an IRCS workshop in 2000, and the situation as of a few years ago, in "The computational linguistics of COVID-19 vaccine design", 7/27/2020.

One of Liang's LinkedIn links is a YouTube video of his presentation at the 11th mRNA Health Conference a couple of months ago in Berlin, where 22 minutes of your time will give you the whole story.

He also links to a 2023 Nature paper by Anna Blakney, "A tool for optimizing messenger RNA sequence", and a less technical Nature News article by Elie Dolgin, "‘Remarkable’ AI tool designs mRNA vaccines that are more potent and stable".

All this is yet another example of productive interdisciplinary cross-fertilization via mathematical and algorithmic analogies. (Or perhaps I should say, "homologies"…)

In the background is something I've been thinking about since that 2000 Penn workshop, as discussed in my 2020 post:

Together with David Searls, a former Penn colleague who was then Senior Vice President for Computational Biology at GlaxoSmithKline, Aravind had organized a workshop on the mathematical analysis of two types of strings of discrete elements: sentences and macromolecules.

One of the presentations was by Yasuo Uemura, Aki Hasegawa, and Satoshi Kobayashi, who described work published as "Tree adjoining grammars for RNA structure prediction", Theoretical Computer Science, 1999, which was "concerned with identifying a subclass of tree adjoining grammars (TAGs) that is suitable for the application to modeling and predicting RNA secondary structures". […]

Another presentation was by Elena Rivas and Sean Eddy, who described work published as "A dynamic programming algorithm for RNA structure prediction including pseudoknots", Journal of Molecular Biology 1999. From that paper's abstract:

We describe a dynamic programming algorithm for predicting optimal RNA secondary structure, including pseudoknots. […] The description of the algorithm is complex, which led us to adopt a useful graphical representation (Feynman diagrams) borrowed from quantum field theory.

After Elena's talk, Aravind and Elena spent the coffee break and some time thereafter at the whiteboard, exploring in detail the nature of the connections between Feynman Diagrams and Tree Adjoining Grammars. (Crudely, as I understand it, Feynman Diagrams offer an efficient solution for a certain class of integrals where most of a problematically increasing number of terms cancel as positive and negative pairs; these canceling pairs are analogous to the matching left and right brackets, or lexical dependencies, of a parsed sentence; both formalisms offer an analogy to the matching elements in macromolecule secondary structure.)

On my todo-list ever since: the idea of explaining in an accessible way (to myself as well as others) exactly what the analogy between Feynman Diagrams and parsing algorithms is.
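The bracket-matching half of that analogy can be made concrete with a small sketch. What follows is a Nussinov-style dynamic program, a textbook simplification that maximizes the number of nested base pairs; it is not the Rivas and Eddy pseudoknot algorithm, not a TAG parser, and not Liang Huang's method. The function name `nussinov`, the `min_loop` parameter, and the set of allowed pairs (Watson-Crick plus G-U wobble) are choices made here for illustration. The table is filled span by span, much as a CKY chart parser fills cells for ever-longer substrings, and the result is printed in dot-bracket form, where each "(" and ")" is one paired base.

```python
def nussinov(seq, min_loop=3):
    """Return (pair_count, dot_bracket) for a maximum set of nested base pairs.

    Illustrative assumptions: Watson-Crick and G-U wobble pairs only, and at
    least `min_loop` unpaired bases between the two ends of any pair.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # best[i][j] = maximum number of nested pairs within seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]

    # Fill the chart by increasing span length, as a CKY parser would.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in pairs:  # case 2: i pairs with k
                    inside = best[i + 1][k - 1]
                    after = best[k + 1][j] if k + 1 <= j else 0
                    score = max(score, 1 + inside + after)
            best[i][j] = score

    # Trace back through the chart to recover one optimal bracketing.
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in pairs:
                inside = best[i + 1][k - 1]
                after = best[k + 1][j] if k + 1 <= j else 0
                if best[i][j] == 1 + inside + after:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    stack.append((k + 1, j))
                    break
    return (best[0][n - 1] if n else 0), "".join(structure)


if __name__ == "__main__":
    count, brackets = nussinov("GGGAAAUCC")
    print(count, brackets)
```

Because the recurrence only ever combines a region inside a pair with a region after it, the brackets it produces are always properly nested. Pseudoknots, where pairs cross, fall outside this scheme, which is roughly why richer formalisms such as TAGs enter the picture.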

2 Comments

JPL said,

January 18, 2024 @ 1:57 am

An interesting and fundamental problem, and probably a solvable, although open-ended, one, lies behind all this, something I've also wondered about in my ignorance. Feynman diagrams, RNA/DNA structure, parsing algorithms; but I know nothing of these. So what is the current state of your understanding on this question: what exactly is the analogy (assuming there is one) between Feynman Diagrams and parsing algorithms? Worldly phenomena that do puzzle me, I could say, might be: molecules in a biological context, the differentiation of whose internal chemical structure has a further (beyond physical) significance; mathematical equations used to describe events in the physical world, specifically the internal structure of their (i.e., the equations') significance, i.e., what they express; and the internal structure of natural language sentences which parsing algorithms are intended to model; but here again, is the model describing the internal structure of the significance of the sentence, i.e., what is expressed by it, or something else? (Humans engaging in conversations, quite common, are not just trying to make funny noises; there is some further importance or significance in what they are doing; perhaps what we call "thoughts that they are trying to express"?) We would probably want an explanation of where this further significance comes from, and what are the conditions determining the nature of the specific differentiations of the internal structures. Does this make sense? Far from Aravind Joshi and the molecular biologists' problem, but I wonder in ignorance about the possibilities.

JPL said,

January 20, 2024 @ 8:30 pm

I'm disappointed that people have not chimed in on this post, but I listened to the video, and although their problem is one of practical application rather than pure inquiry, the reason their application of TAG-type parsing formalisms to the problem of determining a useful model of the internal structure of the mRNA molecule is effective seems to be that there is a (formal) similarity in the internal structures of mRNA molecules and NL sentences, specifically having to do with the fact that these are structures of inclusion dependencies (also called "hierarchical"), rather than mere linear strings with a serial order relation. Since structures of inclusion dependency in nature are going to have a history of construction, an explanation of why these structures are the way they are, specifically, will look for the reasons, or the principles, determining this evolutionary process of construction by differentiation, but this question of pure inquiry is outside of what these researchers are working on. But, to add the internal structure of mRNA molecules to the objects on your to-do list, you would want to know something of the types of reasons (or principles) in general that determine the evolution of differentiations. A crude and simple-minded response, no doubt, but the question is an interesting one.
