The possessive Jesus of composition


Let me explain, very informally, what a predictive text imitator is. It is a computer program that takes as input a passage of training text and produces as output a new text that is composed quasi-randomly except that it matches the training text with regard to the frequencies of word or character sequences up to some fixed finite length k.

(There has to be such a length limit, of course: the only text in which the word sequence of Melville's Moby-Dick is matched perfectly is Melville's Moby-Dick, but what a predictive text imitator trained on Moby-Dick would do is to produce quasi-random fake-Moby-Dickish gibberish in which each sequence of not more than k units matches Moby-Dick with respect to the transition probabilities between adjacent units.)
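(For the technically inclined: here is a minimal sketch in Python of the fully automatic sort of imitator just described. It is not Jamie Brew's program, and the training file name and the order k are placeholders of my own choosing; the point is only to show how a program can record which words follow each sequence of up to k words and then wander quasi-randomly through text that respects those transition frequencies.)

```python
# A toy word-level predictive text imitator (not Jamie Brew's code).
# It records, for every word sequence of length 1..k in the training
# text, which words follow it and how often, and then generates new
# text quasi-randomly, respecting those transition frequencies.

import random
from collections import defaultdict, Counter

def train(words, k=2):
    """Map each context (a tuple of up to k words) to a Counter of successors."""
    model = defaultdict(Counter)
    for order in range(1, k + 1):
        for i in range(len(words) - order):
            context = tuple(words[i:i + order])
            model[context][words[i + order]] += 1
    return model

def generate(model, k=2, length=60):
    """Emit words whose local (length <= k) transitions mirror the source."""
    output = list(random.choice([c for c in model if len(c) == k]))
    while len(output) < length:
        context = tuple(output[-k:])
        while context and context not in model:
            context = context[1:]          # back off to a shorter context
        if not context:
            break                          # dead end: no known successor
        successors = model[context]
        words_, counts = zip(*successors.items())
        output.append(random.choices(words_, weights=counts)[0])
    return " ".join(output)

if __name__ == "__main__":
    source = open("training_text.txt").read().split()   # placeholder file name
    model = train(source, k=2)
    print(generate(model, k=2))
```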

I tell you this because a couple of months ago Jamie Brew made a predictive text imitator and trained it on my least favorite book in the world, William Strunk's The Elements of Style (1918). He then set it to work writing the first ten sections of a new quasi-randomly generated book. You can see the results here. The first point at which I broke down and laughed till there were tears in my eyes was at the section heading 'The Possessive Jesus of Composition and Publication'. But there were other such points too. Take a look at it. And trust me: following the advice in Jamie Brew's version of the book won't do your writing much more harm than following the original.

My reasons for despising the original work by Strunk (and the even worse 1959 revision by E. B. White) are given here, and in greater detail here. Jamie Brew's astonishingly spare website is here. His code and various technical details are available on GitHub here. His own description of his program is that it is "a writing engine intended to imitate the predictive text function on smartphones." His description of the form of the training corpus his program uses is that it consists of "a tree with the frequencies of all n-grams up to a certain size present in a source, and information about which words precede and follow these." (The n-grams might be character sequences; I'm not sure about that.) I don't know anything about whether he tweaked the output to get his version of Elements or whether the program just spat out the whole hilarious thing, formatting and all.
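(Reading that description, I take the data structure to be something like the following sketch: every n-gram up to a maximum size, its frequency, and the words observed immediately before and after it in the source. The names and layout here are my guesses, not his.)

```python
# A guess at the kind of structure Brew describes: every n-gram up to a
# maximum size, with its frequency and the words observed immediately
# before and after it in the source. Names here are mine, not his.

from collections import Counter
from dataclasses import dataclass, field

@dataclass
class NgramNode:
    count: int = 0
    preceders: Counter = field(default_factory=Counter)
    followers: Counter = field(default_factory=Counter)

def build_ngram_table(words, max_n=3):
    table = {}
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            node = table.setdefault(gram, NgramNode())
            node.count += 1
            if i > 0:
                node.preceders[words[i - 1]] += 1
            if i + n < len(words):
                node.followers[words[i + n]] += 1
    return table
```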

[Update, October 5, 2016: Having now heard from Jamie, I can clarify (if you've finished reading his distortion of Elements and wiped the tears from your eyes). What Jamie wrote is not a fully automatic generator of text from given transition frequencies. There are such things, and they produce gibberish that locally seems like the right language but on a broader view is neither grammatical nor meaningful. However, the program Jamie wrote isn't one of those. He wrote it not to generate text directly but to emulate the guessing of most probable next words that predictive texting algorithms on smartphones do, and then he interacted with it to pick the specific words, choosing from the list of candidates provided by the program at each stage. In other words, the nonsense produced was a collaboration between two humans and a computer: Strunk wrote the original book and thus established a certain set of transition probabilities; the algorithm used a bounded view of what follows what in Strunk's text in order to compute a sequence of lists of reasonably probable next words given any (bounded) preceding context, and Jamie Brew started it running and chose a word from each new list of probables, gradually building a new text. I thank Jamie for getting in touch to explain all this. I have paraphrased him, so any errors in this exposition are my fault.]
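(For the curious, here is a toy sketch of that interactive workflow as I understand it, reusing the context-to-successor model built by train() in the earlier sketch: the program merely proposes the likeliest next words given the bounded preceding context, and a human picks one at each step. Again, this is my reconstruction, not Jamie's interface.)

```python
# A toy reconstruction of the interactive mode described above: the
# program only proposes candidate next words; a human chooses each one.
# It reuses the context -> Counter model built by train() in the first
# sketch; none of this is Jamie Brew's actual code.

def suggest(model, text, k=2, top=5):
    """Return the most frequent successors of the last k words of text."""
    context = tuple(text[-k:])
    while context and context not in model:
        context = context[1:]              # back off to a shorter context
    if not context:
        return []
    return [word for word, _ in model[context].most_common(top)]

def interactive_session(model, seed, k=2, steps=20):
    text = list(seed)
    for _ in range(steps):
        options = suggest(model, text, k)
        if not options:
            break
        print("Next word:", " ".join(f"[{i}] {w}" for i, w in enumerate(options)))
        choice = int(input("pick a number > "))
        text.append(options[choice])
    return " ".join(text)
```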

[Comments on this post have been closed, with a probability approaching 1.0, by the possessive Jesus of composition and publication. There is no probable next word.]


