Whorf invents generative phonology?

After stumbling on Benjamin Lee Whorf's affiliation with the Theosophical Society, I read two articles that he contributed to the MIT Technology Review in 1940: "Science and Linguistics" in the April issue, and "Linguistics as an Exact Science" in the December issue. Something in the second article surprised me.

Whorf gives a formal account of English syllable structure in terms of what he calls "pattern symbolics", presenting the term and a sketch of the associated formalism as if they were standard linguistic theory, like "Maxwell's equations" in physics. But I've never heard the phrase "pattern symbolics" before, and web search turns up no examples other than this article. And the formalism seems similarly idiosyncratic.

Here's Figure 1, Whorf's "Structural formula of the monosyllabic word in English (standard midwestern American)":

He introduces the formula this way:

To strive at higher mathematical formulas for linguistic meaning while knowing nothing correctly of the shirt-sleeve rudiments of language is to court disaster. Physics does not begin with atomic structures and cosmic rays, but with motions of ordinary gross physical objects and symbolic (mathematical) expressions for these movements. Linguistics likewise does not begin with meaning nor with the structure of logical propositions, but with the obligatory patterns made by the gross audible sounds of a given language and with certain symbolic expressions of its own for these patterns. Out of these relatively simple terms dealing with gross sound patterning are evolved the higher analytical procedures of the science, just as out of the simple experiments and mathematics concerning falling and sliding blocks of wood is evolved all the higher mathematics of physics up into quantum theory. Even the facts of sound patterning are none too simple. But they illustrate the unconscious, obligatory, background phenomena of talking as nothing else can.

For instance, the structural formula for words of one syllable in the English language (Fig. 1) looks rather complicated; yet for a linguistic pattern it is rather simple. In the English-speaking world, every child between the ages of two and five is engaged in learning the pattern expressed by this formula, among many other formulas. By the time the child is six, the formula has become ingrained and automatic; even the little nonsense words the child makes up conform to it, exploring its possibilities but venturing not a jot beyond them. At an early age the formula becomes for the child what it is for the adult; no sequence of sounds that deviates from it can even be articulated without the greatest difficulty. New words like "blurb," nonsense words like Lewis Carroll's "mome raths," combinations intended to suggest languages of savages or animal cries, like "glub" and "squonk" — all come out of the mold of this formula. When the youth begins to learn a foreign language, he unconsciously tries to construct the syllable according to this formula. Of course it won't work; the foreign words are built to a formula of their own.

He gives a plug for the value of linguistic theory in language instruction:

Usually the student has a terrible time. Not even knowing that a formula is back of all the trouble, he thinks his difficulty is his own fault. The frustrations and inhibitions thus set up at the start constantly block his attempts to use foreign tongues. Or else he even hears by the formula, so that the English combinations that he makes sound to him like real French, for instance. Then he suffers less inhibition and may become what is called a "fluent" speaker of French — bad French!

If, however, he is so fortunate as to have his elementary French taught by a theoretic linguist, he first has the patterns of the English formula explained in such a way that they become semi-conscious, with the result that they lose the binding power over him which custom has given them, though they remain automatic as far as English is concerned. Then he acquires the French patterns without inner opposition, and the time for attaining command of the language is cut to a fraction (see Fig. 2). To be sure, probably no elementary French is ever taught in this way — at least not in public institutions. Years of time and millions of dollars' worth of wasted educational effort could be saved by the adoption of such methods, but men with the grounding in theoretic linguistics are as yet far too few and are chiefly in the higher institutions.

And then we come to his explanation of the formula, which starts this way:

Let us examine the formula for the English monosyllabic word. It looks mathematical, but it isn't. It is an expression of pattern symbolics, an analytical method that grows out of linguistics and bears to linguistics a relation not unlike that of higher mathematics to physics. With such pattern formulas various operations can be performed, just as mathematical expressions can be added, multiplied, and otherwise operated with; only the operations here are not addition, multiplication, and so on, but are meanings that apply to linguistic contexts. From these operations conclusions can be drawn and experimental attacks directed intelligently at the really crucial point in the welter of data presented by the language under investigation. Usually the linguist does not need to manipulate the formula on paper but simply performs the symbolic operations in his mind and then says: "The paradigm of Class A verbs can't have been reported right by the previous investigator"; or "Well, well, this language must have alternating stresses, though I couldn't hear them at first"; or "Funny, but d and l must be variants of the same sound in this language," and so on. Then he investigates by experimenting on a native informant and finds that the conclusion is justified. Pattern-symbolic expressions are exact, as mathematics is, but are not quantitative. They do not refer ultimately to number and dimension, as mathematics does, but to pattern and structure. Nor are they to be confused with theory of groups or with symbolic logic, though they may be in some ways akin.

Returning to the formula, the simplest part of it is the eighth term (the terms are numbered underneath), consisting of a V between plus signs. This means that every English word contains a vowel (not true of all languages). As the V is unqualified by other symbols, any one of the English vowels can occur in the monosyllabic word (not true of all syllables of the polysyllabic English word). Next we turn to the first term, which is a zero and which means that the vowel may be preceded by nothing; the word may begin with a vowel — a structure impossible in many languages. The commas between the terms mean "or." The second term is C minus a long-tailed n. This means that a word can begin with any single English consonant except one — the one linguists designate by a long-tailed n, which is the sound we commonly write ng, as in "hang." This ng sound is common at the end of English words but never occurs at the beginning. In many languages, such as Hopi, Eskimo, or Samoan, it is a common beginning for a word. Our patterns set up a terrific resistance to articulation of these foreign words beginning with ng, but as soon as the mechanism of producing ng has been explained and we learn that our inability has been due to a habitual pattern, we can place the ng wherever we will and can pronounce these words with the greatest of ease. The letters in the formula thus are not always equivalent to the letters by which we express our words in ordinary spelling but are unequivocal symbols such as a linguist would assign to the sounds in a regular and scientific system of spelling.

You can read the rest for yourself — and maybe translate the Figure 1 formula into a regular expression. My point is not to work through the details of Whorf's system of "pattern symbolics", but to give you the basis for asking the question that I asked myself: Where does Whorf's "pattern symbolics" come from? And where does it go after 1940?
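For anyone tempted by the regular-expression exercise, here is a minimal Python sketch covering only the three terms explained in the passage quoted above: term 1 (zero), term 2 (C minus long-tailed n), and term 8 (V). The rest of the formula is not described in the excerpt, so the sketch leaves it out. The one-character-per-sound transcription, the consonant and vowel inventories, and the names `CONSONANTS`, `VOWELS`, `PARTIAL`, and `onset_ok` are all assumptions made for illustration; none of them is Whorf's notation.

```python
import re

# Assumed toy transcription, one character per sound (illustrative only).
# "N" stands in for the long-tailed n, the ng sound of "hang".
CONSONANTS = "pbtdkgfvszmnlrwjhN"
VOWELS = "aeiou"

# Term 2: C minus long-tailed n, i.e. any single consonant except "N".
term2 = "[" + CONSONANTS.replace("N", "") + "]"

# Term 1 is zero (nothing before the vowel); the commas between terms
# mean "or", so the onset is an alternation of the two terms.
onset = "(?:|" + term2 + ")"

# Term 8: an unqualified V, i.e. any vowel.
nucleus = "[" + VOWELS + "]"

# Partial pattern: onset terms 1 and 2 followed by the vowel.
# Whatever follows the vowel is deliberately left unchecked.
PARTIAL = re.compile("^" + onset + nucleus)


def onset_ok(word: str) -> bool:
    """Return True if the word starts with term 1 or term 2, then a vowel."""
    return PARTIAL.match(word) is not None
```

In this toy transcription, `onset_ok("at")` and `onset_ok("hat")` hold, while `onset_ok("Na")` does not, since ng is excluded from word-initial position. A form like `"haN"` passes because the sketch says nothing about what may follow the vowel.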

There were certainly formal linguistic systems available in 1940 for expressing things like syllable-structure patterns, for example in the work of Leonard Bloomfield. But as far as I can tell, none of them used the term "pattern symbolics", or used the particular notation exemplified by Whorf's Figure 1. Perhaps a commenter will be able to enlighten us further.

Update — Penny Lee's 1996 book The Whorf theory complex: A critical reconstruction seems to be well worth reading. But nowhere in its 324 pages does the phrase "pattern symbolics" occur, at least according to Google Books search. There's just a brief footnote allusion on p. 39:

Whorf said that just as one is unaware of the intricate laws of phonemic patterning with which we comply whenever we speak or even make up nonsense words in our own language (p.254)8, so in the selection of words, the "personal mind" is also under sway of "a far more intellectual mind which […] can systematize and mathematize on a scale and scope that no mathematician of the schools ever remotely approached".

8 Whorf was very well aware of these intricate phonetic patterns, having developed a model of the English monosyllable "which was at that time an original synthesis of facts about English sound clusters" (Carroll 1956:32) and (Whorf 1940d[LTR]:220-232). According to Darnell (1995:p.c), Sapir also "played with nonsense words pretty seriously for a stretch during his Ottawa years". She considers it quite possible that his Yale students would have done class exercises of this kind as a means of encouraging them to explore the intricacy of their own unconscious pattern systems.



  1. Jerry Packard said,

    November 26, 2023 @ 6:45 pm

    I don’t think I can provide anything particularly enlightening, but I have seen that notation before, specifically to describe the English syllable, though not as complex as Whorf’s model, and I can’t provide a specific reference. At any rate I wouldn’t consider Whorf’s model to be a precursor to generative phonology, because basically all the possibilities are explicitly listed, whereas in GP there would be underlying and derived forms (though to be fair, he does mention ‘variants’).

  2. John Coleman said,

    November 27, 2023 @ 4:21 am

    1940 is very early for such a phonotactic/combinatorial expression; it's very innovative. Firthian linguists used C-V "formulae" of varying degrees of complexity, but not until after 1945. Hjelmslev was keen on algebraic-style notation, but his Prolegomena to a Theory of Language was not published until 1943.

  3. AntC said,

    November 27, 2023 @ 4:29 am

    Where does Whorf's "pattern symbolics" come from? [1940]

    A wild guess: work on Markov Chains had been bubbling along through the early C20th.

    Andrey Kolmogorov developed in a 1931 paper a large part of the early theory of continuous-time Markov processes.

    Would a 1931 Soviet Mathematical paper have got circulation as far as Yale or MIT by late 1930's? This was a little before Stalin's purges of intellectuals: Kolmogorov had toured giving lectures in Germany/Paris 1930. (I'm not suggesting Whorf would have read it, so much as that it would have generated discussion/circulated in the intellectual milieu.)

    If you squint at that Fig 1. the right way, it seems to have a sequence of non-terminals (upper case Latin letters) each 'emitting' a concrete symbol (phoneme) and moving on to the next 'state' — i.e. non-terminal.

    The commas between the terms mean "or."

    Sounds pretty Markovian to me. Also a 'null' symbol 'O' meaning emit nothing and move on to the next state — that is, there is notionally a consonant at the start of English syllables, but it's not always realised audibly. (I see Whorf studies Biblical Hebrew at an early stage: this would correspond to a triliteral beginning with an inaudible aleph or ayin.)

  4. AntC said,

    November 27, 2023 @ 5:47 am

    BTW I gotta object to Whorf's

    Polynesian has the next most simple formula … Contrast this with the intricacy of English word structure,

    Culturally imperialist, much? Just because IPA struggles to represent Kīlauea. Stuff that up your Fig. 1! Or Ngāruhoe.

  5. Mark Liberman said,

    November 27, 2023 @ 6:34 am

    @Jerry Packard: "I wouldn’t consider Whorf’s model to be a precursor to generative phonology, because basically all the possibilities are explicitly listed, whereas in GP there would be underlying and derived forms".

    By "generative phonology" I didn't mean SPE or any other specific theory, but rather an algorithm for defining (aspects of) a language by generating a list of strings, with structures implicit in the process.

  6. Kenny Easwaran said,

    November 27, 2023 @ 7:20 am

    Re: AntC

    I don't think there's anything culturally imperialist about observing that Polynesian languages tend to have the interesting feature that a syllable basically always consists of a consonant plus a vowel (with no following consonant), or just a vowel, and that *any* consonant can begin a syllable. IPA has no struggle to represent the two names you mention, which both fit the pattern just fine (and the fact that the second one begins with "Ng" indicates that the relevant language doesn't have the complexity English does of excluding a particular consonant from syllable-initial position).

  7. J.W. Brewer said,

    November 27, 2023 @ 7:45 am

    There may be some rough conservation-of-total-complexity dynamic, where a language with more "intricate" phonotactic possibilities will be simpler in some other dimension and a language with simpler phonotactics (or more restrictive rules, is another way of looking at it …) will be more complex in another dimension. Or maybe it's as simple as observing that Polynesian words tend to be more polysyllabic than English words, because English gives you a larger range of potential monosyllables (and thus a larger range of two-syllable combinations etc etc). I don't know if there's a formal information-theory-type analysis of the pros and cons of having a more constrained set of potential syllables and thus needing more syllables to encode a given "amount" of meaning versus having a wider set of potential syllables and thus needing fewer syllables but I suppose also more energy expended on differentiating each syllable from its possible alternatives.

    IIRC, Whorf thought that Hopi was a more promising L1 than Indo-European languages for understanding quantum mechanics, which I suppose could be thought a quasi-imperialist attitude, albeit not in the expected direction.

  8. Mark Liberman said,

    November 27, 2023 @ 8:08 am

    @J.W. Brewer: "I don't know if there's a formal information-theory-type analysis of the pros and cons of having a more constrained set of potential syllables and thus needing more syllables to encode a given "amount" of meaning versus having a wider set of potential syllables and thus needing fewer syllables"

    See e.g. "Speech rate and per-syllable information across languages", 4/12/2008.

    Or Coupé et al., "Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche", 2019.

  9. Jonathan Smith said,

    November 27, 2023 @ 9:01 am

    FWIW on top of Fig. 1 there seem to be some onset-coda cooccurrence restrictions… e.g., krork, blalt, swew and the like are phonotactically marginal at best as single morphemes ('flailed' etc. OK). Non-matching liquid clusters (at times?) can work e.g. 'flirt', but is CrVlC a thing…?

  10. Mark Liberman said,

    November 27, 2023 @ 11:47 am

    @Jonathan Smith: "but is CrVlC a thing…?"


  11. J.W. Brewer said,

    November 27, 2023 @ 12:33 pm

    Thanks to myl for the links. I note with interest (to circle back to Whorf's Polynesian example) that the 2008 LL post includes a comment from Rob Malouf saying it was a shame the underlying study didn't have a corpus of Hawaiian speech to work with — Japanese being the most "impoverished" language in terms of syllable options that was studied. Although of course that post also indicates that rather than call English phonotactics "intricate" you can call them "entropic" which sounds rather less like praise? (Although I assume that in the technical sense associated with Shannon "entropy" (and its derivatives) does not have the pejorative valence it often does in the mouths of laypersons.)

  12. Milan said,

    November 27, 2023 @ 9:03 pm


    It is difficult to deny that there is much less variation in syllable structure in Polynesian syllables than in English ones. Despite possibly misleading connotations, saying that Polynesian syllable structure is 'simpler' strikes me as a perfectly straightforward way of describing that state of affairs. Using a positively valenced word like 'intricate' to describe English, rather than the more neutral 'complex' or even 'complicated', on the other hand may betray some prejudice. The same is true for the choice not to contrast the English syllable with an even more complex one, such as that of Nuxalk.

