Archive for Linguistic history

Radial dendrograms

From Sarah Gao and Andrew Gao, "On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models", 7/19/2023:

That's not a vinyl — it's a "radial dendrogram" — showing the evolutionary tree of nearly 6,000 Large Language Models posted at Hugging Face. Zeroing in on one quadrant, so you can read the labels:

The Origin of Speeches? or just the collapse of Uruk?

I've wondered for a long time why Biblical inerrantists have a big problem with biological evolution, which contradicts Chapter 1 of Genesis, but not so much with historical linguistics, which contradicts Chapter 11.

But in "Linguistic Confusion and the Tower of Babel", National Catholic Register 6/21/2023, Dave Armstrong argues that the usual interpretation of the Tower of Babel story is simply a mistake, due to a bad job of sense disambiguation:

[T]he Hebrew word for “earth” (eretz) can mean many things, including the entire world (e.g., Genesis 1:1, 15; 2:1, 4), but also things like the “land” or “ground” of countries, such as Egypt (eretz mitzrayim) and Canaan (eretz kana’an), the dry land (Genesis 1:10), and ground from which seeds grow (Genesis 1:12). The New American Standard Bible translates eretz: country or countries 59 times, ground 119 times, land 1638 times; compare to earth, 656 instances, and world (3).

"On Dialogic Speech"

Thanks to yesterday's post on "Linguistic Laws", I spent a few minutes looking into the life and works of the Russian linguist Lav Jakubinskiy (or Lev Yakubinsky, or whatever transliteration you prefer). I don't think I've heard of him before — but a couple of things (and not Jakubinskiy's Law) convinced me that I should have. The main thing was what I learned about his 1923 work О диалогической речи ("On Dialogic Speech"). I haven't been able to find any online scans of the Russian original, but there's a 1997 PMLA article by Michael Eskin that offers some translated fragments along with a "Translator's Introduction", and a 2016 book, also due to Eskin, that offers a larger translated sample.

Inaugural embedding depth

Following up on yesterday's "Embedding depth" post, I've applied the same analysis to the 62 Inaugural Addresses of U.S. presidents. (Actually, 61 of them: I had to omit John Adams' 1797 address, because its 35th sentence is 797 words long, which made the standard version of the Berkeley Neural Parser break down in tears…)
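The excerpt above does not say exactly how embedding depth is measured, so what follows is only a minimal sketch under stated assumptions: that the parser's output is available as a Penn-Treebank-style bracketed string, and that "depth" means the deepest nesting of clause-level nodes. The function name `max_clause_depth` and the label set `CLAUSE_LABELS` are hypothetical choices for illustration, not taken from the posts.

```python
import re

# Assumption: these Penn-Treebank-style labels each count as one clause level.
CLAUSE_LABELS = {"S", "SBAR", "SINV", "SQ", "SBARQ"}

# Matches either an opening bracket plus its node label, or a closing bracket.
NODE_PATTERN = re.compile(r"\(\s*([^\s()]*)|\)")


def max_clause_depth(bracketed: str) -> int:
    """Return the deepest nesting of clause nodes in a bracketed parse."""
    stack = []    # one boolean per open node: is it a clause node?
    depth = 0     # number of clause nodes currently open
    deepest = 0   # largest value of depth seen so far
    for match in NODE_PATTERN.finditer(bracketed):
        if match.group(0) == ")":
            # Closing a node: undo its contribution if it was a clause.
            if stack and stack.pop():
                depth -= 1
        else:
            # Drop function tags such as the "-TMP" in "SBAR-TMP".
            label = match.group(1).split("-")[0]
            is_clause = label in CLAUSE_LABELS
            stack.append(is_clause)
            if is_clause:
                depth += 1
                deepest = max(deepest, depth)
    return deepest


simple = "(S (NP (NNP Rain)) (VP (VBZ falls)))"
nested = ("(S (NP (PRP I)) (VP (VBP think) (SBAR (IN that) "
          "(S (NP (PRP she)) (VP (VBD left))))))")
print(max_clause_depth(simple), max_clause_depth(nested))
```

Counting SBAR and the S inside it as separate levels is a design choice; a measure that treats the pair as one level would give smaller numbers for the same sentences.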

Embedding depth

In "Trends" (3/27/2022) I compared the distributions of sentence lengths in Ernest Hemingway's A Moveable Feast and Ursula K. Le Guin's The Wave in the Mind. The background, and some of the conclusions, can be found in the slides for my SHEL12 presentation. Hemingway is known for his short and simple sentences (see e.g. "Homo Hemingwayensis", 1/9/2005, for some discussion), but as I showed, his average sentence length is actually a bit on the long side for his time. And his overall distribution of sentence lengths is essentially identical to that found in (later) work by Ursula K. Le Guin, despite her hilarious discussion of an alleged difference in her 1992 essay "Introducing Myself":

The mysterious Yale Burma embarrassment

Ben Zimmer just sent an update to a thread that started with a series of posts on the mobilization of American linguists during WWII:

"A tale of two societies", 3/1/2007
"Linguistics in 1940", 3/11/2007
"The Intensive Language Program", 3/20/2007
"The Chinese episode", 3/21/2007
"The Burmese Story", 3/22/2007

J. Milton Cowan's account of the Burmese Story (from American Linguistics in Peace and at War) ends with the following passage:

Things went well for about a month then one day Franklin Edgerton turned up in our office looking very embarrassed. He said that Alamon had not been entirely frank about his sources of income, and although he rather enjoyed the atmosphere at Yale and Spotty was happy and well-adjusted, he was losing money on the deal. It seems he had been running a little numbers racket in lower Manhattan. Our work was so far along and the problem of getting a replacement so great that we finally settled for doubling his salary. The unwritten history of Burmese linguistics is loaded. Alamon's successor, the other Burmese-sounding name on the Roster, gave rise to an embarrassment of the Yale linguists and the University which was as funny to outsiders as it was painful for those involved. But enough for Burmese.


About six weeks from now, I'm scheduled to give a (virtual) talk with the (provisional) title "Historical trends in English sentence length and syntactic complexity". The (provisional) abstract:

It's easy to perceive clear historical trends in the length of sentences and the depth of clausal embedding in published English text. And those perceptions can easily be verified quantitatively. Or can they? Perhaps the title should be "Historical trends in English punctuation practices", or "Historical trends in English conjunctions and discourse markers." The answer depends on several prior questions: What is a sentence? What is the boundary between syntactic structure and discourse structure? How is message structure encoded in speech (spontaneous or rehearsed) versus in text? This presentation will survey the issues, look at some data, and suggest some answers — or at least some fruitful directions for future work.

So I've started the "look at some data" part, so far mostly by extending some of the many relevant earlier LLOG Breakfast Experiment™ explorations, such as "Inaugural embedding", 9/9/2005, or "Real trends in word and sentence length", 10/31/2011, or "More Flesch-Kincaid grade-level nonsense", 10/23/2015.

In most cases, the extensions just provide more data to support the ideas in the earlier posts. But sometimes, further investigation turns up some twists.

Henry Lee Smith Jr.

Amazingly, it appears that Henry Lee Smith Jr. has no Wikipedia page, despite a notable career in science, public service, and the media. According to his 1972 NYT obituary:

In 1940, when Dr. Smith was 27 and a member of the Department of English at Brown University, he came to public attention on the radio program, “Where Are You From?” over WOR. He selected people from a studio audience, listened to them talk and told them where they came from. He was right in four out of five tries.

For more about that radio program, see "Dr. Smith", The New Yorker 11/22/1940 (page image here), or "Radio: Where Are You From?", Time Magazine 5/6/1940.

According to a "Flashback" by the UB Reporter ("55 Years Ago: Henry Lee Smith, Linguist", 10/27/2011):

After receiving his PhD from Princeton and lecturing at Barnard, Columbia, and Brown, Smith headed the Language Section, Information and Education Division of the U.S. Army from 1942 to 1946.

Prior to the war, there were no foreign language materials for the bulk of the military and civilian personnel, and Smith, along with linguists he recruited, produced language guides, phrase books and military and general-purpose dictionaries in many different languages. Under Smith’s direction, the linguists also developed what came to be known as the Army method of language instruction—later adopted by colleges and universities—emphasizing the use of phonograph records on which a native speaker recited the foreign words and allowed a pause for repetition by the student.

Smith founded the State Department’s School of Language and Linguistics in 1946, and served as the school’s director prior to coming to UB.

For more about the role of linguists in (what became) the Defense Language Institute, see "A tale of two societies" (3/1/2007) and "Linguistics in 1940" (3/11/2007).

My personal exposure to Smith's work was through the influential 1951 monograph that we used to call "Trager Smith"; I remember being struck by how many of the examples in Chomsky & Halle's 1968 The Sound Pattern of English were reproduced exactly from that source. (A link to a .pdf, courtesy of the Internet Archive, is here.)


Making coffee this morning made me think about brewing — not the process, but the English verb brew and its semantic evolution. In particular, it made me wonder again about nativist versions of semantic atomism, which hold that word meanings are (perhaps structured) collections of innate atomic features. Versions of these ideas go back thousands of years, but their most prominent recent exponent was Jerry Fodor.

The Internet Encyclopedia of Philosophy's article puts it this way:

Fodor was also a staunch defender of nativism about the structure and contents of the human mind, arguing against a variety of empiricist theories and famously arguing that all lexical concepts are innate. Fodor vigorously argued against all versions of conceptual role semantics in philosophy and psychology, and articulated an alternative view he calls “informational atomism,” according to which lexical concepts are unstructured “atoms” that have their content in virtue of standing in certain external, “informational” relations to entities in the environment.

Interfaces and Interactions

Going through a box of papers from years ago, I found one of Sally Thomason's famous doodles:

I've set it aside to be framed and hung, facing the Haida frog that was a gift a decade earlier from Nicola Bessel.

LLOG image search

Where is this picture from?

I tried Google Image Search without useful results.


In our 1992 chapter "The stress and structure of modified noun phrases in English" (in Sag & Szabolcsi, Lexical Matters), Richard Sproat and I noted that the normal order in English puts a nominal modifier before its head, but "there are some cases where it appears to be necessary to assume that the head of the construction is on the left and the modifier is on the right". We gave the examples:

vitamin C, route 1, brand X, exit 14, peach Melba, steak diane, Cafe Beethoven, Club Med

My email address and cell phone number have recently found their way onto some political contact lists. And as a result, I get dozens of messages a day from Team X, where X is some politician's name: Team Trump, Team Joe, Team Warren, Team Collins, …

This led me to wonder about the history of the Team NAME construction. I'm not sure that I've got it right, so please explain in the comments what I've missed or misunderstood.

IRCS Prosody Workshop 1992: Undoing bit rot

Recently, Antônio Simões wrote to Cynthia McLemore to ask about a 28-year-old proceedings:

I used to find on the internet the Proceedings from 1992 that you edited with Mark Liberman. I tried to find them, but they are not on the internet anymore. Do you still have that volume in pdf? Or is it accessible somewhere on the internet? This is the volume:

McLemore, Cynthia, and Mark Liberman, eds. 1992. Proceedings of the IRCS Workshop on Prosody in Natural Speech. IRCS Report No. 92-37.

"IRCS" stands for "Institute for Research in Cognitive Science", an NSF research center founded in 1990 by Lila Gleitman and Aravind Joshi. IRCS died in 2016 after a lingering siege of academic politics, and its website seems to have been purged last year. Penn's library has some IRCS technical reports in its repository, but not the one that Antônio is looking for. Many others are clearly missing, along with event recordings and so on. I'll see whether there are backups somewhere from which things can be restored.

Meanwhile, Cindie found a paper copy of the requested proceedings, and this page provides a table of contents with links and abstracts for scanned versions of the 26 papers it contains. Most of them are still interesting and relevant today!

