Archive for Linguistic history

The evolution of verbal interpolations

Philip Castle, "Quelles sont les expressions les plus utilisées dans la langue française courante?", Quora 6/20/2024:

On va commencer par voilà. O-bli-ga-toi-re ! Il faut parsemer votre discours de "voilà", sans trop vous préoccuper de leur place ni de leur utilité dans la phrase, bien au contraire. Exemple : "Je me suis dit que voilà ce serait bien de voilà faire des efforts pour voilà améliorer mon français". Il faut aussi garder à l'esprit que ce mot merveilleux peut tout remplacer, y compris une fin de phrase. Exemple entendu ce matin sur France Inter : "En fait, le SMIC à 1600 €, je suis patron alors voilà". Vous avez compris le principe, il n'est pas nécessaire de terminer votre phrase, votre interlocuteur la finira lui même en remplaçant le voilà par ce qu'il veut.

We'll start with "voilà". O-bli-ga-to-ry! You need to sprinkle your speech with (instances of) "voilà", without worrying much about their place or their use in the sentence, quite the contrary. Example: "Je me suis dit que voilà ce serait bien de voilà faire des efforts pour voilà améliorer mon français". You also need to keep in mind that this marvelous word can replace anything, including the end of a sentence. An example heard this morning on France Inter: "En fait, le SMIC à 1600 €, je suis patron alors voilà". You've understood the principle: it's not necessary to finish your sentence; your interlocutors will finish it for themselves, replacing the "voilà" with whatever they like.


Le Nouchi

Elian Peltier, "How Africans Are Changing French — One Joke, Rap and Book at a Time", NYT 12/12/2023:

French, by most estimates the world’s fifth most spoken language, is changing — perhaps not in the gilded hallways of the institution in Paris that publishes its official dictionary, but on a rooftop in Abidjan, the largest city in Ivory Coast.

There one afternoon, a 19-year-old rapper who goes by the stage name “Marla” rehearsed her upcoming show, surrounded by friends and empty soda bottles. Her words were mostly French, but the Ivorian slang and English words that she mixed in made a new language.

To speak only French, “c’est zogo” — “it’s uncool,” said Marla, whose real name is Mariam Dosso, combining a French word with Ivorian slang. But playing with words and languages, she said, is “choco,” an abbreviation for chocolate meaning “sweet” or “stylish.”

A growing number of words and expressions from Africa are now infusing the French language, spurred by booming populations of young people in West and Central Africa.


Whorf invents generative phonology?

After stumbling on Benjamin Lee Whorf's affiliation with the Theosophical Society, I read two articles that he contributed to the MIT Technology Review in 1940: "Science and Linguistics" in the April issue, and "Linguistics as an Exact Science" in the December issue. Something in the second article surprised me.

Whorf gives a formal account of English syllable structure in terms of what he calls "pattern symbolics", presenting the term and a sketch of the associated formalism as if they were standard linguistic theory, like "Maxwell's equations" in physics. But I've never heard the phrase "pattern symbolics" before, and web search turns up no examples other than this article. And the formalism seems similarly idiosyncratic.


Radial dendrograms

From Sarah Gao and Andrew Gao, "On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models", 7/19/2023:

That's not a vinyl — it's a "radial dendrogram" — showing the evolutionary tree of nearly 6,000 Large Language Models posted at Hugging Face. Zeroing in on one quadrant, so you can read the labels:


The Origin of Speeches? or just the collapse of Uruk?

I've wondered for a long time why Biblical inerrantists have a big problem with biological evolution, which contradicts Chapter 1 of Genesis, but not so much with historical linguistics, which contradicts Chapter 11.

But in "Linguistic Confusion and the Tower of Babel", National Catholic Register 6/21/2023, Dave Armstrong argues that the usual interpretation of the Tower of Babel story is simply a mistake, due to a bad job of sense disambiguation:

[T]he Hebrew word for “earth” (eretz) can mean many things, including the entire world (e.g., Genesis 1:1, 15; 2:1, 4), but also things like the “land” or “ground” of countries, such as Egypt (eretz mitzrayim) and Canaan (eretz kana’an), the dry land (Genesis 1:10), and ground from which seeds grow (Genesis 1:12). The New American Standard Bible translates eretz: country or countries 59 times, ground 119 times, land 1638 times; compare to earth, 656 instances, and world (3).


"On Dialogic Speech"

Thanks to yesterday's post on "Linguistic Laws", I spent a few minutes looking into the life and works of the Russian linguist Lev Jakubinskiy (or Lev Yakubinsky, or whatever transliteration you prefer). I don't think I'd heard of him before, but a couple of things (and not Jakubinskiy's Law) convinced me that I should have. The main thing was what I learned about his 1923 work О диалогической речи ("On Dialogic Speech"). I haven't been able to find any online scans of the Russian original, but there's a 1997 PMLA article by Michael Eskin that offers some translated fragments along with a "Translator's Introduction", and a 2016 book, also due to Eskin, that offers a larger translated sample.


Inaugural embedding depth

Following up on yesterday's "Embedding depth" post, I've applied the same analysis to the 62 Inaugural Addresses of U.S. presidents. (Actually, 61 of them: I had to omit John Adams' 1797 address, because its 35th sentence is 797 words long, which made the standard version of the Berkeley Neural Parser break down in tears…)


Embedding depth

In "Trends" (3/27/2022) I compared the distributions of sentence lengths in Ernest Hemingway's A Moveable Feast and Ursula K. Le Guin's The Wave in the Mind. The background, and some of the conclusions, can be found in the slides for my SHEL12 presentation. Hemingway is known for his short and simple sentences (see e.g. "Homo Hemingwayensis", 1/9/2005, for some discussion), but as I showed, his average sentence length is actually a bit on the long side for his time. And his overall distribution of sentence lengths is essentially identical to that found in (later) work by Ursula K. Le Guin, despite her hilarious discussion of an alleged difference in her 1992 essay "Introducing Myself":


The mysterious Yale Burma embarrassment

Ben Zimmer just sent an update to a thread that started with a series of posts on the mobilization of American linguists during WWII:

"A tale of two societies", 3/1/2007
"Linguistics in 1940", 3/11/2007
"The Intensive Language Program", 3/20/2007
"The Chinese episode", 3/21/2007
"The Burmese Story", 3/22/2007

J. Milton Cowan's account of the Burmese Story (from American Linguistics in Peace and at War) ends with the following passage:

Things went well for about a month then one day Franklin Edgerton turned up in our office looking very embarrassed. He said that Alamon had not been entirely frank about his sources of income, and although he rather enjoyed the atmosphere at Yale and Spotty was happy and well-adjusted, he was losing money on the deal. It seems he had been running a little numbers racket in lower Manhattan. Our work was so far along and the problem of getting a replacement so great that we finally settled for doubling his salary. The unwritten history of Burmese linguistics is loaded. Alamon's successor, the other Burmese-sounding name on the Roster, gave rise to an embarrassment of the Yale linguists and the University which was as funny to outsiders as it was painful for those involved. But enough for Burmese.



About six weeks from now, I'm scheduled to give a (virtual) talk with the (provisional) title "Historical trends in English sentence length and syntactic complexity". The (provisional) abstract:

It's easy to perceive clear historical trends in the length of sentences and the depth of clausal embedding in published English text. And those perceptions can easily be verified quantitatively. Or can they? Perhaps the title should be "Historical trends in English punctuation practices", or "Historical trends in English conjunctions and discourse markers." The answer depends on several prior questions: What is a sentence? What is the boundary between syntactic structure and discourse structure? How is message structure encoded in speech (spontaneous or rehearsed) versus in text? This presentation will survey the issues, look at some data, and suggest some answers — or at least some fruitful directions for future work.

So I've started the "look at some data" part, so far mostly by extending some of the many relevant earlier LLOG Breakfast Experiment™ explorations, such as "Inaugural embedding", 9/9/2005, or "Real trends in word and sentence length", 10/31/2011, or "More Flesch-Kincaid grade-level nonsense", 10/23/2015.

In most cases, the extensions just provide more data to support the ideas in the earlier posts. But sometimes, further investigation turns up some twists.


Henry Lee Smith Jr.

Amazingly, it appears that Henry Lee Smith Jr. has no Wikipedia page, despite a notable career in science, public service, and the media. According to his 1972 NYT obituary:

In 1940, when Dr. Smith was 27 and a member of the Department of English at Brown University, he came to public attention on the radio program, “Where Are You From?” over WOR. He selected people from a studio audience, listened to them talk and told them where they came from. He was right in four out of five tries.

For more about that radio program, see "Dr. Smith", The New Yorker 11/22/1940 (page image here), or "Radio: Where Are You From?", Time Magazine 5/6/1940.

According to a "Flashback" by the UB Reporter ("55 Years Ago: Henry Lee Smith, Linguist", 10/27/2011):

After receiving his PhD from Princeton and lecturing at Barnard, Columbia, and Brown, Smith headed the Language Section, Information and Education Division of the U.S. Army from 1942 to 1946.

Prior to the war, there were no foreign language materials for the bulk of the military and civilian personnel, and Smith, along with linguists he recruited, produced language guides, phrase books and military and general-purpose dictionaries in many different languages. Under Smith’s direction, the linguists also developed what came to be known as the Army method of language instruction—later adopted by colleges and universities—emphasizing the use of phonograph records on which a native speaker recited the foreign words and allowed a pause for repetition by the student.

Smith founded the State Department’s School of Language and Linguistics in 1946, and served as the school’s director prior to coming to UB.

For more about the role of linguists in (what became) the Defense Language Institute, see "A tale of two societies" (3/1/2007) and "Linguistics in 1940" (3/11/2007).

My personal exposure to Smith's work was through the influential 1951 monograph that we used to call "Trager Smith"; I remember being struck by how many of the examples in Chomsky & Halle's 1968 The Sound Pattern of English were reproduced exactly from that source. (A link to a .pdf, courtesy of the Internet Archive, is here.)



Making coffee this morning made me think about brewing — not the process, but the English verb brew and its semantic evolution. In particular, it made me wonder again about nativist versions of semantic atomism, which hold that word meanings are (perhaps structured) collections of innate atomic features. Versions of these ideas go back thousands of years, but their most prominent recent exponent was Jerry Fodor.

The Internet Encyclopedia of Philosophy's article puts it this way:

Fodor was also a staunch defender of nativism about the structure and contents of the human mind, arguing against a variety of empiricist theories and famously arguing that all lexical concepts are innate. Fodor vigorously argued against all versions of conceptual role semantics in philosophy and psychology, and articulated an alternative view he calls “informational atomism,” according to which lexical concepts are unstructured “atoms” that have their content in virtue of standing in certain external, “informational” relations to entities in the environment.


Interfaces and Interactions

Going through a box of papers from years ago, I found one of Sally Thomason's famous doodles:

I've set it aside to be framed and hung, facing the Haida frog that was a gift a decade earlier from Nicola Bessel.
