Tukey's birthday
Mouseover title: "Numbers can be tricky. On the day of my 110th birthday, I'll be one day younger than John Tukey was on his."
Read the rest of this entry »
When I was going through the TSA checkpoint in Philadelphia at the beginning of this run down the Mississippi, something very unfortunate happened. The TSA agent who was going through my carry-on belongings approached me and said, "Is this your stick?" "Yes, sir," I replied.
"I have a problem with your stick," he said.
"What's wrong with it?" I asked him.
"It's a blunt instrument."
"It's my walking stick," I said.
"You can't fly with this stick," he insisted. "It's a blunt instrument."
"But, sir, I've flown with it dozens of times, often right through Philadelphia, through this very checkpoint."
Read the rest of this entry »
YouTube's speech-to-text system is way behind the state of the art, or maybe has a good sense of humor. From its transcription of Donald Trump's 5/15/2025 speech in Qatar (the whitehouse.gov version):
Read the rest of this entry »
I have always been deeply intrigued by George Kingsley Zipf (1902-1950), but Mark's recent "Dynamic Philology" (5/24/25) rekindled my interest.
Put simply, he is the eponym of Zipf's law, which states that while only a few words are used very often, many or most are used rarely:
$$P_n \propto \frac{1}{n^a}$$
where $P_n$ is the frequency of a word ranked nth and the exponent $a$ is almost 1. This means that the second item occurs approximately 1/2 as often as the first, and the third item 1/3 as often as the first, and so on. Zipf's discovery of this law in 1935 was one of the first academic studies of word frequency.
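As a rough illustration of the rank-frequency relation just described, here is a minimal Python sketch. The function names and the least-squares fit are my own assumptions for illustration, not anything taken from Zipf's work; the sketch only shows how the 1/2, 1/3, ... pattern falls out when the exponent is 1, and how one might estimate the exponent from a ranked list of counts.

```python
import math

def zipf_relative_frequencies(max_rank, a=1.0):
    """Frequency of each rank relative to rank 1, assuming P_n is proportional to 1 / n**a."""
    return [1.0 / (rank ** a) for rank in range(1, max_rank + 1)]

def estimate_exponent(frequencies):
    """Estimate a as the negated least-squares slope of log(frequency) against log(rank).

    Assumes the frequencies are positive and already sorted from most to least frequent.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance

ratios = zipf_relative_frequencies(5)
print([round(r, 3) for r in ratios])  # [1.0, 0.5, 0.333, 0.25, 0.2]
```

Real word counts will only approximate this curve, so an exponent estimated from actual text should be expected to land near 1 rather than exactly on it.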
Although he originally intended it as a model for linguistics, Zipf later generalized his law to other disciplines. In particular, he observed that the rank vs. frequency distribution of individual incomes in a unified nation approximates this law, and in his 1941 book, "National Unity and Disunity", he theorized that breaks in this "normal curve of income distribution" portend social pressure for change or revolution.
Read the rest of this entry »
[This is a guest post by Mok Ling]
I happen to know a few students (of varying ages and learning experiences) who want to learn (or re-learn, for some of them) Mandarin the "right" way (that is, focusing on speaking and listening before reading and writing, unlike what is prescribed by most HSK courses). Right now, I've got them chewing on the revised Pinyin edition of Princeton's Chinese Primer (which is in pure Pinyin — not a single sinograph until halfway into the course), but they obviously need something outside of a textbook to read.
I'd planned on giving them a Pinyinized Kong Yiji as a "goal text" to read once they have a firm command of the spoken language, but thinking back, this seems like a bad idea because of how flowery Lu Xun can get.
Read the rest of this entry »
Submitted by Charles Belov:
I've been browsing through the proposed Unicode 17 changes, currently undergoing a comment period, with interest. While I don't have the knowledge to intelligently comment on the proposals, it's good to see that they are actively improving language access.
I'm puzzled that some new characters have been added to the existing Unicode CJK Unified Ideographs Extension C (6 characters) and Unicode CJK Unified Ideographs Extension E (12 characters) rather than added to a new extension. But the most interesting is the apparently brand-new Unicode CJK Unified Ideographs Extension J, with over 4,000 added characters.
Read the rest of this entry »
We've talked about Dungan a lot on Language Log. That's the northwest Sinitic topolect written in Cyrillic that has been transplanted to Central Asia. See "Selected readings" below.
For those of you who are interested and would like to hear what it sounds like in real life — spoken and sung by male and female voices — we are fortunate to have a series of ten radio broadcast recordings (here).
Note the natural, easy, undistorted insertion of non-Sinitic borrowings, e.g., "Salam alaikum" (Arabic as-salāmu ʿalaykum السَّلَامُ عَلَيْكُمْ, "Peace be upon you"). That would not be possible in sinographic transcription of northwest Sinitic speech. This and other aspects and implications of alphabetic Dungan have been extensively discussed on LL.
Read the rest of this entry »
Running down the road in Clarksdale, Mississippi, I screeched to a halt (felt like Road Runner) when I passed by a Chinese restaurant with the odd name Rice Bowl (in Chinese it was Fànwǎn lóu 饭碗楼 — the only characters I saw on the premises). It was a tiny, nondescript establishment, with six or so chairs against the walls where you sat while you waited for your order to be prepared. Most people, however, stood in line or just came in to pick up what they had ordered over the phone.
The owner did a brisk business, but it was strictly take out. There were about 8 spaces for cars to park outside, though they were constantly coming and going.
Read the rest of this entry »
That's the title of a valuable Wikipedia article. I have no idea who wrote it, but I'm very glad to have access to this comprehensive article, since it touches on so many topics that concern my ongoing research.
Here are some highlights:
Before British colonisation, the Persian language was the lingua franca of the Indian subcontinent and a widely used official language in northern India. The language was brought into South Asia by various Turkics and Afghans and was preserved and patronized by local Indian dynasties from the 11th century, such as Ghaznavids, Sayyid dynasty, Tughlaq dynasty, Khilji dynasty, Mughal dynasty, Gujarat sultanate, and Bengal sultanate. Initially it was used by Muslim dynasties of India but later started being used by non-Muslim empires too, for example the Sikh Empire. Persian held official status in the court and the administration within these empires. It largely replaced Sanskrit as the language of politics, literature, education, and social status in the subcontinent.
Read the rest of this entry »
"Does GPT-4 Surpass Human Performance in Linguistic Pragmatics?" Ljubiša Bojić, Predrag Kovačević, & Milan Čabarkapa. Humanities and Social Sciences Communications 12, no. 1, Article number: 794 (June 10, 2025).
Abstract
As Large Language Models (LLMs) become increasingly integrated into everyday life as general-purpose multimodal AI systems, their capabilities to simulate human understanding are under examination. This study investigates LLMs’ ability to interpret linguistic pragmatics, which involves context and implied meanings. Using Grice’s communication principles, we evaluated both LLMs (GPT-2, GPT-3, GPT-3.5, GPT-4, and Bard) and human subjects (N = 147) on dialogue-based tasks. Human participants included 71 primarily Serbian students and 76 native English speakers from the United States. Findings revealed that LLMs, particularly GPT-4, outperformed humans. GPT-4 achieved the highest score of 4.80, surpassing the best human score of 4.55. Other LLMs performed well: GPT-3.5 scored 4.10, Bard 3.75, and GPT-3 3.25; GPT-2 had the lowest score of 1.05. The average LLM score was 3.39, exceeding the human cohorts’ averages of 2.80 (Serbian students) and 2.34 (U.S. participants). In the ranking of all 155 subjects (including LLMs and humans), GPT-4 secured the top position, while the best human ranked second. These results highlight significant progress in LLMs’ ability to simulate understanding of linguistic pragmatics. Future studies should confirm these findings with more dialogue-based tasks and diverse participants. This research has important implications for advancing general-purpose AI models in various communication-centered tasks, including potential application in humanoid robots in the future.
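The headline averages in the abstract are easy to sanity-check. The short Python sketch below assumes that the reported LLM average is a simple unweighted mean of the five scores listed; the abstract does not say how it was computed, so treat that as my assumption. The variable names are mine.

```python
# Scores as listed in the abstract; the unweighted mean is an assumption about
# how the reported LLM average was computed.
llm_scores = {"GPT-4": 4.80, "GPT-3.5": 4.10, "Bard": 3.75, "GPT-3": 3.25, "GPT-2": 1.05}

mean_llm = sum(llm_scores.values()) / len(llm_scores)
human_total = 71 + 76  # Serbian students + U.S. participants

print(round(mean_llm, 2), human_total)  # 3.39 147
```

Under that assumption, both the 3.39 average and the N = 147 figure are consistent with the individual numbers given.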
Read the rest of this entry »
From Adam Rasgon and Natan Odenheimer, "U.S. Embassy in Jerusalem Braces for Possible Israeli Strike on Iran", NYT 6/12/2025:
More recently, however, Mr. Trump has said he was less convinced that talks with Iran would yield a new nuclear deal.
“I’m getting more and more less confident about it,” he told The New York Post in a podcast broadcast on Wednesday.
Read the rest of this entry »
In a comment on yesterday's post "A 12th-century influencer", Laura Morland wrote:
Thanks for sharing "to abelard," the new verb of the month! Note to AP: the grammarians will insist that it be spelled with a lower-case "a". (Verbs are never capitalized, not even in German, I don't believe.)
This is one where The Errorist might have the upper hand.
Read the rest of this entry »