Language Log

PRC cyberspace authorities fight against bad memes and distorted pronunciation

October 16, 2024 @ 9:35 pm · Filed by Victor Mair under Censorship, Etymology, Language on the internets, Pronunciation

Judging from these alarming top-level official government proclamations, one might think that the Chinese language is going to hell in a handbasket, and that it's all because of the deleterious effects of the internet, strict policing of which is absolutely necessary.

Rénmín wǎng: Rénmín rè píng: Jǐnfáng hēi huà làn gěng de yǐnxìng qīnshí

Lín Fēng
2024 nián 10 yuè 13 rì 10:31 | Láiyuán: Rénmín wǎng-guāndiǎn píndào
xiǎo zìhao

Jìnrì, zhōngyāng wǎng xìn bàn, jiàoyù bù yìnfā tōngzhī, bùshǔ kāizhǎn “qīnglǎng·guīfàn wǎngluò yǔyán wénzì shǐyòng” zhuānxiàng xíngdòng. Zhuānxiàng xíngdòng jùjiāo bùfèn wǎngzhàn píngtái zài rè sōu bǎng dān, shǒuyè shǒu píng, fāxiàn jīngxuǎn děng zhòngdiǎn huánjié chéngxiàn de yǔyán wénzì bù guīfàn, bù wénmíng xiànxiàng, zhòngdiǎn zhěngzhì wāi qǔ yīn, xíng, yì, biānzào wǎngluò hēi huà làn gěng, lànyòng yǐnhuì biǎodá děng túchū wèntí.

人民網：人民熱評：謹防黑話爛梗的隱性侵蝕
林風
2024年10月13日10:31 | 來源：人民網-觀點頻道
小字號

近日，中央網信辦、教育部印發通知，部署開展“清朗·規范網絡語言文字使用”專項行動。專項行動聚焦部分網站平台在熱搜榜單、首頁首屏、發現精選等重點環節呈現的語言文字不規范、不文明現象，重點整治歪曲音、形、義，編造網絡黑話爛梗，濫用隱晦表達等突出問題。

Read the rest of this entry »

Permalink Comments (3)

Annals of lenition

October 16, 2024 @ 5:07 am · Filed by Mark Liberman under Phonetics and phonology

What do you hear?

…or here?

Read the rest of this entry »

Permalink Comments (23)

Trespassed update

October 15, 2024 @ 7:36 am · Filed by Victor Mair under Grammar

I'm at a motel in Nampa, Idaho.

A sign posted on a side entrance reads:

DO NOT LEAVE DOOR

OPEN YOU WILL BE

TRESPASSED.

Read the rest of this entry »

Permalink Comments (25)

Invisible text via Unicode tag characters

October 15, 2024 @ 6:10 am · Filed by Mark Liberman under WTF

If you open this file in your browser, you'll see only a left square bracket followed by a right square bracket, with nothing in between:

But if I run the file through a Perl script that I wrote long ago to print out character codes and their names, I get

|[| 0x005B "LEFT SQUARE BRACKET"
|| 0xE004C "TAG LATIN CAPITAL LETTER L"
|| 0xE0061 "TAG LATIN SMALL LETTER A"
|| 0xE006E "TAG LATIN SMALL LETTER N"
|| 0xE0067 "TAG LATIN SMALL LETTER G"
|| 0xE0075 "TAG LATIN SMALL LETTER U"
|| 0xE0061 "TAG LATIN SMALL LETTER A"
|| 0xE0067 "TAG LATIN SMALL LETTER G"
|| 0xE0065 "TAG LATIN SMALL LETTER E"
|| 0xE0020 "TAG SPACE"
|| 0xE004C "TAG LATIN CAPITAL LETTER L"
|| 0xE006F "TAG LATIN SMALL LETTER O"
|| 0xE0067 "TAG LATIN SMALL LETTER G"
|]| 0x005D "RIGHT SQUARE BRACKET"
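
The listing above can be approximated with a short Python sketch. This is not the Perl script mentioned in the post, which is not shown here; the function names `describe` and `reveal_tags` are made up for illustration, and the sample string is rebuilt from the code points in the listing rather than read from the original file. It assumes that each tag character in the range U+E0020 to U+E007E sits exactly 0xE0000 above the ASCII character it shadows, which is how the Unicode Tags block is laid out.

```python
import unicodedata

TAG_BASE = 0xE0000  # offset between a tag character and its ASCII counterpart


def describe(text):
    """Return one line per character: the character, its code point, its Unicode name."""
    lines = []
    for ch in text:
        # Fall back to a placeholder if the character has no name in this Unicode database.
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f'|{ch}| 0x{ord(ch):04X} "{name}"')
    return lines


def reveal_tags(text):
    """Map any tag characters (U+E0020..U+E007E) back to the ASCII they shadow."""
    return "".join(
        chr(ord(ch) - TAG_BASE) for ch in text if 0xE0020 <= ord(ch) <= 0xE007E
    )


# Rebuild the sample from the listing: brackets around the tag form of a short string.
hidden = "".join(chr(TAG_BASE + ord(c)) for c in "Language Log")
sample = "[" + hidden + "]"

for line in describe(sample):
    print(line)
print(reveal_tags(sample))
```

The last line prints the hidden text in visible form, which for this sample is "Language Log". Filtering on the code point range rather than on the character name keeps `reveal_tags` independent of which Unicode version the local `unicodedata` module ships with.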

Read the rest of this entry »

Permalink Comments (12)

Shaikh Zubayr

October 14, 2024 @ 6:36 am · Filed by Mark Liberman under Etymology

Sean Swanick, "Shaikh Zubayr", Duke University Libraries Blog, 4/13/2016:

A man lost at sea, having drifted far away from his native Iraqi lands, comes ashore in England. In due time he will be nicknamed the Bard of Avon but upon landing on the Saxon coast, his passport reportedly read: Shaikh Zubayr. A knowledgeable man with great writing prowess from a small town called Zubayr in Iraq. He came to be known in the West as Shakespeare and was given the first name of William. William Shakespeare of Zubayr.

Read the rest of this entry »

Permalink Comments (15)

Passed

October 14, 2024 @ 6:06 am · Filed by Victor Mair under Usage

There are many euphemisms for saying that someone died, two of the most common being "passed away" and "passed on". Lately, I've been hearing more and more people announce that so-and-so simply "passed". The first few times that I heard it spoken that way, I thought it sounded strange. Now, however, I'm so accustomed to this usage that it almost sounds normal, though I'm still only barely comfortable saying it myself.

Read the rest of this entry »

Permalink Comments (45)

Southeast Asians learning Mandarin

October 13, 2024 @ 7:34 am · Filed by Victor Mair under Language reform, Language teaching and learning, Pedagogy, Writing systems

Anh Yeo is a Chinese student from Vietnam. She is currently studying in a graduate program in Chinese language and literature at Tsinghua University. To earn pocket money, she has taken a job teaching Mandarin online to Southeast Asian office workers. In response to the post "Aborted character simplification in the mid-1930s" (10/5/24), which had much to do with character simplification (or not) in Singapore, she wrote to me as follows:

I had two lessons tonight teaching Pinyin. Southeast Asians learn Pinyin fast (similar alphabet + existence of tones in Thai and Vietnamese), but because of that students are reliant on Pinyin and cannot remember characters! I have students learning for 3-4 months and still have to read off Pinyin (recognizing fewer than 50 characters). I always thought the coexistence of characters and Latin alphabet in Mandarin interesting!

Read the rest of this entry »

Permalink Comments (26)

"Deppenapostrophe": Is English guilty after all?

October 12, 2024 @ 5:37 pm · Filed by Mark Liberman under Orthography

Andreas Stolcke responds to "English is innocent" (10/10/2024):

The historical facts cited are correct, but they don't explain why the frequency of 's rose in the post-WW2 period, and again after about 2005 (= the internet), as indicated by the Google Ngrams plot below.

The bump in the post-war era (after 1957) could be an effect of the Allied occupation (delayed by the book publishing process), which was reversed by the mid-1990s, and then encouraged again by the internet half a century later.

So my bet is still on an English (language) influence.

Read the rest of this entry »

Permalink Comments (16)

AI Hyperauthorship

October 12, 2024 @ 11:42 am · Filed by Mark Liberman under Artificial intelligence

This paper's content is interesting — Mirzadeh, Iman, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar. "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models." arXiv preprint arXiv:2410.05229 (2024). In short, the authors found that small changes in Grade-School Mathematics benchmark questions, like substituting different numerical values or adding irrelevant clauses, caused all the tested LLMs to do worse. You should read the whole thing for the details, to which I'll return another time.

Read the rest of this entry »

Permalink Comments (3)

AI Overview: bits and pieces (German "ich" and Starbuck WA)

October 11, 2024 @ 7:08 pm · Filed by Victor Mair under Artificial intelligence, Language and business, Language on the internets, Punctuation, Toponymy

Things are happening very fast with this "experimental" internet search tool. I realize that it is a new technology, so naturally there are bugs and kinks that have to be worked out, and I don't want to be too harsh with it. Moreover, at a certain level, it is already serving a yeomanly purpose.

For instance, I asked Google, "should ich be capitalized in the middle of a sentence". AI Overview (henceforth AIO) promptly stepped in and provided the following straightforward response:

No, ich should not be capitalized in the middle of a sentence in German. In German, the first-person singular pronoun ich (I) is only capitalized when it is the first word in a sentence.

In English, the singular “I” is always capitalized when used as a first-person personal pronoun. This includes all contractions of “I”, such as “I'm” and “I'll”.

Other things to capitalize in the middle of a sentence include…. [details omitted here]

Read the rest of this entry »

Permalink Comments (8)

Aurora (Colorado and Borealis)

October 11, 2024 @ 3:58 pm · Filed by Mark Liberman under Humor, Linguistics in the comics

Jonathan Weisman, "As Trump Arrives, Aurora Insists It’s Not the ‘War Zone’ He Sees", NYT 10/11/2024.

And today's xkcd:

Read the rest of this entry »

Permalink Comments (2)

Ben Zimmer on Keywords

October 11, 2024 @ 6:58 am · Filed by Mark Liberman under Words words words

Christine Oh, "Wolf Humanities Center hosts linguist, columnist Ben Zimmer for lecture on 'keywords'", The Daily Pennsylvanian 10/11/2024:

The Wolf Humanities Center hosted Wall Street Journal language columnist Ben Zimmer at the ARCH building for a talk titled “Lexical Sleuthing in the Digital Age: On the Trail of Keywords and their Cultural Worlds” on Oct. 9.

Zimmer — who was a research associate at Penn’s former Institute for Research in Cognitive Science from 2005 to 2006 — gave a presentation on lexicology and linguistics followed by a question and answer session with roughly 40 attendees. The event drew a crowd of linguists and language enthusiasts from Penn's campus and the Philadelphia area.

Read the rest of this entry »

Permalink Comments (14)

AI Overview: Snake River and Walla Walla

October 10, 2024 @ 10:26 am · Filed by Victor Mair under Artificial intelligence, Etymology, Language on the internets, Toponymy

[N.B.: If you don't have time to read through this long and complicated post, cut to the "Closing note" at the bottom.]

Lately when I do Google searches, especially on obscure and challenging subjects, AI Overview leaps into the fray and takes precedence at the very top, displacing Wikipedia down below, and even Google's own responses, which have been increasingly frequent in recent months, are pushed over to the top right.

AI Overview, at first glance, seems convenient and useful, but when I start to dig deeper, I find that there are problems. As an example, I will give the case of the name of the Snake River, and maybe mention a few other instances of AI Overview falling short, but still being swiftly, though superficially, helpful.

Read the rest of this entry »

Permalink Comments (8)
