Language Log

"National Language" in the Xinjiang Uyghur Autonomous Region

December 14, 2019 @ 12:04 pm · Filed by Victor Mair under Language and ethnicity, Language and politics

Many people have been asking me about the use of the term Guóyǔ 国语 ("National Language") for "Mandarin" in Xinjiang today. Here's an inquiry from Peter Moody:

I have encountered what seems to be an anomaly in contemporary Chinese usage, and have been assured that you are among those most capable of addressing it.

I was reading an analysis by a Darren Byler, a "Xinjiang Scholar," of a 2017 classified directive from Zhu Hailun, Gauleiter of Xinjiang, on how properly to run the concentration camps in that territory (https://supchina.com/2019/12/04/a-xinjiang-scholars-close-reading-of-the-china-cables/). (I have not looked either at the full English translation of these directives, or the Chinese text, although both are available. I figured the analysis would give the gist of them.)

Amazing new Japanese words

December 13, 2019 @ 5:27 pm · Filed by Victor Mair under Borrowing, Neologisms, Transcription, Translation

These come from the following nippon.com article:

"Pay It Forward: The Top New Japanese Words for 2019" (12/13/19)

I'll list the words first, then explain which one is my favorite.

A prefatory note: nearly half of the words on these lists are based wholly or partly on borrowings from English, though they are assimilated into Japanese in such a manner that they are unrecognizable to monolingual English speakers.

Seeding Mars

December 13, 2019 @ 1:33 pm · Filed by Mark Liberman under Humor

The set-up for yesterday's SMBC:

D for Dog, L for Love

December 12, 2019 @ 11:46 am · Filed by Victor Mair under Alphabets, Phonetics and phonology

When confirming reservations on the phone with clerical staff in certain Southeast Asian countries, Paul Midler noticed that they often used variations of the NATO phonetic alphabet. “D for Dog” and “L for Love” seemed to be a couple of consistent additions. Passing through a travel agency in Thailand, he saw this:

Quantum Supremacy

December 12, 2019 @ 10:05 am · Filed by Mark Liberman under Linguistics in the comics

For the past couple of months, the phrase "Quantum Supremacy" has been on my to-blog list, based on points and counterpoints like "Google scientists say they’ve achieved ‘quantum supremacy’ breakthrough over classical computers" (WaPo 10/23/2019) and "IBM Says Google’s Quantum Leap Was a Quantum Flop" (Wired 10/21/2019). My interest, at least on the LLOG dimension, was not in the argument about how difficult a particular problem is for classical computers, but rather in the use of the word supremacy.

Now I can take this one off the stack, because a recent SMBC does a better job than I would have:

Please stoop

December 12, 2019 @ 10:03 am · Filed by Victor Mair under Language and society, Language play, Style and register

Photograph from Paul M in Taipei:

be;eza

December 12, 2019 @ 10:02 am · Filed by Victor Mair under Language and fashion, Spelling, Style and register

Sign on the front of a fashion store (shoes and handbags) in Taipei:

Exotic letter in Taipei

December 12, 2019 @ 10:01 am · Filed by Victor Mair under Alphabets, Language and advertising, Language and fashion

Paul M. sent in this photograph of the front of a fashion shop on Yongkang Street, Da’an District, Taipei City, Taiwan:

Cat chat and tax talk

December 12, 2019 @ 9:48 am · Filed by Victor Mair under Language and animals, Language and politics, Slogans

Photograph of a campaign billboard in Taiwan showing President Tsai Ing-wen, who is up for reelection on January 11, with one of her two beloved cats:

(Source: anonymous colleague)

AI is brittle

December 11, 2019 @ 4:21 am · Filed by Mark Liberman under Computational linguistics, Elephant semifics

Following up "Shelties On Alki Story Forest" (11/26/2019) and "The right boot of the warner of the baron" (12/6/2019), here's some recent testimony from engineers at Google about the brittleness of contemporary speech-to-text systems: Arun Narayanan et al., "Recognizing Long-Form Speech Using Streaming End-To-End Models", arXiv 10/24/2019.

The goal of that paper is to document some methods for making things better. But I want to underline the fact that considerable headroom remains, even with the massive amounts of training material and computational resources available to a company like Google.

Modern AI (almost) works because of machine learning techniques that find patterns in training data, rather than relying on human programming of explicit rules. A weakness of this approach has always been that generalization to material different in any way from the training set can be unpredictably poor. (Though of course rule- or constraint-based approaches to AI generally never even got off the ground at all.) "End-to-end" techniques, which eliminate human-defined layers like words, so that speech-to-text systems learn to map directly between sound waveforms and letter strings, are especially brittle.

The impact of phonetic inputting on Chinese languages

December 9, 2019 @ 9:03 am · Filed by Victor Mair under Alphabets, Language reform, Writing systems

The vast majority of people, both inside and outside of China, input characters on cell phones, computers, and other electronic devices via Hanyu Pinyin or other phonetic script. Naturally, this has had a huge impact on the relationship between users of the Chinese script and their command of the characters, since they are no longer directly writing the characters through neuro-muscular coordination and effort. Instead, their electronic devices do the writing of the characters for them by converting the Pinyin or other phonetic inputting to the desired characters, resulting in the widely lamented phenomenon of "character amnesia", which we have touched upon in dozens of LL posts.

There has in recent years been a lot of stuff and nonsense bandied about concerning how Chinese character inputting led to the development of predictive typing, whereas the actuality is that the extreme cumbersomeness of the Chinese writing system necessitated the development of one kind of predictive typing (other predictive algorithms were already in use long before) to rescue the characters from hasty extinction.
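
The conversion step described above, in which a device turns typed Pinyin into candidate characters, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `LEXICON` table, its frequency counts, and the function names `candidates` and `convert` are hypothetical and do not describe any real input method.

```python
# Toy lexicon mapping toneless Pinyin strings to candidate words.
# The counts are invented for illustration; they are not corpus data.
LEXICON = {
    "guoyu": [("国语", 50), ("过于", 120)],
    "ma": [("吗", 900), ("妈", 400), ("马", 350), ("骂", 80)],
    "le": [("了", 2000), ("乐", 150)],
}


def candidates(pinyin: str) -> list[str]:
    """Return the candidate words for a Pinyin string, most frequent first."""
    entries = LEXICON.get(pinyin.lower(), [])
    ranked = sorted(entries, key=lambda pair: pair[1], reverse=True)
    return [word for word, _count in ranked]


def convert(pinyin: str) -> str:
    """Pick the top-ranked candidate, or echo the input if it is unknown."""
    ranked = candidates(pinyin)
    return ranked[0] if ranked else pinyin


if __name__ == "__main__":
    # The user types sounds; the software proposes the written forms.
    print(candidates("ma"))
    print(convert("guoyu"))
```

In this sketch the typist only has to recognize the right form in a ranked list rather than produce it stroke by stroke, which is the shift from active writing to passive recognition that the paragraph above associates with "character amnesia".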

People of X

December 8, 2019 @ 1:34 am · Filed by Mark Liberman under Usage

In the discussion of Boris Johnson's misperceived phrase ("Was it 'people of colour' or 'people of talent'?", 12/6/2019), several people expressed the opinion that "people of talent" is an unexpected way to refer to the group that he wants to welcome. Thus Rose Eneri:

My question is why does Mr. Johnson use such as odd phrase. Why does he not say, "talented people" or "people with skills we need?" I don't know of any other use of the phrase, "people of…" This fracas demonstrates the perils of using one.

Actually there are quite a few other possible values for X in "people of X", where the phrase means something like "people who have X": faith, goodwill, conscience, influence, integrity, character, means, authority, importance, intelligence, vision, quality, . . .

The Mandarin grammatical particle "le" — one or many?

December 7, 2019 @ 8:31 pm · Filed by Victor Mair under Grammar, Historical linguistics, Idioms, Language teaching and learning

When I was learning Mandarin over half a century ago, the more grammatically minded Chinese language teachers argued that historically and functionally there were multiple "le" particles that just happened to end up being written with the simple two-stroke character 了. Then a contrary movement set in, and linguists tried to prune down all the "le" into two or even one, claiming that all of the different 了 developed out of an ur-了.

The irony of it all is that, before the 20th century, there was no established, systematic, explicit grammar for Sinitic languages in indigenous sources.

See, inter alia, Victor H. Mair (1997), "Ma Jianzhong and the Invention of Chinese Grammar," in Chaofen Sun, ed., Studies on the History of Chinese Syntax. Monograph Series Number 10 of Journal of Chinese Linguistics, 5-26. (available on JSTOR)

Mǎshì wéntōng 馬氏文通 (conventionally rendered as "Ma's Grammar", though it would probably be closer to the original meaning in Chinese to translate it as "Written Language Unobstructedness"; 1898)

Just as we have seen in a recent post, before the 20th century there was no Chinese concept of "word":

"HouseHold GarBage" (12/6/19)

Which leads to the question: can you have grammar without words?

There have been countless papers, articles, dissertations, and monographs on le 了. Here I'm going to introduce two dissertations on le 了 written within the last few decades and the latest monograph on le 了 as representative of what has been happening with regard to the conceptualization of this protean particle in recent times.
