Indigenous languages of Taiwan

How many are there?

Taiwan’s unrecognized indigenous tribes are reviving dead languages to achieve recognition

There are currently 16 officially recognized indigenous peoples in Taiwan. The Pingpu — which comprise 10 groups on the island’s lowlands — are lobbying to make that number 17, and they’re doing it by reviving lost languages and culture.

By Jordyn Haime, The China Project (6/5/23)

In contemporary Mandarin, many of the speakers of these languages are called shāndì tóngbāo 山地同胞 ("mountain countrymen / compatriots"), which meshes well with the opening paragraph of Haime's article:

Long before Chinese settlers came to the flat, sprawling lands of the Pingtung plain — the southern Taiwanese county now known for its pineapple and mango production — the area was inhabited by Pingpu (plains indigenous) tribes like the Makatao. Waves of colonization pushed indigenous tribes from their ancestral lands and closer to the mountains, or in some cases, to the other side of the island.



Victorious Secret

The next event in the Salon Sanctuary concert series is "Victorious Secret: Love Gamed and Gender Untamed in the Sparkling Courts of the Baroque":

Before the bars of gender binaries caged the mainstream operatic imagination, a golden age of fluidity guided the vocal soundscape. Virility declared itself with the castrato’s clarion high notes, while femininity spoke in earthy tessiture that plunged to shimmering depths.

Texts of the period revel in ambiguity, unfurling genderless narratives of anonymous lovers and unnamed beloveds. Stories of active pursuit and passive reverie remain alike at loose ends, with neat resolutions many movements away.

Please join us for this special program in honor of Pride Month, as the music of the past reveals a golden underground of nonbinary riches, accompanying us in our witness to a new Renaissance.



Is it a rat's head or a duck's neck?

Main dish served as part of a college cafeteria lunch in Nanchang, China:



Old Sinitic "wheat" and Early Middle Sinitic "camel"

[This is a guest post by Chris Button]

OC uvulars tended to condition rounding (e.g. OC q- becoming EMC kw-). In the case of ʁ-, we sometimes get m- (for a modern-day example, note how 惟, which also had a ʁ- onset in Old Chinese, gives an m- reflex in Fuzhou Min). The classic example is 卯, where Pulleyblank once postulated ʁ- and Li Fang-kuei notes lack of evidence for a cluster, such as ml- or mr-, in its Tai loan. Unfortunately Li’s Tai evidence tends to either be ignored (e.g. 丑 hr- is often erroneously reconstructed with a nasal hn- based on misleading xiesheng evidence) or overly literally interpreted (e.g. 戌 χ- being treated as something like sm-).



Language change (about to be?) in progress

Current big news around here is the collapse of an elevated section of Interstate 95 due to a tanker truck fire.

As Wikipedia explains, I-95 "is the main north–south Interstate Highway on the East Coast of the United States, running from U.S. Route 1 (US 1) in Miami, Florida, north to the Houlton–Woodstock Border Crossing between Maine and the Canadian province of New Brunswick". It's also an important connection between the Great Northeast and the rest of Philadelphia, which enables this linguistic joke:



Ravens on the garden path

I just ran across a particularly impressive garden path sentence in Bernd Heinrich's book RAVENS IN WINTER (p. 268); it took me several tries to get this sentence to parse grammatically:

"Even the wolverine is said to do nothing to drive ravens off that land beside it and steal its food."

(Of course parsing is no problem if the sentence is spoken. But in written form, for me at least, "and steal its food" just didn't seem to fit at first. My mis-parse was reading "off" as the head of a prepositional phrase.)



Thai to English translation gets injected with Tamil

[This is a guest post by Charles Belov]

I pasted the following Thai, which I got from a YouTube channel, into Google Translate. The results were mostly in English, but Google Translate injected some apparent Tamil as well and then just gave up and left some of the Thai untranslated.

"ตลอดระยะเวลาการทำงานในวงการบันเทิงมันทำให้เราได้เรียนรู้ว่าจริงๆ เเล้วความสุขอยู่รอบตัวเราไปหมด เเล้วความสุขมันง่ายมาก จริงๆ บางทีความสุขมันก็ไม่ต้องมีเงินเยอะมากมาย ความสุขในชีวิตของผมมันคือการมีอิสรภาพ

ผมรู้สึกว่ามันเเค่ต้อง balance ชีวิตให้มากขึ้น รักตัวเองให้เป็น เงินก็ต้องหา เเต่ก็ต้องให้เวลากับตัวเอง เเคร์ตัวเอง เเคร์คนอื่นน้อยลง"

ฟิล์ม ธนภัทร คนหิวความสำเร็จ กับอิสรภาพของชีวิต

translated to English as:

"During the time of working in the entertainment industry, it made us learn that really, happiness doesn't need much money, so much happiness. in my life it is கெர்பியைப்ப்பு

I feel that you have to find balance in your life, but you have to make time for yourself, take care of yourself, and take care of others less"

Film ตันที่ร ตั้วิที่ สุ้วิต้ามี่ สุ้าวิต้วั่ม



Quirky speech-to-text, weird diarization

From Daniel Deutsch:

We had a long drive yesterday, so we listened to a “robot” reading the entire indictment. It certainly isn’t flawless, but I was surprised by how good it is, especially when it gets “excited” while enacting dialogue.

Indeed, the text-to-speech quality is quite good — though unfortunately they don't tell us which TTS software they used.

Here's the opening, which is indeed entirely clear and even nearly natural-sounding:



Apostrophes in Hanyu Pinyin

The most famous instance of the use of an apostrophe in Hanyu Pinyin romanization is in the place name "Xi'an", the capital of Shaanxi Province (the doubled "a" in "Shaanxi" is another story).

Xī'ān 西安 — two characters signifying "Western Peace"

If you don't use an apostrophe to separate the syllables, you end up with the monosyllable "xian", which — depending upon the tone and the character it is meant to represent — could mean dozens of different things.
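The syllable-separating role of the apostrophe can be sketched in a few lines of Python. This is only an illustration, not an official implementation: the function names `starts_with_a_o_e` and `join_pinyin` are made up for this example, and the sketch assumes the common convention of placing an apostrophe before every non-initial syllable that begins with a, o, or e, whether or not the spelling would actually be ambiguous without it.

```python
import unicodedata
from typing import List


def starts_with_a_o_e(syllable: str) -> bool:
    """Return True if the syllable begins with a, o, or e, ignoring tone marks."""
    # NFD decomposition splits a letter such as "ā" into "a" plus a combining
    # macron, so the first code point is the bare vowel.
    first = unicodedata.normalize("NFD", syllable)[:1].lower()
    return first in {"a", "o", "e"}


def join_pinyin(syllables: List[str]) -> str:
    """Join Pinyin syllables into one word, inserting apostrophes where needed.

    Assumption (hypothetical helper): an apostrophe goes before any syllable
    that starts with a, o, or e and is not the first syllable of the word.
    """
    parts = []
    for index, syllable in enumerate(syllables):
        if index > 0 and starts_with_a_o_e(syllable):
            parts.append("'")
        parts.append(syllable)
    return "".join(parts)


if __name__ == "__main__":
    print(join_pinyin(["xi", "an"]))     # two syllables: apostrophe inserted
    print(join_pinyin(["xian"]))         # one syllable: left unchanged
    print(join_pinyin(["bei", "jing"]))  # no a/o/e-initial second syllable
```

Under these assumptions, the two-syllable input `["xi", "an"]` comes out as `xi'an`, while the single syllable `["xian"]` is left untouched, which is exactly the distinction the apostrophe preserves.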

Mark Swofford has carried out an interesting investigation on "Mandarin words with more than one apostrophe", Pinyin News (6/11/23).



ChatGPT has a sense of humor (sort of)

Benj Edwards has a mirthful article in Ars Technica (6/9/23):

Researchers discover that ChatGPT prefers repeating 25 jokes over and over

When tested, "Over 90% of 1,008 generated jokes were the same 25 jokes."

[includes an AI generated image of "a laughing robot"]

On Wednesday, two German researchers, Sophie Jentzsch and Kristian Kersting, released a paper that examines the ability of OpenAI's ChatGPT-3.5 to understand and generate humor. In particular, they discovered that ChatGPT's knowledge of jokes is fairly limited: During a test run, 90 percent of 1,008 generations were the same 25 jokes, leading them to conclude that the responses were likely learned and memorized during the AI model's training rather than being newly generated.

The two researchers, associated with the Institute for Software Technology, German Aerospace Center (DLR), and Technical University Darmstadt, explored the nuances of humor found within ChatGPT's 3.5 version (not the newer GPT-4 version) through a series of experiments focusing on joke generation, explanation, and detection. They conducted these experiments by prompting ChatGPT without having access to the model's inner workings or data set.

[Jentzsch and Kersting] listed the top 25 most frequently generated jokes in order of occurrence. Below, we've listed the top 10 with the exact number of occurrences (among the 1,008 generations) in parenthesis:



"Steak the First"

Enlightening article by Peter Backhaus in The Japan Times (6/9/23):

"Za grammar notes: How to properly handle the 'the' in Japanese"

Japanese seems to be able to assimilate any English word, including the ubiquitous definite article "the", which is unlike anything in Japanese itself.

If there’s something like a Murphy’s Law for syntax, the name of this restaurant near my school is a pretty good example of it. Reading “Steak The First,” it always makes me wonder how these three words came to be aligned in just that order. “The first steak,” “first the steak,” “the steak first” — all of these seem safe for consumption. But “steak the first”?

In order to understand what’s going on here, we need to appreciate the very specific way the little word “the” is used in Japanese, where it is normally pronounced ザ (za). Note that the reading may change to ジ (ji) when the following word starts with a vowel, as in the name of the invincible Japanese rock band The Alfee, which officially reads ジ・アルフィー (ji arufī).

But since Japanese is a language that normally gets along perfectly well without articles, it’s a bit challenging to understand what use it can make of ザ in the first place. Even more puzzling is that, more often than not, ザ shows up in places where English syntax wouldn’t want you to put an article at all.



"The beautiful mind paper boxes"

The most recent Trump indictment reproduces this exchange of text messages (p. 11):

Trump Employee 2:

We can definitely make it work if we move his
papers into the lake room?

Trump Employee 1:

There is still a little room in the shower where his
other stuff is. Is it only his papers he cares about?
Theres some other stuff in there that are not papers.
Could that go to storage? Or does he want everything
in there on property

Trump Employee 2:

Yes – anything that's not the beautiful mind paper
boxes can definitely go to storage. Want to take a
look at the space and start moving tomorrow AM?



InternLM

As I am about to deliver a keynote address to an international conference on Chinese language pedagogy, I receive news of this new LLM that knocks my socks off:

InternLM is a multilingual large language model jointly developed by Shanghai AI Lab and SenseTime (with equal contribution), in collaboration with the Chinese University of Hong Kong, Fudan University, and Shanghai Jiaotong University.

Technical report: [PDF]


Abstract

We present InternLM, a multilingual foundational language model with 104B parameters. InternLM is pre-trained on a large corpora with 1.6T tokens with a multi-phase progressive process, and then fine-tuned to align with human preferences. We also developed a training system called Uniscale-LLM for efficient large language model training. The evaluation on a number of benchmarks shows that InternLM achieves state-of-the-art performance in multiple aspects, including knowledge understanding, reading comprehension, mathematics, and coding. With such well-rounded capabilities, InternLM achieves outstanding performances on comprehensive exams, including MMLU, AGIEval, C-Eval and GAOKAO-Bench, without resorting to external tools. On these benchmarks, InternLM not only significantly outperforms open-source models, but also obtains superior performance compared to ChatGPT. Also, InternLM demonstrates excellent capability of understanding Chinese language and Chinese culture, which makes it a suitable foundation model to support Chinese-oriented language applications. This manuscript gives a detailed study of our results, with benchmarks and examples across a diverse set of knowledge domains and tasks.
