Archive for February, 2023

Syllable rhythm in English and Mandarin

I've always been skeptical of the distinction between "stress-timed" and "syllable-timed" languages, at least as a claim about the phonetic facts of speech timing as opposed to the psychological dimensions of speech production and perception. Syllable durations in all languages vary widely, due to differences in the intrinsic durations of different vowels and consonants, the effects of phrasal position and emphasis, and many other factors. As a result, inter-stress intervals in languages like English or German are not actually "isochronous", and neither are inter-syllable intervals in languages like French or Spanish. And it's not even true that speakers generally make such intervals closer to isochronous than the relevant timing factors would otherwise predict.

But in "Speech rhythms and brain rhythms", 12/2/2013, I showed a plot of the average syllable-scale power spectrum in the 6300 American-English sentences in the TIMIT dataset, which indicated a key periodicity at 2.4 Hz. I noted that "2.4 Hz corresponds to a period of 417 msec, which is too long for syllables in this material. In fact, the TIMIT dataset as a whole has 80363 syllables in 16918.1 seconds, for an average of 210.5 msec per syllable, so that 417 msec is within 1% of the average duration of two syllables. […] One hypothesis might be that this somehow reflects the organization of English speech rhythm into 'feet' or 'stress groups', typically consisting of a stressed syllable followed by one or more unstressed syllables."
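For readers who want to check the arithmetic in that quotation, here is a minimal sketch. It uses only the three figures quoted above (2.4 Hz, 80363 syllables, 16918.1 seconds); the variable and function names are my own, and this is a back-of-the-envelope check, not the code behind the original plot.

```python
# Figures quoted from the 2013 post; nothing here is newly measured.
PEAK_HZ = 2.4            # reported spectral peak
N_SYLLABLES = 80363      # syllables in TIMIT, as quoted
TOTAL_SECONDS = 16918.1  # total duration of TIMIT speech, as quoted


def period_msec(freq_hz: float) -> float:
    """Period in milliseconds of a periodicity at freq_hz."""
    return 1000.0 / freq_hz


def mean_syllable_msec(n_syllables: int, total_seconds: float) -> float:
    """Average syllable duration in milliseconds."""
    return 1000.0 * total_seconds / n_syllables


peak_period = period_msec(PEAK_HZ)                          # about 416.7 msec
mean_syll = mean_syllable_msec(N_SYLLABLES, TOTAL_SECONDS)  # about 210.5 msec
two_syll = 2 * mean_syll

# Relative gap between the peak period and two average syllables.
relative_gap = abs(peak_period - two_syll) / two_syll

print(f"peak period:        {peak_period:.1f} msec")
print(f"mean syllable:      {mean_syll:.1f} msec")
print(f"two mean syllables: {two_syll:.1f} msec")
print(f"relative gap:       {relative_gap:.2%}")
```

The gap comes out at roughly 1%, which is what motivates the "two syllables" reading of the 2.4 Hz peak.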

I added that "Unfortunately there aren't any datasets comparable to TIMIT in other languages; but I'll see what I can come up with as a more-or-less parallel test in languages that are said to be 'syllable timed' rather than 'stress timed'." Almost ten years later, I've never delivered on that promise, though it would have been easy to do so. So for today's Breakfast Experiment™ I'll show the same analysis for the 6300 sentences in the recently-published Global TIMIT Mandarin Chinese dataset.

ChatGPT writes VHM

[This is a guest post by ChatGPT]

Write a Language Log blog post by Victor H. Mair

Title: "The Wonders of the Chinese Language"

As one of the world's oldest and most complex languages, Chinese has captivated linguists and language enthusiasts for centuries. From its unique writing system to its vast vocabulary and intricate grammar, Chinese is a fascinating and challenging language that continues to surprise and delight those who study it.

Perhaps the most distinctive feature of Chinese is its writing system. Chinese characters are visual symbols that convey meaning, and they are used in both written and spoken language. Unlike the phonetic alphabets used in many other languages, Chinese characters do not correspond to specific sounds, but rather to concepts or ideas. This makes the learning process much more complex, as students must memorize thousands of characters and learn their meanings and pronunciations.

Iteration marks and repeaters in ancient Chinese texts

Let us begin this post with a brief introduction to the 16th-century Hokkien (Minnan) drama, Tale of the Lychee Mirror:

The Tale of the Lychee Mirror (traditional Chinese: 荔鏡記; simplified Chinese: 荔镜记; pinyin: Lì jìng jì; Pe̍h-ōe-jī: Nāi-kèng-kì, Lē-kèng-kì) is a play written by an unknown author in the Ming dynasty. Tân Saⁿ and Gō͘-niû (traditional Chinese: 陳三五娘; simplified Chinese: 陈三五娘; pinyin: Chén Sān Wǔniáng; Pe̍h-ōe-jī: Tân-saⁿ-Gō͘-niû) is a popular Taiwanese opera based on the script.


Taiwan Navy recruitment ad language puzzle

Photo of a Taiwan Naval Academy recruitment ad in the Taipei MRT which references the One Piece ワンピース manga series from Japan:

Diversification of Proto-Austronesian

Important archeological news from Tainan:

South Taiwan park renovation project paused after archaeological artifacts unearthed

Artifact pieces belonging to neolithic Niuchouzi Culture discovered, date back to 3000-4500 years ago.

By Stephanie Chiang, Taiwan News (2/26/23)

Finds include "orange-colored pottery made of fine sand-bearing rope patterns, polished hoe-axes, polished adze-chisels, and shell mounds."

The nature of this culture is intriguing in that one of its most distinctive features is the red cord-marked pottery that has been found at the Wangliao archeological site in Tainan’s Yongkang park.

The dating roughly corresponds to the estimated beginning of the diversification of Proto-Austronesian (PAN / PAn).

Transcription vs. transliteration vs. translation in cartography

In this post, I wanted to do something that I thought would be fairly simple, viz., address the question of the "rectification" of Russian place names in areas proximate to populations speaking Sinitic languages.  This sort of rectification is also a hot topic where Russia borders on Ukraine.  There, however, the task is simpler, because Russian and Ukrainian are both written in Cyrillic, whereas, in the Russo-Sinitic case, the former is written in the phonetic Cyrillic alphabet, while the latter is written in morphosyllabic Sinoglyphs, a completely different type of writing system.

Everywhere we encounter references to the transliteration of Chinese characters into alphabetic scripts (or vice versa), whereas I maintain that cannot be done because the Sinitic writing system doesn't have any letters that can be transferred over into the letters of an alphabetic script.  Consequently, when talking about the conversion of Sinoglyphic writing to alphabetic scripts, I always speak of it as transcription.

Technically, transliteration is concerned primarily with accurately representing the graphemes of another script, whilst transcription is concerned primarily with representing its phonemes.


"Crisis" mentality infects China

From the recent meeting between Putin and Wang Yi (Director of the Office of the Central Foreign Affairs Commission of the Chinese Communist Party):

The impenetrability of cursive for students from the PRC

Today I had a revelation about my handwriting on the blackboard.

By far the majority of students in all of my classes come from mainland China.  They are by nature reluctant to speak up, but when it comes to discussing material that I have written on the board, they are almost completely silent.

I know they're smart and should be able to respond to at least some of my questions, but often they just stare intensely at the writing on the board, almost as though they are in pain.

My handwriting is famously poor, as I have confessed and documented in numerous previous Language Log posts, so I do try to slow down a bit and write clearly when at the board, but often my impatience gets the better of me, and when I speed up, all bets are off that others will comprehend.

Today, I intentionally wrote as clearly as possible (for me).  Still no reactions from the class.  I became frustrated and asked them why they did not answer.

Vignettes of quality data impoverishment in the world of PRC AI

Some snippets:

Limited data sets a hurdle as China plays catch-up to ChatGPT

Lack of high-quality Chinese texts on Internet a barrier to training AI models.

Ryan McMorrow, Nian Liu, Eleanor Olcott, and Madhumita Murgia, FT, Ars Technica (2/21/23)

Baidu struggled with its previous attempt at a chatbot, known as Plato, which analysts said could not even answer a simple question such as: “When is Alibaba co-founder Jack Ma’s birthday?”

Analysts point to the lack of high-quality Chinese-language text on the Internet and in other data sets as a barrier for training AI software.

GPT, the program underlying ChatGPT, sucked in hundreds of thousands of English academic papers, news articles, books, and social media posts to learn the patterns that form language. Meanwhile, Baidu’s Ernie has been trained primarily on Chinese-language data as well as English-language data from Wikipedia and Reddit.

Spaceless pinyin

From the importer's label, carefully placed to obscure the safety instructions (the "do"s and "do not"s) of an electronic gas igniter:

Multi-modal writing among Hong Kong teens

From Jenny Chu:

Knowing your interest in multi-modal writing systems, I thought you might be amused by the attached screencap. It is from a WhatsApp group chat of S6 (final year) students in Hong Kong; one of them is asking the others what they would like to do on the afternoon of their last day of classes:

A crook that protects your belongings

Do not major in the changing room
