Hanmoji

Read the rest of this entry »

Comments (2)


Fluent "disfluencies" again

One conventional view of "disfluencies" in speech is that they're the result of confusions and errors, such as difficulties in deciding what to say or how to say it, or changing ideas about what to say or how to say it, or slips of the tongue that need to be corrected. Another idea is that such interpolations can serve to "hold the floor" across a phrase boundary, or to warn listeners that a pause is coming.

These views are supported by the fact that fluent reading lacks filled pauses, restarts, repeated words, and non-speech vocalizations. And as a result, (human) transcripts of interviews, conversations, narratives, and speeches generally edit out all such interpolations, yielding a text that's more like writing, and is easier to read than an accurate transcript would be. Automated speech-to-text systems also generally omit (or falsely transcribe) such things.

The result is a good choice if the goal is readability, but not if the goal is to analyze the dynamics of speech production, speech perception, and conversational interaction. And in fact, even a brief examination of such interpolations in spontaneous speech is enough to tell us that the conventional views are incomplete at best.

I've noticed recently that automated transcripts from rev.ai do a good job of transcribing ums and uhs in English, though repeated words are still omitted. And in the other direction, I've noticed that the transcripts on the site of the U.S. Department of Defense include (some of the) repeated words, but not the filled pauses.  It's interesting to compare those transcripts to the audio (where available) — I offer a sample below.
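Since the point is that disfluencies are data rather than noise, the kind of comparison described above can be sketched in a few lines. This is a minimal illustration, not any tool used by rev.ai or the DoD transcribers: it counts filled pauses and immediate word repetitions in a raw transcript, the two interpolation types the post contrasts (the filler list and the sample sentence are invented for the example).

```python
import re

# A small, assumed inventory of English filled pauses.
FILLED_PAUSES = {"um", "uh", "uhm", "er", "erm"}

def disfluency_counts(transcript):
    """Count filled pauses and immediate word repetitions in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    filled = sum(w in FILLED_PAUSES for w in words)
    # An immediate repetition is the same word twice in a row ("we we").
    repeats = sum(a == b for a, b in zip(words, words[1:]))
    return {"filled_pauses": filled, "repeated_words": repeats}

sample = "So, um, we we went to the uh the the store, um, yesterday."
print(disfluency_counts(sample))  # {'filled_pauses': 3, 'repeated_words': 2}
```

Run over an accurate transcript versus a cleaned-up one, counts like these make the editorial differences between transcription pipelines explicit.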

Read the rest of this entry »

Comments (3)


Don't Occupy Your Seat

With apologies for the glare from the plastic covering, this sign comes from the canteen at Lingnan University in Hong Kong:

Read the rest of this entry »

Comments (11)


Open fire

Tim Frost found this sign last (southern hemisphere) summer at a lakeside in Argentina, near San Martin de los Andes.

Read the rest of this entry »

Comments (9)


The weirdness of traditional note names

Comments (13)


Translated abstracts and titles

There are many different subfields in Chinese Studies:  religion (Buddhism, Daoism, Islam…), art history, archeology, anthropology, language, literature, linguistics, esthetics, philosophy, economics, ethnology, and so forth.  Each of these subfields requires specialized knowledge and command of the requisite terminology.  One cannot expect a generalist to be adequately equipped to deal with all of them.

Read the rest of this entry »

Comments (7)


Omnibus Chinglish, part 4

Yet more fun (see parts 1, 2, and 3).

Don't JuYiGe


(source)

Read the rest of this entry »

Comments (3)


Omnibus Chinglish, part 3

Comments (1)


Super color Doppler, part 2

[This is a guest post by Greg Pringle, in response to questions I posed regarding the photograph at the top of this post from yesterday, mainly: 


What does the Mongolian script say?  Does it match the Chinese*?  Are there any mistakes in it?

*The Chinese is short for "in color with Doppler ultrasound".]

The Mongolian says önggöt – het dolgion – zurag (ᠥᠩᠭᠡᠲᠦ ᠬᠡᠲᠦ ᠳᠣᠯᠭᠢᠶᠠᠨ ᠵᠢᠷᠤᠭ). It literally means "coloured ultra-wave picture" or, as Google Translate has it, "colour ultrasound imaging". My Inner Mongolian dictionaries confirm that önggöt het dolgion zurag means literally "彩色超声波图" ("color ultrasound image") in Chinese, and it is found on the Internet with that meaning.

You quote Diana Shuheng Zhang as saying the Chinese means "Color Doppler Ultrasound". I did find önggöt doppler zuraglal (Өнгөт Допплер зураглал) "coloured Doppler sketch" in Mongolian-language pages on the Russian Internet, and Jichang Lulu found a couple of sources from Mongolia.

Rather than continue confirming what you already know, I think it fair to bring up the issue of terminology.

Read the rest of this entry »

Comments (1)


Omnibus Chinglish, part 2

Comments (7)


Omnibus Chinglish, part 1

Fantastic collection of Chinglish examples from WeChat.

There are 18 examples altogether.  I've already done 2 or 3 of them (see under "Selected readings" below), and a couple of them are not so great.  That leaves around a dozen that are previously unknown and quite hilarious.  I'll do them in two or three batches.

1.

Read the rest of this entry »

Comments (5)


When more data makes things worse…

The mantra of machine learning, as Fred Jelinek used to say, is "The best data is more data" — because in many areas, there's a Long Tail of relevant cases that are hard to classify or predict without either a valid theory or enough examples.

But a recent meta-analysis of machine-learning work in digital medicine shows, convincingly, that more data can lead to poorer reported performance.  The paper is Visar Berisha et al., "Digital medicine and the curse of dimensionality", npj Digital Medicine, 2021, and one of the pieces of evidence they present is shown in the figure reproduced below:

This analysis considers two types of models: (1) speech-based models for classifying between a control group and patients with a diagnosis of Alzheimer’s disease (Con vs. AD; blue plot) and (2) speech-based models for classifying between a control group and patients with other forms of cognitive impairment (Con vs. CI; red plot).
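The mechanism behind that paradox is easy to demonstrate: with few samples and many features, an improperly validated analysis of pure noise can look impressively "predictive", and that inflated apparent performance shrinks as the sample grows. The following is a minimal sketch of that effect (not the paper's analysis): feature selection is done on the full dataset and then "evaluated" on the same data, a classic leakage error.

```python
import random

random.seed(0)

def apparent_accuracy(n_samples, n_features):
    # Pure-noise dataset: the labels carry no information about the features.
    labels = [random.choice([0, 1]) for _ in range(n_samples)]
    data = [[random.gauss(0, 1) for _ in range(n_features)]
            for _ in range(n_samples)]

    # Improper workflow: pick the single threshold classifier that best
    # separates the classes using ALL the data, then score on that same data.
    best_acc = 0.0
    for j in range(n_features):
        col = [row[j] for row in data]
        thresh = sum(col) / len(col)
        for flip in (False, True):  # try both polarities of the rule
            preds = [(v > thresh) != flip for v in col]
            acc = sum(int(p) == y for p, y in zip(preds, labels)) / n_samples
            best_acc = max(best_acc, acc)
    return best_acc

# With few samples and many candidate features, noise looks "predictive";
# with more samples, the inflated accuracy falls back toward chance (0.5).
few = apparent_accuracy(n_samples=20, n_features=500)
many = apparent_accuracy(n_samples=500, n_features=500)
print(few, many)
```

Since the data are random, any accuracy above 50% is illusory; the small-sample run reports much higher "accuracy" only because the maximum over 500 noisy features overfits 20 points far more easily than 500. This is one way the curse of dimensionality can make smaller studies look better than larger ones.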

Read the rest of this entry »

Comments (8)


I dunno₁ or I dunno₂ or I dunno₃?

And don't forget I dunno₄ . . .

Today's For Better or For Worse starts this way:

Read the rest of this entry »

Comments (13)