Archive for Parsing

Are you in the book today?

[This is a guest post by Nathan Hopson, who sent along the two screen shots with which it begins.]

Another splendid example of why punctuation matters and why machine translation is dumb…

Vietnamese without diacritics

From Reddit:

Words without vowels

Our recent discussions about syllabicity ("Readings" below) made me wonder whether it's possible to have syllables, words, and whole sentences without vowels.  That led me to this example from Nuxalk on Omniglot:

Sample

clhp'xwlhtlhplhhskwts' / xłp̓χʷłtłpłłskʷc̓

IPA transcription

xɬpʼχʷɬtʰɬpʰɬːskʷʰt͡sʼ

Translation

Then he had had in his possession a bunchberry plant.

This is an example of a word with no vowels, something that is quite common in Nuxalk.

Source: Nater, Hank F. (1984). The Bella Coola Language. Mercury Series; Canadian Ethnology Service (No. 92). Ottawa: National Museums of Canada.

Automatic Pinyin annotation — state of the art

[This is a guest post by Gábor Ugray]

Back in 2018 your post "Pinyin for phonetic annotation" planted an idea in my head that I’ve been gradually expanding ever since. I am now at a stage where I routinely create annotated Chinese text for myself; this (pdf) is what one such document looks like.

HouseHold GarBage

Dick Margulis saw this in a waiting room at the University of Hong Kong Shenzhen Hospital:

The challenging importance of spacing in Korean

Fascinating article from BLARB (Blog // Los Angeles Review of Books):

"Our Language Battle: Korea’s Surprisingly Addictive Game Show of Vocabulary, Expressions, and Proper Spacing", by Colin Marshall (9/1/19)

This is the second paragraph of the article:

Having found myself living in the genuinely foreign country of Korea, I’ve lately also found myself watching Our Language Battle (우리말 겨루기), a game show that has aired every Monday evening on KBS since 2003. Though it occasionally invites celebrities, and this past July even brought on members of the National Assembly, it usually pits four everyday Koreans (or four teams of two, usually family) against each other in a test of their knowledge of the Korean language. It begins simply enough, with the contestants buzzing in to guess the words or phrases that fill in a crossword-style board, but soon the challenges get dramatically harder: separating folk spellings and regional variations from the officially standard, filling in words missing from old television and newspaper clips, and — most difficult of all, even for contestants who otherwise dominate the game — properly re-spacing a text whose words all run together.

The importance of proper parsing and punctuation

Currently circulating on Facebook and on Chinese social media are seemingly impenetrable sentences with the same character repeated numerous times.  When you first look at them, your eyes glaze over and you can't make any sense of them.  But if you slow down and think about such sentences, you usually can figure them out without too much effort.  In fact, I could read some of the following right off upon first encounter.  Others required more effort before I was able to crack them.

Although it looks formidable, of the six sample sentences treated in this post, this one was the easiest for me.  I could understand it at one go.  [N.B.:  In my treatment of these sentences, I first give the Pinyin with spaces between each syllable, then repeat the Pinyin with requisite parsing and punctuation.]

1.

míng míng míng míng míng bái bái bái xǐ huān tā dàn tā jiù shì bù shuō

明明明明明白白白喜欢他但他就是不说

Míngmíng míngmíng míngbái Báibái xǐhuān tā, dàn tā jiùshì bù shuō.

"Mingming clearly knew that Baibai liked her, but he just wouldn't say it."
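For readers who like to see the mechanics, here is a small Python sketch of dictionary-based segmentation applied to this sentence. It assumes a hand-picked toy lexicon (hypothetical, covering only the words this sentence needs; it does not reproduce any real dictionary or segmenter) and enumerates every way of carving the character string into lexicon words. Under that assumption, the run of five 明 and three 白 admits only one split, 明明 / 明明 / 明白 / 白白, which is one way of seeing why the sentence yields to patient reading.

```python
from functools import lru_cache

# Hypothetical toy lexicon: only the words needed for this one sentence.
# 明明 stands for both the name Mingming and the adverb "clearly";
# 白白 is the name Baibai.
LEXICON = {"明明", "明白", "白白", "喜欢", "他", "但", "就是", "不", "说"}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segmentations(text: str) -> list[list[str]]:
    """Return every way to split `text` into a sequence of LEXICON words."""

    @lru_cache(maxsize=None)
    def splits_from(start: int) -> tuple[tuple[str, ...], ...]:
        # Base case: the whole string has been consumed.
        if start == len(text):
            return ((),)
        found = []
        # Try every lexicon word that begins at position `start`.
        for end in range(start + 1, min(start + MAX_WORD_LEN, len(text)) + 1):
            word = text[start:end]
            if word in LEXICON:
                for rest in splits_from(end):
                    found.append((word,) + rest)
        return tuple(found)

    return [list(split) for split in splits_from(0)]


sentence = "明明明明明白白白喜欢他但他就是不说"
parses = segmentations(sentence)
print(len(parses))
for parse in parses:
    print(" / ".join(parse))
```

Segmentation alone does not say which 明明 is the name and which is the adverb "clearly"; that choice comes from syntax and sense, as in the parse given above. A richer lexicon (one that also lists single-character 明 and 白, say) would also produce many more candidate splits to weed out.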

"and himself jail"

In "More Cohen Businesses Coming to Light," on Talking Points Memo, Josh Marshall writes:

The biggest taxi operator in New York, Evgeny “Gene” Friedman, now manages Cohen’s 30+ NYC medallions or at least did the last time we spoke to him. Friedman has been struggling for the last year to keep his taxi businesses out of bankruptcy and himself jail.

The final three words of the quoted passage ("and himself jail") present a weird, and dare I say unusual, case of double ellipsis. The semantic content communicated by those three words (in the context of the sentence) is richer than you'd think could be expressed by only three words, especially given that one of them is merely the conjunction "and". That content can be represented as follows, with the bracketed text standing for the content that the reader must infer:

Friedman has been struggling for the last year to keep his taxi businesses out of bankruptcy and [to keep] himself [out of] jail.

There's nothing unusual about the first omission; I don't see anything wrong with the clause "to keep his taxi businesses out of bankruptcy and himself out of jail". But the omission of "out of" strikes me as very strange, and what's even stranger is that, to my ear, the clause is worse if "to keep" is put back:

* Friedman has been struggling for the last year to keep his taxi businesses out of bankruptcy and to keep himself jail.

Pinyin in 1961 propaganda poster art

From Geoff Dawson:

On display in a current exhibition at the National Library of Australia.

A polysyllabic character that can be read in two different ways

Photo taken in Hangzhou by Nikita Kuzmin's Chinese teacher:

"Intelligent transportation communication systems"?

This morning's email brought an invitation to contribute to a "Special Issue on Intelligent Transportation Communication Systems" (for this journal). It took me a little while to figure out that conversing with cars (which I'm definitely in favor of) was not what they had in mind. And this process reminded me of how difficult it can be for humans, never mind machines, to figure out how to parse complex nominals in English. (See "The Stress and Structure of Modified Noun Phrases in English" for some antique thoughts on the subject…)
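Part of what makes such nominals hard is simply how many structures they allow. Here is a minimal Python sketch that lists every binary bracketing of the four-word title, assuming that each candidate structure corresponds to one bracketing (a simplification that ignores word-sense ambiguity and any fixed multiword units).

```python
from collections.abc import Iterator


def bracketings(words: list[str]) -> Iterator[str]:
    """Yield every binary bracketing of `words` as a nested string."""
    if len(words) == 1:
        yield words[0]
        return
    # Choose where the top-level split falls, then bracket each side recursively.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                yield f"[{left} {right}]"


nominal = ["Intelligent", "Transportation", "Communication", "Systems"]
for structure in bracketings(nominal):
    print(structure)
```

The count follows the Catalan numbers (1, 2, 5, 14, 42 for two through six words), so each added noun multiplies the options. Enumerating them is the easy part; picking the intended one, e.g. [[Intelligent Transportation] [Communication Systems]] over [Intelligent [[Transportation Communication] Systems]], is what takes world knowledge.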

Resisting reunification

Court fight over Oxford commas and asyndetic lists

Language Log often weighs in when courts try to nail down the meaning of a statute. Laws are written in natural language—though one might long, by formalization, to end the thousand natural ambiguities that text is heir to—and thus judges are forced to play linguist.

Happily, this week's "case in the news" is one where the lawyers managed to identify several relevant considerations and bring them to the judges for weighing.

Most news outlets reported the case as being about the Oxford comma (or serial comma), the optional comma placed before the conjunction that introduces the last item in a list. Here, for example, is the New York Times:
