Archive for Parsing

Parsing puzzle of the week

"Short Wave: A Physics Legend", NPR Up First 4/3/2022 [emphasis added]:

In the 1950's, a particle physicist made a landmark discovery that changed what we thought we knew about how our universe operates. Chien-Shiung Wu did it while raising a family and an ocean away from her relatives in China. In this episode from NPR's daily science podcast Short Wave, we delve into the life and impact of Chien-Shiung Wu, widely considered the "queen of nuclear physics."

Read the rest of this entry »

Comments (17)

Garden path of the day

This NYT link text needed a second reading for me to break the initial prepositional phrase after "Bruce Springsteen", and start the main-clause subject conjunction with "Bob Dylan":

Like Bruce Springsteen, Bob Dylan, Paul Simon, Tina Turner and others have all sold rights to their music for eye-popping prices.

Read the rest of this entry »

Comments (33)

Nordic amorous room

@JDMayger May 4:

Read the rest of this entry »

Comments (14)

Ted Cruz in big trouble

Ben Hull writes:

In our Computational Linguistics class we were discussing different methods of segmenting Chinese character texts. Today I came across a terrific example of the problems of segmenting left to right, in the first sentence of the attached image. I hope you find it as amusing as I did.
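To make the left-to-right problem concrete, here is a minimal sketch of greedy forward maximum matching, one common left-to-right segmentation method. The sentence is a standard textbook example, not the one in Ben Hull's image, and the function name `forward_max_match` and the toy `LEXICON` are assumptions made up for illustration.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest lexicon word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon, assumed for illustration only.
LEXICON = {"研究", "研究生", "生命", "起源"}

segmented = forward_max_match("研究生命起源", LEXICON)
print(" / ".join(segmented))  # 研究生 / 命 / 起源
```

The intended reading is 研究 / 生命 / 起源 ("research the origin of life"), but the greedy pass commits to 研究生 ("graduate student") first and strands 命 on its own. A real segmenter would need backtracking, right-to-left matching, or a statistical model to recover.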

Read the rest of this entry »

Comments (6)

Chinglish cornucopia

Photos taken and curated (also here) by Ruan Qi:

1. "Chī duōshǎo ná duōshǎo 吃多少拿多少" – "Take as much AS YOU CAN" –> "Take as much as you eat".

This is from a buffet at a hotel in Shaoxing, Zhejiang.

Read the rest of this entry »

Comments (5)

Mandarin tongue twister

Trending on Weibo, a Chinese microblogging website:

[So as not to give anything away, all syllables are separated and not divided into words.]

Nǐ de huò lā lā lā bù lā lā bù lā duō? Huò lā lā lā bù lā lā bù lā duō yào kàn nǐ de huò lā dé duō bù duō. Rú guǒ lā dé bù duō jiù lā nǐ de lā bù lā duō, rú guǒ lā dé duō jiù bù lā nǐ de lā bù lā duō.

"你的货拉拉拉不拉拉不拉多?货拉拉拉不拉拉不拉多要看你的货拉得多不多。如果拉得不多就拉你的拉不拉多,如果拉得多就不拉你的拉不拉多。"

Google Translate:

"Your cargo pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls more? If you pull too much, it won’t pull you."

Before turning the page, if you know Mandarin, try to parse and translate the above sentences.

Read the rest of this entry »

Comments (4)

Dependency Grammar v. Constituency Grammar

Edward Stabler, "Three Mathematical Foundations for Syntax", Annual Review of Linguistics 2019:

Three different foundational ideas can be identified in recent syntactic theory: structure from substitution classes, structure from dependencies among heads, and structure as the result of optimizing preferences. As formulated in this review, it is easy to see that these three ideas are completely independent. Each has a different mathematical foundation, each suggests a different natural connection to meaning, and each implies something different about how language acquisition could work. Since they are all well supported by the evidence, these three ideas are found in various mixtures in the prominent syntactic traditions. From this perspective, if syntax springs fundamentally from a single basic human ability, it is an ability that exploits a coincidence of a number of very different things.

The mathematical distinction between constituency (or "phrase-structure") grammars and dependency grammars is an old one. Most people in the trade view the two systems as notational variants, differing in convenience for certain kinds of operations and connections to other modes of analysis, but basically expressing the same things. That's essentially true, as I'll illustrate below in a simple example. But Stabler is also right to observe that the two formalisms focus attention on two different insights about linguistic structure. (I'll leave the third category, "optimizing preferences", for another occasion…)

This distinction has come up in two different ways for me recently. First, ling001 has gotten to the (just two) lectures on syntax, and because of the recent popularity of dependency grammar, I need to explain the difference to students with diverse backgrounds and interests, some of whom find any discussion of syntactic structure opaque. And second, someone recently asked me whether anyone had used dependency grammar in analyzing music. (The answer seems to be "mostly not", though see this paper, but the relevant question really is what the advantages of dependency models in this application might be.)
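One way to see the "notational variants" point is that a constituency tree with a head child marked in each phrase already determines a dependency structure. The sketch below is an illustration under stated assumptions, not the example from the post itself: the tuple encoding `(label, head_index, children)`, the function names, and the toy sentence are all made up here, and words are used as node identifiers, so it assumes no word occurs twice.

```python
def lexical_head(tree):
    """Return the head word of a tree; a leaf (a plain string) is its own head."""
    if isinstance(tree, str):
        return tree
    _label, head_index, children = tree
    return lexical_head(children[head_index])


def to_dependencies(tree):
    """Collect (head, dependent) word pairs from a head-marked constituency tree."""
    if isinstance(tree, str):
        return []
    _label, head_index, children = tree
    head = lexical_head(tree)
    edges = []
    for position, child in enumerate(children):
        # Every non-head child's head word depends on the phrase's head word.
        if position != head_index:
            edges.append((head, lexical_head(child)))
        edges.extend(to_dependencies(child))
    return edges


# "the dog chased a cat": S -> NP VP (head VP), NP -> Det N (head N), VP -> V NP (head V)
TREE = ("S", 1, [
    ("NP", 1, ["the", "dog"]),
    ("VP", 0, ["chased", ("NP", 1, ["a", "cat"])]),
])

edges = to_dependencies(TREE)
for head, dependent in edges:
    print(f"{head} -> {dependent}")
```

The conversion discards the phrase labels (S, NP, VP) and keeps only head-to-dependent links, which is one way of seeing what each notation foregrounds.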

Read the rest of this entry »

Comments (14)

Are you in the book today?

[This is a guest post by Nathan Hopson, who sent along the two screen shots with which it begins.]

Another splendid example of why punctuation matters and why machine translation is dumb…

Read the rest of this entry »

Comments (18)

Vietnamese without diacritics

From Reddit:

Read the rest of this entry »

Comments (7)

Words without vowels

Our recent discussions about syllabicity ("Readings" below) made me wonder whether it's possible to have syllables, words, and whole sentences without vowels. That led me to this example from Nuxalk on Omniglot:

Sample

clhp'xwlhtlhplhhskwts' / xłp̓χʷłtłpłłskʷc̓

IPA transcription

xɬpʼχʷɬtʰɬpʰɬːskʷʰt͡sʼ

Translation

Then he had had in his possession a bunchberry plant.

This is an example of a word with no vowels, something that is quite common in Nuxalk.

Source: Nater, Hank F. (1984). The Bella Coola Language. Mercury Series; Canadian Ethnology Service (No. 92). Ottawa: National Museums of Canada.

Read the rest of this entry »

Comments (35)

Automatic Pinyin annotation — state of the art

[This is a guest post by Gábor Ugray]

Back in 2018 your post Pinyin for phonetic annotation planted an idea in my head that I’ve been gradually expanding ever since. I am now at a stage where I routinely create annotated Chinese text for myself; this (pdf) is what one such document looks like.

Read the rest of this entry »

Comments (4)

HouseHold GarBage

Dick Margulis saw this in a waiting room at the University of Hong Kong Shenzhen Hospital:

Read the rest of this entry »

Comments (13)

The challenging importance of spacing in Korean

Fascinating article from BLARB (Blog // Los Angeles Review of Books):

"Our Language Battle: Korea’s Surprisingly Addictive Game Show of Vocabulary, Expressions, and Proper Spacing", by Colin Marshall (9/1/19)

This is the second paragraph of the article:

Having found myself living in the genuinely foreign country of Korea, I’ve lately also found myself watching Our Language Battle (우리말 겨루기), a game show that has aired every Monday evening on KBS since 2003. Though it occasionally invites celebrities, and this past July even brought on members of the National Assembly, it usually pits four everyday Koreans (or four teams of two, usually family) against each other in a test of their knowledge of the Korean language. It begins simply enough, with the contestants buzzing in to guess the words or phrases that fill in a crossword-style board, but soon the challenges get dramatically harder: separating folk spellings and regional variations from the officially standard, filling in words missing from old television and newspaper clips, and — most difficult of all, even for contestants who otherwise dominate the game — properly re-spacing a text whose words all run together.

Read the rest of this entry »

Comments (58)