Language Log

Parsing puzzle of the week

April 9, 2022 @ 1:52 pm· Filed by Mark Liberman under Computational linguistics, Parsing

"Short Wave: A Physics Legend", NPR Up First 4/3/2022 [emphasis added]:

In the 1950's, a particle physicist made a landmark discovery that changed what we thought we knew about how our universe operates. Chien-Shiung Wu did it while raising a family and an ocean away from her relatives in China. In this episode from NPR's daily science podcast Short Wave, we delve into the life and impact of Chien-Shiung Wu, widely considered the "queen of nuclear physics."

Garden path of the day

December 20, 2021 @ 7:36 am· Filed by Mark Liberman under Orthography, Parsing, Psychology of language

This NYT link text needed a second reading before I could close the initial prepositional phrase after "Bruce Springsteen" and start the conjoined main-clause subject with "Bob Dylan":

Like Bruce Springsteen, Bob Dylan, Paul Simon, Tina Turner and others have all sold rights to their music for eye-popping prices.

Nordic amorous room

May 5, 2021 @ 3:15 pm· Filed by Victor Mair under Grammar, Lost in translation, Orthography, Parsing

@JDMayger May 4:

Any Nordics in China want to explain what’s going on here? @brandhane ? pic.twitter.com/xlaRJtyfxk

— James Mayger (@JDMayger) May 4, 2021

Ted Cruz in big trouble

February 20, 2021 @ 2:13 pm· Filed by Victor Mair under Computational linguistics, Language and computers, Parsing

Ben Hull writes:

In our Computational Linguistics class we were discussing different methods of segmenting Chinese character texts. Today I came across a terrific example of the problems of segmenting left to right, in the first sentence of the attached image. I hope you find it as amusing as I did.
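
To make the left-to-right problem concrete, here is a minimal sketch of greedy forward maximum matching, one common left-to-right segmentation method. It is not necessarily the method discussed in the class or the one behind the example in the image. The function name `forward_max_match`, the toy lexicon, and the test string (a familiar textbook ambiguity, not the sentence from the image) are all assumptions made for illustration.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Greedy left-to-right segmentation.

    At each position, take the longest lexicon entry that matches;
    fall back to a single character when nothing matches.
    """
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon: 研究 "research", 研究生 "graduate student",
# 生命 "life", 起源 "origin".
TOY_LEXICON = {"研究", "研究生", "生命", "起源"}

segments = forward_max_match("研究生命起源", TOY_LEXICON)
print(" / ".join(segments))  # 研究生 / 命 / 起源
```

The intended reading is 研究 / 生命 / 起源 ("research the origin of life"), but the greedy scan commits to the longer match 研究生 first and cannot recover. Errors of this kind are what a purely left-to-right strategy invites.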

Chinglish cornucopia

January 3, 2021 @ 3:17 pm· Filed by Victor Mair under Lost in translation, Parsing, Signs

Photos taken and curated (also here) by Ruan Qi:

1. "Chī duōshǎo ná duōshǎo 吃多少拿多少" – "Take as much AS YOU CAN" –> "Take as much as you eat".

This is from a hotel in Shaoxing, Zhejiang, serving buffet.

Mandarin tongue twister

October 20, 2020 @ 7:32 am· Filed by Victor Mair under Fillers and pause words, Humor, Parsing, Punctuation

Trending on Weibo, a Chinese microblogging website:

[So as not to give anything away, all syllables are separated and not divided into words.]

Nǐ de huò lā lā lā bù lā lā bù lā duō? Huò lā lā lā bù lā lā bù lā duō yào kàn nǐ de huò lā dé duō bù duō. Rú guǒ lā dé bù duō jiù lā nǐ de lā bù lā duō, rú guǒ lā dé duō jiù bù lā nǐ de lā bù lā duō.

"你的货拉拉拉不拉拉不拉多？货拉拉拉不拉拉不拉多要看你的货拉得多不多。如果拉得不多就拉你的拉不拉多，如果拉得多就不拉你的拉不拉多。"

Google Translate:

"Your cargo pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls, pulls more? If you pull too much, it won’t pull you."

Before turning the page, if you know Mandarin, try to parse and translate the above sentences.

Dependency Grammar v. Constituency Grammar

October 10, 2020 @ 8:33 am· Filed by Mark Liberman under Parsing

Edward Stabler, "Three Mathematical Foundations for Syntax", Annual Review of Linguistics 2019:

Three different foundational ideas can be identified in recent syntactic theory: structure from substitution classes, structure from dependencies among heads, and structure as the result of optimizing preferences. As formulated in this review, it is easy to see that these three ideas are completely independent. Each has a different mathematical foundation, each suggests a different natural connection to meaning, and each implies something different about how language acquisition could work. Since they are all well supported by the evidence, these three ideas are found in various mixtures in the prominent syntactic traditions. From this perspective, if syntax springs fundamentally from a single basic human ability, it is an ability that exploits a coincidence of a number of very different things.

The mathematical distinction between constituency (or "phrase-structure") grammars and dependency grammars is an old one. Most people in the trade view the two systems as notational variants, differing in convenience for certain kinds of operations and connections to other modes of analysis, but basically expressing the same things. That's essentially true, as I'll illustrate below in a simple example. But Stabler is also right to observe that the two formalisms focus attention on two different insights about linguistic structure. (I'll leave the third category, "optimizing preferences", for another occasion…)
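
One way to see the "notational variants" point is that a constituency tree whose phrases are each marked with a head child can be converted mechanically into dependency arcs. The sketch below is only an illustration under that assumption: the `Node` class, the head indices, and the toy sentence are hypothetical choices, not the example from the post. It also shows one direction only; the result drops the category labels, and going back requires further assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str  # category such as "NP", or the word itself for a leaf
    children: List["Node"] = field(default_factory=list)
    head: int = 0  # index of the head child (an assumed annotation)


def to_dependencies(node: Node, arcs: List[Tuple[str, str]]) -> str:
    """Return the lexical head of `node`, appending (head, dependent) arcs."""
    if not node.children:
        return node.label
    heads = [to_dependencies(child, arcs) for child in node.children]
    lexical_head = heads[node.head]
    # Every non-head child's lexical head depends on the phrase's head.
    for i, h in enumerate(heads):
        if i != node.head:
            arcs.append((lexical_head, h))
    return lexical_head


# Toy tree: [S [NP the dog] [VP chased [NP a cat]]]
# Words are kept distinct so they can serve as identifiers.
tree = Node("S", [
    Node("NP", [Node("the"), Node("dog")], head=1),
    Node("VP", [
        Node("chased"),
        Node("NP", [Node("a"), Node("cat")], head=1),
    ], head=0),
], head=1)

arcs: List[Tuple[str, str]] = []
root = to_dependencies(tree, arcs)
print(root)  # chased
print(arcs)  # [('dog', 'the'), ('cat', 'a'), ('chased', 'cat'), ('chased', 'dog')]
```

The head annotations carry the dependency information: once each phrase names its head, the arcs follow from the tree, which is the sense in which the two notations can express the same structure.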

This distinction has come up in two different ways for me recently. First, ling001 has gotten to the (just two) lectures on syntax, and because of the recent popularity of dependency grammar, I need to explain the difference to students with diverse backgrounds and interests, some of whom find any discussion of syntactic structure opaque. And second, someone recently asked me about whether anyone had used dependency grammar in analyzing music. (The answer seems to be "mostly not" — though see this paper — but the relevant question really is what the advantages of dependency models in this application might be.)

Are you in the book today?

March 19, 2020 @ 7:55 pm· Filed by Victor Mair under Artificial intelligence, Information technology, Parsing, Punctuation, Translation

[This is a guest post by Nathan Hopson, who sent along the two screen shots with which it begins.]

Another splendid example of why punctuation matters and why machine translation is dumb…

Vietnamese without diacritics

March 16, 2020 @ 1:59 pm· Filed by Victor Mair under Parsing, Tones, Writing systems

From Reddit:

Words without vowels

March 2, 2020 @ 1:58 am· Filed by Victor Mair under Orthography, Parsing, Phonetics and phonology

Our recent discussions about syllabicity ("Readings" below) made me wonder whether it's possible to have syllables, words, and whole sentences without vowels. That led me to this example from Nuxalk on Omniglot:

Sample

clhp'xwlhtlhplhhskwts' / xłp̓χʷłtłpłłskʷc̓

IPA transcription

xɬpʼχʷɬtʰɬpʰɬːskʷʰt͡sʼ

Translation

Then he had had in his possession a bunchberry plant.

This is an example of a word with no vowels, something that is quite common in Nuxalk.

Source: Nater, Hank F. (1984). The Bella Coola Language. Mercury Series; Canadian Ethnology Service (No. 92). Ottawa: National Museums of Canada.

Automatic Pinyin annotation — state of the art

January 20, 2020 @ 12:40 pm· Filed by Victor Mair under Language teaching and learning, Orthography, Parsing, Phonetics and phonology

[This is a guest post by Gábor Ugray]

Back in 2018 your post Pinyin for phonetic annotation planted an idea in my head that I’ve been gradually expanding ever since. I am now at a stage where I routinely create annotated Chinese text for myself; this (pdf) is what one such document looks like.

HouseHold GarBage

December 6, 2019 @ 2:01 pm· Filed by Victor Mair under Orthography, Parsing, Writing systems

Dick Margulis saw this in a waiting room at the University of Hong Kong Shenzhen Hospital:

Literary Sinitic / Classical Chinese dependency parsing

November 27, 2019 @ 9:35 am· Filed by Victor Mair under Information technology, Parsing, Style and register, Translation

We are keenly aware that, while advances in machine translation of Vernacular Sinitic (VS) (Mandarin) are quite impressive and fundamentally serviceable, they cannot be applied directly to the translation of Literary Sinitic / Classical Chinese (LS/CC). That would be like using an Italian translation program for Latin, a Hindi translation program for Sanskrit, or a Modern Greek translation program for Classical Greek. It would probably be even less useful than in these parallel cases, because the whole structure and nature of LS/CC and VS are different from each other.

However, an LS/CC parsing program is now available that takes us a major step toward a functional system for the machine translation of the literary / classical written language (it is only a written / book language, not a spoken language). It was developed by YASUOKA Koichi 安岡孝一 of Kyoto University's Institute for Research in Humanities (Jinbun kagaku kenkyūjo 人文科学研究所) and is available here.
