Archive for November, 2019

A diarization corpus from Amazon

About a month ago, Zaid Ahmed and others in Amazon's speech research group released DiPCo ("Dinner Party Corpus"), "a new data set that will help speech scientists address the difficult problem of separating speech signals in reverberant rooms with multiple speakers".

The past decade has seen striking progress in Human Language Technology, brought about by new methods, more training data, and (especially) cheaper/faster computers. But this rapid progress highlights the fact that "All problems are not solved", as I wrote last year, and in particular, the central problem of "diarization", or determining who spoke when, has turned out to be a surprisingly difficult one. And diarization is not just hard for conversations at dinner parties.

Read the rest of this entry »

Comments (2)

Don't eat and don't drink

Wang Tong sent in this photograph of a sign, taken by a friend of hers during a visit to Japan.  The Chinese translation is quite amusing.

Read the rest of this entry »

Comments (19)

Hong Kong protests: "recover" or "liberate"

From Alison Winters:

I am a regular reader of Language Log and really enjoy your digging on unusual Chinese turns of phrase.

One word I have recently been puzzling over is the usage of guāngfù 光复 in the Hong Kong call to arms 光复香港时代革命*. The dictionary description indicates it has to do with reclaiming land from an occupier, and specifically references the end of Japanese occupation in Taiwan, but in English the slogan has been translated as “liberate”. When I look up “liberate” in the other direction, the dictionary suggests jiěfàng 解放, but note that it’s also associated with the CPC victory over the KMT.

I wonder if the usage of 光复 for liberate is a quirk of Cantonese (I live in mainland and only speak Standard Chinese), or if it’s a political choice to use that word based on previous “liberations”? I am curious about the etymology and would be interested to see a write-up on the blog, if you know a bit more background.

[*VHM:  A "standard" English translation of this slogan is "Liberate Hong Kong, the revolution of our times", the "loose" Cantonese Romanization for which is "Gwong Fuk Heung Gong! Si Doi Gark Ming!"  Source]

Read the rest of this entry »

Comments (11)

University City train station notes

Announcements

1.

"Please be visible to the engineer OR* train will not stop."

*spoken with very heavy emphasis

Is there a choice?

2.

"Your attention please:  trains en route to destination may be late.  Passengers are advised* that times may increase or decrease** at any time."

*the preceding three words are uttered with rising crescendo, with a slight fall at the end

**strong emphasis on each of the preceding three words

This entire announcement is spoken in a seemingly snide, sneering, pompous tone.  No sympathy or apology whatsoever.  (In Japan, the railway administration is thoroughly ashamed when a train is half a minute late.  In Austria, where many of my relatives worked for the railways a century or more ago, one could set one's watch by the arrival and departure of the trains.)  I loathe this announcement more than any other, especially when one is made to wait for an hour or more, after which a train may simply be cancelled without explanation.

Read the rest of this entry »

Comments (26)

Kabbalist NLP

Oscar Schwartz, "Natural Language Processing Dates Back to Kabbalist Mystics", IEEE Spectrum 10/28/2019 ("Long before NLP became a hot field in AI, people devised rules and machines to manipulate language"):

The story begins in medieval Spain. In the late 1200s, a Jewish mystic by the name of Abraham Abulafia sat down at a table in his small house in Barcelona, picked up a quill, dipped it in ink, and began combining the letters of the Hebrew alphabet in strange and seemingly random ways. Aleph with Bet, Bet with Gimmel, Gimmel with Aleph and Bet, and so on.

Abulafia called this practice “the science of the combination of letters.” He wasn’t actually combining letters at random; instead he was carefully following a secret set of rules that he had devised while studying an ancient Kabbalistic text called the Sefer Yetsirah. This book describes how God created “all that is formed and all that is spoken” by combining Hebrew letters according to sacred formulas. In one section, God exhausts all possible two-letter combinations of the 22 Hebrew letters.

By studying the Sefer Yetsirah, Abulafia gained the insight that linguistic symbols can be manipulated with formal rules in order to create new, interesting, insightful sentences. To this end, he spent months generating thousands of combinations of the 22 letters of the Hebrew alphabet and eventually emerged with a series of books that he claimed were endowed with prophetic wisdom.
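The "all possible two-letter combinations of the 22 Hebrew letters" mentioned in the quoted passage can be enumerated mechanically. The following is only a minimal sketch, not a reconstruction of Abulafia's actual rules: it assumes that a "combination" means an unordered pair of distinct letters, and the transliterated letter names and the function name `two_letter_combinations` are my own illustrative choices.

```python
from itertools import combinations

# Transliterated names of the 22 Hebrew letters (final forms excluded).
# The spellings are an illustrative choice, not taken from the article.
LETTERS = [
    "Aleph", "Bet", "Gimel", "Dalet", "He", "Vav", "Zayin", "Het",
    "Tet", "Yod", "Kaf", "Lamed", "Mem", "Nun", "Samekh", "Ayin",
    "Pe", "Tsadi", "Qof", "Resh", "Shin", "Tav",
]


def two_letter_combinations(letters):
    """Return every unordered pair of distinct letters, in list order."""
    return list(combinations(letters, 2))


pairs = two_letter_combinations(LETTERS)
print(len(pairs))   # 22 * 21 / 2 = 231
print(pairs[:3])    # the first few pairs, starting with Aleph + Bet
```

Under this assumption there are 231 pairs. If order matters (Aleph-Bet counted separately from Bet-Aleph), `itertools.permutations(LETTERS, 2)` would give 462 instead.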

Comments (6)

Moist! Chuckle! Slacks! Dollop!

Below is a guest post from Kavita Pillay, co-host of the new Subtitle podcast.


Do you hate a seemingly normal word for reasons that you can't quite pinpoint?

Or, are there words that you love to say out loud?

If so, the Subtitle podcast (more on us below) wants to hear from you!

On Nov. 19th, we're airing an episode on words we love…and love to hate. From reading the comments section of Language Log, we've noticed that Language Log fans and readers have very well articulated opinions when it comes to word aversion, word rage, and word affinity. Now you can share those opinions with the world.

Here's what to do:

  1. Open the voice memos app on your phone
  2. Record a 30-second (or so) message about a word (or words) you love or loathe
  3. Feel free to include your name if you feel comfortable doing so, and / or a brief explanation about how the word(s) in question make you feel.
  4. Once you complete the recording, email it to subtitlepod@gmail.com, and we may use it in our upcoming episode on word affinity / word aversion

DEADLINE: Monday, Nov. 11th

Feel free to share this request with others. We'd especially love to hear from people for whom English is not their native language. And if you are completely perplexed as to why anyone would love or hate a normal word, then that's all the more reason to tune into our Nov. 19th episode.

A little about us: Subtitle is a podcast about languages and the people who speak them, co-hosted by Patrick Cox and Kavita Pillay. It's the successor podcast to The World in Words, which previously aired on PRI's The World. Funding for Subtitle comes from the National Endowment for the Humanities and excerpts from every episode will begin airing this fall on NPR's Here & Now.

Many thanks, and we look forward to your voice memos!

Read the rest of this entry »

Comments (4)

Acronyms in China

Recently, one of my students found an interesting post from the Communist Youth League about the use of Hanyu Pinyin acronyms on the Internet. When people type on Weibo, WeChat, and other social media, they frequently use Pinyin acronyms. For example:

Read the rest of this entry »

Comments (15)

Mastering Caution amidst Hermeneutic Acrobatics

[This is a guest post by Nicholas Morrow Williams]

Victor recently pointed out to me the appearance of Martin Kern’s important article in the latest issue of Early China, “‘Xi Shuai’ 蟋蟀 (‘Cricket’) and Its Consequences: Issues in Early Chinese Poetry and Textual Studies” (Early China 42 [2019]: 39–74).  Kern’s article offers both a very detailed examination of the poem “Cricket” contained in a Tsinghua manuscript, which differs substantially from the comparable poem in the Shijing 詩經, and reflections on the broader significance of the manuscript for “textual studies.”

The article is well worth reading both for the recently discovered poem and for the broader reflections, but I would like to discuss one issue to which it does not devote much attention, which is the interpretation of the received text of “Cricket” in the Shijing itself. After comparing the excavated and received texts, Kern concludes:

Read the rest of this entry »

Comments (6)

Four infinitives in search of an object

Brendan O'Leary's A Treatise on Northern Ireland, Volume I starts with a quotation from Spinoza's Tractatus Politicus:

Sedulo curavi, humanas actiones non ridere, non lugere, neque detestari, sed intelligere.

I have labored carefully, not to ridicule, or detest, but to understand.

That's Brendan's translation, which captures the relevant essence, although it leaves out the second of Spinoza's four infinitives (ridere, lugere, detestari, intelligere) and also their object (humanas actiones). Reading this yesterday afternoon, on a train returning from a committee meeting in DC, I mentally supplied the missing English words, and realized that the result is problematically awkward, in a way that punctuation can't fix:

I have labored carefully not to ridicule, not to lament, and not to detest, but to understand human actions.

This led me to ponder (not for the first time) the stylistic advantages — or at least differences — of Latin's inflectional morphology and free word order.

Read the rest of this entry »

Comments (16)