The right boot of the warner of the baron

Here at the UNESCO LT4All conference, I've noticed that many participants assert or imply that the problems of human language technology have been solved for a few major languages, especially English, so that the problem on the table is how to extend that success to thousands of other languages and varieties.

This is not totally wrong — HLT is a practical reality in many applications, and is being rapidly spread to others. And the problem of digitally underserved speech communities is real and acute.

But it's important to understand that the problems are not all solved, even for English. The remaining issues are also barriers to extending the technology to other communities, because the existing approximate solutions are far too hungry for data and far too short on practical understanding and common sense.


"An indefinite, renewable interprofessional strike"

I'm in Paris for the UNESCO International Conference Language Technologies for All (LT4All), which happens to coincide with another event in France, a national strike that (among other things) made for a very long trip from the airport. In my hotel's elevator there was a sign whose first sentence taught me a couple of new words:

Ce Jeudi 5 Décembre aura lieu une grève interprofessionelle reconductible indéterminée.

There was also an English translation:

This Thursday, December 5th, will be an indefinite, renewable interprofessional strike.


"Collapsed" calligraphy

Responding to this recent post about machine analysis of grammar, "Literary Sinitic / Classical Chinese dependency parsing" (11/27/19), Nicholas Morrow Williams writes:

That reminds me tangentially of something I just heard about, an effort to transcribe Japanese "kuzushiji" (cursive-like) script using AI.  This article, which contains some striking illustrations, is about a huge international competition to devise a better method, won apparently by a Chinese team.


Agu hair bian

Here I am standing in front of a hair salon near the south gate of Kansai University in Osaka, Japan, two days ago:


Tibet water

Ben Zimmer was just passing through Hong Kong Airport, where he got a bottle of Tibet 5100 spring water, complete with Tibetan script:



Apostropocalypse again

"'Laziness has won': apostrophe society admits its defeat", The Guardian 12/1/2019:

John Richards, who worked in journalism for much of his career, started the Apostrophe Protection Society in 2001 after he retired.

Now 96, Richards is calling time on the society, which lists the three simple rules for correct use of the punctuation mark.

Writing on the society’s website, he said: “Fewer organisations and individuals are now caring about the correct use of the apostrophe in the English language.

“We, and our many supporters worldwide, have done our best but the ignorance and laziness present in modern times have won!”


Tero: an English word in Japanese garb

Three days ago, I passed through immigration at Kansai International Airport (near Osaka).  I was struck by a large, prominently displayed word in katakana (syllabary for transcription of foreign words and onomatopoeia):  tero テロ.

Since I was in a restricted area of the airport, naturally I couldn't take a picture of the signs with this word on them, but I knew right away from the circumstances what it signified:  "terrorism" — they were taking strict precautions against it.


Multiscriptal face writing

We've mentioned "kaomoji" before (see "Readings"), but only gave a few examples.

Kaomoji 顔文字 ("face character / writing") is a Japanese term for more or less elaborate "drawings" composed of kana, characters, punctuation marks, and now letters and other symbols drawn from a wide range of writing systems.  They can be quite fanciful, even florid.  Some of them are exquisite, breathtakingly beautiful.

I hadn't seen many of them in the past, but in the last few days, Diana Shuheng Zhang started sending a bunch of them to me, and I found them utterly captivating, so I've decided to share some delightful kaomoji with Language Log readers.
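The multiscriptal makeup of a kaomoji can be checked character by character. Here is a minimal sketch in Python that uses only the standard-library unicodedata module to list the Unicode name and general category of each code point; the function name describe_kaomoji and the sample face (´･ω･`) are illustrative assumptions, not taken from the examples in this post.

```python
from __future__ import annotations

import unicodedata


def describe_kaomoji(kaomoji: str) -> list[tuple[str, str, str]]:
    """Return (character, Unicode name, general category) for each code point."""
    rows = []
    for ch in kaomoji:
        # unicodedata.name() raises ValueError for code points without a name
        # (e.g. control characters), so fall back to U+XXXX notation instead.
        name = unicodedata.name(ch, f"U+{ord(ch):04X}")
        rows.append((ch, name, unicodedata.category(ch)))
    return rows


if __name__ == "__main__":
    # Illustrative sample face (an assumption), not one from the post.
    for ch, name, category in describe_kaomoji("(´･ω･`)"):
        print(f"{ch}  {category}  {name}")
```

Even this small, seven-character face mixes ASCII parentheses and a grave accent, the Latin-1 acute accent, a halfwidth katakana middle dot, and a Greek omega.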


"Knock it off, algorithms!"

An experience that's become all too common — as documented in Zits for 11/25 through 11/28:




Command your kitchen

…or at least the faucets in it, using Delta's VoiceIQ Technology.

Delta VoiceIQ Technology pairs with your connected home device to give you exactly the amount of water you need with features like metered dispensing and custom container commands.

I have to say that being able to tell my kitchen faucet to dispense 137 milliliters of hot water, or whatever, is not high on my list of desires. I'm happy enough with good old-fashioned indoor plumbing, reliable supplies of potable water, and filters to take care of residual issues. But apparently the market-research folks at Delta think that the faucet-buying public is more forward-looking than I am.


Loophole-ridden ‘screenplay’ concocted by anti-China forces

[This is a guest post by Jichang Lulu]

This statement, attributed to the new Taiwan Affairs Office spokeswoman of the PRC, reinforced my impression that Relevant Organs (including exoprop media like the Gobar Times (Huánqiú shǐbào 环球屎报 [Global Shit News], a pun on Huánqiú shíbào 环球时报 [Global Times], for which see "Dung Times" [3/14/18])) often start generating unusually quaint English when they go into full patriot mode.

> This is a totally absurd, loophole-ridden 'screenplay' concocted by anti-China forces…


O.K. is rude

Caity Weaver, "Typing These Two Letters Will Scare Your Young Co-Workers: Everything was O.K. until you wrote 'O.K.'", NYT 11/21/2019, starts with a note from someone in Queens:

I am a Gen X-er who generally speaks proper English and am a “digital native.” (Hey, kids: We built these tools that you claim as your own.) When I respond to a text or email with “O.K.,” I mean just that: O.K. As in: I hear you, I understand, I agree, I will do that. If I reply with “K,” I’m just being more informal.

However, I have been informed by my Millennial and Gen Z co-workers that the new thing I’m supposed to type is “kk.” To write “O.K.” or “K,” they tell me, is to be passive-aggressive or imply that I would like the recipient to drop dead. To which I am tempted to respond, “Believe me, if I want you to drop dead … you’ll know.”

I find “kk” loathsome. Are my co-workers being overly sensitive, or am I not acknowledging the nuance of modern communication? I would really like to settle this debate once and for all. O.K.?


Literary Sinitic / Classical Chinese dependency parsing

We are keenly aware that, while advances in machine translation of Vernacular Sinitic (VS) (Mandarin) are quite impressive and fundamentally serviceable, they cannot be applied directly to the translation of Literary Sinitic / Classical Chinese (LS/CC).  That would be like using an Italian translation program for Latin, a Hindi translation program for Sanskrit, or a Modern Greek translation program for Classical Greek, and probably even less useful than these parallel cases, because LS/CC and VS differ from each other in their whole structure and nature.

However, there is now available an LS/CC parsing program that takes us a major step toward a functional system for the machine translation of the literary / classical written language (it is only a written / book language, not a spoken language).  It was developed by YASUOKA Koichi 安岡 孝一 of Kyoto University's Institute for Research in Humanities (Jinbun kagaku kenkyūjo 人文科学研究所) and is available here.
