[This is a guest post by Nathan Hopson, who sent along the two screen shots with which it begins.]

Another splendid example of why punctuation matters and why machine translation is dumb…

With an h/t to my Earlham kōhai (後輩 ["junior schoolmate"]) Becki Kanou, who's also a big LL fan, I'm attaching two screencaps that will have prophets of AI-fueled utopia and Oxford-comma haters alike weeping.

The horizontally aligned one is my own, since I decided to test this myself before succumbing to embarrassing clickbait. Sadly, it's real.

Google Translate, as good as it is for the basics in many world language pairs, still sucks at CJK. I maintain that in addition to all the other obvious issues, this is in no small part because without spaces, word parsing is hard. Really hard. And worse, really intuitive and high-context.

The Japanese is:

今日本にいますか。

What's frankly baffling is that while the Romanization is correctly parsed and transliterated, the English fails to live up to that promise:

"Are you in the book today?"

My guess is that the lack of a comma after 今 (ima, "now") is the culprit. With the comma, there's no ambiguity. It's 100% "Are you in Japan?" Without it, artificial stupidity probably saw 今日 (kyō, "today") as a pair before it moved linearly to 日本 (Nihon, "Japan"), prioritized the first pair, and was left holding a stray 本 (hon, "book"). Linear processing for nonlinear processes is not pretty.
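To make that guess concrete, here is a toy sketch in Python of left-to-right longest-match segmentation. To be clear, this is an illustration of the failure mode, not a claim about how Google Translate actually works. The lexicon and the function name `greedy_segment` are hypothetical, made up just for this one sentence.

```python
# Toy lexicon (an assumption for illustration): just enough entries to cover
# the example sentence, including both competing pairs 今日 and 日本.
LEXICON = {"今", "今日", "日本", "本", "に", "います", "か", "、", "。"}


def greedy_segment(text, lexicon=LEXICON):
    """Segment text left to right, always taking the longest lexicon match.

    This is a deliberately naive baseline: once a word is chosen,
    the choice is never revisited.
    """
    max_len = max(len(word) for word in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon:
                break
        else:
            # Nothing in the lexicon matched: emit the single character as-is.
            piece = text[i]
        tokens.append(piece)
        i += len(piece)
    return tokens


print(greedy_segment("今日本にいますか。"))
# ['今日', '本', 'に', 'います', 'か', '。']   -> "today / book / in / are / ?"

print(greedy_segment("今、日本にいますか。"))
# ['今', '、', '日本', 'に', 'います', 'か', '。']   -> "now, / Japan / in / are / ?"
```

Without the comma, the greedy pass grabs 今日 first and strands 本. With the comma, 今 is fenced off and 日本 survives intact. Real segmenters weigh whole-sentence alternatives rather than committing word by word, which is presumably why the Romanization came out right even though the English did not.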

It's all very, Eats, Shoots and Leaves.
