{"id":30700,"date":"2017-01-29T08:38:04","date_gmt":"2017-01-29T13:38:04","guid":{"rendered":"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=30700"},"modified":"2017-01-29T08:57:39","modified_gmt":"2017-01-29T13:57:39","slug":"why-electronic-machine-translation-services-sometimes-seem-to-fail","status":"publish","type":"post","link":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=30700","title":{"rendered":"Why electronic machine translation services sometimes seem to fail"},"content":{"rendered":"<p>The inability of Google Translate, Microsoft Translator, Baidu Fanyi, and other translation services to correctly render <a href=\"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=30686\">j\u012b ni\u00e1n d\u00e0j\u00ed \u9e21\u5e74\u5927\u5409<\/a> (\"may the \/ your year of the chicken be greatly auspicious!\") in various languages points up a vital distinction that I have long wanted to make, and now is as good a time as any.\u00a0 Namely, just as we could not expect these translation services to handle Cantonese, Shanghainese, Taiwanese, etc. 
(unless specifically and separately programmed to do so), we should not expect them to deal with Literary Sinitic \/ Classical Chinese (LS \/ CC).<\/p>\n<p><!--more--><\/p>\n<p>These are all different languages, and electronic translation software, like human brains, cannot be programmed and trained in such a way that they can simultaneously translate material coming from different languages.<\/p>\n<p>The only exception is when bits and pieces of these other languages have been embedded in Modern Standard Mandarin (MSM) and regularized there in such a way that they have for all intents and purposes been borrowed as part of MSM vocabulary, e.g., m\u01ceid\u0101n \u4e70\u5355 \/ m\u00e1id\u0101n \u57cb\u5355 \/ m\u00e0id\u0101n \u5356\u5355 (\"pay the bill\") from Cantonese maai4daan1 \u57cb\u5355 (\"call for the bill \/ check\").\u00a0 Note that, even though MSM m\u01ceid\u0101n \u4e70\u5355 \/ m\u00e1id\u0101n \u57cb\u5355 \/ m\u00e0id\u0101n \u5356\u5355 (\"pay the bill\") is written in three different ways with three separate pronunciations, translation software can deal with all of them because they occur in MSM with sufficiently high frequency to be recognized as an integral, naturalized part of MSM vocabulary.<\/p>\n<p>The same holds for vocabulary coming from LS \/ CC, e.g., q\u01d0y\u01d2uc\u01d0l\u01d0 \u5c82\u6709\u6b64\u7406 (\"ridiculous; outrageous; absurd\") and s\u00e0iw\u0113ngsh\u012bm\u01ce \u585e\u7fc1\u5931\u9a6c (\"blessing in disguise\").\u00a0 The translations offered for such expressions are not always felicitous and may vary widely, depending upon whether they are trying to convey the overall gist or the literal meaning, but at least they recognize these expressions as constituting lexical, syntactic units within MSM.\u00a0 For this reason, I approve of Google Translate's pinyinization of such expressions as single units.<\/p>\n<p>The same is true of countless other MSM lexical items from a wide variety of sources beyond Cantonese and LS \/ 
CC.<\/p>\n<p>Similar criteria obtain for borrowings from diverse derivations in English, German, Japanese, and other languages.<\/p>\n<p>Electronic translation software programs for Sinitic, so far, are for MSM. \u00a0 They recognize and are generally capable of dealing with MSM vocabulary, grammar, and syntax quite effectively, and indeed often impressively so.\u00a0 I do not consider that they have failed when somebody throws an auspicious chicken &#8212; whoops! a monkey wrench \/ spanner &#8212; into the ointment \/ works.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The inability of Google Translate, Microsoft Translator, Baidu Fanyi, and other translation services to correctly render j\u012b ni\u00e1n d\u00e0j\u00ed \u9e21\u5e74\u5927\u5409 (\"may the \/ your year of the chicken be greatly auspicious!\") in various languages points up a vital distinction that I have long wanted to make, and now is as good a time as any.\u00a0 [&hellip;]<\/p>\n","protected":false},"author":13,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":true,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[194,176,196,80,224,205],"tags":[],"class_list":["post-30700","post","type-post","status-publish","format-standard","hentry","category-borrowing","category-diglossia-and-digraphia","category-language-and-computers","category-style-and-register","category-topolects","category-translation"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/
v2\/posts\/30700","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/users\/13"}],"replies":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=30700"}],"version-history":[{"count":6,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/30700\/revisions"}],"predecessor-version":[{"id":30714,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/30700\/revisions\/30714"}],"wp:attachment":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=30700"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=30700"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=30700"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}