Massive borrowing


Some people freak out when early borrowings from one language into another are pointed out, as though borrowing were an insult to the integrity of the recipient language or somehow clashed with the sacred laws of linguistics.

When looked at dispassionately, borrowing among languages is both normal and pervasive.  In this post, I will demonstrate how widespread borrowing is in several representative, typical languages.

One of the things I love most about English is the richness of its borrowed vocabulary.  This is something I was aware of from the time I was in elementary school, and it is why I had already worn out so many dictionaries by the time I graduated from high school.  I reveled (and still do revel) in the fact that when I speak English, I also use words from hundreds of other languages.  Of course, words of French origin are particularly numerous in English, with nearly 30% of our vocabulary being derived from that language (many of them introduced by Frenchified Germanic Vikings during the Norman invasion) and another 30% coming from Latin.  This means that English, just taking French and Latin into account (never mind Spanish, Portuguese, etc.), has more words of Romance derivation than of Germanic origin, but that doesn't stop English from being a Germanic language.

One of my favorite reference works is Hobson-Jobson, that magisterial dictionary of Anglo-Indian words and phrases by Col. Henry Yule and A. C. Burnell, Ph.D. (1886), new edition by William Crooke, B.A. (1903); a digital version is provided by the University of Chicago.

Since the arrival of Buddhism in the East Asian Heartland beginning around two millennia ago, thousands of Sanskrit (and Prakrit and Pali) words have poured into Sinitic.  Modern Sinitic (and by extension other East Asian languages) is full of Sanskrit loanwords, such as the following:

chànà 剎那 ("instant" < kṣaṇa)

chán(nà) 禪(那) ("meditation; Zen" < dhyāna)

púsà 菩薩 ("Bodhisattva" < pútísàduǒ 菩提薩埵 [shortened by taking only the first and third syllables])

fāngbiàn 方便 ("convenience" < upāya), with which regular readers of Language Log are thoroughly familiar

I have dictionaries of Japanese gairai-go 外来語 ("loanwords; borrowed words; words of foreign origin"), some of which have more than fifty thousand entries.  One of my colleagues told me that, off the top of her head, probably around 80% of Japanese words are borrowings.  Another wrote:

Purely guessing, with no statistical background whatever, I would estimate 30% borrowed from Chinese, 15% borrowed from English, and 5% borrowed from other languages. [VHM:  Those figures seem conservative to me]

I’m not sure how to count a word like tonkatsu*, with the ton borrowed from Chinese and the katsu from English, but neither part very recognizable to a native speaker of those languages who does not know Japanese.  Another hard call is pseudo-Chinese terms invented in Japan and pseudo-English terms similarly invented in Japan (naitaa, sarariiman).  Finally there is the Japanese propensity to abbreviate, which yields words like pasokon (from English “personal computer” but unrecognizable to an English speaker), sumahon (from smart phone), or the totally incomprehensible garakei (flip phone, from Galapagos (i.e., living fossil) keitai, a pseudo-Chinese term for mobile phone).

[*The word tonkatsu is a combination of the Sino-Japanese word ton (豚) meaning "pig" and katsu (カツ), a shortened form of katsuretsu (カツレツ), the transliteration of the English word cutlet, which in turn derives from French côtelette, meaning "meat chop".]

From Jim Unger:

The Kokuritsu Kokugo Kenkyūjo, now known as the National Institute for Japanese Language and Linguistics (NINJAL), has published large-sample studies of vocabulary since the 1950s.   Volume 2 of the 1971 study, pp. 16-24 (attached), has the information you seek.  The statistics are summarized in Table 3 and Figures 1 and 2.  Mind the way the authors classify words:  proper nouns, particles, and auxiliary verbs are classed as 語種不要, distinct from 和語, 漢語, and 外来語.  混種語 presumably includes all portmanteau words that combine morphemes from the last three categories.  If the 語種不要 words were redistributed, the 和語 numbers would probably be higher.

Of course, 1971 was nearly 50 years ago, and the percentages of non-Chinese borrowings (both tokens and types) are probably higher today.  There are also hair-splitting issues:  should all kango be counted as borrowings?  (E.g. 経済 was coined in Japan and then borrowed back into Chinese.)  Is 麻雀 (=マージャン) a kango or a gairaigo?  Are bound morphemes words or not?  Leaving such matters aside, a very rough guess would be that about 40% of words in a big sample of normal text will be native, another 40% will be borrowings of one kind or another, and the last 20% will be numerals, signs, and a few words of unknown type.  But check out the data yourself:  you may interpret it differently from me.

Here's an especially interesting case of a widely used Japanese word, chongā チョンガー / 総角 ("bachelor") (usually written in katakana) that was borrowed from Korean 총각 chonggak, as related by Nathan Hopson:

The word appears to have entered Japanese in the 1910s. According to one source (an online slang dictionary), it was popularized first in the navy, and to a lesser extent the army.

The 1932 社会ユーモア・モダン語辞典 (Shakai yūmoa/modan-go jiten, "Dictionary of social humor and modern words") gives the following definition:

チョンガー(鮮)
未成年者、独身男

The latter sense — of someone no longer a minor but still (improperly) a bachelor — appears to have largely overtaken the former.

That's fairly clear in the lyrics to a 1970 song by The Drifters, one of Japan's most famous comedy groups:

いやじゃありませんかチョンガーは
靴下三日は我慢して
お尻の破れを手でかくし
銭湯でついでにパンツ洗う

Isn't it awful being a chongā
Wearing the same socks for three days
Covering the hole in your pants seat with your hand
Washing your underwear at the public bath

A colleague in Korean language studies estimates that Korean vocabulary is about 50-60% Sino-Korean terms, 30-40% pure native Korean, and 5%+ loanwords from other languages. "Of course," she says, "this is my very rough guesstimate!"  Judging from my own informal surveys, the amount of English in South Korean seems to be expanding all the time.

From Bob Ramsey:

The best information I have about Korean vocabulary comes from the numbers the Hangeul hakhoe 한글학회 ("Hangul Society") compiled for the dictionary Urimal keun sajeon 우리말 큰 사전 (Korean Dictionary), which Eomungak 어문각 ("Language") published in 4 volumes in 1991. (The numbers are listed at the end of the last volume.)

Words of native origin: 74,612

Sino-Korean: 85,527

(Modern) loanwords: 3,986

Of course, these numbers are already over 2 and a half decades old (so perhaps the Society now has more recent numbers). And since the modern (South) Korean attitude towards English is one of total availability, the number of loanwords in Korean now is surely much greater. But then again, many of the borrowings from English you see these days are little more than nonce creations with a very short half-life….
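For readers who would like to see the 1991 dictionary counts above as proportions, here is a minimal sketch of the arithmetic. It assumes that the three categories listed are exhaustive and non-overlapping, which the quoted figures do not actually state; the names `counts` and `shares` are illustrative only.

```python
# Headword counts as quoted above from the 1991 Urimal keun sajeon.
# Assumption: these three categories cover the whole dictionary.
counts = {
    "native": 74_612,
    "Sino-Korean": 85_527,
    "modern loanwords": 3_986,
}


def shares(counts):
    """Return each category's share of the combined total, in percent."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


for category, pct in shares(counts).items():
    print(f"{category}: {pct:.1f}%")
```

Under that assumption the shares come out to roughly 45.5% native, 52.1% Sino-Korean, and 2.4% modern loanwords. The Sino-Korean share falls within the 50-60% guesstimate quoted earlier, while the native share is higher and the loanword share lower than that guess.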

I asked Brian Spooner about the situation regarding loanwords in Persian, Arabic, and Turkish.  He replied:

It's correct that all (or pretty much all) the Arabic words in Turkish vocabulary came through Persian. (Is "borrowing" the right term?) The Turks were Persianised before they got to the "Middle East," and the Ottomans used Persian as their language of administration. I wonder whether anyone could say anything about the Persian that got into Arabic before the Ottomans (like divan). The question for Persian is more complex. I suppose it starts with the fact that Arabic is to Persian what Greek and Latin are to English. But since half the population of modern Iran speaks Turkish at home (but writes only Persian) there is also some Turkish. The bigger problem is that written Persian, called New Persian (by  orientalists) since the Arabic script was adopted in the 7th century, is the direct descendant of Achaemenian Persian (written in cuneiform in the 6th and 5th centuries BCE). But what about all the other Iranian languages of the Persianate world spoken by communities that came under Persian administration since then, several of which still continue to be spoken?

Gernot Windfuhr's 2009 edited volume, The Iranian Languages (800 odd pp. in the Routledge Language Family series) does not have a chapter on anything like borrowing, but it has an index entry on loan words–in Balochi, Khotanese, Khwarazmian, Middle Persian, Pamir languages, Parachi, Persian and Tajik, Sogdian, Tumshuqese and Wakhi. Skimming through it I see it says that Arabic constitutes about 50% of literary Persian and 25% of spoken (and of course one has to remember that in the 7th-8th centuries it was the professional scribes from the Sasanian Empire who created "New Persian" by switching from the Aramaic to the Arabic script and, after writing Arabic grammar, incorporated a lot of Arabic vocabulary into Persian, after which starting in the 9th century and the arrival of paper from China Persian became the standard language for writing throughout Central Asia). But elsewhere I see only wording like "a considerable number of…"

I should add that starting under Ataturk in the 1920s there has been a considerable effort to re-turkicise Turkish, to get rid of the Arabic-Persian vocabulary it inherited from the Ottomans, and they have got rid of a lot, but there is still much there that is not always immediately recognisable as non-Turkic in origin. Modern Greek also has a considerable amount of Persian in it (from Ottoman). It's amusing (given the historical struggle between Greeks and Iranians in the age of Marathon, and it was the Greeks of that period who taught us to call them Persians because they came from Pars, which is now Arabicised as Fars in southern Iran) that we accumulated loanwords from the Greeks and the Greeks have taken them from Persian.

I asked several Turkologists and Altaicists for their assessments of the proportion of Arabic and Persian in the languages they study.

Juha Janhunen:

I think it depends very much on whether we are basing the calculations on text corpora or on dictionary corpora, and whether we are talking of modern standard Turkish (with many neologisms derived from original Turkic roots) or older usage (with many more Arabic and Persian borrowings, in addition to Mongolisms etc.). The proportion of loanwords in a large dictionary corpus of Turkish – counting only word roots and not derivatives – must be very high. It will be interesting to hear what your numerical estimates are.

Peter Golden:

This is difficult to determine. So many words are in flux. Does one give a yanıt or a cevap? Sometimes usage is an indication of political orientation. Conservatives use more Ottoman or Ottoman-style vocabulary (with a strong Arabo-Persian element) and those left of center use more neologisms…although I would be hesitant to make this a hard and fast rule. It is merely an observation, one I first noticed when living in Turkey in 1967-early 1968, but one that seems to continue (although I am not in daily contact with Turkish-speakers).

From Mehmet Olmez:

It is not so easy to answer this question.

From 1900 till 1933, from 1933 till 1980, from 1980 to today we have different answers.

It also depends on your social status or your political / religious preferences.

For example, for 'religious, pious' there was just one word in Turkish: dindar. Now mütedeyyin is also becoming familiar, because R. T. Erdoğan uses only the word mütedeyyin. TV presenters and some other people follow him and use mütedeyyin instead of dindar.

When B. Ecevit was prime minister, öz Türkçe ("pure Turkish") words were more popular. In the last 15 years, Arabic / Ottoman Turkish words are used more often.

About the vocabulary:

There are very few Chinese words in Turkish that arrived in Anatolia together with the Turks, fewer than ten: for example sındı 'scissors' (dialect word) << jiǎndāo 剪刀; sır 'lacquer' << qī 漆. There are some Chinese words which arrived through Mongolian: mantı << mantou (?), tepsi 'dish, plate' << diezi. We have direct Mongolian words too, like ağa, serin, etc. But in daily life there are no more than ten Mongolian words (in Ottoman texts there are over 100 Mongolian words; see C. Schönig).

There are also limited words belonging to Sogdian: borç /borč/, kent maybe genç etc.

In Turkish, borrowings come mainly from the following languages: Arabic, Persian, Greek, and Latin. Latin words are, according to others, limited and mostly related to naval terms. Of course, after 1800 there are more European (Latin) words in Turkish. Arabic words have been borrowed mostly in their Persian form.

Armenian words are popular mostly in dialects; in the standard language they are more limited. In southwest or west Anatolian dialects, it is difficult to come across an Armenian word. But in Çorum, Kayseri, Yozgat, Erzurum, Diyarbakır and similar cities you can find many Armenian words. In my dialect, from Nevşehir, it is very difficult to find Armenian words, but we have a huge number of Greek loanwords, especially for agricultural terms.

From other languages, such as Bulgarian, Serbian, Rumanian, Hungarian and Russian, there are also limited borrowings. Russian is especially familiar in and around Kars: kartol, istakan, etc.

The best study of the Turkish language reform is Geoffrey Lewis' book, The Turkish Language Reform: A Catastrophic Success. Emmanuel Szurek (in Paris) also works and publishes on language reform, reform of personal names, etc.

About Old Uyghur words which were adopted during language reform: Jens Peter Laut, Die Uigurismen im Tarama Dergisi (1934).

About the structure of new words / neologisms, see K. Röhrborn, Interlinguale Angleichung der Lexik: Aspekte der Europäisierung des türkeitürkischen Wortschatzes.

Arabic words borrowed mostly with their Persian form [through Persian].

As for Greek words (in modern Turkish), Yorgos Dedes has a full list, but it seems that he needs more time to bring it to publication.

About Latin words: Latin words are, according to others, limited and mostly related to naval terms. Yes, normally we cannot encounter many Latin words inside Anatolia: kamara, kaptan, etc. There are also some Latin words very close to the Rumanian form: masa 'desk; table', Rumanian masă. Of course, as everyone knows, after 1800 there have been a lot of French borrowings in Turkish.

With Latin, I meant 'Latin languages' like (different) languages from Italy (Venetian or similar) and Rumanian language. My knowledge about loanwords from Latin languages depends on sources such as these:

The Lingua Franca in the Levant. Turkish Nautical Terms of Italian and Greek Origin, Henry & Renée Kahane, University of Illinois, Andreas Tietze, University of Istanbul, 1958 (reprinted at Istanbul 1988).

Meyer, Gustav (1893) Türkische Studien. Die griechischen und romanischen Bestandteile im Wortschatze des Osmanisch-Türkischen.

I have prepared for my own use an index to Meyer and reprinted it for other users: Meyer, Gustav (1998): Türkische Studien. Die griechischen und romanischen Bestandteile im Wortschatze des Osmanisch-Türkischen. Mit einem Geleitwort und einem Index herausgegeben von Mehmet Ölmez, Ankara.

From Marcel Erdal:

Speaking about loans, one should, I think, always exclusively consider the last source language, not the original language from which a loan may ultimately have come. Many (but by no means all) of the 'Arabic loans' of Turkish actually come from Persian, as shown by both phonetic and semantic evidence. In this sense, there are no Latin loans (mentioned by Mehmet) in Turkish. (Neo-Latinist creations in medicine, pharmacy, etc. are a topic by itself, but I would doubt whether Turkish scientists were very active in coining those.)

From Alexander Vovin:

In addition to what Peter, Juha, and Mehmet have already said. The straightforward answer is indeed difficult. First, loans from what languages and into which languages? Turkish and Uzbek would have more Arabic and Persian loans than Kazakh or Kirghiz, let alone Tuvan, Chuvash, and Yakut. Tuvan would have more Mongolic loans than any other Turkic language.

You are also asking about other languages. In Japanese, e.g., more Chinese loans are used in the written language than in the colloquial, and in a newspaper much more than in fiction. In the colloquial, the higher the register is, the more words of Chinese origin one encounters. The situation in Korean vis-a-vis Chinese and in Mongolian vis-a-vis Tibetan is somewhat similar. The situation might be further complicated in the case when languages are closely related. In Russian high register Eastern Slavic words are frequently replaced by their South Slavic cognates.

To conclude with my own words, I have written about the vain efforts of Recep Tayyip Erdogan, president of Turkey, to purify the Turkish language of borrowings from foreign languages, "Putting the kibosh on bosh" (6/18/17):

I'm afraid that, no matter how hard Erdogan or any other purist huffs and puffs, they will not be able to blow away the foreign building blocks which have been used in the construction of the house that is Turkish.  I am the proud owner of the big Redhouse Turkish-English dictionary (I also have on the shelves of my library the Redhouse English-Turkish dictionary which is nearly as large — both of them are around twelve hundred pages in length).  Looking through the pages of Redhouse, I see an enormous number of words from Persian, Arabic, Greek, French, Spanish, English, German, Albanian, Armenian, Hebrew, Russian, Polish, Hungarian, Bulgarian, Serbo-Croatian, Romany, Chinese, Japanese, and Malay (sorry if I missed something).

The same is true of other modern Turkic languages besides Anatolian Turkish.  Henry G. Schwarz's An Uyghur-English Dictionary, about a thousand pages long, is full of words borrowed from Arabic and Persian.  As much as 75% of the vocabulary of Uyghur is Perso-Arabic.  During the 20th century Russian words came flooding in, and now Chinese is having a heavy impact.

If we go back to the earliest traceable stage of the Turkic lexicon, as collected in Gerard Clauson's An Etymological Dictionary of Pre-thirteenth-century Turkish (Clarendon, 1972) and other works of scholarship on early Turkic,  we find words derived from many languages, including Indic (Sanskrit), Iranic (Sogdian, Khotanese), Mongolic / Khitan, Samoyedic, and Sinitic (here again I may have missed some).  The language that served as the source of a number of Old Turkic words that intrigued me the most when I was perusing Clauson was Tocharian, since it may have been derived from the speech of the Bronze Age mummies of Eastern Central Asia and plays such an important role in discussions of the early development of Indo-European ("Early Indo-Europeans in Xinjiang" [11/19/08]).

Is there any language on earth today that is "pure" in the sense of having no lexical borrowings or other types of influences of any sort from other languages?

Reading

"The American Heritage Dictionary of the English Language, 5th edition" (11/14/12)

"Ur-etyma: how many are there?" (7/6/14)

"Sino-Sanskritic 'devil'" (12/11/18)

"Bahasa and the concept of 'National Language'" (3/14/13)

"Are Sanskrit and Chinese 'congenial languages'?" (9/9/13)

"Dung Times" (3/14/18)

"Sanskrit and Pseudo-Sanskrit Daoist incantations" (5/24/18) — with a bibliography of many additional readings

"Of jackal and hide and Old Sinitic reconstructions" (12/16/18) — and many other posts in that series

[Thanks to Linda Chance, Frank Chance, Haewon Cho, and William Hannas]



70 Comments

  1. Coby Lubliner said,

    February 18, 2019 @ 2:23 pm

    What I find striking about Turkish is not only the large number of actual French words borrowed (otel, tuvalet, kuaför…) but the fact that nearly all the international Graeco-Latin vocabulary appears in French form (müzik, jeoloji, konser, televizyon, direktör…). This is one of the big differences between Turkish and Azerbaijani.

  2. David Marjanović said,

    February 18, 2019 @ 2:32 pm

    Is there any language on earth today that is "pure" in the sense of having no lexical borrowings or other types of influences of any sort from other languages?

    Whatever they speak on North Sentinel Island is a good candidate. Other than that… I'd be very surprised.

  3. Annie said,

    February 18, 2019 @ 2:44 pm

    For obvious reasons, a lot of Spanish words (particularly beginning with A) come from Arabic.

    Less obvious, the clichéd quintessential Japanese words arigato and tempura probably come from Portuguese.

  4. a said,

    February 18, 2019 @ 2:46 pm

    My favorite to speculate about, though, is whether the American expression "hooey," as in "that's a lot of hooey," comes (via a WWII connection?) from the Russian "khuy," oft used in constructions that mean essentially "You don't know dick."

  5. Annie Gottlieb said,

    February 18, 2019 @ 3:05 pm

    A favorite of mine to speculate about, though, is whether the American English expression "hooey," as in "that's a lot of hooey" (a synonym for "baloney"), comes (via a WWII connection?) from the Russian "khuy," which, if I translate it into English, makes my comment vanish. Victor Erofeyev will explain but I'm not sure comment policy allows links. Will check and post it below if allowed.

  6. Annie Gottlieb said,

    February 18, 2019 @ 3:06 pm

    http://www.russki-mat.net/e/mat_VEvrofeyev.htm

  7. Ian said,

    February 18, 2019 @ 3:18 pm

    I'm constantly shocked by how many Swedish loan words there are in Finnish. This fact is quite thoroughly pointed out, typically, when one first studies Finnish, so you think one would be prepared for it, but it seems like virtually every word in a modern sentence has some roots in Swedish. Many of them are quite opaque, due to the much more restrictive phonotactics of Finnish and the considerably different phonemic inventory, which sometimes makes them hard to spot. Likewise, they're completely nativized, so they inflect just like normal Finnish words, which again makes drawing a comparison back to a Swedish word sometimes difficult. I would guess that non-Uralic (mostly Swedish, Russian, English) loans make up over 70% of the vocabulary, but that's a total shot in the dark.

  8. Kyle B. said,

    February 18, 2019 @ 4:24 pm

    @Annie, interesting on tempura. I didn't know that came from Portuguese. Thanks for pointing it out.

    However, I don't think the consensus is that arigatou (ありがとう) came from Portuguese. At least gogen-allguide thinks it derives from a phrase roughly meaning "appreciation for hard-to-get things", and that use of arigatou predates Portuguese contact.
    https://en.wikipedia.org/wiki/Glossary_of_Japanese_words_of_Portuguese_origin#Arigatō also says arigatou probably didn't come from Portuguese, though its source doesn't directly say this from what I can tell. The wikipedia entry does agree with the gogen-allguide entry that I'd trust, though http://gogen-allguide.com/a/arigatou.html.

  9. Suzanne Valkemirer said,

    February 18, 2019 @ 5:33 pm

    One must always check dates of earliest known use when proposing etymologies.

    Since the earliest evidence that Merriam-Webster has so far uncovered for hooey is dated 1912, a connection with World War Two is impossible, so that if the word is in any way related to the Russian word mentioned by the posters, that war could not be involved.

    Interjections are notoriously hard to etymologize because of the possibility of independent innovation.

  10. Suzanne Valkemirer said,

    February 18, 2019 @ 5:46 pm

    @ Coby Lubliner

    "What I find striking about Turkish is not only the large number of actual French words borrowed (otel, tuvalet, kuaför…) but the fact that nearly all the international Graeco-Latin vocabulary appears in French form"

    Until about 1840, Italian was the chief Western language of prestige in Istanbul. After that year, more or less, French was.

    That explains the significant influence of French on later Ottoman Turkish. Presumably, if one looked for earlier Turkish loans from the West, most would probably be from Italian.

    Coming to mind (though from Arabic, not Turkish) is warda !, which in many varieties of Levantine Arabic is a warning that dynamite is about to be exploded (it is used by construction workers). The etymon is Italian guarda 'look!' (/gwarda/).

  11. Suzanne Valkemirer said,

    February 18, 2019 @ 5:54 pm

    Marcel Erdal is right when he says, "Speaking about loans, one should, I think, always exclusively consider the last source language, not the original language from which a loan may ultimately have come."

    Only immediate transfer is a sign of contact, hence of influence. Thus, an etymological chain such as Bulgarian < Turkish < Persian < Arabic shows solely Arabic influence on Persian, Persian influence on Turkish, and Turkish influence on Bulgarian.

    For that reason, I doubt that Hebrew, Japanese, or Spanish has had any influence on Turkish (see above). Whoever so claims is presumably looking at non-immediate etymons.

    To put the matter in the simplest terms, if John gives Mary a gift and Mary regifts that gift to James, John has given James nothing.

  12. Jenny Chu said,

    February 18, 2019 @ 6:14 pm

    It's fun to look at dating and borrowing as an indication of when certain topics or objects entered a language. My favorite example is Vietnamese. Dai hoc (university) seems to be related to Cantonese, indicating that people were talking about universities in the period of Chinese rule about a millennium ago. Fanh (brake), xup lo (cauliflower) and tu no vit (screwdriver) apparently showed up during the French colonial period. Hien dai hoa (modernization) seems to be a modern Communist borrowing from Mandarin. The chip (microchip) is something that became common to talk about only once stronger contact with English language sources was established, post Doi moi era.

  13. Bathrobe said,

    February 18, 2019 @ 7:52 pm

    With languages that use or have used Chinese characters, borrowing is often based on the written language. Once the pronunciation of characters is stabilised (which may take place several times), words can be borrowed through the visual medium (the written word) rather than the spoken medium. Moreover, morphemes (in the form of Chinese characters) are borrowed at the same time as words (combinations of morphemes). The fact that Japanese borrowed the morphemes meant that they were able to create their own 'Chinese vocabulary' based on Chinese models, and the Chinese were able to borrow it right back without feeling that the words were strange or foreign.

    It's possible that Vietnamese đại học was borrowed not from Cantonese (via the spoken route) but in the form of the characters 大學, read đại học in Vietnamese.

  14. Jim Breen said,

    February 18, 2019 @ 8:02 pm

    Interesting to see the old chestnut of ありがとう deriving from Portuguese popping up again. I suspect it's partly driven by the common "arri-GAR-to" mispronunciation, which leads people to think it's associated with obrigado.

  15. Bathrobe said,

    February 18, 2019 @ 8:04 pm

    大學 currently means 'university', but in ancient times referred, among other things, to the Great Learning, one of the four books in Confucianism.

  16. Victor Mair said,

    February 18, 2019 @ 8:08 pm

    @Jenny Chu:

    [was planning to post this an hour ago, but didn't get a chance till now]

    Are you sure that that "Dai hoc" of a millennium ago was referring to "university", not to the Confucian classic, "Dàxué 大學" ("The Great Learning")? The latter was extracted from the Book of Rites by the Neo-Confucian thinker, Zhu Xi (1130-1200), about the same time you noted that the term "Dai hoc" circulated in Vietnam.

    https://en.wikipedia.org/wiki/Great_Learning

    https://en.wikipedia.org/wiki/Book_of_Rites

    Chinese "dàxué 大學" ("university") is one of those "round-trip words" that started out in China with one meaning, travelled to Japan where it picked up a new Western meaning, then returned to China with the new meaning. See "East Asian Round-Trip Words" in Sino-Platonic Papers, 34 (Oct., 1992).

    http://www.sino-platonic.org/

    =====

    I will post some extensive, integrated notes on borrowing among Chinese, Japanese, Korean, and Vietnamese from Bill Hannas as soon as I get a chance to type them out, probably tomorrow evening.

  17. Bathrobe said,

    February 18, 2019 @ 8:36 pm

    The "Vietnamese borrowed from Cantonese" trope often comes up, but I'm not sure that this is the current consensus. I don't have any references to hand, but I seem to remember that typical Sino-Vietnamese readings of Chinese characters are thought to have come from some form of southern Mandarin.

    Whatever the case, the nature of contact between Vietnamese and Chinese is an area of considerable interest. As has been mentioned at other posts (if I remember rightly), some claim that Chinese was spoken natively by part of the Vietnamese ruling class. Also, unlike in Japan, it's possible that Chinese-speaking teachers were a common fixture in Vietnam. I've had trouble finding any references (other than Marc Hideo Miyake) that try to give a full picture of the relationship of Sino-Vietnamese character readings to Chinese or of the nature of the contact that led to them.

  18. Bathrobe said,

    February 18, 2019 @ 8:48 pm

    I should have checked Wikipedia: https://en.wikipedia.org/wiki/Sino-Vietnamese_vocabulary

    To quote: "The Old Sino-Vietnamese layer was introduced after the Chinese conquest of the kingdom of Nanyue, including the northern part of Vietnam, in 111 BC. The influence of the Chinese language was particularly felt during the Eastern Han period (25–190 AD), due to increased Chinese immigration and official efforts to sinicize the territory. This layer consists of roughly 400 words, which have been fully assimilated and are treated by Vietnamese speakers as native words."

    "The much more extensive Sino-Vietnamese proper was introduced with Chinese rhyme dictionaries such as the Qieyun in the late Tang dynasty (618–907). Vietnamese scholars used a systematic rendering of Middle Chinese within the phonology of Vietnamese to derive consistent pronunciations for the entire Chinese lexicon. After expelling the Chinese in 938, the Vietnamese sought to build a state on the Chinese model, including using Literary Chinese for all formal writing, including administration and scholarship, until the early 20th century. Around 3,000 words entered Vietnamese over this period. Some of these were re-introductions of words borrowed at the Old Sino-Vietnamese stage, with different pronunciations due to intervening sound changes in Vietnamese and Chinese, and often with a shift in meaning."

  19. Chris Button said,

    February 18, 2019 @ 9:13 pm

    … or that it somehow clashes with the sacred laws of linguistics.

    I would add that although neogrammarian-style fixed sound laws can indeed be thrown out of the window, every proposal for a borrowing does nonetheless still need to be presented in a linguistically sound manner.

  20. Anthony said,

    February 18, 2019 @ 9:27 pm

    Not meaning to derail this thread, but we should note the passing of Professor Eric Hamp:
    https://en.wikipedia.org/wiki/Eric_P._Hamp

  21. AntC said,

    February 18, 2019 @ 11:56 pm

    early borrowings …, as though it were an insult to the integrity of the recipient language,

    For many of the massive borrowings you mention, there is a clear historical record of close contact between the languages. In the case of English, a whole series of Romance languages carried the prestige over Anglo-Saxon for many centuries.

    Furthermore, there can be clear explanations of a word arriving with the thing. Wheels and wagons/chariots; iron/steel technology; potato, tomato, courgette in English; aubergine in English and cognates in many European languages. Buddhist devotional vocabulary into Thai and Chinese, as mentioned in earlier threads. We still need an explanation for why some languages import the word with the thing, rather than adapting an existing term: pomme-de-terre, pommodoro, eggplant (how come US English was happy to absorb zucchini but not aubergine or melanzana?).

    In earlier threads, I regarded it as not an insult but certainly a surprise/in need of an explanation when it was claimed a language massively borrowed everyday words: that is, words for which there would already be adequate vocabulary/there was no new thing arriving in need of a name. (Presumably: of course it's difficult to establish early vocab.)

    So that is the case with the hypothesised massive borrowing of everyday words from IE/Germanic into Chinese. Where/when was the period of contact? What Germanic culture or peoples could be carrying prestige over Chinese?

  22. John Swindle said,

    February 19, 2019 @ 4:32 am

    Suzanne Valkemirer said, "Only immediate transfer is a sign of contact, hence of influence. Thus, an etymological chain such as Bulgarian < Turkish < Persian < Arabic shows solely Arabic influence on Persian, Persian influence on Turkish, and Turkish influence on Bulgarian."

    I can see that this is true of contact, but how is it true of influence? Religions and technologies and fashions travel from one country to a second and then to a third and a fourth with no doubt that the originating culture has thereby influenced the others along the chain. Why should language be different?

  23. Michael Watts said,

    February 19, 2019 @ 4:45 am

    With languages that use or have used Chinese characters, borrowing is often based on the written language. Once the pronunciation of characters is stabilised (which may take place several times), words can be borrowed through the visual medium (the written word) rather than the spoken medium.

    This is true whenever any two languages share or largely share a writing system. It's not special to Chinese characters. Weren't we just talking about "Amarillo by Morning"?

  24. Bart O'Brien said,

    February 19, 2019 @ 7:19 am

    I've never understood why people use the term 'borrowing' for this phenomenon. To borrow an object is to accept some obligation to return it. That doesn't apply here. Languages don't have any obligation to return a piece of vocabulary to its original language.

    Using 'import' or 'acquire' instead of 'borrow' works perfectly well.

  25. Benjamin Orsatti said,

    February 19, 2019 @ 8:12 am

    Nobody's mentioned Icelandic yet?

    "Deyr fé,
    deyja frændur,
    deyr sjálfur ið sama;
    en orðstír
    deyr aldregi
    hveim er sér góðan getur."

  26. Victor Mair said,

    February 19, 2019 @ 8:31 am

    How did Enkh Erdene learn "Amarillo by Morning" — through the written or oral medium?

  27. Philip Anderson said,

    February 19, 2019 @ 8:42 am

    @AntC
    My understanding is that British cuisine was under French influence, hence courgettes and aubergines, whereas Italian immigrant influence was more significant in the USA, hence zucchini.

    Why eggplant? I don’t know why sometimes a name is borrowed, sometimes invented, or sometimes translated. Some countries prefer to use native words though – but they don’t always catch on with ordinary speakers.

    History shows that words can be borrowed for everyday things, although usually (?) with close contact if not bilingualism. Welsh includes many words from Latin, borrowed into Brittonic during the Roman Empire or early Christian period, for familiar things: e.g. braich (<brachium) for arm, coes (<coxa) for leg. It is thought that llaw and troed, now meaning hand and foot, originally referred to both arm and hand, and leg and foot, respectively.

  28. Victor Mair said,

    February 19, 2019 @ 8:56 am

    @Benjamin Orsatti

    Please explain what you meant to show by your Icelandic quotation.

    The GT rendering of it doesn't make a whole lot of sense, especially the last line.

    "Deaths,
    die kinsmen,
    dies the same;
    but celebrity
    never dies
    wheat is good can. "

    ======

    Deyr fé,
    deyja frændur,
    deyr sjálfur ið sama;
    en orðstír
    deyr aldregi
    hveim er sér góðan getur.

    Cattle die,
    kinsmen die
    you yourself die;
    One thing now
    that never dies
    the fame of a dead man’s deeds.

    https://en.wikipedia.org/wiki/H%C3%A1vam%C3%A1l

    But it would still be nice to have an explanation of what you're trying to demonstrate.

  29. Benjamin Orsatti said,

    February 19, 2019 @ 9:13 am

    Hello, Prof. Mair. Sorry, I guess I was being a bit unintentionally oracular just there.

    Only to say that Icelandic is believed by many to be a remarkably conservative language, with relatively few borrowings / loans / word hoard raidings. I had an Icelandic dorm mate at Penn (Hi, Ásgeir, if you're reading!) who would proudly proclaim that he, without any specialized training, could read any of the Eddas or Sagas (c. 900 A.D.) that one might put in front of him as easily as if he were reading the Reykjavik daily newspaper, whereas an English speaker would be stymied by, say, Beowulf (…and this is true, I didn't get much further than "Hwæt", myself). Wikipedia provides an example at: https://en.wikipedia.org/wiki/Old_Norse#Old_Icelandic (scroll down to "text example").

  30. Thaomas said,

    February 19, 2019 @ 9:29 am

    "I suppose it starts with the fact that Arabic is to Persian what Greek and Latin are to English."

    No, Arabic and Persian belong to two different language families, Semitic and Indo-European; whereas English and Latin are both Indo-European.

  31. Victor Mair said,

    February 19, 2019 @ 9:29 am

    Now that's really interesting, Benjamin! I doubt that what you said about the ease of reading material from more than a thousand years ago "without any specialized training" is true of any other language, certainly not of Literary Sinitic / Classical Chinese.

  32. Jorge said,

    February 19, 2019 @ 10:00 am

    Bistro, from the Russian "быстро," "quick." Supposedly from Napoleon's invasion of Russia.

    And the reverse, "пляж" from the French word for beach.

    Russian has quite a few English loanwords, "to attack," "to park," and "to arrest," for example.

    Don't get me started about Tagalog and all of its borrowings, but don't fall for "siyempre." ;-)

    Good times.

  33. George said,

    February 19, 2019 @ 10:07 am

    On the whole aubergine/eggplant thingy, this is a great read.

  34. George said,

    February 19, 2019 @ 10:12 am

    @Thaomas

    I have no doubt that Brian Spooner is well aware of what you point out. (If he wasn't, I can't for the life of me imagine why anyone would ask his opinion on anything relating to Persian.) I think the point was that Arabic is a language of scripture and liturgy, much as Latin and Greek have been.

  35. Victor Mair said,

    February 19, 2019 @ 10:20 am

    For a truly great book on Persian edited by Brian Spooner and the late William L. Hanaway, see Literacy in the Persianate World: Writing and the Social Order (Philadelphia: University of Pennsylvania Press, 2012).

    https://www.upenn.edu/pennpress/book/1243.html

    Persian has been a written language since the sixth century B.C. Only Chinese, Greek, and Latin have comparable histories of literacy. Although Persian script changed—first from cuneiform to a modified Aramaic, then to Arabic—from the ninth to the nineteenth centuries it served a broader geographical area than any language in world history. It was the primary language of administration and belles lettres from the Balkans under the earlier Ottoman Empire to Central China under the Mongols, and from the northern branches of the Silk Road in Central Asia to southern India under the Mughal Empire. Its history is therefore crucial for understanding the function of writing in world history.

    Each of the chapters of Literacy in the Persianate World opens a window onto a particular stage of this history, starting from the reemergence of Persian in the Arabic script after the Arab-Islamic conquest in the seventh century A.D., through the establishment of its administrative vocabulary, its literary tradition, its expansion as the language of trade in the thirteenth century, and its adoption by the British imperial administration in India, before being reduced to the modern role of national language in three countries (Afghanistan, Iran, and Tajikistan) in the twentieth century. Two concluding chapters compare the history of written Persian with the parallel histories of Chinese and Latin, with special attention to the way its use was restricted and channeled by social practice.

    This is the first comparative study of the historical role of writing in three languages, including two in non-Roman scripts, over a period of two and a half millennia, providing an opportunity for reassessment of the work on literacy in English that has accumulated over the past half century. The editors take full advantage of this opportunity in their introductory essay.

  36. Michael Watts said,

    February 19, 2019 @ 11:14 am

    How did Enkh Erdene learn "Amarillo by Morning" — through the written or oral medium?

    How did Texans learn to pronounce "Amarillo" with an /l/? It's not there in the Spanish.

    All languages borrow by reinterpreting the written form of a word, if they think they can understand it.

  37. cameron said,

    February 19, 2019 @ 12:03 pm

    With regard to the readability of thousand-year-old Icelandic texts "without any specialized training". I suppose it's true; one does need training, but it just isn't all that specialized. It's the training one gets in learning to read and write Icelandic. Since the literary standard is defined by those old texts, what it means to be literate is to be able to read those texts.

    A similar situation exists in Persian. The language of poets like Ferdowsi (born c. AD 940) and Hafiz (born AD 1315) and all the other classic Persian poets is quite different from the spoken Persian that one hears on the streets of Tehran, and yet a literate Persian speaker can read their works without translation or extensive scholarly apparatus. They're perhaps not quite as easily readable as a newspaper, but they are readable. Being literate is specifically determined by acquiring the background and vocabulary needed to read those works.

  38. Peter Grubtal said,

    February 19, 2019 @ 12:17 pm

    "…. half the population of modern Iran speaks Turkish at home "

    You certainly pick up some interesting factoids round here.
    Around the western Caspian it's clear: Azeri is Turkic, I know. Perhaps in the north-east, in what used to be Seistan, Turkic languages are spoken? But that the inhabitants of these areas make up 50% of the population?

  39. George said,

    February 19, 2019 @ 12:47 pm

    @Peter Grubtal

    Yeah, that 50% figure surprised me too, particularly as I've had Iranian friends from the part of the country near Azerbaijan whose first language is Farsi.

  40. George said,

    February 19, 2019 @ 12:54 pm

    … and numerous Google results are suggesting that less than 20% of Iran's population speak Turkic languages.

  41. Jared said,

    February 19, 2019 @ 12:55 pm

    Not adding much to such a great post, but I find these borrowed words in Japanese interesting:

    アルバイト arubaito "part time job" (from German Arbeit)
    オナニー onanii "masturbation" (from German Onanie)
    パクチー pakuchii "coriander/cilantro" (from Thai ผักชี phagchee)
    ルー ruu "thickening agent (used in cooking and sometimes used interchangeably with おかず)" (from French roux)

  42. cameron said,

    February 19, 2019 @ 1:19 pm

    I haven't set foot in Iran since my childhood, but I'd guess the figure of how many people are native Turkic speakers is probably around 25%. That's mostly Azeri, spoken in the north-west, and Turkmen and Uzbek, spoken in the north-east. There are other Turkic languages spoken, but with small numbers of speakers. The word torki, in Persian, refers by default to Azeri.

  43. Philip Anderson said,

    February 19, 2019 @ 2:30 pm

    @George:
    Thanks for the link. I say aubergine, but brinjal is the term in Indian restaurants here.

    @Michael Watts:
    Although English tends to borrow written words and then pronounce them as if they were English (“Wipers” for Ypres), that isn’t the case for all languages. English words taken into Welsh are generally rewritten in Welsh orthography for the original pronunciation (often more phonetic than the English – Biwmaris for Beaumaris).

  44. Cuconnacht said,

    February 19, 2019 @ 3:23 pm

    I noticed in the movie Untergang that at one point the order was given "Stop!", borrowed from English obviously, and that the English subtitles translated it "Halt!", which we borrowed from German (in that sense; the adjective as in "the lame, the halt, and the blind", is native English).

  45. David Morris said,

    February 19, 2019 @ 3:32 pm

    'Arbeit' is used in South Korea. My students were surprised when I didn't know what it meant, and, when they explained, that it's not an English word.

    South Korea has a National Institute for Korean Language, which, among other aims, seeks to “remove incorrect loanwords, foreign languages, and Japanese words and to use correct Korean language”. (At least it did 7 years ago when I copied that quotation – I can't find it right now.) They have about as much chance as any other national language regulator.

  46. turang said,

    February 19, 2019 @ 3:49 pm

    Much can be written about imports into various Indian languages. I am aware of a dictionary in Kannada devoted to imported words, by the now roughly 106-year-old linguist G. Venkatasubbaiah. Apparently available for less than $3 but out of stock :( at https://www.sapnaonline.com/books/eravalu-padakosha-1586-g-venkatasubbaiah-817302331x-9788173023316

    He used to write a column in a local newspaper on usage and etymology of words into his 90s.

    Kannada has the reputation for importing words from all sorts of places with ease (Sanskrit directly, Sanskrit through Prakrits, Arabic, Farsi, French, Portuguese and nowadays a lot from English) unlike its sister language Tamil.

  47. J.W. Brewer said,

    February 19, 2019 @ 6:30 pm

    I didn't know that "halt" as a verb came from German, but one online source said (evoking some of the discussion above) that in English it's a loanword from French, in which it was in turn a loanword from German. In both French and English it sounds like it was originally borrowed purely as military jargon, i.e. part of the set of stylized/standardized commands that drill sergeants bark at soldiers.

  48. Victor Mair said,

    February 19, 2019 @ 7:36 pm

    Borrowing among Chinese, Japanese, Korean, and Vietnamese

    From:

    Wm. C. Hannas, with a Foreword by John DeFrancis, Asia’s Orthographic Dilemma (Honolulu: University of Hawai’i Press, 1997).

    pp. 183-184

    …For nearly two millennia, non-Chinese languages on China’s periphery have shared Sinitic vocabulary freely, in a manner known to all of the world’s languages. Until recently, the direction of this “borrowing” had been largely from Chinese to Japanese, Korean, and Vietnamese, although the latter languages—most notably Japanese—have reversed the process and for the last century and a half have been coining new terms from Sinitic morphemes that are adopted by all four languages. As a result of this borrowing, more than 40 percent of Japanese, 50 percent of Korean, and at least one-third of the words in Vietnamese are based on Sinitic morphemes, according to Liu (1969:67). These figures apply to everyday vocabulary and are lower than other researchers’ counts that take in a wider corpus. For example, Sokolov claims 60 percent for Japanese, with the range for actual use varying between 10 and 80 percent, depending on the topic (1970:98). Ho Ung claims 60 percent (1974:44), and Oh claims 90 percent for some types of Korean materials (1971:26). Helmut Martin notes that in formal Vietnamese the ratio of Sinitic words can reach 50 percent; for newspapers it goes much higher (1982: 32).

    In general, the share of Chinese-style words in these non-Chinese languages increases with formality and difficulty of content, which is to say, Sinitic terms dominate those environments where style and subject matter make them the least predictable. One would think that the emphasis would be on maintaining phonetic distinctions between these word forms, but the opposite is more nearly true. Since most of the terms refer to higher-level concepts, the expectation was that they would be identified through writing, where phonetic characteristics matter less. Accordingly, there was less pressure to avoid homonyms and near homonyms. Another, more important reason for the homophony can be traced to the dynamics of borrowing. When a language “borrows” terms from another, it typically adapts the words’ sounds to its own phonology, which is never a perfect match. The borrowing language cannot add distinctions to the sounds of the terms it is borrowing, but it can and does ignore phonological distinctions that its own system is not equipped to handle. In the case of international Sinitic, this means dropping the tonal features that help distinguish one Chinese syllable from another.

    Just what this meant for the Sinitic vocabulary of Korean and Japanese is evident in the following figures. From an inventory of thirty-six initial and six syllable-final consonants totaling 3,877 different syllable types in sixth-century A.D. Chinese, the number of syllables in modern standard Mandarin fell to 1,280, distinguished by twenty-two initial consonants, two final consonants (three, including the Beijing dialect’s –r), and four phonemic tones. Korean speakers, for their part, have 1,096 syllables at their disposal (Yi Kang-ro 1969:44), which increases to 1,724 if we count written syllable types, hundreds more than in Mandarin even with the tones. This inventory seems to give Korean an advantage, until we realize that only four hundred or so different syllables are used for Sino-Korean. If this were not bad enough, most of this vocabulary is expressed in Korean as two-syllable compounds, even more than in Chinese, because of the availability of indigenous single- and multi-syllabic words to handle the day-to-day concepts. The result is significantly more homonyms. Nam counted 22,983 Sinitic homonyms and 4,077 of mixed origin among the 91,825 entries in the Hangul Society’s Kukŏ sajŏn (Korean Language Dictionary) (1970:11). Pure-Korean homonyms numbered only 3,120.

    For Japanese the situation is even worse. Not only were Chinese tonal categories leveled, the phonetic reduction that occurred when these words were borrowed and their subsequent erosion through time have left just 319 sounds (on readings, including bisyllabic morphemes ending in tsu, chi, ku, and ki) for 4,775 character-morphemes listed in Nelson’s dictionary. Even this figure understates the problem, because many of these sounds have one character only, while others accommodate more than one hundred. Samuel Martin noted that the Japanese syllable corresponds to “at least 38 different (Chinese) syllables, some of which already represented more than one morpheme in classical Chinese” (1972:99). More than 180 characters are identified with this sound alone. Even with compounding the numbers are still formidable. Korchagina counted twenty-four words pronounced kōkō, twenty-three pronounced kōshō, eighteen kōtō, and fourteen kōchō in a modern Japanese-Russian dictionary (1977:43), adding that “the allegation of certain linguists that homonyms are an imaginary problem that exists only for linguists can hardly be applied to the Japanese language” (1975:52).

    Other sources of homonyms are attenuated classical expressions in the modern colloquial language and extensive abbreviation—a practice that Zhou called the “monosyllabification of polysyllabic words” (1961:300). These abbreviations appear in technical terms and other types of new vocabulary that are shortened for convenience after the concepts take root in society, in names for organizations and institutions where the first or most significant characters for each word in the name are singled out to represent the whole, and, especially in Chinese, in the use of pithy, shortened slogans generally of a political nature. Although abbreviations make sense from the point of view of the reader, who, thanks to the characters, is inundated with a surplus of graphic information, the same morphemes that make up these abbreviations lose most of their redundancy, both absolutely and with respect to other expressions in the language, when spoken aloud. What began as graphically and phonetically distinct words collapse into homonyms or near homonyms (“paronyms”) as reductions are made based on the requirements of writing that have no direct connection with the information-bearing requirements of speech.

  49. Victor Mair said,

    February 19, 2019 @ 7:42 pm

    Hannas, cont.

    pp. 77-78

    A third reason Vietnamese seems related to Chinese is because, as I intimated, a large part of the Vietnamese lexicon is Sinitic in origin. Estimates range from 30 to over 60 percent (DeFrancis 1977:8). Calculating the exact size of the Sino-Vietnamese inventory, however, is not a straightforward process. Is the count based on the percentage of Sinitic words in connected discourse, in which case the Sino-Vietnamese vocabulary runs from under 30 percent for everyday speech to a figure near 90 percent for patches of text in a Marxist editorial? Or do we base the count [on] the lexicon itself, where other problems come into play, including trying to sort out nonindigenized Sino-Vietnamese terms, used by a minority of educated Vietnamese who learned the terms through international Sinitic, from terms that have become part of the common repertoire? More fundamentally, can we include among the Sinitic part of the Vietnamese lexicon words so thoroughly indigenized that their Chinese origin is obscure to all but a few etymologists?

    Sifting out Sinitic from native vocabulary is more of a problem in Vietnamese than in Japanese or even in Korean because of the longer history of contact between Chinese and Vietnamese, and because of the intimacy (most Vietnamese would use a different word to describe it) of this contact. Vietnam was under Chinese “suzerainty” for nearly a millennium: from 111 B.C. to A.D. 39, from 45 to 544, and from 602 to 939. During this long period, the Vietnamese language itself was overshadowed and to some extent replaced by Chinese, opening the door to thousands of Chinese terms and, I would conjecture, to the monosyllabic Sinitic morphology enforced by written Chinese, which was the only writing known in Vietnam at the time. Instances of multiple borrowing of the same Sinitic morpheme were fairly common, as the term entered the language from different parts of China with different pronunciations. Or the Sinitic morpheme was reintroduced centuries later with a changed pronunciation, the original borrowing having become so well indigenized that users were unaware the morpheme was already part of Vietnamese. Typically, Chinese terms introduced through characters retained their Sino-Vietnamese pronunciations, while those whose primary realization depended more on speech were assimilated to native pronunciation habits (Hai 1974:2).

    Even after the Chinese occupation ended in the tenth century, the influence of Chinese language and writing continued to be felt through Vietnam’s bureaucrats and scholar elite, for whom classical Chinese remained the preferred medium of expression. What had until then been haphazard, continuous borrowing of Chinese vocabulary gave way to systematic, deliberate adjustments to the Sino-Vietnamese lexicon. Pronunciations of Sinitic terms, based as they were on a written medium that had become severed from colloquial Chinese language, were read in a fixed (albeit Vietnamese-like) fashion. These latter practices, while relevant only to the 3 to 5 percent of the population who were literate (DeFrancis 1977:19), succeeded in establishing character-based, phonetically marginal Sinitic as the vehicle through which new upper-level vocabulary was formed.

    At the end of the nineteenth and the beginning of the twentieth centuries, the number of new Sino-Vietnamese words coming into Vietnamese directly or indirectly from Chinese increased under the pressure to create new terms for new Western concepts. Whereas before, poorly assimilated Sino-Vietnamese terms were confined chiefly to diplomacy and Buddhism, these newer terms were scientific and intellectual, and hence found their way into the common language through newspapers and school textbooks (Hai 1974:41). The result was the same sort of transformation that simultaneously characterized the other major languages of East Asia: an expansion of the identifiably Sinitic part of the lexicon and a growth in the number of people who had to deal with it.

  50. Victor Mair said,

    February 19, 2019 @ 7:46 pm

    From Bob Ramsey:

    A nice compilation, Victor. But one thing I'd like to remind you of is that body of "round-trip words" you and I have often talked about. (That term "round-trip word" is your coinage isn't it? [VHM: yes, it's mine] –at least that's what I always tell my students!) But what that body of vocabulary means for your lists is that there is some confusion about what can legitimately be called "Chinese"; much of the vocabulary of modern life in all of these East Asian countries (China, Japan, Korea–and, I assume–also Vietnam) are only Chinese in that "round-trip" sense. They were, after all, actually Japanese coinages and thus more properly thought of that way. I realize I'm not saying anything new to you, but I wonder if this background information might not be worth reminding people about.

  51. Jerry Friedman said,

    February 19, 2019 @ 10:19 pm

    Philip Anderson: Although English tends to borrow written words and then pronounce them as if they were English (“Wipers” for Ypres), that isn’t the case for all languages.

    I think that sometimes we do that and sometimes we adapt the pronunciation instead of following the spelling, as in "ballet", "faux", and "patio". Sometimes both pronunciations exist, as in British "valet". And sometimes we pronounce borrowed words as if they were from a different foreign language, as in "machete".

  52. Peter Grubtal said,

    February 20, 2019 @ 3:36 am

    Victor Mair : "For Japanese the situation is even worse…."

    Take a look at possible on-readings (vocalisations) of kanji 生 (sei, shou, jou…) for example.

    In Japanese the problem is compounded because sometimes the same character has been coopted into the language in different epochs, and the (then changed) Chinese pronunciation adds another reading to the character in Japanese.

    But more often than not (for nouns at least), two kanji are combined in jukugo, which does diminish the problems of homophony.

    The same has happened to some extent in English with borrowings from French. The words chase and catch are derived from the same French word, but catch came with the Normans and their Norman French pronunciation, and chase later (with the Angevins, I think) and their central French pronunciation. Other examples escape me for the time being, but do exist, I'm almost sure.

  53. Jenny Chu said,

    February 20, 2019 @ 5:17 am

    @Victor Mair – Thank you for the clarification and especially the reference from Hannas!

    My main point was not to define precisely the meaning & origin of dai hoc – but to highlight that the Vietnamese word for university is not, for example "u ni vec xi te" (as it might be when borrowed from French) nor "u ni ve xi ti" (as it might be when borrowed from English) – indicating that it wasn't during the French colonial period nor the recent era of rising popularity of English that people started talking about universities.

  54. David Marjanović said,

    February 20, 2019 @ 5:35 am

    Here's a hypothesis identifying a Vietnamese borrowing into Sinitic (Old But Not Too Old – first attested in the 2nd/3rd century).

    neogrammarian-style fixed sound laws can indeed be thrown out of the window

    That's a controversial statement; I've read some passionate defenses of neogrammarian exceptionlessness in the last few years, and even lamentations that we've been missing interesting information because important people (or entire subfields) have neglected looking for strict sound laws. Uralic linguistics was revolutionized in the 1980s when the previously dominant idea that sound changes are some kind of semi-regular phenomenon that routinely have two different randomly distributed outcomes was thrown out. Could you explain your stance in more detail?

    Persian has been a written language since the sixth century B.C. Only Chinese, Greek, and Latin have comparable histories of literacy.

    Aramaic and Egyptian/Coptic come to mind; they, Greek and Sinitic have longer written histories than Persian (some 3000 years in each case).

    With regard to the readability of thousand-year-old Icelandic texts "without any specialized training". I suppose it's true; one does need training, but it just isn't all that specialized. It's the training one gets in learning to read and write Icelandic. Since the literary standard is defined by those old texts, what it means to be literate is to be able to read those texts.

    The spelling system carefully hides almost all sound changes that have happened since then, too. There've been plenty, and some of them are pretty weird.

  55. Chris Button said,

    February 20, 2019 @ 7:54 am

    @ Jim Breen

    Interesting to see the old chestnut of ありがとう deriving from Portuguese popping up again. I suspect it's partly driven by the common "arri-GAR-to" mispronunciation, which leads people to think it's associated with obrigado.

    I think that's right. Portuguese "obrigado" aligns well with an English "arigato" in that regard. The stress placement in an English pronunciation of "arigato" reminds me of an earlier discussion on LLog regarding "onigiri" and how English speakers tend to say "onigiri". It's interesting how it is not the higher pitch in Japanese on "arigato" nor in "onigiri" that is being associated with the stress in English.

  56. Victor Mair said,

    February 20, 2019 @ 9:30 am

    "All languages borrow by reinterpreting the written form of a word, if they think they can understand it."

    Enkh Erdene is reported not to understand English.

    There are lots of illiterate individuals and groups in the world who borrow words from other languages without having a clue about how those words are written.

  57. Brian Spooner said,

    February 20, 2019 @ 10:14 am

    In response to the doubts expressed above about the proportion of the population of Iran that speaks Turkish at home, let me say that when I was spending a lot of time there in the 1960s and 70s, wandering about all over the country, I found people speaking Turkish not only in the Azeri areas in the northwest but all the way from the Turkish border through the Alborz mountains to a point 300 miles east of Tehran, also in the northeast in Bojnord and parts of northern Khorasan, and of course the Qashqai in the south, as well as here and there in other parts of the country.

  58. David Marjanović said,

    February 20, 2019 @ 5:17 pm

    I'm constantly shocked by how many Swedish loan words there are in Finnish.

    Swedish, and Old Norse, and "Runic Norse", and Proto-Northwest Germanic, and Proto-Germanic, and Pre-Germanic stretching almost all the way down to Proto-Indo-European! Sometimes, Finnish preserves information about Germanic sound changes that is not preserved in Germanic itself.

    (Plus, miekka "sword" is from East Germanic.)

    It's interesting how it is not the higher pitch in Japanese on "arigato" nor in "onigiri" that is being associated with the stress in English.

    It probably is by those who have actually heard the words often enough. But if all you know is that a word has four syllables, Germanic word-root stress brings with it the interpretation that such a word is probably a compound of two stems of two syllables each… and English has a tendency toward penultimate stress, which is probably related.

  59. Bathrobe said,

    February 21, 2019 @ 2:39 am

    Enkh Erdene's performance is now online at a different site, and you can hear both his original performance in Mongolia and his performance in America. There is an amazing improvement. I would not be surprised if he had got intensive coaching:

    https://nextshark.com/mongolian-cowboy-worlds-best/

    There are lots of illiterate individuals and groups in the world who borrow words from other languages without having a clue about how those words are written.

    Both written and spoken routes exist. When much of English educated vocabulary was borrowed, it looks like it was borrowed in the written form, thus the spelling determines the pronunciation (although I'm sure it's not quite as simple as that). That's why we say ejukayshn, based on the spelling 'education' rather than edukasyon, which it might have been if we'd borrowed it by ear from the French.

    Needless to say, there are plenty of borrowings between languages via the ear route. Much Mongolian borrowing from Chinese is like that, which can bring about radical differences from the original Chinese pronunciation. But Japanese zurōsu is my favourite example, being pretty clearly borrowed aurally from English 'drawers' in the 19th century. If they'd borrowed it visually it would have turned out something like dorowāzu.

  60. Rodger C said,

    February 21, 2019 @ 8:00 am

    Japanese zurōsu

    Would I be right to suppose that that started out in the 19th century as dzurōsu?

  61. Bathrobe said,

    February 21, 2019 @ 8:49 am

    Is your point that the Japanese analysed 'dr-' as ヅロ dzuro-, and thus came up with that particular rendering? Yes, this would match ツリー tsurii for 'tree'. It is possible that it conformed with Meiji-era transliteration standards, although I'm not sure that the rendering of ヅ as 'dzu' in Roman letters was anything but a transliteration convention. ズ today is still pronounced 'dzu'. The ending still suggests that they were cleaving fairly closely to the perceived pronunciation rather than the written form.

    Another Meiji era word was, I believe, kameya for 'dog'. That's because they heard 'Come here!' as 'Kameya!'. It's quite accurate phonetically, much more so than the modern standardised form learnt by Japanese, kamu hiya.

  62. Chris Button said,

    February 21, 2019 @ 11:16 am

    @ David Marjanović

    Yes – I speculated along somewhat similar lines in two comments on the "onigiri" post starting here:

    http://languagelog.ldc.upenn.edu/nll/?p=40591#comment-1556885

  63. GALESL said,

    February 21, 2019 @ 12:05 pm

    Fascinating article and comments. I was wondering: any info on borrowing in signed languages? Or even "tactile languages" (if they exist)? (Are things like Braille always used to represent only spoken/graphically-written languages?)

  64. Heino said,

    February 21, 2019 @ 7:27 pm

    Alan Booth’s ‘The Roads to Sata’ (Penguin Books, 1987) tells of this Japanese-speaking writer’s 2,000-mile walk through the length of Japan. On page 111 he writes:

    I sat with the old gentleman in his living room and watched a television drama. The credits at the end of the drama caught his attention.
    “What is direkutaa?” he asked me.
    “It’s an English word – director,” I explained. “In Japanese you’d say kantoku.”
    “Then why don’t they say kantoku? It’s supposed to be a Japanese program.”
    The news came on.
    “And what is kyampeen?”
    “It’s another English word – campaign. It means undo.”
    “And what is a konsensasu?”
    “A consensus. Goi.”
    The old man sighed and shook his head.
    “It’s getting to the point,” he muttered, “where to understand Japanese television you need to be a gaijin.”

  65. Bathrobe said,

    February 21, 2019 @ 8:02 pm

    @ Heino

    Well, yes, but the Japanese have always been conscious of the need not to leave the old folks too far behind. Katakana expressions in the Japanese press are often followed by kanji explanations in parentheses. It's a token nod in the direction of people who don't keep up with these new-fangled expressions.

  66. Bathrobe said,

    February 21, 2019 @ 8:13 pm

    @ Rodger C

    Sorry if my reply sounded garbled. I wasn't sure where you were coming from. At any rate, the pronunciation of ズ is still /dzu/, so no change there. But in the Meiji there was a clear written differentiation between ズ (zu) and ヅ (du), even though both were pronounced the same. This written differentiation has largely been abandoned in the modern orthography.

    Perhaps there was greater consciousness in the Meiji era of the kana ダ ヂ ヅ デ ド row as equivalent to da di du de do — after all, they did render 'diesel' as ヂーゼル (diizeru, pronounced jiizeru) — but I'm still doubtful that ズロース zurōsu (or ヅロース durōsu) was a result of the same process.

  67. Kristian said,

    February 22, 2019 @ 12:49 pm

    @Ian
    There are lots of Swedish loan words in Finnish, but they don't make up nearly as large a percentage of an average text as you suggest. And the percentage of English loan words in Finnish must be rather small.

    I have read that between a third and half of the Finnish basic vocabulary is from loans, but many of these are very obscure and from older Indo-European languages, as David Marjanović comments, so one wouldn't know unless one looks in an etymological dictionary or happens to be a scholar in that field.

  68. Levantine said,

    February 23, 2019 @ 3:54 am

    Regarding Brian Spooner's assertion that "the Ottomans used Persian as their language of administration", this stopped being true during the sixteenth century.

  69. Levantine said,

    February 23, 2019 @ 4:07 am

    It's also interesting to note that, in the later period, the Ottomans themselves coined several Arabic terms that are today current in both Turkish and Arabic. These include jumhuriyya/cumhuriyet (republic) and madaniyya/medeniyet (civilisation).

  70. David Marjanović said,

    February 23, 2019 @ 10:13 am

    Yes – I speculated along somewhat similar lines in two comments on the "onigiri" post starting here:

    Thanks, I agree with all that!
