More on Loanword Typology


Uri Tadmor has been kind enough to respond to some of the comments on yesterday's post "Borrowability", which described the Loanword Typology project at the Max Planck Institute for Evolutionary Anthropology.

[Guest post by Uri Tadmor]

Thanks a lot, Mark, for featuring the Loanword Typology Project, and thanks to all those who contributed comments — we really appreciate the feedback.

A few clarifications may be in order. First, please disregard the May results — they are really outdated now. The lists from our LSA presentation two days ago are:

10 least borrowed meanings (overall):

3SG pron
to lose

10 least borrowed meanings (based on unanalyzable words only)

3SG pron
to rise
to stand
to lie down
to run
the nose
to go

10 most borrowed meanings (which happen to be on our list)

the school
the bank
the machine
the bus
the motor
the coffee
the television
the tea
the towel
the pen

To fully understand what these lists mean, please see the specific comments below.

On January 10, 2009 @ 12:36 pm Joseph Dart said:

The most borrowable word meanings were kangaroo (100%). How is that 100% even possible, given that Chinese invented/calqued its own word to refer to them (one which literally translates as "pocket mouse")?

100% would not mean 100% in all the world’s languages, of course, only in the 40 project languages. However, these 40 languages constitute a decently representative sample of the world’s geographical, genealogical, sociolinguistic, and typological diversity. Our results (as Mark mentioned) are preliminary; we don’t have all the data in yet (e.g. the Chinese database is one of a handful of incomplete language databases in our consolidated database) and we are still tweaking our algorithms. At any rate, the "most borrowed meanings" list is not terribly meaningful, since it depends on which meanings happen to be on our list. We could have chosen from thousands of other meanings. The more significant lists are those of "least borrowed meanings", because it is unlikely that we have overlooked a very basic meaning (our list includes all the items on Buck’s I-E list, the Intercontinental Dictionary Series [IDS] list, and the Swadesh list, plus many other items).

On January 10, 2009 @ 12:43 pm bulbul said,

So let me get this straight: one of the top five most loanword-friendly languages is a creole and one of the top five most loanword-resistant languages is also a creole?

Yes. Loanwords in creoles presented special challenges for our project, but we really wanted to include some creoles, and so worked around the problems. We considered as loanwords those words which were borrowed from any language after the creole’s crystallization (including later borrowings from the lexifier, if this can be demonstrated, e.g. if the word denotes a concept that was not present when the creole was formed, or if it exhibits a sound correspondence pattern indicating that it was borrowed from the lexifier after the creole had been formed).

Well shave my back and call me a beaver. I’d love to see the same study done on Maltese.

Actually we did commission a study of Maltese, but the contributor withdrew from the project for lack of time (completing a database for the Loanword Typology Project takes many hundreds of hours of work, so we completely understand if some contributors couldn’t find the time to complete the task).

On January 10, 2009 @ 1:25 pm Craig Russell said,

My problem is this: how do you determine what "THE word" for a meaning in a language is?

This is one area where our computerized database goes far beyond what was possible in the traditional "word lists" (actually meaning lists) used for cross-linguistic comparison. Such traditional lists can normally accommodate only a 1:1 correspondence (one meaning, one word). As Mark mentioned, our electronic database allows a "many-to-many" correspondence between meanings and words: each meaning can be linked to any number of words, and each word can be linked to any number of meanings. So there is no need to determine what "THE word" for a meaning in a language is; one can simply link a meaning to several words.
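To make the "many-to-many" idea concrete, here is a minimal sketch in Python. It is only an illustration and not the project's actual database schema: the class name `Lexicon`, its methods, and the English sample links are all hypothetical.

```python
from collections import defaultdict


class Lexicon:
    """Toy many-to-many store linking meanings and words (hypothetical design)."""

    def __init__(self):
        # Two indexes so lookups work in both directions.
        self.words_by_meaning = defaultdict(set)
        self.meanings_by_word = defaultdict(set)

    def link(self, meaning, word):
        """Record that `word` is a counterpart of `meaning`."""
        self.words_by_meaning[meaning].add(word)
        self.meanings_by_word[word].add(meaning)

    def counterparts(self, meaning):
        """All words linked to a meaning; empty if none was provided."""
        return sorted(self.words_by_meaning.get(meaning, ()))

    def meanings(self, word):
        """All meanings a word is linked to."""
        return sorted(self.meanings_by_word.get(word, ()))


# Illustrative links only, not taken from the project data.
lexicon = Lexicon()
lexicon.link("to run", "run")
lexicon.link("to run", "sprint")
lexicon.link("to flow", "run")
```

With this layout, `lexicon.counterparts("to run")` yields two words, `lexicon.meanings("run")` yields two meanings, and a meaning with no counterpart simply returns an empty list.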

How about concepts for which there is no one word in a language?

No problem at all — in such a case no counterpart has to be provided. However, in addition to identity or near-identity between the LWT meaning and the meaning of the counterpart word in the project language, we also allow for more complex semantic relationships. You can read more about this in the LWT guidelines on our website.

On January 10, 2009 @ 1:34 pm m-vic said,

Did they mention any correlation with how receptive the various cultures are to foreign influence?

The LWT project consists of two parts: an online database (coming up soon) and a book with 40 case studies on the various project languages plus some general chapters (coming later this year). In our 20-minute LSA presentation, we could only summarize our most important results, and could not go into detail about individual languages. However, your question is very interesting, and the sociolinguistic background of borrowing situations is in fact discussed in our book (to be published by Mouton later this year).

I also wonder why Mandarin is only at 1% – with such a widely spoken language, in a multilingual country especially, you might expect to find dialect borrowings, or loans from related languages spoken nearby. But those are probably harder to detect.

Dialect borrowings do count as loanwords for our project. Indeed, these are hard to detect, but since all our contributors are experts in the languages on which they report, we hope they did not miss too many. Also, our Chinese database is not quite complete yet. But indeed, Chinese speakers are averse to lexical borrowing, and prefer coining their own terms for introduced concepts (including many place names and even personal names).

On January 10, 2009 @ 2:23 pm bulbul said,

I have to wonder about the choice of some concepts / words. In that paper on Selice Romani, Viktor Elšík provides a list of loans which includes words like

cukornádo ‘sugar cane’
lagúna ‘lagoon’
papagáji ‘parrot’
jaguári * ‘jaguar’
tekňéšbíka ‘turtle’

and so forth. Those are not only imported words, but also imported items and thus with little relevance to lexicostatistics and glottochronology.

They may be very relevant to sociolinguistics and sociology, though – the fact that these particular ones were imported from Hungarian (another minority language) and not Slovak (the official and majority language) is certainly noteworthy.

The LWT word list is based on the IDS wordlist (which was in turn based on Buck’s list). Since we wanted our list to be compatible with the IDS list, we did not delete any item, however unsuitable we may have found it. But we did add quite a few items. Some of them were specific to non-European ecosystems and cultures (and were proposed by experts on the languages of those areas). They were added precisely in order to dilute the Eurocentricity of the IDS list.

On January 10, 2009 @ 3:14 pm Sky Onosson said,

Perhaps if they also borrowed the word, it was counted as an instance of borrowing, whether or not there were additional calques in the language.

If a meaning had more than one counterpart (word) in a project language, each one was assigned a fractional value. For example, in a case where a meaning had two counterparts, one of which was a loanword and the other was not, it would count as 0.5 loanword and 0.5 non-loanword.
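The fractional counting can be sketched in a few lines of Python. This is an illustration of the arithmetic described above, under the assumption that all counterparts of a meaning are weighted equally; the function name `loanword_share` is hypothetical and is not the project's own code.

```python
from fractions import Fraction


def loanword_share(counterparts):
    """Return the loanword fraction for one meaning in one language.

    `counterparts` is a list of booleans, one per counterpart word:
    True if that word is a loanword, False otherwise. Each counterpart
    gets an equal weight of 1/n (an assumption of this sketch).
    Returns None when the meaning has no counterpart at all.
    """
    if not counterparts:
        return None
    weight = Fraction(1, len(counterparts))
    return sum((weight for is_loan in counterparts if is_loan), Fraction(0))


# Two counterparts, one borrowed and one not: 0.5 loanword, 0.5 non-loanword.
share = loanword_share([True, False])
non_loan_share = 1 - share
```

Exact fractions are used here rather than floats so that shares such as 1/3 add up without rounding error when many meanings are tallied.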

On January 10, 2009 @ 3:26 pm J Greely said,

When the native experts determined the correct words for each meaning, did they discount loanwords that existed alongside native words?

No. As already mentioned, it was possible to have several counterparts to one meaning. Moreover, contributors were asked to specify in a special field whether each loanword was an instance of replacement, coexistence, or insertion (where the word was borrowed together with a newly introduced concept). Provided they knew the answer, of course — the fourth option was "no information".

On January 10, 2009 @ 4:22 pm bulbul said,

And as for "the married woman", according to the guidelines (ibid.): "The word form can be a single word, or a phrasal expression, which must be a fixed one." It is not clear to me from the examples they give (‘feather’ expressed as ‘hair of bird’ vs. ‘to make love’ meaning ‘to have sex’) what "the married woman" is.

Personally I would not consider "the married woman" to be a fixed phrase in English — there is nothing about it (e.g. unexpected compositional semantics, stress shift, etc.) to indicate it is anything but a simple noun phrase. Moreover, in our LSA presentation we explained that in order to eliminate analyzable expressions such as compounds and phrases, which normally would not count as loanwords anyway, we also performed a tally on (synchronically) unanalyzable words only, and received rather different results (with no meanings of the type "married woman" which are not lexicalized in many languages). See the lists at the beginning of this post.


  1. m-vic said,

    January 12, 2009 @ 1:55 pm

    Thanks so much for the response. I've been looking at loanwords in a particular language (Tagalog), so seeing something crosslinguistic is really interesting to me – I'll definitely keep an eye out for the book. What still strikes me about Mandarin is that, since the percentage is so low, it seems that most speakers must share that aversion to borrowing, which goes beyond official policy. This goes back to my question about cultures having various degrees of receptivity to outside influence. It's not like French or Spanish where there is some official body actively trying to block loanwords by making up things like 'correo electronico', but the actual speakers keep on saying 'email' anyway. I'd be curious as to what percentage of loanwords those languages have – I didn't see them on the old slides. So many languages, so little time.

    [(myl) The small percentage of loanwords in Chinese — and the project's preliminary data may be an under-estimate in this case — seems likely to increase in the future, given the newly widespread study of English in China, with effects like this one. ]

  2. Brian Barker said,

    January 12, 2009 @ 2:51 pm

    As the "International Year of Languages" comes to an end on 21st February, you may be interested in the contribution, made by the World Esperanto Association, to UNESCO's campaign for the protection of endangered languages.

    The following declaration was made in favour of Esperanto, by UNESCO at its Paris HQ in December 2008.

    The commitment to the campaign to save endangered languages was made, by the World Esperanto Association at the United Nations' Geneva HQ in September.

  3. Lameen said,

    January 12, 2009 @ 5:32 pm

    a. What about all the Chinese words formed by giving the characters of originally Japanese coinages Chinese readings? Are those counted as loans? (See – first reference I could find at short notice.)

    b. Resistant to borrowing though they may be, I can think of examples of top ten borrowings in both the languages I'm working on:

    * Kwarandzyey "why?" = maγạ < Berber (ma "what" +γər "at")
    * Siwi "go" = ṛaħ < Arabic ṛaħ

  4. Mark F. said,

    January 12, 2009 @ 5:39 pm

    English is known for being especially prone to borrow words, and the LWT results seem to bear that out. But I've wondered for some time whether English is *still* especially prone to borrow words, or do we just have a lot of borrowed words because of the Norman Conquest? Can the LWT data say anything about that?

  5. Gareth Rees said,

    January 12, 2009 @ 7:06 pm

    Does "3SG pron" mean "3rd-person singular genitive pronoun" ("his", "her", "its", in English)?

  6. Coby Lubliner said,

    January 12, 2009 @ 8:08 pm

    What bothers me about this exercise — unless I'm not seeing something — is the apparent assumption that the notion of a "word" is valid across languages.
    I will focus on only one aspect of this problem. In any given language a word typically has several meanings, and usually these meanings are not conveyed by a single word in other languages.

    Let me give two examples from the list of most borrowed words. "School" in English can mean 'place of learning' or 'group of followers of a doctrine or style'. But in Hebrew, as Uri Tadmor ought to know, the former meaning is given by beit-sefer (a native compound), and the latter by askola, a borrowing from Greek.

    "Kangaroo" is given in most languages by some cognate of the Australian aboriginal word. But in European Spanish canguro also means 'babysitter'.

    So, what am I missing?

  7. Jongseong Park said,

    January 13, 2009 @ 5:04 am

    I have the same question as Lameen about Chinese regarding Japanese coinages using Chinese characters that were imported into Chinese and given Chinese readings. I am inclined to see them as loanwords, but I can see how others might see them differently.

    If Chinese appears resistant to borrowing, it may be because the writing system makes it a bit of a hassle to represent foreign words phonetically in a somewhat unambiguous manner. I have a feeling more recent loanwords are used in everyday speech and informal writing than can be found in more formal writing. Informal writing may render the recent loanwords in the Roman alphabet to circumvent the difficulties of writing them phonetically in Chinese characters, but that might be discouraged in formal writing.

    On a separate note, assuming the English words 'sky' (from Old Norse) and 'school' (from Middle Dutch) count as loanwords, how would incomplete knowledge of the word histories in some languages affect the data? It may be hard to distinguish loanwords from closely related languages that go centuries back from native words.

  8. bulbul said,

    January 13, 2009 @ 10:57 am

    Prof. Liberman, Dr. Tadmor,
    thank you very much for the follow-up. Do you plan to expand the database of languages?
    As for the creoles, was I right in my assumption that in the case of Saramaccan, Portuguese and other non-English lexical items would be considered loanwords?

    note that the guidelines do not refer to 'words', but rather 'lexical meanings' or as I dubbed them in the comments to the previous post, 'concepts'. Thus beyt sefer and askola would be two different items on the list. Based on the sample sentence provided on the LWT list, it is quite obvious that the concept behind 'school' they had in mind is 'place of learning'.

  9. Etienne said,

    January 14, 2009 @ 5:55 pm

    Bulbul (and others):

    In the case of Saramaccan, actually, comparative evidence makes it much easier than in many other languages (creole and non-creole alike) to separate the borrowed from the inherited element: Saramaccan and other Creoles of Suriname (Sranan and Ndjuka) are so similar (in grammar as well as lexicon) that they plainly have a common ancestor (which some scholars believe was originally spoken in coastal West Africa): this makes it clear that the bulk of the Portuguese lexical component (and most of the African component) of Saramaccan was borrowed into a stable English-based language (which at the time may have been an expanded pidgin or a creole) from which Sranan and Ndjuka also stem.

  10. Chris said,

    January 16, 2009 @ 1:32 am

    @Lameen, Jongseong:

    there is a lot of research especially on the vocabulary that was used during the modernisation/westernisation efforts in the 19th century. Some of the terminology was "reused" from ancient texts, but of course a lot was newly coined, and many words can be traced to either a Japanese or Chinese source. Thus, if a Japanese coinage came to be used in Chinese, I would consider it as a borrowing.

  11. marie-lucie said,

    January 20, 2009 @ 8:53 pm

    I am not sure why "wife" is not considered a suitable exemplar for "married woman": until a few years ago many women would characterize themselves as being "a wife and mother" (not just "X's wife"), and a "wife" is by definition a married woman. In the natural order of things (ie in less complex societies) an adult woman is also a married woman (although she may not always be a mother). The word "wife" itself used to mean "adult woman", as in the expression "man and wife" originally meaning "man and woman" (= "couple"). If "mother" is considered a single concept (even though it implies a relationship with children), why not "wife"?
