Uri Tadmor has been kind enough to respond to some of the comments on yesterday's post "Borrowability", which described the Loanword Typology project at the Max Planck Institute for Evolutionary Anthropology.
[Guest post by Uri Tadmor]
Thanks a lot, Mark, for featuring the Loanword Typology Project, and thanks to all those who contributed comments — we really appreciate the feedback.
A few clarifications may be in order. First, please disregard the May results — they are really outdated now. The lists from our LSA presentation two days ago are:
10 least borrowed meanings (overall):
10 least borrowed meanings (based on unanalyzable words only):
to lie down
10 most borrowed meanings (which happen to be on our list):
To fully understand what these lists mean, please see the specific comments below.
On January 10, 2009 @ 12:36 pm Joseph Dart said:
The most borrowable word meanings were kangaroo (100%). How is that 100% even possible, given that Chinese invented/calqued its own word to refer to them (one which literally translates as "pocket mouse")?
100% would not mean 100% in all the world’s languages, of course, just in the 40 project languages. However, these 40 languages constitute a decently representative sample of the world’s geographical, genealogical, sociolinguistic, and typological diversity. Our results (as Mark mentioned) are preliminary; we don’t have all the data in yet (e.g. the Chinese database is one of a handful of incomplete language databases in our consolidated database) and we are still tweaking our algorithms. At any rate, the "most borrowed meanings" list is not terribly meaningful, since it’s a function of the meanings on our list. We could have chosen from thousands of other meanings. The more significant lists are those of "least borrowed meanings", because it is unlikely that we have overlooked a very basic meaning (our list includes all the items on Buck’s I-E list, the Intercontinental Dictionary Series [IDS] list, and the Swadesh list, plus many other items).
On January 10, 2009 @ 12:43 pm bulbul said,
So let me get this straight: one of the top five most loanword-friendly languages is a creole and one of the top five most loanword-resistant languages is also a creole?
Yes. Loanwords in creoles presented special challenges for our project, but we really wanted to include some creoles, and so worked around the problems. We considered as loanwords those words which were borrowed after the creole’s crystallization from any language (including later borrowings from the lexifier, if this can be demonstrated, e.g. if the word denotes a concept that was not present when the creole was formed, or if it exhibits a sound correspondence pattern that indicates the word was borrowed from the lexifier after the creole had been formed).
Well shave my back and call me a beaver. I’d love to see the same study done on Maltese.
Actually we did commission a study of Maltese, but the contributor withdrew from the project for lack of time (completing a database for the Loanword Typology Project takes many hundreds of hours of work, so we completely understand if some contributors couldn’t find the time to complete the task).
On January 10, 2009 @ 1:25 pm Craig Russell said,
My problem is this: how do you determine what "THE word" for a meaning in a language is?
This is one area where our computerized database goes far beyond what was possible in traditional "word lists" (actually meaning lists) used for cross-linguistic comparison. Such traditional lists can normally accommodate only a 1:1 correspondence (one meaning, one word). As Mark mentioned, our electronic database allows a "many-to-many" correspondence between meanings and words: each meaning can be linked to any number of words, and each word can be linked to any number of meanings. So there is no need to determine what "THE word" for a meaning in a language is; one can simply link a meaning to several words.
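For readers who think in code, the many-to-many idea can be sketched roughly as follows. This is only an illustration, not the LWT database schema: the `Lexicon` class, its method names, and the English sample entries are all invented for the example.

```python
from collections import defaultdict


class Lexicon:
    """Toy many-to-many store of meanings and words (hypothetical, not the LWT schema)."""

    def __init__(self):
        # Each meaning maps to a set of words, and each word to a set of meanings.
        self.words_by_meaning = defaultdict(set)
        self.meanings_by_word = defaultdict(set)

    def link(self, meaning, word):
        """Record that `word` is a counterpart of `meaning`."""
        self.words_by_meaning[meaning].add(word)
        self.meanings_by_word[word].add(meaning)

    def words_for(self, meaning):
        """All counterparts of a meaning; empty if the language has none."""
        return sorted(self.words_by_meaning.get(meaning, set()))

    def meanings_for(self, word):
        """All meanings a single word is linked to."""
        return sorted(self.meanings_by_word.get(word, set()))


# Invented sample entries, for illustration only.
lex = Lexicon()
lex.link("the stone", "stone")
lex.link("the stone", "rock")   # one meaning, two words
lex.link("the rock", "rock")    # one word, two meanings

print(lex.words_for("the stone"))
print(lex.meanings_for("rock"))
print(lex.words_for("the kangaroo"))  # no counterpart recorded
```

Because a link is stored in both directions, neither side has to be singled out as "THE word" or "THE meaning", and a meaning with no counterpart simply has no links.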
How about concepts for which there is no one word in a language?
No problem at all — in such a case no counterpart has to be provided. However, in addition to identity or near-identity between the LWT meaning and the meaning of the counterpart word in the project language, we also allow for more complex semantic relationships. You can read more about this in the LWT guidelines on our website.
On January 10, 2009 @ 1:34 pm m-vic said,
Did they mention any correlation with how receptive the various cultures are to foreign influence?
The LWT project consists of two parts: an online database (coming up soon) and a book with 40 case studies on the various project languages plus some general chapters (coming later this year). In our 20-minute LSA presentation, we could only summarize our most important results, and could not go into detail about individual languages. However, your question is very interesting, and the sociolinguistic background of borrowing situations is in fact discussed in our book (to be published by Mouton later this year).
I also wonder why Mandarin is only at 1% – with such a widely spoken language, in a multilingual country especially, you might expect to find dialect borrowings, or loans from related languages spoken nearby. But those are probably harder to detect.
Dialect borrowings do count as loanwords for our project. Indeed, these are hard to detect, but since all our contributors are experts in the languages on which they report, we hope they did not miss too many. Also, our Chinese database is not quite complete yet. But indeed, Chinese speakers are averse to lexical borrowing, and prefer coining their own terms for introduced concepts (including many place names and even personal names).
On January 10, 2009 @ 2:23 pm bulbul said,
I have to wonder about the choice of some concepts / words. In that paper on Selice Romani, Viktor Elšík provides a list of loans which includes words like
cukornádo 'sugar cane’
jaguári * 'jaguar’
and so forth. Those are not only imported words, but also imported items and thus with little relevance to lexicostatistics and glottochronology.
They may be very relevant to sociolinguistics and sociology, though – the fact that these particular ones were imported from Hungarian (another minority language) and not Slovak (the official and majority language) is certainly noteworthy.
The LWT word list is based on the IDS wordlist (which was in turn based on Buck’s list). Since we wanted our list to be compatible with the IDS list, we did not delete any item, however unsuitable we may have found it. But we did add quite a few items. Some of them were specific to non-European ecosystems and cultures (and were proposed by experts on the languages of those areas). They were added precisely in order to dilute the Eurocentricity of the IDS list.
On January 10, 2009 @ 3:14 pm Sky Onosson said,
Perhaps if they also borrowed the word, it was counted as an instance of borrowing, whether or not there were additional calques in the language.
If a meaning had more than one counterpart (word) in a project language, each one was assigned a fractional value. For example, in a case where a meaning had two counterparts, one of which was a loanword and the other was not, it would count as 0.5 loanword and 0.5 non-loanword.
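The arithmetic can be sketched as below. Two things here are assumptions rather than a description of our actual algorithms: that counterparts are always weighted equally (1/n each, generalizing the two-counterpart example above), and that a meaning's overall figure is a plain average over the languages that have at least one counterpart. The function names are made up for the illustration.

```python
from fractions import Fraction


def borrowed_score(counterparts):
    """Fraction of a meaning's counterparts in one language that are loanwords.

    `counterparts` is a list of booleans, True meaning "this counterpart is a
    loanword". Equal weighting (1/n each) is an assumption. Returns None when
    the language has no counterpart for the meaning.
    """
    if not counterparts:
        return None
    return Fraction(sum(counterparts), len(counterparts))


def meaning_borrowing_rate(per_language):
    """Average the per-language scores for one meaning (assumed aggregation).

    `per_language` maps a language name to its list of loanword flags.
    Languages without a counterpart are skipped.
    """
    scores = [
        score
        for score in (borrowed_score(flags) for flags in per_language.values())
        if score is not None
    ]
    if not scores:
        return None
    return sum(scores, Fraction(0)) / len(scores)


# One loanword and one non-loanword counterpart: counts as 0.5 loanword.
print(borrowed_score([True, False]))

# Hypothetical languages A, B, C; C has no counterpart and is skipped.
print(meaning_borrowing_rate({"A": [True], "B": [True, False], "C": []}))
```

Exact fractions are used so that a case with three counterparts is not distorted by floating-point rounding.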
On January 10, 2009 @ 3:26 pm J Greely said,
When the native experts determined the correct words for each meaning, did they discount loanwords that existed alongside native words?
No. As already mentioned, it was possible to have several counterparts to one meaning. Moreover, contributors were asked to specify in a special field whether each loanword was an instance of replacement, coexistence, or insertion (where the word was borrowed together with a newly introduced concept). Provided they knew the answer, of course — the fourth option was "no information".
On January 10, 2009 @ 4:22 pm bulbul said,
And as for "the married woman", according to the guidelines (ibid.): "The word form can be a single word, or a phrasal expression, which must be a fixed one." It is not clear to me from the examples they give ('feather' expressed as 'hair of bird' vs. 'to make love' meaning 'to have sex') what "the married woman" is.
Personally I would not consider "the married woman" to be a fixed phrase in English: there is nothing about it (e.g. unexpected compositional semantics, stress shift, etc.) to indicate it is anything but a simple noun phrase. Moreover, in our LSA presentation we explained that in order to eliminate analyzable expressions such as compounds and phrases, which normally would not count as loanwords anyway, we also performed a tally on (synchronically) unanalyzable words only, and obtained rather different results (with no meanings of the type "married woman", which are not lexicalized in many languages). See the lists at the beginning of this post.