Folktale phylogeny

Over at Languagehat's place, there's been a lively discussion of Sara Graça da Silva & Jamshid J. Tehrani, "Comparative phylogenetic analyses uncover the ancient roots of Indo-European folktales", Royal Society Open Science 1/20/2016. I'm not going to summarize that discussion, so go read "Ancient Indo-European Folktales", Languagehat 1/21/2016.

The foundation of da Silva & Tehrani's analysis is a matrix that they provide in Excel format, "recording the presence (1)/absence (0) of 275 Magic Tale types found among 50 Indo-European-speaking populations represented in the ATU [= "Aarne Thompson Uther" (myl)] International Type Index". They add a (simple Euclidean approximation to) geographic proximity, taking the location of each collected tale as (I think) the current (Eurasian) centroid of the population speaking the language in question.

They also add "language trees as a model for population histories", taking the historical-linguistic descent trees from Bouckaert et al. 2012 (as amended 2013).

All of this is then fed into a statistical model of phylogenetic descent, which is used to find a hypothesis about historical transmission and borrowing of stories that's optimal, in terms of the model.
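As a minimal sketch of two of the ingredients described above (a binary tale-by-population matrix and a straight-line distance between population centroids), something like the following would do. This is not the authors' code: the population names, tale-type labels, matrix values and coordinates are all invented for illustration, and the real analysis involves 275 tale types, 50 populations and a full phylogenetic model.

```python
import math

# Hypothetical toy data, invented for illustration only.
# Each row is a population; each column is a tale type.
# 1 = the tale type is recorded for that population, 0 = it is not.
TALE_TYPES = ["type_1", "type_2", "type_3"]
PRESENCE = {
    "pop_a": [1, 0, 1],
    "pop_b": [1, 1, 0],
    "pop_c": [0, 1, 1],
}

# Hypothetical centroids as (x, y) map coordinates in arbitrary units.
CENTROIDS = {
    "pop_a": (0.0, 0.0),
    "pop_b": (3.0, 4.0),
    "pop_c": (6.0, 8.0),
}


def euclidean_distance(a, b):
    """Straight-line distance between the centroids of populations a and b."""
    (x1, y1), (x2, y2) = CENTROIDS[a], CENTROIDS[b]
    return math.hypot(x2 - x1, y2 - y1)


def shared_tale_count(a, b):
    """Number of tale types recorded as present in both populations."""
    return sum(x * y for x, y in zip(PRESENCE[a], PRESENCE[b]))


for a, b in [("pop_a", "pop_b"), ("pop_a", "pop_c"), ("pop_b", "pop_c")]:
    print(a, b, euclidean_distance(a, b), shared_tale_count(a, b))
```

A plain Euclidean distance on map coordinates ignores the curvature of the earth, which is one sense in which it is only an approximation to geographic proximity.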

To have an informed opinion about the results, I'd want to recapitulate the analysis in full, and unfortunately the authors don't provide the data and code in a form that would make that as easy as it should be (though I think all the required pieces are there).

My wish to recapitulate the analysis is partly because that's the best way to fully understand what the model's assumptions are really doing, and also to test what happens if different aspects or versions of the data are used. It's also because, as Keith Baggerly has written ("The Importance of Reproducible Science in High-Throughput Biology: Case Studies", AAAS 2011):

One theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the most simple errors are common.

It's one of those common simple errors (retention of some missing data as zeros) that led to Bouckaert et al. 2012 requiring a formal correction.

And finally, I think that this is an interesting and consequential example of a promising style of analysis, and therefore worth studying in detail.
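To make the "missing data as zeros" kind of error concrete, here is a toy illustration. The numbers are invented, and this is not a reconstruction of what actually happened in Bouckaert et al.; it only shows how the estimated frequency of a trait shifts when "no data" is silently coded as "absent".

```python
# Hypothetical observations of one binary trait across six populations.
# 1 = present, 0 = looked for and not found, None = no data collected.
observations = [1, 0, None, 1, None, 0]


def frequency_missing_as_zero(obs):
    """Trait frequency when missing values are (wrongly) counted as absences."""
    return sum(0 if x is None else x for x in obs) / len(obs)


def frequency_excluding_missing(obs):
    """Trait frequency computed only over populations with actual data."""
    known = [x for x in obs if x is not None]
    return sum(known) / len(known)


print(frequency_missing_as_zero(observations))
print(frequency_excluding_missing(observations))
```

With these invented values, the first estimate is one third and the second is one half, so the coding choice alone changes the apparent prevalence of the trait.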

But in advance of detailed exploration, I'm skeptical that reducing each language to a point on the map yields a reasonable proxy for the probability of borrowing. The languages and populations involved have moved around a lot over the last couple of thousand years, much less the 5,000 years or more that their analysis aims to cover.

And even without large-scale geographical displacement, it's clear that trading relationships, raiding, and military adventures can play a large role in spreading stories.

I can offer one anecdotal example from personal observation. The Javanese trickster animal is Kancil the mouse deer, and several of the stories told about him are remarkably similar to Uncle Remus stories. In a field methods course when I was an undergraduate, our Javanese language consultant told us the story of Kancil and the tar baby (well, the doll covered with sticky stuff), complete with the one-paw-at-a-time punching and sticking process, and the "oh please don't throw me in the place-I-most-want-to-be" method of escape.

For a published version of the Javanese story, see R.O. Winstedt, “Some Mouse-deer Tales”, Journal of the Straits Branch of the Royal Asiatic Society, 1906, who writes “Mr. George Maxwell and others have reminded me, that one of these tales of mine bears an extraordinary resemblance to that of Brer Rabbit and the Tar Baby.”

I’ve always wondered how that story ended up in the American South and in Java. Aurelio Espinosa, “A New Classification of the Fundamental Elements of the Tar-Baby Story on the Basis of Two Hundred and Sixty-Seven Versions”, The Journal of American Folklore 1943, affirms his “belief in the India origin of the tale in the sense that India is as far back as we can trace it, and that it is not of African origin as some have believed”.

So presumably it was Indian traders who brought the story to Java and to East Africa, from where it somehow made it to one of the sources of the American slave trade. Or maybe the story was West African in origin after all, and made its way via Arab or Portuguese traders to India and Southeast Asia. But anyhow, neither linguistic phylogeny nor geographical proximity seems to have been especially important in this case…

And I'd add that Espinosa's approach, which is described as following "in general the methods employed by Bolte-Polivka and by the Finnish folklorists", has the useful property of deriving (from a single story) a large number of individual traits, whose evolution could be somewhat independently modeled. His analysis involves

five general, important and all-inclusive divisions or parts: The thief, bully or mischief-maker; the artificial or fashioned "tar-baby;" the natural "tar-baby;" the multiple-attack and stick-fast episode; and the end or outcome.

The second of these in detail:

B. The owner of the garden, food stock, bank or well, man or animal of elements A-A5, the enemy (or enemies) of the mischief-maker of elements A6, A8, or one of the rival animals of element A9, has (or have) a human figure made of tar or some other sticky substance, or of other material and has (or have) it covered with tar or some other sticky substance, and places (or place) it where the thief or mischief maker is likely to encounter it, in order to catch him.

B1. Idem. Tar-baby has food, such as cake or sweets, in his possession, usually in his hands or lap.
B2. Idem. Tar-baby has a cigar or a pipe in his mouth.
B3. Idem. Tar-baby has a deck of cards or dice in his possession.
B4. Idem. Tar-baby is a "pretty girl."
B5. The animals of element A5 send a tortoise, covered with tar or some other sticky substance, to guard the well or lake and to catch the thief and polluter of the water.
B5a. The animals of element A5 send a tortoise to guard the well or lake. It catches the thief with its mouth. [In this variant of B5 there is really no "tar-baby" at all, but the attack and catch is of the same pattern as when B5 is involved.]
B6. In order to catch a bear and other animals, a very poor couple fashion a straw-ox and cover it with tar. It becomes animated and gives the commands of element F2 to invite an attack.
B7. In order to catch the witch of element A7, the brothers of the kidnaped girl place a tarred horse at the entrance of the door of their house.
B8. Tar-baby is a fetish or witch doll. It has the power to catch and to hold.
B9. Tar-baby is a magic trap. It has the power to catch and to hold. 
B10. Tar-baby is a bucket or barrel full of tar, a large piece of butter, a tarred piece of meat, a tarred object or figure buried in the ground, or a rotted and pitchy pine branch.
B11. In order to capture covetous or sensual monkeys or other animals, hunters place sticky plasters on the ground over which they usually travel.
B12. In order to catch a thief, a bully or a mischief maker, men or animals place tar, pitch or bird lime on a gate, on a house roof, on sticks on the ground, on a stool (under which a tortoise sits as guardian), or on a stone.
B13. In order to catch a bird that has stolen the mate of another bird (element A10), the birds make a log-whale, cover it with tar, get inside of it, and go to sea.
B14. In order to catch a deer, men place esparto grass stems covered with bird lime and stuck to potatoes on the water where the animal usually drinks.

This is a much richer source of (innovated or inherited) traits than the simple "do you have the story or not" binary traits in the ATU index.
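A rough sketch of the difference: if each collected version is coded as a set of Espinosa-style element labels, versions can be compared by how many elements they share, whereas a single "has the story" flag makes them all look identical. The three versions below are invented for illustration (only the labels B, B1, B2 and B5a come from the list above), and Jaccard distance is just one convenient choice of set distance, not anything Espinosa or da Silva & Tehrani used.

```python
def jaccard_distance(a, b):
    """1 minus (shared elements / all elements); 0.0 means identical sets."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


# Hypothetical versions, each coded as a set of element labels.
version_x = {"B", "B1"}
version_y = {"B", "B2"}
version_z = {"B5a"}

# Single binary trait: does the population have a tar-baby story at all?
binary_coding = {name: 1 for name in ("version_x", "version_y", "version_z")}

print(jaccard_distance(version_x, version_y))  # share one element of three
print(jaccard_distance(version_x, version_z))  # share nothing
print(binary_coding)
```

Under the binary coding all three hypothetical versions are indistinguishable, while the element-level coding separates the two that share element B from the one that does not.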


  1. D.O. said,

    January 25, 2016 @ 9:38 am

    In what is considered to be a Ukrainian folk-tale (I am not an expert, and it is hard to tell what exactly the folk elements were after a professional writer edited the story), "The straw bull-calf", the decoy calf with sticky sides catches three different animals, which are used for ransom. I wonder whether it has enough elements to be classified with the "tar-baby" stories or not…

  2. GH said,

    January 25, 2016 @ 10:51 am

    Interesting that the method of breaking down a story into numerous traits that can be analyzed independently is referred to as coming from Finnish folklorists. As a layperson, I was only familiar with the Aarne-Thompson classification system, and it always struck me as unsatisfactory precisely because it reduces a tale to one "essential" motif, necessarily ignoring other features. In that respect, Propp's system seemed more flexible. I wonder if the method used here is a development of Aarne's approach, or a reaction to it?

  3. GH said,

    January 25, 2016 @ 11:03 am

    Ah, looking into it a little more, it seems the traits they analyzed were indeed taken from the ATU index, so there must be some aspect of this classification I've failed to grasp.

    [(myl) They indeed did use (a subset of) the ATU index, which indeed is limited to one binary trait per story. What Espinosa did (published in 1943) is independent and done in a different (and I think more interesting) way.]

  4. Sean M said,

    January 26, 2016 @ 2:41 am

    Thanks for such a respectful treatment of a topic connected to the dreaded Bouckaert et al. 2012. It seems to me that if da Silva and Tehrani have a logically valid and useful method, they will have a lot of work convincing folklorists to learn enough statistics to adopt it … quite a few people in the quantitative social sciences still have trouble thinking statistically after all! It is too far from my training and research interests for me to have an opinion.

  5. Jim B. said,

    January 26, 2016 @ 12:24 pm

    Yeah, I wondered why they used the AT(U) tale type index rather than Thompson's Motif-Index, which has finer granularity: not all versions of a tale type share the same set of motifs. (At least that's how I remember it, but maybe not.)

