Nicola Davis, "Bat chat: machine learning algorithms provide translations for bat squeaks", The Guardian 12/22/2016
It turns out you don’t need to be Dr Doolittle to eavesdrop on arguments in the animal kingdom.
Researchers studying Egyptian fruit bats say they have found a way to work out who is arguing with whom, what they are squabbling about and can even predict the outcome of a disagreement – all from the bats’ calls.
“The global quest is to understand where human language comes from. To do this we must study animal communication,” said Yossi Yovel, co-author of the research from Tel Aviv University in Israel. “One of the big questions in animal communication is how much information is conveyed.”
The differences between the calls were nuanced. “What we find is there are certain pitch differences that characterise the different categories – but it is not as if you can say mating [calls] are high vocalisations and eating are low,” said Yovel.
The results, he says, reveal that even everyday calls are rich in information. “We have shown that a big bulk of bat vocalisations that previously were thought to all mean the same thing, something like ‘get out of here!’ actually contain a lot of information,” said Yovel, adding that analysing further aspects of the bats’ calls, such as their patterns and stresses, could reveal even more detail encoded in the squeaks.
Kate Jones, professor of ecology and biodiversity at University College, London described the findings as exciting. “It is like a Rosetta stone to getting into [the bats’] social behaviours,” she said of the team’s approach. “I really like the fact that they have managed to decode some of this vocalisation and there is much more information in these signals than we thought.”
The meaning of phrases like "rich in information", "a lot of information" and "much more information than we thought" is obviously contextually evaluated — a "big mouse" is small compared to a "small ox". But we can quantify information content as "entropy" in bits, just as we can quantify size in meters or mass in kilograms. So for this morning's breakfast experiment, I thought I'd try to quantify how much information there actually is in those bat vocalizations.
The research paper documenting this work is Yosef Prat, Mor Taub & Yossi Yovel, "Everyday bat vocalizations contain information about emitter, addressee, context, and behavior", Scientific Reports 12/22/2016:
75 days of continuous recordings of 22 bats (12 adults and 10 pups) yielded a dataset of 162,376 vocalizations, each consisting of a sequence of syllables. From synchronized videos we identified the emitter, addressee, context, and behavioral response. We included in the analysis 14,863 vocalizations of 7 adult females, for which we had enough data in the analyzed contexts. […]
Egyptian fruit bats are social mammals, that aggregate in groups of dozens to thousands of individuals, can live to the age of at least 25 years, and are capable of vocal learning. We housed groups of bats in acoustically isolated chambers and continuously monitored them with video cameras and microphones around-the-clock. Over the course of 75 days, we recorded tens of thousands of vocalizations, for many of which (~15,000) we were able to determine both the behavioral context as well as the identities of the emitter and the addressee. […]
One might expect most social interactions in a tightly packed group, such as a fruit bat colony, to be aggressive. Indeed, nearly all of the communication calls of the Egyptian fruit bat in the roost are emitted during aggressive pairwise interactions, involving squabbling over food or perching locations and protesting against mating attempts. These kinds of interactions, which are extremely common in social animals, are often grouped into a single “agonistic” behavioral category in bioacoustics studies. […]
Moreover, in many bioacoustics studies, different calls are a-priori separated into categories by human-discernible acoustic features […]. Such an approach however, was impossible with our data (Fig. 1, note how aggressive calls emitted in different contexts seem and sound similar). We therefore adopted a machine-learning approach, which proved effective in recognizing human speakers, and used it to evaluate the information potential of the spectral composition of these vocalizations. We were able to identify, with high accuracy, the emitters of the vocalizations, their specific aggressive contexts (e.g., squabbling over food), and to some extent, the addressees and the behavioral responses to the calls.
I should start by saying that this strikes me as a terrific piece of work, and a model for many future studies of animal communication. The only piece that's missing is publication of the underlying raw data — annotated audio and video recordings — which we can hope will some day follow.
The authors provide tables showing the performance of their classification algorithms. Let's start with the prediction of addressee sex in their Figure 2 D, since that's a 2×2 table and simple to explain:
In this case the space of possible "messages" has two discrete options, male and female. The information content (= uncertainty) of an N-way choice is given by applying the entropy function
h = -sum(p*log2(p))
to the discrete probability distribution p over the N alternatives; the uncertainty is greatest when all the alternatives are equally probable. Thus the maximum information content of a binary choice is (courtesy of R)
> p = c(1/2,1/2)
> -sum(p*log2(p))
[1] 1
or 1 bit — this is how much information we gain by learning the outcome of the choice. A perfect classifier would leave us with no residual uncertainty. But the bat-squeak classifier is not perfect — when the true sex of the addressee was female, the table above gives us the following calculation (if we take the cited proportions as accurate estimates of the probabilities involved):
> p = c(0.68,0.32)
> -sum(p*log2(p))
[1] 0.9043815
So we're left with 0.90 bits of uncertainty, and we've gained 1.0 – 0.90 = 0.1 bit of information. If we wanted to allow for the uncertainty of the probability estimates, we could use R's entropy package, which applies appropriate shrinkage to probability estimates from counts, as follows:
> library(entropy)
> femalecounts = c(0.68,0.32)*1223
> malecounts = c(0.40,0.60)*11334
> hfemale = entropy(femalecounts, unit="log2")
> hmale = entropy(malecounts, unit="log2")
> 1-mean(c(hfemale,hmale))
[1] 0.06233397
so that we've estimated our information gain in the average case as about 0.06 bits.
For the 3×3 table of specific addressees, we get:
> pN3 = c(1/3,1/3,1/3); hN3 = -sum(pN3*log2(pN3))
> f5counts = c(0.47,0.33,0.19)*93
> f7counts = c(0.27,0.42,0.31)*218
> m2counts = c(0.20,0.19,0.61)*3578
> hf5 = entropy(f5counts,unit="log2")
> hf7 = entropy(f7counts,unit="log2")
> hm2 = entropy(m2counts,unit="log2")
> hN3-mean(c(hf5,hf7,hm2))
[1] 0.1150682
or an information gain of about 0.12 bits.
The uncertainty of a choice among 7 equally-likely alternatives is about 2.81 bits, and the "emitter" information in Figure 2 A yields an estimate of the average information gain of about 1.25 bits.
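For the record, the 2.81-bit figure is just log2(7), and the 1.25-bit estimate follows (I assume) from the same recipe used above, applied to the seven rows of the emitter table in Figure 2 A, whose counts I haven't reproduced here:

> log2(7)    # a priori uncertainty of a choice among 7 equally likely emitters
[1] 2.807355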
All of the above assumes that the classifier is not taking advantage of the fact that the distribution of categories in both the training and testing material is highly unequal. If we take the counts (column "N") in the various subparts of Figure 2 as representative, then the a priori uncertainties in bits are reduced to about:
|  | Alternatives Equally Likely | Alternatives Estimated From N |
|---|---|---|
| emittee sex (2) | 1.00 | 0.46 |
| emittee ID (3) | 1.58 | 0.47 |
| emitter ID (7) | 2.81 | 2.57 |
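As a sanity check, the first two "Alternatives Estimated From N" values follow directly from the addressee counts already quoted above (the emitter-ID row would need the seven per-emitter counts from Figure 2 A, which I haven't copied here):

> entropy(c(11334,1223), unit="log2")    # emittee sex, from the male/female addressee counts
> entropy(c(93,218,3578), unit="log2")   # emittee ID, from the counts for the three addressees

which come out to about 0.46 and 0.47 bits respectively, matching the table.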
If the bat-squeak classifiers were taking advantage of these prior probabilities, then the estimates of information gain would need to be reduced appropriately.
But in fact, in several cases the classifiers are not doing nearly as well as a prediction based on the prior probabilities — thus in the case of addressee sex, if we just guessed "male" all the time, the expected score would be 11334/(11334+1223) = 90% correct, whereas the table in Figure 2 C shows us that the classifier averaged only 64% correct. So I'm assuming that we should treat all priors as equal.
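For concreteness, that always-guess-"male" baseline works out to:

> 11334/(11334+1223)
[1] 0.9026041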
On this theory, we end up with the following estimates of information gain on the various interactional dimensions studied in this dataset:
| emitter ID (of 7) | 1.25 bits |
| emittee ID (of 3) | 0.12 bits |
The estimate of total per-squeak information conveyed, 1.97 bits, is equivalent to a vocabulary of 2^1.97 = 3.9 words.
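The bits-to-vocabulary-size conversion is just exponentiation:

> 2^1.97
[1] 3.917681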
Is this surprising? Not at all, I think.
First, as the authors point out, there's a large literature showing that many animals can recognize other members of their species quite accurately from their vocalizations. See (among many other articles) Dorothy Cheney and Robert Seyfarth, "Vocal recognition in free-ranging vervet monkeys", Animal Behaviour 1980.
Second, it's also well established that animal vocalizations (and other communicative displays) are different in different contexts of interaction.
And third, it's well known that animal vocalization can differ along a dimension of intensity or emphasis.
So what's new in this case? Well, there's the commendable methodology. But it's not new that a single category of vocalization contains some information about the squeaker, or that vocalizations directed to different (classes of) individuals might be somewhat different, or that these vocalizations might be modulated in ways that make them more or less likely to accompany interactions with one kind of outcome or another.
The novel aspect seems to be the demonstration that these vocalizations, which would be assigned to the same category by human impressionistic classification, can to some extent be assigned to different types of competitive interaction (feeding/mating/perch/sleep). But it's not clear whether that's because there are really different categories of vocalization in this case, or rather because the various different activity types are associated with different kinds of gradient modulation, say because of typical differences in posture or motion or overall arousal.
The good news is that with large digital datasets of this type, it should be possible to explore such questions further.
Update — I should note that summing the information gain of the various individual classifiers is probably the wrong thing to do, since there is probably quite a bit of collinearity among the categories involved. The result of a more careful analysis would be to lower the estimate of overall information gain.
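A made-up extreme case shows why: if two binary attributes were perfectly correlated, learning both would be worth one bit, not two:

> pjoint = c(1/2, 0, 0, 1/2)      # hypothetical joint distribution in which the two labels always agree
> pjoint = pjoint[pjoint > 0]     # drop the zero cells before taking logs
> -sum(pjoint*log2(pjoint))       # joint uncertainty is 1 bit, not 1 + 1 = 2 bits
[1] 1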