Marmoset conversation


This is a guest post by Margaret Wilson.

Turn-taking is fundamental to human conversation, so the question of whether it occurs in other social animals is extremely interesting. A new paper on turn-taking in marmoset monkeys (Takahashi et al., "Coupled Oscillator Dynamics of Vocal Turn-Taking in Monkeys", Current Biology, 2013) is to be applauded for tackling this issue.

Unfortunately, though, it is not clear that their data demonstrate turn-taking in any sophisticated sense: specifically (and this is the sense embraced by the authors), entrainment of timing mechanisms between two individuals to regulate the passing of the turn. They begin by asking, "Is this a simple call-and-response ('antiphonal') behavior seen in numerous species, or is it a sustained temporal coordination of vocal exchanges as in human conversation?" They conclude that they have shown the latter, but, on my reading, all their data are compatible with simple call-and-response. What seems to be going on is that the authors have failed to appreciate just how weird human turn-taking is.

When humans take turns, there is a cyclic structure to the extremely short gaps between speakers' utterances (Sacks, Schegloff, & Jefferson, 1974; Wilson & Wilson, 2005; Wilson & Zimmerman, 1986). A between-turn gap of, say, 200 milliseconds is more likely to be broken by the second speaker at certain regular intervals (say, odd multiples of 50 ms) than during the "troughs" between those intervals. That is, short silences are not of arbitrary length, but reflect a cyclic passing back and forth of who has the "right" to speak next (Wilson & Zimmerman, 1986). The troughs represent moments when the right to speak has shifted back to the original speaker, hence the second speaker inhibits speech during those fractions of a second. And this is happening on the order of tens of milliseconds. This "structured silence" can only be explained by extremely tight coupling (entrainment) of some oscillatory mechanism in the brains of the two speakers. (For further research on this framework, see O'Dell, Nieminen & Lennes, 2012; Stivers et al., 2009.)

In contrast, Takahashi et al. analyzed marmoset exchanges of up to 60 seconds, and found a cyclic pattern of about 12 seconds. This is completely different, because it includes the actual calls of Marmoset 1, and then Marmoset 2, and then Marmoset 1 again, etc. In other words, there is an actual signal being exchanged. Cyclicity in this case simply means that the timing has some consistency to it.

Furthermore, twelve seconds (six seconds per marmoset) is more than enough time to perceive an unanticipated stimulus and generate a response to it. Neither marmoset need be "tracking" the other marmoset's timing (which is what we mean by entrainment). Instead they could be simply reacting. This IS the call-and-response theory. The authors claim that their data are identical to the human case, except for the difference in time-scale. But that difference in time-scale is everything.

With this core issue in mind, let's look in more detail at the findings. Takahashi et al. make their case for an entrainment model of marmoset turn-taking in three stages:

1) They disproved strict versions of two alternative hypotheses. The "reset" hypothesis holds that each monkey maintains a fixed interval between its own calls, and this interval is simply reset to zero by the intrusion of a call from another animal. The "inhibition" hypothesis holds that each monkey inflexibly preplans a series of calls (a minimum of three), and that a call by another animal suppresses one call in this series without altering the timing of the subsequent calls. The authors show that neither of these hypotheses fits the data. This seems fine as far as it goes, but these are very narrow hypotheses, and eliminating them doesn't force us to a human-like turn-taking account.

2) They showed counter-phased cyclicity of marmoset exchanges, as discussed above.

3) They claim to show that the marmosets are responsive to each other's timing, in the sense that each is faster or slower if the other is faster or slower. Unfortunately, it is not clear that they've shown this. Consider three calls uttered alternately by marmoset 1 and marmoset 2 (their Figure 4A), labeled M1a, M2, and M1b (p. 6). The thing to do would be to look for a correlation between the inter-call interval M1a-M2 and the inter-call interval M2-M1b.* That is, when Marmoset 2 jumps in quickly, does Marmoset 1 then jump back in quickly?

Instead, the authors looked for a correlation between M1a-M2 (marmoset 2's other-self interval), and M1a-M1b (marmoset 1's self-self interval). (The latter is normed to marmoset 1's self-self interval when alone. This is because they are basing their calculations on the phase response curve, which is used to model how an oscillator responds to a perturbing signal.) But notice that M1a-M1b contains within it the duration of M1a-M2, and will be partially determined by it, if marmoset 1 is in any sense at all responding to marmoset 2's call. The only way for there not to be a correlation between these two measures is if marmoset 1 proceeds with its own call timing (M1a-M1b), impervious to marmoset 2's call. This is a straw man, not reflective of any of the hypotheses under consideration. In short, while the authors have chosen a legitimate statistical technique, it is not clear that they have chosen the correct one for their question.
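The overlap between the two measures can be written out explicitly. As a rough sketch, assume intervals are measured onset to onset and ignore the normalization to the solo interval. Write x for the M1a-M2 interval, y for the M2-M1b interval, and T_self for the M1a-M1b interval; these symbols are introduced here only for illustration and do not appear in the paper.

```latex
\begin{align*}
T_{\text{self}} &= x + y \\
\operatorname{Cov}(x, T_{\text{self}}) &= \operatorname{Var}(x) + \operatorname{Cov}(x, y) \\
\rho(x, T_{\text{self}}) &= \frac{\sigma_x}{\sqrt{\sigma_x^2 + \sigma_y^2}} > 0
  \quad \text{if } \operatorname{Cov}(x, y) = 0 \text{ and } \sigma_x > 0
\end{align*}
```

Under these assumptions, even when marmoset 1's latency y is entirely unrelated to marmoset 2's latency x, the shared term x alone produces a positive correlation. Only the term Cov(x, y) bears on whether the animals adjust to each other's timing.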

To see that the question of mutual speeding-up or slowing-down is not captured by this analysis, note that a positive correlation would be found if marmoset 1 simply waits a fixed 6 seconds after M2, regardless of the timing of M2. Marmoset 1 would not be speeding up or slowing down according to whether marmoset 2 speeds up or slows down (except in a degenerate sense in which, when marmoset 2 speeds up its response to marmoset 1, then marmoset 1's overall cycle is thereby necessarily shortened — that is, when "speed" is defined differently for marmoset 1 and marmoset 2).
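The point can be checked with a quick simulation. The sketch below is only illustrative: the latency range for marmoset 2, the small jitter added to marmoset 1's fixed 6-second wait (needed so that the second correlation is defined), and the function names are all assumptions made here, not values or methods from Takahashi et al., and the normalization to the solo interval is omitted.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate(n_exchanges=5000, seed=0):
    """Simulate a purely reactive marmoset 1 (hypothetical parameters).

    Marmoset 2 answers M1a after an arbitrary latency. Marmoset 1 then
    waits about 6 s after M2, with jitter that is independent of how fast
    marmoset 2 answered. No timing is tracked by either animal.
    """
    rng = random.Random(seed)
    m1a_m2, m2_m1b, m1a_m1b = [], [], []
    for _ in range(n_exchanges):
        x = rng.uniform(2.0, 10.0)      # assumed latency of marmoset 2 (s)
        y = 6.0 + rng.gauss(0.0, 0.5)   # fixed wait plus assumed jitter (s)
        m1a_m2.append(x)
        m2_m1b.append(y)
        m1a_m1b.append(x + y)           # self-self interval contains x
    r_nested = pearson(m1a_m2, m1a_m1b)    # the comparison the authors used
    r_adjacent = pearson(m1a_m2, m2_m1b)   # the comparison suggested above
    return r_nested, r_adjacent


if __name__ == "__main__":
    r_nested, r_adjacent = simulate()
    print(f"M1a-M2 vs M1a-M1b: r = {r_nested:.3f}")
    print(f"M1a-M2 vs M2-M1b:  r = {r_adjacent:.3f}")
```

With these settings the first correlation should come out close to 1 and the second close to 0, even though the simulated marmoset 1 does nothing but react after a roughly fixed delay.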

To return to the big picture, none of the data presented here make a convincing case for anything other than the call-and-response hypothesis. Meaningful turn-taking might still be demonstrated with other statistical techniques, either in marmosets or in other non-human animals. And the efforts of these authors are certainly not wasted, in that they have begun to map the territory of how non-human call exchange occurs. But the conclusion that marmoset call exchange mirrors human turn-taking is simply not warranted.



  1. Anschel Schaffer-Cohen said,

    October 21, 2013 @ 8:02 am

    What do you want to bet some newspaper or magazine picks this up as "Marmosets have 'conversations', say scientists"?

  2. marie-lucie said,

    October 21, 2013 @ 8:15 am

    I think that the time between speakers in human speech is a lot more varied than this article suggests, and depends on both personality and culture. American culture puts a premium on fast response (often leading to interruption), but some other cultures expect people to think for a while before they reply.

  3. Jerry Friedman said,

    October 21, 2013 @ 8:58 am

    Ancschel Schaffer-Cohen: I'm not taking that bet.

    Marie-Lucie: By "this article" did you mean the paper by Takahashi et al. or the LL post by Margaret Wilson?

  4. Jerry Friedman said,

    October 21, 2013 @ 8:59 am

    Anschel Schaffer-Cohen: Sorry about the typo.

  5. bks said,

    October 21, 2013 @ 9:01 am

    Why do I always screw up the turn-taking when talking on a cellphone but not on a landline?

  6. Ted McClure said,

    October 21, 2013 @ 9:13 am

    Something I remember from acting classes many years ago: The audience perception of the interval between speakers is longer than the perception of the participants. To appear realistic, actors in a conversation must reply quickly without "stepping on" the other's line. This takes practice, because it is unnatural. The art is in getting the rhythm right for the audience. One of many reasons why I'm not an actor.

  7. Luke said,

    October 21, 2013 @ 10:18 am

    The cited paper by Stivers et al. actually addresses the issue of variation across different languages:

  8. Eric P Smith said,

    October 21, 2013 @ 10:45 am

    @bks: on cellphone calls there can be a considerable latency: a delay between one party speaking and the other party hearing it. It's that that screws up the turn-taking. It's typically about 300ms but it can be up to about 1 second.

    Personally however I am quite capable of screwing up the turn-taking without needing a cellphone.

  9. Victor Mair said,

    October 21, 2013 @ 11:39 am

    Taking turns to Beethoven's 5th:

  10. marie-lucie said,

    October 21, 2013 @ 12:44 pm

    Sorry, I meant the guest post, which seemed to imply that the interval applied to "human speech" in general.

  11. Stephen Hart said,

    October 21, 2013 @ 1:13 pm

    Working link for Margaret Wilson at UCSC:

  12. Bill Benzon said,

    October 21, 2013 @ 2:45 pm

    @Ted McClure: Interesting observation. I wonder if the trouble stems from the fact that actors aren't really having a conversation. They're only appearing to do so.

    @Victor Mair: A wonderful clip. People making music together are, of course, mutually entrained. And I believe we've now got empirical evidence on the point, though I don't have any citations quickly at hand.

  13. Mark Dowson said,

    October 21, 2013 @ 8:30 pm

    @Ted McClure: What I remember from improvisation classes, and later directing amateur theatre, is quite the opposite: The temptation for actors is to reply quickly, rather than exploiting the dramatic effect of silences or of long pauses before responding.

  14. Bill Benzon said,

    October 22, 2013 @ 4:37 am

    It's worth mentioning the pioneering work of psychiatrist William Condon, who was publishing on interpersonal synchronization as long ago as 1963. He published an article in Science (183: 99) in 1974: "Neonate movement is synchronized with adult speech: Interactional participation and language acquisition." I don't have that paper in front of me at the moment, but I know that in one experiment he filmed neonates (less than an hour old) while adults were talking to them and discovered that their body movements were synchronized to the speech rhythms. He also made observations where autistics failed to synchronize normally with others.

    David Hays brought Condon's work to my attention back in the 1970s, saying he thought it was of fundamental importance. Since it was Hays telling me this, I read Condon carefully. But I didn't really get it back then. Now I do.

    The conceptual problem seems to be that this is about how a physical mechanism works, that of the mind-brain. It's got rhythms and when it engages in a certain kind of communication, those rhythms have to be synched among participants. But there's a lot of thinking that's latched on to superficial information speak, where information isn't physical, it's something else. But sending a signal through a phone line IS physical, and information theory arose around that problem.

    Getting back to Condon and autism. Autism has figured centrally in thinking about so-called Theory of Mind (ToM). ToM has also been linked to gaze following. Could there be a causal link between gaze following and synchronization?

    Some years ago in an essay review of Mithen's The Singing Neanderthal, I speculated as follows:

    Let us push the argument a step further. For the last decade or so there has been considerable interest in the notion that people acquire a so-called theory of mind (TOM) early in maturation and that this TOM is critical to interpersonal interaction (see e.g. Baron-Cohen 1995). Gaze following is one behavior implicated in TOM. Humans beyond a relatively early age will follow the direction of one another's gaze. I would like to suggest that we notice gaze direction in people with whom we synchronize, but not otherwise.

    Think about the perceptual requirements of noticing and tracking gaze direction. Even at conversational distance, another person’s eyes are small in relation to the whole visual scene; thus the visual cues for gaze direction will also be small. Further, people in conversation are likely to be in constant relative motion with respect to one another. The motions may not be large – head turns and gestures, trunk motion – but they will be compounded by the fact that one’s eyes are in constant saccadic motion. Synchronization would eliminate one component of relative motion between people and therefore simplify the process of picking up the minute cues signalling gaze direction. But if one cannot properly synchronize with others, then those cues will be more difficult to notice and track. Thus the capacity for interpersonal synchrony may be a prerequisite for the proper functioning of TOM circuitry.

  15. Alex Bollinger said,

    October 23, 2013 @ 2:05 am

    Marmosets aside, the turn-taking process sounds very interesting in humans. I never would have thought of something like that! Thanks for the guest post.

    If someone can't get it right (say, they respond 75 milliseconds after the previous speaker instead of at 50 or 150, or maybe they vary the intervals), do they sound weird? Do they get a reputation for being socially inept? Is this really important to us but we don't ever think about it, like personal space or a speaker's control of their volume?

    OK, found the wikipedia page on this topic. Time to nerd-out.

    The worst thing is that I'm going to start paying attention to this in conversations now…

  16. Jeb said,

    October 25, 2013 @ 3:40 am

    A Thanks from me as well. Startled and hesitant at first I now want to drop everything and spend a couple of weeks chewing through the evidence. Can see how this area lends itself so well to empirical testing and pleasing to discover work already done on the subject.

    I am almost scared to look at the papers as the temptation to significantly nerd out and not return to routine duties is overwhelming.

  17. I’ve Got Your Missing Links Right Here (26 October 2013) – Phenomena: Not Exactly Rocket Science said,

    October 26, 2013 @ 11:01 am

    […] skeptical scientist from my turn-taking monkeys post expands on her critique at Language […]

  18. Morsels For The Mind – 1/11/2013 › Six Incredible Things Before Breakfast said,

    November 2, 2013 @ 4:37 pm

    […] A turn for the worse? Do marmosets really take turns “talking” as per human conversation? […]
