Fitch (and von Humboldt) on monkey talk

Tecumseh Fitch et al., "Monkey vocal tracts are speech-ready", Science Advances 12/9/2016:

For four decades, the inability of nonhuman primates to produce human speech sounds has been claimed to stem from limitations in their vocal tract anatomy, a conclusion based on plaster casts made from the vocal tract of a monkey cadaver. We used x-ray videos to quantify vocal tract dynamics in living macaques during vocalization, facial displays, and feeding. We demonstrate that the macaque vocal tract could easily produce an adequate range of speech sounds to support spoken language, showing that previous techniques based on postmortem samples drastically underestimated primate vocal capabilities. Our findings imply that the evolution of human speech capabilities required neural changes rather than modifications of vocal anatomy. Macaques have a speech-ready vocal tract but lack a speech-ready brain to control it.

Nell Greenfieldboyce, "Say, What? Monkey Mouths And Throats Are Equipped For Speech", NPR 12/9/2016, presents a simulation of a macaque saying "Happy Holidays":

I heartily endorse the study's theme — and so does Wilhelm von Humboldt, hoisting a stein in some heavenly Biergarten. In his posthumous essay On Language, first published in 1836 as the introduction to his book on the Kawi language of Java, he wrote (in Heath's 1988 translation):

The articulated sound, the foundation and essence of all speech, is extorted by man from his physical organs through an impulse of his soul; and the animal would be able to do likewise, if it were animated by the same urge. Already in its first and most indispensable elements, language is so utterly and exclusively rooted in man's spiritual nature, that its permeation is sufficient, though necessary, to transform the animal sound into the articulated one. For the intent and capacity to signify, and not just in general, but specifically by the presentation of thought, is the only thing that constitutes the articulated sound, and nothing else can be stated to describe its difference from the animal cry, on the one hand, and the musical tone on the other. It cannot be described by reference to its constitution, but only by the way it is produced, and this is not due to any incapacity on our part, but is typical of its very nature, since it is nothing else but the soul's intention to utter it, and contains only so much of the physical as external perception cannot do without.

When I first began teaching ling001 twenty years ago, one of my standard final-exam questions was to ask for an explanation in modern terms of the first sentence of that passage.

Here's the original, from Über die Kawi-Sprache auf der Insel Java, nebst einer Einleitung über die Verschiedenheit des menschlichen Sprachbaues und ihren Einfluss auf die geistige Entwickelung des Menschengeschlechts:

Der Mensch nöthigt den articulirten Laut, die Grundlage und das Wesen alles Sprechens, seinen körperlichen Werkzeugen durch den Drang seiner Seele ab; und das Thier würde das Nämliche zu thun vermögen, wenn es von dem gleichen Drange beseelt wäre. So ganz und ausschliesslich ist die Sprache schon in ihrem ersten und unentbehrlichsten Elemente in der geistigen Natur des Menschen gegründet, dass ihre Durchdringung hinreichend, aber nothwendig ist, den thierischen Laut in den articulirten zu verwandeln. Denn die Absicht und die Fähigkeit zur Bedeutsamkeit, und zwar nicht zu dieser überhaupt, sondern zu der bestimmten durch Darstellung eines Gedachten, macht allein den articulirten Laut aus, und es lässt sich nichts andres angeben, um seinen Unterschied auf der einen Seite vom thierischen Geschrei, auf der andren vom musikalischen Ton zu bezeichnen. Er kann nicht seiner Beschaffenheit, sondern nur seiner Erzeugung nach beschrieben werden, und dies liegt nicht im Mangel unsrer Fähigkeit, sondern charakterisirt ihn in seiner eigenthümlichen Natur, da er eben nichts, als das absichtliche Verfahren der Seele, ihn hervorzubringen, ist, und nur so viel Körper enthält, als die äussere Wahrnehmung nicht zu entbehren vermag.

But I think that Fitch et al. have gone a step too far in the implications of their statement that "the evolution of human speech capabilities required neural changes rather than modifications of vocal anatomy". There's ample evidence that there have been speech-related modifications of vocal anatomy in the hominin line as well, and that's what any good Darwinian should expect, after all.



  1. Fitch (and von Humboldt) on monkey talk • Zhi Chinese said,

    December 10, 2016 @ 9:13 am

  2. John Roth said,

    December 10, 2016 @ 12:22 pm

    Another interesting implication is that it pulls the support out from under theories that language had to be bootstrapped by something else, like sign language. With a speech-ready vocal tract, all that was needed was something to say and the desire to say it. At least, something to say that couldn't already be said.

  3. Gregory Kusnick said,

    December 10, 2016 @ 12:49 pm

    John Roth: I'm no expert, but it seems to me there's a step being elided between "something to say and the desire to say it" and "a speech-ready vocal tract". Specifically, one must also have a sufficiently developed motor cortex capable of causing the vocal tract to utter the desired sounds.

    Just as human cortical expansion enabled tool use through increased manual dexterity, isn't it also reasonable to suppose that it similarly enabled speech through increased vocal dexterity (even if all the moving parts were already present in monkeys)?

  4. Neal Goldfarb said,

    December 10, 2016 @ 1:36 pm

    So it seems that if macaques could talk, they'd sound like Greta Garbo and would be politically correct.

    [(myl) Actually, the first part would be true if they were super-macaques, significantly bigger than regular macaques. First because "for comparison with a human female speaker, the average length of all observed monkey vocal tract shapes (11.4 cm) was multiplicatively scaled to an average value appropriate for an adult female human (14 cm)". And second because macaques' natural pitch range is way squeakier than Ms. Garbo's — here's a real macaque vocalization:

    So some laryngeal surgery would probably be necessary for the monkeys to be truly "speech ready".

    But still, no doubt they'd work it out wenn es von dem gleichen Drange beseelt wäre.]

  5. Neal Goldfarb said,

    December 11, 2016 @ 12:14 am

    OK, so Pee-wee Herman.

  6. James Wimberley said,

    December 11, 2016 @ 3:47 pm

    Songbirds vary enormously in the complexity of their trills, from the monotony of the cuckoo's call to the vast portfolios of nightingales and lyre birds. Presumably they have very similar vocal tracts.

  7. Alex said,

    December 12, 2016 @ 10:42 pm

    I learned back in the day in Anthro 101 that the human larynx is positioned farther down than in other primates. The idea was that this makes us a lot more likely to choke, but the evolutionary disadvantage is outweighed by the better communication skills it provides. Is this no longer thought to be the case? If we could talk just as well with a macaque's larynx, why haven't we evolved not to choke?

    [(myl) What you learned was true then and is true now. Mature humans are the only mammals in which the relative positions of the larynx, soft palate and epiglottis are such as to endanger the safe passage of a bolus of food or liquid from mouth to esophagus. See e.g. this diagram and the surrounding discussion here.

    The argument has always been (and continues to be) that this apparently maladaptive development (which happens ontogenetically during the first few years of life) serves the function of enlarging the easily-accessible phonetic space.

    That's not the same as asserting that individuals without this change couldn't "talk" — in fact the evolutionary argument only makes sense if you assume that hominins before the change were already talking.]

  8. Guy said,

    December 13, 2016 @ 1:06 pm

    Maybe this is addressed in the paper, which I haven't read yet, but it seems that there is a difference between "human speech sounds" and "a repertoire of vocalizations that could support a spoken language".

