Stoeger et al., "An Asian Elephant Imitates Human Speech", Current Biology (2012):
Vocal imitation has convergently evolved in many species, allowing learning and cultural transmission of complex, conspecific sounds, as in birdsong. Scattered instances also exist of vocal imitation across species, including mockingbirds imitating other species or parrots and mynahs producing human speech. Here, we document a male Asian elephant (Elephas maximus) that imitates human speech, matching Korean formants and fundamental frequency in such detail that Korean native speakers can readily understand and transcribe the imitations. To create these very accurate imitations of speech formant frequencies, this elephant (named Koshik) places his trunk inside his mouth, modulating the shape of the vocal tract during controlled phonation. This represents a wholly novel method of vocal production and formant control in this or any other species. One hypothesized role for vocal imitation is to facilitate vocal recognition by heightening the similarity between related or socially affiliated individuals. The social circumstances under which Koshik’s speech imitations developed suggest that one function of vocal learning might be to cement social bonds and, in unusual cases, social bonds across species.
Here's Figure 1, whose legend reads:
Spectral Comparison of the Speech Utterance “nuo”: Spectrograms exemplifying the speech utterance “nuo” of the trainer (A and D) compared to the elephant’s (Koshik) imitation (B and E) and a 40-year-old male Korean native speaker (C and F) with no experience of Koshik’s Korean output (recorded via a head set and thus with higher recording quality than the other two sound samples). (A–C) represent narrow band spectrograms of “nuo” and (D–F) give wide-band spectrograms of each “nuo” utterance, respectively. The fundamental frequency (fund. freq.) and the first and the second formant (F1 and F2) are indicated.
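For readers who don't spend their days staring at spectrograms: the legend's split between narrow-band panels (A–C), where the fundamental frequency shows up as separate harmonics, and wide-band panels (D–F), where the formants stand out, comes down to the length of the analysis window. Here is a minimal numpy sketch under stated assumptions. The 16 kHz sampling rate, the 120 Hz "voice", and the 40 ms and 5 ms Hann windows are illustrative choices of mine, not the settings used in the paper. A long window gives frequency bins finer than the harmonic spacing (F0), so individual harmonics are resolved. A short window gives bins coarser than F0, so the harmonics blur together into the broader formant envelope.

```python
import numpy as np

def stft_magnitude(signal, fs, window_ms, hop_ms=2.0):
    """Short-time Fourier transform magnitudes using a Hann window.

    Returns (freqs, frames): freqs are bin centre frequencies in Hz,
    frames has shape (n_frames, n_bins).
    """
    n_win = int(round(fs * window_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    window = np.hanning(n_win)
    starts = range(0, len(signal) - n_win + 1, hop)
    frames = np.array([np.abs(np.fft.rfft(signal[s:s + n_win] * window))
                       for s in starts])
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)
    return freqs, frames

def harmonic_signal(f0, fs, duration_s, max_freq):
    """Toy 'voiced' signal: equal-amplitude harmonics of f0 up to max_freq."""
    t = np.arange(int(duration_s * fs)) / fs
    n_harmonics = int(max_freq // f0)
    return sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, n_harmonics + 1))

FS = 16000   # assumed sampling rate (Hz)
F0 = 120.0   # hypothetical fundamental frequency (Hz)
sig = harmonic_signal(F0, FS, duration_s=0.5, max_freq=4000)

# Compare frequency-bin spacing for a long (narrow-band) and short (wide-band) window.
results = {}
for label, win_ms in [("narrow-band", 40.0), ("wide-band", 5.0)]:
    freqs, frames = stft_magnitude(sig, FS, win_ms)
    spacing = freqs[1] - freqs[0]
    results[label] = spacing
    print(f"{label}: {win_ms:g} ms window, {spacing:.0f} Hz per bin, "
          f"{'finer' if spacing < F0 else 'coarser'} than F0 = {F0:g} Hz")
```

With these settings the 40 ms window yields 25 Hz bins and the 5 ms window yields 200 Hz bins, which bracket the 120 Hz harmonic spacing. That trade-off between time and frequency resolution is why spectrogram figures like this one usually show both views.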
Some audio examples — in each case, the trainer says a word and then Koshik imitates it:
anja ("sit down")
nuo ("lie down")
Another word in Koshik's imitative vocabulary is aniya ("no").
The fidelity of Koshik's reproductions is not as good as this method of presentation may make you think. When you know what a sound is supposed to be, your expectations make an imitation of it sound more accurate than it is, an effect noted by Solzhenitsyn in his description of (poor-quality) vocoder testing in The First Circle:
Koshik’s speech sound repertoire was said by his trainers to comprise six Korean words. We tested this hypothesis by analyzing transcriptions made by 16 Korean native speakers on 47 recordings of Koshik’s utterances (see Table S1 available online). The subjects were not informed about the supposed spelling or meaning of the imitations. This analysis largely confirmed the trainers’ claims, indicating that Koshik’s speech imitations correspond to the following five words: “annyong” (“hello,” Audio S1), “anja” (“sit down,” Audio S2), “aniya” (“no”), “nuo” (“lie down,” Audio S3), and “choah” (“good,” Audio S4). Agreement was high for vowels and relatively poor for consonants: vowel transcription similarity was 67% overall, whereas consonant agreement only reached 21% (Table S1). For example, “choah” utterances (according to trainers) were mainly transcribed as “boah” (“look,” 38%) or “moa” (“collect,” 23%), but neither of these utterances was used toward Koshik. As a result, transcriptions provided exact spelling matches (in Korean) for only one sound (“annyong,” “hello,” for which the majority of respondents [56%] agreed) and three additional imitations for which considerable agreement could be documented (“aniya”: 44%; “nuo”: 31%; “anja”: 15%). These results show that Koshik accurately imitates vowels, determined by formant frequency matching, but that consonant fidelity is relatively poor.
Here's a video showing Koshik producing several repetitions of "choah", illustrating the trunk-in-mouth technique of formant manipulation:
This case suggests that elephants must be added to the list of species known to be capable, in principle, of vocal learning; and the authors speculate that the application of this ability to the imitation of human speech has social and emotional roots:
Although elephants living under human care may be heavily exposed to speech from birth on, they do not imitate speech on a regular basis. Thus, early intensive speech exposure does not seem adequate to initiate speech imitation in elephants (although it might be a required precondition), as long as they are embedded within an elephant social environment. Koshik was captive-born in 1990 and translocated to Everland in 1993, where two female Asian elephants accompanied him until he was five years old. From 1995 to 2002, Koshik was the only elephant in Everland. He was trained to physically obey several commands and was exposed to human speech intensively by his trainers, veterinarians, guides, and tourists. In August 2004, his trainers first noticed that Koshik imitated speech. We cannot be certain whether Koshik started to produce speech sounds at 14 years of age (near the onset of Koshik’s sexual maturity; his first musth period occurred in March 2005) or whether earlier imitations went unrecognized by his trainers. However, the determining factors for speech imitation in Koshik may be social deprivation from conspecifics during an important period of bonding and development when humans were the only social contact available (this hypothesis may also hold for other known examples of speech imitation in mammals, Hoover the seal and the beluga Logosi, and also most talking birds).
[ht Shermin de Silva]