"Voiceprint" springs eternal


John R. Quain, "Alexa, What Happened to My Car?", NYT 1/25/2018 [emphasis added]:

And even though voice bots like Alexa and Google’s Assistant can be taught to recognize different voices — well enough to cater to each family member’s favored Pandora stations, for example — they do not offer any sort of biometric security, such as voice print analysis. As a result, Alexa’s voice-recognition capabilities are not discerning enough for security purposes, according to Amazon.

There are two things about this passage that caught my attention.

First, a minor point: the NYT here chooses to write "voice print" as two separate words. This is a change from their previous practice. As early as May of 1962 (and many times since), the Grey Lady was writing "voiceprint" solid, in stories like this one:

A researcher from Bell Telephone Laboratories described yesterday tests that he said, showed that "voiceprints" may prove to be almost as effective, for identification, as fingerprints.

And second, a more important point: here's a journalist who still thinks that "voice print analysis", however spelled, offers "biometric security".

[Warning: what follows is a long post about lexicographic, technological, journalistic, and literary history, guaranteeing that at least three quarters of the content will bore or mystify most readers.]

Before going any further, I should point out that latent fingerprint analysis has also been enormously oversold — see e.g. Simon A. Cole, "More than zero: Accounting for error in latent fingerprint identification", Journal of Criminal Law and Criminology 2005. And I should also note that properly-done voice biometrics can work well for verification purposes as well as for investigatory screening, and in fact can be argued to have much better empirical support than latent fingerprint analysis does. (I'm not sure about comparisons to the current phone and laptop fingerprint readers, but a quick search doesn't turn up any serious empirical analysis of their performance.)

Following up on the terminological history, I learned something new: voice print, spelled with a space, goes back at least to 1918, when it was used in a series of bad jokes published in a variety of American newspapers, including The Tennessean:

What would a woman do without the word "cute"? With her the word has a latitude of meaning that covers everything from a baby to a monkey.
[…]
A lawyer is always smart enough to make a damage suit large enough to allow for a good deal of shrinkage.
The way of the transgressor is getting harder. Some of these days he won't be able to use a telephone because the sleuths will be able to trace him by his voice prints on the transmitter.

The idea of the joke seems to be that your voice somehow leaves a smudge on the telephone transmitter just like your fingers do.

And voice-print, with a hyphen, has a more serious history going back at least to 1927, almost 20 years before the declassification of the sound spectrograph, and nearly 40 years before (what I still believe was) the first commercial application of the idea in the late 1960s. In 1927, some wire service put out a small widely-published news note — here's the version from The Circleville Herald, 6/13/1927:

A longer and more elaborate claim for voiceprints, written solid, came from the United Press a few years later — here's the version published by The Pittsburgh Press 2/1/1935:

The source for this story is "Dr. E.E. Free, acoustical engineer connected with New York University", who also figures in the Mechanical Marvels section of Boys' Life for May 1938:

Dr. E. E. Free of N.Y.U. demonstrates his "sound microscope" which magnifies sound 10 thousand billion times! He can hear a wheat weevil eating its way out of a grain of wheat.

And there were some other pioneers/cranks pushing similar ideas around the same time, as well as others expressing an appropriate degree of skepticism. Thus "Voice as Identification Record Unlikely to Replace Fingerprints", Lansing State Journal 3/6/1936:

A dispatch from Denver, Colo., stating that substitution of voice recordings phonographically for fingerprinting may result from extensive experiments being conducted in the speech department of the University of Denver, was not substantiated by the opinions of state police fingerprint experts at East Lansing.
[…]
Voice photographs, according to the Denver story, are already playing an important part in the world's oldest art–speech. According to Dr. Elwood Murray of the western school, students and other subjects in the experiments "show their voice" just as plainly as a smudge of the thumb reveals a true fingerprint identity.

A cathode-ray oscillograph is used for the "voice-prints." 

That last sentence helps me understand how the term voiceprint could have been used even before the 1946 declassification of the sound spectrograph, much less its 1941 invention. (See the documentation of the extraordinary Acoustical Society session in May 1946: R.K. Potter, "Introduction to Technical Discussions of Sound Portrayal"; W. Koenig, H.K. Dunn, and L.Y. Lacy, "The sound spectrograph"; H. Dudley and O. Gruenz Jr., "Visible Speech Translators with External Phosphors"; R. R. Riesz and L. Schott, "Visible Speech Cathode‐Ray Translator"; J.C. Steinberg and N.R. French, "The Portrayal of Visible Speech"; G.A. Kopp and H.C. Green, "Basic Phonetic Principles of Visible Speech".)

I had always associated the term "voiceprint" with the work of Lawrence Kersta, the Bell Labs researcher cited in the 1962 NYT story at the start of this post. Kersta presented a paper at the annual meeting of the Acoustical Society of America in 1962, under the title "Voiceprint Identification". Also in 1962, he published a paper in Nature under the same title, and he then went on to start a company, Voiceprint Laboratories.

In 1974, Kersta published an expanded version of the Nature paper in Police Law Quarterly, which included these claims of success:

In law enforcement applications, voiceprint identification assistance has been provided for over 75 law enforcement agencies, including municipal, county, state, and government agencies; private industry, defense lawyers, and United States airlines. More than 600 individual cases have been processed for these agencies. Expert testimony has been given at eighteen trials with favorable rulings of admissibility occurring in all. At this date, only one denial has been experienced; this, at a pretrial hearing in a homocide [sic] case. 

In fact, forensic applications of this technology were controversial from the very beginning — see e.g. "Jury deadlocked in Voiceprint trial", NYT 4/17/1966. For a survey of the next few decades, see William R. Jones, "Danger — Voiceprints Ahead", American Criminal Law Review 1973; Sharon Gregory, "Voice Spectrography Evidence: Approaches to Admissibility", University of Richmond Law Review 1986; or Peter Tiersma and Lawrence Solan, "The Linguist on the Witness Stand", Language 2002.

In any case, I'm not the only one who associated the term voiceprint with Kersta — thus the Encyclopedia of Espionage, Intelligence, and Security (?) says that

The U.S. Federal Bureau of Investigation (FBI) used spectrographic or voice identification analysis as early as the 1950s, but the technique did not gain scientific acceptance until a 1962 study by Lawrence Kersta, a researcher working with a 1940s-model Bell Laboratory sound spectrograph. Kersta maintained that "voiceprints," a term he coined, provide a unique means of identifying individuals. He went on to establish a professional association, the International Association of Voice Identification, which in 1980, became part of the more general International Association for Identification.

Kersta (and/or his associates) went so far as to design a way of producing "contour" spectrograms, apparently just because they look somewhat fingerprint-like. As far as I know, no one has ever found those contour spectrogram displays useful in any actual scientific or engineering application. Here's a figure from the 1962 Nature and 1974 Police Law Quarterly papers, showing an example of this idea:

Kersta explains:

The voiceprint shown at the right is called a contour voiceprint for the obvious reason of its pattern of contours. These contours are just like those which appear on a topographical map which, instead of measuring levels of altitude, measures levels of loudness. The other dimensions (time; frequency) are the same as for the bar voiceprints. Contour voiceprints are used for computer automated classification of speakers and provide a filing system for identified prints.
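Kersta's description amounts to a recipe: measure loudness over a time-by-frequency grid, then draw lines of equal loudness the way a topographic map draws lines of equal altitude. The quoted text doesn't say how the contours were actually computed, so what follows is only a minimal sketch under stated assumptions: a naive DFT over Hann-windowed frames stands in for the spectrograph, and loudness is binned into fixed 6 dB bands below the loudest point. The function names and the parameters `frame_len`, `hop`, `step_db`, and `floor_db` are hypothetical illustrations, not anything taken from Kersta's papers.

```python
import cmath
import math


def frame_spectrum_db(frame):
    """Magnitude spectrum of one frame, in dB, for DFT bins 0..N/2.

    Uses a naive O(N^2) DFT so the sketch needs only the standard library.
    """
    n = len(frame)
    # Periodic Hann taper (an assumption; the quoted text specifies no window).
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        # Small offset avoids log10(0) for empty bins.
        spectrum.append(20 * math.log10(abs(coeff) + 1e-12))
    return spectrum


def spectrogram_db(signal, frame_len=64, hop=32):
    """Rows are time frames, columns are frequency bins (the two map axes)."""
    return [frame_spectrum_db(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def contour_levels(spec_db, step_db=6.0, floor_db=-60.0):
    """Quantize loudness into bands, like elevation bands on a topographic map.

    Level 0 means more than |floor_db| dB below the loudest cell; each level
    above that spans step_db decibels, so the loudest cell gets the top level.
    """
    peak = max(max(row) for row in spec_db)
    return [[max(0, math.floor((value - peak - floor_db) / step_db))
             for value in row]
            for row in spec_db]


if __name__ == "__main__":
    # A steady test tone at exactly DFT bin 8 of a 64-sample frame.
    tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
    for row in contour_levels(spectrogram_db(tone)):
        print(" ".join(str(level) for level in row))
```

Running it prints one row of band levels per time frame; for this steady tone the top band sits on bin 8 in every row, with lower bands on the neighboring bins. The 6 dB spacing is arbitrary: a smaller `step_db` draws more, closer-spaced contours.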

Even before looking voiceprint up in newspapers.com and the OED, I should have wondered about the pre-1962 history of the term, because it's used as a chapter heading in the English translation of Alexander Solzhenitsyn's novel In the First Circle (В кру́ге пе́рвом), a fictionalized account of his experiences in a Soviet sharashka (secret prison-camp R&D lab) in the late 1940s.

The crucial action concerns a recorded telephone call with politically suspect content:

The rumor went around some of the laboratories that Minister Abakumov had arrived in person, accompanied by eight generals. In other laboratories people went on sitting quietly, unaware of the impending storm.

The rumor was half true: Deputy Minister Selivanovsky had arrived, accompanied by four generals. […]

After a hurried conference some of the visitors remained in Yakonov’s office, and others made for Number Seven, while only Selivanovsky accompanied Major Roitman down to the Acoustics Laboratory.  […]

“But how do you expect to identify the man?” Selivanovsky asked as they went along. Roitman had first heard of the commission five minutes ago and so had no thoughts on the matter. Oskolupov had done the thinking for him the night before, when he thoughtlessly undertook the task. All the same, Roitman’s mind had been busy in the last five minutes.

He addressed the deputy minister informally and with no trace of servility. “Look,” he said. “We have a visible speech device for recording speech visually. It prints off what we call voiceprints, and there’s a man by the name of Rubin who can read them.”

“A prisoner?”

“Yes. Senior lecturer in philology. Just lately I’ve had him working on the detection of individual speech peculiarities.”

"Rubin" is the fictional version of Lev Kopelev, whose memoir of his sharashka experiences was published in English as Ease My Sorrows. In Solzhenitsyn's version, Rubin faces a dilemma: the authorities have a particular candidate that they want him to identify as the guilty speaker; but listening to the recording, he identifies the voice as belonging to a different person who is a friend of his. Technical "voiceprint" analysis has nothing to do either with the official identification or with his human recognition.

I'm not sure what term Solzhenitsyn used for "voiceprint" in the original Russian. Perhaps some commenter with access to the text can tell us.

Anyhow, the history of the term "voice print" (or "voice-print" or "voiceprint") is pretty much a 100-year progression of jokes, fakery, and exaggeration. As a result, this term is generally not used by serious researchers to describe serious research in speaker recognition and speaker verification (of which there is plenty). But it's clear that journalists, from 1918 to today, find "voiceprint" helpfully evocative of "fingerprint", and continue to have a somewhat exaggerated idea of the forensic reliability of fingerprints.

A few other relevant LLOG posts:

"Earwitnesses, voiceprints, automatic speaker recognition", 11/1/2003
"Speech-based lie detection in Russia", 6/8/2011
"Separated by a common problem", 12/2/2013
"Forensic linguistics in the Zimmerman case", 6/24/2013
"'Voiceprints' again", 10/14/2014

[Considering his impact, it's odd that there's no Wikipedia article for Lawrence Kersta.]



9 Comments

  1. Tom Dawkes said,

    January 29, 2018 @ 1:36 pm

    There's an online copy of V kruge pervom at http://solzhenitsyn.ru/upload/books/v_kruge_pervom.pdf
    but I haven't so far been able to find the passage, as my Russian is limited.

  2. Daniel Milton said,

    January 29, 2018 @ 2:24 pm

    Off the main topic, but I was struck (and momentarily misled) by the lead sentence in the 1935 article “finger print experts, most deadly trackers down in scientific crime detection”. Is, or was, “tracker down” a normal two-word substantive?

  3. Matt said,

    January 29, 2018 @ 3:21 pm

    In this edition of Solzhenitsyn's text:

    http://lib.ru/PROZA/SOLZHENICYN/vkp1.txt

    The Russian word used is "звуковиды"

  4. David Marjanović said,

    January 29, 2018 @ 4:54 pm

    звуковиды

    "what sounds look like", "sound-views", "sound aspects"…

  5. Jerry Friedman said,

    January 29, 2018 @ 5:19 pm

    The return to "voice print" is an example of something I've noticed. We're often told there's a tendency for compound words in English to go from open to hyphenated to solid, but there's an opposite tendency too. On my students' homework, I correct open compounds to hyphenated or solid more often than I make the other possible corrections. As far as I can tell, there are people who write almost all compounds open, maybe making exceptions for "birthday", "football", and a few others.

    On another subject, "philology" is an interesting discipline for someone studying voice identification. Would he have been a senior lecturer in linguistics in American in the late '40s?

  6. Jerry Friedman said,

    January 29, 2018 @ 5:21 pm

    "in America", that is.

    Obligatory ngram result for "voiceprint, voice print, voice-print".

  7. John Swindle said,

    January 30, 2018 @ 9:06 pm

    It's recently been shown that orcas can imitate human speech. We know that some birds do likewise. These abilities may have implications for the security of voice prints for any of these species. Perhaps voice prints could be used in conjunction with species-specific identifiers such as, in the case of humans, handwriting analysis.

  8. Graeme said,

    February 2, 2018 @ 4:25 am

    Touché John. No biometric security based on voiceprints? My heart went out to the orcas practising hard to imitate their keepers in the hope of a great escape.

  9. John Swindle said,

    February 12, 2018 @ 2:15 am

    Yesterday afternoon, on a weekend, I called CitiBank about a credit card problem. The customer service representative requested my consent for voice recording and voiceprint for security purposes. I readily agreed, and I got excellent service. I may of course have had an especially good voiceprint.
