More deceptive statements about Voice Stress Analysis


Leonard Klie, "Momentum Builds for Voice Stress Analysis in Law Enforcement", Speech Technology Magazine, Summer 2014:

Nearly 1,800 U.S. law enforcement agencies have dropped the polygraph in favor of newer computer voice stress analyzer (CVSA) technology to detect when suspects being questioned are not being honest, according to a report from the National Association of Computer Voice Stress Analysts.

Among those that have already made the switch are police departments in Atlanta, Baltimore, San Francisco, New Orleans, Nashville, and Miami, FL, as well as the California Highway Patrol and many other state and local law enforcement agencies.

The technology is also gaining momentum overseas. "The CVSA has gained international acceptance, and our foreign sales are steadily growing," reports Jim Kane, executive director of the National Institute for Truth Verification Federal Services, a West Palm Beach, FL, company that has been producing CVSA systems since 1988.

How does this stuff supposedly work?

CVSA works by measuring involuntary voice frequency changes that would indicate a high level of stress, as occurs when someone is being deceptive. Muscles in the voice box tighten or loosen, which changes the sound of the voice, and that is what the CVSA technology registers.

"The technology uses proprietary methods to process the vocal input, typically yes or no responses to direct questions," Kane explains. "CVSA analyzes vocal input and identifies responses where stress is either present or absent and provides graphical output for each yes or no response."

Here "proprietary", as far as I can tell, means something like "The original 'voice stress' ideas have been thoroughly debunked both theoretically and practically, so now we won't tell anyone how our products work, so that no one  can test the ideas without buying our stuff and taking our training — and if they do that and fail to find positive results, we can say that it's because they did it wrong…"

In "Analyzing voice stress", 7/2/2004, I complained that I and others had been trying for 30 years to validate the claims behind "voice stress analysis", without even being able to find evidence for the stable measurement (and even the existence) of the features (like variable "micro-tremors" or other "involuntary voice frequency changes") that this technology is supposed to be based on.

How can I make you see how amazing this is? Suppose that in 1957 some physiologist had hypothesized that cancer cells have different membrane potentials from normal cells — well, not different potentials, exactly, but a sort of a different mix of modulation frequencies in the variation of electrical potentials between the inside of the cell and the outside. And further suppose that some engineer cooked up a proprietary circuit to measure and display these alleged variations in "cellular stress" (to the eyes of a trained cellular stress expert, of course), and thereby to diagnose cancer, and started selling such devices to hospitals, and selling training courses in how to use them. And suppose that now, almost half a century later, there is still no documented, well-defined procedure for ordinary biomedical researchers to use to measure and quantify these alleged cell-membrane "tremors" — but companies are still making and selling devices using proprietary methods for diagnosing cancer by detecting "cellular stress" — computer systems now, of course — while well-intentioned hospital administrators and doctors are occasionally organizing little tests of the effectiveness of these devices. These tests sometimes work and sometimes don't, partly because the cellular stress displays need to be interpreted by trained experts, who are typically participating in a diagnostic team or at least given access to lots of other information about the patients being diagnosed.

This couldn't happen. If someone tried to sell cancer-detection devices on this basis, they'd get put in jail.

But as far as I can tell, this is essentially where we are with "voice stress analysis."

In "Speech-based lie detection? I don't think so", 11/10/2011, I cited several studies that tested the then-available versions of such technology, and found that they didn't work: Harry Hollien and James Harnsberger, "Evaluation of two voice stress analyzers", J. Acoust. Soc. Am. 124(4):2458, October 2008; James Harnsberger, Harry Hollien, Camilo Martin, and Kevin Hollien, "Stress and Deception in Speech: Evaluating Layered Voice Analysis", Journal of Forensic Sciences 54(3) 2009; Robert Pool, Field Evaluation in the Intelligence and Counterintelligence Context, National Research Council, 2009. The last reference includes an especially systematic review of the literature. (Another relevant article is Harry Hollien et al., "Evaluation of the NITV CVSA", Journal of Forensic Sciences, 2008.)

The only substantive argument in favor of CVSA in the Speech Technology article is that polygraph "lie detection" often fails:

Part of the reason for the growing acceptance of CVSA technology, according to Kane, is the attention now being given to several recent high-profile failures of the polygraph. Former NSA employee and whistle-blower Edward Snowden, for example, reportedly passed two polygraph exams during his tenure with the federal agency.

The article does quote a number of unsubstantiated claims from the CVSA salesman and from an industry lobbyist:

Kane says that compared to polygraphs, CVSA is easier to use; takes less time per exam; is less expensive; yields more positive results; is harder to defeat; has a very low error rate; is noninvasive; and works with voice recordings as well as live interactions.

"As an investigative and decision support tool, CVSA has proven itself to be invaluable to law enforcement," adds Lt. Kenneth Merchant of the Erie, PA, Police Department and legislative director of the National Association of Computer Voice Stress Analysts.

Independent research has tied CVSA technology to an accuracy rate that exceeds 95 percent. Polygraph, Merchant says, "is not nearly as close. Results can be inconclusive, which is not something that you have with CVSA."

No indication is given of what this "independent research" is. A search on Google Scholar for "computer voice stress analysis" turns up only nine hits since 2010:

Five of these are patent applications, which contain no test results.

One is a legal document, "Recommendation to Retain under DOD Control for Guantánamo Detainee, Gha'im Yadel", which mentions that "detainee was given a Computer Voice Stress Analysis, which showed deception on a number of questions", and another is a journalism-school master's project about an unsolved murder, which mentions that "Mallory changed his story after police told him a Computer Voice Stress Analysis test appeared to be a 'deceptive indicator'". These show that authorities sometimes use CVSA, and that the results sometimes succeed in pressuring suspects (which seems to be the main value of such tests).

One is a review article that cites one of the studies showing that CVSA doesn't work ("A well controlled study from the University of Florida, on 70 adult volunteers using computer voice stress analysis (CVSA), indicated that the sensitivity of CVSA was about as accurate as chance (Hollien et al., 2008)").

And the last hit is a book on The Unsolved Mystery of Noah's Ark, which doesn't appear actually to contain any references to CVSA.

So I looked at the web site for the National Institute for Truth Verification: Federal Services ("The World Leader in Voice Stress Analysis"). I couldn't find any references to the "Independent research that has tied CVSA technology to an accuracy rate that exceeds 95 percent". It's possible that this quote is the result of some kind of telephone-game exaggeration chain; but it also may reflect an application of the methodology that I discussed in "Determining whether a lottery ticket will win, 99.999992849% of the time", 8/29/2004.
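To make the lottery-ticket point concrete, here is a minimal sketch of how a worthless predictor can report a spectacular accuracy figure when one outcome is rare. The 6-of-49 game is an assumption chosen because it happens to reproduce the percentage in that post's title; nothing above says which lottery was meant, and the function name is invented for illustration.

```python
from math import comb


def always_lose_accuracy(pick: int, pool: int) -> float:
    """Accuracy of predicting 'this ticket loses' for every single ticket.

    Assumes a hypothetical game in which a ticket wins only by matching
    all `pick` numbers drawn from `pool`, so exactly one combination wins.
    """
    combinations = comb(pool, pick)
    return 1.0 - 1.0 / combinations


if __name__ == "__main__":
    # Assumed 6-of-49 game: 13,983,816 possible tickets.
    print(f"{comb(49, 6):,} possible tickets")
    accuracy = always_lose_accuracy(6, 49)
    # Prints 99.999992849%, without ever identifying a winner.
    print(f"'always loses' accuracy: {accuracy * 100:.9f}%")
```

The predictor never detects a single winning ticket, yet its headline accuracy is far above 95%. An accuracy claim means little unless the base rate of the thing being detected is reported alongside it.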

I found only one somewhat-relevant thing on the NITV web site: a page entitled "U.S. Air Force Research Lab Study", which says, in its entirety:

A New Study Of Voice Stress Analysis Funded by the National Institute Of Justice and Conducted By The U.S. Air Force Research Lab Establishes VSA’s Accuracy As “Performance approaching that of current polygraph technology.”  

In a highly regarded report presented to the 38th Hawaii International Conference on System Sciences, researchers reported findings that directly contradict both previous and current polygraph-funded studies.  These polygraph funded studies, which did not utilize the protocols established by the manufacturers, found the accuracy of voice stress analysis as a truth verification device to be less than chance.  The Air Force Lab researchers, using protocols established by the manufacturers of the VSA, were able to determine that VSA technology is, in fact, a viable alternative to the polygraph.

This seems to be a reference to C.S. Hopkins et al., "Evaluation of Voice Stress Analysis Technology", HICSS 2005, which does indeed conclude that "This study has found that VSA technology can identify stress better than chance with performance approaching that of current polygraph systems", though the report adds that "However, [VSA] is not a technology that is mature enough to be used in a court of law".

The study used real-world materials:

The audio data collected consisted of recorded truth verification examinations, where bipolar (Yes/No) responses were given. Ground truth was required to identify deceptive/non-deceptive results. This ground truth typically consisted of a confession and some form of corroborating evidence. In the case of the non-deceptive individuals, a confession or arrest of another person or clearing by other means of investigation was sufficient.

The specific test results cited come nowhere near "an accuracy rate that exceeds 95%":

I believe that "Positive" and "Negative" mean "True Positive" and "True Negative". Thus overall, there were 118+198=316 cases where a trained analyst decided that the subject was lying, and the analyst was wrong 118/316 = 37.3% of the time. There were 127+73=200 cases where a trained analyst decided that the subject was telling the truth, and here the analyst was wrong 73/200 = 36.5% of the time. So these findings might support a claim of 63% accuracy (325 correct calls out of 516), but certainly not 95%.
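The arithmetic in that paragraph can be checked with a few lines of code. This is only a sketch: the four counts are the ones quoted above, read the way the paragraph reads them, and the variable and function names are made up for the example.

```python
# Counts as interpreted in the paragraph above (Hopkins et al., HICSS 2005).
CALLED_DECEPTIVE_RIGHT = 198  # analyst said "lying", ground truth agreed
CALLED_DECEPTIVE_WRONG = 118  # analyst said "lying", subject was truthful
CALLED_TRUTHFUL_RIGHT = 127   # analyst said "truthful", ground truth agreed
CALLED_TRUTHFUL_WRONG = 73    # analyst said "truthful", subject was lying


def error_rate(wrong: int, right: int) -> float:
    """Fraction of calls of one kind that turned out to be wrong."""
    return wrong / (wrong + right)


def overall_accuracy() -> float:
    """Fraction of all calls, of either kind, that matched ground truth."""
    right = CALLED_DECEPTIVE_RIGHT + CALLED_TRUTHFUL_RIGHT
    total = right + CALLED_DECEPTIVE_WRONG + CALLED_TRUTHFUL_WRONG
    return right / total


if __name__ == "__main__":
    wrong_lie = error_rate(CALLED_DECEPTIVE_WRONG, CALLED_DECEPTIVE_RIGHT)
    wrong_truth = error_rate(CALLED_TRUTHFUL_WRONG, CALLED_TRUTHFUL_RIGHT)
    print(f"wrong when calling 'lying':    {wrong_lie:.1%}")          # 37.3%
    print(f"wrong when calling 'truthful': {wrong_truth:.1%}")        # 36.5%
    print(f"overall accuracy:              {overall_accuracy():.1%}")  # 63.0%
```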

And it's important to note that these were not automated lie/truth outputs from a machine, they were the interpretations of analysts with a lot of training, and often with decades of law-enforcement experience:

It's not clear from the report how much of the audio recordings the analysts had access to, in addition to whatever parts they put through the VSA systems, but the report's description of VSA methodology goes into considerable detail about "pre-test", "test", and "post-test" procedures, used to establish subject-specific baselines comparable to the methods used in polygraph examinations. So a plausible control might have been to ask experienced law-enforcement interrogators to evaluate truthfulness on a purely (rather than partly) subjective basis.

The NITV web page does not provide a link to a later NIJ-funded study on voice stress analysis. This might be because its headline is "Voice Stress Analysis: Only 15 Percent of Lies About Drug Use Detected in Field Test" (NIJ Journal No. 259, March 2008), and it starts this way:

Law enforcement agencies across the country have invested millions of dollars in voice stress analysis (VSA) software programs.[1] One crucial question, however, remains unanswered:

Does VSA actually work?

According to a recent study funded by the National Institute of Justice (NIJ), two of the most popular VSA programs in use by police departments across the country are no better than flipping a coin when it comes to detecting deception regarding recent drug use. The study's findings also noted, however, that the mere presence of a VSA program during an interrogation may deter a respondent from giving a false answer.

VSA manufacturers tout the technology as a way for law enforcers to accurately, cheaply, and efficiently determine whether a person is lying by analyzing changes in their voice patterns. Indeed, according to one manufacturer, more than 1,400 law enforcement agencies in the United States use its product.[2] But few studies have been conducted on the effectiveness of VSA software in general, and until now, none of these tested VSA in the field (that is, in a real-world environment such as a jail). Therefore, to help determine whether VSA is a reliable technology, NIJ funded a field evaluation of two programs: Computer Voice Stress Analyzer® (CVSA®)[3] and Layered Voice Analysis™ (LVA).

Researchers with the Oklahoma Department of Mental Health and Substance Abuse Services (including this author) used these VSA programs while questioning more than 300 arrestees about their recent drug use. The results of the VSA output—which ostensibly indicated whether the arrestees were lying or telling the truth—were then compared to their urine drug test results. The findings of our study revealed:

  • Deceptive respondents. Fifteen percent who said they had not used drugs—but who, according to their urine tests, had—were correctly identified by the VSA programs as being deceptive.
  • Nondeceptive respondents. Eight and a half percent who were telling the truth—that is, their urine tests were consistent with their statements that they had or had not used drugs—were incorrectly classified by the VSA programs as being deceptive.

Using these percentages to determine the overall accuracy rates of the two VSA programs, we found that their ability to accurately detect deception about recent drug use was about 50 percent.

CVSA performed somewhat worse than LVA in these tests, identifying only 8% of the deceptive responses as deceptive, and attributing deception to about 10% of the truthful responses. Here the detection rate for lies was 10% or less, not 95%.
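The NIJ article does not say how it turned its two percentages into "about 50 percent". One common way to combine a detection rate and a false-alarm rate is balanced accuracy, the average of the two per-class hit rates. The sketch below assumes that definition; it is not taken from the study, and the function name is hypothetical.

```python
def balanced_accuracy(sensitivity: float, false_positive_rate: float) -> float:
    """Average of the hit rate on liars and the hit rate on truth-tellers.

    Assumption: this is one plausible reading of "overall accuracy",
    not a formula given in the NIJ article.
    """
    specificity = 1.0 - false_positive_rate
    return (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    # Both programs pooled: 15% of lies caught, 8.5% of truths flagged.
    print(f"pooled: {balanced_accuracy(0.15, 0.085):.1%}")
    # CVSA alone: 8% of lies caught, about 10% of truths flagged.
    print(f"CVSA:   {balanced_accuracy(0.08, 0.10):.1%}")
```

Under this assumption both figures land close to one half, which is what a coin flip would score, and which fits the article's "about 50 percent" summary.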

As I wrote back in 2004,

I'm not prejudiced against "lie detector" technology — if there's a way to get some useful information by such techniques, I'm for it. I'm not even opposed to using the pretense that such technology exists to scare people into not lying, which seems to me to be its main application these days. But when a theory about quantitative measurements of frequency-domain effects in speech has been around for half a century, and no one has ever published an equation, an algorithm or a piece of code for making these measurements, and willing and competent speech researchers (like me) can't create reliable methods for making such measurements from the descriptions we find in the literature… something is wrong.

Previous LLOG posts on speech-based lie detection:

"Analyzing voice stress", 7/2/2004
"Determining whether a lottery ticket will win, 99.999992849% of the time", 8/29/2004
"KishKish BangBang", 1/17/2007
"Industrial bullshitters censor linguists", 4/30/2009 (see especially the comments threads, e.g. here, here, here, here.)
"Speech-based lie detection in Russia", 6/8/2011
"Speech-based lie detection? I don't think so", 11/10/2011


Update — In a press release on the NITV site under the title "NITV FS Awarded Third Patent", I found this claim:

A newly published research study in the 2012 annual edition of the scientific journal Criminalistics and Court Expertise reports the accuracy rate of the Computer Voice Stress Analyzer (CVSA®) is greater than 95%, an assertion long made by the law enforcement users of the system. The study’s results are further bolstered by current US Government funded voice analysis research which has established voice technologies performed well for border security applications.

The 18-year field study was conducted by Professor James L. Chapman, the world’s foremost authority on the application of Voice Stress Analysis technologies. The study, titled “Long-Term Field Evaluation of Voice Stress Analysis in a North American Criminal Justice Setting” was ground-breaking in that it validated the tremendous success of the CVSA in the criminal justice system.

However, I've been unable to locate either the journal "Criminalistics and Court Expertise" or the cited research study. James L. Chapman, according to his 2012 obituary, was the Director of the Criminal Justice Program and the Forensic Crime Laboratory at the State University of New York, Corning (which I believe is Corning Community College), as well as the Director of Training and Standards for the National Association of Computer Voice Stress Analysts. But Google Scholar doesn't seem to know of any publications under his name.

I was able to find some press coverage of the 2012 study at World Net Daily. The description there sounds questionable to me:

The study’s findings revealed the CVSA, when used as an investigative support tool, can accurately predict whether a person under investigation is being truthful or deceptive. When the CVSA was used for diagnostic purposes to predict deception, positive results were obtained in over 95% of the cases, with no false positive results identified. Additionally, a strong, indirect relationship (approximately 94%) was discerned between crime consequence and confession rates among guilty subjects. Empirical data collected by the CVSA’s manufacturer, US law enforcement agencies, and the US military have long supported a 95% or greater accuracy rate for the CVSA; however, this is the first independent and peer reviewed scientific study to validate these data.

If anyone can find me a copy of the cited study, I'll be grateful. Any pointers to substantive descriptions of CVSA's technology would also be appreciated. (I did find the CVSA II patent, which describes what is claimed to be a way to quantify micro-tremors — I'll discuss it in a later post.)

 

 



14 Comments

  1. Ernie in Berkeley said,

    May 18, 2014 @ 11:27 am

    I'd be curious to try this thing out. I have a mild neurological tremor that causes my voice to tremble and sometimes stutter, with words feeling like they're forced out of a mouth full of cotton. Come to think of it, I wonder if this would affect galvanic skin response as well.

    [(myl) The original claim underlying CVSA was that there's a laryngeal micro-tremor which is reduced or eliminated under stress. This claim has been challenged, e.g. in T. Shipp & K. Izdebski, "Current evidence for the existence of laryngeal macrotremor and microtremor", J. Forensic Sci. 1986, where 9-Hz tremor was observed in biceps muscle, but nothing comparable in laryngeal muscle.]

  2. Craig said,

    May 18, 2014 @ 1:42 pm

    What is it about law enforcement that makes them so gullible when it comes to snake oil? Should we really trust these people to conduct competent and honest investigations of serious crimes if they can't see through nonsense like this?

  3. Keith M Ellis said,

    May 18, 2014 @ 7:11 pm

    "Should we really trust these people to conduct competent and honest investigations of serious crimes if they can't see through nonsense like this?"

    The whole criminal justice system values eyewitness testimony more highly than almost anything, and yet it is extremely unreliable. It seems to me, off the top of my head, that other than criminal forensic science (which is a notably recent development), the whole of criminal law (both enforcement and justice) is strikingly untouched by and arguably hostile to science. Law enforcement and criminal justice are social institutions that are especially built around intuition and custom.

  4. KevinR said,

    May 18, 2014 @ 10:15 pm

    I'm surprised at the number of departments claimed (1800). Are there Daubert- or Frye-qualified experts able to defend this technology in courts, or is this merely another inadmissible technique to obtain confessions?

    Given the 'quality' of research uncovered by ML, I'm not even sure there'd be a Kumho-qualified expert.

  5. isaiah said,

    May 18, 2014 @ 10:16 pm

    Here is some more alleged information about the alleged journal article.

    The alleged co-author, Marigo Stathis, does appear to be a legitimate scientist.

  6. Michael Watts said,

    May 18, 2014 @ 10:30 pm

    What is it about law enforcement that makes them so gullible when it comes to snake oil?

    Why single out law enforcement? They look for magic that will help them in their jobs, like lie sensing and criminal profiling. Middle Eastern governments look for dowsing rods for explosives, oil, and plain old traditional water. People all over the world buy love charms. What if I asked you what it was about Americans that makes them uniquely vulnerable to crystal salesmen?

    http://www.penny-arcade.com/comic/2010/12/24

  7. Marta said,

    May 19, 2014 @ 2:08 am

    I didn't have any more success than Mark did at locating Criminalistics and Court Expertise, but a DuckDuckGo search of the journal title turned up a supposed reprint of the paper:
    http://www.permontgroup.com.ar/doc/PDF/1Published%20CVSA%20Study%202012(1).pdf

    [(myl) Thanks!]

  8. Eye5600 said,

    May 19, 2014 @ 12:00 pm

    "I'm not even opposed to using the pretense that such technology exists to scare people into not lying, which seems to me to be its main application these days."

    A few years ago, I spent an idle day at work reading web sites about polygraph testing. My overall impression is that the polygraph is not so much a truth detector as polygraph testing is a particular interrogation protocol. It relies on the subject believing that it works; thus polygraph examiners and related experts have a vested interest in stressing its reliability. So, you can't believe them.

    I'm sure the same is true of CVSA.

  9. Alan Gunn said,

    May 19, 2014 @ 4:04 pm

    @Eye5600

    I think you are absolutely right about polygraphs. I spent nearly ten years working on child-abuse cases, where polygraphs were used a lot, and I've seen transcripts of polygraph exams by skilled questioners in which people admitted quite a bit, usually under the apparent assumption that the examiner knew when they were lying, or at least that they were nervous. It was also clear that the kind of examination in which an examiner asked questions and then confidently reported that the subject was or was not lying was worthless. The key here may be that an examiner who really believes in the "science" is useless, though the kind that knows it's a con and can exploit it can find out things.

    Polygraphs aren't nearly the worst example of nonsensical techniques in law enforcement. I'd nominate handwriting identification, which is even allowed in trials, unlike polygraphs (in most places, anyway). There are a fair number of studies showing that handwriting "experts" can't do what they claim to do. Years ago I worked on a case in which a "leading handwriting expert" got it wrong. That inspired me to look into the matter of how someone gets to be a "leading handwriting expert," as it isn't something you study in school. Turns out, if you can get a court to let you testify about someone's handwriting, that gives you a credential, which will then persuade some other courts to let you testify, and after you've done that a lot, you're a giant in your field. Scary.

  10. Rubrick said,

    May 19, 2014 @ 4:58 pm

    I'd guess the only way to stop this (expensive) nonsense would be a massive fraud lawsuit. But who would file it?

  11. J. W. Brewer said,

    May 19, 2014 @ 6:06 pm

    I'm not sure I disagree with Keith Ellis on the proposition that both law enforcement investigative practices and the judicial system's handling of criminal matters are in many ways unscientific and have a lot of dubious folk theories of epistemology embedded in them, but it is interesting to note the lengthy divergence in this area, where polygraph-type techniques are widely used outside court while equally widely (perhaps not universally, I haven't had cause to look at the issue recently) barred from evidentiary use in-court. With respect to CVSA there may be a similar reticence – I came across a recent appellate case that seemed to assume the legitimacy of federal convicted felons on supervised release being required to answer various potentially incriminating questions posed by their probation officer (e.g. "have you been looking at child pornography on the internet since our last meeting?") while subject to CVSA evaluation but carefully sidestepped the question of what consequences could or couldn't attach to "failing" the CVSA evaluation in that context.

  12. isaiah said,

    May 19, 2014 @ 9:52 pm

    Does that "permontgroup.com.ar" link in Marta's comment work for anyone? It doesn't work for me at all, my browser can't even find that domain. And now I'm very curious to see if this is a real paper.

    And if people have found the paper, what language is it in?

  13. Marta said,

    May 20, 2014 @ 11:03 pm

    @isaiah:

    The link is corrupted but the URL is correct. Just copy and paste it in its entirety into your browser.

    The paper is in English.

  14. daniel said,

    September 6, 2014 @ 12:57 pm

    Though I do believe the technology will someday be attained, I don't think it has been done at this point. Here is the reason why. Voice stress analysis requires the frequency domain between roughly 4 Hz and 20 Hz to be examined.
    But as it happens, because this frequency range is well below human hearing, microphones do not typically record in it. Depending on the quality of the microphone, it could even have a cutoff as high as 200 Hz (my laptop computer, for example). True, there is still a tiny portion of the signal that gets through, but it would be a different proportion for the upper end (15 Hz) than for the lower end, and would be different on every microphone.
    Although people who are very good with FFT can attain 1 Hz discrimination, the vast majority cannot get better than 4 Hz at this frequency. Since the frequencies of interest are primarily 8 to 14 Hz, the FFT accuracy becomes yet another problem.
    Finally, the portion of the speech to analyze is typically about 100 ms out of perhaps every 10,000 ms (10 seconds), so only about 1% of the signal is of actual interest, and the rest is not. You would need a context-sensitive algorithm to know which keyword, and which portion of that keyword, to do the analysis on. For example, the subject might say, "The last time I saw the jelly donut was Tuesday morning, and it was okay when I left it then." It would only be in the context of the other questions that one would even know which words were the keywords. Stress on "jelly donut" might be from hunger, or from guilt, or from childhood associations, or from lying.
    Long story short, I am in agreement that the current stuff out there is mostly junk, and that even the stuff that does have a special microphone and a world-class FFT person writing their algorithm still lacks the algorithm to know context. And without context, it won't get it right. Yet.
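On the FFT point in the last comment: the figures there can be set against a standard property of the discrete Fourier transform, namely that a plain (unpadded, rectangular-window) FFT of a segment lasting T seconds has bins spaced 1/T Hz apart. The sketch below only applies that relation to the durations mentioned in the comment. It says nothing about what any commercial product actually does, and the function names are invented for illustration.

```python
def fft_bin_spacing_hz(window_seconds: float) -> float:
    """Spacing between adjacent FFT bins for a segment of this duration.

    Follows from delta_f = sample_rate / n_samples = 1 / duration,
    assuming no zero-padding.
    """
    return 1.0 / window_seconds


def window_needed_seconds(resolution_hz: float) -> float:
    """Shortest segment whose FFT bins are no wider than `resolution_hz`."""
    return 1.0 / resolution_hz


if __name__ == "__main__":
    # A 100 ms stretch of speech, as in the comment.
    print(f"100 ms window -> {fft_bin_spacing_hz(0.100):.1f} Hz per bin")
    # Segment lengths needed for the resolutions the comment mentions.
    for target_hz in (4.0, 1.0):
        seconds = window_needed_seconds(target_hz)
        print(f"{target_hz:.0f} Hz resolution needs {seconds * 1000:.0f} ms")
```

On these assumptions a 100 ms segment gives bins 10 Hz wide, which is wider than the whole 8 to 14 Hz band of interest, so resolving 1 Hz differences would take about a full second of sustained voicing.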
