The judge in the Zimmerman case has recently decided to let the jury decide for themselves about the source of the screams in the 911 tape ("Jury to decide whose voice on 911 call in Zimmerman case"). This decision is a stinging rebuke to the "expert" testimony of Tom Owen and Alan Reich, and supports the testimony of Peter French, George Doddington, and Hirotaka Nakasone. For a summary of the dueling experts, see Andrew Branca, "Zimmerman Case: Dr. Hirotaka Nakasone, FBI, and the low-quality 3-second audio file", Legal Insurrection 6/7/2013, "Zimmerman Prosecution’s Voice Expert admits: 'This is not really good evidence'", 6/8/2013, and "Zimmerman Case: Experts Call State’s Scream Claims 'Absurd' 'Ridiculous' and 'Imaginary Stuff'", 6/9/2013.
I don't have time this morning to discuss the issues at greater length, but it's clear that the judge's evaluation of the situation was correct.
If you're not familiar with the Speaker Recognition Evaluations (e.g. here and here) or the Language Recognition Evaluations (here and here) that NIST has been running for the past 15 years, you should follow those links and take a look at how those projects have been structured.
These programs have been designed to suit the needs of the intelligence community, and are therefore far from a perfect fit to the issues that arise in the courtroom. But there's been enormous progress over the past 15 years, due to the power of the "common task" approach that has had such a striking effect in speech recognition, document retrieval, machine translation, and so on.
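To make the "common task" idea concrete: in a NIST-style speaker-recognition evaluation, each system emits a score for every trial (higher meaning "same speaker"), and the evaluator computes error rates from trials whose ground truth is known. The sketch below (not NIST's actual scoring code; the scores are hypothetical) approximates one standard summary metric, the equal error rate (EER), the operating point where the false-reject and false-accept rates coincide.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """False-reject and false-accept rates at a given decision threshold."""
    fr = sum(s < threshold for s in target_scores) / len(target_scores)
    fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return fr, fa

def equal_error_rate(target_scores, nontarget_scores):
    """Sweep thresholds over all observed scores and return the error rate
    at the point where false-reject and false-accept are closest."""
    best_gap, best_eer = None, None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        fr, fa = error_rates(target_scores, nontarget_scores, t)
        gap = abs(fr - fa)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (fr + fa) / 2
    return best_eer

# Hypothetical trial scores: targets are same-speaker trials,
# nontargets are different-speaker trials.
targets = [2.5, 1.8, 3.1, 0.9, 2.2]
nontargets = [-1.0, 0.3, -0.5, 1.1, -2.0]
print(equal_error_rate(targets, nontargets))  # → 0.2
```

The point of the common-task framework is that every participating system is scored the same way on the same blind trials, so a claim like "my method identifies the speaker" becomes a measurable error rate rather than an assertion of expertise.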
It has always seemed strange to me that there's been no comparable effort to put forensic speech and language analysis on a sound quantitative footing. A cynical explanation might be that intelligence analysts care about facts and the evidence for them, whereas prosecutors care only about convictions.