At the recent Language Diversity Congress in Groningen, one of many interesting presentations was Martijn Wieling and John Nerbonne's "Inducing and using phonetic similarity". More than a thousand LL readers played a role in the creation of this work, by responding to a request back in May ("Rating American English Accents", 5/19/2012) to participate in an online experiment.
A longer explanation of the experiment and its outcome can be found in Martijn Wieling et al., "Automatically measuring the strength of foreign accents in English":
We measure the differences between the pronunciations of native and non-native American English speakers using a modified version of the Levenshtein (or string edit) distance applied to phonetic transcriptions. Although this measure is well understood theoretically and variants of it have been used successfully to study dialect pronunciations, the comprehensibility of related varieties, and the atypicalness of the speech of the bearers of cochlear implants, it has not been applied to study foreign accents. We briefly present an appropriate version of the Levenshtein distance in this paper and apply it to compare the pronunciation of non-native English speakers to native American English speech. We show that the computational measurements correlate strongly with the average “native-like” judgments given by more than 1000 native U.S. English raters (r = -0.8, p < 0.001). This means that the Levenshtein distance is qualified to function as a measurement of “native-likeness” in studies of foreign accent.
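For readers who haven't met it, the core idea is simple dynamic programming over aligned segment sequences. Here's a minimal sketch of the plain, unweighted Levenshtein distance; the paper's actual measure is a modified version with graded (PMI-weighted) substitution costs, which this toy version does not attempt to reproduce.

```python
def levenshtein(a, b):
    """Unweighted edit distance between two sequences of phonetic segments."""
    # prev[j] holds the distance between a[:i-1] and b[:j] as rows advance.
    prev = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        curr = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]

# Transcriptions are compared as lists of segments rather than raw strings,
# so that multi-character IPA symbols can be treated as single units.
print(levenshtein(list("kætə"), list("kotə")))  # one vowel substitution -> 1
```

In the paper's version, the cost of substituting one segment for another is not a flat 1 but a learned measure of phonetic similarity, so that substituting [æ] for [ɛ] costs less than substituting [æ] for [k].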
One thing that remains to be done is to compare these results to the distribution of correlations among the human judges themselves. Given the diversity of opinion about the nature of the task, it would not surprise me to find that the automatic algorithm agrees with the average human judgment as well as or better than individual human subjects do. A related question is how much of the unexplained variance (1 - 0.8^2 = 36%) is noise, and how much is due to systematic effects that are missing from the phonetic transcriptions that are input to their PMI-weighted string-edit distance: sub-IPA phonetic variation, prosodic differences, or ... It might be true that their algorithm agrees with average human judgments better than individual human judges do, and at the same time be true that there are factors influencing human judgments that their algorithm doesn't pay attention to.
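The judge-vs-judge comparison I have in mind could be run as follows: correlate each rater's scores with the leave-one-out average of the remaining raters, and see where the algorithm's r = -0.8 falls in that distribution. A sketch with invented toy ratings (all numbers here are made up for illustration):

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Toy data: rows are raters, columns are speakers being rated (invented).
ratings = [
    [7, 5, 3, 6, 2],
    [6, 6, 2, 5, 1],
    [7, 4, 4, 6, 3],
    [5, 5, 2, 7, 2],
]

for i, judge in enumerate(ratings):
    others = [r for k, r in enumerate(ratings) if k != i]
    loo_mean = [mean(col) for col in zip(*others)]  # leave-one-out average
    print(f"rater {i}: r = {pearson(judge, loo_mean):.2f}")
```

If individual raters' leave-one-out correlations cluster around, say, |r| = 0.7, then an algorithm at |r| = 0.8 is agreeing with the consensus better than a typical human does.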
Martijn is good at finding creative ways to recruit experimental subjects, as this video indicates:
Martijn plays the role of the subject.