Erratic in time, anyhow — maybe no more erratic than usual in terms of content.
My first stop will be Florence for ICASSP 2014, where I'm involved in three papers whose details won't interest most of you. But here they are anyhow:
Andreas Stolcke, Neville Ryant, Vikramjit Mitra, Jiahong Yuan, Wen Wang, and Mark Liberman, "Highly Accurate Phonetic Segmentation Using Boundary Correction Models and System Fusion".
Jiahong Yuan, Neville Ryant, and Mark Liberman, "Automatic Phonetic Segmentation in Mandarin Chinese: Boundary Models, Glottal Features and Tone".
Neville Ryant, Jiahong Yuan, and Mark Liberman, "Mandarin Tone Classification Without Pitch Tracking".
The general idea behind this work is the automation of phonetic transcription and measurement. The goal is to make it possible to use very large available collections of digital audio in phonetics research – you could call it the Robot Phonetician Project.
After ICASSP, I'm going to London, for a panel at the British Academy on "Language, Linguistics, and the Data Explosion". I won't be talking about the Robot Phonetician, at least not in any detail, but there's obviously a connection. And here's a draft blog post about my contribution to the panel. This was solicited by the editor of a section of the Guardian online ("The case for language learning"), which is apparently part of the BA's partnership with the Guardian. Neither the panel nor my post has much to say about language learning, but I've offered to add some relevant observations. I think that some version of this will appear later in the week.
Then comes a General Linguistics Seminar at Oxford, where I'll try to make some linguistic sense out of the "Tone without Pitch" results. (Some additional experiments in that line will be presented at Speech Prosody 2014: Neville Ryant, Malcolm Slaney, Mark Liberman, Elizabeth Shriberg, and Jiahong Yuan, "Highly Accurate Mandarin Tone Classification In The Absence of Pitch Information".) Here's the abstract for my talk:
A "deep neural network" classifier, applied to a diverse corpus of news broadcasts, achieved the best performance ever recorded on the task of recognizing Mandarin tones. Oddly, the classifier accomplished this using generic spectral inputs that do not encode fundamental frequency (F0) in any obvious way. The same classifier had much worse performance with amplitude and F0 estimates as input; and adding F0 to the generic spectral inputs degraded performance slightly.
After various less interesting ideas have been considered and rejected, we offer three increasingly general (and speculative) explanations:
(1) The psychological dimension of pitch involves more than F0;
(2) The phonetic dimension of tone involves more than pitch;
(3) The reflection of phonological categories in articulation and sound is more complex than linguists generally assume.
Implications for phonological, phonetic, and sociolinguistic research will be discussed.
[Joint work with Neville Ryant, Jiahong Yuan, Malcolm Slaney, and Elizabeth Shriberg]