I spent this morning at an ICASSP-2014 session on "Speaker Diarization". As the picture indicates, the room was not exactly handicapped accessible…
Luckily this is not a problem for me, but my experience of three torn knee ligaments a few years ago sticks with me.
Anyhow, I made it up the stairway to Room Scherma, and learned some useful and interesting things about current techniques for speaker diarization, which is the problem of determining who spoke when in an arbitrary audio or video recording.
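For readers unfamiliar with the task, a diarization system's output is essentially a list of time segments, each labeled with an (anonymous) speaker identity. Here's a minimal sketch in Python — the segment times and speaker labels are invented for illustration, not taken from any of the papers:

```python
# A toy diarization result: (start_sec, end_sec, speaker) segments
# answering "who spoke when" in a hypothetical recording.

def total_speech_per_speaker(segments):
    """Sum total speaking time (in seconds) for each speaker label."""
    totals = {}
    for start, end, speaker in segments:
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    return totals

segments = [
    (0.0, 4.5, "spk1"),   # first speaker's opening turn
    (4.5, 9.0, "spk2"),   # second speaker responds
    (9.0, 12.0, "spk1"),  # first speaker again
]

print(total_speech_per_speaker(segments))
# {'spk1': 7.5, 'spk2': 4.5}
```

Real systems, of course, must also decide where the segment boundaries fall and how many speakers there are, which is where the modeling work in these papers comes in.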
I'll spare you the details, though I intend to try some of the ideas out myself later. What I want to underline here is something that the six papers in the session had in common.
The authors were from a variety of institutions – Institute of Automation, Chinese Academy of Sciences; IDIAP Research Institute, Switzerland; Saint-Petersburg National Research University of Information Technologies, Mechanics and Optics; University of Eastern Finland; Université du Maine. The topics were also diverse: "Variational Bayes Based I-vector for Speaker Diarization of Telephone Conversations"; "Information Bottleneck based Speaker Diarization of Meetings using Non-speech as Side Information"; "Improving Speaker Diarization using social role information"; "Bayesian Analysis of Similarity Matrices for Speaker Diarization"; "Filterbank Slope based Features for Speaker Diarization"; "A Conditional Random Field approach for Audio-Visual people diarization".
What they all had in common was that they reported results on published databases. Two papers used NIST SRE 2008 data. Three papers used the NIST RT05, RT07, RT08, and/or RT09 datasets. One paper used the AMI corpus. And one used the REPERE collection.
None of the presentations relied on proprietary or unpublished data. This illustrates the fact that in most speech processing fields, it has become normal to report the performance of new algorithms on data that is also available to others, so that comparisons across systems are quantitatively meaningful.
In some sense, this is also really about accessibility. When you want to evaluate or extend someone's ideas, it's critical to be able to replicate their work — and that requires access to the datasets they analyzed.
This is not the norm in most areas of linguistics — but it should be.