Accessibility and diarization


I spent this morning at an ICASSP-2014 session on "Speaker Diarization". As the picture indicates, the room was not exactly handicapped-accessible…

Luckily this is not a problem for me, but my experience of three torn knee ligaments a few years ago sticks with me.

Anyhow, I made it up the stairway to Room Scherma, and learned some useful and interesting things about current techniques for speaker diarization, which is the problem of determining who spoke when in an arbitrary audio or video recording.

I'll spare you the details, though I intend to try some of the ideas out myself later. What I want to underline here is something that the six papers in the session had in common.

The authors came from a variety of institutions: Institute of Automation, Chinese Academy of Sciences; IDIAP Research Institute, Switzerland; Saint-Petersburg National Research University of Information Technologies, Mechanics and Optics; University of Eastern Finland; Université du Maine. The topics were also diverse: "Variational Bayes Based I-vector for Speaker Diarization of Telephone Conversations"; "Information Bottleneck based Speaker Diarization of Meetings using Non-speech as Side Information"; "Improving Speaker Diarization using social role information"; "Bayesian Analysis of Similarity Matrices for Speaker Diarization"; "Filterbank Slope based Features for Speaker Diarization"; "A Conditional Random Field approach for Audio-Visual people diarization".

What they all had in common was that they reported results on published databases. Two papers used NIST SRE 2008 data. Three papers used the NIST RT05, RT07, RT08, and/or RT09 datasets. One paper used the AMI corpus. And one used the REPERE collection.

None of the presentations used proprietary or unpublished data. This illustrates the fact that in most speech processing fields, it has become normal to cite the performance of new algorithms on data that is also available to others, so that comparisons are quantitatively meaningful.

In some sense, this is also really about accessibility. When you want to evaluate or extend someone's ideas, it's critical to be able to replicate their work — and that requires access to the datasets they analyzed.

This is not the norm in most areas of linguistics — but it should be.




  1. Dan Milton said,

    May 6, 2014 @ 3:38 pm

    "Determining who spoke when" seems a far stretch from "writing in a diary", which I presume is the basic meaning of "diarization".
    Can you or someone say something on the history of the term?

    [(myl) Diary has long been used for any time-marked record of events. Dictionaries give glosses like "An account of daily events or transactions; a journal; specifically, a daily record kept by a person of any or all matters within his experience or observation: as, a diary of the weather; a traveler's diary."

    Examples of such "diaries" include cases where the individual entries are quite simple and stereotyped, e.g. a diary of weather observations like the start and end of periods of rain, or a diary of medical symptoms and treatments.

    So it's not a very big stretch to call a time-marked account of who spoke when a "diary" of a conversation or a meeting. And the process of automatically creating such an account from a recording would thus be "diarization".

    I'm not sure who came up with the word diarization, or in what context. The first time that I heard it was about 15 years ago, in exactly the context of the speech processing task. So it's possible that someone at NIST came up with the word, though it's also possible that the term had earlier uses in medical or business jargon.

    In any case, it's widely used in that sense now.]

  2. valency said,

    May 6, 2014 @ 8:04 pm

    @Dan Milton

    Perhaps from Greek διαίρεσις, diairesis, meaning "division, distribution, distinction", with a bit of metathesis thrown in? That's unrelated to English diary, which apparently, according to Wiktionary, derives from diārium, meaning a daily allowance for soldiers, from dies "day".

    Although if true this opens the question of why Greek is still considered the go-to language for technical terminology of this sort.

    [(myl) No, it's from Latin. AHD:

    Latin diārium, daily allowance, daily journal, from diēs, day; see dyeu– in Indo-European roots.

    And the word was adopted into English when Latin was still the European lingua franca.]

  3. Chris Brew said,

    May 6, 2014 @ 10:39 pm

    NLP has the same expectation. A new technique should be applied to datasets that others are able to use. This is a huge positive, but it has a dark side. Some of the key datasets in NLP were very costly to create, so there is a very strong incentive to keep revisiting the same data when it would really be better to build a completely new dataset. This is a worry because, over time, the techniques that the research community develops tend to become over-specialized to the particularities of these datasets, and less and less applicable to the language that people actually want to process these days.

    There are no obvious solutions, other than for brave, public spirited people to continually refresh the community's store of data. Unfortunately, the LDC cannot do this alone.

  4. Peter CS said,

    May 7, 2014 @ 12:40 pm

    A diary, in UK English, usually means a book in which one records future appointments. I was surprised when I moved to the US in 1972 to find out that Americans didn't commonly have a pocket diary in their pocket or a desk diary on their desk. Paper diaries have to an extent been replaced by electronic equivalents but 'diarise' is still business jargon for 'put in the diary' – eg 'we'll diarise the meetings for the rest of the year'. 'Diary' is also still understood in the sense of a historic journal a la Pepys, but not many people keep those nowadays.
