A diarization corpus from Amazon

About a month ago, Zaid Ahmed and others in Amazon's speech research group released DiPCo ("Dinner Party Corpus"), "a new data set that will help speech scientists address the difficult problem of separating speech signals in reverberant rooms with multiple speakers".

The past decade has seen striking progress in Human Language Technology, brought about by new methods, more training data, and (especially) cheaper/faster computers. But this rapid progress highlights the fact that "All problems are not solved", as I wrote last year. In particular, the central problem of "diarization", or determining who spoke when, has turned out to be surprisingly difficult. And diarization is not just hard for conversations at dinner parties.

I spent the summer of 2017 in Pittsburgh, working with an ad hoc group of researchers on a set of problems related to diarization. One of the most important results was a series of Diarization Challenges, of which we've had two so far. The first one, of course named DIHARD, was held in 2018, and you can read about it in "DIHARD", 2/13/2018; "Hearing interactions", 2/28/2018; "DIHARD again", 4/14/2018.

DIHARD II took place earlier this year, with seven papers presented at a special session of Interspeech 2019:

Neville Ryant et al., "The Second DIHARD Diarization Challenge: Dataset, task, and baselines".
Federico Landini et al., "BUT System Description for DIHARD Speech Diarization Challenge 2019".
Ignacio Vinals et al., "ViVoLAB Speaker Diarization System for the DIHARD 2019 Challenge".
Zbyněk Zajíc et al., "UWB-NTIS Speaker Diarization System for the DIHARD II 2019 Challenge".
Tae Jin Park et al., "The Second DIHARD challenge: System Description for USC-SAIL Team".
Prachi Singh et al., "LEAP diarization system for the second DIHARD challenge".
Sergey Novoselov, "Speaker Diarization with Deep Speaker Embeddings for DIHARD Challenge II".

Plans for DIHARD III are under development.




  1. Scott P. said,

    November 5, 2019 @ 2:42 pm

    Shouldn't these be happening over Christmas?

  2. Haamu said,

    November 5, 2019 @ 3:49 pm

    By the time DIHARD III is actually released, it will of course have been retitled DIHARD WITH A VENGEANCE.
