What's hot at ICASSP


This week I'm at IEEE ICASSP 2017 in New Orleans — that's the "Institute of Electrical and Electronics Engineers International Conference on Acoustics, Speech and Signal Processing", pronounced /aɪ 'trɪ.pl i 'aɪ.kæsp/. I've had joint papers at all the ICASSP conferences since 2010, though I'm not sure that I've attended all of them.

This year the conference distributed its proceedings on a nifty little guitar-shaped USB key, which I promptly copied to my laptop for easier access. I seem to have deleted my local copies of most of the previous proceedings, but ICASSP 2014 escaped the reaper, so I decided to while away the time during one of the many parallel sessions here by running all the .pdfs (1703 in 2014, 1316 this year) through pdftotext, removing the REFERENCES sections, tokenizing the result, removing (some of the) unwordlike strings, and creating overall lexical histograms for comparison. The result is about 5 million words for 2014 and about 3.9 million words this year.
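A rough reconstruction of that histogram pipeline, for the curious — the directory layout, function names, and tokenizer regex are my guesses, not the actual half-dozen commands used:

```python
# Sketch of the proceedings-to-histogram pipeline described above.
# Assumes pdftotext (from poppler) is installed and the PDFs sit in one directory.
import re
import subprocess
from collections import Counter
from pathlib import Path

def histogram_from_text(txt):
    """Drop everything from the REFERENCES heading onward, lowercase,
    and count crude word-like tokens (runs of letters)."""
    body = re.split(r"\bREFERENCES\b", txt)[0]
    return Counter(re.findall(r"[a-z]+", body.lower()))

def corpus_histogram(pdf_dir):
    """Run pdftotext on every .pdf in pdf_dir (writing to stdout via '-')
    and merge the per-paper histograms into one lexical histogram."""
    counts = Counter()
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        out = subprocess.run(["pdftotext", str(pdf), "-"],
                             capture_output=True, text=True).stdout
        counts += histogram_from_text(out)
    return counts
```

With one such Counter per proceedings year, the comparison below is just a matter of feeding the two histograms to the log-odds computation.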

And to compare the lists, I used the usual "weighted log-odds-ratio, informative Dirichlet prior" method, as described for example in "The most Trumpish (and Bushish) words", 9/5/2015.

The first thing to say about this process is that it's become miraculously easy. On my 3-year-old laptop, the whole thing took about 15 seconds of computer time, responding to about six lines of code, which I was actually able to type at the command line with only one or two typographical errors. So it didn't take my mind off the lecture for very long.

The second thing to say is that there are a few artifacts in the results. For example, six of the ten most 2014-ish words were IEEE, Speech, Processing, Signal, Conference, International — because the paper template for the 2014 conference put this header on the first page of every paper, but the 2017 template had no such header.

Turning our attention to the other end of the list, the 20 most 2017-ish words are (where "#17" means "word count in the 2017 papers", "perM17" means "frequency per million words in the 2017 papers", "#14" means "word count in the 2014 papers", "perM14" means "frequency per million words in the 2014 papers", and "LogOdds" means "the weighted log of the odds ratio"):

WORD          #17  perM17  #14 perM14 LogOdds
__________________________________________________
lstm          1405  (359)   84  (17)  22.169
cnn           1386  (355)  132  (26)  21.087
convolutional 1067  (273)  167  (33)  17.142
graph         3297  (843) 1988 (394)  15.706
layer         3349  (857) 2236 (443)  14.069
rnn           1104  (282)  392  (78)  13.385
learning      4258 (1089) 3215 (637)  13.306
layers        1719  (440)  896 (178)  13.007
dataset       2641  (676) 1714 (340)  12.936
deep          1762  (451)  947 (188)  12.826
ctc            418  (107)    7   (1)  12.730
blstm          531  (136)   71  (14)  12.443
neural        2228  (570) 1581 (313)  10.572
methods       5308 (1358) 4842 (960)  10.068
cnns           356   (91)   54  (11)  9.961
student        410  (105)   90  (18)  9.800
emotion        849  (217)  415  (82)  9.623
network       4749 (1215) 4437 (879)  8.916
fmri           350   (90)   82  (16)  8.882
recurrent      600  (153)  264  (52)  8.719

As you can see, most of these are associated with "deep learning" neural-net algorithms: LSTM is "Long Short-Term Memory"; CNN is "Convolutional Neural Network" (not "Cable News Network"); RNN is "Recurrent Neural Network"; BLSTM is "Bidirectional LSTM"; CTC is "Connectionist Temporal Classification"; etc.

So now you know what's hot at ICASSP this year.

2 Comments

  1. Yuval said,

    March 9, 2017 @ 5:55 pm

    The fact that neural methods brought "graph" up so high is impressive.

    [(myl) There's a special session on "Graph Topology Inference", which actually seems to be one of the topics that isn't really about pseudo-neural algorithms.]

    (Also: fMRI! Srsly?)

    [(myl) fMRI produces signals in need of processing, right? The abstract for one of the relevant papers:

    Functional magnetic resonance imaging (fMRI) has provided a window into the brain with wide adoption in research and even clinical settings. Data-driven methods such as those based on latent variable models and matrix/tensor factorizations are being increasingly used for fMRI data analysis. There is increasing availability of large-scale multi-subject repositories involving 1,000+ individuals. Studies with large numbers of data sets promise effective comparisons across different conditions, groups, and time points, further increasing the utility of fMRI in human brain research. In this context, there is a pressing need for innovative ideas to develop flexible analysis methods that can scale to handle large-volume fMRI data, process the data in a distributed and policy-compliant manner, and capture diverse global and local patterns leveraging the big pool of fMRI data. This paper is a survey of some of the recent research in this direction.

    ]

  2. Ben Zimmer said,

    March 10, 2017 @ 9:56 am

    Mark is too modest to mention that he is receiving the IEEE James L. Flanagan Speech and Audio Processing Award while he's at ICASSP. From the IEEE site:

    Mark Yoffe Liberman’s trailblazing efforts in creating the Linguistic Data Consortium (LDC) have fueled the development and advancement of human language technologies (HLTs) including speech and speaker recognition, machine translation, and semantic analyses. Founded at the University of Pennsylvania in 1992, the LDC became the largest developer of shared language resources, distributing more than 120,000 copies of over 2,000 databases covering 91 different languages to more than 3,600 organizations in over 70 countries. Liberman has also helped to create technologies that have daily impact on HLTs, including a speech activity detector that regularly processes LDC speech data to reduce annotation cost and increase accuracy and a forced aligner that was integrated into the Forced Alignment and Vowel Extraction service that has revolutionized phonetic and sociolinguistic research.
    Liberman is the Christopher H. Browne Distinguished Professor of Linguistics and director of the Linguistic Data Consortium at the University of Pennsylvania, Philadelphia, PA, USA.

    Congratulations on the richly deserved honor!
