This week I'm at IEEE ICASSP 2017 in New Orleans — that's the "Institute of Electrical and Electronics Engineers International Conference on Acoustics, Speech and Signal Processing". pronounced /aɪ 'trɪ.pl i 'aɪ.kæsp/. I've had joint papers at all the ICASSP conferences since 2010, though I'm not sure that I've attended all of them.
This year the conference distributed its proceedings on a nifty little guitar-shaped USB key, which I promptly copied to my laptop for easier access. I seem to have deleted my local copies of most of the previous proceedings, but ICASSP 2014 escaped the reaper, so I decided to while away the time during one of the many parallel sessions here by running all the .pdfs (1703 in 2014, 1316 this year) through pdftotext, removing the REFERENCE sections, tokenizing the result, removing (some of the) unwordlike strings, and creating overall lexical histograms for comparison. The result is about 5 million words for 2014 and about 3.9 million words this year.
And to compare the lists, I used the usual "weighted log-odds-ratio, informative Dirichlet prior" method, as described for example in "The most Trumpish (and Bushish) words", 9/5/2015.
Read the rest of this entry »