Thanks, Bill Dunn!
In a comment on a recent Language Log post, Daniel C. Parmenter wrote:
In my MT days (starting in the early nineties) we used the WSJ corpus a lot. I read recently that the availability of this corpus was in no small part thanks to you. And so I thank you. In those pre- and early-Google/Altavista days the WSJ corpus was an enormous help. Thanks!
Daniel is referring to an archive of text from the Wall Street Journal, covering 1987-1989, originally published with some other raw material for corpus linguistics by the Data Collection Initiative of the Association for Computational Linguistics (ACL/DCI). And the person who most deserves thanks for the availability of the WSJ part of this publication — perhaps its most important part — is Bill Dunn, who was the head of Dow Jones Information Services in the late 1980s.
As far as I know, Bill's role in making this corpus available is not documented anywhere, so I'll take this opportunity to tell some of the story as I remember it. (The rest of this post is a slightly edited version of an email that I sent on 5/1/2008 to someone at the WSJ who had corresponded with Geoff Pullum about an article on the use of corpus materials in linguistic research.)