This started out to be a short report on some cool, socially relevant crowdsourcing for Egyptian Arabic. Somehow it morphed into a set of musings about the (near-) future of natural language processing…
A statistical revolution in natural language processing (henceforth NLP) took place in the late 1980s up to the mid 90s or so. Knowledge based methods of the previous several decades were overtaken by data-driven statistical techniques, thanks to increases in computing power, better availability of data, and, perhaps most of all, the (largely DARPA-imposed) re-introduction of the natural language processing community to their colleagues doing speech recognition and machine learning.
There was another revolution that took place around the same time, though. When I started out in NLP, the big dream for language technology was centered on human-computer interaction: we'd be able to speak to our machines, in order to ask them questions and tell them what we wanted them to do. (My first job out of college involved a project where the goal was to take natural language queries, turn them into SQL, and pull the answers out of databases.) This idea has retained its appeal for some people, e.g., Bill Gates, but in the mid 1990s something truly changed the landscape, pushing that particular dream into the background: the Web made text important again. If the statistical revolution was about the methods, the Internet revolution was about the needs. All of a sudden there was a world of information out there, and we needed ways to locate relevant Web pages, to summarize, to translate, to ask questions and pinpoint the answers.
Fifteen years or so later, the next revolution is already well underway.
Read the rest of this entry »