"These can be aptly compared with the challenges, problems, and insights of particle physics"


I'm in Paris for Acoustics 2008, and Edouard Geoffrois invited me to come a little early to attend the 10th annual Séminaire DGA/DET "Traitement de la parole, du langage et des documents multimédias" ("Processing of speech, language and multimedia"), held at the École Nationale Supérieure de Techniques Avancées (ENSTA). In this case, "attend" turned out to mean "give two talks at", and one of my assigned topics was "Human Language Technologies in the United States" (and yes, after a brief excuse in French, I'm ashamed to say that I gave the talk in English…)

For this survey, I decided to start with some historical background, and so I went back to the famous ALPAC report. This was a report to the National Academy of Sciences by the Automated Language Processing Advisory Committee, entitled "Language and Machines: Computers in Translation and Linguistics", and released in 1966.

This report is mainly remembered as a rather negative assessment of the quality, rate of progress, and economic value of research on Machine Translation. It's widely credited with eliminating, for more than two decades, nearly all U.S. Government support for MT research. (The Air Force continued to support Systran for further work on a Russian/English system originated at Georgetown; but I gather that this was really development support rather than research.)

ALPAC had a small effect on my own life: in 1966 I had a part-time job as an undergraduate research assistant in an MT project at the Aiken Computational Laboratory at Harvard. But the project came to an end, and the next year, my part-time job was a mechanic's assistant in an ice cream cone factory in Kendall Square.

However, I had forgotten — if I ever knew — that as the title "Computers in Translation and Linguistics" suggests, the ALPAC report had another and more positive message. Here's a sample:

Today there are linguistic theoreticians who take no interest in empirical studies or in computation. There are also empirical linguists who are not excited by the theoretical advances of the decade–or by computers. But more linguists than ever before are attempting to bring subtler theories into confrontation with richer bodies of data, and virtually all of them, in every country, are eager for computational support. The life's work of a generation ago (a concordance, a glossary, a superficial grammar) is the first small step of today, accomplished in a few weeks (next year, in a few days), the first of 10,000 steps toward an understanding of natural language as the vehicle of human communication. […]

We see that the computer has opened up to linguists a host of challenges, partial insights, and potentialities. We believe these can be aptly compared with the challenges, problems, and insights of particle physics. Certainly, language is second to no phenomenon in importance. And the tools of computational linguistics are considerably less costly than the multibillion-volt accelerators of particle physics. The new linguistics presents an attractive as well as an extremely important challenge.

There is every reason to believe that facing up to this challenge will ultimately lead to important contributions in many fields. A deeper knowledge of language could help

1. to teach foreign languages more effectively;
2. to teach about the nature of language more effectively;
3. to use natural language more effectively in instruction and
4. to enable us to engineer artificial languages for special purposes (e.g., pilot-to-control tower languages);
5. to enable us to make meaningful psychological experiments in language use and in human communication and thought (unless we know what language is we do not know what we must explain); and
6. to use machines as aids in translation and in information retrieval.

However, the state of linguistics is such that excellent research, which has value in itself, is essential if linguistics is ultimately to make such contributions.

Such research must make use of computers. The data we must examine in order to find out about language is overwhelming both in quantity and in complexity. Computers give promise of helping us control the problems relating to the tremendous volume of data, and to a lesser extent the problems of data complexity. But, we do not yet have good, easily used, commonly known methods for having computers deal with language data. Therefore, among the important kinds of research that need to be done and should be supported are (1) basic developmental research in computer methods for handling language, as tools for the linguistic scientist to use as a help to discover and state his generalizations, and as tools to help check proposed generalizations against data; and (2) developmental research in methods to allow linguistic scientists to use computers to state in detail the complex kinds of theories (for example, grammars and theories of meaning) they produce, so that the theories can be checked in detail.

ALPAC's background is explained in the report's preface:

The Department of Defense, the National Science Foundation, and the Central Intelligence Agency have supported projects in the automatic processing of foreign languages for about a decade; these have been primarily projects in mechanical translation. In order to provide for a coordinated federal program of research and development in this area, these three agencies established the Joint Automatic Language Processing Group (JALPG).

In 1964, the JALPG set up the ALPAC, chaired by John Pierce, then the executive director of the communications sciences division at Bell Labs, who had played a central role in the development of communications satellites and had named the transistor, among other contributions. I believe that Pierce was the intellectual heart of ALPAC, but its other members were also important: John Carroll, Eric Hamp, Charles Hockett (who left the committee at the end of 1964), Anthony Oettinger, and Alan Perlis.

Although I'm a big linguistics booster, I can't claim that our field has yet lived up to the assertion that its challenges, insights and potentialities match those of particle physics. But it's only been 42 years. So all you bright young high school students, you could still get in on the golden age…


  1. kyle gorman said,

    June 24, 2008 @ 2:11 pm

    mark steedman's recent article in Computational Linguistics (34:1, 2008) touches on some of the same points. one area that i think computational linguistics and linguistics in general has failed is the inability to recognize and advertise the public value of the linguistic Holy Grails: human-like ASR and MT systems. i'd like to think that these are closer to singularity than particle physics is to their theory of everything, and presumably the product of nearness-to-goal and usefulness should confer much more public interest and respect than superstring theorists currently enjoy.

  2. Nick B said,

    June 24, 2008 @ 8:03 pm

    'Multibillion-volt accelerators' should be electron-volt (a measure of energy) rather than volt (a measure of electrical potential difference). In human scales, giga-volts sound impressive compared to the 240 volts mains. A giga-electron-volt unfortunately is of the order of a nano-Joule, which doesn't quite have the same ring to it.
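
    [Ed.: the arithmetic in this comment checks out; here is a quick back-of-the-envelope sketch in Python, using the elementary charge to convert 1 GeV into joules.]

    ```python
    # Convert 1 GeV (giga-electron-volt) to joules, to check the
    # "of the order of a nano-Joule" claim above.
    E_CHARGE = 1.602176634e-19  # elementary charge in coulombs (1 eV = this many joules)

    gev_in_joules = 1e9 * E_CHARGE  # energy of 1 GeV expressed in joules
    print(f"1 GeV = {gev_in_joules:.3e} J = {gev_in_joules * 1e9:.2f} nJ")
    # prints: 1 GeV = 1.602e-10 J = 0.16 nJ
    ```

    So 1 GeV is a fraction of a nanojoule, which indeed doesn't have the same ring to it.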

  3. john riemann soong said,

    June 25, 2008 @ 6:43 am

    "you could still get in on the golden age…"

    I do hope so!

    It sounds kind of morbid, but I can't wait until all the teachers who still live in the "My Fair Lady" era die. Or at least retire.

  4. Blake Stacey said,

    June 25, 2008 @ 1:05 pm

    "[…] presumably the product of nearness-to-goal and usefulness should confer much more public interest and respect than superstring theorists currently enjoy."

    Thanks to the manufactroversy over string theory, perhaps.

  5. Rick S said,

    June 28, 2008 @ 2:23 pm

    There is a pair of invalid SGML/XML characters in this post which are preventing RSS feeds from succeeding in Internet Explorer 7. The characters are VT (vertical tab) control characters, with ASCII codes x0B, and they occur just before the third paragraph in red, which starts "There is every reason to believe". Deleting these characters, or perhaps replacing them with spaces, should restore the feeds.

    BTW, the feeds also don't pass W3C's XML validator (here). The flaws may be in the WordPress software, or they may be in customization files. However, these flaws don't seem to stop feeds from working as long as no invalid characters are present. A search of the WordPress help forums may be helpful.
