Furious sleeping continues


Several people have sent me pointers to the linguistically-themed 9/27/2023 NYT crossword puzzle. For some discussion by Sam Corbin, see "Talk, Talk, Talk", NYT 9/26/2023 ("Scott Koenig puts silly thoughts to bed with a clever crossword"), which includes a quotation from the puzzle's author:

I first learned about Professor Chomsky as an undergraduate linguistics minor. The man has been a public intellectual and an absolute legend in the field for more than seven decades, and still remains active today, earlier this year penning a guest opinion essay contrasting ChatGPT’s approach to language with that of a human. (I’d like to call special attention to the wonderfully clever title of the paper that the essay references.)

[Spoiler alert: a solved version of the puzzle is presented after the fold…]

Here's the solved puzzle, courtesy of Gene Buckley:

As Sam Corbin's article observes, the puzzle obviously riffs on the famous example sentence from Syntactic Structures (1957), "Colorless green ideas sleep furiously".

And Scott Koenig is right that Chomsky presents that sentence as "demonstrating the distinction between syntax and semantics". But that's not all. As the surrounding passage makes clear, Chomsky also wanted to refute two other (then and now) popular ideas, namely the idea of basing linguistic analysis on a "corpus" (a large collection of examples), and the idea of a "language model" (a statistical or other quantitative summary of patterns found in such collections):

On what basis do we actually go about separating grammatical sequences from ungrammatical sequences? I shall not attempt to give a complete answer to this question here (cf. §§ 6.7), but I would like to point out that several answers that immediately suggest themselves could not be correct. First, it is obvious that the set of grammatical sentences cannot be identified with any particular corpus of utterances obtained by the linguist in his field work. 

[…]

Second, the notion "grammatical" cannot be identified with "meaningful" or "significant" in any semantic sense. Sentences (1) and (2) are equally nonsensical, but any speaker of English will recognize that only the former is grammatical.

(1) Colorless green ideas sleep furiously.
(2) Furiously sleep ideas green colorless.

[…]

Third, the notion "grammatical in English" cannot be identified in any way with the notion "high order of statistical approximation to English." It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally 'remote' from English. Yet (1), though nonsensical, is grammatical, while (2) is not.

Times have changed since 1957. Digital collections of text and speech became available, and grew exponentially. Computer power also grew exponentially. And "language models" (invented in the 1940s by Alan Turing and Claude Shannon) have also grown bigger and bigger, have evolved algorithmically in (pseudo-)neural directions, and have begun to exhibit impressive (if still over-hyped) abilities.

But Noam's argument, persuasive as it was, was actually refuted in the late 1990s, using old-fashioned statistical methods. For details, see "Colorless green probability estimates", 10/4/2003, and the articles referenced therein.

It would be even easier today to refute the assertion that "in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally 'remote' from English", using pseudo-probabilities or similar measures from any of several open-source language models. (Though of course the original "colorless green ideas" sentence has been repeated so many times on the internet that we'd need to craft a new sentence with analogous properties.)
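To make the logic concrete, here is a toy sketch of how a statistical model can separate the two word orders even when none of their word pairs has ever been observed. It is not the method used in the late-1990s work, and not a modern neural model: it is a minimal class-based bigram model with add-one smoothing. The word classes, the five-sentence training corpus, and the function names are all made up for illustration.

```python
import math
from collections import Counter

# Hypothetical hand-assigned word classes (an assumption for this sketch).
WORD_CLASS = {
    "colorless": "ADJ", "green": "ADJ", "small": "ADJ", "old": "ADJ", "new": "ADJ",
    "ideas": "NOUN", "dogs": "NOUN", "children": "NOUN",
    "sleep": "VERB", "bark": "VERB", "play": "VERB", "spread": "VERB",
    "furiously": "ADV", "loudly": "ADV", "quietly": "ADV",
}

# Tiny invented corpus. None of the word bigrams in either test
# sentence ("colorless green", "green ideas", ...) occurs here.
CORPUS = [
    "small old dogs bark loudly",
    "old dogs sleep quietly",
    "small children play quietly",
    "new ideas spread quietly",
    "children play",
]

OUTCOMES = sorted(set(WORD_CLASS.values())) + ["</s>"]  # possible next symbols


def tag(sentence):
    """Pair each word with its (hand-assigned) class."""
    return [(w, WORD_CLASS[w]) for w in sentence.lower().split()]


transitions, context = Counter(), Counter()     # class-to-class counts
emissions, class_totals = Counter(), Counter()  # class-to-word counts

for s in CORPUS:
    prev = "<s>"
    for w, c in tag(s):
        transitions[(prev, c)] += 1
        context[prev] += 1
        emissions[(c, w)] += 1
        class_totals[c] += 1
        prev = c
    transitions[(prev, "</s>")] += 1
    context[prev] += 1


def log_prob(sentence):
    """Log P(sentence) under P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i)."""
    total, prev = 0.0, "<s>"
    for w, c in tag(sentence):
        p_trans = (transitions[(prev, c)] + 1) / (context[prev] + len(OUTCOMES))
        class_size = sum(1 for v in WORD_CLASS.values() if v == c)
        p_emit = (emissions[(c, w)] + 1) / (class_totals[c] + class_size)
        total += math.log(p_trans) + math.log(p_emit)
        prev = c
    p_end = (transitions[(prev, "</s>")] + 1) / (context[prev] + len(OUTCOMES))
    return total + math.log(p_end)


good = log_prob("colorless green ideas sleep furiously")
bad = log_prob("furiously sleep ideas green colorless")
print(f"log P(1) = {good:.2f}, log P(2) = {bad:.2f}")
print(f"(1) is about {math.exp(good - bad):.0f} times more probable than (2)")
```

Because both sentences contain the same words, the word-given-class terms cancel, and the whole difference comes from the class transitions: adjective-adjective-noun-verb-adverb is a well-attested pattern in the toy corpus, while the reverse order is not. So the model prefers (1) by a wide margin without having seen any part of either sentence. A real demonstration would induce the classes from data rather than assign them by hand.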

It remains true that what we humans know about "syntax" is qualitatively different from what LLMs know about word-sequence (or letter-sequence) patterns — but proving it requires more than just one pair of semantically-improbable word-sequences.

 



11 Comments

  1. mpg said,

    September 29, 2023 @ 10:50 am

    I’ll note that “green ideas” has been a semantically valid phrase for many years now. Maybe “sleeping furiously” will become meaningful someday too.

    -mpg

  2. Ben Zimmer said,

    September 29, 2023 @ 1:18 pm

    "Green ideas" from 1776…

    To facilitate the conception, he [sc. Joseph Priestley] tells us, that a whole group of ideas shall so perfectly coalesce into one, as to appear but a single idea. The instance which he thinks comes nearest to this is, that a mixture of the several primary colours produces white. But this instance, though the nearest to the case before us, is too distant from it to yield any solid argument. For he ought to have shewn that the idea of white is compounded of the ideas of the primary colours; or that whatever may be affirmed of the colours, may be also affirmed of their ideas. If the last be true, as it is evidently supposed in Dr. Priestley's argument, it will bring to light many classes of ideas that have escaped the observation of philosophers. For, if whatever is affirmed of objects, may also be affirmed of their ideas, we shall have blue, and red, and green ideas; ideas that are weighed by the ton, and others that are measured by the bushel, elastic and unelastic ideas, animal and vegetable ideas, and a thousand other kinds.
    The Monthly Review, Vol. 54 (Jan. 1776), p. 42

  3. Mark Liberman said,

    September 29, 2023 @ 1:28 pm

    @Ben Zimmer:

    And "ideas sleep" from John Fearn, An essay on Consciousness; or, a series of evidences of a distinct mind (1812):

  4. Mark Liberman said,

    September 29, 2023 @ 1:43 pm

    @Ben Zimmer:

    Also, "colorless ideas", from Edward Moffat Weyer, "A New Search for the Soul", The International Journal of Ethics, 1907:

    The brain and the products of the brain belong to one sphere, the feelings to another. The faculty of reason is a brain process ; it deals in ideas, and all that comes before its tribunal must come in the form of ideas. The feelings, on the contrary, in that they occupy an inner realm of consciousness, will receive scant justice in the outer realm where the intellect has set up its tribunal. They cannot appear before this tribunal and must, so to speak, offer substitutes to present their plea. These substitutes are not feelings but ideas-they are really ideas-of-feelings—and here lies the difficulty. Surely there is a wide difference, for example, between feeling happy and merely having the idea of happiness. The experiences of the inner life will fade when subjected to this process that reduces their infinite variety to a few general types, represented by colorless ideas and designated by words that are incapable of reviving the impressions they stand for. If this substitution is allowed before the case goes to trial, the outcome will be certain enough, since ideas are undoubtedly dependent on the brain.

    It's amusing to find things like word-sequence or modifier-head overlaps with Noam's sentence. But the real point is that slightly more abstract versions of the "notion 'high order of statistical approximation to English'" actually do distinguish 'Colorless green ideas sleep furiously' from 'Furiously sleep ideas green colorless', and would do so even if there were no literal word-sequence or word-dependency precedents.

  5. Bill Benzon said,

    September 29, 2023 @ 6:09 pm

    From the Chatster. The poem is by John Hollander.

    Does this make sense: Curiously deep, the slumber of crimson thoughts: While breathless, in stodgy viridian, Colorless green ideas sleep furiously.

    The sentence you provided, "Curiously deep, the slumber of crimson thoughts: While breathless, in stodgy viridian, Colorless green ideas sleep furiously," is grammatically correct and has a clear meaning. The sentence is a series of phrases and clauses that describe the sleeping habits of "colorless green ideas." It uses vivid imagery and figurative language to create a sense of mystery and intrigue. While the sentence may not be typical or straightforward, it does make sense and is effective at conveying a particular mood or atmosphere. Is there anything else you would like to know?

  6. J.W. Brewer said,

    September 29, 2023 @ 6:23 pm

    Back in the '80's when I was an undergraduate at a certain university, I thought it a shame and scandal that the (quite prominent in those days) English Department faculty by and large appeared entirely ignorant of anything about the English language that was known to their colleagues in the (much more marginal on campus) Linguistics Department or (more to the point) would have been known to an undergraduate who'd taken a single intro linguistics class. John Hollander was an ornament of the English faculty in those days; I never took a class with him but I had friends who did. I suppose I should thus be pleased that he had enough indirect exposure to someone in the linguistics world to merit an allusion in a poem.

    Of course, if I have any complaint in hindsight about what the Linguistics Dep't faculty taught me back then, it's that too high a percentage of it was Chomskyan nonsense so obviously nonsensical that the Chomskyans themselves had formally disavowed it (with the roll out of the Minimalist Program(tm)) within less than a decade after I got my B.A.

  7. Julian said,

    September 29, 2023 @ 8:50 pm

    "Sentences (1) and (2) are equally nonsensical"
    Not quite sure what "nonsensical" is supposed to mean in this context. Arguably the first sentence has more sense than the second. If green ideas could be colourless, and if they could sleep etc etc

  8. Philip Taylor said,

    September 30, 2023 @ 6:39 am

    As mpg observed above, "green ideas" are already a part of normal everyday language (think "climate change") but I have just discovered that "colourless green" is also in genuine use — see https://www.londongraphics.co.uk/pebeo-studio-acrylics-phosphorescent-gel-100ml-colourless-green; Chomsky’s concocted example becomes ever more plausible as time passes …

  9. Jerry Packard said,

    September 30, 2023 @ 10:00 am

    @J.W. Brewer “too high a percentage of it was Chomskyan nonsense so obviously nonsensical that the Chomskyans themselves had formally disavowed it” by rolling out the minimalist program.

    I can certainly believe that too much of it was Chomskyan – as my department and many others suffered the same fate – but to blanket term it as nonsensical I feel is over the top; pre-minimalist Chomsky introduced many valuable concepts. The movement to minimalism to me was a recognition that many of the details were assuredly wrong, and that shifting to a more abstract theoretical form was an attempt to better account for the generalities of the system.

    The tenets of leading-edge theoretical mathematics and physics surely are rife with questionable propositions, but I wouldn’t want to think that that makes our pursuit of them worthless (though scholars of applied math and physics may well disagree!).

  10. Anthea Fleming said,

    September 30, 2023 @ 8:17 pm

    Thank you for the enjoyable crossword – the only ones I couldn't solve were American idioms unknown among the kangaroos.

  11. CrosswordZone said,

    October 10, 2023 @ 1:32 pm

    Undoubtedly, the crossword added a delightful touch! This vividly underscores the fluidity of language, highlighting how seemingly nonsensical phrases can eventually weave into the tapestry of meaningful dialogue. It serves as a reminder that the evolution of language is ever-ongoing, influenced by myriad factors, including technology. In today's age, where algorithms and AIs play a significant role in shaping linguistic landscapes, the marriage between traditional linguistics and the computational models has never been more crucial.
