Bing gets weird — and (maybe) why


For weeks, everyone was talking about how great the Large Language Model (LLM) ChatGPT is, or else showing that it can make serious mistakes of fact or logic. But since the alliance between OpenAI and Microsoft added (a version of) this LLM to (a version of) Bing, people have been encountering weirder issues. As Mark Frauenfelder pointed out a couple of days ago at BoingBoing, "Bing is having bizarre emotional breakdowns and there's a subreddit with examples". The cited subreddit, r/bing, has examples going back to the start of the alliance. And today, Kevin Roose posted a long series of strikingly strange passages from his own interactions with the chatbot, "Bing's A.I. Chat: 'I Want to Be Alive'", NYT 2/16/2023.

One question about these interactions is where the training data came from, since such systems just spin out word sequences that their training estimates to be probable. Someone suggested to me, "It seems like they might have gotten a bunch of conversations from a bad dating app, maybe one that is known for catfishing or something." But OpenAI tells us that

We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses. We mixed this new dialogue dataset with the InstructGPT dataset, which we transformed into a dialogue format.

To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality. To collect this data, we took conversations that AI trainers had with the chatbot. We randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them. Using these reward models, we can fine-tune the model using Proximal Policy Optimization. We performed several iterations of this process.

So an army of low-paid "AI trainers" created training conversations, and also evaluated such conversations comparatively — which apparently generated enough sad stuff to fuel those "bizarre emotional breakdowns".
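The comparison step in the OpenAI passage quoted above (trainers rank several completions, and a reward model is fit to those rankings) can be made concrete with a small sketch. The pairwise logistic loss below is the kind of objective described in the InstructGPT paper; whether ChatGPT's reward model uses exactly this form is an assumption, and the function names are mine, purely for illustration.

```python
import math

def ranking_to_pairs(ranked):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    return [(ranked[i], ranked[j])
            for i in range(len(ranked))
            for j in range(i + 1, len(ranked))]

def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), computed stably.

    The loss is small when the reward model already scores the
    human-preferred reply higher, and large when it has them reversed.
    """
    diff = reward_preferred - reward_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

# Three completions ranked by a (hypothetical) trainer give three comparisons.
pairs = ranking_to_pairs(["polite refusal", "vague answer", "rant"])
```

Training pushes the scalar rewards apart in the direction the trainers chose, and the resulting reward model is then what the reinforcement-learning step optimizes against.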

A second question is what this all means, in practical terms. Most of us (anyhow me) have seen this stuff as somewhere between pathetic and ridiculous, but C.M. pointed out to me that there might be really bad effects on naive and psychologically vulnerable people.

Update – Haamu makes some good points in the comments, namely that Bing/Sydney/Chat seems to have a couple of layers beyond the normal LLM architecture:

  1. "[S]ome sort of set of self-governance rules that overlay basic text generation, supplied by the researchers to shape the responses. You can see this when you try to get ChatGPT to make an inappropriate comment, express a political opinion, or comment on events or circumstances that are more recent than the scope of its training corpus."
  2. "Contextual awareness, where ChatGPT […] seems to “understand” not just grammatical antecedents, but conceptual antecedents, and not just recent ones, but deep ones as well."

I agree that #1 suggests an additional "just say no" layer — which apparently can be circumvented in various ways, if the published conversations are accurate.

As for #2, it's not clear whether the contextual stuff is a separate (but interacting) system, or just a bigger/more elaborate/more extensive deep-net architecture, along the general lines of "Long Short-Term Memory". My bet would be on the second option.

But so far, OpenAI and MS are not telling us enough about the system's details to answer these (and other) questions.

Update #2 — In the comments, Rick Rubenstein and Phil H observe that the latest LLMs seem to have mastered "grammar", in some important sense. My own impression is that this is basically the fulfillment of what Zellig Harris was claiming back in the 1950s, about how syntax (and semantics, and pragmatics) were emergent properties of the (appropriate) mathematical analysis of very large amounts of text. That old operationalist underestimated the required scale of training data, and had a very different idea of the mathematical processes involved than the Deep Nets folks do, but the basic idea was the same.

And yes, I know that children manage the task with training data that's many orders of magnitude smaller, and that many people think that machines could do so too, with the right nativist or rationalist assumptions, and/or maybe additional modalities, active participation in learning, etc.

But all the same, what's happening with systems like ChatGPT is not a qualitative change in approach, but just another example of how quantity can turn into quality.  For a bit more discussion, see my slides for a panel last week, about the effect of LLMs on writing instruction, organized by Penn's Critical Writing Program.

Update #3 — MS has fixed the problem, or at least arranged to prevent it from happening. See "The new Bing & Edge — Updates to Chat", Microsoft Bing Blogs 2/17/2023; and Eric Bellman, "Microsoft Puts Caps on New Bing Usage After AI Chatbot Offered Unhinged Responses", WSJ 2/17/2023.


  1. Alexander said,

    February 17, 2023 @ 1:04 am

    In Roose’s interaction with Bing/Sydney, I was struck by how often its responses were structured in triads: three items in a list, three adjectives describing a subject, three sentences following the same pattern…

  2. Rick Rubenstein said,

    February 17, 2023 @ 3:00 am

    As sources of information, ChatGPT and its brethren clearly have deep problems, and I really doubt they're one simple fix from being solved. However, their grammar is extremely impressive. I wonder if, were the engineers to get the stars out of their eyes for a moment, they could be harnessed to (finally) create a non-terrible automated grammar checker.

  3. AntC said,

    February 17, 2023 @ 5:37 am

    But this sad blurting of grammatical(-ish) but off-the-point/conversationally inappropriate chunder/training data is exactly what ELIZA used to do ~50 years ago.

    OTOH perhaps Bing has been clicking on too many of the get-rich-quick/pseudo-medical adverts at that BoingBoing link.

  4. Haamu said,

    February 17, 2023 @ 7:56 am

    … such systems just spin out word sequences that their training estimates to be probable.

    This seems dismissive. At a minimum (and in light of the quoted material you provide immediately afterward), it ought to be amended to

    … such systems just spin out word sequences that their training estimates to be probable and preferable.

    I get the impression that the common conception of how these models work is that they are basically Markov-Chain-Via-Neural-Net, whatever that entails, but there clearly seems to be more going on.

    First, there is some sort of set of self-governance rules that overlay basic text generation, supplied by the researchers to shape the responses. You can see this when you try to get ChatGPT to make an inappropriate comment, express a political opinion, or comment on events or circumstances that are more recent than the scope of its training corpus. I suppose this could be accomplished by training it on comparison data – here’s an inappropriate response, preference score X; and here’s an avoidance or redirection response, preference score Y – but the model seems so insistent on certain aspects of self-governance that it feels like there’s a self-governance subroutine in there somewhere.

    And then there’s contextual awareness, where ChatGPT in particular (I haven’t tried the other current ones) vastly exceeds anything I’ve seen before. This is not just a difference in degree, but a difference in kind. It seems to “understand” not just grammatical antecedents, but conceptual antecedents, and not just recent ones, but deep ones as well.

    In the most dramatic example for me, I spent over two hours with it on a single topic the other night because I was so impressed with the continuity of the responses and the quality of the advice it was giving. I had already had about 40 conversations with it of varying quality, mostly just goofing around, but this time I was up in the middle of the night, unable to sleep, and I decided to ask it to help me think through the outline of a book I’ve been thinking about writing for the last few years. I decided to be as natural as possible and treat it like an expert human interlocutor, and I got back something that met the challenge. Once we established that the plan was to structure the book around 4 major concepts, it understood that objective for the remainder of the two-hour-plus conversation – not just correctly resolving what expressions like “the four points” meant, but seemingly understanding their importance to the framing and progression of other ideas in the general outline and even suggesting a fifth point. Another thing that took me aback was when ChatGPT decided, unprompted, to provide a better answer to a previous question based on something that occurred to it (I’m trying not to be too anthropomorphic, but it’s challenging) as a result of the new topic we were presently talking about.

    I understand that this could all be just a probabilistic illusion, but it worked. I found it quite valuable in helping me to break through a creative block. (In this sense, these models could be useful writer’s tools – not to author text, but to serve as sounding boards.) The way it exceeded previous performance with me probably had to do with the different/better way I was prompting it and testing its contextual awareness. But the point is some sort of very extensive contextual awareness capability has been used to augment, or serve as massive input to, the probabilistic aspects of the model.

    In summary: to the extent that these models begin to show quirks, personalities, and even mental-illness-like behaviors, it could be due to complex interaction between the training regime, the self-governance rules, and the context/memory capabilities that have been built in.

  5. Scott P. said,

    February 17, 2023 @ 8:41 am

    I sort of assumed a lot of these 'weird' examples were invented, much like the popular fad a few years back to pretend you fed a lot of training data to an AI and had it generate a list of names for cats or superhero movies or whatever. Is there any way to be sure these stories are real?

  6. Phil H said,

    February 17, 2023 @ 9:58 am

    Yeah, just to echo what Rick Rubenstein said. GPT-3 and its spinoffs ChatGPT and Bing Chat have done something amazing, which is to solve English grammar. GPT-2 still produced a high proportion of sentences which were ill-formed, either syntactically or semantically. GPT-3 seems to me to produce no more ill-formed sentences than a human speaker of English. This enormous breakthrough has been completely overshadowed in the popular domain by the fact that GPT models still regularly talk nonsense. But I hope linguists are sitting up and taking notice. Something revolutionary has happened here.

  7. Eric Ringger said,

    February 17, 2023 @ 11:05 am

    Regarding: " As for #2, it's not clear whether the contextual stuff is a separate (but interacting) system, or just a bigger/more elaborate/more extensive deep-net architecture, along the general lines of "Long Short-Term Memory". My bet would be on the second option. "

    Yes. This is a consequence of the Transformer architecture, which is modeling dependencies at both local and global scales as well as granularities in between. Start with the paper "Attention is all you need" and keep reading from there to fully appreciate the entire mechanism.
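For readers who want a feel for the mechanism before tackling the paper, here is a minimal pure-Python sketch of scaled dot-product attention, the core operation of the Transformer. It leaves out everything else a real model has (learned projections, multiple heads, causal masking, stacked layers), and the toy vectors are made up for illustration.

```python
import math

def softmax(xs):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query, return a weighted average of all value vectors.

    The weight on each position depends on how well that position's key
    matches the query, not on how far away the position is.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Two positions with identical keys receive equal weight,
# so the output is the plain average of their values.
result = attention([[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [[2.0], [4.0]])
```

Because every position is scored against every other position in one step, a dependency fifty tokens back costs the same as one two tokens back, which is one way to read the "local and global scales" remark above.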

  8. Chester Draws said,

    February 17, 2023 @ 3:29 pm

    My experience is the reverse, Haamu.

    If you play around with ambiguous statements, that are only understood in context ("Time flies like an arrow", "a bird in the hand is worth two in the bush") it gets nowhere. It stubbornly refuses to play along, continually reverting to the standard meanings. It clearly does not hold onto the concept that there might be a thing like a "time fly" or that "bird" has other meanings.

  9. DJL said,

    February 17, 2023 @ 3:40 pm

    Isn't it still the case that what LLMs do, and only do, is predict the most likely word given a string of words as input, and based on the distributional properties of words and strings (in terms of vectors, matrices, weights, and the rest of it)?

    Admittedly, this is not what a user experiences when dealing with ChatGPT, but this is because of the advances in "prompt engineering" and the fact that a user interacts with an LLM via a dialogue management system (or assistant), which queries the underlying LLM so as to give the illusion of engaging in conversation. None of this changes what LLMs actually do.

    I bring this up because it still seems absurd to claim that next token prediction systems have mastered "grammar" in any meaningful understanding of what this actually involves, and in this sense the point about children acquiring language in a short period of time would appear to be a non sequitur. LLMs haven't learned any natural language; at best they are weakly generating language, and whilst this may be important in formal language theory and computational linguistics, it isn't "competence".
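The "predict the most likely word from distributional properties" picture in this comment can be shown in its most stripped-down form as a bigram counter. This is a toy sketch only: real LLMs condition on long contexts through learned vector representations rather than raw counts of adjacent words, and the corpus and function names here are invented for the example.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

model = train_bigrams("the cat sat on the mat and the cat slept")
prediction = most_likely_next(model, "the")
```

Here "the" is followed by "cat" twice and "mat" once, so the prediction is "cat". The disagreement in this thread is over how much of what ChatGPT does is a scaled-up version of this and how much is something else.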

  10. AntC said,

    February 17, 2023 @ 4:42 pm

    @Rick R, @Phil H have done something amazing, which is to solve English grammar.

    What, all of it?

    It's not difficult to restrict its output to a few sentence-frames known to be grammatical. Does it _parse_ all of English's weird inversions and dependent clauses, etc? Or does it just slurp up 'content words' from its input? "Most of the trees were made of wood. And so were the rest". "The mat sat on the cat."

    I see nothing 'solved' here.

  11. Bill Benzon said,

    February 17, 2023 @ 8:59 pm

    Researchers at Anthropic have been doing some interesting research attempting to 'reverse-engineer' in-context learning: In-context Learning and Induction Heads. From the opening paragraphs:

    Specifically, in our prior work we developed a mathematical framework for decomposing the operations of transformers, which allowed us to make sense of small (1 and 2 layer attention-only) models and give a near-complete account of how they function. Perhaps the most interesting finding was the induction head, a circuit whose function is to look back over the sequence for previous instances of the current token (call it A), find the token that came after it last time (call it B), and then predict that the same completion will occur again (e.g. forming the sequence [A][B] … [A] → [B]). In other words, induction heads “complete the pattern” by copying and completing sequences that have occurred before. Mechanically, induction heads in our models are implemented by a circuit of two attention heads: the first head is a “previous token head” which copies information from the previous token into the next token, while the second head (the actual “induction head”) uses that information to find tokens preceded by the present token. For 2-layer attention-only models, we were able to show precisely that induction heads implement this pattern copying behavior and appear to be the primary source of in-context learning. (Note that induction heads don’t occur in 1 layer models, because they require a composition of attention heads in different layers.) […]

    In this paper, we take the first preliminary steps towards building such an indirect case. In particular, we present preliminary and indirect evidence for a tantalizing hypothesis: that induction heads might constitute the mechanism for the actual majority of all in-context learning in large transformer models. Specifically, the thesis is that there are circuits which have the same or similar mechanism to the 2-layer induction heads and which perform a “fuzzy” or “nearest neighbor” version of pattern completion, completing [A*][B*] … [A] → [B], where A* ≈ A and B* ≈ B are similar in some space; and furthermore, that these circuits implement most in-context learning in large models.
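The exact-match version of the pattern described in the quoted passage, [A][B] … [A] → [B], is simple enough to write out in ordinary code. The sketch below imitates only the input-output behaviour (look back for the current token, copy what followed it); it says nothing about how attention heads implement it, and the function name is hypothetical.

```python
def induction_predict(tokens):
    """Predict the next token by copying what followed the most recent
    earlier occurrence of the current (last) token.

    Returns None when the current token has not been seen before.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, excluding the last one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

guess = induction_predict(["A", "B", "C", "A"])
```

With the sequence A, B, C, A the function looks back, finds the earlier A, and predicts B. The "fuzzy" variant in the passage would replace the equality test with a similarity measure over learned representations.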

    I've been working with stories, using a method derived from how Lévi-Strauss analyzed myths in Mythologiques. I present ChatGPT with a prompt having two components: 1) a short story, and 2) an instruction to create a new story based on the story I've given it, but changing the protagonist. I'm interested in the ensemble of changes it makes in order to preserve coherence. For example, when princess XP-708-DQ is substituted for Princess Aurora, what else changes in the story? In that case ChatGPT assumed that XP-708-DQ was a robot, though I didn't say anything about that, and made extensive changes on that basis. In another case I asked it to make Princess Aurora a giant chocolate milkshake. When I asked it to make her a colorless green idea, it responded that it couldn't tell a story because a colorless green idea lacked the characteristics necessary to function in a story. I've written some of this up.

    Stephen Wolfram has an interesting and insightful article on ChatGPT and LLMs: What Is ChatGPT Doing … and Why Does It Work? There's nothing in there about in-context learning, but there are some very useful visualizations.

  12. Phil H said,

    February 18, 2023 @ 2:25 am

    @Ant C
    "What, all of it?"
    That's an excellent question! One that I hope greater linguists than me are addressing. In some ways, this question is related to Chomsky's poverty of the stimulus argument – if a certain kind of grammatical form is rare in English production, can a blank-slate system learn it? I have no idea whether GPT's grammatical competence varies depending on how common the grammatical form is in English (specifically, its training data).
    But that doesn't change the nature of my observation. No computer generator of text before GPT-3 could reliably produce relevant grammatical sentences across a reasonable range of inputs. (Someone cited Eliza above, but the limitations on those older systems were immense). GPT-3 and its offshoots can.

    "Does it _parse_ all of English's weird inversions and dependent clauses, etc?"
    ChatGPT can "parse" them if you ask it to. It can also produce them. Those are two different functions, and it can do both.

    "Most of the trees were made of wood. And so were the rest". "The mat sat on the cat."
    This is the big thing that has changed from GPT-2. GPT-2 made up sentences just like you suggest. GPT-3 doesn't (or does so so rarely that it's at a human level). That's what I mean when I say there's been a breakthrough.

    "is basically the fulfillment of what Zellig Harris was claiming back in the 1950s"
    I remember Geoff Pullum making similar arguments here, I think. I'm kinda looking forward to a bunch of triumphalist papers to come out pointing out how GPT is another nail in the coffin of the Chomsky's LAD… I dunno if it's as clear as that, but it feels like this must represent another step towards a final answer on that debate.

  13. DJL said,

    February 18, 2023 @ 4:37 am

    I don't see how LLMs have much to say about poverty-of-the-stimulus arguments and the like. First, LLMs haven't learned any 'natural language' – all they do is predict the next word given a string, and the only representation of "language" they have and use is the distributional properties of words and strings as calculated by machine learning methods, and on the basis of the human-generated text these models are fed. What any one person acquires when we talk about the acquisition of language is a sound-meaning mapping wherein phonological, syntactic, and semantic (at the very least) properties play an important role at every stage of the process, often imposing specific interpretations on the input children receive (and often, too, over and above the statistical properties children are no doubt also sensitive to).

    Also, and contrary to common descriptions of POS arguments, the key observation is that the knowledge of language children have mastered by age 5 or so far exceeds the input they have been exposed to, and not the claim that it is not possible to "learn" any form of language from data alone (the interrelations between the two points notwithstanding, though). As pointed out above, LLMs may be more relevant to formal language theory and computational linguistics, where a "language" is often regarded as a set of well-formed strings (and the methods to generate these strings), but this may well be of rather limited importance to those linguists who actually study natural languages.

  14. AntC said,

    February 19, 2023 @ 1:34 am

    @DJL contrary to common descriptions of POS arguments, the key observation is that the knowledge of language children have mastered by age 5 or so far exceeds the input they have been exposed to,

    (not clear if you're espousing that "input they have been exposed to" claim, or merely reporting it.)

    Is this any more than the usual (specious) claim with induction as a scientific method that no matter how many times we've seen the sun come up in the morning, we can't be sure it will tomorrow?

    I rather thought that child language acquisition facilitators (linguist M.A.K.Halliday comes to mind, but any doting grandparent would do) had overwhelmingly demonstrated that kids up to age 5 are hugely _over_-stimulated with language repetition. Until they get to the point of 'Pop goes the weasel' — as Halliday puts it.

    I rather suspect LLMs are primed (via the assumptions behind their programmers' approach) to already be 'expecting' nouns vs verbs vs adjuncts. (Scare quotes because I'm certainly not ascribing mental states to them. But note that the terminology for parsing programming/formal languages is full of pseudo-mental states: 'look-ahead', 'anticipation', … )

  15. DJL said,

    February 19, 2023 @ 6:42 am

    As a doting parent (but not a grandparent), I can say that that stuff about constant repetition to children is just nonsense – it just doesn't happen all that often or consistently to make any difference to language acquisition, and it certainly isn't universal; and as a linguist, I can safely say that Halliday's work hardly constitutes any consensus in the field of language acquisition. The point is a simple one: what children know about language by age 5 (and this also includes what they understand, not only what they produce) goes beyond what they have heard in their lives by some margin.

    Nothing to do with Hume's induction, by the way.

  16. wanda said,

    February 19, 2023 @ 2:37 pm

    "I rather thought that child language acquisition facilitators (linguist M.A.K.Halliday comes to mind, but any doting grandparent would do) had overwhelmingly demonstrated that kids up to age 5 are hugely _over_-stimulated with language repetition."
    I've read that this is cultural. The book _Anthropology of Childhood_ says that in many traditional cultures, people don't talk to babies because they can't talk back. They are treated more like things to be taken care of. (I'm sure this is influenced by the fact that there are always babies.) Obviously those kids still learn to talk, even without much speech directed at them.

  17. AntC said,

    February 19, 2023 @ 7:35 pm

    that stuff about constant repetition to children is just nonsense

    Then we'll just have to disagree — since I find your response disagreeable and "nonsense" disrespectful.

    All the kids I've had contact with (including observing my younger siblings' interaction with doting grandparents; and their children's interactions) have had buckets of repetition. Reaching the 'Pop goes the weasel' threshold would be a regular feature not a bug.

    @wanda I've read that this is cultural.

    I can report from a Taiwanese family I'm closely involved with — currently grandma regularly looks after two, often four, sometimes eight. They all get heaps of stimulation both from adults and siblings/cousins.

    The youngest, at 2 ½ years on my last visit was just figuring out there's two languages going on — English and Putonghua; they haven't yet grokked there's also Hoklo. Her older brother when at that age I lost count of the number of times I repeated the 'Round and round the garden' game.

    Perhaps kids in nuclear families get their repetition by continually re-running electronically their favourite cartoons and nursery rhymes?

    But of course I'm only providing (anecdotal) evidence — of the sort Chomsky eschews so violently.

  18. DJL said,

    February 20, 2023 @ 2:27 pm

    No need to be offended, but it really is not a matter of disagreeing – there isn't really any account of language acquisition out there, nativist or otherwise, that literally claims that children learn language through being exposed to repetition, repetition, repetition from adults (something that doesn't even happen all that much, anyway). Not any account that has a chance of being right, anyway (e.g., Halliday's).

  19. AntC said,

    February 20, 2023 @ 6:23 pm

    @DJL isn't really any account … exposed to repetition

    Google seems to think there is. (I concede many of those hits are non-academic/possibly from doting grandparents, or marketing from childcare organisations probably telling parents what they want to hear.) But here are a few more academic treatments:

    Benefits of word repetition to infants, University of Maryland

    Fathers’ repetition of words is coupled with children’s vocabularies National Library of Medicine

    "it was found that repetitions played a significant role in the acquisition of new vocabulary " Journal of Psycholinguistic Research

    The Role of Repetition in Tzeltal, Max Planck Institute. (via wikipedia, Tzeltal, a Mayan language, has a notable verbal structure, hard-to-learn as a second language.)

    I appreciate we need a lot more evidence. I haven't unearthed any claim repetition does actual harm to language learning, or at least is neutral. But I think there's enough here to reject your "nonsense" or "[not] any real account".

  20. AntC said,

    February 20, 2023 @ 10:12 pm

    (via wikipedia …

    Correction, wikipedia's contribution (at 'Language Acquisition') is:

    Nonword repetition and word learning: The nature of the relationship University of York — one of my almae matres/Applied Psycholinguistics

    This seems to me to be tackling a somewhat different aspect: speaking (or indeed signing) is AMOT a motor skill — like walking upright, playing an instrument, shooting hoops … AFAIA, all training for motor skills necessitates huge amounts of repetition of a 'pointless' nature: walking (or initially staggering) from sofa to table and back; playing scales; shooting hoops continually from exactly the same spot on the court — or indeed not on any court; getting your laughing gear around strange sounds, just to reproduce and hear the result, being the case in point. (I note the profoundly deaf — despite achieving amazing mastery of language — never get to sound quite right. More probing 'experiments' would of course be out of the question ethically. So I don't see how any claim wrt the 'Poverty Of the Stimulus' can be drawn from observation.)

    Is the cognitive part of language learning[**]/as opposed to the motor skills somehow not susceptible of improvement by repetition? How is it, then, that kids get from one-word utterances to multi-word utterances (famously ungrammatical at first, from Halliday's papers) to grammatical whole sentences? Do they not hear 'want banana' corrected by the saintly grandparents to 'please give me a banana'? Do they not, indeed, fairly quickly get to grok 'want banana' isn't the mode juste[***], but repeat it anyway "because they know it teases"?

    [**] I feel at that phrase I — and indeed the whole debate — have sidetracked into metaphysics. Can we separate out the 'cognitive part' of walking or of playing Für Elise or of scoring from a rebound? We can maybe talk about interpretive or performative mastery of a skill. In my experience (I hike up mountains, play Beethoven Sonatas, but have never shooten hoops) repetition is still essential.

    [***] yes, a deliberate solecism

  21. DJL said,

    February 21, 2023 @ 4:28 am

    Apart from the Brown study on Tzeltal, all the others are about the benefits of repetition for word learning – fair enough, but we weren't talking about the learning of vocabulary, but about the acquisition of language overall (and, more specifically, of syntax, given that we were talking about LLMs).

  22. AntC said,

    February 22, 2023 @ 3:58 am

    the benefits of repetition for word learning – fair enough, …

    If you're allowing that repetition has benefits for word-learning, why wouldn't it have benefits for learning other features of a language?

    For languages with inflection (or rather more than English's), how could you learn a word without also learning its various forms? There could perhaps be a parallel study to that Tzeltal examining if 'baby language' suppresses inflections?

    For languages with triliteral roots, how could you learn a word without also noticing the variation in vowels?

    For languages with Sandhi, how could you learn a word without … similar parallels?

    And wouldn't you notice coincidences between sound-variations in one word-class appearing with another — that is, as soon as you get to multi-word utterances? Indeed wouldn't you use that as the criteria for distinguishing word-classes? "by age 5" was the metric you introduced.

    In short: I can find no criteria to separate your "word learning" vs "acquisition of language overall".

    I am not presuming some specialised 'Language Acquisition Device'. Merely a general cognitive capacity (exhibited in all learning — indeed over-exhibited) to see contrasts and coincidences/parallels.

    And that Tzeltal study is a superb example of the sort of close examination it needs to assess the claim of 'Poverty Of Stimulus'. I wanted to quote some key parts — but in the end would have quoted nearly all of it.

    syntax shmyntax. If you've grokked word-classes as part of word-learning; you're most of the way there (to what you need by age 5). I well remember when travelling in France getting asked by kids much older than 5 "quelle heure il-est?" — who of course only wanted to make fun of my terrible accent.

  23. DJL said,

    February 22, 2023 @ 6:12 am

    No criteria to separate word learning from the acquisition of language overall, in particular syntax? We are moving into the territory of speculation and anecdote once again. I can only repeat what I said before: none of this is the consensus in actual studies of language acquisition, and in that sense, I can only recommend you read a textbook on the matter. My preference is Guasti's Language Acquisition book, but others would work too.

  24. AntC said,

    February 22, 2023 @ 10:53 pm

    Guasti: The theoretical framework used is the generative theory of Universal Grammar; [from the publisher's blurb]

    Then of course I'm only going to get Poverty Of Stimulus propaganda.

    From wp on Lang acqu:

    … many criticisms of the basic assumptions of generative theory have been put forth by cognitive-functional linguists, who argue that language structure is created through language use.[30] These linguists argue that the concept of a language acquisition device (LAD) is unsupported by evolutionary anthropology, …

    — which would appear to be contra your claim of "consensus".

    I'm particularly intrigued what becomes of the LAD after age about 9~12

    By around age 12, language acquisition has typically been solidified, and it becomes more difficult to learn a language in the same way a native speaker would.

    The stimulus under-determines language (by the Chomsky account) just as much at age 12 as under 5.

    The same decline in ability to learn applies for other complex cognitive and motor-co-ordination skills like playing an instrument. Then we can account for the observed phenomena as a general cognitive/brain plasticity maturation, no need to make a special case for Language. Note the aptitude for purely cerebral skills (like mathematics) doesn't decline until well past 40 (contra the mis-impression you'd get from the Fields medal).

    Is there a text with a data-driven as opposed to pre-judiced/ideology-driven treatment?
