Stochastic parrots

Long, but worth reading — Tom Simonite, "What Really Happened When Google Ousted Timnit Gebru", Wired 6/8/2021.

The crux of the story is this paper, which is now available on the ACM's website: Emily Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?🦜" In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610-623. 2021.

As a result of a (somewhat strange) review process, described at length in the Wired article, Timnit Gebru and Margaret Mitchell were fired (or declared to have resigned) from their leadership roles in Google's Ethical AI group.

The most interesting part of the story seems to be how unnecessary all the turmoil was. Read the ACM article, and I think you'll see that it should not really have ruffled any feathers. From the Wired article:

Gebru sent a message to Emily M. Bender, a professor of linguistics at the University of Washington, to ask if she had written anything about the ethical questions raised by these new language models. Bender had not, and the pair decided to collaborate. Bender brought in a grad student, and Gebru looped in Mitchell and three other members of her Google team.

The resulting paper was titled “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜” The whimsical title styled the software as a statistical mimic that, like a real parrot, doesn’t know the implications of the bad language it repeats.

The paper was not intended to be a bombshell. The authors did not present new experimental results. Instead, they cited previous studies about ethical questions raised by large language models, including about the energy consumed by the tens or even thousands of powerful processors required when training such software, and the challenges of documenting potential biases in the vast data sets they were made with. BERT, Google’s system, was mentioned more than a dozen times, but so was OpenAI’s GPT-3. Mitchell considered the project worthwhile but figured it would come across as boring. An academic who saw the paper after it was submitted for publication found the document “middle of the road.”

Plenty of people inside Google knew about the paper early on, including [Jeff] Dean. In October, he wrote in a glowing annual review that Gebru should work with other teams on developing techniques to make machine-learning software for language processing “consistent with our AI Principles.” In her reply, she told him about the paper she was drafting with Bender and others. Dean wrote back: “Definitely not my area of expertise, but would definitely learn from reading it.” […]

It’s not clear exactly who decided that Gebru’s paper had to be quashed or for what reason. Nor is it clear why her resistance—predictable as it was—prompted a snap decision to eject her, despite the clear risk of public fallout.
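
[As an aside on the titular metaphor: the "stochastic parrot" image is just the observation that a language model reproduces statistical patterns of its training text without any grasp of meaning. A toy bigram sampler — my own illustrative sketch, not anything from the paper, which concerns vastly larger neural models — makes the point concrete:]

```python
import random
from collections import defaultdict

def train_bigrams(tokens):
    """Record, for each token, every token observed to follow it."""
    successors = defaultdict(list)
    for cur, nxt in zip(tokens, tokens[1:]):
        successors[cur].append(nxt)
    return successors

def parrot(successors, start, length=8, seed=0):
    """Emit text by repeatedly sampling an observed successor.
    The 'parrot' has no notion of meaning -- only co-occurrence counts."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        candidates = successors.get(out[-1])
        if not candidates:
            break  # dead end: this token was never followed by anything
        out.append(rng.choice(candidates))
    return " ".join(out)

corpus = "the parrot repeats the words the parrot has heard".split()
model = train_bigrams(corpus)
print(parrot(model, "the"))
```

[Every adjacent word pair in the output occurred somewhere in the training text — fluent-looking mimicry with no model of what any of it means, which is the paper's worry scaled up by several orders of magnitude.]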

The Wired article was apparently sourced mainly from Gebru and Mitchell's side of the story, so there may be more to say about all this. But the ACM article at the center of the controversy seems so bland and uncontroversial that it's hard to see why approving its publication would have been problematic for Google in any meaningful way. So the resulting mishegas strikes me as at best an own goal, and at worst a sign of some significant internal problems.

15 Comments »

  1. Seth said,

    June 11, 2021 @ 2:31 am

    Ooooh … I have no inside information, but I've read more than one account of this incident. I don't want to write an article of my own, especially in a comment. But this is a case where, if you don't understand the argument consuming the two sides, you should accept that you simply don't understand the argument, rather than mistakenly believing there's no argument. The phrase "significant internal problems" does not do it justice.

    The Wired article does nod to the large number of people who think their technical objections are met by accusations of engaging in, quote, "a toxic cocktail of racism, sexism, and censorship". But it's done in a very distant tone, without much personalizing sympathy for anyone on the other side of a dispute who is terrified of being the target of a social media hate mob, or worse – "One reason managers were not more open in explaining their feedback to Gebru, according to Google, was that they feared she would spread it around inside the company." That's a very mild way of describing worries over getting personally attacked with all sorts of direct accusations of those supposed "sexist and racist tendencies".

    This is all very complicated and high stakes. You're basically looking at the final straw, and asking why such a tiny little thing broke such a huge camel's back.

  2. AG said,

    June 11, 2021 @ 7:16 am

    Seth –

    I am trying to understand the logic of what you're implying.

    1) You believe that Google managers and a large number of people had technical objections to the paper (i.e., they had substantial reasons to honestly believe the paper was seriously flawed).

    2) They proceeded to take action in this matter in a very strange and occult way, entirely because they were genuinely terrified of being accused of "racism, sexism, or censorship".

    3) The unusual way they proceeded produced a result which to nearly all outside observers appears consistent with racism, sexism, and censorship.

    4) You believe that these people are owed more personalizing sympathy and a more empathetic tone when their side of the dispute or argument is described.

    Do I have that right?

    If so, might I ask for your thoughts on why none of this large number of people has shared these legitimate technical objections? It seems like that would be a simple solution to all of this!

  3. Seth said,

    June 11, 2021 @ 9:19 am

    @AG – No, no, no. The idea is to understand that this incident about the paper is part of a long and highly contentious overall political dispute.

    The problem is not about the paper in terms of its content being any sort of blockbuster, though it is argued over. People have indeed shared their objections after the fact. Almost nobody cares about the specific technical dispute, it's a pretty typical argument over the limits of AI. Rather, it's more about – what is acceptable in handling such disputes? As she put it on her Twitter feed:

    "11/ But there you have it because of this

    "Gebru's demeanor could make some people shy away from her or avoid
    certain technical topics for fear of being pulled into arguments about
    race and gender politics"

    Translation…"

    "12/ I've been in an environment molded for me, building harmful
    products, doing harmful research without having the slightest
    discomfort ever in a place that Black women who don't take all that
    shit can't survive in for even a couple of years."

    Pick your terrorist/freedom-fighter side: 1) Righteously called out for structural racism and entrenched misogyny, versus 2) Having race and gender abuse vituperatively heaped on you and being expected to grovel.

    It is false to claim that to "nearly all outside observers" this "appears consistent with racism, sexism, and censorship". To many observers it is also consistent with someone who was trying to bully and intimidate their way through technical arguments by hurling accusations. And yes, I feel more personalizing sympathy should be devoted to people on the receiving end of this aspect – many writers ignore or trivialize the consequences of abusive accusations of racism, sexism, etc., especially in a corporate context. This isn't about mere name-calling. Nobody wants to be "collateral damage" in a lawsuit (as in, e.g., having to hire a lawyer and worry about the cost and mobbing if their technical review gets cited as part of some boilerplate regarding a "hostile environment").

  4. J.W. Brewer said,

    June 11, 2021 @ 9:42 am

    I am confused by why Margaret Mitchell would have suffered adverse consequences from any controversy about a paper co-authored by Shmargaret Shmitchell.

    The whole controversy tends to confirm that no one working at a company like this, on any side of any given squabble, is likely to be morally admirable, because of how extraordinarily powerful and corrupting the ego/money/power incentives for bad behavior are.

  5. AG said,

    June 11, 2021 @ 10:13 am

    Seth –

    You're expressing yourself quite diplomatically and politely, but reading between the lines I once again feel like I can't follow the logic of what you're saying, so please don't take offense at this, but I'd like to recast what I think you're saying once again from how I read what you've written.

    As I understand it, your viewpoint is that one black woman had the entire power structure at Google – by most standards some of the most powerful human beings on Earth – terrified of her "bullying", and unmanned by the possibility that they might have to "grovel" before her if she accused them of racism.

    This, according to you, was the state of affairs when she wrote a paper claiming that it was theoretically possible for AI to be racist.

    This was the final straw, and the – non-racist, innocent, and bullied corporation – cast out this bully, as they could no longer tolerate her reign of terror.

    In this scenario, the facts that she was black, that her research was (coincidentally) about racism, and that the people who she "bullied" and who fired her were (I'm just assuming here) mostly white men were all irrelevant.

    The only relevant fact was that she threatened them unfairly, for they are not racist and must not be bullied, and so she had to be expelled for this behavior.

    None of this was racist and, in fact, if someone suggested it was, they would also of course be "bullying" and would need to be, in turn, expelled.

    Anything you object to in that characterization?

  6. William Berry said,

    June 11, 2021 @ 10:36 am

    @AG:

    Excuse me for butting in, but, in your last comment, are you sure you don’t have Seth’s argument exactly backward?

    Maybe you’re not clear on the agency (who is saying what and to whom) in this passage:

    As she put it on her Twitter feed:

    "11/ But there you have it because of this

    "Gebru's demeanor could make some people shy away from her or avoid
    certain technical topics for fear of being pulled into arguments about
    race and gender politics"

    Translation…"

    "12/ I've been in an environment molded for me, building harmful
    products, doing harmful research without having the slightest
    discomfort ever in a place that Black women who don't take all that
    shit can't survive in for even a couple of years."

  7. William Berry said,

    June 11, 2021 @ 10:49 am

    On second thought, maybe not.

    Reading Gebru’s tweet, I find her translation compelling. AG seems, ever-so-politely, to see it as “bullying”. If I have that right, their position is not one I want to be identified with.

    F*** a bunch of privileged, tech douchebro snowflakes.

  8. Seth said,

    June 11, 2021 @ 1:22 pm

    AG, I'll continue to reply politely, but I think you're showing the problem. People are not their identities. While of course various identities are important overall in life, at the same time, they are not everything either. A junior engineer is not "the entire power structure at Google". Calling someone "one black woman" omits their status of being in an extremely powerful position in the company. This sort of sleight-of-hand in describing conflicts between people of roughly comparable power overall is very misleading. It's entirely and completely possible for a person of any identity whatsoever to get caught up in various heated social conflicts and get hurt. This should be a prosaic idea. If it isn't, well, that's basically the issue.

    Let me try to illustrate the debate in a very very simplified form:

    "AI will be intrinsically racist and sexist, because of technical X, Y, Z …"
    "I don't think so – I believe we can fix via technical A, B, C …"
    "You can't see why you're wrong because your privilege blinds you …"
    "Please don't be rude, I'm just saying I think you're wrong about the
    technical X, Y, Z and it can be addressed via technical A, B, C …"
    "Now you're tone policing me! This is deep disrespect to me as a …"
    "What? Look, X, Y, Z is not limiting due to A, B, C …"
    "But you don't see why that's not a fix because you're a …"

    And imagine this gets nastier (even to mobbing, legal complaints, etc). How nasty does it have to be before it becomes unacceptable professional conduct? Never, for the AI-is-sexist/racist side? Is there truly no limit? Can you see any reasonable objection there?

  9. Ed said,

    June 11, 2021 @ 5:45 pm

    Seth, if Team Dean has come up with ways to eliminate racial and sexual bias, where are their publications on the matter? The place to discuss your "technical A, B, C …" is in the literature, just as much as "technical X, Y, Z …" is.

    Let's take race and gender out of the equation for a moment (though ultimately it definitely belongs there).

    I've worked in Silicon Valley for more than two decades, at some of the bigger names in the industry. In my experience, the tendency to form silos of expertise with rigid internal and external hierarchies is astounding, a tendency that gets stronger the larger the company grows. This makes such groups highly sensitive to outside criticism, especially when such criticism involves factors addressing the composition of those hierarchies. Things aren't helped by the commercial pressures on what is, outside of certain technical bounds, highly proprietary technology.

    This is hardly the first time such internal pressures have come to a boil within Google. Take social media, an area that should have been a natural segment for Google. Back in the late '00s, as Facebook was making its big break into social media (100 million to half a billion members in a bit over a year), much of their design and infrastructure was built by disaffected Googlers who felt shut out of Google's failing social media efforts and left en masse.

    It's true that the particular nature of the current dispute makes it of significantly greater public interest. I personally feel it should. But Google's ham-handed handling of the situation points to an institutional rigidity that makes them incapable of dealing with the situation constructively. Although there are many people there who could help them navigate this issue, they won't find their way to a place where they can until Google's self-vaunted "corporate culture" is subject to open self-reflection.

  10. Bathrobe said,

    June 11, 2021 @ 5:45 pm

    Reading Seth’s comments, I sensed two points:

    The Wired article gave only one side. There is another side that should be taken into account.

    Fighting within organizations is often as much about personalities as about policies.

    We should at least try to understand the other side, even if we don’t agree with it, and we should make allowance for the possibility that Gebru alienated a lot of people with her approach. Which is not to deny that she had a point.

  11. AG said,

    June 11, 2021 @ 7:27 pm

    (excuse typos, typed this on my phone)

    Seth – I see what you're saying, but I feel like your scenario as you just scripted it out is setting up a "male engineers are Vulcans, Black women are hysterical" dynamic that I don't think is the case here and in fact captures my point. Here's more what I see as having happened in this "debate":

    side 1: feels threatened at work because they subjectively sense (based on some evidence) the other side is unfairly attacking them. responds by overreacting in a way that could get the other side fired.

    side 2: feels threatened at work because they subjectively sense (based on some evidence) the other side is unfairly attacking them. responds by overreacting in a way that could get the other side fired.

    your mistake, if you’ll forgive the term and in my opinion, is in predominantly characterizing side 1’s overreactions as being motivated by “technical” reasons, while describing side 2’s actions as if they are wild accusations. both sides are making emotion-based decisions and putting forth claims that mix fact and emotion. neither side here should be treated as more “technical” or “objective”.

    in fact, and this is what i was trying to point out in my first post, i think the situation seems to be actually the reverse of how you’ve presented it. side 2 (gebru) has plenty of “technical” proof that both google’s human employees and its products are in fact actually not as objective or logical or fair as they claim to be.

    google’s managers, on the other hand, have NOT presented compelling “technical” proof that either gebru or her accusations were or are wrong. hurtful, yes, scary, yes, possibly threatening their comfort at work, yes, but that’s not refuting them factually.

    it seems a very noteworthy coincidence that the side with more males and more institutional power has been consistently characterized by you as having “technical” objections to the other side’s conduct, and the minority female’s objections to the other side’s conduct are painted as more emotional, more based on desire for power and vengeance (making people “grovel”, “bullying”) and less factual, when based on what we know the opposite is probably the case (i.e., gebru can coherently back up most of her research and accusations, but google really can’t explain why they fired her).

    in spite of your lack of respect for “identity politics”, i genuinely cannot explain this coincidence in a way that does not involve issues of unconscious bias, entrenched power and, yes, possible sexism and racism.

    (not that you personally have any faults in those areas, of course, but that the social situation we’re in has demonstrably produced this strange result.)

  12. AG said,

    June 11, 2021 @ 7:29 pm

    (obviously i meant to present "side 1" and an identical "side 2", apologies)

  13. Seth said,

    June 12, 2021 @ 6:33 am

    Ed, exactly, there's a whole AI argument about this, with a literature on both sides. The procedural objection which started this incident (which, again, didn't come out of nowhere, but was part of a long ongoing dispute) was basically a claim that the Parrots paper wasn't willing to engage sufficiently with counter-arguments from the other side. Then it escalated from the worry of some specific internal objectors that they'd be painting a target on themselves for various sorts of vicious personal retaliation. And I've been arguing that in this situation, it was a well-grounded fear.

    AG, this is what I mean by the terrorist/freedom-fighter question. I think there's a significant distinction here: the concerns of internal Google critics of the paper were primarily, first-order, about a technical point being hindered by a political one; both sides are not being equally technical there. Importantly, going down the route of calling it all politics makes everything a matter of identity. I know that claim is made, but its implications end up morally justifying all sorts of intimidation in the supposed service of social good. And hence I also think you are unjustly minimizing, as "comfort", well-founded concerns about being dragged through an HR process or potentially being made part of a lawsuit. Those are not technical refutations!

    What do you mean, "noteworthy coincidence"? I think all humans, of any identity, can in some circumstances fight dirty by trying to buttress a weak technical argument with political mudslinging against opponents, or worse. It happens all the time with various hot issues for all sorts of people. But if that's the human condition, we can't decide any specific case by just making some sort of global overall identity calculation.

    Note, I also think you're reasoning from certain procedural misunderstandings relating to legal imperatives. For example, when you say "google really can't explain why they fired her" – to me, obviously anything Google says is just going to go into litigation, thus they'll say as little as possible. But you can chalk that up to defending "institutional power". And so on.

    Now, I'm not offended by what you said. But I'd seriously like you to address the issue of how I can practically maintain that you are wrong. It's extremely easy to talk of "unconscious bias, entrenched power and, yes, possible sexism and racism." That sets up an unfalsifiable system where we resolve technical disputes by looking to group identities. Do you see the problem here? Do you see why some Google engineers would be very, very, personally concerned about being reviewers of that paper? I ask you, again – how is any engineer of a problematic identity supposed to legitimately dispute an assertion of motivation by racism, sexism, etc? This is why I used "grovel" – that any response other than abject agreement and abasement can get fed into this algorithm of deeming it a manifestation of personal prejudice. I want to be clear, I'm not at all denying there's massive amounts of racism, sexism, etc in society. But there has to be a way of not having that situation be some sort of instant win in all related arguments.

    Our debate here is not a particularly heated exchange as these things go. We aren't working together, it's not a social media mob, you aren't going to report me to Human Resources, there isn't a lawsuit brewing. But we have very quickly come to disagreement which could blow up in a slightly different situation.

    Suppose this did turn into an angry argument, in a corporate context. Do we need to have an HR person in every meeting as a referee for every remark, calling fouls and red cards? Too many articles simply do not give a fair presentation of the problem here.

  14. Daniel Johnson said,

    June 12, 2021 @ 9:13 am

    I’ve worked as a corporate research drone all my working life, and there are some basic power dynamics of white collar companies that don’t seem to be understood outside of corporations and seem to be understood only through posting Dilbert cartoons inside corporations.

    A white collar corporation organizes authority over its workers vertically by the management hierarchy, and coordinates responsibilities between workers horizontally by its business processes. But it is important to remember that 1) the management hierarchy is not anything like the hierarchy of a legal court system, and 2) processes are only enforced indirectly through performance reviews (with the exception of legal issues such as drug use on the job, accounting fraud, discrimination…)

    When disagreements arise between workers that didn’t get resolved by the process (which is easy: all it takes is for one worker to continue to express their disagreement), the only forum for resolution is through whoever their common manager is. If one worker is in Product Development, and the other is in Product Marketing/Management, that common manager is likely at least a vice-president in the overall Product Division. At any level below the common manager, the only resolution is through informal bargaining between two managers in the two separate chains of authority lying above the disputants.

    But there is a scoping issue here. If both disputants report to the same manager, that manager will be interested in the facts of the issue. But any degree of level separation between a manager and a disputant means that the assigned responsibility of the manager is to maintain the business processes the worker is embedded in, not to support the worker. So they only have visibility and responsibility for the processes involved, not for determining the facts of the case. They are not a court and this is not an appeals or mediation process.

    This disagreement likely came from a Product Manager. And their boss was likely already a VP. So now you have a double whammy in that the VP will be interested in the facts of the case for the product manager, but only cares about the business processes affected by the researcher – initially publication review and then the use of internal communications. The fact that the original approval process was followed doesn’t trump the fact that one of the VP’s immediate reports is displeased.

    So, conclusions:
    1) even though she had a title citing responsibility for issues involving racism, there is no sign that there was any process or policies in place giving her any actual authority to effect change, other than the authority granted by her immediate management themselves. She had to meet and persuade individual developers in order to affect product development. I’ve been there. It can be rewarding short-term, but fails as soon as somebody higher up in the product organization develops a grudge.
    2) There was a grudge, likely by a Product Manager who got defensive about anything that might impugn their natural language AI product. Were they racist? Likely, since they valued product reputation above the underlying racism issues.
    3) Once a blocking grudge occurs, there is very little that can be done to fight it unless there is someone high enough (in this case, whoever sits above Research, Product Development, and Product Marketing, which is probably the executive suite for any siloed corporation) who personally decides to champion that issue.

    It’s not fair, it’s not right, and I never found any way to change it, except through outside pressure.

  15. R. Fenwick said,

    June 12, 2021 @ 11:04 am

    @Seth: And yes, I feel more personalizing sympathy should be devoted to people on the end of this aspect – that many writers ignore or trivialize the consequences of abusive accusations of racism, sexism, etc especially in a corporate context.

    Does evidence exist supporting the idea that such accusations are particularly widespread, especially when weighed against the well-demonstrated and quite stupendous extent to which open bigotry actually is epidemic in the corporate arena?

    But I'd seriously like you to address the issue of how I can practically maintain that you are wrong. It's extremely easy to talk of "unconscious bias, entrenched power and, yes, possible sexism and racism." That sets up an unfalsifiable system where we resolve technical disputes by looking to group identities.

    No, what it does is introduce new and alternate sources of light under which the evidence may be examined. Nobody anywhere is talking about resolving technical disputes by which team one is on; that's a straw man. What people are suggesting is that disputes must also be examined in the light of power dynamics between the parties to the dispute, and race and gender privileges are extremely plausible contributors to the power imbalances in such disputes – especially when the dispute itself, as in this case, hinges exactly on a topic embroiled in issues of race and gender privilege.

    But it's done in a very distant tone, without much personalizing sympathy for anyone on the other side of a dispute who is terrified of being the target of a social media hate mob, or worse

    Because to do so would be false equivalence, a well-known logical fallacy. When the available evidence leans strongly in one direction (and the other side of the dispute refuses to talk), playing the "good people on both sides" card gives the other side undue benefit of the doubt.
