Language Log

Noise and bias

September 4, 2021 @ 7:59 am · Filed by Mark Liberman under Linguistics in the comics

Today's SMBC:

The mouseover title: "All examples from the book Noise, by Kahneman, Sibony, and Sunstein, which I'm enjoying right now."

The aftercomic:

The blurb for the cited book:

Imagine that two doctors in the same city give different diagnoses to identical patients—or that two judges in the same courthouse give markedly different sentences to people who have committed the same crime. Suppose that different interviewers at the same firm make different decisions about indistinguishable job applicants—or that when a company is handling customer complaints, the resolution depends on who happens to answer the phone. Now imagine that the same doctor, the same judge, the same interviewer, or the same customer service agent makes different decisions depending on whether it is morning or afternoon, or Monday rather than Wednesday. These are examples of noise: variability in judgments that should be identical.

In Noise, Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein show the detrimental effects of noise in many fields, including medicine, law, economic forecasting, forensic science, bail, child protection, strategy, performance reviews, and personnel selection. Wherever there is judgment, there is noise. Yet, most of the time, individuals and organizations alike are unaware of it. They neglect noise. With a few simple remedies, people can reduce both noise and bias, and so make far better decisions.

Both the blurb and the cartoon seem to somewhat mis-state the book's premise. The introduction starts with an example of target-shooting results:

We call Team B biased because its shots are systematically off target. […]

We call Team C noisy because its shots are widely scattered. […]

Team D is both biased and noisy. […]

But this is not a book about target shooting. Our topic is human error. Bias and noise—systematic deviation and random scatter—are different components of error. […]

The shooting range is a metaphor for what can go wrong in human judgment, especially in the diverse decisions that people make on behalf of organizations. […] Many organizations, unfortunately, are afflicted by both bias and noise.

Figure 2 illustrates an important difference between bias and noise.

It shows what you would see at the shooting range if you were shown only the backs of the targets at which the teams were shooting, without any indication of the bull’s-eye they were aiming at. From the back of the target, you cannot tell whether Team A or Team B is closer to the bull’s-eye. But you can tell at a glance that Teams C and D are noisy and that Teams A and B are not. Indeed, you know just as much about scatter as you did in figure 1. A general property of noise is that you can recognize and measure it while knowing nothing about the target or bias. […]

The general property of noise just mentioned is essential for our purposes in this book, because many of our conclusions are drawn from judgments whose true answer is unknown or even unknowable. […] We don’t need to know who is right to measure how much the judgments of the same case vary. All we have to do to measure noise is look at the back of the target.

To understand error in judgment, we must understand both bias and noise. Sometimes, as we will see, noise is the more important problem. But in public conversations about human error and in organizations all over the world, noise is rarely recognized. Bias is the star of the show. Noise is a bit player, usually offstage. The topic of bias has been discussed in thousands of scientific articles and dozens of popular books, few of which even mention the issue of noise. This book is our attempt to redress the balance.

And as the authors suggest might happen, both the publisher's blurb and the SMBC comic make bias "the star of the show".
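
The quoted distinction can be made concrete with a small numerical sketch. What follows is an illustration, not anything taken from the book's text: the one-dimensional "shots" are invented, and reading "systematic deviation" as the mean offset from the target and "random scatter" as the population standard deviation is an assumption made here. Under that reading, noise is computed without any reference to the target (the "back of the target" view), and the mean squared error splits into bias squared plus noise squared, which is a standard statistical identity.

```python
import statistics

def noise(judgments):
    """Random scatter: population standard deviation. Needs no target."""
    return statistics.pstdev(judgments)

def bias(judgments, target):
    """Systematic deviation: mean judgment minus the true value."""
    return statistics.fmean(judgments) - target

def mean_squared_error(judgments, target):
    """Average squared distance of the judgments from the true value."""
    return statistics.fmean([(j - target) ** 2 for j in judgments])

# Hypothetical 1-D "shots" for four teams; the bull's-eye is at 0.
target = 0.0
teams = {
    "A": [-0.1, 0.0, 0.1, 0.0],   # neither biased nor noisy
    "B": [1.9, 2.0, 2.1, 2.0],    # biased, not noisy
    "C": [-3.0, 1.0, 3.0, -1.0],  # noisy, not biased
    "D": [-1.0, 3.0, 5.0, 1.0],   # biased and noisy
}

for name, shots in teams.items():
    b = bias(shots, target)
    n = noise(shots)
    mse = mean_squared_error(shots, target)
    # mse equals b**2 + n**2 up to floating-point rounding
    print(f"Team {name}: bias={b:.3f} noise={n:.3f} "
          f"mse={mse:.3f} bias^2+noise^2={b**2 + n**2:.3f}")
```

In this toy data, Teams A and B have the same noise, as do Teams C and D, even though their biases differ; that is the sense in which scatter can be measured without knowing where the bull's-eye is.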

For an application to linguistic analysis, see e.g. this presentation from a 2008 workshop on "Animacy and Information Status Annotation"…


16 Comments

J.W. Brewer said,

September 4, 2021 @ 10:06 am

"Noise" is being used as a pejorative, and treats the phenomenon of variation in individual judgments as by definition a problem in need of a solution. A profession dominated at a given moment in time by groupthink or a herd mentality (and/or stricter regulatory constraints on individual discretion) will by definition be less "noisy" because everyone's individual judgments will tend to cluster more tightly together. Is this better or worse than a "noisier" alternative? Is that a question with a generalizable answer or does it all depend on the other circumstances?

You can read online for free the first chapter of the book, which treats the movement to reduce variability in federal criminal sentencing that led to the enactment of the Sentencing Guidelines in the late Eighties as a heroic story of thoughtful technocratic reform and the subsequent demise of the strict Guidelines regime in the new millennium as a mysterious catastrophe. Who were these ungrateful noise enthusiasts who rejected the benefits of technocratic reform? This is not the way most people who lived through that era as a participant or even informed bystander would tell that story. If anything, the same story can be told as a historical narrative about the arrogance of rationalist technocracy and the inevitability of unintended consequences.

[(myl) Can you suggest a link to the "historical narrative about the arrogance of rationalist technocracy and the inevitability of unintended consequences"? ]
AntC said,

September 4, 2021 @ 4:49 pm

internet as training data … two judges

As someone looking on to American politics from the outside over the past few years, what still stuns me is the number of judges/lawyers who seem to be utterly unconstrained by any sort of legal 'standards'/adherence to facts and evidence. (Judge) Jeanine Pirro, Rudy Giuliani, (Attorney/Prosecutor) Sidney Powell, Michael Cohen, (Sheriff/Alabama Supreme Court Judge) Roy Moore, Michael Avenatti, …

I don't understand how these people were ever allowed to be legal officers. (Several of them have been subsequently disbarred, but only after doing huge amounts of damage to the standing of the legal profession.)

And of course police corruption and racism. I can't tell: is there reason in USA for anybody to respect the law?
DaveK said,

September 4, 2021 @ 6:12 pm

@JWBrewer:
Groupthink and consistency in judgment are exactly what are desirable in judges. Litigants are not like baseball players who over the course of a season face umpires with varying ideas of a strike zone. So long as the umpires are impartial, it all evens out.
But a litigant or criminal defendant may face a trial once in their life and their fate shouldn’t depend on how lucky they are in drawing a favorable or unfavorable judge.
Seth said,

September 4, 2021 @ 9:07 pm

@DaveK What happened then is "how lucky they are in drawing a favorable or unfavorable judge" got replaced by "how lucky they are in drawing a favorable or unfavorable prosecutor". Somewhere in the system, a calculation needs to be made regarding something like "seriousness". That didn't go away. The result of "sentencing reform" was it was just shifted into whether the prosecutor was willing to drop or reduce charges, versus piling on.

Maybe the book excerpts above are an oversimplification for popular consumption. But many people involved in the judicial system are highly aware of sentencing variability, to put it mildly. It is an extremely complex problem, not amenable to poplit One Weird Trick type fixes.
The Other Mark P said,

September 4, 2021 @ 9:49 pm

(Judge) Jeanine Pirro, Rudy Giuliani, (Attorney/Prosecutor) Sidney Powell, Michael Cohen, (Sheriff/Alabama Supreme Court Judge) Roy Moore, Michael Avenatti,

An interesting selection in an article on bias and noise. These people have more in common than just their legal professions.
Andrew Usher said,

September 4, 2021 @ 10:25 pm

That's interesting, isn't it – ask someone to produce a list of 'biased' people, and usually he will mainly give evidence of his own biases.

Seth:

Engineering problems can be 'extremely complex'. Social problems really can't, especially since a solution doesn't have to work perfectly to be clearly preferable. You're correct that judicial discretion is no worse than prosecutorial discretion, but that's no reason to essentially say that nothing can be done. Clearly, the problem is that prosecutors have too much latitude in our system in determining charges for a given crime, and judges have no effective way of countering that power and how it affects sentences (nor, obviously, can juries).

It's not hard to think of possible improvements in that matter. Yes, actually getting one implemented is much harder but that doesn't make the underlying issue deserve to be called 'complex'.

k_over_hbarc at yahoo.com
Peter Grubtal said,

September 5, 2021 @ 4:28 am

There may be many things which influence the scatter. Two might be plea-bargaining, and an early guilty plea, which saves everyone a lot of bother and attracts a lighter sentence.
AntC said,

September 5, 2021 @ 4:34 am

ask someone to produce a list of 'biased' people, …

Nobody asked me; and I named those people because they're legal officers who seem to be unaware of the rules of evidence. I didn't name them on the basis of being biased. What 'bias' do you think they exhibit? Or that I was suggesting they exhibit?

These people have more in common than just their legal professions.

Yes: they seem unaware of the rules of evidence. I'm curious to know what further 'bias' Michael Avenatti has in common with the others.

But since I'm looking on from afar and you(s) seem more familiar with the situation on the ground, you're presumably in a position to name other legal officers unaware of the rules of evidence — other, that is, than those in the swirl around the former President.

It's the comic and the cited book that hold up judges as supposed paragons of non-bias. I'll ask again: is there reason in USA to regard the legal profession as such a paragon?

Perhaps the book should avoid drawing its examples from the law?
David Marjanović said,

September 5, 2021 @ 5:22 am

It's the comic and the cited book that hold up judges as supposed paragons of non-bias.

Do they? I can't see that in the comic and haven't read the book.

However, I got the shock of my life when I saw some coverage of the electoral campaign of 2000 on TV. There was a banner "[NAME] – REPUBLICAN FOR JUDGE".

In other words, "make me a judge because I will not be impartial".

I forgot the name. I'm not going to forget the rest of this demonstration of why no other democratic country seems to elect its judges.
Timothy George Rowe said,

September 5, 2021 @ 8:14 am

I think the blurb and the cartoon accurately represent the premise of the book. Well, mainly. Time of day, local sports results, etc, are all sources of noise, not bias. It's only the "Plus racism" and the aftercomic that relate to bias.
Rose Eneri said,

September 5, 2021 @ 9:02 am

Aren't most children aware of the bias and noise in their parents' decisions? Which parent to ask permission for which thing/action, when to ask, and how to ask are assessment skills most kids learn early on.

Court systems are run by people, who are inherently flawed, thereby making the court system itself flawed. We enact laws and procedures to try to counter the problems, but we can never eliminate them.
bks said,

September 5, 2021 @ 9:35 am

It's now nearly 40 years since Terry Winograd asked me if I'd rather have an algorithm or a person decide whether I was eligible for a loan. I'm still unsure.
Rodger C said,

September 5, 2021 @ 9:46 am

is there reason in USA for anybody to respect the law?

Well, there you are.
Andrew Usher said,

September 5, 2021 @ 1:06 pm

No doubt the system will never be perfect: as I stated, solving social problems does not depend on perfection, or it would truly be hopeless. And yes, electing judges (or prosecutors) is the wrong thing, and embarrassing really, but could be said to be just the principle of democracy in action.

Sentencing disparities – whether bias or noise – are not the biggest problem we have, even related to the justice system. But that's what was being discussed, and therefore what I replied to first.

AntC: Obviously you knew my statement applied to you, even though you were never 'asked' – the gist is still the same. There's no point debating anyone with the ideas you display. Instead I can only show astonishment that there are sane people (not even American, in this case) that still believe in Trump as their Devil after he's gone.
J.W. Brewer said,

September 5, 2021 @ 2:19 pm

A few further points, hopefully without turning this into a treatise-length discussion.

1. In response to myl's query, there's a voluminous literature, but I don't have a specific cite to hand that I know both tells the story the way I said it could be told and implements that approach in a way where I would agree with the details.

2. That said, one of the key things that went awry from the start is that the federal Sentencing Guidelines were initially marketed as primarily a noise-reduction thing — I remember attending a talk circa 1988 by not-yet-Justice Breyer, who was a big player in the project, where he emphasized that they weren't trying to figure out from first principles what a theoretical optimal sentence for an offense with such-and-such characteristics would be, they were just trying to gather data about the range of actual practice and construct formulae that would approximate the average that range clustered around. But that's not how it worked in practice. Almost from the very beginning, judges were much more likely to want to depart down from the calculated Guidelines sentencing range rather than depart up and that pattern has continued in the post-2005 regime where judges have much more freedom to depart.

3. This meant that the "noise reduction" marketing could not be separated out from the reality of a "bias" in favor of longer sentences than a genuine noise-reduction approach would have yielded from the average of prior practice, and meant that regardless of the abstract desirability of noise-reduction from the standpoint of potential criminal defendants who did not yet know which judge they would draw, in reality everyone who took a pro-defendant perspective rapidly concluded that constrained judicial discretion ("less noise") was generally bad for them and less constrained judicial discretion ("more noise") would be better for them.

4. One reason for this is that the Guidelines kicked in in the lateish Eighties at just the same historical point in time (perhaps not anticipated when the technocratic reformist effort had initially gotten underway in the Seventies) when Congress was getting in the habit of passing more and more draconian sentencing laws, with mandatory minimums etc in order to reflect their unhappiness with the sentencing status quo. The Guidelines regime ended up treating this external input (from a source typically unhappy with average prior judicial practice) as functionally identical with the calculated-average-prior-practice input as if it could create a harmonious looking blend. This was IMHO an unhelpful thing to do, rather than have a formula that first gave you what would at least look like a good-faith average of prior practice and then tell you at the next step in the flowchart whether that had or hadn't been overridden by external statutory input.

5. The Guidelines regime also imho fell prey to the classic modeling issue of "you manage what you can measure" and thus ended up giving more weight in the formula's output to seemingly objective/quantitative inputs (exactly how much cash did the bank robbers get away with? exactly how many grams of heroin got sold to the undercover cop?) and less weight to "softer" and more qualitative factors still considered typically quite salient by typical judges, like the extent to which the particular defendant was a major player rather than a bit player in the particular situation (that was a dimension that was measured as an input to the formula, but the total swing from the high point to the low point on that dimension was not as dramatic in terms of the impact on the formula's output as some of the seemingly-more-objective dimensions).

6. Most people are happy enough with the current system, where you have to calculate the Guidelines range first before deciding whether or not to ignore the recommendation, that there is not sufficient political will to try again and come up with a revamped edition that would genuinely approximate an averaging out of current actual practice.

7. However, an old friend of mine has taken advantage of that gap and recently constructed a proprietary database based on current actual practice and is marketing access to it (together with statistical analysis) to defense counsel interested in making empirically-based arguments rooted in current average actual practice of judges across the nation as to why their client should get such-and-such below-Guidelines sentence. Feel free to tell any friends of yours who represent federal criminal defendants to check out Mike's project at http://www.empiricaljustice.com.
KeithB said,

September 7, 2021 @ 8:30 am

@David:
Charlie Pierce calls elected judgeships the "second worst idea in American politics", a balanced budget amendment being the first.
