Draft words

Reuben Fischer-Baum, Aaron Gordon, and Billy Haisley, "Which Words Are Used To Describe White And Black NFL Prospects?", Deadspin 5/8/2014

Do NFL scouts talk about white players and black players differently? Are certain words reserved for white players? Are others used primarily to describe black players?

Let's try and find out. We've pulled the text from pre-draft scouting reports from NFL.com (written by the infamous Nolan Nawrocki), CBS, and ESPN, split them by player race, counted the number of times individual words appeared using the Voyant tool, and then calculated the rate at which each word appeared per 10,000 words. (In total we pulled 68,465 words on 99 white players—6,228 unique—and 223,868 words on 288 black players—10,580 unique). You can play with the data in the interactive below; simply plug a single word into the input field, hit search, and see how often the word appeared in black and white scouting reports.

Here's what the "interactive" looks like:

(For readers who are unfamiliar with the culture of American football, "center" is the name of a position on the offensive line, while "safety" is the name of a position in the defensive backfield.)

It's interesting to see such a nimble use of simple "text analytics" in this context. But the most striking part of it, to me, is this:

You can check out the code/documentation for the graphic over on Github.

It's neat to see magazine writers posting their data and code!

So I downloaded the .zip file and subjected the raw word counts to the same ranking method used earlier to rank "Obama's favored (and disfavored) SOTU words" (1/29/2014), namely the "weighted log-odds-ratio, informative Dirichlet prior" algorithm described on pp. 387-8 of Monroe, Colaresi & Quinn, "Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict", Political Analysis 2009.
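For readers who want to try something similar, here is a minimal Python sketch of that kind of calculation. It is not the code behind the numbers in this post: the function name is hypothetical, and the `prior_mass` parameter (the total weight given to the Dirichlet prior) is an assumption, so the exact values it returns will generally differ from the ones listed here, though the signs should agree. The prior for each word is taken from the pooled counts of the two lists, and a positive score means the word leans toward the first list.

```python
import math


def weighted_log_odds(count_a, count_b, total_a, total_b, prior_mass=1000.0):
    """Z-scored log-odds ratio of one word in corpus A versus corpus B.

    count_a, count_b: occurrences of the word in each corpus.
    total_a, total_b: total word tokens in each corpus.
    prior_mass: ASSUMED total pseudo-count for the Dirichlet prior;
        the scale behind any published figures may be different.
    """
    # Use the pooled counts of the two corpora as the "background" prior.
    pooled_count = count_a + count_b
    pooled_total = total_a + total_b
    alpha_w = prior_mass * pooled_count / pooled_total  # prior for this word
    alpha_0 = prior_mass                                # prior summed over all words

    # Smoothed log-odds of the word within each corpus.
    log_odds_a = math.log((count_a + alpha_w) / (total_a + alpha_0 - count_a - alpha_w))
    log_odds_b = math.log((count_b + alpha_w) / (total_b + alpha_0 - count_b - alpha_w))
    delta = log_odds_a - log_odds_b

    # Approximate variance of the difference, then scale to a z-score.
    variance = 1.0 / (count_a + alpha_w) + 1.0 / (count_b + alpha_w)
    return delta / math.sqrt(variance)


# Totals reported in the Deadspin piece: black-player and white-player reports.
BLACK_TOTAL, WHITE_TOTAL = 223_868, 68_465

center = weighted_log_odds(48, 59, BLACK_TOTAL, WHITE_TOTAL)
safety = weighted_log_odds(148, 3, BLACK_TOTAL, WHITE_TOTAL)
print(f"center {center:.3f}")
print(f"safety {safety:.3f}")
```

Dividing by the estimated standard deviation is what keeps rare words with extreme ratios from dominating the ranking: a word seen once in one list and never in the other gets a large log-odds difference but also a large variance.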

For each word in the list, I've printed out seven numbers:

1. The count in the scouting reports on black players
2. The black-player count expressed as frequency per million words
3. The count in the scouting reports on white players
4. The white-player count expressed as frequency per million words
5. The sum of 1 and 3
6. The combined count (5) expressed as frequency per million words
7. The weighted log-odds ratio after Bayesian shrinkage and regularization
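The frequency columns can be checked by hand from the totals quoted in the Deadspin piece (223,868 words on black players, 68,465 on white players). A small sketch in Python, with hypothetical function names, assuming the combined frequency is the pooled count divided by the pooled total:

```python
BLACK_TOTAL = 223_868  # words in reports on black players (from the Deadspin piece)
WHITE_TOTAL = 68_465   # words in reports on white players


def per_million(count, total):
    """Express a raw count as a frequency per million words."""
    return count / total * 1_000_000


def frequency_columns(black_count, white_count):
    """Return the first six columns for one word, in the order listed above."""
    combined = black_count + white_count
    return (
        black_count,
        per_million(black_count, BLACK_TOTAL),
        white_count,
        per_million(white_count, WHITE_TOTAL),
        combined,
        per_million(combined, BLACK_TOTAL + WHITE_TOTAL),
    )


b, b_rate, w, w_rate, both, both_rate = frequency_columns(48, 59)
print(f"center {b} ({b_rate:.3f}) {w} ({w_rate:.3f}) {both} ({both_rate:.3f})")
```

With the counts for "center" (48 and 59), this gives 214.412, 861.754 and 366.021 per million. Dividing each by 100 gives the per-10,000 rates that the Deadspin interactive graphs.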

For the two position-words in the examples above, the results are

center 48 (214.412) 59 (861.754) 107 (366.021) -5.355
safety 148 (661.104) 3 (43.818) 151 (516.534) 4.300

Other offensive-line words tend to be white-associated:

guard 79 (352.887) 84 (1226.9) 163 (557.583) -5.885

And other defensive-backfield words tend to be black-associated:

cornerback 105 (469.026) 3 (43.818) 108 (369.442) 3.509

By this criterion, many of the most white-associated words are connected with the quarterback position:

accuracy 18 (80.4045) 68 (993.208) 86 (294.185) -8.096
pocket 72 (321.618) 94 (1372.96) 166 (567.846) -6.970
arm 172 (768.31) 149 (2176.29) 321 (1098.06) -6.798
placement 37 (165.276) 57 (832.542) 94 (321.551) -5.844
throws 163 (728.108) 121 (1767.33) 284 (971.495) -5.353
pressure 47 (209.945) 57 (832.542) 104 (355.759) -5.226
throwing 21 (93.8053) 39 (569.634) 60 (205.245) -5.181
delivery 6 (26.8015) 25 (365.15) 31 (106.043) -4.981
velocity 9 (40.2023) 27 (394.362) 36 (123.147) -4.892
passing 74 (330.552) 67 (978.602) 141 (482.327) -4.713
mobility 13 (58.0699) 28 (408.968) 41 (140.251) -4.597

This is probably the main reason for the difference in (normalized) frequency of "intelligent":

intelligent 15 (67.0038) 17 (248.302) 32 (109.464) -2.748

The black-associated words seem to be connected to a wider range of positions:

burst 360 (1608.09) 41 (598.846) 401 (1371.72) 4.386
return 127 (567.299) 4 (58.424) 131 (448.119) 3.816
coverage 398 (1777.83) 64 (934.784) 462 (1580.39) 3.427
acceleration 100 (446.692) 5 (73.03) 105 (359.179) 3.142
man 178 (795.111) 20 (292.12) 198 (677.31) 3.108
cuts 89 (397.556) 4 (58.424) 93 (318.13) 3.027
leaping 82 (366.287) 4 (58.424) 86 (294.185) 2.860
explosive 169 (754.909) 22 (321.332) 191 (653.364) 2.733
receivers 187 (835.314) 26 (379.756) 213 (728.621) 2.721
runner 231 (1031.86) 36 (525.816) 267 (913.342) 2.703
returner 62 (276.949) 2 (29.212) 64 (218.928) 2.658

The whole list is here.

It's nice to see that the (currently 317) comments on the Deadspin piece are free of racist invective, as far as I can tell. Either someone is policing their comments closely, or (more likely) it's just a different crowd from the people who comment in some other places.

[Tip of the hat to JP Settles]

 


21 Comments »

  1. KevinR said,

    May 11, 2014 @ 1:06 am

    Is there enough data to segregate by position? If the "white" words tend to be associated with quarterbacks, is that because many quarterback candidates are white or is it potentially indicative of a bias of some sort?

    [(myl) Unfortunately they don't provide the original texts, just the lexical histograms for (their identification of) "black" and "white" players. If all the texts were available, coded for relevant features, I think there would be enough to do the analysis, especially if you harvested the reports over a period of a few years. Maybe when I get home and have more reliable access to a good internet connection, I'll try it.]

  2. GeorgeW said,

    May 11, 2014 @ 4:55 am

    Since I am confident that whites are over represented at some positions (e.g. quarterbacks) and blacks at others (e.g. defensive backs), what were the authors trying to demonstrate?

    Of course, NFL scouts would talk about white players and black players differently since they are describing attributes of the various positions which are occupied disproportionately by race.

    What did I miss?

    [(myl) Well, it's always a good idea to check whether things that "everybody knows" are actually true -- often they aren't.

    And in this case, there are also some non-position-associated words that have quite different relative frequencies in the two lists, for whatever reason:

    overachiever 1 (4.46692) 11 (160.666) 12 (41.0491) -3.605
    understands 24 (107.206) 27 (394.362) 51 (174.459) -3.449
    intangibles 21 (93.8053) 25 (365.15) 46 (157.355) -3.426
    gritty 5 (22.3346) 10 (146.06) 15 (51.3113) -2.687
    tough 215 (960.387) 103 (1504.42) 318 (1087.8) -2.668
    works 71 (317.151) 44 (642.664) 115 (393.387) -2.643
    smooth 81 (361.82) 9 (131.454) 90 (307.868) 2.111
    physicality 71 (317.151) 6 (87.636) 77 (263.398) 2.266
    instincts 215 (960.387) 37 (540.422) 252 (862.031) 2.311
    natural 263 (1174.8) 48 (701.088) 311 (1063.86) 2.348
    aggressive 129 (576.232) 16 (233.696) 145 (496.01) 2.476
    fluid 107 (477.96) 11 (160.666) 118 (403.649) 2.537

    ]

  3. GeorgeW said,

    May 11, 2014 @ 6:42 am

    myl: Yes. But even some of these might be more relevant to certain positions: fluid and smooth (receivers, running backs), aggressive (linebackers), physicality (defensive players), instincts (defensive players), tough (not kickers), etc.

  4. Nate said,

    May 11, 2014 @ 7:10 am

    Blacks are over-represented in the NFL period. Sorry to state the obvious. I know that makes a lot of people uncomfortable.

  5. James said,

    May 11, 2014 @ 8:02 am

    My wife heard a player described as 'instinctive' and guessed he must be black. She was right in that case, but, although I can't check the weighted log-odds-ratio, informative Dirichlet prior, I found that the word is more commonly used to describe white players, 2.48 (per 10,000) to 1.61.

    [(myl) Actually, you can, because the whole word list is here, and instinctive is

    instinctive 36 (160.809) 17 (248.302) 53 (181.3) -1.050

    A good example of how what people think is going to happen sometimes doesn't.]

  6. Jerry Friedman said,

    May 11, 2014 @ 10:35 am

    For those who, like me, hadn't been reached by the "infamy" of Nolan Nawrocki, his contributions to the sample are relevant because people have said he often describes black players, especially quarterbacks, much less favorably than comparable white players.

  7. Scott McClure said,

    May 11, 2014 @ 11:26 am

    Season 2, episode 3 of _The League_ identifies 'scrappy' as a term used to describe white players but not black players (http://en.wikiquote.org/wiki/The_League#The_White_Knuckler_.5B2.3.5D). This remark doesn't seem to hold true for this year's NFL draft, however: 0.44 uses per 10,000 words for white players, and 0.54 uses per 10,000 words for black players.

  8. Faith said,

    May 11, 2014 @ 12:49 pm

    @GeorgeW
    Isn't the point the exact opposite? That is, blacks and whites are represented disproportionately in certain positions because people like coaches and scouts and others who decide where players fit have already decided what race is suitable for what position before they even look at the individual? In other words, the point of this exercise is that these players are being fit into positions based on their race, rather than their abilities.

  9. GeorgeW said,

    May 11, 2014 @ 3:26 pm

    Faith: I wouldn't argue that stereotyping doesn't occur in sport, at least in some positions.

    But, certain adjectives, I think are relevant to certain positions because of the demands of the position, e.g. "aggressive" for linebackers. This would be a good attribute for the position whether the player is black or white. So, if there are more blacks at the linebacker position, the data would show 'aggressive' being applied more often to blacks.

    It is commonly thought, and for good reason, that intelligence is a necessary quality for a quarterback. And traditionally, most QBs have been white. So 'intelligent', when used to describe a QB, would necessarily apply more often to whites than blacks.

  10. Scott McClure said,

    May 11, 2014 @ 8:06 pm

    Does this study use a background text, as the SOTU study did? I'm curious about the genre of the background text used for this study — is it football-related?

    [(myl) The Deadspin interactive just reports raw frequencies, and graphs relative frequencies (normalized per 10,000 words) -- when the raw counts are below a threshold, it simply refuses to show a result.

    In my numbers -- for which the term "study" is way too ambitious -- I did what Monroe et al. recommend as an appropriate expedient in such cases, which is to use the sum of the two contrasted histograms as the "background" text.]

  11. J.W. Brewer said,

    May 11, 2014 @ 8:11 pm

    So are we agreed that the interesting question is whether race would have any impact once you adjusted for the different racial mix by position, but that the data necessary to perform that adjustment is not readily available?

    [(myl) That's clearly true for e.g.
    mobility 13 (58.0699) 28 (408.968) 41 (140.251) -4.597
    vs.
    burst 360 (1608.09) 41 (598.846) 401 (1371.72) 4.386
    but not so clearly true for
    overachiever 1 (4.46692) 11 (160.666) 12 (41.0491) -3.605
    vs.
    natural 263 (1174.8) 48 (701.088) 311 (1063.86) 2.348
    ]

  12. J.W. Brewer said,

    May 11, 2014 @ 9:16 pm

    For context, here's a link http://www.jsonline.com/blogs/sports/129967143.html giving stats on racial mix by position in the NFL as of a few years ago, illustrating how extreme the variation is.

  13. Rich Rostrom said,

    May 11, 2014 @ 9:37 pm

    Body types vary with ethnicity. I know a man who told me he could buy clothes that fit him off the rack in Hamburg, but not in Milan. (He was of north German descent.)

    The broad, heavy-set body type optimal for offensive linemen is more common among north Europeans than most other groups.

    Contrariwise, it's well established that west Africans are disproportionately able in sprinting and leaping; they dominate the wide receiver and defensive backfield positions.

    So the proportion of words used to describe different groups will vary.

  14. chris said,

    May 12, 2014 @ 6:13 am

    But, certain adjectives, I think are relevant to certain positions because of the demands of the position, e.g. "aggressive" for linebackers. This would be a good attribute for the position whether the player is black or white.

    True, but I think it's worth considering the possibility that stereotypical views of blacks (such as those exemplified by Rich Rostrom above) play a role in steering them into positions that their (supposedly) aggressive, explosive, physical nature is well suited to.

  15. GeorgeW said,

    May 12, 2014 @ 6:37 am

    Chris: Good point, but I am not sure how important stereotyping is, at least for many positions. Physical characteristics (without regard to race) do determine the positions an athlete can play. There are no 300 lb. wide receivers or 180 lb. offensive linemen in the NFL. Height is a limiting factor for quarterbacks but much less so for corner backs. Etc. I don't think these are the result of stereotyping.

    However, it probably does affect certain positions such as the 'intelligence' factor for quarterback.

  16. J.W. Brewer said,

    May 12, 2014 @ 6:45 am

    Whatever "steering" phenomenon there might be would have occurred at an earlier point in the players' careers. No one at the stage of being considered for the NFL draft is marketing himself to potential employers as being equally able to play center and play safety, because specialization has already occurred by then. Obviously whenever some other phenomenon is statistically correlated with race, it is difficult to know ex ante what that correlation means. If for example some other apparent statistical difference between the races disappears once you control for income level, that certainly doesn't exclude the possibility that racial stereotyping is causally implicated in the income level differentials. But it's still useful to know.

  17. J.W. Brewer said,

    May 12, 2014 @ 6:55 am

    BTW, a relevant-seeming quote (that resonates with concerns myl has raised in numerous prior posts about people getting muddled up trying to talk about statistical phenomena in ordinary language) that I saw by chance on the internet (from a review of Nicholas Wade's new book on genetics and racial stuff): "every race has individuals with all sorts of attributes, even if the averages turn out to be a little different. But not everyone has a solid grasp on these kinds of statistical concepts. For many, there is no difference between "genes that increase X are slightly more common in this racial group" and "members of this racial group are inherently high in X." When X is, for example, intelligence or propensity to violence, this perception can lead to serious societal problems."

  18. J.W. Brewer said,

    May 12, 2014 @ 7:27 am

    Methodological question: does 11 hits for "overachiever" mean it occurs in 11 different reports, or could it be equally consistent with e.g. 8 different reports, 5 of which use it once and 3 of which use it twice? If you've got on the order of 1100 different reports (387 separate players with coverage of most if not all of them from 3 separate sources), the fact that such a seeming sports-cliche word is used so infrequently strikes me as at least as interesting as the racial mix of the occasions when it is used.

  19. Jerry Friedman said,

    May 12, 2014 @ 8:23 am

    "Steering" can be done by the players themselves. I can easily imagine that a big, athletic white boy who watches football can see a future for himself on the offensive line, while a black boy with the same qualities can see a future for himself on the defensive line.

    In any case, I don't think this study has a singular explanation, "the point" as Faith put it. There can easily be statistical physiological differences and statistical cultural differences and stereotyping in the descriptions and stereotyping affecting who plays what positions.

  20. Mr Punch said,

    May 12, 2014 @ 9:19 am

    We are talking here about extreme physical types; tiny variations in ability make the difference between success and failure. Here's the basic rule: Nobody is allowed to touch this subject (except Malcolm Gladwell).

  21. Scott McClure said,

    May 13, 2014 @ 7:48 pm

    Thanks, myl, for the explanation about the background text — using the reported counts as the background seems to make a lot of sense in this case, since I imagine that there isn't any other kind of text that's quite like a football draft report.
