Debate words
The Transcript Library at rev.com is a great resource — within 24 hours, they had transcripts of Wednesday's Fox News Republican presidential debate, and also of Tucker Carlson's debate night interview with Donald Trump on X.
So this morning I downloaded the transcripts, and ran the code that I've used several times over the years to identify the characteristic word-choices of an individual or of a group.
Given eight participants in the Fox debate, the number of words that each one used was not very large. In descending order:
Candidate | N Words |
Ramaswamy | 2465 |
Pence | 2362 |
DeSantis | 1932 |
Christie | 1928 |
Haley | 1701 |
Burgum | 1569 |
Scott | 1241 |
Hutchinson | 1219 |
And given that the candidates were asked different questions, and didn't have time to say much overall, it's interesting that the differences are still somewhat interpretable.
For example, Vivek Ramaswamy repeated a number of words that none of the other candidates used even once: generation (6 times), revolution (5 times), reality (5 times), professional (5 times), nuclear (4 times), epidemic (4 times). He used mental 4 times in his 2465 words, for a rate of 1.6 per thousand, whereas the other candidates used it just once (by Mike Pence) in their combined 11952 words, for a rate of 0.08 per thousand, almost 20 times lower.
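For readers who want to try this kind of check themselves, here is a minimal Python sketch of finding words that one speaker repeats and nobody else uses. It is not the code used for this post: the `tokenize` and `exclusive_words` helpers, the regex tokenizer, and the `min_count` threshold are all illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a simple regex tokenizer; curly apostrophes are
    normalized so that contractions stay in one piece.
    """
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text)


def exclusive_words(target_text, other_texts, min_count=4):
    """Return {word: count} for words the target speaker used at least
    min_count times and that no other speaker used at all."""
    target = Counter(tokenize(target_text))
    others = Counter()
    for text in other_texts:
        others.update(tokenize(text))
    return {
        word: count
        for word, count in target.items()
        if count >= min_count and others[word] == 0
    }
```

Given one string per speaker, `exclusive_words(ramaswamy_text, [pence_text, haley_text])` (with hypothetical variable names) would return a dictionary of candidate-specific words and their counts.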
Obviously, it matters how and why Ramaswamy used those words — you can check them out in the rev.com transcript. For example, some of his uses of revolution, reality, and professional occur during a chaotic passage, at around 33 minutes, in which DeSantis, Ramaswamy, Pence and the moderators all interrupt one another repeatedly. The immediate context:
Ramaswamy: I just want to respond to Mike for one second because he invoked me back. Listen, now that everybody’s gotten their memorized, pre-prepared slogans out of the way, we can actually have a real discussion now. The reality and the fact of the matter is-
Pence: Was that one of yours?
Ramaswamy: Not really, Mike, actually. We’re just going to have some fun tonight. And the reality is, you have a bunch of people, professional politicians, super PAC puppets, following slogans handed over to them by their 400-page super PACs last week. The real choice we face in this primary is this: do you want a super PAC puppet or do you want a patriot who speaks the truth? Do you want incremental reform, which is what you’re hearing about? Or do you want revolution? And I stand on the side of the American Revolution, rather than this incrementalism.
Comparing overall rates of word usage is a difficult statistical problem, for reasons discussed at length in Monroe, Colaresi & Quinn, "Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict", Political Analysis 2009. They argue that a plausible method is to use the "weighted log-odds-ratio, informative Dirichlet prior" algorithm described on pp. 387-8 of their paper. I've used that algorithm in a number of earlier posts (see the list at the bottom), and tried it again here.
By that method, here are the top ten words for each candidate (followed by their "weighted log-odds-ratios").
Ramaswamy:
generation 2.279 revolution 2.080 reality 2.080 professional 2.080 address 1.997 nuclear 1.861 epidemic 1.861 love 1.665 mental 1.616 homeland 1.616
Pence:
leadership 2.313 american 2.292 united 2.137 vivek 2.096 states 2.030 promise 1.877 clear 1.877 leader 1.687 yet 1.625 proven 1.625
Christie:
democratic 2.575 jersey 2.384 here 2.101 who 1.996 waiting 1.946 incumbent 1.946 stood 1.864 sit 1.784 tonight 1.728 type 1.716
DeSantis:
decline 3.228 florida 3.078 going 1.993 are 1.761 country 1.722 thousands 1.715 her 1.715 tens 1.685 succeed 1.685 reasons 1.685
Haley:
defense 1.983 all 1.959 they 1.935 weeks 1.758 girls 1.717 classroom 1.717 ban 1.680 ukraine 1.642 senate 1.569 less 1.569
Burgum:
town 2.455 small 2.040 innovation 2.040 oil 2.004 dakota 2.004 buying 2.004 buy 2.004 north 1.783 loves 1.736 feds 1.736
Scott:
must 1.938 number 1.792 percent 1.782 package 1.782 illinois 1.782 justice 1.553 poverty 1.543 leave 1.543 fire 1.543 asking 1.543
Hutchinson:
terms 2.727 arkansas 2.525 science 2.061 computer 2.061 under 1.797 whenever 1.785 solution 1.785 disqualified 1.785 attacking 1.785 important 1.559
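Scores of the kind listed above can be sketched in a few lines of Python. This is a hedged illustration of the weighted log-odds-ratio with an informative Dirichlet prior as described by Monroe, Colaresi & Quinn, not the code that produced the numbers in this post. In particular, the choice of prior (pooled counts from both groups, scaled by a hypothetical `prior_scale` parameter) is an assumption, and a different prior will give different scores.

```python
import math


def weighted_log_odds(counts_i, counts_j, prior_scale=1.0):
    """Z-scored log-odds-ratio of each word in group i versus group j.

    counts_i, counts_j: dicts mapping word -> count (e.g. Counters).
    Assumption: the Dirichlet prior for each word is its pooled count
    across both groups times prior_scale. The vocabulary must contain
    more than one word, or the odds are undefined.
    """
    vocab = set(counts_i) | set(counts_j)
    alpha = {
        w: prior_scale * (counts_i.get(w, 0) + counts_j.get(w, 0))
        for w in vocab
    }
    alpha_0 = sum(alpha.values())
    n_i = sum(counts_i.values())
    n_j = sum(counts_j.values())

    scores = {}
    for w in vocab:
        y_i = counts_i.get(w, 0)
        y_j = counts_j.get(w, 0)
        a = alpha[w]
        # Smoothed log-odds of the word within each group.
        log_odds_i = math.log((y_i + a) / (n_i + alpha_0 - y_i - a))
        log_odds_j = math.log((y_j + a) / (n_j + alpha_0 - y_j - a))
        # Approximate variance of the difference of log-odds.
        variance = 1.0 / (y_i + a) + 1.0 / (y_j + a)
        scores[w] = (log_odds_i - log_odds_j) / math.sqrt(variance)
    return scores
```

Here `counts_i` would be one candidate's word counts and `counts_j` the combined counts of everyone else; sorting the result by score in descending order gives a "top ten" list for that candidate.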
There are obviously many other ways to approach such transcripts, e.g. via LIWC; and there are often interesting acoustic measures, e.g. as discussed in "Debate quantification: how MAD did he get?", 10/29/2016. And the Carlson/Trump conversation is still waiting.
But that's all I have time for this morning, except to add a table of pronoun usage. The values are percentages of all the words used by each candidate. Thus the 4.57 for Mike Pence's 1st person singular pronouns means that he used "I", "me", and "my" 108 times in his 2362 lexical tokens, and 108/2362 = 0.0457, or 4.57%.
Candidate | 1st Person Singular | 2nd Person | 1st Person Plural |
Burgum | 2.62 | 1.02 | 4.02 |
Christie | 2.44 | 1.50 | 3.48 |
DeSantis | 3.94 | 2.38 | 4.30 |
Haley | 2.06 | 2.41 | 3.59 |
Hutchinson | 2.79 | 0.66 | 4.02 |
Pence | 4.57 | 1.40 | 3.13 |
Ramaswamy | 3.81 | 1.70 | 3.61 |
Scott | 2.81 | 1.85 | 4.59 |
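Percentages like those in the table above can be computed along the following lines. This is a minimal sketch under stated assumptions: only the first-person singular set ("I", "me", "my") comes from the post, while the second-person and first-person plural word lists, the tokenizer, and the treatment of contractions are guesses for illustration.

```python
import re

PRONOUNS = {
    "1sg": {"i", "me", "my"},    # the set named in the post
    "2": {"you", "your"},        # assumption
    "1pl": {"we", "us", "our"},  # assumption
}


def pronoun_percentages(text):
    """Return each pronoun class as a percentage of all word tokens.

    Assumption: contractions such as "I'm" count as one token and are
    matched on the part before the apostrophe.
    """
    text = text.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    if not tokens:
        return {name: 0.0 for name in PRONOUNS}
    bases = [tok.split("'")[0] for tok in tokens]
    return {
        name: 100.0 * sum(base in words for base in bases) / len(tokens)
        for name, words in PRONOUNS.items()
    }
```

As a check on the arithmetic in the post, `round(100 * 108 / 2362, 2)` gives 4.57.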
The fact that Mike Pence leads by a significant margin in first-person singular pronouns doesn't reveal a narcissistic personality disorder; it merely reflects the fact that his central pitch was about his experience as vice president. But I'm not expecting George Will and other pundits to start writing column after column accusing Pence of narcissism, as they (falsely) did with respect to Obama.
A few relevant past posts:
"Fact-checking George F. Will", 6/7/2009
"Fact-checking George F. Will, one more time", 10/6/2009
"Another lie from George Will", 5/7/2012
"More B.S. from George F. Will", 8/28/2015
"Obama's favored and disfavored SOTU words", 1/29/2014
"Male and female word usage", 8/7/2014
"The most Trumpish (and Bushish) words", 9/5/2015
"Make America rather formidable again", 9/10/2015
"Political vocabulary display", 9/10/2015
"The most Kasichoid, Cruzian, Trumpish, and Rubiositous words", 3/11/2016
"More political text analytics", 4/15/2016
"Style shifting in student writing assignments", 10/5/2018
J.W. Brewer said,
August 25, 2023 @ 9:00 am
I see that Pence also scored the lowest on first-person plural pronoun usage. I wonder if that's because if your primary resume item is e.g. being governor of North Dakota you can say things like "in North Dakota, we solved problem X via successful policy Y" (rather than "I solved …") and the viewer/listener reaction is not that you're a raging egomaniac using the "royal we" but instead that you're graciously sharing credit with your whole administration and/or allies in the legislature. Perhaps it's harder to do that "it was a team effort" schtick when you're talking about tenure as Vice President?
Mark Liberman said,
August 25, 2023 @ 9:13 am
@J.W. Brewer: Perhaps it's harder to do that "it was a team effort" schtick when you're talking about tenure as Vice President?
And/or if you're trying to distance yourself a bit from the former president?
In the Tucker Debate-night interview, it's not a surprise that Donald Trump's 1st-person-singular pronoun rate is just a bit behind Pence's, at 4.17%. But it's interesting that his 1st-person-plural rate is way lower than any of the debate participants, at 1.3%.
And his ratio of 1st-person-singular to 1st-person-plural pronouns was off the charts at 3.21, compared to Pence with the next highest "I/we" ratio at 1.46. The overall ratio for all 8 debaters was 0.866. There were certainly plenty of questions in the Carlson interview where Trump could have chosen to speak about himself as a member of a group, but chose not to…
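The "I/we" ratios can be recomputed from the published percentages, as in the sketch below. The per-candidate ratios follow directly from the pronoun table. The pooled figure is only an estimate, on the assumption that raw counts can be approximated as percentage times token count; the token counts are those given in the post, with 1932 for DeSantis as stated in a later comment.

```python
# name: (tokens, % 1st person singular, % 1st person plural)
DEBATERS = {
    "Burgum": (1569, 2.62, 4.02),
    "Christie": (1928, 2.44, 3.48),
    "DeSantis": (1932, 3.94, 4.30),
    "Haley": (1701, 2.06, 3.59),
    "Hutchinson": (1219, 2.79, 4.02),
    "Pence": (2362, 4.57, 3.13),
    "Ramaswamy": (2465, 3.81, 3.61),
    "Scott": (1241, 2.81, 4.59),
}


def i_we_ratio(pct_singular, pct_plural):
    """Ratio of first-person singular to first-person plural usage."""
    return pct_singular / pct_plural


ratios = {name: i_we_ratio(sg, pl) for name, (_, sg, pl) in DEBATERS.items()}

# Pooled ratio: estimate raw counts from the rounded percentages,
# then divide the totals (not the same as averaging the ratios).
est_singular = sum(n * sg / 100 for n, sg, _ in DEBATERS.values())
est_plural = sum(n * pl / 100 for n, _, pl in DEBATERS.values())
pooled_ratio = est_singular / est_plural

# Trump's percentages as quoted in the comment above.
trump_ratio = i_we_ratio(4.17, 1.3)
```

Because the inputs are rounded percentages, the pooled estimate can differ from the figure computed on raw counts in the third decimal place.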
Kenny Easwaran said,
August 25, 2023 @ 11:04 am
I'm a bit surprised that words like "who" (for Christie) and "are" (for De Santis) manage to be used at such a different level than they are for others. Most of the other words on those lists indicate something about the subject matter they're talking about, or perhaps a style of speech.
Mark Liberman said,
August 25, 2023 @ 1:33 pm
@Kenny Easwaran: I'm a bit surprised that words like "who" (for Christie) and "are" (for De Santis) manage to be used at such a different level than they are for others.
Christie used "who" 21 times in 1928 lexical tokens, while the others used "who" 39 times in 12489.
(21/1928)/(39/12489) = 3.487991
So Christie used "who" about 3.5 times more often than the others did. Why? It's probably a stylistic rather than a topical difference. It could be more (direct or indirect) questions, or more relative clauses, or more often choosing "who" over "that" in relative clauses where either is idiomatic. We could read through the transcripts and figure it out, and also look at other examples of Christie's speech to see whether it's a consistent pattern.
DeSantis uses "are" 39 times in 1932 tokens, compared to 116 in 12485 for the others.
(39/1932)/(116/12485) = 2.172641
So he uses "are" about 2.2 times as often as the others. Again, we'd need to classify the examples (along some relevant dimensions) in order to figure out why it happened here, and whether it's more generally characteristic of him or specific to this debate.
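Both calculations above are ratios of relative frequencies, and "classifying the examples" starts with pulling each occurrence out in context. A minimal Python sketch of both steps follows; the function names `rate_ratio` and `kwic` (key word in context) and the window width are illustrative assumptions, not anything from the original analysis.

```python
def rate_ratio(count_a, total_a, count_b, total_b):
    """How many times more often a word occurs in A than in B,
    comparing relative frequencies rather than raw counts."""
    return (count_a / total_a) / (count_b / total_b)


def kwic(tokens, target, width=3):
    """Return each occurrence of target with up to `width` tokens of
    context on either side, for reading through by hand."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [tok] + right))
    return lines


who_ratio = rate_ratio(21, 1928, 39, 12489)    # Christie vs. the others
are_ratio = rate_ratio(39, 1932, 116, 12485)   # DeSantis vs. the others
```

Running `kwic` over a tokenized transcript for "who" or "are" would give the list of examples to sort into questions, relative clauses, and so on.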
Jerry Packard said,
August 25, 2023 @ 8:22 pm
Great post – thanks Mark.