Debate words


The Transcript Library at is a great resource — within 24 hours, they had transcripts of Wednesday's Fox News Republican presidential debate, and also of Tucker Carlson's debate night interview with Donald Trump on X.

So this morning I downloaded the transcripts, and ran the code that I've used several times over the years to identify the characteristic word-choices of an individual or of a group.

Given eight participants in the Fox debate, the number of words that each one used was not very large.  In descending order:

Candidate    N Words
Ramaswamy    2465
Pence        2362
DeSantis     1932
Christie     1928
Haley        1701
Burgum       1569
Scott        1241
Hutchinson   1219

And given that the candidates were asked different questions, and didn't have time to say much overall, it's interesting that the differences are still somewhat interpretable.

For example, Vivek Ramaswamy repeated a number of words that none of the other candidates used even once: generation (6 times), revolution (5 times), reality (5 times), professional (5 times), nuclear (4 times), epidemic (4 times). He used mental 4 times in his 2465 words, for a rate of 1.6 per thousand, whereas the other seven candidates combined used it once (the one use was Mike Pence's) in a total of 11952 words, for a rate of 0.08 per thousand, almost 20 times lower.
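For readers who want to check the arithmetic, here is a minimal Python sketch of the per-thousand comparison, using only the counts quoted above. The function name `rate_per_thousand` is made up for illustration and is not taken from the code used for this post.

```python
def rate_per_thousand(count: int, total_words: int) -> float:
    """Occurrences of a word per 1,000 words of text."""
    return 1000.0 * count / total_words

# Counts quoted in the post for the word "mental".
ramaswamy_rate = rate_per_thousand(4, 2465)   # Ramaswamy: 4 uses in 2465 words
others_rate = rate_per_thousand(1, 11952)     # everyone else: 1 use in 11952 words

# How many times more often Ramaswamy used the word than the others did.
ratio = ramaswamy_rate / others_rate

print(f"{ramaswamy_rate:.2f} vs. {others_rate:.3f} per thousand, ratio {ratio:.1f}")
```

With counts this small, a single extra use by any other candidate would roughly halve the ratio, which is one reason raw rate ratios are a shaky basis for comparison.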

Obviously, it matters how and why Ramaswamy used those words; you can check them out in the transcript. For example, some of his uses of revolution, reality, and professional occur during a chaotic passage, at around 33 minutes, in which DeSantis, Ramaswamy, Pence and the moderators all interrupt one another repeatedly. The immediate context:

Ramaswamy: I just want to respond to Mike for one second because he invoked me back. Listen, now that everybody’s gotten their memorized, pre-prepared slogans out of the way, we can actually have a real discussion now. The reality and the fact of the matter is-

Pence: Was that one of yours?

Ramaswamy: Not really, Mike, actually. We’re just going to have some fun tonight. And the reality is, you have a bunch of people, professional politicians, super PAC puppets, following slogans handed over to them by their 400-page super PACs last week. The real choice we face in this primary is this: do you want a super PAC puppet or do you want a patriot who speaks the truth? Do you want incremental reform, which is what you’re hearing about? Or do you want revolution? And I stand on the side of the American Revolution, rather than this incrementalism.

Comparing overall rates of word usage is a difficult statistical problem, for reasons discussed at length in Monroe, Colaresi & Quinn, "Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict", Political Analysis 2009. They argue that a plausible method is to use the "weighted log-odds-ratio, informative Dirichlet prior" algorithm described on pp. 387-8 of their paper. I've used that algorithm in a number of earlier posts (see the list at the bottom), and tried it again here.
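As a rough illustration of that algorithm, here is a short Python sketch of the z-scored log-odds-ratio with Dirichlet pseudo-counts, following the general form given by Monroe et al. It is not the code used for this post. The function name, the toy sentences, and the choice of the pooled counts as the prior are all assumptions made for the example.

```python
import math
from collections import Counter

def weighted_log_odds(target: Counter, others: Counter, prior: Counter) -> dict[str, float]:
    """Z-scored log-odds-ratio of each word in `target` vs. `others`,
    smoothed by Dirichlet pseudo-counts taken from `prior`."""
    n_t = sum(target.values())   # total words from the target speaker
    n_o = sum(others.values())   # total words from everyone else
    a0 = sum(prior.values())     # total pseudo-count mass
    scores = {}
    for word, a_w in prior.items():
        y_t = target.get(word, 0)
        y_o = others.get(word, 0)
        # Smoothed log-odds of the word within each group.
        log_odds_t = math.log((y_t + a_w) / (n_t + a0 - y_t - a_w))
        log_odds_o = math.log((y_o + a_w) / (n_o + a0 - y_o - a_w))
        delta = log_odds_t - log_odds_o
        # Approximate variance of the difference; rarer words get larger variance.
        variance = 1.0 / (y_t + a_w) + 1.0 / (y_o + a_w)
        scores[word] = delta / math.sqrt(variance)
    return scores

# Toy data (invented), with the pooled counts serving as the prior (an assumption).
target = Counter("we want revolution not reform revolution now".split())
others = Counter("we want reform and we want it now".split())
prior = target + others

scores = weighted_log_odds(target, others, prior)
top = max(scores, key=scores.get)
print(top)
```

Dividing by the estimated standard deviation is what keeps a word used once by one speaker and never by the others from outranking a word with a smaller but better-attested difference.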

By that method, here are the top ten words for each candidate (followed by their "weighted log-odds-ratios").

Ramaswamy:
generation   2.279
revolution   2.080
reality      2.080
professional 2.080
address      1.997
nuclear      1.861
epidemic     1.861
love         1.665
mental       1.616
homeland     1.616

Pence:
leadership 2.313
american   2.292
united     2.137
vivek      2.096
states     2.030
promise    1.877
clear      1.877
leader     1.687
yet        1.625
proven     1.625

Christie:
democratic 2.575
jersey     2.384
here       2.101
who        1.996
waiting    1.946
incumbent  1.946
stood      1.864
sit        1.784
tonight    1.728
type       1.716

DeSantis:
decline   3.228
florida   3.078
going     1.993
are       1.761
country   1.722
thousands 1.715
her       1.715
tens      1.685
succeed   1.685
reasons   1.685

Haley:
defense   1.983
all       1.959
they      1.935
weeks     1.758
girls     1.717
classroom 1.717
ban       1.680
ukraine   1.642
senate    1.569
less      1.569

Burgum:
town       2.455
small      2.040
innovation 2.040
oil        2.004
dakota     2.004
buying     2.004
buy        2.004
north      1.783
loves      1.736
feds       1.736

Scott:
must     1.938
number   1.792
percent  1.782
package  1.782
illinois 1.782
justice  1.553
poverty  1.543
leave    1.543
fire     1.543
asking   1.543

Hutchinson:
terms        2.727
arkansas     2.525
science      2.061
computer     2.061
under        1.797
whenever     1.785
solution     1.785
disqualified 1.785
attacking    1.785
important    1.559

There are obviously many other ways to approach such transcripts, e.g. via LIWC; and there are often interesting acoustic measures, e.g. as discussed in "Debate quantification: how MAD did he get?", 10/29/2016. And the Carlson/Trump conversation is still waiting.

But that's all I have time for this morning, except to add a table of pronoun usage. The values are percentages of all the words used by each candidate. Thus the 4.57 for Mike Pence's 1st person singular pronouns means that he used "I", "me", and "my" 108 times in his 2362 lexical tokens, and 108/2362 = 0.0457, or 4.57%.

Candidate    1st Person Singular   2nd Person   1st Person Plural
Burgum       2.62                  1.02         4.02
Christie     2.44                  1.50         3.48
DeSantis     3.94                  2.38         4.30
Haley        2.06                  2.41         3.59
Hutchinson   2.79                  0.66         4.02
Pence        4.57                  1.40         3.13
Ramaswamy    3.81                  1.70         3.61
Scott        2.81                  1.85         4.59
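Percentages like these can be computed along the following lines. This Python sketch is a guess at the general approach, not the code behind the table. Only the 1st person singular set ("I", "me", "my") is stated in the post; the other two pronoun sets, the tokenizer, and the handling of contractions are assumptions made for the example.

```python
import re

# Only the "1sg" set is given in the post; the other two sets are assumptions.
PRONOUN_SETS = {
    "1sg": {"i", "me", "my"},
    "2nd": {"you", "your"},
    "1pl": {"we", "us", "our"},
}

def pronoun_percentages(text: str) -> dict[str, float]:
    """Percentage of all word tokens that fall in each pronoun set."""
    # Normalize curly apostrophes, then keep contractions as single tokens.
    normalized = text.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)
    total = len(tokens)
    if total == 0:
        return {label: 0.0 for label in PRONOUN_SETS}
    result = {}
    for label, forms in PRONOUN_SETS.items():
        # "we're" counts as "we", "I'm" as "i" (an assumption about tokenization).
        hits = sum(1 for tok in tokens if tok.split("'")[0] in forms)
        result[label] = 100.0 * hits / total
    return result

# Invented sample sentence, partly echoing the transcript excerpt.
sample = "We\u2019re just going to have some fun tonight. I think you know my record."
pct = pronoun_percentages(sample)

# The worked example from the post: 108 pronouns in 2362 tokens.
pence_1sg = round(100 * 108 / 2362, 2)
print(pct, pence_1sg)
```

Different choices about which forms to include (for example "mine" or "myself") or how to count contractions would shift the percentages slightly.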

The fact that Mike Pence leads by a significant margin in first-person singular pronouns doesn't reveal a narcissistic personality disorder; it merely reflects the fact that his central pitch was about his experience as vice president. But I'm not expecting George Will and other pundits to start writing column after column accusing Pence of narcissism, as they (falsely) did with respect to Obama.

A few relevant past posts:

"Fact-checking George F. Will", 6/7/2009
"Fact-checking George F. Will, one more time", 10/6/2009
"Another lie from George Will", 5/7/2012
"More B.S. from George F. Will", 8/28/2015

"Obama's favored and disfavored SOTU words", 1/29/2014
"Male and female word usage", 8/7/2014
"The most Trumpish (and Bushish) words", 9/5/2015
"Make America rather formidable again", 9/10/2015
"Political vocabulary display", 9/10/2015
"The most Kasichoid, Cruzian, Trumpish, and Rubiositous words", 3/11/2016
"More political text analytics", 4/15/2016
"Style shifting in student writing assignments",  10/5/2018


  1. J.W. Brewer said,

    August 25, 2023 @ 9:00 am

    I see that Pence also scored the lowest on first-person plural pronoun usage. I wonder if that's because if your primary resume item is e.g. being governor of North Dakota you can say things like "in North Dakota, we solved problem X via successful policy Y" (rather than "I solved …") and the viewer/listener reaction is not that you're a raging egomaniac using the "royal we" but instead that you're graciously sharing credit with your whole administration and/or allies in the legislature. Perhaps it's harder to do that "it was a team effort" schtick when you're talking about tenure as Vice President?

  2. Mark Liberman said,

    August 25, 2023 @ 9:13 am

    @J.W. Brewer: Perhaps it's harder to do that "it was a team effort" schtick when you're talking about tenure as Vice President?

    And/or if you're trying to distance yourself a bit from the former president?

    In the Tucker Debate-night interview, it's not a surprise that Donald Trump's 1st-person-singular pronoun rate is just a bit behind Pence's, at 4.17%. But it's interesting that his 1st-person-plural rate is way lower than any of the debate participants, at 1.3%.

    And his ratio of 1st-person-singular to 1st-person-plural pronouns was off the charts at 3.21, compared to Pence with the next highest "I/we" ratio at 1.46. The overall ratio for all 8 debaters was 0.866. There were certainly plenty of questions in the Carlson interview where Trump could have chosen to speak about himself as a member of a group, but chose not to…

  3. Kenny Easwaran said,

    August 25, 2023 @ 11:04 am

    I'm a bit surprised that words like "who" (for Christie) and "are" (for De Santis) manage to be used at such a different level than they are for others. Most of the other words on those lists indicate something about the subject matter they're talking about, or perhaps a style of speech.

  4. Mark Liberman said,

    August 25, 2023 @ 1:33 pm

    @Kenny Easwaran: I'm a bit surprised that words like "who" (for Christie) and "are" (for De Santis) manage to be used at such a different level than they are for others.

    Christie used "who" 21 times in 1928 lexical tokens, while the others used "who" 39 times in 12489.

    (21/1928)/(39/12489) = 3.487991

    So Christie used "who" about 3.5 times more often than the others did. Why? It's probably a stylistic rather than a topical difference. It could be more (direct or indirect) questions, or more relative clauses, or more often choosing "who" over "that" in relative clauses where either is idiomatic. We could read through the transcripts and figure it out, and also look at other examples of Christie's speech to see whether it's a consistent pattern.

    DeSantis uses "are" 39 times in 1932 tokens, compared to 116 in 12485 for the others.

    (39/1932)/(116/12485) = 2.172641

    So he uses "are" about 2.2 times as often as the others. Again, we'd need to classify the examples (along some relevant dimensions) in order to figure out why it happened here, and whether it's more generally characteristic of him or specific to this debate.

  5. Jerry Packard said,

    August 25, 2023 @ 8:22 pm

    Great post – thanks Mark.
