The most Kasichoid, Cruzian, Trumpish, and Rubiositous words

I didn't watch last night's Republican debate in Miami. Apparently it was a relatively sober affair — there were no penis comparisons, no one called anyone else a liar or a fraud or a con-man, there was hardly even any shouting or interrupting.

But several people have asked for a reprise of the type of analysis that I did back in September to compare Donald Trump's lexicon with Jeb Bush's ("The most Trumpish (and Bushish) words", 9/5/2015). So here it is, just for the words used in that 3/10/2016 debate.

First, the overall word counts:

Kasich 3,172
Cruz 3,677
Trump 5,114
Rubio 4,701
All (including moderators) 21,117

For each of the four candidates, I calculated the "weighted log-odds-ratio, informative Dirichlet prior", using the algorithm described on pp. 387-8 of Monroe, Colaresi & Quinn, "Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict", Political Analysis 2009. In each case, I used the overall word counts from the debate as the prior, and compared the selected candidate's counts to the counts of the other three candidates, given the weighting prescribed by Monroe et al.'s algorithm.

Thus the 20 most Trumpish words were:

i        196 (38326.2) 217 (18787.9) 470 (22256.9) 3.546
very      41 (8017.21) 17 (1471.86) 65 (3078.09) 3.248
deal      33 (6452.87) 19 (1645.02) 54 (2557.18) 2.495
don't     43 (8408.29) 33 (2857.14) 83 (3930.48) 2.358
think     31 (6061.79) 19 (1645.02) 56 (2651.89) 2.341
want      34 (6648.42) 24 (2077.92) 87 (4119.9) 2.335
ever      14 (2737.58) 3 (259.74) 17 (805.039) 2.295
nobody    10 (1955.42) 0 (0) 13 (615.618) 2.278
ted       11 (2150.96) 1 (86.5801) 13 (615.618) 2.257
tremendous 9 (1759.87) 0 (0) 9 (426.197) 2.236
love      11 (2150.96) 2 (173.16) 13 (615.618) 2.096
something 11 (2150.96) 2 (173.16) 17 (805.039) 2.044
bid        7 (1368.79) 0 (0) 7 (331.486) 1.972
mean      14 (2737.58) 6 (519.481) 24 (1136.53) 1.868
it       119 (23269.5) 164 (14199.1) 318 (15059) 1.867
many      17 (3324.21) 10 (865.801) 31 (1468.01) 1.776
strong     9 (1759.87) 2 (173.16) 16 (757.683) 1.774
never     12 (2346.5) 5 (432.9) 17 (805.039) 1.755
beat       8 (1564.33) 2 (173.16) 10 (473.552) 1.678
good      17 (3324.21) 11 (952.381) 30 (1420.66) 1.668

Each line has the form

WORD  count1 (permillion1) count2 (permillion2) count3 (permillion3) WLO


  • count1 is the number of times the word was used by the selected candidate
  • (permillion1) is count1 expressed as frequency per million words
  • count2 and (permillion2) are the same things for the other three candidates
  • count3 & (permillion3) are the same things for the debate transcript as a whole
  • WLO is the "weighted log odds" as per the Monroe et al. algorithm
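For readers who want to check the numbers, here is a minimal Python sketch of how one row of the table can be recomputed. It is my reading of the Monroe et al. z-score, not the script actually used for this post: the function names `per_million` and `weighted_log_odds` are hypothetical, and it assumes the prior is simply the raw whole-debate count for each word, with no rescaling.

```python
import math


def per_million(count, total):
    """Express a raw count as a frequency per million words."""
    return 1e6 * count / total


def weighted_log_odds(y_i, n_i, y_j, n_j, a_w, a_0):
    """Z-scored log-odds-ratio with an informative Dirichlet prior.

    y_i, n_i: word count and total words for the selected candidate
    y_j, n_j: word count and total words for the comparison group
    a_w, a_0: prior count for the word and total prior count
              (assumed here to be the raw whole-debate counts)
    """
    log_odds_i = math.log((y_i + a_w) / (n_i + a_0 - y_i - a_w))
    log_odds_j = math.log((y_j + a_w) / (n_j + a_0 - y_j - a_w))
    delta = log_odds_i - log_odds_j
    # Approximate variance of the log-odds-ratio estimate
    variance = 1.0 / (y_i + a_w) + 1.0 / (y_j + a_w)
    return delta / math.sqrt(variance)


# Totals taken from the word counts given earlier in the post
trump_total = 5114
others_total = 3172 + 3677 + 4701  # Kasich + Cruz + Rubio
debate_total = 21117               # everyone, including moderators

# The row for "i": 196 (Trump), 217 (other three), 470 (whole debate)
z = weighted_log_odds(196, trump_total, 217, others_total, 470, debate_total)
print(round(per_million(196, trump_total), 1), round(z, 3))
```

Under those assumptions the two printed values should agree with the first row of the Trump table (38326.2 and 3.546), which suggests the unscaled whole-debate counts are a reasonable guess at the prior that was used.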

So last night, the 20 most Kasichoid words were

ohio you worried balanced standards ought the thank senator state mr budget secondly college kids positive trump them veteran ourselves

And the 20 least Kasichoid words were

's is it deal very good going never millions he bad example am love made now are think nothing tax

The 20 most Cruzian words were

donald you clinton washington who need obama hillary defend solution hard he immigration terrorist murder ayatollah nuclear billions tax point

And the 20 least Cruzian words were

they i it don't deal all way make no going good there than great mean get things love maybe very

The 20 most Trumpish words were

i very deal don't think want ever nobody ted tremendous love something bid mean it many strong never beat good

And the 20 least Trumpish words were

need when in my the who working america american with on to century kids law debt generation retire veteran us

The 20 most Rubiositous words were

senator in mr trump thank governor florida v.a program issue bipartisan my retire issues cruz debt century budget law miami

And the 20 least Rubiositous words were

i very we and think tax ever hillary iran tremendous care don't beat was bid ted many donald bad deal


  1. J. W. Brewer said,

    March 11, 2016 @ 5:02 pm

    Someone a day or two ago did a compilation of the multi-word n-grams allegedly most characteristic of each of the four. I'm not familiar with the "tf-idf" approach used in the analysis, although I expect Prof. Liberman is.

    [(myl) "tf-idf" stands for "term frequency (times) inverse document frequency", a widely-used weighting for document comparison. I presume that in this case the "documents" are the compilations of candidates' texts — the idea is to find the sequences that are most common in a given candidate and also least common in the others.]

  2. Guy said,

    March 11, 2016 @ 5:10 pm

    Do you have associated variances? I ask because "Rubio" is rated as highly Rubiositous even though he apparently never even said his own name.

    [(myl) Actually he does say it once:

    But every afternoon, he takes his little aluminum chair and he sits outside of an early polling center and holds a sign that says "Marco Rubio."

    But in fact you found a bug in my script, which I think I've now fixed. It mainly (only?) affected the bottom end of the weighted-log-odds distribution.]

  3. Doctor Science said,

    March 11, 2016 @ 5:52 pm

    Have you done something comparable for the Democrats? Mr Dr Science & I both feel that Hillary says "I" more than Bernie does, but we want FACTS.

    [(myl) Good idea! I've used the same method to compare Obama's SOTU addresses to those of earlier presidents ("Obama's favored (and disfavored) SOTU words", 1/29/2014), but I haven't looked at Hillary vs. Bernie.

    In the 3/09/2016 debate, their rates of "I" usage were HC 3.9%, BS 3.0%. Overall FPSP usage was HC 4.7%, BS 3.8%.

    By the MC&Q algorithm, the most HC-ish words were

    more had to i be comprehensive republicans same lost first border he opportunity laws senate make see how values clean

    And the most BS-ish words were

    united secretary states country street of world clinton may this should kids percent change wages wall major trump american history]


  4. Chris C. said,

    March 11, 2016 @ 7:19 pm

    Pretty clearly, one of Trump's major achievements is that he's forced all the other candidates to talk about him extensively, whereas he has only returned the favor toward Ted Cruz.

  5. AntC said,

    March 11, 2016 @ 7:37 pm

    Seriously? The most Trumpish word is 'I'??!

    [(myl) Indeed. Ditto in the selection of Trump and Bush texts that I analyzed last September ("The most Trumpish (and Bushish) words", 9/5/2015). It's pretty consistent. In one earlier debate, The Donald's first-person singular pronouns were about 7.5% of his vocabulary (see here), compared to Obama's typical rate of about 2.1% (see here).

    In yesterday's debate, Trump's rate of FPSP use was only 4.3% — still comfortably ahead of his rivals Kasich (3.2%), Rubio (2.5%), and Cruz (2.3%)]

    And are all the right-wing commentators jumping up and down about his narcissism?

    [(myl) As far as I know, George Will and company have been silent on this point — they don't like Trump, but a white billionaire's first-person-singular pronouns don't inflame their prejudices like a black intellectual's do. I've only seen one minor right-wing commentator take note of Trump's pronoun frequency — and he combined it with a repetition of the false assertion about Obama ("Did a blind squirrel happen to find a nut?", 8/8/2015).]

    Also 'I' or other first-person words seem to be amongst the least's for other candidates.

    BTW how come 'am' appears for Kasich, but not 'I'?

    [(myl) "am" is on the list of LEAST Kasichoid words: his rate for "am" is 0.9%, compared to 1.6% for Cruz, 3.6% for Rubio, and 3.7% for Trump. Kasich's rate of "I" usage is in the middle compared to the other debaters: Rubio 1.6%, Cruz 1.7%, Kasich 2.5%, Trump 3.8%. "I" can be used with lots of other verbs besides "am" :-)…]

  6. Pflaumbaum said,

    March 11, 2016 @ 7:42 pm

    Personally I think I'd adjectivise them Rubious (Rubioso), Crucial, Kasic and Trumpian.

  7. rosie said,

    March 12, 2016 @ 2:32 am

    Trumpy. Or maybe Trumpous.

  8. AntC said,

    March 12, 2016 @ 4:47 am

    If he keeps notching up primaries, it'll have to be Trumphant.

    (He can supply his own 'I's to make up the spelling ;-)

    @myl "I" can be used with lots of other verbs besides "am". Yes I realised as soon as I posted. Where is that 'Unsubmit dumb Comment' button?

  9. Mike Maltz said,

    March 12, 2016 @ 12:51 pm

    How could you overlook this! This shoe is a perfect fit.

    From Leo Rosten's The Joys of Yiddish:


    Pronounced TROM-beh-nik, to rhyme with "Brahma kick," or TRAUM-beh-nik, to rhyme with "brawn the pick." From the Polish, and/or Yiddish: tromba: "a trumpet," "a brass horn."

    1. A blowhard, a braggart, a blower of his own horn. "That trombenik can drive you crazy."

    2. A glutton.

    3. A lazy man or woman; a ne'er-do-well.

    4. A parasite.

    5. A fake, a phony, a four-flusher.

    Any way you look at it, trombenik is not a word of praise. A trombenik is part of the raucous gallery of nudniks, shleppers, and paskudnyaks.

    "I," boasted the trombenik, "have been to Europe three times in the past two years."

    "So? I come from there."

  10. Terry Hunt said,

    March 14, 2016 @ 12:26 pm

    @ Mike Maltz

    So, per #1, it's someone who blows his own Trump-et.

    Although I have Rosten's delightful book, it's been too long since I re-read it, so I'd forgotten that one. (The London side of my family has some Jewish connections and almost certainly roots, but as a child I was only ever exposed to the odd Yiddish word.)

    I wonder if, in fact, this is the actual origin of the Trump/Trumph family name?

  11. andyb said,

    March 25, 2016 @ 12:56 pm

    Sorry for jumping in late here, but I just found this post, and it connected with something I've observed but haven't been able to quantify.

    It seems like, since Reagan, candidates in both parties have been using the word "America" progressively more often in debates and speeches. And usually the winners seem to use it more than the losers. But this time around, it seems to be the opposite.

    Rand Paul was definitely the "'Merica 'Merica 'Merica" candidate this time around, and his nearest competitors were other Republicans who also washed out early. Rubio did a bit of it, but he's gone now. Kasich hails only qualified versions, especially "blue-collar America". Clinton mostly avoids name-checking the country, and when she does, it's negative as often as positive, as in "systemic problems in American society". And Trump seems to be almost superstitiously avoiding the name, saying "we" or "here" or even "you", even when it sounds awkward. Sanders and Cruz would be outliers in any other campaign for how little they say America, but next to Clinton and Trump they seem almost traditional in their America-praising.

    So, the first question is, am I right about this? And, if so, does it mean anything? I can guess some reasons for Trump to buck the trend, but it seems odd for the others, especially Clinton and Cruz, who seem like exactly the kind of candidates who'd… well, who'd sound like Obama and W.

    As a side note, I think Clinton and Kasich seem to be the ones who mention God the most, by far, which is usually something you hear from the right side of the Republican primaries. Maybe Rubio was afraid to remind people he's Catholic, or trying hard to position himself as establishment-moderate rather than right-wing-base, but what about Cruz?
