
Now that there are effectively just two Republican and two Democratic presidential candidates left, I'm starting to get questions about comparing speaking styles across party boundaries. One simple approach is a type-token plot — this is a measure of the rate of vocabulary display, where the horizontal axis is the sequentially increasing number of words ("tokens"), and the vertical axis is the total number of distinct words ("types") at each step.
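For readers who want to try this at home, here is a minimal sketch of how such a curve can be computed. It is not the script used for the plots in this post: the `tokenize` helper, its lowercasing, and its regular expression are illustrative assumptions, and different tokenization choices will shift the curves somewhat.

```python
import re

def tokenize(text):
    # Assumption: lowercase everything and treat runs of letters
    # (with optional internal apostrophes) as words.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def type_token_curve(tokens):
    """Entry i is the number of distinct types among the first i+1 tokens."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)            # a repeated word does not grow the set
        curve.append(len(seen))  # cumulative count of distinct types so far
    return curve

sample = "we will win and we will win big"
print(type_token_curve(tokenize(sample)))  # [1, 2, 3, 4, 4, 4, 4, 5]
```

Plotting the returned list against the positions 1, 2, 3, ... gives the type-token plot: a speaker who repeats phrases produces a flatter curve, as in the flat stretch of the toy example.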

I've previously noted ("Vocabulary display in the CNN debate", 9/18/2015) that Ted Cruz and Donald Trump were at the extremes of vocabulary display rates for the 9/16/2015 CNN debate:

This pattern continues to hold if we string together four of the Republican debates:

So how about Clinton and Sanders? Since Cruz and Trump really are extreme among politicians on this measure, I'd expect Clinton and Sanders to fall in between them.

And so they do. I've taken Clinton's and Sanders' turns from the Democratic debates in Las Vegas on 10/13/2015, in New Hampshire on 2/11/2016, and in Miami on 3/9/2016, and combined the resulting type-token functions with those for Cruz and Trump from the four Republican debates cited in the previous graph, giving us this:

What do these striking and consistent differences mean, if anything?

I think that they're telling us something about the candidates' rhetorical habits.

Trump: As I've noted several times before, Donald Trump tends to repeat exact word sequences or close paraphrases, sometimes immediately and sometimes after a few intervening phrases. (See e.g. "Donald Trump's repetitive rhetoric", 12/5/2015; "Trump's rhetorical style", 12/26/2015.) And I speculate that this stylistic habit is connected to another striking characteristic of his speaking style: he never uses "filled pauses" like um and uh, and also has a very low rate of longer silent pauses. Even skilled and practiced professional speakers, including most other politicians, generally have significant rates of filled and silent pauses when they're speaking ex tempore. It seems plausible that Trump developed the habit of repeating phrasal fragments as a way to occupy speech-planning time without filled pauses or dead air. And as it turns out, this method is good marketing as well, since it reinforces the repeated aspects of his message at an unconscious level in the audience.

Cruz: It's often remarked that Ted Cruz was a champion debater in college. As the American Parliamentary Debate Association website explains, "Parliamentary debate is an off-topic, extemporaneous form of competitive debate which stresses rigorous argumentation, logical analysis, quick thinking, breadth of knowledge, and rhetorical ability over preparation of evidence." This experience developed his natural abilities in a direction that allows him to deploy lexical resources in an apparently effortless phrasal efflorescence, with very little repetition. And he also almost never uses filled pauses or longer silent pauses, perhaps because these features were discouraged in the debating style.

Clinton and Sanders:  Their type-token plots are typical of modern American politicians in a debate setting — compare Carson, Rubio, Jeb Bush, and Kasich. And their distributions of filled pauses and longer silent pauses are also typical, I think — though a quantitative investigation will have to wait for another day's Breakfast Experiment™. That's not to say that Hillary Clinton and Bernie Sanders don't have individual stylistic characteristics — but we don't see the differences in their rate of vocabulary display.

A type-token plot obviously pays no attention to what words are chosen (common or rare, short or long, positive or negative, comforting or alarming), and those characteristics are also stylistically important. The way that words are combined into phrases, sentences, and paragraphs also affects our perception of linguistic style: paratactic or hypotactic, loose or periodic, focused or divergent, and so on.

Those characteristics can also be quantified — but for now I'll just add a small proxy statistic for the use of "big words", namely average letter count for the words used in the debates cited above by the various candidates:

Sanders 4.40
Cruz 4.37
Clinton 4.29
Bush 4.24
Rubio 4.23
Carson 4.13
Kasich 4.07
Trump 3.96

Again, Donald Trump comes out on the low end of a measure of linguistic ostentation. As I've observed, this is in striking contrast to his taste in interior decoration — "Trump the Thing Explainer?", 3/16/2016. (See also "Lexical bling: Vocabulary display and social status", 11/20/2014.)
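The average-letter-count statistic is simple to reproduce in outline. The sketch below is an assumption about the details, not the exact procedure behind the table: `mean_word_length` is a hypothetical helper that counts only alphabetic characters, so that an apostrophe in a word like "don't" does not count as a letter.

```python
import re

def mean_word_length(text):
    # Assumption: a word is a run of letters with optional internal apostrophes.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not words:
        return 0.0  # avoid dividing by zero on empty input
    # Count letters only, skipping apostrophes.
    letters = sum(sum(ch.isalpha() for ch in w) for w in words)
    return letters / len(words)

print(round(mean_word_length("We will make it great"), 2))  # 3.4
```

Applied to each candidate's concatenated debate turns, a function like this would give one number per speaker, comparable in kind to the figures listed above.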

In the end, I hope that voters will pay more attention to what the candidates say than to how they say it — though some people may argue that style reveals personality, and that personality also matters.

[Note: A slightly different version of the type-token plot would collapse inflectional variants (e.g. try/tries/tried/trying) into a single word "type". And we could also try to distinguish different words that happen to be spelled the same way. I haven't done either of those things in the plots shown above: there a "word type" is just a unique letter sequence.]



  1. Doctor Science said,

    March 27, 2016 @ 8:55 am

    just two Republican — Kasich is still there! really! just ask him!

  2. Charles Antaki said,

    March 27, 2016 @ 9:09 am

    "It seems plausible that Trump developed the habit of repeating phrasal fragments as a way to fill speech-planning time without filled pauses or dead air."
    That sounds very likely – I guess Prof Liberman has in mind the bags of cognitive psycholinguistic research attesting to the difficulty of producing hesitation-free continuous speech. But there's plenty of (more entertaining) evidence in BBC radio's Just a Minute programme, well known, I'd guess, to UK-based LL readers. (Though were Trump to be a contestant, he'd fail rather spectacularly on the no-repetition rule.)

  3. D.O. said,

    March 27, 2016 @ 9:55 am

    I am not sure why you think that Trump's speaking style is a plus for him. His success might be due to his policy positions (or tendencies), character, novelty value, massive TV exposure or who knows what else.

    On a more linguistic note, it seems that Trump's vocabulary display is qualitatively different from other politicians'. It's not simply that he lags behind; it seems that at around 150-200 "types" something happens and he begins to slow down in his display much more visibly. It's not hard to come up with some math metrics of this, but the more interesting question is how to think about it qualitatively. Is it some basic number of "types" that allows a generic English speaker to speak fluently? Or is it some basic number plus words that Trump is especially partial to? Or something completely different?

    [(myl) Local small-scale fluctuations in a type-token curve are usually due to topic or style shifts rather than to any stable characteristics of the speaker or writer. You can see that in the type-token plot for the start of Trump's contributions to four different debates:

    Attempting to deduce anything about Mr. Trump from the jigs and jags in those curves is just the exegesis of noise. But at the scale of the concatenation of all four debates, the curves pretty much overlap.]


  4. Y said,

    March 27, 2016 @ 4:32 pm

    "The exegesis of noise"—that's excellent. It should be the name of a book or something.

  5. Rubrick said,

    March 27, 2016 @ 5:40 pm

    Your opening sentence did in fact cause me to say "Oh, Kasich dropped out??", and check Google News for headlines. Nope. I guess reading 538 has caused me to be more cognizant of his existence than the rest of the world.

    [(myl) OK, now it's "there are effectively just two Republican and two Democratic presidential candidates" …]

  6. D.O. said,

    March 27, 2016 @ 8:04 pm

    I was unclear. I didn't suggest that individual jiggles on the curves are informative. I think that Trump usually reaches about 200 "types" after speaking about 500-600 "tokens", and this is about the norm for other candidates. After that he increases his "types" at a noticeably slower pace than other speakers.

  7. Matt McIrvin said,

    March 28, 2016 @ 6:12 pm

    I've had my first sighting of the first-person-pronoun attack being used against Hillary Clinton, by Bernie Sanders supporters.

  8. Yuval said,

    March 29, 2016 @ 10:15 am

    I guess the common/rare issue could be controlled if the y-axis becomes a frequency-sensitive measure like accumulated type idf (/logidf/sqrtidf) based on some canonical corpus count?

    [(myl) Interesting idea. An alternative might be marginal evolution of compressed-text size based on a frequency- and recency-sensitive code.]
