Now that there are effectively just two Republican and two Democratic presidential candidates left, I'm starting to get questions about comparing speaking styles across party boundaries. One simple approach is a type-token plot — this is a measure of the rate of vocabulary display, where the horizontal axis is the sequentially increasing number of words ("tokens"), and the vertical axis is the total number of distinct words ("types") at each step.
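For readers who want to try this on a transcript of their own, here is a minimal sketch of how such a curve might be computed. It is not the code behind the plots in this post. The function name `type_token_curve`, the lowercasing, and the simple regular-expression tokenizer are all assumptions made for illustration; a "type" here is just a unique letter sequence.

```python
import re


def type_token_curve(text):
    """Return a list whose i-th entry is the number of distinct word
    types seen among the first i+1 tokens of `text`.

    Assumptions (not taken from the original analysis): tokens are runs
    of letters and apostrophes, and case is ignored.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    seen = set()
    curve = []
    for token in tokens:
        seen.add(token)          # adding a repeated token leaves the set unchanged
        curve.append(len(seen))  # types so far, at this token count
    return curve


if __name__ == "__main__":
    sample = "the cat and the hat and the bat"
    print(type_token_curve(sample))
```

Plotting the returned list against 1, 2, 3, ... gives the type-token plot: a speaker who repeats words a lot produces a flatter curve, and a speaker who keeps introducing new words produces a steeper one.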
I've previously noted ("Vocabulary display in the CNN debate", 9/18/2015) that Ted Cruz and Donald Trump were at the extremes of vocabulary display rates for the 9/16/2015 CNN debate:
This pattern continues to hold if we string together four of the Republican debates:
So how about Clinton and Sanders? Since Cruz and Trump really are extreme among politicians on this measure, I'd expect Clinton and Sanders to fall in between them.
And so they do. I've taken Clinton's and Sanders' turns from the Democratic debates in Las Vegas on 10/13/2015, in New Hampshire on 2/11/2016, and in Miami on 3/9/2016, and combined the resulting type-token functions with those for Cruz and Trump from the four Republican debates cited in the previous graph, giving us this:
What do these striking and consistent differences mean, if anything?
I think that they're telling us something about the candidates' rhetorical habits.
Trump: As I've noted several times before, Donald Trump tends to repeat exact word sequences or close paraphrases, sometimes immediately and sometimes after a few intervening phrases. (See e.g. "Donald Trump's repetitive rhetoric", 12/5/2015; "Trump's rhetorical style", 12/26/2015.) And I speculate that this stylistic habit is connected to another striking characteristic of his speaking style: he never uses "filled pauses" like um and uh, and also has a very low rate of longer silent pauses. Even skilled and practiced professional speakers, including most other politicians, generally have significant rates of filled and silent pauses when they're speaking ex tempore. It seems plausible that Trump developed the habit of repeating phrasal fragments as a way to occupy speech-planning time without filled pauses or dead air. And as it turns out, this method is good marketing as well, since it reinforces the repeated aspects of his message at an unconscious level in the audience.
Cruz: It's often remarked that Ted Cruz was a champion debater in college. As the American Parliamentary Debate Association website explains, "Parliamentary debate is an off-topic, extemporaneous form of competitive debate which stresses rigorous argumentation, logical analysis, quick thinking, breadth of knowledge, and rhetorical ability over preparation of evidence." This experience developed his natural abilities in a direction that allows him to deploy lexical resources in an apparently effortless phrasal efflorescence, with very little repetition. And he also almost never uses filled pauses or longer silent pauses, perhaps because these features were discouraged in the debating style.
Clinton and Sanders: Their type-token plots are typical of modern American politicians in a debate setting (compare Carson, Rubio, Jeb Bush, and Kasich). And their distributions of filled pauses and longer silent pauses are also typical, I think, though a quantitative investigation will have to wait for another day's Breakfast Experiment™. That's not to say that Hillary Clinton and Bernie Sanders don't have individual stylistic characteristics, but those differences don't show up in their rates of vocabulary display.
A type-token plot obviously pays no attention to what words are chosen (common or rare, short or long, positive or negative, comforting or alarming), and those characteristics are also stylistically important. The way that words are combined into phrases and sentences and paragraphs also affects our perception of linguistic style: paratactic or hypotactic, loose or periodic, focused or divergent, and so on.
Those characteristics can also be quantified — but for now I'll just add a small proxy statistic for the use of "big words", namely average letter count for the words used in the debates cited above by the various candidates:
Again, Donald Trump comes out on the low end of a measure of linguistic ostentation. As I've observed, this is in striking contrast to his taste in interior decoration — "Trump the Thing Explainer?", 3/16/2016. (See also "Lexical bling: Vocabulary display and social status", 11/20/2014.)
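The "average letter count" proxy is easy to reproduce in outline. The sketch below is one possible reading of it, not the code used for the numbers discussed here: it assumes the average is taken over word tokens (so frequent words count every time they occur) and that only alphabetic characters count as letters. The function name `mean_word_length` and the tokenizer are hypothetical.

```python
import re


def mean_word_length(text):
    """Average number of letters per word token in `text`.

    Assumptions (not taken from the original analysis): a word is a run
    of letters and apostrophes containing at least one letter, and
    apostrophes do not count toward its length.
    """
    candidates = re.findall(r"[A-Za-z']+", text)
    # Count only alphabetic characters, so "don't" has 4 letters.
    lengths = [sum(ch.isalpha() for ch in word) for word in candidates]
    lengths = [n for n in lengths if n > 0]  # drop stray apostrophes
    if not lengths:
        return 0.0
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    print(mean_word_length("We will win big"))
```

Averaging over distinct word types instead of tokens would give a different (generally larger) number, since short function words like "the" and "of" would then count only once.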
In the end, I hope that voters will pay more attention to what the candidates say than to how they say it — though some people may argue that style reveals personality, and that personality also matters.
[Note: A slightly different version of the type-token plot would collapse inflectional variants (e.g. try/tries/tried/trying) into a single word "type". And we could also try to distinguish different words that happen to be spelled the same way. I haven't done either of those things in the plots shown above: there a "word type" is just a unique letter sequence.]