SOTU evolution
In preparation for Tuesday's State of the Union address, I thought I'd take a look at the language of these addresses over the years. Texts are available at UCSB's American Presidency Project; I downloaded their texts and removed irrelevant mark-up. (Or rather, I wrote scripts to do all of this automatically. I believe that the results are generally correct, but there are probably a few uncaught errors.)
There are lots of ways to approach this question. In today's post, I'll set the stage and look at a couple of simple word-frequency features, with more (and maybe more interesting) explorations to come later on.