Chris Hanretty, "British headlines: 18% less informative than their American cousins", 11/29/2013:
I’m currently working on a project looking at the representation of constituency opinion in Parliament. One of our objectives involves examining the distribution of parliamentary attention — whether MPs from constituencies very concerned by immigration talk more about immigration than MPs from constituencies that are more relaxed about the issue.
To do that, I’ve been relying on the excellent datasets made available from the UK Policy Agendas Project. In particular, I’ve been exploring the possibility of using their hand-coded data to engage in automated coding of parliamentary questions.
One of their datasets features headlines from the Times. Coincidentally, one of the easier-to-use packages for automated coding of texts (RTextTools) includes a dataset of headlines from the New York Times. Both datasets use similar topic codes, although the UK team has dropped a couple of codes.
How well does automated topic coding work on these two sets of newspaper headlines?
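To make the question concrete, here is a minimal sketch of what "automated topic coding" means: train a classifier on hand-coded headlines, then score its predictions against held-out hand-coded headlines. This is not the RTextTools workflow. It is a small word-count naive Bayes classifier in plain Python, swapped in for illustration, and the headlines and topic labels below are invented, not drawn from either dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(headline):
    # Crude tokenizer: lowercase and split on whitespace.
    return headline.lower().split()


def train(labelled):
    """Count documents per topic and words per topic from (headline, topic) pairs."""
    topic_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for headline, topic in labelled:
        topic_counts[topic] += 1
        for word in tokenize(headline):
            word_counts[topic][word] += 1
            vocab.add(word)
    return topic_counts, word_counts, vocab


def predict(model, headline):
    """Return the topic with the highest naive Bayes log-score (add-one smoothing)."""
    topic_counts, word_counts, vocab = model
    total_docs = sum(topic_counts.values())
    best_topic, best_score = None, -math.inf
    for topic, n_docs in topic_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[topic].values())
        for word in tokenize(headline):
            if word in vocab:  # words never seen in training are ignored
                smoothed = (word_counts[topic][word] + 1) / (total_words + len(vocab))
                score += math.log(smoothed)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


def accuracy(model, labelled):
    """Share of headlines whose predicted topic matches the hand-coded topic."""
    hits = sum(predict(model, headline) == topic for headline, topic in labelled)
    return hits / len(labelled)


# Invented toy data (hypothetical headlines and labels, for illustration only).
train_set = [
    ("ministers tighten immigration rules", "immigration"),
    ("border agency faces visa backlog", "immigration"),
    ("hospital waiting lists grow", "health"),
    ("nurses warn of hospital staff shortage", "health"),
    ("bank raises interest rates", "economy"),
    ("inflation falls as interest rates hold", "economy"),
]
test_set = [
    ("visa rules tighten at border", "immigration"),
    ("hospital nurses face shortage", "health"),
    ("interest rates fall", "economy"),
]

model = train(train_set)
print("held-out accuracy:", accuracy(model, test_set))
```

Running the same train-and-score loop separately on each newspaper's hand-coded headlines, with comparable training set sizes, is one way to compare how well the two sets of headlines can be coded automatically.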