In yesterday's post "Deceptively valuable", I used counts from the Google Books ngram dataset, as seen through Mark Davies' convenient interface. That was a case where the ngram dataset's flaws (uncertain metadata, no way to look at context, etc.) are more than balanced by its virtues. Thinking about some of the other issues involved, I remembered a case where the ngram dataset's answers can be checked against those given by another historical collection: the trend over the past century for Americans to replace "sneaked" with "snuck".
Here's the result of tracking this trend in the Google Books dataset:
In "Graphically snuckward", 6/19/2010, I created a similar plot from Mark Davies' COHA corpus:
For a discussion of some of the linguistic and cultural issues involved, see Stan Carey, "'Snuck' sneaked in", Sentence First 6/18/2010.