In the comments on yesterday's post, Ran Ari-Gur raised the possibility that sentence-initial conjunctions are verbally and plenarily inspired of God, just as singular they is. Ran's evidence came from a sample consisting of the first 80 verses of Genesis in the original Hebrew and in the King James translation. I decided to check more systematically, and so this morning I downloaded the entire KJV and (wrote a script that) counted.
Out of 791,524 total words, there appear to be 12,846 instances of sentence-initial and, for a frequency of 16,229 per million. This is more than four times the rate of sentence-initial and in the COCA "spoken" section (4,048 per million), and more than 60 times the pathetic 263 per million of secular academic prose:
(Here as often, the vernacular is more theologically correct. But even in the darkest groves of academe, a significant glimmer of the divine spark remains: sentence-initial and, at 263 per million, is still much commoner in academic prose than nevertheless in all positions, with an academic frequency of 82 per million, and about the same as therefore, with an academic frequency of 278 per million.)
My script also counted 1,558 instances of sentence-initial but in the KJV, for a frequency of 1,968 per million. This time, the COCA spoken rate comes close, at 1,777 per million, while even among apostate academics, the rate of 490 per million approaches a quarter of the divine norm.
(In both cases, there is a hint that overall American rates of initial-conjunction use may be gradually becoming more godly.)
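For anyone who wants to try a count like this at home, here is a minimal Python sketch. It is not the script behind the numbers above: the function names are made up for illustration, and the definition of "sentence-initial" is an assumption (a capitalized word at the very start of the text, or following a period, question mark or exclamation point, optional closing quotes or brackets, and whitespace). A different tokenizer or sentence splitter would give somewhat different totals.

```python
import re

# Assumption: a "word" is a run of letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")

def count_words(text):
    """Total number of word tokens under the simple definition above."""
    return len(WORD_RE.findall(text))

def count_sentence_initial(text, word):
    """Count capitalized occurrences of `word` at an (approximate) sentence start."""
    # Sentence start = beginning of text, or end punctuation plus
    # optional closing quotes/brackets, followed by whitespace.
    pattern = re.compile(
        r'(?:\A\s*|[.!?][")\]]*\s+)' + re.escape(word.capitalize()) + r"\b"
    )
    return len(pattern.findall(text))

def per_million(count, total_words):
    """Convert a raw count into a frequency per million words."""
    return 1_000_000 * count / total_words
```

Applied to a whole text, `per_million(count_sentence_initial(text, "and"), count_words(text))` gives a rate comparable to the ones quoted here. Plugging in the KJV counts reported above, `per_million(12846, 791524)` and `per_million(1558, 791524)` round to 16,229 and 1,968.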
I should also note that my copy of the KJV text includes 51,693 total instances of and in all positions, for a rate of 65,308 per million, and that sentence-initial and is thus 24.9% of the total. In comparison, in COCA's spoken section, sentence-initial and is 15.2% of total and uses, whose aggregate frequency is 26,546 per million. In the academic section, the aggregate frequency of and is slightly higher, at 30,312 per million, but the proportion of sentence-initial and is merely 0.9%. This does suggest that academics may have been influenced by the godless "No Initial Coordinators" movement (though alternatively, they may just have longer sentences and fancier options for discourse connectives).
In the case of but, the KJV text exhibits a proportion of 39% sentence-initial use, while COCA shows 37.9% in the spoken section and 18.2% in the sample of academic prose.
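The percentages for and can be rechecked from the figures already given. The sketch below simply divides sentence-initial uses by all uses; the helper name `share` is hypothetical. Note that the KJV line uses raw counts while the COCA lines use per-million rates, which works because the corpus size cancels out in the ratio.

```python
def share(initial, total):
    """Percentage of all uses of a word that are sentence-initial."""
    return 100.0 * initial / total

# KJV: raw counts of sentence-initial "and" and of "and" in all positions.
kjv_and = share(12_846, 51_693)

# COCA: per-million rates, as quoted in the text.
coca_spoken_and = share(4_048, 26_546)
coca_academic_and = share(263, 30_312)

for label, value in [
    ("KJV", kjv_and),
    ("COCA spoken", coca_spoken_and),
    ("COCA academic", coca_academic_and),
]:
    print(f"{label}: {value:.1f}% of 'and' is sentence-initial")
```

The totals for but in all positions are not given above, so the corresponding shares cannot be recomputed the same way.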
All this suggests another line of marginally blasphemous coffee mugs and t-shirts:
(This garment is purely a stylistic supplement, in addition to being hypothetical: Anti-zombie effects have not been demonstrated in clinical trials.)
Also in yesterday's comments, D.O. looked at "the ultimate piece of formal writing in the U.S., the Constitution", and found several instances of initial conjunctions, suggesting in addition that the Supreme Court "is also not shy in using sentence initial conjunctions". So I poured another cup of coffee, and decided to extend this morning's Breakfast Experiment™ by running a script over the roughly 30,000 files in the historical archive of SCOTUS opinions and other documents that I recently got from Jerry Goldman at oyez.org and Tim Stanley at justia.com. I grouped these files into five-year periods from 1801 to 2005 (1801-1805, 1806-1810, and so on), and counted the frequency of sentence-initial and and but in each time-slice:
There is pretty clearly some structure here, and it's unlikely to be a sampling artefact, since the samples are large: the number of words per time-slice varies from 237,080 in 1801-1805 to 6,941,126 in 1981-1985, with a mean of 2,762,138. The 1831-1835 slice has 1,088,581 words, and after that, every slice has at least a million words, except for 1841-1845 with 885,229 and 1861-1865 with 994,278. (Of course, the number of authors per year is much lower, and stylistic variation among individual justices and clerks is a plausible source of year-to-year variation.)
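The five-year grouping is easy to sketch. The code below is an illustration under stated assumptions, not the script that produced these numbers: it assumes each document has already been reduced to a hypothetical `(year, initial_count, word_count)` tuple, and it pools counts within each slice before dividing, so that long opinions weigh more than short ones.

```python
from collections import defaultdict

def period_label(year, start=1801, width=5):
    """Map a year to its five-year slice, e.g. 1983 -> '1981-1985'."""
    first = start + ((year - start) // width) * width
    return f"{first}-{first + width - 1}"

def rates_by_period(docs):
    """Per-million rate of sentence-initial uses in each time-slice.

    `docs` is an iterable of (year, initial_count, word_count) tuples
    (a made-up input format for this sketch).
    """
    hits = defaultdict(int)
    words = defaultdict(int)
    for year, initial_count, word_count in docs:
        label = period_label(year)
        hits[label] += initial_count
        words[label] += word_count
    # Pool counts per slice, then convert to a rate per million words.
    return {
        label: 1_000_000 * hits[label] / words[label]
        for label in words
        if words[label] > 0
    }
```

Averaging per-document rates instead of pooling would be another reasonable choice, and would give short documents more influence on each slice.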
After two centuries of apparent decline, the use of sentence-initial coordinators seems to have been rebounding a bit recently — here are the year-by-year frequencies from 1980 to 2005:
I think there's a plausible prima facie case for some genuine variation over time here, but there's a lot of checking to do before concluding that the changes are as depicted, much less venturing any specific explanation for them.
One thing is clear, though. For the past two centuries the U.S. Supreme Court has been using sentence-initial and at rates substantially higher than those found in COCA's "academic" section: the SCOTUS median is 563 per million, compared to the COCA academic-section frequency of 263 per million. And similarly for sentence-initial but, where the SCOTUS median of 852 per million compares to the COCA academic-section value of 490. The Supremes still fall well short of the divine ideal, however, at least on this dimension.