David Craig asked whether Anand Giridharadas is suffering from the Recency Illusion in his small piece on "so" (Follow My Logic? A Connective Word Takes the Lead, NYT 5/21/2010), which observes that
“So” may be the new “well,” “um,” “oh” and “like.” No longer content to lurk in the middle of sentences, it has jumped to the beginning, where it can portend many things: transition, certitude, logic, attentiveness, a major insight. […]
Giridharadas disarms antedating by citation:
One can dredge up ancient instances of “so” as a sentence starter. In his 14th-century poem “Troilus and Criseyde,” Chaucer launched a verse with, “So on a day he leyde him doun to slepe. …” But for most of its life, “so” has principally been a conjunction, an intensifier and an adverb.
What is new is its status as the favored introduction to thoughts, its encroachment on the territory of “well,” “oh,” “um” and their ilk.
So it is widely believed that the recent ascendancy of “so” began in Silicon Valley. The journalist Michael Lewis picked it up when researching his 1999 book “The New New Thing”: “When a computer programmer answers a question,” he wrote, “he often begins with the word ‘so.’ ” Microsoft employees have long argued that the “so” boom began with them.
And it's wonderful to see that he cites a linguist, Galina Bolden, and links to one of several papers that she's written on the subject ("Implementing incipient actions: The discourse marker ‘so’ in English conversation", Journal of Pragmatics 41:974–998, 2009).
However, Bolden's work doesn't address the recency question, as far as I've found; and as Giridharadas recognizes, her analysis of so doesn't seem to be quite the same as the one that he puts forward:
But in the algorithmic times that have come, “so” conveys an algorithmic certitude. It suggests that there is a right answer, which the evidence dictates and which must not be contradicted. Among its synonyms, after all, are “consequently,” “thus” and “therefore.”
And yet Galina Bolden, a linguistics scholar who has studied of recorded ordinary conversations and has written academic papers on the use of “so,” believes that “so” is also about the culture of empathy that is gaining steam as the world embraces the increasing complexity of human backgrounds and geographies.
To begin a sentence with “oh,” she said in an e-mail message, is to focus on what you have just remembered and your own concerns. To begin with “so,” she said, is to signal that one’s coming words are chosen for their relevance to the listener.
The ascendancy of “so,” Dr. Bolden said, “suggests that we are concerned with displaying interest for others and downplaying our interest in our own affairs.”
Here's what she wrote in the abstract of the paper that he links to:
The discourse marker ‘so’ is most commonly described as indexing inferential or causal connections. However, recordings of everyday talk show that these are not its only functions. The article uses the methodology of conversation analysis and examines a large corpus of recorded conversations to explicate the role of ‘so’ in implementing incipient actions. The analysis focuses on the use of ‘so’ for prefacing sequence-initiating actions (such as questions) and demonstrates that speakers deploy this preface to indicate the status of the upcoming action as ‘emerging from incipiency’ rather than being contingent on the immediately preceding talk. ‘So’ prefacing is recurrently used in contexts where the activity being launched has been relevantly pending. Additionally, speakers can use ‘so’ to characterize and constitute a particular action as advancing their interactional agenda.
Her 2005 UCLA dissertation ("Delayed and incipient actions: The discourse markers '-to' and 'so' in Russian and English conversation") argues for the same analysis.
And at this point, we should note that there are many, many different uses of so. Even the OED's roman-numeral II sub-entry for so (one of VIII major sub-entries), described as "Placed at the beginning of a clause with continuative force", has several major arabic-numeral sub-subsections; and sub-subsection II.10.b, for example, is further divided into two:
II.10.b.(a) As an introductory particle, without a preceding statement (but freq. implying one)
II.10.b.(b) [Reflecting Yiddish idioms.] Without implication of a preceding statement, or with concessive force: = well then, in that case, very well; also (introducing interrogative clauses) with adversative force: = but then, anyway.
Giridharadas is apparently writing about a fairly large number of different (or at least differentiable) uses of so, only some of which are the same as those that Bolden analyzed.
Still, let's take a shot at David Craig's question, namely whether Giridharadas might be a victim of the recency illusion. One extremely crude test is whether the overall frequency of sentence-initial so has changed much over the last couple of hundred years. A query to Mark Davies' Corpus of Historical American English gives us an appropriately crude answer:
This seems to confirm the general idea — though the change is a rather gradual one, and of course the mix of sentence-initial so's has probably changed over the years, with earlier instances perhaps tending more toward things like
So help me heaven, As I shall keep this oath!
So fiercely did the flames rage, that at one time it was feared the fire would cross the river to the side on which the fort is situated, in which case it and all within must have been destroyed.
So may the Redeemed some day sing the Doxology in Heavenly courts.
To really address David Craig's question, we'd have to create an operational definition of what Giridharadas was talking about, so as to classify individual instances of initial so as relevant or not relevant; and then we'd have to filter the results of such a query through this classification process. That's not quite all: we'd also need to control or compensate somehow for the potentially changing mix of styles and genres in our historical corpus (and in this case, the actually changing mix in Mark Davies' resource).
I'm not about to do this, since (even my extended summertime Sunday) breakfast is nearly over. But one thing we can do, as a sort of crude proxy for various sorts of more careful experimentation, is to look at the history of various alternative words and word sequences.
If we limit our search to initial so followed by a pronoun, we can eliminate some of the alternative uses exemplified above ("so help me…", "so fiercely did …", "so may the Redeemed …"). The results are not very different:
If we limit the search to initial "So the" or "So it", we get even less indication of a radical change in recent decades:
Looking at initial "So it is" or "So it's" does show more of a recent increase:
Initial "So what" shows a steady increase from the 1920s onwards:
At this point, though, you ought to be wondering whether the whole enterprise is infected by the influence of gradually decreasing mean sentence length. Maybe — but not all initial connectives have increased in frequency in this corpus:
And there are some cases where an initial connective shows a non-monotonic trend:
Could the decline in initial However since the 1950s be the influence of Strunk & White? Maybe — but I don't recall any warnings about initial Now, which also seems to be declining in frequency (at least in Mark Davies' selection of texts…):
Anyhow, the verdict on David Craig's question remains unclear, in my opinion. We could turn Anand Giridharadas's column into a testable hypothesis about historical change, by distilling from his rather diverse list of observations a classification scheme that is well enough defined for us to be able to tell when we have an example of the allegedly new use that he's talking about. Then we could apply this scheme to some historical corpus, and look at how the frequency of different classes changes over time.
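To make the idea of a classification scheme a bit more concrete, here is a minimal R sketch of one very crude operational definition: it sorts sentences by whether they begin with "So" followed by a pronoun. The function name `classify_initial_so`, the pronoun list, the class labels, and the toy sentences with their decade labels are all assumptions made up for illustration; this is not Giridharadas's definition, nor Bolden's analysis, and a serious scheme would need far more than a regular expression.

```r
# Hypothetical, deliberately crude classifier for sentence-initial "so".
# The pronoun list and the class labels are illustrative assumptions.
classify_initial_so <- function(sentences) {
  pronouns <- "(I|you|he|she|it|we|they)"
  # Sentence starts with the word "So" (not "Some", "Sofa", ...)
  is_initial_so <- grepl("^So\\b", sentences, perl = TRUE)
  # "So" (optionally followed by a comma) and then a pronoun
  so_pronoun <- grepl(paste0("^So,? ", pronouns, "\\b"), sentences, perl = TRUE)
  ifelse(!is_initial_so, "not initial so",
         ifelse(so_pronoun, "so + pronoun", "other initial so"))
}

# Toy data: sentences and decade labels are invented placeholders,
# NOT drawn from any corpus query.
toy <- data.frame(
  decade = c(1850, 1850, 2000, 2000),
  text = c(
    "So help me heaven, As I shall keep this oath!",
    "So fiercely did the flames rage.",
    "So it turns out the answer was simple.",
    "What is new is its status."
  ),
  stringsAsFactors = FALSE
)

toy$class <- classify_initial_so(toy$text)

# Cross-tabulate class by decade, the shape of table one would
# want to inspect for change over time.
counts <- table(toy$decade, toy$class)
print(counts)
```

Even this toy version shows where the judgment calls lie: "So help me heaven" and "So fiercely did" both land in the catch-all class, and nothing here distinguishes the many functions that "so" plus a pronoun can serve.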
Unfortunately, different people might produce different operational definitions, and these different outcomes would constitute different hypotheses, some of which might be true and some false. But however we defined our terms, and however we chose the materials to examine, the crude evidence available so far suggests that we will not find evidence that so is "the new 'well,' 'um,' 'oh' and 'like'", at least if this hypothesis entails that a substantial fraction of those words' frequency counts should have shifted to initial "So" in recent years.
[Note — if you want to turn COHA (CHAE?) counts into frequencies, you will need to normalize for the varying sizes of the samples per decade, which appear to be:
Years <- seq(1810, 2000, by=10)
Sizes <- c(1181022, 6927005, 13773987, 16046854, 16493826, 17125102, 18610160, 20872855, 21183383, 22541232, 22655252, 25632411, 24413247, 24144478, 24398180, 23927982, 23769305, 25178952, 27877340, 29479451)
expressed in R-ish terms.]
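The normalization mentioned in the note might be sketched as follows. This is a minimal example, assuming the per-decade sizes listed above are word counts; the helper name `per_million` and the placeholder hit counts are invented for illustration and are not real query results.

```r
# Decade start years and per-decade sample sizes, as listed in the note.
Years <- seq(1810, 2000, by = 10)
Sizes <- c(1181022, 6927005, 13773987, 16046854, 16493826,
           17125102, 18610160, 20872855, 21183383, 22541232,
           22655252, 25632411, 24413247, 24144478, 24398180,
           23927982, 23769305, 25178952, 27877340, 29479451)

# Hypothetical helper: convert raw hit counts into hits per million words.
per_million <- function(counts, sizes) {
  stopifnot(length(counts) == length(sizes))
  1e6 * counts / sizes
}

# Placeholder counts for illustration only: the same raw count in every decade.
example_counts <- rep(1000, length(Years))
example_rates <- per_million(example_counts, Sizes)

# Pair each decade with its normalized rate for plotting or inspection.
rates <- data.frame(decade = Years, per_million = example_rates)
print(rates)
```

With identical raw counts in every decade, the normalized rate for the 1810s comes out far higher than for the 2000s, simply because the earliest sample is so much smaller. That is the distortion that raw counts would hide.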
[Update — this can't compete with Beowulf and Chaucer, but apparently a half century ago, John W. Campbell had a regular feature in Analog under the heading "So What's New", e.g. this example from Volume 65, Issue 4, 1960:
This confirms my memory that Bolden's "incipiency" so has been around at least that long, whatever its changes in relative frequency may be.]