Various readers have pointed out to me that the "QWERTY Effect" is back. (For coverage of the first QWERTY-Effect paper, see "The QWERTY Effect", 3/8/2012; "QWERTY: Failure to Replicate", 3/13/2012; "Casasanto and Jasmin on the QWERTY effect", 3/17/2012; and "Response to Jasmin and Casasanto's response to me", 3/17/2012.)
The new paper is Casasanto, D., Jasmin, K., Brookshire, G. & Gijssels, T. "The QWERTY Effect: How typing shapes word meanings and baby names". In P. Bello, M. Guarini, M. McShane, & B. Scassellati (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society, 2014.
As before, the idea is that typing letters with the right hand makes us like them more; or in the words of their abstract,
Filtering words through our fingers as we type appears to be changing their meanings. On average, words typed with more letters from the right side of the QWERTY keyboard are more positive in meaning than words typed with more letters from the left: This is the QWERTY effect (Jasmin & Casasanto, 2012), which was shown previously across three languages. In five experiments, here we replicate the QWERTY effect in a large corpus of English words, extend it to two new languages (Portuguese and German), and show that the effect is mediated by space-valence associations encoded at the level of individual letters. Finally, we show that QWERTY appears to be influencing the names American parents give their children. Together, these experiments demonstrate the generality of the QWERTY effect, and inform our theories of how people’s bodily interactions with a cultural artifact can change the way they use language.
The most interesting new result is the baby-names experiment, in my opinion; and since I'm stuck in Heathrow Airport for a while, I thought I'd take a quick look at it.
For each year of birth YYYY after 1879, we created a comma-delimited file called yobYYYY.txt. Each record in the individual annual files has the format "name,sex,number," where name is 2 to 15 characters, sex is M (male) or F (female) and "number" is the number of occurrences of the name. Each file is sorted first on sex and then on number of occurrences in descending order. When there is a tie on the number of occurrences, names are listed in alphabetical order. This sorting makes it easy to determine a name's rank. The first record for each sex has rank 1, the second record for each sex has rank 2, and so forth.
To safeguard privacy, we restrict our list of names to those with at least 5 occurrences.
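Since everything below depends on reading these files, here is a minimal Python sketch of a parser, assuming only the record layout described above. The function name `read_yob` is hypothetical, and rank is derived purely from file order, relying on the documented sort:

```python
import csv
import io


def read_yob(text):
    """Parse the text of one SSA yobYYYY.txt file (hypothetical helper).

    Returns a list of (name, sex, count, rank) tuples. Rank restarts at 1
    for each sex and simply follows file order, relying on the documented
    sort (by sex, then descending count, then alphabetically on ties).
    """
    rows = []
    rank = {}  # running rank per sex, e.g. {"F": 3, "M": 1}
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        name, sex, number = record[:3]  # tolerate a trailing comma
        rank[sex] = rank.get(sex, 0) + 1
        rows.append((name, sex, int(number), rank[sex]))
    return rows
```

Usage would be something like `read_yob(open("yob1960.txt").read())`, with the file name following the yobYYYY.txt pattern.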
Casasanto et al. describe their first baby-name experiment this way:
We first analyzed the mean RSA of all names from 1960–2012 that had been given to at least 100 children every year (n = 788 distinct names).
Their definition of RSA:
Following J&C, we calculated the Right Side Advantage for each word by taking the difference of the number of letters on the right side of the keyboard (y, u, i, o, p, h, j, k, l, m, n) and subtracting the number from the left side (q, w, e, r, t, a, s, d, f, g, z, x, c, v, b) [RSA = (# right-side letters) - (# left-side letters)].
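As a concrete check on this definition, here is a minimal Python sketch of the per-word RSA. The name `rsa` is my own, and ignoring case and any character outside the two lists (hyphens, accented letters) is my assumption, not something the paper specifies:

```python
RIGHT = set("yuiophjklmn")      # right-hand letters, per the definition above
LEFT = set("qwertasdfgzxcvb")   # left-hand letters


def rsa(word):
    """Right Side Advantage: (# right-side letters) - (# left-side letters).

    Case is ignored; characters on neither list are simply not counted
    (an assumption on my part).
    """
    w = word.lower()
    right = sum(ch in RIGHT for ch in w)
    left = sum(ch in LEFT for ch in w)
    return right - left
```

The two sets together cover all 26 letters, so for ordinary English names every letter counts on one side or the other.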
Their Fig. 5a, showing mean baby-name RSA by year, presents a convincing picture:
There's certainly a trend since the late 1980s. They associate this with the "QWERTY era":
It is difficult to pinpoint the moment in history at which QWERTY became ubiquitous in Americans’ homes, and a part of people’s daily lives across a wide variety of demographics. Apple Macintosh and Windows home computers became available, though not yet widely used, in 1984 and 1985, respectively. America Online made the Internet widely available in people’s homes starting in 1991. We chose the year 1990 as the beginning of the “QWERTY era” based on a survey of technological landmarks like those listed above, and on the inflection point observed in figure 5a, rounded to the nearest decade.
In trying to replicate and extend their results, I encountered some difficulty in determining exactly what their recipe for calculating the values in Fig. 5a was. The SSA's list separates names by sex, e.g. in 1960 we have:
Therefore in 1960, neither Alva-for-males nor Alva-for-females makes the threshold of 100, but Alva is certainly one of the "names … that had been given to at least 100 children" in that year.
If I interpret their recipe literally, and add the counts for males and females in each year, I get 802 names with a count of 100 or more children in every year from 1960 to 2012. If instead I threshold male and female names separately, and accept only those where one or the other had a count of at least 100, I get 791 distinct names.
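The two readings of the threshold can be made explicit in a short Python sketch. Here `names_over_threshold` and its `pool_sexes` flag are hypothetical names, and the input is assumed to be a dict mapping each year (already restricted to 1960 through 2012 by the caller) to that year's (name, sex, count) records. This is my reconstruction of the two interpretations, not the authors' code:

```python
from collections import defaultdict


def names_over_threshold(years, threshold=100, pool_sexes=True):
    """Return the set of names reaching `threshold` in every year given.

    years: dict mapping year -> iterable of (name, sex, count) records.
    pool_sexes=True sums male and female counts before thresholding;
    pool_sexes=False requires that the male entry or the female entry
    on its own reach the threshold in each year.
    """
    result = None
    for records in years.values():
        if pool_sexes:
            totals = defaultdict(int)
            for name, _sex, count in records:
                totals[name] += count
            passing = {n for n, c in totals.items() if c >= threshold}
        else:
            passing = {n for n, _sex, c in records if c >= threshold}
        # A name must pass in every year, so intersect across years.
        result = passing if result is None else result & passing
    return result if result is not None else set()
```

A name like Alva, split across the two sexes, can pass the pooled test while failing the separate one, which is exactly the kind of case that makes the two counts differ.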
Since neither 802 nor 791 is equal to 788, either my two interpretations of their recipe are both wrong, or there's a bug in my code, or there's a bug in their code. Anyhow, I'll go with the second list, since 791 is closer to 788 than 802 is.
But there's a further uncertainty in the recipe for calculating the "mean RSA" of baby names in a given year.
The name "Adrian" has two right-side letters (i+n) and four left-side letters (a+d+r+a), for an RSA of -2. The name "Sandy" has two right-side letters (n+y) and three left-side letters (s+a+d), for an RSA of -1.
In 1960, the SSA gives the following counts for these two names:
And both of them make both ">= 100" lists, not only for 1960 but also for all subsequent years. But what's the "mean RSA" here? Is it the per-word-token mean RSA:
(-2*107 + -2*558 + -1*3649 + -1*175)/(107+558+3649+175) = -1.148
Or is it the per-letter-token mean RSA:
(-2*107 + -2*558 + -1*3649 + -1*175)/(107*6+558*6+3649*5+175*5) = -0.223
The scale on their Fig. 5a suggests that it's the latter.
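The two candidate means differ only in their denominators, as this minimal Python sketch shows. The function names are mine, and using `len(name)` as the letter count assumes every character in a name is a keyboard letter on one of the two lists:

```python
RIGHT = set("yuiophjklmn")
LEFT = set("qwertasdfgzxcvb")


def rsa(name):
    # (# right-side letters) - (# left-side letters), ignoring case
    name = name.lower()
    return sum(c in RIGHT for c in name) - sum(c in LEFT for c in name)


def mean_rsa(records):
    """Return (per-word-token mean, per-letter-token mean) of RSA.

    records: iterable of (name, count) pairs. Both means share the
    numerator sum(RSA * count); the word-token mean divides by the number
    of babies, the letter-token mean by the number of letters in their names.
    """
    numerator = babies = letters = 0
    for name, count in records:
        numerator += rsa(name) * count
        babies += count
        letters += len(name) * count
    return numerator / babies, numerator / letters


word_mean, letter_mean = mean_rsa(
    [("Adrian", 107), ("Adrian", 558), ("Sandy", 3649), ("Sandy", 175)]
)
print(round(word_mean, 3), round(letter_mean, 3))  # -1.148 -0.223
```

With the 1960 Adrian and Sandy counts used in the two calculations above, it reproduces both figures.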
But this leaves one more uncertainty. Consider the name "Alex" in 1961:
It's on both ">=100" lists, because of the reliably large number of male Alexes. But the number of female Alexes doesn't reach 100 until 1986. So in pre-1986 years, do we count the female Alexes or not? Given that "Alex" has one right-side letter (l) and three left-side letters (a+e+x), for a difference of -2, does "Alex" contribute -2*1219 to the RSA numerator, and 1219*4 to the RSA denominator? Or does it contribute -2*(21+1219) to the numerator, and (21+1219)*4 to the denominator?
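The two options for Alex amount to a per-entry inclusion policy, sketched here in Python. The function `records_for_year` and its `include_sub_threshold` flag are hypothetical; `False` corresponds to the first option (count only sex-specific entries that themselves reach the threshold) and `True` to the second:

```python
def records_for_year(records, selected, threshold=100,
                     include_sub_threshold=False):
    """Select (name, count) pairs for one year's mean-RSA calculation.

    records: iterable of (name, sex, count) for that year.
    selected: names that passed the every-year threshold.
    include_sub_threshold=False keeps only entries whose own count reaches
    the threshold (male Alex but not female Alex in 1961);
    include_sub_threshold=True keeps every entry for a selected name.
    """
    out = []
    for name, _sex, count in records:
        if name not in selected:
            continue
        if include_sub_threshold or count >= threshold:
            out.append((name, count))
    return out
```

For 1961, the first policy yields only the 1,219 male Alexes, while the second adds the 21 female Alexes as well.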
I'll assume the former, and go forward. On that basis, I get this graph as an attempted replication of their Fig. 5a:
The shape is similar to their Fig. 5a, though it's different enough that it's clear my recipe is not quite the same as theirs. Still, I'll take this as a replication of their result, and as evidence that "Mean RSA" (under my interpretation as well as theirs, whatever exactly it is) has increased (by about 70%) since 1990.
But two obvious questions arise. Why limit consideration to names that occur at least 100 times in every year? And why look only at the evidence since 1960, given that the SSA data goes from 1880 forwards?
Limiting consideration to the commonest names will emphasize the effect of popularity changes in particular names, given the power-law (or more likely, log-normal) distribution of name frequencies. But I don't see any obvious reason to think that letter-preference effects should apply only to names with some minimum count across the 1960-2012 period.
If instead we do the same calculation for all the names in the SSA list (those with a count of at least 5), we get this picture for the period from 1960 to 2012:
There's certainly still an upwards inflection shortly after 1990, though now the post-1990 change is only about 39%, and there's almost as much of a rise (31%) from 1960 through 1979.
If we do the same calculation from 1880 to 2012, we get this graph:
And this picture, in my opinion, makes it somewhat less plausible that whatever has happened since 1990 is due to the increase in typing caused by the rise of AOL and so forth.
What explains the steep drop from 1945 to 1955? Or the big rise (of 42%) from 1955 to 1979, now larger and longer than the 39% rise from 1990 to 2011?
I would guess that the down/up/down oscillations from 1945 to 1955 to 1979 to 1990 were caused by swings in the popularity of a few names, name-morphemes, or name fragments.
In the pre-personal-computer era, these causes seem more plausible than oscillations in the amount of typing that Americans do, or any other activities that might be imagined to give them changing valence-associations for different letters. And it's not at all obvious that what's happened since 1990 is different.
In addition to changes in the popularity of particular names, there are large changes in name spelling that may have quite a bit of leverage on "mean RSA" measures. Thus the popularity of names with final "i" has changed recently, in ways that parallel the "mean RSA" changes:
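One way to track this is the share of babies in each year whose name ends in "i". The minimal Python sketch below counts tokens (babies) rather than distinct names; that choice, and the function name `share_final_i`, are my assumptions rather than a description of how the plot was made:

```python
def share_final_i(records):
    """Fraction of babies in one year whose name ends in 'i'.

    records: iterable of (name, sex, count) for a single year.
    Counts tokens (babies), not distinct names.
    """
    total = 0
    final_i = 0
    for name, _sex, count in records:
        total += count
        if name.lower().endswith("i"):
            final_i += count
    return final_i / total if total else 0.0
```

Applying it to each year's records gives a time series that can be set alongside the "mean RSA" curve.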
This might be because "i" is a right-side letter — or it might be because Naomi is one of the biblical names that have become popular recently, and because Charli and Laci seem somehow cuter than Charlie or Lacie, and …