Language Log

Response to Jasmin and Casasanto's response to me

March 17, 2012 @ 3:19 pm · Filed by Mark Liberman under Psychology of language

For the background of this discussion, see "The QWERTY effect", 3/8/2012; "QWERTY: Failure to replicate", 3/13/2012; and "Casasanto and Jasmin on the QWERTY effect", 3/17/2012. In their reply to me, C&J make three basic points:

"We’re not concerned with Liberman’s subjective evaluation of the QWERTY effect’s size or of our study’s importance."
"The QWERTY effect is reliable. Replication is the best prevention against false positives. In this paper, we demonstrated the QWERTY effect *six times*: in 5 corpora (one of which we divided into 2 parts, a priori), in 3 languages, and in a large corpus of nonce words."
"There’s a reason why scientific results go through peer review, and why analyses are not simply self-published on blogs. If there were a review process for blog posts, or if Liberman had gone through legitimate scientific channels (e.g., contacting the authors for clarification, submitting a critique to the journal), we might have avoided this misleading attack on this paper and its authors; instead we might have had a fruitful scientific discussion."

I'll take these up one at a time.

1. The QWERTY effect's size. As far as I'm concerned, and as far as the general public is concerned, the size (and therefore the practical importance) of the QWERTY effect (if it exists) is the key question. This is not an entirely subjective matter — we can ask, as I did, what proportion of the variance in human judgments of the emotional valence of words is explained by the "right side advantage". The answer is "very little", or more precisely, around a tenth of a percent at best (at least in the modeling that I've done).

I focused on the effect-size question because the press release said the following (and the popular press took the hint):

Should parents stick to the positive side of their keyboards when picking baby names – Molly instead of Sara? Jimmy instead of Fred? According to the authors, “People responsible for naming new products, brands, and companies might do well to consider the potential advantages of consulting their keyboards and choosing the 'right' name."

So C&J may not be interested in my subjective evaluation of the effect size, but they promoted their own subjective evaluation by suggesting that the effect is important enough to matter to people choosing names. I felt (and feel) that this represents a serious exaggeration of the strength of the effect; and it seemed (and seems) appropriate to me to say so publicly.

2. The statistical reliability of the QWERTY effect. My first response to the article and the press release was to be skeptical of the size and practical importance of the effect. So I independently obtained the English (ANEW) data, calculated the "right side advantage" (RSA) for each of the words, and fit a regression line in order to see how much of the variance was accounted for. As I observed in the original post, the answer was "very little". But the other thing that emerged from the regression was that the slope of the regression line was not statistically distinct from 0 (… in the simple linear regression that I performed — another kind of analysis might yield a different estimate of the uncertainty of the slope estimate…)
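For readers who want to see what the per-word calculation looks like, here is a minimal Python sketch. It assumes that a word's RSA is the number of right-hand letters minus the number of left-hand letters, and that letters are assigned to hands by the usual touch-typing split of a U.S. QWERTY keyboard. The names `LEFT_KEYS`, `RIGHT_KEYS` and `right_side_advantage` are made up for illustration and are not taken from the paper or from the analysis described here.

```python
# Sketch only: right-side advantage (RSA) of a typed word.
# ASSUMPTION: hands are assigned by the standard touch-typing split of a
# U.S. QWERTY keyboard; characters outside a-z are simply ignored here.
LEFT_KEYS = set("qwertasdfgzxcvb")
RIGHT_KEYS = set("yuiophjklnm")


def right_side_advantage(word: str) -> int:
    """Return (# right-hand letters) - (# left-hand letters) for a word."""
    letters = word.lower()
    right = sum(ch in RIGHT_KEYS for ch in letters)
    left = sum(ch in LEFT_KEYS for ch in letters)
    return right - left


if __name__ == "__main__":
    # The baby names from the press release quoted above.
    for name in ("Molly", "Sara", "Jimmy", "Fred"):
        print(name, right_side_advantage(name))
```

Under that assumed split, "Molly" and "Jimmy" are typed entirely with the right hand (RSA 5) and "Sara" and "Fred" entirely with the left (RSA -4), which is presumably why the press release picked them.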

I probably should have ignored this, since my main interest was in the strength of the relationship between RSA and emotional valence of words, not in the question of whether there's any real relationship at all. Rather than go into the statistical details, I emphasized the weakness of the effect by showing how comparatively easy it was to obtain a similar result by chance re-assignment of valences to RSA values. That argument was informal at best and misleading at worst — so let's try it again in a more careful and responsible way.

Since C&J quite properly point to replication as the key to effect reliability, I hunted down the Spanish (SPANEW) data from Redondo et al. 2007. Here's the scatter plot with the regression line.

(The Spanish data itself, taken from the file provided with Redondo et al. 2007, is here — the fields are word, RSA, mean valence, std valence. In order to account for the layout of Spanish keyboards, I've used the equivalent U.S.-keyboard letters, such as ';' for 'ñ', except that I've used underscore in place of single quote, because R's read.table() doesn't like single quotes.)

True enough, the slope of the regression line (0.028) is positive. But again, the effect is on the (wrong side of the) borderline of significance. Here's what R says about the fit:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.76936    0.07197  66.265   <2e-16 ***
x            0.02777    0.02558   1.086    0.278
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.142 on 1032 degrees of freedom
Multiple R-squared: 0.001141, Adjusted R-squared: 0.0001728
F-statistic: 1.178 on 1 and 1032 DF, p-value: 0.2779
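The quantities in that summary are related in simple ways: the t value is the slope estimate divided by its standard error (0.02777 / 0.02558 ≈ 1.086), and with a single predictor the F statistic is the square of that t value (≈ 1.18). The Python sketch below shows how such numbers can be computed by hand for a one-predictor least-squares fit. It is an illustration under textbook assumptions, not the R code that produced the output above, and the function name `simple_ols` is invented for this example.

```python
import math


def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x, with basic summary stats.

    Sketch only: assumes at least three points and non-constant x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares and the usual n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_slope = math.sqrt(rss / df / sxx)

    r_squared = 1.0 - rss / syy
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "t_slope": slope / se_slope,
        "r_squared": r_squared,
        "adj_r_squared": 1.0 - (1.0 - r_squared) * (n - 1) / df,
    }
```

Multiplying `r_squared` by 100 gives the "percent of variance accounted for" figure used in this discussion.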

As another approach to significance testing, we could try looking at the distribution of slopes for random re-assignments of SPANEW valence estimates to RSA values. Rather than doing it three times, I did it 10,000 times. Here's the distribution of slopes in the 10,000 random re-assignments:

The slope is as great as or greater than 0.028 in 14.05% of these (equivalent to a one-tailed test; in a two-tailed test the number would be roughly twice as great) — so this test also suggests that the effect might not be statistically significant (in a simple linear regression on the SPANEW data set).
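A random re-assignment test of this kind can be sketched in a few lines of Python. The version below shuffles the valence values relative to the RSA values, refits the slope each time, and reports the fraction of shuffles whose slope is at least as large as the observed one (the one-tailed figure). The function names and the fixed seed are choices made for this illustration, not details of the analysis reported above.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (x must not be constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """One-tailed permutation test for a positive slope.

    Returns the fraction of random re-assignments of y to x whose slope
    is as great as or greater than the slope in the original pairing.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = ols_slope(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if ols_slope(x, shuffled) >= observed:
            hits += 1
    return hits / n_perm
```

Called with the RSA values as `x` and the mean valences as `y`, this returns a proportion directly comparable to the 14.05% quoted above, up to random variation between runs.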

I've now tried this on six data sets — (1) the overall ANEW data, (2) the ANEW data for male subjects, (3) the ANEW data for female subjects, (4) the DANEW data from the C&J paper, (5) the overall SPANEW data, (6) the English Hedonometric data from Peter Dodds' lab.

The first five cases were all basically like the Spanish data discussed above — a weakly positive regression slope, which (depending on what statistical test you use) is generally not statistically distinguishable from zero at the .05 level. However, I agree that it's suggestive that all five cases (and the others that C&J discuss) seem to show a positive (though very small) effect. And (according to their analysis) merging the English, Dutch and Spanish data together does pass the statistical-significance bar.

In the sixth case — the Hedonometric values — the slope was weakly negative.

This is all certainly worth looking into more carefully, though the main point from my perspective is that any relationship is a very weak one. In exploring the nature and possible causes of these patterns, there are a lot of possibilities to explore. One observation is that the positive slope of the relationship between RSA and valence seems to be driven to some extent by the large leverage of the small number of words with extremely low or extremely high RSA values — thus in the SPANEW data, if we look only at the 1023 (of 1034) words with RSA between -7 and 5, the slope goes down from 0.028 to 0.014, and the p value goes up to 0.596. We might also consider the idea that the arrow of causation might go in the other direction, e.g. because the inventor of the QWERTY layout was motivated to assign negative-valence letters (i.e. letters that are relatively common in negative-value morphemes) to the left hand. But the interest of the exploration, in my opinion, is lessened by the weakness of the relationship.
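The leverage check described above amounts to refitting the line after dropping the words with extreme RSA values. A minimal Python sketch, with invented function names and no claim to match the original analysis code, might look like this:

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x (x must not be constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_within_range(x, y, lo, hi):
    """Refit using only the points whose x value lies in [lo, hi].

    Returns (number of points kept, slope on the kept points), so the
    result can be compared with the slope fitted to the full data set.
    """
    kept = [(xi, yi) for xi, yi in zip(x, y) if lo <= xi <= hi]
    xs = [xi for xi, _ in kept]
    ys = [yi for _, yi in kept]
    return len(kept), ols_slope(xs, ys)
```

With RSA as `x` and mean valence as `y`, comparing `ols_slope(x, y)` against `slope_within_range(x, y, -7, 5)` is the comparison reported above for SPANEW.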

3. The appropriateness of discussing scientific publications in a blog. Some serious scientific discussion has always taken place in informal and un-refereed media — hallway conversations at conventions, lab meetings, letters and email, colloquium presentations and the associated question periods, presentations at unrefereed or lightly-refereed conferences, working papers, and the like. In recent years, important discussion in several fields has come to take place in exchanges of un-refereed preprints on the arXiv and other repositories. And yes, even in blogs — and in principle, I'd defend the value of carrying on such informal discussions in the light of blogospheric day.

But that's not what's happening here.

What happened here is that a scientific paper became — with a stiff push from its authors — a topic of interest to the general public, with reports in media outlets ranging from serious intellectual outfits like WBUR to tabloids like The Daily Mail. There's nothing wrong with that at all — I'm 110% in favor of promoting scientific research results in the public square. But when a piece of science (or engineering, or humanistic scholarship) becomes a matter of public interest and public discussion, it's odd to argue that it's a violation of professional etiquette for other scientists and engineers and scholars to join that discussion, and that instead they must submit their comments for evaluation and possible eventual publication in a peer-reviewed journal.

If C&J's QWERTY paper had been published in Psychonomic Bulletin and Review without any public fanfare, I wouldn't have written a word about it. But when someone sends me a link to something like the image below in the popular press, I'm curious enough to look into it and to report what I find.

It's possible that a productive exchange can result, as (for instance) it recently did with Keith Chen. But my initial motivation is to improve the quality of the public discussion. I continue to believe that I've done so.

51 Comments

Jeremy Wheeler said,

March 17, 2012 @ 4:24 pm

I'm afraid that I am not in a position to comment on the scientific points being argued here but I would ask, in view of the picture shown, what I should make of my Hungarian keyboard (on which I also write in English) which has a QWERTZ arrangement – the Z and the Y being in the opposite positions to a QWERTY.
Jeremy Wheeler said,

March 17, 2012 @ 4:32 pm

Having done second what I should have done first (read ALL the comments on the original post) I realise that I have made a twit of myself with my comment above. Hmmm… a lesson there, then.
D.O. said,

March 17, 2012 @ 4:35 pm

…the hypothesis that the inventor of the QWERTY layout was motivated to assign negative-valence letters to the left hand.

Is there really such a thing as happy/sad valence for letters? I will be surprised (which is a small matter), but more importantly, this fact should have changed the discussion completely. If there is such an effect, it presumably derives from the letter-sound correspondence and happy/sad effect for sounds, which must be the main feature on top of which all specific letter effects (QWERTY or not) are a small ripple. It is very strange to discuss a statistically borderline effect while ignoring the main thing completely.

[(myl) I didn't mean to argue that such an association exists — though C&J aim to show this — but merely to suggest that if it does, it might have caused the design of the QWERTY keyboard layout, rather than resulting from it.

C&J reject hypotheses of this general sort, mainly on the grounds that they found the effect in three different languages. But there may be enough cognates to obviate this objection. Then again, something entirely different may be going on.]
Jeff Carney said,

March 17, 2012 @ 5:12 pm

On the question of appropriateness, Mark, I might add that the blogosphere effectively eliminates one of the perennial problems of traditional scholarship: timeliness. Had you gone through "legitimate scientific channels" (assuming such exist as they once did), your complaint might have taken many months to reach the public eye. Here, you reach it very quickly. This is all the more important since it is the reaction of the popular press that you are concerned about, and rightly so.

As to the appropriateness of YOUR blog in particular, I think we all know by now that LL holds a unique place in the world of scientific publication. Your work here, and Pullum's, and Mair's, et al., get cited all over the freaking place. You have already redefined what "legitimate scientific channels" are in the first place.

Am I fawning? Sorry. I just like what you do here.
The Ridger said,

March 17, 2012 @ 5:33 pm

The picture makes me look at my keyboard and find things like "happy" KILL POUT PUNK and "sad" DEAR, AWARD, SWEET …

Goodness.
Andrew Gersick said,

March 17, 2012 @ 6:57 pm

Very graceful backpedaling! I think you would pay the most respect to your reader's intellects if you were explicit that secondary analysis was a complete blunder, though. The only valid complaint you have is that via the popular media, C&J may have somewhat exaggerated the importance of the effect. That's a far cry from your initial stance, and it's clear now you're just trying to muddy the waters after their rather potent riposte. By not explicitly retracting your failed analysis, you're trading saving face with your selection of core fans (some of whom are drooling despite poor handling of the paper in this very comment thread) for respectability with the larger community. This only serves to support C&J's observation that you were acting irresponsibly and misusing this platform from the start.

[(myl) I'm trying to be civil here, but I haven't changed my mind about anything. It remains true that none of the single-language data sets that I've tested show a statistically significant relationship; to show the basic effect in the original paper required some non-obvious techniques, like amalgamating data across languages. And the paper's sweeping conclusion is far from warranted by the at-best marginal effects:

As people develop new technologies for producing language, these technologies shape the language they were designed to produce.

The meanings of words in English, Dutch, and Spanish are related to the way people type them on the QWERTY keyboard.

If there's an effect here — and it's not clear that we can find the effect in an individual language — it's accounting for a few hundredths of a percent of the variance in one rather weak dimension of meaning; and much of this seems to be contributed by a few words with extreme values of the "RSA" feature. This hardly justifies phrases like "these technologies shape the language" or "the meanings of words … are related to the way people type them". And the press release for the paper exaggerates more than slightly — it takes these over-broad conclusions and hypes them as relevant to baby-name choice and the commercial naming process.

I don't feel that I was either acting irresponsibly or misusing this platform in pointing these things out. I feel that it's a bad thing for journal PR staff to over-hype marginal results, and a bad thing for researchers to encourage this, and a good thing in such cases for someone to say, in public, "Wait a minute, this doesn't seem to make sense".

I'm always happy to publish and link to responses, and to admit error when I'm wrong. ]
Colin Danby said,

March 17, 2012 @ 9:21 pm

Thanks Mark for such careful and patient work. This would be a great example for a statistics class.
Aaron Toivo said,

March 17, 2012 @ 9:21 pm

It doesn't take a rocket scientist to see that demanding that criticism be routed through slow channels is demanding to stifle it.

Of course it is right and proper that one may wish to defend one's results, but the method of transmission of criticism has little to do with its merit, and let us not pretend that its merit will not be analyzed just because it is a blog post.
Chance said,

March 17, 2012 @ 9:48 pm

HAPPY words (right side):

punk
junk
ploy
pout
lumpy

SAD words (left side):

dear
best
cared
grew
fast
feast
Pharmamom said,

March 17, 2012 @ 10:48 pm

This is what happens when non-scientists misuse scientific tools. In medicine, an effect of this size, if indeed there is an effect, would be called clinically insignificant. That is, a medication that moved the needle so slightly, and only upon meta-analysis of several trials, would never pass the laugh test.

And it is indeed hilarious to suggest that marketers choose names based on the effect. This is obvious: either the consumer base for a product is huge, in which case the vast majority don't habitually type at all, so the keyboard's magic properties are irrelevant to their emotional responses, or the consumer base is a discrete, highly educated group, many of whom type well. In the latter case, I suspect an affinity for LH words because they are easier and faster to type. So my emotional response to LH words is happiness.

Linguistics is certainly a fascinating and legitimate area of study, but it isn't science.
Michael Johnson said,

March 17, 2012 @ 11:03 pm

@Pharmamom

"Linguistics isn't science"? On what grounds do you assert that? Casasanto and Jasmin aren't linguists. And even if they were, no right-thinking scientist would accept an inference from "The effect so-and-so is trumpeting in paper X doesn't pass the laugh test" to "there is not nor can there be a scientific study of the objects in the domain of X".
Jeff Carney said,

March 18, 2012 @ 12:13 am

@Andrew Gersick

You can't possibly be serious? Are you even a real person? Are you instead some sort of shill for Casasanto and Jasmin?

Mark's original conclusion was this:

Again, the effect is not statistically significant — and in any case is not large enough to be a concern for companies naming products or parents naming children, with 0.1% of variance in valence judgments accounted for by the "QWERTY effect".

I see no evidence of backpedaling from this. Quite the contrary.
marie-lucie said,

March 18, 2012 @ 12:19 am

Linguistics is certainly a fascinating and legitimate area of study, but it isn't science.

Perhaps the writer thinks that the analysis of QWERTY, etc above is what linguists typically do. This is one of the many misunderstandings that Language Log is trying to remedy.

This reminds me of another definition of a linguist that I recently ran into: a friend lent me a book about a true crime story, an actual case in which a man described as "a brilliant linguist" was convicted of the murder of his ex-wife. The man had indeed taught linguistics and even published an English grammar, but the author's many references to his brilliance as a linguist had to do with his alleged mastery of "rhetoric", his ability to argue back and forth with the police and not let them manipulate him into giving the answers they expected.
Matthew Stephen Stuckwisch said,

March 18, 2012 @ 12:25 am

To remove the briefly mentioned cognate effect between languages, wouldn't it be good to compare with languages whose keyboards are substantially rearranged due to a different script? E.g. having a go with Russian, Hebrew, or Arabic? Unfortunately Korean is probably a no-go because it simply has consonants on the left and vowels on the right.

[(myl) I don't know any available lists of average perceived "valence" of words in any of those languages, though in this era of rampant "sentiment analysis" and similar things, we should see such things soon…]
Jens Fiederer said,

March 18, 2012 @ 12:55 am

@Jeff Carney, I think the "backpedaling" bit was not about the conclusion but about that one paragraph beginning with "I probably should have ignored this…"

I'm probably biased (as a Language Log fan), but what struck me the most in the exchange was the courtesy, playfulness, and interest of our hosts compared to the touchiness and hostility of the authors. It sucks to feel yourself "attacked", but it seems to me that scientists should WELCOME public interest in their work, and where that interest seems misguided, offer gentle corrections rather than broadsides.
That's what SHE said,

March 18, 2012 @ 1:28 am

I'm a bit disappointed (but not surprised) by C&J's reply. I'm tempted to quote several passages from Sedivy's "Replication Rumble" post today, but this should still be fresh on everyone's mind (if not, read it now). This, too, is not a cagematch. C&J have made certain claims; Liberman finds their conclusions overstated. For C&J to claim that they "are concerned with [Liberman's] misrepresentation of the reliability of [C&J's] findings" does not help their cause. The charge of "misrepresentation" is especially overblown, since Liberman has merely questioned C&J's conclusions. Their insistence on "legitimate scientific channels" is somewhere between distressing — insisting that criticism not leveled through the proper channels is somehow tarnished reminds me of the practices of oppressive regimes — and ridiculous — as if a blog is inherently illegitimate or unscientific. This sort of posturing detracts from the substantive discussion. (Also: If anything, the quality of discussion of this issue in the blogs has been above the coverage in the popular media.)

Perhaps a useful notion to consider is what Andrew Gelman refers to as "Type S" and "Type M" errors. A quick and not entirely accurate summary by a third party can be found here: http://www.johndcook.com/blog/2008/04/21/four-types-of-errors/ Briefly, C&J want to argue in terms of Type 1 errors: they claim the slope of the regression line is nonzero, and if in reality it was zero, they would have made a Type 1 error. But Liberman and others have been complaining about something vaguely akin to Gelman's "Type M" error: the magnitude of the effect has been overhyped in the popular media, allegedly with some support from C&J. To be fair: A genuine "Type M" error would be if the true magnitude of the effect is far off from the estimated magnitude, whereas here we're dealing with the reported magnitude, i.e. the reports in the popular media conveying that this is A Big Deal when it is anything but. The point that seems irrefutable is this: even if we suspend any disbelief and assume that the QWERTY effect is real, it's still not spectacular. It explains very little about the emotional valence of words. Adding more right-side letters to a made-up word buys you very little additional valence.

But even engaging C&J on their turf, I find it hard to get around several doubtful aspects. I do not put too much stock in replication tests. I'd rather see a bootstrap of the regression slope. When I do this myself on ANEW, I see that the 95% bootstrap confidence interval of the slope includes 0, and more generally that even the most optimistic estimate of the magnitude of the slope is small.
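For anyone who wants to try the bootstrap suggested here, the following Python sketch resamples (RSA, valence) pairs with replacement and takes a simple percentile interval of the refitted slopes. It assumes the plain percentile method rather than any particular refinement, and the function names, default of 2000 resamples, and fixed seed are illustrative choices rather than a record of what was actually run.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x, or None if x is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return None
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the regression slope.

    Sketch only: resamples whole (x, y) pairs with replacement and skips
    the rare resample in which every x value is identical. The input x
    must therefore contain at least two distinct values.
    """
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        slope = ols_slope([x[i] for i in idx], [y[i] for i in idx])
        if slope is not None:
            slopes.append(slope)
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

If the returned interval straddles 0, the data are compatible with no linear relationship at the chosen level, which is the situation described in this comment for ANEW.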

The tentative attempts at causal explanations in terms of "left == sinister" have been unsatisfying. I wouldn't be surprised if this contributed a lot to the doubts expressed here. A nearby causal explanation that should be considered would be in terms of the bouba-kiki effect. Suppose some kind of sound symbolism effect like bouba-kiki applies to the nonce words. Then there is a potentially complex relationship between sounds and letters, and a simple one between letters and keyboard arrangements. This becomes much harder to study, since the sound symbolism effect could well be language-specific, the sound-letter relationship varies by language, as do keyboard arrangements. One would have to play recordings of spoken words to test subjects in one condition vs. having them read nonce words in another, plus a pilot to make sure the way subjects read nonce words matches the recordings. I would be stunned if a robust QWERTY effect (regardless of magnitude) emerged after controlling for sound symbolism based on an audio-only condition.
Gene Callahan said,

March 18, 2012 @ 7:21 am

@Gersick: "Very graceful backpedaling! I think you would pay the most respect to your reader's intellects if you were explicit that secondary analysis was a complete blunder, though."

NOTHING in this new post is indicative of any backpedaling! You must have an axe to grind, Gersick.
Theo Vosse said,

March 18, 2012 @ 9:37 am

When I squint at the advantage vs valence plots, both in this article and in the previous one, I don't really see a normal distribution (which is most likely assumed in all analyses). The plot above actually looks bimodal. The data from Dodds on the other hand looks more like it.

[(myl) You're right about the bimodality:

Bootstrap estimates of various statistics should still be OK, though.]
Zé said,

March 18, 2012 @ 10:17 am

One thing to keep in mind when using the Portuguese data: the accents are all on the right (`´^~), as are the ç and the subscripts for o and a, which are used with numbers and honorifics. Spanish is similar, with all the common accents and the ç and the ñ being on the right.

Examples of layouts:
https://www.forlanglab.lsu.edu/exams/KeyboardLayout/CommonKeyboardLayouts.aspx

[(myl) What Portuguese data are you talking about? As for the SPANEW data, I of course treated the accents and so on according to where they lie on typical Spanish keyboards, which is indeed on the right side.]
Kyle said,

March 18, 2012 @ 11:51 am

@Jeff Carney: Andrew Gersick is so much a real person that he has an office across the hall from mine.

[(myl) The Andy you know is not the person who posted the comment above, whose IP address locates him near Youngstown OH.]
Jeff Carney said,

March 18, 2012 @ 1:57 pm

@Kyle

I won't ask
UK Lawyer said,

March 18, 2012 @ 3:29 pm

This hallway perhaps?

http://www.ircs.upenn.edu/people/index.shtml
Perestroika said,

March 18, 2012 @ 3:43 pm

@ Pharmamom:

Exactly what criteria are you using to distinguish science from pseudoscience? The kind of discord that statistical methodology sparks among people who subscribe to its beliefs is common to all areas in which it is used. This is, of course, to be expected since the validity of virtually all claims scientists and "non-scientists" make hinges on sound statistical methodology.

– "a medication that moved the needle so slightly, and only upon meta-analysis of several trials, would never pass the laugh test."

While your condescension is much appreciated, to make the observation that clinical trials (the results of which often directly affect the lives and health of many people) go through a more stringent statistical "check" than academic papers which decidedly do not have this effect is, quite frankly, insignificant.

There is a wonderful article by Tom Siegfried published in Science News which talks specifically about the dangers and misconceptions associated with conducting research that aims at achieving statistical significance.
You can find it here: http://ckwri.tamuk.edu/fileadmin/user_upload/A-Litt/Odds_Are__It_s_Wrong_-_Science_News.pdf
Kyle said,

March 18, 2012 @ 4:19 pm

@UK Lawyer: That's right. I'm the only "Kyle", he's the only "Andy". Though I couldn't disagree more with what Andy wrote.
Kyle said,

March 18, 2012 @ 4:21 pm

@That's what SHE said, I'm quite surprised they responded at all, which serves to dignify unrefereed debate, which they claim to be opposed to. They should fire their publicist.
Eric P Smith said,

March 18, 2012 @ 5:05 pm

Suggested new coinage: qwerty (adjective) quirky, questionable. As in “This new research looks a bit qwerty to me.”

If this coinage took hold, then qwerty would be an example of a negative valence word whose letters mostly appear on the left side of the keyboard. As such, it would increase Casasanto and Jasmin’s p values. But would it genuinely tend to support their thesis?
a George said,

March 18, 2012 @ 5:41 pm

I am very surprised at the hype that a puny, probably insignificant bias has received and at the fact that the authors have not tried to pull the general public back on track (ha, ha, I know, fat luck, but anyway). I was considering that perhaps another mechanism is at play: the need to make headlines. One of the authors might not need it – he is well published, but the other less so. And even bad press makes headlines and a Google presence.

I have seen the following mechanism at work a few times: you invent a causality that it is very difficult to disprove outside a very small specialist community. You baffle academics and the general press, and you get funding and an academic degree. Unfortunately, it takes academia years to mop up after such exploits, laboriously taking publications to task in further, more informed papers.

In one case, a PhD student had had a brainwave and suggested that performance of a task was influenced by competition with a machine. The manual performance was well known and it had indeed changed, and the causes for this change were well-known in certain narrow circles. However, nobody had tried to suggest the machine influence. This US PhD student made a tour of Europe and in various fora gave the same paper. In at least one forum good arguments against the machine theory were presented and in a write-up of the forum they were repeated. No problem, all protests were in obscure fields, and the PhD was awarded – the supervisors had no clue. The doctor went on to other fields, had a good and well-respected academic career, and did nothing further on the subject. However, his early publications are found in literature searches, and the machine theory has to be taken seriously, even if only to be taken down – again and again.

In the other case, a laboratory was threatened by severe reductions. One piece of equipment seemed to be useful in a completely different line of endeavour, and its use was promoted as a solution to a purported problem. The laboratory entered into international cooperation, but only to suck publishable material from its partners. The purported problem consisted of two parts, one of which was already solved. This did not discourage the laboratory: in publications 2/3 of the content was directed to providing the old solutions to the old part problem. As this was very old historical information, peer review had no clue. From the beginning assisted by the press and for next step with hints that the laboratory would be able to provide answers to certain outstanding JFK assassination questions, ample funding was provided to continue research and spread the net. The small circle of researchers who knew better did not dare to oppose this mighty force, because they felt that this would remove any focus there might have been on the field, and the field in general would suffer. The sad fact is that all the barren material that has come out of the puffed-up exercise will remain as dead weight for many years to come. And the funding has essentially been wasted for society.

Many LL readers will know of similar cases, I am sure. Whistle-blowing endangers your prospects. Thank you, MYL, for taking on the task, it is rather thankless!
Rubrick said,

March 18, 2012 @ 5:43 pm

@Eric P: I am madly in love with your suggestion, although I'll admit I'd feel a little bad for C&J if it caught on.
Jeff Carney said,

March 18, 2012 @ 8:00 pm

@Kyle & Mark

Is this, like, weird or all in the spirit of collegiality?
YM said,

March 18, 2012 @ 8:29 pm

Can the QWERTY effect be explained by the relative abundance of vowels on the right (4 including y) vs the left side (2) of the keyboard?
Pharmamom said,

March 18, 2012 @ 8:56 pm

@Perestroika,

I see my assertion that this is not science struck a nerve. I did not mean insult. There is not only "science" and "pseudoscience," if by using those terms one implies that pseudoscience is fakery. Perhaps what I refer to as science is a strict definition where subjective measurements are prohibited. As soon as human emotions and behavior are part of the equation, I would say that correlation and causation become hard to distinguish.

As far as the perils of statistics, I've read the article, thanks. Working in the industry I do, I see first-hand the misuse and misunderstanding of statistics. I have also learned quite a bit about the manipulation of statistics (value-neutral) from a number of econometricians. Analyzing complex systems with an unknown and perhaps unknowable number of variables is…difficult. To make scientific claims about the causes of human behavior is even more difficult. Perhaps my lack of familiarity with the field of study renders me effectively a moron, but I don't buy causation between how a word is typed on the qwerty keyboard and the emotional effect it has on people.
ENKI-][ said,

March 19, 2012 @ 8:47 am

I have a sneaking suspicion that if you replaced the keymap with arbitrary layouts (dvorak, plum, azerty) or even somewhat randomized layouts (to account for human decision-making) you could fudge the data no more so than they did in the original paper and get comparable results. If that's so, then not only is the effect too small to be relevant, but it is arguably too small to be said to actually exist.
KathrynM said,

March 19, 2012 @ 8:55 am

As I read through all this, I found myself blinking over one fact about the data Casasanto and Jasmin used. It appears that the affective norms for 150 of the 600 words in ANEW were assigned in 1974, using mixed groups of male and female subjects. How many males old enough to participate in such studies in 1974 would even have been acquainted with the QWERTY keyboard, much less sufficiently proficient in its use to be influenced by it? Indeed, in order to reach the conclusions they reach, wouldn't you need to be working with affective norms assigned by a group ALL of whom knew how to touch type? Or am I missing something about their conclusion?

[(myl) Although the original Semantic Differential Scale dates from 1974, and the SAM affective rating system dates from 1980, the ANEW norms were based on data collected from undergraduate psychology students in the late 1990s (the reference is Bradley & Lang 1999). I suspect (though I don't know) that most undergraduates could type at that time — though what fraction used touch-typing vs. hunt-and-peck is less clear to me.]
Brett said,

March 19, 2012 @ 10:24 am

This discussion has reminded me that the terminology of typing ability is rather lacking. People tend to distinguish "touch typing" from "hunt and peck," and while the latter term is still quite relevant, the former is much less so. "Touch typing" refers to the ability to type without looking at the keyboard. This was a necessary skill when most typing involved the duplication of (others') handwritten documents in typewritten form. However, most typing is now done by people as they are composing the material in their heads.

My mother wanted me to learn to type as a pre-teen without looking at the keyboard (simply because she thought that was the "correct" way to type, even though it made no sense with word processing technology). I wasn't able to learn that way, and I couldn't type properly until I decided to ignore her completely and to look at the keyboard while I was typing, at which point it became very easy. I don't need to look at the keys any more, but it took me several years of regular computer work to reach that point; and to this day, I am a faster typist when I'm looking at the keyboard.

I don't know how typical my experience is, but there are certainly people I know who can type quite competently but still cannot touch type; they need to look at the keyboard with some frequency to keep going. We really ought to have an updated piece of terminology for the currently relevant standard for competence in keyboarding.
KathrynM said,

March 19, 2012 @ 12:06 pm

Thank you, Mark–I have to agree that by the late 90s there was probably little or no gender-based distinction between fluent two-handed use of a keyboard (whether by touch-typing, or while looking at the keys – fascinating point, Brett!) and hunt-and-peck. Or some variation of hunt-and-peck. . .
John Ross said,

March 19, 2012 @ 6:40 pm

@Pharmamom
Wikipedia: "Science (from Latin scientia, meaning "knowledge") is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe."
What does the field of linguistics lack for it to fit that definition, or any other day-to-day definition of "science"? I suspect you are confusing the word "science" with the idea of "scientific method" (and if you think the latter is not used in linguistics, you might be in the wrong place).
@ Brett
I understand it immediately, but I had never heard the expression "hunt and peck" before, though I did teach myself to touch type as a teenager, so I think you could have their relative relevance the wrong way round.
Rod Johnson said,

March 19, 2012 @ 8:44 pm

I'm a little bemused by Katherine's question. I was a college freshman in 1974, and I was certainly familiar with the qwerty keyboard on my trusty Smith-Corona. Everyone I knew in college (and most in high school) had used a typewriter, male or female. Is that surprising?
Colin Danby said,

March 19, 2012 @ 10:16 pm

Pharmamom: Almost everyone posting here doesn't buy that particular causation *either*, in part because the correlation is so extraordinarily weak. You seem to be completely missing the point of Mark's post and the subsequent discussion. You're also apparently unaware that you're condescending to people who understand the issues around the use of statistics in social science much better than you, judging by your last post.
Daniel Casasanto said,

March 20, 2012 @ 9:05 am

In his March 17th post, Mark Liberman explained that his main concern about our article was the way it has been represented in the mass media:

“If [J&C’s] QWERTY paper had been published in Psychonomic Bulletin and Review without any public fanfare, I wouldn't have written a word about it.”

We’re happy to know that this was his main concern. It’s unfortunate that some readers got the impression that our study’s credibility was at issue.

This reply also suggested that running permutation tests on the slopes of the regressions would be an effective way to demonstrate the reliability of the QWERTY effect across corpora. We agree. A feature of permutation testing is that the results do not depend on which test is being iterated, so permutation tests on the slopes of the regressions yield, in principle, the same results as the permutation tests on the r-values we reported previously (p-values may differ minutely, due to the nature of a randomization procedure).
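
A minimal sketch of why the two permutation tests should agree, using synthetic data rather than any of the corpora discussed here. In a simple regression the slope equals r times the ratio of the standard deviations of y and x, and shuffling y leaves both standard deviations unchanged, so every permutation that beats the observed slope also beats the observed r. The function name `permutation_p_values`, the simulated "right-side advantage" values, and all parameter values are assumptions made for illustration.

```python
import math
import random

def permutation_p_values(x, y, iterations=1000, seed=0):
    """Two-sided permutation p-values for the slope and for r, from the same shuffles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]          # centered predictor
    dy = [b - my for b in y]          # centered response
    sxx = sum(d * d for d in dx)      # unchanged by shuffling y
    syy = sum(d * d for d in dy)      # unchanged by shuffling y

    def slope_and_r(dy_values):
        sxy = sum(a * b for a, b in zip(dx, dy_values))
        return sxy / sxx, sxy / math.sqrt(sxx * syy)

    obs_slope, obs_r = slope_and_r(dy)
    rng = random.Random(seed)
    shuffled = list(dy)
    hits_slope = hits_r = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)         # break any link between x and y
        s, r = slope_and_r(shuffled)
        hits_slope += abs(s) >= abs(obs_slope)
        hits_r += abs(r) >= abs(obs_r)
    return hits_slope / iterations, hits_r / iterations

# Synthetic stand-in: integer predictor, weak linear effect, noisy ratings.
rng = random.Random(42)
x = [rng.randint(-5, 5) for _ in range(500)]
y = [5.0 + 0.04 * a + rng.gauss(0.0, 2.0) for a in x]
p_slope, p_r = permutation_p_values(x, y)
print("p (slope):", p_slope, " p (r):", p_r)
```

When the same shuffles are used for both statistics, the two p-values coincide; with independent randomizations they would differ only by sampling noise.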

The slopes – which we reported in our paper (i.e., B-value estimates for the regressions) – provide a measure of the QWERTY effect that is intuitive and robust. As we have seen, r-squared values for raw, un-averaged data can be very small, even for a highly reliable effect. But the r-squared values for the *same data* can appear much larger when the data are averaged. Most r-squared values that get reported for behavioral experiments are on grand-averaged data, not on raw data. It is difficult to base intuitions about an effect on a metric that can change, often by orders of magnitude, depending upon the authors’ decision to average or not to average. Unlike the r-squared, however, the slope is robust to decisions about averaging.
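
The averaging point can be illustrated with a small simulation. This is a sketch on synthetic data, not a reanalysis of any corpus mentioned in this thread: the helper names (`fit`, `simulate`, `average_by_level`), the true slope of 0.04, the noise level, and the seven predictor levels are all assumptions chosen for illustration.

```python
import random

def fit(x, y):
    """Return (slope, r_squared) for a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy * sxy / (sxx * syy)

def simulate(per_level=1000, true_slope=0.04, noise_sd=2.0, seed=1):
    """Noisy ratings at predictor levels -3..3, with the same count at every level."""
    rng = random.Random(seed)
    x, y = [], []
    for level in range(-3, 4):
        for _ in range(per_level):
            x.append(level)
            y.append(5.0 + true_slope * level + rng.gauss(0.0, noise_sd))
    return x, y

def average_by_level(x, y):
    """Collapse the raw data to one mean rating per predictor level."""
    groups = {}
    for a, b in zip(x, y):
        groups.setdefault(a, []).append(b)
    levels = sorted(groups)
    return levels, [sum(groups[a]) / len(groups[a]) for a in levels]

x, y = simulate()
raw_slope, raw_r2 = fit(x, y)
avg_slope, avg_r2 = fit(*average_by_level(x, y))
print("raw:      slope =", raw_slope, " r-squared =", raw_r2)
print("averaged: slope =", avg_slope, " r-squared =", avg_r2)
```

With equal counts at every level, the slope fitted to the level means matches the raw slope, while r-squared rises because averaging removes within-level variance from the denominator. With unequal counts the two slopes agree only approximately unless the means are weighted.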

Below are the results of 10,000-iteration permutation tests on the slopes of the simple regressions for the 10 analyses we reported previously (i.e., 6 corpora in 4 languages, their relevant aggregates and subdivisions). These analyses are on the raw data, controlling for nothing:

EXPERIMENT 1
a. All ANEW words (combining English, Dutch, & Spanish):
N=3099 words, Observed slope=0.044, p=.002

b. Dutch ANEW only:
N=1031 words, Observed slope=0.051, p=.028

c. Spanish ANEW only:
N=1034 words, Observed slope=0.035, p=.095

d. English ANEW only:
N=1034 words, Observed slope=0.043, p=.039

EXPERIMENT 2
e. All AFINN words (Pre- and Post-QWERTY):
N=2477 words, Observed slope=.037, p=.001

f. Pre-QWERTY AFINN words:
N=2414 words, Observed slope=.029, p=.009

g. Post-QWERTY AFINN words:
N=63 words, Observed slope=.36, p=.003

EXPERIMENT 3
h. All nonce words:
N=1600 words, Observed slope=0.026, p=0.0000004

NEW EXPERIMENT (On a corpus published after J&C 2012 went to press)
i. European Portuguese ANEW:
N=1034 words, Observed slope=0.055, p=.004

NEW ANALYSIS (On AFINN words that are not in ANEW)
j. Words unique to AFINN:
N=2178 words, Observed slope=0.034, p=.002

To summarize, the permutation tests on the slopes show nearly the same results as the permutation tests on the r-values we reported previously — for all of the data reported in our paper, and for the new Portuguese ANEW, published recently. In 9 out of 10 permutation tests, the QWERTY effect is significant at p<.05; in 7 out of 10 tests, the QWERTY effect is significant at p<.01.

The bottom line: If permutation tests on raw data come out significant time and time again, across several corpora and languages, this is strong evidence of a reliable effect.

So, the QWERTY effect is real. The data so far are correlational, so its causes remain open for debate and further investigation. Further studies, whether replications or non-replications, may help to determine the mechanisms by which the effect arises, and the conditions that determine its strength.

Our primary motivation for this study was theoretical – investigating a non-arbitrary relationship between form and meaning, which we predicted on the basis of a body of previous work, from our lab and others’ (e.g., Sian Beilock’s, Gordon Logan’s). The theoretical interest of this relationship between culture and language does not depend on the size of the measured effect or its practical applicability.

Possible real-world consequences of the QWERTY effect remain to be explored. We agree that the suggestions made by some reporters are over the top, and go far beyond any reasonable interpretation of the one sentence in our paper that we devoted to speculating about possible practical implications. Most of the journalists who reported on this study never contacted us, and even if they had, we are not responsible for what they’ve written. The quality of reporting about our paper in the media (some of which has been thoughtful and moderate) is not a valid index of the quality of our paper.

Daniel Casasanto and Kyle Jasmin
Catanea said,

March 20, 2012 @ 9:49 am

Perhaps there are many more touch-typers out there than we are hearing about.

If people start SAYING "teh" as well as typing it, will that show the influence of technology on language?

NB – Anecdotal evidence: I find ALL words that must be typed with a single hand (either one) to be somewhat unpleasant. Really agreeable words, those typed with letters from each hand alternately, are the most fun.
Brett said,

March 20, 2012 @ 2:53 pm

@ Catanea: I know numerous people who say "teh" ironically.
Random Bystander said,

March 20, 2012 @ 3:51 pm

From Psychonomic Bulletin to the press, and from the press to Wikipedia.

"People tend to give more negative connotations to words which are "harder" to type on a QWERTY keyboard (for instance, words which have more letters from the left hand side of the keyboard), and more positive connotations to words which are easier to type; this is similar to the discovery that people are judged more positively if their name is easier to pronounce.[17]"

(from Wikipedia's "QWERTY" page, under "Effects", citing the Economist article)
BobN said,

March 20, 2012 @ 4:35 pm

@Daniel Casasanto

The primary issue as to why many people are questioning this is that, in this case, the r values are important. Just because you can do a simple linear regression through a globular cluster and get a good p value doesn't mean you should. The fact that the r value is essentially zero means the RSA and the valence ranking are essentially uncorrelated; the good p values don't mean anything.
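
The arithmetic behind this disagreement can be sketched in a few lines. For a simple regression, the test statistic is t = r·sqrt(n−2)/sqrt(1−r²), so a very small correlation can still produce a small p-value when n is large. In the sketch below, n = 3099 is the combined ANEW word count quoted earlier in this thread, while r = 0.05 is a hypothetical value chosen only for illustration, and the p-value uses a normal approximation to the t distribution (reasonable for n in the thousands).

```python
import math

def t_from_r(r, n):
    """t statistic for testing a zero correlation in a sample of size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def two_sided_p_normal(t):
    """Two-sided p-value, approximating the t distribution by a standard normal."""
    return math.erfc(abs(t) / math.sqrt(2.0))

n = 3099   # combined ANEW word count quoted in the thread
r = 0.05   # hypothetical correlation, NOT a value reported by anyone here

t = t_from_r(r, n)
p = two_sided_p_normal(t)
print("variance explained (r squared):", r * r)
print("t statistic:", t)
print("approximate two-sided p:", p)
```

Under these assumptions the p-value falls below .01 while the variance explained stays well under one percent, which is why a reliable effect and a negligible one are not mutually exclusive descriptions.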
Barbara Phillips Long said,

March 21, 2012 @ 12:08 am

@ pharmamom:

In a reply to Perestroika, you say, "Perhaps what I refer to as science is a strict definition where subjective measurements are prohibited. As soon as human emotions and behavior are part of the equation, I would say that correlation and causation become hard to distinguish."

Your comment made me wonder whether some drugs can be tested with complete objectivity. If patients have to be asked if the anti-hallucinogen or the painkiller reduces the patient's symptoms, then there is no objective measure of efficacy by your standards.

I'm in favor of using the scientific method, appropriate statistical methods and objective evidence, but how do you measure the anti-hallucinogen when the patient says "most" of the hallucinations have gone away, but their eyes ache and their toes tingle until the dose wears off? Aches and tingling are hard to measure and "most" is undefined — the report is subjective. My impression is also that we still don't know enough about the brain to objectively monitor medications without some (necessarily subjective) input from the patient.

So, someone has to decide how efficacious the drug was for one patient in one trial or the patient has to rank it on a scale, and after hundreds or thousands of those judgements are made, is the data laughable or scientific or just the best approximation available? I think analyzing data appropriately is constantly being refined in many areas of scientific endeavor, and debating methods in public is the only way to learn how to draw more reliable conclusions.

Looking at the range of posts on this site's archive would give you an idea of the varied topics linguists and linguistics consider. It's not all subjective.
Sam said,

March 21, 2012 @ 2:06 pm

Doesn't the right hand carry more responsibility on the keyboard and wouldn't that extra work lead to less happiness for the typist? Wouldn't this create enough of a balancing effect to shut down their silly argument entirely? With the right hand being solely responsible for hitting return, delete, printing parentheses of all kinds, typing all the punctuation (except for the exclamation point), hitting the arrow keys and the number pad? Are there happy and sad sentences, paragraphs and entire papers? I certainly hope they aren't getting paid a lot of money to do this research.

Hope this wasn't redundant.
KathrynM said,

March 21, 2012 @ 10:09 pm

How can you reach conclusions about the effect of a keyboard layout on word usage/values, without having detailed information about the keyboarding habits of the folks whose value judgments you are relying on? Possibly I've misunderstood–but if the conclusion reached is that the use of the keyboard to reproduce the word influences the user's affective perception of that word, I don't see how you can reach any defensible conclusion unless you can show how your test subjects use the keyboard.

Assuming, of course, that you can first get past the question of just how statistically significant the reported effect is. But, if someone were thinking of trying to replicate the effect or. . .

Nevermind.
chh said,

March 22, 2012 @ 8:01 am

KathrynM,

Page 5 of Jasmin and Casasanto's 6-page paper, top of the second column: they address your point specifically. I suppose if people type with one hand, or type T with their right hand, this kind of thing could add noise to the data, but I can't imagine that those typers would be a large proportion of the hundreds of participants giving valence judgments in studies within the past 10 years, and I'm not sure how it would lead to a spurious significant correlation. (I'm not saying I'm convinced about the correlation.)

I really expected that J&C would say that the correlation in word valence would emerge in communities of speakers (as change in language usually does), so that it wouldn't matter if informants were skilled, normal typers or typers at all, so long as they were literate and interacted with a community that does lots of typing. This sort of comes up at the top of page 6, but as a tangent. It seems surprising to me to think that this effect, if it exists, would only obtain in individual speakers as a function of their experience with the keyboard, as if people don't learn the meaning/distributions of words from their environment.
Another Andrew Gersick said,

March 22, 2012 @ 1:12 pm

How odd,

If this comment thread is to be believed, there's another Andrew Gersick (improbable but not grossly so) in the world who also has a reason to read Mark Liberman's writing (now things are getting gross). So for the record – this is the Andrew Gersick who works in animal behavior at Penn, knows Mark a (very little) bit and has no opinion whatsoever on the QWERTY effect.

To the other "Andrew Gersick" – if you're real, I'm very curious to know your family history; if you're some kind of strange imposter who has stolen my identity for use in linguistics fora – well, that is very very strange.

To Mark Liberman – I hope that the next time I come to ask for your help tracking the pitch of hyena whoops, you won't hold Andrew Gersick's views against me.

Best,

Andrew "Andrew" Gersick

[(myl) Hi Andy — you'd be welcome to take C&J's side if you chose to; but the Other Andrew Gersick posted his comment from an IP address near Youngstown, Ohio; and the Penn directory knows no other Gersicks besides you; so either there really is another Andrew Gersick out there, or this was a limited episode of inter-dimensional permeability, or else it was a peculiar kind of trolling.]
Kyle said,

March 22, 2012 @ 1:25 pm

I have been informed that the "Andy Gersick" above is not the Andy Gersick of IRCS. Let the record show.
The Bad Science Reporting Effect - Lingua Franca - The Chronicle of Higher Education said,

March 24, 2012 @ 1:08 pm

[…] You can read Casasanto and Jasmin responding to Liberman here, and also Liberman's rejoinder to their response here — a response that Casasanto and Jasmin insist is still in error.] This entry was posted in […]
Semantics At Your Fingertips « Lexicon Blog said,

December 12, 2012 @ 11:34 am

[…] As it happens, the QWERTY study has been questioned by other experts. One challenger argues that any effect is tiny (on the order of .1%) and not statistically significant. (For a summary and other references, see Mark Liberman’s post on Language Log.) […]
