Kevin Dutton, "Psychopathy's Double Edge", Chronicle of Higher Education 10/22/2012:

[I]n a survey that has so far tested 14,000 volunteers, Sara Konrath and her team at the University of Michigan's Institute for Social Research has found that college students' self-reported empathy levels (as measured by the Interpersonal Reactivity Index, a standardized questionnaire containing such items as "I often have tender, concerned feelings for people less fortunate than me" and "I try to look at everybody's side of a disagreement before I make a decision") have been in steady decline over the past three decades—since the inauguration of the scale, in fact, back in 1979. A particularly pronounced slump has been observed over the past 10 years. "College kids today are about 40 percent lower in empathy than their counterparts of 20 or 30 years ago," Konrath reports.

As is all too often true for stories about results in social psychology — and especially stories about the Problems with Kids Today — this one is misleading in almost every particular.

The work described is Sara H. Konrath, Edward H. O'Brien and Courtney Hsing, "Changes in Dispositional Empathy in American College Students Over Time: A Meta-Analysis", Personality and Social Psychology Review 8/5/2010. Below I list four of the issues with Dutton's presentation of this research.

1. The trend is based on 72 data points, not 14,000.

As the title indicates, this is a meta-analysis, combining the results of 72 prior studies by many authors, which collectively involve 13,737 college students. As usual in meta-analyses, the studies are combined at the level of averaged results (means and standard deviations), not the responses of individual students.

For some of the problems with this approach, see Kali Trzesniewski and M. Brent Donnellan, "Rethinking 'Generation Me': A Study of Cohort Effects From 1976-2006", Perspectives on Psychological Science 5(1) 2010:

The results of these meta-analytic studies are provocative; however, the cross-temporal meta-analytic technique for identifying cohort-related changes in psychological characteristics is limited [...]  the generalizability of these findings is simply uncertain because the samples typically included in the meta-analyses are not designed to make population inferences. [...] For instance, it is common for researchers in social and personality psychology to use convenience samples in research, such as undergraduates in introductory courses who participate in research in exchange for course credit. These samples provide data quickly and in large numbers, but the individuals in the sample are not selected at random and they are not representative with respect to a defined population of interest. [...]  Increased sample sizes cannot compensate for the limits on inference posed by nonprobability sampling techniques.
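To make the "72 data points" claim concrete, here is a minimal Python sketch of a sample-size-weighted regression over study-level means. The three studies are invented numbers for illustration only (they are not data from Konrath et al.), and `weighted_slope` is a hypothetical helper, not the authors' code. The point is just that the fit sees one row per study, however many students stand behind each row.

```python
# Hypothetical study-level records: (year of data collection, mean score, sample size).
# These values are invented for illustration; they are NOT data from Konrath et al.
studies = [
    (1980, 4.0, 100),
    (1990, 3.9, 300),
    (2000, 3.8, 200),
]


def weighted_slope(rows):
    """Slope of a weighted least-squares line through (year, mean), weights = n."""
    total_w = sum(n for _, _, n in rows)
    x_bar = sum(n * year for year, _, n in rows) / total_w
    y_bar = sum(n * mean for _, mean, n in rows) / total_w
    sxy = sum(n * (year - x_bar) * (mean - y_bar) for year, mean, n in rows)
    sxx = sum(n * (year - x_bar) ** 2 for year, _, n in rows)
    return sxy / sxx


slope = weighted_slope(studies)
n_points = len(studies)                     # rows the regression actually fits
n_students = sum(n for _, _, n in studies)  # enters only through the weights

print(f"slope per year: {slope:.4f}")
print(f"data points in the fit: {n_points} (students behind them: {n_students})")
```

The sample sizes change how much each study counts, but they do not turn three study-level observations into six hundred independent ones; any quirk of a single study moves its whole row.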

2. Declines were observed in only two of four measured dimensions of self-reported empathy, and the decline in these measures was about 10%, not 40%.

Here is how Konrath et al. describe the four "subscales":

[I]n the current study we operationalized empathy as defined by the Davis Interpersonal Reactivity Index (IRI; Davis, 1980, 1983a, 1983c), the only personality scale that follows a multidimensional theory of empathy. The IRI is a 28-item scale that consists of four different 7-item subscales, representing different components of interpersonal sensitivity. Empathic Concern (EC) measures people’s other-oriented feelings of sympathy for the misfortunes of others and, as such, is a more emotional component of empathy (e.g., “I often have tender, concerned feelings for people less fortunate than me”). Perspective Taking (PT) is a more cognitive or intellectual component, measuring people’s tendencies to imagine other people’s points of view (e.g., “I sometimes try to understand my friends better by imagining how things look from their perspective”). The Fantasy (FS) subscale measures people’s tendencies to identify imaginatively with fictional characters in books or movies (e.g., “I really get involved with the feelings of the characters in a novel”). Personal Distress (PD) may be less adaptive in that it measures more self-oriented feelings of distress during others’ misfortunes (e.g., “When I see someone who badly needs help in an emergency, I go to pieces”).  [...]

Here's how they characterize the overall results:

Overall, American college students scored lower on EC and PT between the 1979 and 2009 (see Figures 1 and 2). There is a significant negative correlation between year of data collection and EC (b = –.38, p = .002, k = 66) and PT (b = –.27, p = .03, k = 64) when weighted by sample size. There were no significant changes in either the FS subscale (b = –.19, p = .26, k = 37) or PD (b = .09, p = .55, k = 46).

In crude observational terms, the "Empathic Concern" data as plotted spans a range of a bit more than 20 years (the midpoint of the first bin is 1985 and the midpoint of the last bin is 2007), and declines by about 100*(3.9-3.5)/3.9 = 10.3%. Konrath et al. take a more sophisticated route to the same result:

For the EC subscale, the regression equation (EC mean = –0.0140 × year + 31.771) yields a score of 4.06 for 1979 and 3.64 for 2009.

These numbers from their regression equation yield a decline of 100*(4.06-3.64)/4.06 = 10.3%, which they attribute to the whole span of 30 years from 1979 to 2009.

Here's their graph for the "Perspective Taking" subscale:

In this case, the crudely-observed decline is 100*(3.65-3.3)/3.65 = 9.6%. The result from their more sophisticated analysis is a bit less than that:

For the PT subscale, the regression equation (PT mean = –0.0099 × year + 23.349) yields a score of 3.66 for 1979 and 3.36 for 2009.

100*(3.66-3.36)/3.66 = 8.2%.

So how do we get from four estimated subscale declines of 10.3%, 8.2%, Not Significant, and Not Significant, to "College kids today are about 40 percent lower in empathy than their counterparts of 20 or 30 years ago"? I invite you to read the paper to figure this out; my point here is just that the actual self-report measurements decline by 10% or less, not by 40%.
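For readers who want to check the arithmetic, a short Python sketch follows. It takes the start and end scores exactly as quoted above (the values read off the plots and the 1979 and 2009 values reported from the regressions) and computes each relative decline. The function name `percent_decline` and the dictionary labels are my own, chosen for illustration.

```python
def percent_decline(start, end):
    """Relative drop from start to end, as a percentage of start."""
    return 100.0 * (start - end) / start


# (start, end) score pairs as quoted in the post.
endpoints = {
    "EC, read off the plot": (3.9, 3.5),
    "EC, regression (1979 vs. 2009)": (4.06, 3.64),
    "PT, read off the plot": (3.65, 3.3),
    "PT, regression (1979 vs. 2009)": (3.66, 3.36),
}

declines = {label: percent_decline(a, b) for label, (a, b) in endpoints.items()}

for label, pct in declines.items():
    print(f"{label}: {pct:.1f}% decline")
```

Whichever pair of endpoints you prefer, the relative decline stays at roughly a tenth of the starting score.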

3. Self-reported empathy may be more about self-presentation than about either genuine internal response or genuine behavior in context.

Konrath et al. note that

On average, females tend to score higher than males on each of the subscales (Davis, 1983c).

But as I observed a few years ago, there's reason to worry that rather than measuring how empathetic people actually are, techniques that rely on self-report may measure how empathetic they want others to think they are. Thus according to the literature review in Nancy Eisenberg and Randy Lennon, "Sex Differences in Empathy and Related Capacities", Psychological Bulletin 94(1): 100-131, 1983:

In general, sex differences in empathy were a function of the methods used to assess empathy. There was a large sex difference favoring women when the measure of empathy was self-report scales; moderate differences (favoring females) were found for reflexive crying and self-report measures in laboratory situations; and no sex differences were evident when the measure of empathy was either physiological or unobtrusive observations of nonverbal reactions to another's emotional state.

Another survey (Richard A. Fabes and Nancy Eisenberg, "Meta-Analyses of Age and Sex Differences in Children's and Adolescents' Prosocial Behavior", 1998) came to a similar conclusion:

Sex differences were greatest when demand characteristics were high (i.e., it was clear what was being assessed) and individuals had conscious control over their responses (i.e., self-report indices were used); gender differences were virtually nonexistent when demand characteristics were subtle and study participants were unlikely to exercise much conscious control over their responding (i.e., physiological indices). Thus, when gender-related stereotypes are activated and people can easily control their responses, they may try to project a socially desirable image to others or to themselves.

And a recent dissertation by Jessica Calvi ("The relationship between self-report and behavioral measures of empathy", Oklahoma State University 2011) concluded that

Results showed that empathy self-report scales were not consistently correlated with the behavioral measure of empathy, and moderation analyses revealed significant differences between males and females on self-reported versus behavioral measures of empathy. Additional analyses indicated that empathy may also be understood in the context of other dispositional traits such as temperament. As a multidimensional construct, the study of empathy may be better understood with measures of empathy that are behaviorally based in order to correct for potential issues with self-report measures.

So when we evaluate the (rather modest) changes in self-reported empathy among college students over time, we should be concerned that what we're measuring is not changes in what they really feel, but rather changes in what they think that they should want us to think that they feel.

4. Konrath's research explicitly contradicts Dutton's proposed explanation for the alleged trend.

Dutton thinks that society is becoming "more psychopathic", and he has a theory about the cause — it's all because of an alleged decline in reading fiction:

Precisely why this downturn in social values has come about is not entirely clear. A complex concatenation of environment, role models, and education is, as usual, under suspicion. But the beginnings of an even more fundamental answer may lie in a study conducted by Jeffrey Zacks and his team at the Dynamic Cognition Laboratory, at Washington University in St. Louis. With the aid of fMRI, Zacks and his co-authors peered deep inside the brains of volunteers as they read stories. What they found provided an intriguing insight into the way our brain constructs our sense of self. Changes in characters' locations (e.g., "went out of the house into the street") were associated with increased activity in regions of the temporal lobes involved in spatial orientation and perception, while changes in the objects that a character interacted with (e.g., "picked up a pencil") produced a similar increase in a region of the frontal lobes known to be important for controlling grasping motions. Most important, however, changes in a character's goal elicited increased activation in areas of the prefrontal cortex, damage to which results in impaired knowledge of the order and structure of planned, intentional action.

Imagining, it would seem, really does make it so. Whenever we read a story, our level of engagement is such that we "mentally simulate each new situation encountered in a narrative," according to one of the researchers, Nicole Speer. Our brains then interweave these newly encountered situations with knowledge and experience gleaned from our own lives to create an organic mosaic of dynamic mental syntheses.

Reading a book carves brand-new neural pathways into the ancient cortical bedrock of our brains. It transforms the way we see the world—makes us, as Nicholas Carr puts it in his recent essay, "The Dreams of Readers," "more alert to the inner lives of others." We become vampires without being bitten—in other words, more empathic. Books make us see in a way that casual immersion in the Internet, and the quicksilver virtual world it offers, doesn't.

Which is worrisome, to say the least, given the current slump in reading habits.

But Konrath et al. found no significant trend in the Fantasy (FS) subscale, which "measures people’s tendencies to identify imaginatively with fictional characters in books or movies (e.g., 'I really get involved with the feelings of the characters in a novel')". And surely the overall level of young people's consumption of narratives, via movies and television as well as books, has not declined significantly over the past 30 years.

I should note that Dutton expresses even greater alarm about the allegedly rampant narcissism of Kids Today:

More worrisome still, according to Jean Twenge, a professor of psychology at San Diego State University, is that, during this same period, students' self-reported narcissism levels have shot through the roof. "Many people see the current group of college students, sometimes called 'Generation Me,' " Konrath continues, "as one of the most self-centered, narcissistic, competitive, confident, and individualistic in recent history."

But my breakfast time is over, so rather than evaluate this aspect of his piece in detail, I'll just point you to the related discussion in "What does this graph mean?", 7/15/2012.



  1. mike said,

    November 12, 2012 @ 10:12 am

    >"Self-reported empathy may be more about self-presentation"

    This can probably be generalized to:

    Self-reported [anything] may be more about self-presentation.

  2. Michael Johnson said,

    November 12, 2012 @ 1:53 pm

    I admit that the article is technically wrong about the study looking at 14,000 individuals, but it seems more misleading to say "the trend was based on 72 data points": there were 14,000 (roughly) data points, and the trend was indeed based on them. Sure, they were mediated by means and standard deviations or whatever when they were put together, but those data points were there. There's a clear difference between meta-analyzing 72 surveys, each given to 1 person, for a total of 72 persons, and analyzing 72 surveys given to a total of 14,000 people. I recognize that you know this, but your rhetoric doesn't.

    [(myl) Except that the results of studies like this are known to be affected to a serious degree by the context and mode of administration, the population of subjects studied, and so on. So each study has its own sources of variation; and (because these are typically one-time efforts, not the equivalent of "tracking polls") each time period is affected by random variation in the characteristics of the small number of studies that take place in it.

    This is especially clear in thinking about where the points in the graph come from -- each of them corresponds to 72/5 = about 14 studies on average, and so randomly changing biases in a few studies could shift one of the points by a meaningful amount. The authors' regression analysis treats each study as an independent data point, but again, a few studies might have significant leverage in changing the slope of the fitted line.

    You can see this in the fact that the p values for slopes significantly different from 0 are .002 and .03 for the two dimensions where there was a statistically significant effect. With 14,000 data points and a nominal change of half a percent a year over twenty years, that would be a strikingly weak effect; with 72 data points, it's fine -- except that there might well be some non-sampling error involved in the mix.

    Update -- for a better explanation of this point, see here.]

  3. Rob said,

    November 12, 2012 @ 2:24 pm

    Dutton's argument and reference to Zacks et al. seem to ignore what Mark points out: people of all ages are still consuming vast quantities of narratives, but via the medium of film and video rather than reading.

    For Dutton's argument about a connection between a decline in reading fiction and a decline in empathy to be valid, Zacks et al. would need to prove that the same brain regions are NOT activated when people watch tv and movies, listen to songs with lyrics, and play video games.

    This is not to mention that Dutton would need to prove that people really do read fewer narratives. Maybe this IS a trend among teens, but from what I've read, there are more books being written, published, and read than ever before (but this doesn't account for possible changes to genre and how we may or may not process a fictional narrative differently from a biography or say a non-fictional work by Simon Winchester or Temple Grandin).

  4. Linda said,

    November 12, 2012 @ 2:25 pm

    Have researchers accepted that self reporting subjects may not be telling the truth?

    The reason I ask is that I took part in a survey when I was a student about 40 years ago. I answered with the socially acceptable answers, so I appeared to be more outgoing and sociable than I really was. In the debrief afterwards the researcher told me how accurate this test was and I told him I had, in effect, lied. He said this wasn't possible as the same question was asked in different ways to show this up, because you wouldn't give consistent answers. I replied that I'd noticed the repetition and had answered consistently from a view point not my own. But he wouldn't have it.

    Perhaps I should reassure the researchers out there that this was a student on a training exercise and not real research. I hope he did learn from the experience.

  5. D.O. said,

    November 12, 2012 @ 3:26 pm

    For the kind of comment I'm going to make, I'd better read the paper, but I didn't and will comment anyways. Sorry.
    7 point scale, I would expect, has to have the standard deviation of about 1.5, maybe 2. If roughly 14000 people participated in 72 studies, it makes about 200 people per study. Now let's take last point on Figure 1. I guess k=13 means that 13 studies were aggregated for the total number of participants of roughly 2500. Reported SE is about 0.1. Scaling back to the distribution of individual answers gives 0.1*sqrt(2500)=5. This is way too much. What gives?

    [(myl) They say that the average standard deviation of EC across the studies was 0.65. That seems quite low to me, given that the test-retest reliability is only 0.62 to 0.71. A test-retest correlation of 0.7 implies a standard deviation of about 1.7 just for repeated administrations of the test to an individual subject! So I'm puzzled.]

  6. naddy said,

    November 12, 2012 @ 4:33 pm

    It strikes me that "Perspective Taking" is a valuable trait for (1) ruthlessly exploiting others' weaknesses or (2) being expressly cruel to them. Talk about a double edge.

  7. D.O. said,

    November 12, 2012 @ 5:00 pm

    Re: myl reply to Michael Johnson comment. With about 14 studies in the "super-sample" it should be possible to figure out whether in-study variance and between-studies variance are in a rough statistical agreement. Of course, one can perform Bayesian hierarchical analysis to get a better handle on these things, but given the significance of the result it's hardly worth an effort.

  8. Yuval said,

    November 12, 2012 @ 5:57 pm

    What kind of scale is being used here? Because if it's based at 1, the slopes are indeed larger than you calculated (all denominators should be smaller by 1).

    [(myl) It's true that the range of possible values is only 6. But another way to think about the effects would then be as a drop in reconstructed mean values of 0.42 out of 6, or 7% of the scale, and 0.3 out of 6, or 5% of the scale.

    Or based on the asserted average EC sd of 0.65, you could see the drop of 0.42 as 0.42/0.65 = 0.65 standard deviations, which is a moderate to large effect; given the average PT sd of 0.68, the drop of 0.3 is about 0.44 standard deviations, which is a small to moderate effect. (Assuming that averaging the standard deviations is appropriate here, and that we can make sense of average standard deviations of 0.65 or so given that the test-retest correlations of 0.7 imply that we should see a standard deviation of about 1.7 just for repeated administration to the same subject...) But anyhow, I don't think that the results warrant a reaction of "OMG psychopaths!", especially because I'm not very confident that extraneous influences are reliably controlled across the set of studies.]
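The two alternative framings in the reply above (the drop as a share of the scale's range, and the drop in standard-deviation units) can be reproduced with a small Python sketch. All inputs are the figures quoted in the reply: a range of 6, drops of 0.42 and 0.3, and average standard deviations of 0.65 and 0.68. The names `share_of_scale` and `drop_in_sd_units` are hypothetical, and whether averaging standard deviations across studies is appropriate is left open, as the reply itself notes.

```python
SCALE_RANGE = 6.0  # range of possible values, as stated in the reply

# Drop in reconstructed means (1979 to 2009) and the average SD quoted per subscale.
subscales = {
    "EC": {"drop": 0.42, "avg_sd": 0.65},
    "PT": {"drop": 0.30, "avg_sd": 0.68},
}


def share_of_scale(drop, scale_range=SCALE_RANGE):
    """Drop expressed as a percentage of the scale's possible range."""
    return 100.0 * drop / scale_range


def drop_in_sd_units(drop, avg_sd):
    """Drop expressed in units of the (average) standard deviation."""
    return drop / avg_sd


for name, s in subscales.items():
    pct = share_of_scale(s["drop"])
    effect = drop_in_sd_units(s["drop"], s["avg_sd"])
    print(f"{name}: {pct:.0f}% of the scale, {effect:.2f} standard deviations")
```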

  9. Rubrick said,

    November 12, 2012 @ 7:56 pm

    "Many people see the current group of college students, sometimes called 'Generation Me,' " Konrath continues, "as one of the most self-centered, narcissistic, competitive, confident, and individualistic in recent history."

    And hey, Many People surely can't be wrong!

  10. Matt McIrvin said,

    November 12, 2012 @ 8:22 pm

    "Generation Me" reminds me of the way the 1970s were the Me Decade until the 1980s came along, and then everyone forgot about the 1970s and the 1980s were the Me Decade.

    (Al Franken claimed they were the Al Franken Decade, but he was clearly mistaken; the actual Al Franken Decade was over twenty years in the future.)

    [(myl) Tom Wolfe, "The ‘Me’ Decade and the Third Great Awakening", New York Magazine 8/23/1976; Christopher Lasch, "The Culture of Narcissism", 1979.]

  11. Keith M Ellis said,

    November 13, 2012 @ 3:53 am

    I wonder what a pop-science story on the social psychology of pop-science stories about the Problems with Kids Today would look like and how it would be received.

    For what it's worth, though, my entirely intuitive and otherwise unfounded theory is that empathy is strongly encouraged by fiction and, furthermore, much more so with written narratives which (normally) take a subjective POV as compared to filmed narratives which have a much greater pretense of objectivity with the audience positioned more as observer. To be sure, I am not arguing that the former forces full identification while the latter disallows it but, rather, that each generally has opposing tendencies in this regard.

    The tricky part about empathy and theories of mind is the apparent paradox contained in the statement "if I were you". That is, it's not either us wholly imagining others to be ourselves, but merely in different circumstances; nor could it be, necessarily, a magical appropriation of someone else's inner experience. Rather, it's a bridge built between self and other, made by constructing portions from both directions, meeting somewhere in the middle. Admittedly, that's sort of magical, too. In more concrete terms, we partly imagine that others are like us and tend to think like us, but we also find ways to imagine states of mind that would otherwise be alien to us. Perhaps we construct them almost from whole cloth, internally. More often, we probably build them out of the detritus of daily life, the remains of others' thoughts revealed to us to consider.

    But storytellers more explicitly present to us the inner lives of people who are not us. While we tend to read stories written by people like us which are about people like us, they are, nevertheless, written by and about people who aren't, precisely, us.

    I think that this explicit attempt to construct the experience of someone else's inner-life is a profound function of fictional narratives — it's relevant to both individual psychological development and social psychology.

  12. Theo Vosse said,

    November 13, 2012 @ 6:58 am

    Given the description, this might be the whole questionnaire:

    Seven questions per "dimension" is a very, very weak basis, and many of the questions are a rephrasing of others. If society has managed to stress the importance of one of the questions over time, or gotten more relaxed about another, the difference would be wholly explained.

    Another, more linguistic, way of looking at it might be to check for frequency or valence of the words in the test over time. Here is a quick Google n-gram search for 5 terms that seem central to the PT dimension:


    [(myl) See Davis, M. H., "A multidimensional approach to individual differences in empathy", Catalog of Selected Documents in Psychology, 1980, for psychometric validation of the test instrument. Note that the test-retest correlation is about 0.7, which means that it's a somewhat blunt instrument at best, though not any blunter than most of its peers.

    I think that meta-analysis of diverse opportunistic samples is a bigger issue -- see Kali Trzesniewski and M. Brent Donnellan, "Rethinking 'Generation Me': A Study of Cohort Effects From 1976-2006", Perspectives on Psychological Science 5(1) 2010 [discussed and linked here], for a general critique of these issues.]

  13. Brett said,

    November 13, 2012 @ 10:21 am

    @Matt McIrvin: Actually, Franken declared that we are now in the "Al Franken Millennium," which would be even greater than his decade, so…

  14. Ikiru said,

    November 13, 2012 @ 11:34 am

    Given that self reports of empathy are strongly gendered, I wonder if the observed decline in EC and PT measures has a gendered aspect as well. Obviously I'd have to see the actual data, but I'd hypothesize that young women feel significantly different pressures on their self-presentations today than they did in 1979.

  16. the other Mark P said,

    November 13, 2012 @ 6:29 pm

    (myl) Except that the results of studies like this are known to be affected to a serious degree by the context and mode of administration, the population of subjects studied, and so on.

    I would say that the "and so on" would include what the researchers are trying to prove.

    How many of the researchers are trying to prove that people today are better? Given that, what is the chance that the results skew towards what people expect to find?

  17. fiona hanington said,

    November 21, 2012 @ 12:43 pm

    Mark, I particularly enjoy the posts in which you analyze research findings as you've done here. So, thanks! I included you among my favorite bloggers who do the important work of holding mainstream journalists to account:

