The inclusion epidemic


Last week, a journalist asked me a question in connection with the recent flurry of stories on changes in childhood obesity percentages in the period from 2001 to 2012. When I looked into it, what struck me was that a category defined as "BMI at or above the 95th percentile" applied to about 15-17% of the population throughout the period discussed.

This sounds like a statistical approximation to Garrison Keillor's joke about his home town, where "all of the children are above average". But the normative percentiles are based on data from an earlier time, and so it's perfectly logical that 17.1% of the age-2-19 sample in the 2003-2004 period should be at or above the 95th percentile for the 1963-1994 period. This is just a symptom, after all, of the famous "obesity epidemic".

Still, I remained curious about just when this large change really took place. (Most of) the raw data is available on line from the CDC, and I decided to spend an hour or so satisfying my curiosity about what is going on here: has there actually been a gradual climb over 50 years, which looks steep when a threshold derived from 1963-1994 is used for data from 2003 to the present? Or was there a steeper climb over a narrower stretch of time?

I found a clear answer to this question. But when I looked into it further, I found some additional information that made me wonder whether there has really been any change over time at all.

Here's the punch line, in graphical form, with the data from each of 11 CDC surveys plotted against the temporal mid-point of the survey years (so that a survey spanning 1988-1994 is counted as taking place in 1991):

On the face of it, the answer seems to be that there was one regime in the 1963-1980 time period, and a different regime in the 1999-2009 time period, with a transitional value around 1991 (representing the 1988-1994 survey). However, we'll see that the two stable regimes correspond to two quite different approaches to sampling the population, with the transitional value coming from a survey with a transitional sampling procedure.

Before we speculate further about this story, let's take the time to see where the numbers came from.

The values in the graph are based on "body mass index" (BMI) measurements in 11 national surveys carried out by the Centers for Disease Control (CDC). BMI is defined as "weight in kilograms divided by height in meters squared". In each case, the value plotted on the vertical axis is the proportion of the sample aged from 24 to 240 months, inclusive, with BMI above a threshold determined by the CDC to represent "obesity" for the individual's sex and age. This threshold was calculated as the 95th percentile (for sex and age) in an elaborately smoothed model of the distribution of BMI values in a set of samples collected between 1963 and 1994.
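As a minimal sketch of the two definitions just given (not the code used to make the graph), the BMI formula and the 24-to-240-month age filter might look like this in Python. The function names and the example subject are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: (weight in kilograms) / ((height in meters) squared)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / (height_m ** 2)


def in_age_range(age_months: float, lo: float = 24, hi: float = 240) -> bool:
    """True for subjects aged 24 to 240 months, inclusive."""
    return lo <= age_months <= hi


# Hypothetical subject: 30 kg, 1.25 m tall, 108 months (9 years) old.
print(round(bmi(30.0, 1.25), 1))  # 19.2
print(in_age_range(108))          # True
```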

The first five values in my graph come from the surveys that were used in developing the normative quantiles: the Health Examination Surveys cycles II and III, and the National Health and Nutrition Examination Surveys I through III:

HES2 1963-65
HES3 1966-70
NHANES1 1971-74
NHANES2 1976-80
NHANES3 1988-94

The other six values come from NHANES surveys run on a two-year cycle since 1999-2000. (The publicly-available data from the 2011-2012 and 2013-2014 surveys still mostly lack the relevant data fields, so my graph stops with 2009-2010).
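The horizontal position of each survey in the graph is just the mid-point of its start and end years. A small sketch of that arithmetic follows, assuming the survey years listed above; the labels for the six two-year surveys are my own shorthand, not official names.

```python
# (label, first year, last year) for the five surveys behind the norms.
SURVEYS = [
    ("HES2", 1963, 1965),
    ("HES3", 1966, 1970),
    ("NHANES1", 1971, 1974),
    ("NHANES2", 1976, 1980),
    ("NHANES3", 1988, 1994),
]
# The six two-year surveys from 1999-2000 through 2009-2010 (labels are shorthand).
SURVEYS += [(f"NHANES {y}-{y + 1}", y, y + 1) for y in range(1999, 2010, 2)]


def midpoint(first_year: int, last_year: int) -> float:
    """Temporal mid-point used as the x-coordinate for a survey."""
    return (first_year + last_year) / 2


for label, first, last in SURVEYS:
    print(label, midpoint(first, last))  # e.g. NHANES3 -> 1991.0
```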

The procedure for deriving the normative percentiles is documented in R.J. Kuczmarski et al., "2000 CDC Growth Charts for the United States: Methods and Development", Vital and health statistics. Series 11 (246), Data from the national health survey, 2002. The numerical outcomes are listed in a .csv file linked here, which yields this table of 95th-percentile BMI-by-age numbers for males and females between 2 years (24.5 months) and 20 years (240.5 months) old. In graphical form:

In order to calculate the plot of changes over the period 1963-2009, I downloaded the raw data files for the eleven surveys, and determined what proportion of the relevant subjects in each survey had a BMI measurement above the 95th-percentile threshold specified by those CDC 2000 norms.
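A rough sketch of that proportion calculation, under stated assumptions: the threshold table below is a tiny placeholder with made-up numbers (NOT the CDC values), each subject is matched to the table row nearest in age, and "above" is read as "at or above", following the category wording quoted at the top of this post. None of this is the actual code behind the graph.

```python
# Placeholder threshold table: sex -> [(age in months, 95th-percentile BMI)].
# These numbers are invented for illustration and are NOT the CDC norms.
THRESHOLDS = {
    "M": [(24.5, 19.0), (120.5, 22.0), (240.5, 30.0)],
    "F": [(24.5, 19.0), (120.5, 23.0), (240.5, 31.0)],
}


def threshold_for(sex: str, age_months: float) -> float:
    """Threshold from the table row nearest in age (an assumed matching rule)."""
    rows = THRESHOLDS[sex]
    nearest = min(rows, key=lambda row: abs(row[0] - age_months))
    return nearest[1]


def proportion_above(subjects) -> float:
    """Share of subjects aged 24-240 months with BMI at or above the threshold.

    subjects: iterable of (sex, age_months, bmi) tuples.
    """
    eligible = [s for s in subjects if 24 <= s[1] <= 240]
    if not eligible:
        raise ValueError("no subjects in the 24-240 month age range")
    hits = sum(1 for sex, age, b in eligible if b >= threshold_for(sex, age))
    return hits / len(eligible)


# Hypothetical subjects; the last one is outside the age range and is ignored.
sample = [
    ("M", 30, 17.0),
    ("M", 118, 23.5),
    ("F", 121, 22.0),
    ("F", 230, 32.0),
    ("M", 300, 40.0),
]
print(proportion_above(sample))  # 0.5
```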

The HES2, HES3, NHANES1, NHANES2, and NHANES3 numbers are all available in one big file growthch.xpt, documented here. The numbers for NHANES 1999-2000 through 2009-2010 are in individual files accessible via the links starting from this point. For each of these later six surveys, it's necessary to download and combine two .xpt files: an "examination" file which gives measurements including BMI, and a "demographics" file which gives information including age in months. The two tables are (partly) tied together by a "Respondent Sequence Number" SEQN.
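For the later surveys, the combining step amounts to an inner join on SEQN. Here is a minimal pure-Python sketch, assuming the two .xpt files have already been read into lists of records; reading the files themselves is not shown. Apart from SEQN, the field names are hypothetical.

```python
def join_on_seqn(examination, demographics):
    """Inner join two lists of dict records on the 'SEQN' key.

    Records present in only one table are dropped, since the two
    tables are only partly tied together.
    """
    demo_by_seqn = {row["SEQN"]: row for row in demographics}
    merged = []
    for exam in examination:
        demo = demo_by_seqn.get(exam["SEQN"])
        if demo is not None:
            merged.append({**demo, **exam})
    return merged


# Hypothetical records; 'bmi' and 'age_months' are illustrative field names.
examination = [
    {"SEQN": 1, "bmi": 18.2},
    {"SEQN": 2, "bmi": 24.9},
    {"SEQN": 4, "bmi": 21.0},
]
demographics = [
    {"SEQN": 1, "age_months": 100},
    {"SEQN": 2, "age_months": 180},
    {"SEQN": 3, "age_months": 60},
]

merged = join_on_seqn(examination, demographics)
print([row["SEQN"] for row in merged])  # [1, 2]
```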

So let's look at the results again:

Either there was a big change in the high end of the BMI distribution over the 1980-2000 period, or there was a big change in the CDC's measurement or sampling procedures.

A big enough change in measurement procedures seems unlikely. At least from the 1980s onward, all (?) measurements were carried out in a "mobile examination center" (MEC), using an elaborately prescribed procedure that includes regular calibration of the apparatus. I believe that the earlier studies were carried out in a similar way, and while the apparatus and procedures evolved, I don't see any reason to think that there could have been a big enough change to triple the proportion of the population above the specified 95th-percentile thresholds.

But the sampling issue is different. In the early 1980s, it was recognized that "comparable data were not available for many of the ethnic groups within the United States", and therefore a "Hispanic HANES" was carried out in 1982-84. Its results confirmed the view that "the health status of minority groups is often different than the health status and characteristics of nonminority groups, so black Americans and Mexican Americans were selected in large proportions for NHANES III. Each group comprised 30 percent of the sample."

In 1999, the CDC began a continuously-running study, with new sampling procedures: in each two-year-cycle, "Approximately 40,000 individuals of all ages in households across the U.S. will be randomly selected to participate in the survey. The study respondents include whites as well as an oversample of blacks and Mexican-Americans. The study design also includes a representative sample of these groups by age, sex, and income level."

So NHANES III began shifting the sampling procedure towards an over-sample of previously under-sampled groups; and the subsequent surveys, begun on a regular basis in 1999-2000, have used a consistent sampling procedure that is different from NHANES III as well as from the earlier NHES and NHANES I-II surveys.

With those facts in mind, let's take a look again at my graph of the results over time, with the transitional NHANES III clearly marked:

So maybe the change over time is really a change over sampling procedures — and the increase from 5% to 17% in the proportion of the population above a normative 95th-percentile BMI-for-age measure is likewise a sampling change, not a population change.

The NHANES surveys include demographic information that can be used to re-weight the results. I didn't do any re-weighting, or pay any attention to the sampling-weight field, but I don't believe that this affects the conclusions in a material way. As a simple check, note that Table 6 from C.L. Ogden et al., "Prevalence of Childhood and Adult Obesity in the United States, 2011-2012", JAMA 2014, gives the 2003-2004 obesity percentage for the age range 2-19 as 17.1%; my calculation from the unweighted raw data, for the age range 2-20, is 16.8% for females and 17.3% for males.
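For readers who want to see what re-weighting would involve, here is a hedged sketch of the difference between an unweighted and a weighted proportion. The flags and weights are invented, and the weight values stand in for whatever the sampling-weight field contains; this is not a reproduction of the CDC's estimation procedure.

```python
def unweighted_proportion(flags):
    """Plain share of subjects flagged as above the threshold."""
    return sum(flags) / len(flags)


def weighted_proportion(flags, weights):
    """Share of total sampling weight carried by flagged subjects."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w for f, w in zip(flags, weights) if f) / total


# Invented example: flagged subjects come from an oversampled group,
# so each of them stands for fewer people in the population.
flags = [True, False, True, False]
weights = [1.0, 3.0, 1.0, 3.0]

print(unweighted_proportion(flags))         # 0.5
print(weighted_proportion(flags, weights))  # 0.25
```

In this toy case the weighted figure is lower because the flagged subjects carry less weight; in real data the two could differ in either direction, or barely at all.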

None of this changes the fact that the population would be healthier if people had lower BMIs. And it's plausible that there have been population-level changes in obesity rates, though maybe on a longer time scale. But apparently what's going on in the CDC data, at least among people aged 2-20, is less of an "obesity epidemic" than an "inclusion epidemic".

My quotes about the sampling procedures come from The National Health and Nutrition Examination Survey's "Physician Examination Procedures Manual" from 2003 — documenting what was released as the 1999-2000 and 2001-2002 datasets — which tells the overall history this way (emphasis added):

This NHANES is the eighth in a series of national examination studies conducted in the  United States since 1960.

The National Health Survey Act, passed in 1956, gave the legislative authorization for a continuing survey to provide current statistical data on the amount, distribution, and effects of illness and disability in the United States. In order to fulfill the purposes of this act, it was recognized that data collection would involve at least three sources: (1) the people themselves by direct interview; (2) clinical tests, measurements, and physical examinations on sample persons; and (3) places where persons received medical care such as hospitals, clinics, and doctors’ offices.

To comply with the 1956 act, between 1960 and 1984, the National Center for Health Statistics (NCHS), a branch of the U.S. Public Health Service in the U.S. Department of Health and Human Services, has conducted seven separate examination surveys to collect interview and physical examination data.

The first three national health examination surveys were conducted in the 1960s:

1. 1960-62 – National Health Examination Survey I (NHES I)
2. 1963-65 – National Health Examination Survey II (NHES II)
3. 1966-70 – National Health Examination Survey III (NHES III)

NHES I focused on selected chronic disease of adults aged 18-79. NHES II and NHES III focused on the growth and development of children. The NHES II sample included children aged 6-11, while NHES III focused on youths aged 12-17. All three surveys had an approximate sample size of 7,500 individuals.

Beginning in 1970 a new emphasis was introduced. The study of nutrition and its relationship to health status had become increasingly important as researchers began to discover links between dietary habits and disease. In response to this concern, under a directive from the Secretary of the Department of Health, Education and Welfare, the National Nutrition Surveillance System was instituted by NCHS. The purpose of this system was to measure the nutritional status of the U.S. population and monitor nutritional changes over time. A special task force recommended that a continuing surveillance system include clinical observation and professional assessment as well as the recording of dietary intake patterns. Thus, the National Nutrition Surveillance System was combined with the National Health Examination Survey to form the National Health and Nutrition Examination Survey (NHANES). Four surveys of this type have been conducted since 1970:

1. 1971-75 – National Health and Nutrition Examination Survey I (NHANES I) 
2. 1976-80 – National Health and Nutrition Examination Survey II (NHANES II) 
3. 1982-84 – Hispanic Health and Nutrition Examination Survey (HHANES) 
4. 1988-94 – National Health and Nutrition Examination Survey (NHANES III)

NHANES I, the first cycle of the NHANES studies, was conducted between 1971 and 1975. This survey was based on a national sample of about 28,000 persons between the ages of 1-74. Extensive data on health and nutrition were collected by interview, physical examination, and a battery of clinical measurements and tests from all members of the sample.

NHANES II began in 1976 with the goal of interviewing and examining 28,000 persons between the ages of 6 months to 74 years. This survey was completed in 1980. To establish a baseline for assessing changes over time, data collection for NHANES II was made comparable to NHANES I. This means that in both surveys many of the same measurements were taken in the same way, on the same age segment of the U.S. population.

While the NHANES I and NHANES II studies provided extensive information about the health and nutritional status of the general U.S. population, comparable data were not available for many of the ethnic groups within the United States. Hispanic HANES (HHANES), conducted from 1982 to 1984, produced estimates of health and nutritional status for the three largest Hispanic subgroups in the United States—Mexican Americans, Cuban Americans, and Puerto Ricans—that were comparable to the estimates available for the general population. HHANES was similar in design to the previous HANES studies, interviewing and examining about 16,000 people in various regions across the country with large Hispanic populations.

NHANES III, conducted between 1988 and 1994, included about 40,000 people selected from households in 81 counties across the United States. As previously mentioned, the health status of minority groups is often different than the health status and characteristics of nonminority groups, so black Americans and Mexican Americans were selected in large proportions for NHANES III. Each group comprised 30 percent of the sample. NHANES III was the first survey to include infants as young as 2 months of age and to include adults with no upper age limit. To obtain generalizeable estimates, infants and young children (1-5 years) and older persons (60+ years) were sampled at a higher rate than previously. NHANES III also placed an additional emphasis on the effects of the environment upon health. Data were gathered to measure levels of pesticide exposure, presence of certain trace elements in the blood, and amounts of carbon monoxide present in the blood. A home examination was incorporated for those persons who were unable or unwilling to come to the exam center but would agree to an abbreviated examination in their homes.


This NHANES follows in the tradition of past NHANES surveys, continuing to be a keystone in providing critical information on the health and nutritional status of the U.S. population.

The major difference between the current NHANES and previous surveys is that the current NHANES is conducted as a continuous, annual survey. Each single year and any combination of  consecutive years of data collection comprises a nationally representative sample of the U.S. population. This new design allows annual statistical estimates for broad groups and specific race-ethnicity groups as well as flexibility in the content of the questionnaires and exam components. New technologic innovations in computer-assisted interviewing and data processing result in rapid and accurate data collection, data processing, and publication of results.

The number of people examined in a 12-month period will be about the same as in previous NHANES, about 5,000 a year from 15 different locations across the nation. The data from the NHANES are used by government agencies, state and community organizations, private researchers, consumer groups, companies, and health care providers.

Data collected on the current NHANES survey began early in 1999 and will continue for approximately 6 years at 88 locations (stands) across the United States. The survey was preceded by a pretest in the spring of 1998 and a dress rehearsal was conducted in early 1999. Approximately 40,000 individuals of all ages in households across the U.S. will be randomly selected to participate in the survey. The study respondents include whites as well as an oversample of blacks and Mexican-Americans. The study design also includes a representative sample of these groups by age, sex, and income level. Adolescents, older people, and pregnant women are also oversampled in the current NHANES.




  1. Alex said,

    March 9, 2014 @ 5:49 am

    Does the 95th percentile from the 1960's represent a clear split between two groups in terms of morbidity or mortality? Has the percentage of children who are undernourished changed since the 1960's? Since puberty has been occurring earlier and earlier throughout the 20th century in developed countries, how much of the change in weight distribution (the part that isn't accounted for by methodological changes) has to do with body fat and not muscle, breast, etc., development?

    Choosing the 95th percentile from the 1960's just seems plain arbitrary if there's nothing special about that percentile and nothing particularly valuable about the weight distribution of that time period.

    That said, it's not surprising that folks would grasp to a methodological change that resulted in increased obesity rates and not look below the surface to see what caused it. It's a modern day moral panic, and what's a moral panic without a few sky-is-falling statistics?

  2. Kevin Roust said,

    March 9, 2014 @ 6:37 am

    I'm not convinced that the oversampling is enough to explain the difference. In my long ramble below, I conclude that at least 10% (and more likely at least 12%) of the 2009-2010 population must be above the 95th percentile of the older surveys.

    [(myl) An interesting and relevant argument. But the history I quote suggests that the sampling procedures for non-Hispanic whites might have changed as well: "The study respondents include whites as well as an oversample of blacks and Mexican-Americans. The study design also includes a representative sample of these groups by age, sex, and income level." It seems suspicious that the measure's distribution makes such a radical shift, precisely during the period when new sampling procedures were adopted.]


    Per the 2010 census, 63.7% of the US population is non-Hispanic white. The 2009-2010 NHANES survey included 4420 non-Hispanic white in their sample of 10537 — 41.9%.

    17% of the sample is above the 95th percentile of older samples. If this excess is purely due to oversampling, then we can estimate the obesity rate of the oversampled population:

    41.9% / 63.7% is 65.8%, so roughly 66% of the sample matches the population race/ethnicity distribution. The remaining 34% of the sample represents the oversampling of Hispanic or non-white populations.

    If the 66% sample matches the pre-1980s base data, 5% of this sample should be above the 95th percentile of the old data.

    If the whole sample is 17% obese and 66% of the sample is only 5% obese, then the remaining 34% of the sample must be 41% obese.

    But, we can then apply the same math to the population-matched sample. If the whole population-matched sample is only 5% obese and the Hispanic or non-white sample is 41% obese, then the non-Hispanic white sample (63.7% of the 66% population-matched sample) must be some -13% obese (negative 13%).

    As a negative population percentage is unsound, oversampling cannot be the sole contributor to the excess obesity.


    It looks like the true rate of obesity must be at least 10% in the 2009-2010 era, and that requires a huge differential in the rates of obesity between the non-Hispanic white population and the Hispanic or non-white populations.
    10% population-matched obesity and 17% sample obesity => 31% oversample obesity.
    31% oversample obesity and 10% population-matched obesity => 0% non-Hispanic white obesity.
    Not plausible, but not mathematically unsound.

    The population rate needs to be about 12% to permit the non-Hispanic white obesity rate to be 5%, with the Hispanic or non-white rate then being 27%. Note that this outcome could mean that the (measured) obesity epidemic reflects both a measurement artifact (oversampling) and a demographic change (growth of "minority" populations).

    If the population rate is above 12%, then the non-Hispanic white obesity rate must have increased above the 5% historical definition.
    For example, a 14% population obesity could reflect as low as a 10% white obesity and as high as 23% minority obesity.

  3. M Warshaw said,

    March 9, 2014 @ 3:00 pm

    You wrote: BMI is defined as "weight in kilograms divided by height in meters squared". I don't know where you got that, but height cannot possibly be in meters squared – that would be area.

    [(myl) "Weight in kilograms divided by height in meters squared" is a definitional quotation from the authoritative source, C.L. Ogden & K.M. Flegal, "Changes in terminology for childhood overweight and obesity", National Health Statistics Reports 2010. What they (and I) obviously mean by this phrase is, adding parentheses to help you parse it, "(weight in kilograms) divided by ((height in meters) squared)". If you don't like the left-branching construction "height in meters squared", please take it up with the authors of the 15,700 scientific papers that Google Scholar indexes as containing this phrase.]

    That said, your post illustrates one of the many problems that plague epidemiology. Too many pronouncements are made on flawed data or ignorance of confounding factors.

  4. Jonathan Mayhew said,

    March 9, 2014 @ 4:22 pm

    That's simply the formula. You take the height measurement first and then square it, then divide the weight by it.

  5. M Warshaw said,

    March 9, 2014 @ 4:50 pm

    That's why formulae are so much clearer than words

  6. David Eddyshaw said,

    March 10, 2014 @ 5:33 pm

    The fact that the actual concept behind BMI is itself pretty incoherent probably doesn't help efforts to remember or understand the definition.
    The Wikipedia article gives quite a good summary. It's a kludge, rather than a thought-through scientific assessment.

    Those Belgian polymaths have a lot to answer for.

  7. blahedo said,

    March 11, 2014 @ 5:02 pm

    M Warshaw: he, and the people who designed the definition, mean that as "(weight in kilograms) divided by ((height in meters) squared)".

    The possibility that this BMI shift is a statistical artifact is certainly troubling. How could we confirm it? Would it be possible to, say, run a modern study using the 1960s sampling technique, to try to isolate the time variable from the sampling-technique variable?

    [(myl) I believe that there's pretty good demographic metadata for each subject, and therefore it ought to be possible to distinguish genuine trends from changes in the sampling procedures. It's possible and even likely that CDC biostatisticians have done exactly that, but I have not been able to find documentation yet.]

  8. Per said,

    March 18, 2014 @ 1:09 pm

    At the very least it's implausible that this is entirely due to a change in racial sampling techniques: the CDC actually does release breakdowns by race, and within each group we see increases from 1988-94 to the 2000s. Of course it's possible that other changes in sampling contributed — but a roughly 5-percentage-point increase among white kids and a roughly 8-percentage-point increase among kids of color does still need to be explained somehow.
