Language Log

Olympic overfitting?

March 1, 2010 @ 5:50 am · Filed by Mark Liberman under The language of science

According to William Heuslein, "The man who predicts the medals", Forbes Magazine, 1/19/2010

Daniel Johnson makes remarkably accurate Olympic medal predictions. But he doesn't look at individual athletes or their events. The Colorado College economics professor considers just a handful of economic variables to come up with his prognostications.

The result: Over the past five Olympics, from the 2000 Summer Games in Sydney through the 2008 Summer Games in Beijing, Johnson's model demonstrated 94% accuracy between predicted and actual national medal counts.

First question: what do you think it means to "demonstrate 94% accuracy between predicted and actual national medal counts"?

If you guessed that it means "the predicted number of medals matched the actual number of medals 94 times out of 100", I'm sorry, you're wrong.

The next sentence of the Forbes article suggests what it really means:

For gold medal wins, the correlation is 87%.

And we can confirm from reading one of the papers cited on Johnson's web site (Daniel Johnson and Ayfer Ali, "A Tale of Two Seasons: Participation and Medal Counts at the Summer and Winter Olympic Games", Social Science Quarterly 85(4) 2004) that "94% accuracy between predicted and actual national medal counts" means that "the vector of predicted numbers of medals by country has a correlation of r=0.94 with the actual numbers of medals by country". (N.B. the text between the double quotes there is my summary, not Johnson & Ali's wording. And if you're a bit rusty on what correlation actually means, the Wikipedia page is not bad; or just keep in mind that if you can turn one sequence of numbers into another by some combination of adding constants and multiplying by positive constants, their correlation will be "100%".)

This still sounds like pretty impressive predicting, but it could be true without any of the medal-count numbers actually coinciding. I wonder what proportion of Forbes' readers understood that? In fairness, Heuslein did slip in that "correlation", but then at the bottom of the piece, he lists what he calls the "Accuracy rate of Johnson's predictions" for the summer and winter games from 2000 to 2008, which (for total medals) vary from "93%" to "95%".
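
To make that concrete, here is a minimal R sketch. The `actual` and `predicted` vectors are invented for illustration and are not Johnson's numbers; the point is only that a perfect correlation is compatible with every single prediction being wrong.

```r
# Hypothetical medal counts, invented for illustration only.
actual <- c(37, 30, 26, 23, 15, 14, 11)

# A "prediction" that is a positive linear function of the truth:
# double each count and add five.
predicted <- 2 * actual + 5

# Pearson's r is unchanged by adding a constant or multiplying by a
# positive constant, so the two vectors are perfectly correlated.
r <- cor(predicted, actual)

# Count how many predicted values coincide with the actual ones.
n_exact <- sum(predicted == actual)

print(r)        # 1, up to floating-point rounding
print(n_exact)  # 0: not one prediction matches
```

Under the Forbes wording, this made-up forecast would count as "100% accurate".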

Anyhow, how well did "the man who predicts the medals" do this time around?

Forbes gives Johnson's "In Depth: Medal Predictions for Vancouver". And now that the Olympics is over, we can compare them with the actual medal counts, as documented by the New York Times. So I entered the data into a table, and calculated the correlations using this trivial R script:

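# Read the table of predicted and actual medal counts, one row per country.
X <- read.table("Olympics.table", header=TRUE)
# Pearson correlation between predicted and actual counts.
totalcor <- cor(X[,"PredictedTotal"],X[,"ActualTotal"])
goldcor <- cor(X[,"PredictedGold"], X[,"ActualGold"])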

The correlation for total medals? 0.625. The correlation for gold medals? 0.279.

How did 94% and 87% turn into 63% and 28%?

I'm not sure, and I don't have time this morning to nail it down — I've got to finish my laundry, and hike over to the train station to catch the 8:30 Regional Rail for NYC, where I'm giving a talk at noon. But four possible explanations come to mind, all of which might simultaneously be true:

(1) Scribal error. Maybe Forbes' list of Johnson's predictions is wrong, or the NYT list of medal totals is wrong, or I made a mistake copying the numbers into my table. If so, someone will probably tell us in the comments.

(2) Regression to the mean. Maybe Johnson's luck ran out, like a mutual-fund manager whose string of good years was based more on good fortune than on good information.

(3) False advertising. Maybe the 94% correlation only applies to Johnson's predictions if you include not only the 13 countries in the Forbes list, but also all the other countries in the world, most of whom can trivially be predicted to win no medals at all, or almost none. If so, then r=0.94 may not really be very impressive.

(4) Overfitting. Although the cited paper does claim "out of sample" predictions (that is, they calculate the model parameters on historical data, and then look at the fit to recent data which was not used in "training" the model), it's possible that they made some adjustments in the model structure in order to get it to work well, and perhaps a different set of adjustments would be needed to make it work well for this year's data.

My prediction: some combination of (3) and (4). With respect to (3), note that just padding the medal vectors with 70 zeros brings the total correlation up from r=0.63 to r=0.91:

# 70 zeros, standing in for countries predicted to win nothing that won nothing.
padding <- numeric(70)
totalcor1 <- cor(c(X[,"PredictedTotal"],padding),c(X[,"ActualTotal"],padding))

My tentative conclusion: this is more evidence that when an economist is talking about numbers, you should put your hand on your wallet.

And, of course, when a journalist is interpreting a press release about a technical paper, you may need both hands and some help from your friends to avoid getting intellectually mugged.

[Via Phil Birnbaum, "An economist predicts the Olympic medal standings", Sabermetric Research 2/18/2010]

12 Comments

Graeme said,

March 1, 2010 @ 8:36 am

If the Forbes article's implication were true, the economists would have been making a killing on the sports betting markets, and hence would have been absurdly public-spirited to publish their methodology.

Still, the confirmation that surplus income, climate and home-ground advantage almost guarantee athletic success is a nice reminder that there's no level playing field in sport. And as for valourising its 'heroic' qualities or 'glorious uncertainties'…
D.O. said,

March 1, 2010 @ 4:04 pm

It seems to me that because a) total medal count is fixed and b) variance in the predicted vs. actual medal counts is not a reasonable concern, a better measure would be what that Wikipedia article calls the uncentered correlation coefficient. It is also stable with respect to irrelevant 0 dimensions.

As for the predictions, they are really way off. U.S. was badly underestimated and Russia (sigh) badly overestimated.
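
A minimal R sketch of that stability claim follows. The helper name `uncentered_cor` and both data vectors are assumptions made up for illustration; none of them come from Johnson's tables or from the post's data file.

```r
# Uncentered correlation: the cosine of the angle between the raw
# (not mean-corrected) vectors.
uncentered_cor <- function(x, y) {
  sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

# Hypothetical predicted and actual counts, invented for illustration.
predicted <- c(20, 25, 30, 15, 10, 28, 22)
actual    <- c(25, 18, 22, 20, 14, 30, 12)

# Pad both vectors with 70 countries that were predicted to win
# nothing and did win nothing.
zeros <- numeric(70)
predicted_padded <- c(predicted, zeros)
actual_padded    <- c(actual, zeros)

# Pearson's r rises once the shared zeros are added ...
pearson_raw    <- cor(predicted, actual)
pearson_padded <- cor(predicted_padded, actual_padded)

# ... while the uncentered coefficient does not move, because the
# zeros contribute nothing to any of the three sums.
uncentered_raw    <- uncentered_cor(predicted, actual)
uncentered_padded <- uncentered_cor(predicted_padded, actual_padded)

print(c(pearson_raw, pearson_padded))
print(c(uncentered_raw, uncentered_padded))
```

The flip side is that an uncentered coefficient tends to be high whenever both vectors are all positive, so it is not obviously a stricter test; it is only immune to the zero-padding effect.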
Eugene van der Pijll said,

March 1, 2010 @ 4:05 pm

The problem here is presenting the results as a prediction. There is certainly a relation between the number of medals and the socio-economic factors in a country, and I'm sure Daniel Johnson's study shows that.

But if you want a prediction of the number of medals (and that is no doubt what interests the journalist and the reading public), you can do much better. For the 13 countries included, there is a r=0.85 correlation between the number of medals in 2006 and the number in 2010. It's like the weather: a good guess for tomorrow's weather is that it is the same as today's; for many locations, only the best models do better than that.

But that wouldn't be scientific, and it wouldn't be news.
Aviatrix said,

March 1, 2010 @ 4:28 pm

"climate and home-ground advantage almost guarantee athletic success."

We wish. This is the third time Canada has hosted the Olympic Games and the first time the country has won any gold medals at home.

See also Per Capita Results which makes Norway the runaway winner.
ShadowFox said,

March 1, 2010 @ 5:12 pm

There is a bigger problem here. A correlation coefficient is a degree of linear dependence and is not a percentage of anything tangible. It converts to percentage of perfect correlation (r=1), but that is meaningless. Unfortunately, Mark seems to fall into the same trap: a correlation coefficient is a measure of distribution, not a predictive value. The whole thing could have just as well come from the BBC.

[(myl) A correlation coefficient can be viewed as lots of things — the cosine of the angle between two vectors; the inner product of mean-corrected length-normalized vectors; the square root of the percent of variance accounted for by modeling one set of numbers as a linear function of another set; etc. As a measure of how well you can predict y if you know x, it's better than a poke in the eye with a sharp stick — in fact, if r is close to 1 or -1, a linear model could predict y from x almost perfectly.

And since it's bounded between -1 and 1, it's not totally nuts to express r values as pseudo-percentages, though it's certainly wrong; it would make a bit more sense for r^2, if you were interpreting that as percent of variance accounted for (by a linear model).

It's true that it's not an ideal way to measure the success of Johnson's predictions. But the main point, in my opinion, is that something has gone wrong in that Forbes article, in addition to the misleading description of r=0.94 as "94% accurate prediction", namely that the model's predictions are way worse than that for this year's results, which raises the question of who's zooming who, and how.]
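
The equivalent views of r listed in that reply can be checked numerically. This is a sketch on invented vectors `x` and `y` (hypothetical, not medal data), using only base R's `cor` and `lm`.

```r
# Hypothetical vectors, invented for illustration.
x <- c(20, 25, 30, 15, 10, 28, 22)
y <- c(25, 18, 22, 20, 14, 30, 12)

r <- cor(x, y)

# (a) Cosine of the angle between the mean-corrected vectors.
xc <- x - mean(x)
yc <- y - mean(y)
r_cosine <- sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))

# (b) Inner product of mean-corrected, length-normalized vectors.
r_inner <- sum((xc / sqrt(sum(xc^2))) * (yc / sqrt(sum(yc^2))))

# (c) r^2 is the proportion of variance in y accounted for by a
#     least-squares linear fit of y on x.
fit <- lm(y ~ x)
r_squared <- summary(fit)$r.squared

print(c(r, r_cosine, r_inner))  # three identical values
print(c(r^2, r_squared))        # two identical values
```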
mollymooly said,

March 1, 2010 @ 6:39 pm

@D.O: total medal count is not quite fixed; there are occasional dead heats, and when judging and doping scandals result in medals being swapped around things don't always add up. I'd really be impressed by any ability to predict such events.

[(myl) In fairness to Johnson, he's not claiming to predict such events, or to predict anything about the outcomes of individual competitions, but rather just the overall totals of medals (and gold medals) won by specific countries.]
D.O. said,

March 2, 2010 @ 5:50 am

Well, fairness to Johnson is not on my mind. The answer to the question in the post is (3) False advertising. The whole Beijing data is in this pdf. He did include 79 countries, 15 of which were not supposed to win any medals and 31 of which actually didn't. This is certifiable idiocy. By the way, Johnson does use Pearson's r as his measure.

The story does not end here. A simple summation shows that the Gerald L. Schlessman Professor of Economics and Director of Innovative Minds Program predicted many more medals in Beijing than were actually in play. And it is not a mollymooly effect (thank you for the correction). He predicted a nice round number of 900 medals, while his final roster gives 743. With gold medals he predicted just 10 more: 258 vs. 248 real (as you see, there is a mollymooly effect here: at least one non-gold medal took a hike unless, of course, some competition does not award the bronze as a regular matter). And no, I have no desire to go read his paper to find out what possessed him to do such a thing (it is not as if he does not care about absolute numbers; his tables also show the difference between predictions and actual counts by country).

Now, ShadowFox might be right that reporting Pearson's r in terms of percentages is not comme il faut. What would be a better intuitive measure? Probably, a reasonable measure of error can be the sum of absolute differences of medal counts divided by twice the total number of medals (twice in order to be sure that one cannot be more than 100% wrong). This measure is also stable with respect to enlarging the data set with irrelevant zeros. By that measure, G.L.S.P.of E. and D. of I.M.P. Johnson was ~26% wrong about total medals and ~29% wrong about gold medals for Beijing (I took the real medal counts for the denominators). And 18-21% (depending on what exactly goes into the denominator) wrong in total medals and 27-36% wrong on gold medals for Vancouver.

Sorry for the long comment.

[(myl) Thanks! But why try to protect against greater than 100% error? If I predict a value of 30, and the truth is 10, then the difference between my prediction and the true value is 200% of the true value. And if the sum of absolute errors is (say) 50% of the sum of true values, why discount that to 25%?

In calculating word error rate for speech recognition, it's traditional to add up substitutions, deletions, and insertions, and divide by the number of words in the correct transcript. It's obviously possible (and indeed easy) to get a WER greater than 100%.]
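
The two normalizations under discussion differ only by a factor of two, as this R sketch shows. The function names `halved_error` and `relative_error` and the data vectors are hypothetical, chosen for illustration; note that the halved version is only guaranteed to stay at or below 100% when the predicted and actual totals agree.

```r
# D.O.'s measure: sum of absolute errors over twice the actual total.
halved_error <- function(predicted, actual) {
  sum(abs(predicted - actual)) / (2 * sum(actual))
}

# The unhalved variant, analogous to word error rate; it can exceed 1.
relative_error <- function(predicted, actual) {
  sum(abs(predicted - actual)) / sum(actual)
}

# The single-value example from the reply: predict 30, truth is 10.
single_relative <- relative_error(30, 10)  # 2, i.e. 200%
single_halved   <- halved_error(30, 10)    # 1, i.e. 100%

# Hypothetical counts with equal predicted and actual totals (100 each).
predicted <- c(20, 25, 30, 15, 10)
actual    <- c(25, 18, 22, 20, 15)

table_relative <- relative_error(predicted, actual)  # 0.30
table_halved   <- halved_error(predicted, actual)    # 0.15

# Appending countries with zero predicted and zero actual medals
# changes neither measure.
padded_halved <- halved_error(c(predicted, 0, 0, 0), c(actual, 0, 0, 0))

print(c(single_relative, single_halved))
print(c(table_relative, table_halved, padded_halved))
```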
D.O. said,

March 2, 2010 @ 8:09 am

But why try to protect against greater than 100% error?

I still cannot overcome the idea that total medal count is essentially fixed. So if there is a predicted distribution to be compared with a real one, then the difference can be visualized as taking some medals from certain teams and then reapportioning them to the others. You basically cannot move more than 100% of the chips. This is, of course, not a mathematical argument, but an aesthetic one. But, after all, all linearly dependent measures are telling us the same thing.

By the way, though I personally think of a percent sign as just multiplication by 0.01, so that I could live even with (probably never used) notation like 3% meter (meaning 3 cm), it's not what most people think. And using percentages greater than 100 (though done very frequently) is very error-prone. Just about a week ago a local radio station reported that a certain high school in the area had 143% more students than normal, and another one something like 125% more, and so on. And this "more" was repeated for each of approximately 5 schools. Well, I honestly hope that in reality the schools are at most 43%, 25% and so on over.
Jerry Friedman said,

March 2, 2010 @ 6:07 pm

@D.O.: I agree: for popular use, 0% error should mean everything exactly right, and 100% error should mean everything exactly wrong. In this case, that would be that none of the countries predicted to win medals won any. Of course, no one would seriously have made a prediction like that; everyone knew the U.S., Germany, Canada, etc., were going to win some medals.

@Eugene van der Pijll: Maybe the right metric is some kind of comparison to a "baseline", an obvious simple prediction. The journalists could report results as "such-and-such percent better than the baseline. (The baseline is the simple prediction that each country would win the same number of medals as in the previous Winter Olympics.)" Of course, then a lot fewer predictions would be news.

How do you calculate that percentage? I don't know, but there's probably a clever way.
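
One possible answer, offered only as a sketch: forecasters often report a "skill score" of the form 1 minus the ratio of the model's error to the baseline's error. The choice of summed absolute error, the function name `skill_vs_baseline`, and all the numbers below are assumptions for illustration, not anything proposed in the thread or taken from real medal tables.

```r
# Fraction by which the model's total absolute error improves on the
# baseline's. Positive means better than baseline, 0 means no better,
# negative means worse.
skill_vs_baseline <- function(model, baseline, actual) {
  model_error    <- sum(abs(model - actual))
  baseline_error <- sum(abs(baseline - actual))
  1 - model_error / baseline_error
}

# Hypothetical counts, invented for illustration.
actual   <- c(25, 18, 22, 20, 15)  # this year's results
previous <- c(24, 29, 25, 22, 14)  # baseline: the previous games' counts
model    <- c(20, 25, 30, 15, 10)  # a model's predictions

skill <- skill_vs_baseline(model, previous, actual)
print(skill)  # about -0.67: the model does worse than the baseline
```

A journalist could then report the result as "such-and-such percent better (or worse) than simply repeating last time's table".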
Army1987 said,

March 3, 2010 @ 7:16 am

A more decent measure for that kind of prediction than the correlation coefficient would be a chi-squared test: $\sum_i (N_i - P_i)^2 / P_i$, where $P_i$ is the number of medals that guy predicted the $i$-th country to win, and $N_i$ is the number of medals the $i$-th country actually won.
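
A minimal R sketch of that statistic follows. The function name `chi_squared_stat` and the data are hypothetical, and dropping countries with a zero prediction (to avoid dividing by zero) is an assumption of this sketch, not part of the comment's proposal.

```r
# Sum over countries of (actual - predicted)^2 / predicted.
chi_squared_stat <- function(predicted, actual) {
  # ASSUMPTION: countries predicted to win zero medals are skipped,
  # since the formula would otherwise divide by zero.
  keep <- predicted > 0
  sum((actual[keep] - predicted[keep])^2 / predicted[keep])
}

# Hypothetical counts, invented for illustration.
predicted <- c(20, 25, 30, 15, 10)
actual    <- c(25, 18, 22, 20, 15)

stat <- chi_squared_stat(predicted, actual)
print(stat)  # 9.51
```

Unlike a correlation, this statistic grows with every medal of disagreement, so a prediction that is merely a rescaled version of the truth no longer scores as perfect.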
Ken Grabach said,

March 3, 2010 @ 10:16 am

What I want to know is, how did this alleged prediction work with regard to the Russian Olympic team in Vancouver and Whistler? They have climate, they have much effort and treasure invested in team training, and they have a home field advantage in four years. Are Johnson et al. on President Medvedev's hit list, too? Correlations aside, just askin'?
