Keith Chen at TED


"Saving for a rainy day: Keith Chen on language that forecasts weather — and behavior", TED Blog 2/19/2013:

Back when my first paper on this topic circulated, many linguists were appropriately skeptical of the work. Their concerns are concisely explained in two well-thought out posts (here and here) by the linguists Mark Liberman and Geoffrey Pullum on the blog they founded, Language Log. Mark and Geoffrey also invited me to write a guest post explaining the work. In that post, I discuss which of their possible concerns are unlikely given the patterns I find across the world in people’s savings and health behaviors, and also try to clarify which of their concerns I was not yet able to address.

This exchange prompted a broad set of discussions as to what different types of data, analyses and experiments could, in principle, answer the questions raised by the patterns I find. Cross-disciplinary discussions took place in a subsequent post by Julie Sedivy and followup posts by Mark Liberman, and also at the Linguistic Data Consortium’s 20th Anniversary Workshop. Several new avenues of investigation and work came out of these interactions, three of which are now ongoing projects.


  1. Rob P. said,

    February 20, 2013 @ 11:06 am

    Why the huge difference in FTR (sentence ratio) between Brazilian Portuguese and European Portuguese?

  2. Jason Merchant said,

    February 20, 2013 @ 12:42 pm

    This version of the paper indeed does a much better job of laying out and explaining the linguistic data claims, and of substantiating them—especially the nice work reported in Appendix B: though I note with some surprise that Dutch and German are reported to have 0% of verbs in weather reports using future markers (presumably Du "worden" and Ger "werden"; a quick entirely random googling today yielding 20% (1/5) for Dutch here, and I heard at least three German "werden"s during this oral report today; the Greek here is in line with the paper's claims about that language, though). Another way to independently test the Eurotyp results, other than the clever scraping of weather reports, would be to examine the distribution of the future markers in embedded clauses selected for by verbs meaning "predict", etc. (Pretty straightforward, and a quick google for German and Greek gives encouraging and expected results, though German is far from zero for use of "werden".)

    But, as interesting as these distinctions are, I remain unconvinced that the correlations with language type are anything but indirect. The work is careful, and the obvious objections are overcome by using within-country regressions: "Switching to within-country regressions, I compare individuals with identical income, education, family structure, and countries of birth, but who speak different languages". These are much more convincing than across-country comparisons, exactly because there are presumably larger cross-country cultural differences which, while difficult to measure, seem more plausible candidates for explaining the economic behaviors than language is, and, in my view, would have to be eliminated before the conclusion can be considered established.

    My reservation about the within-country comparisons is that, at least in the cases I know a little something about, namely Belgium, Switzerland, and Cyprus, the variables controlled for (listed above) would not in fact capture the substantial, but harder to measure, cultural differences that exist within the populations of these countries. These cultural differences do, however, correspond fairly closely with language: so the Flemish and Walloons do in fact have distinct family cultures, behavioral norms, etc., and I suspect that these elements influence economic behavior to a much greater degree than the Dutch or French they speak do. Establishing this quantitatively, of course, would be a difficult task, and one that I would look to a quantitative sociologist for.

    A couple other ways to control for the role of language that occur to me (none without difficulties):
    1. Look at bilingual populations whose L1 is the same as an L1 of an equivalent monolingual population (Finland Swedes vs Swedes in Sweden, e.g.).
    2. Look at cases of in situ language shift (hard to think of good examples where confounding factors aren't large: I'm thinking of cases like the Vlach or Arvanite populations in Greece, perhaps).
    3. Look at matched cases of immigration from a single group to two different new L1s (e.g., Ethiopian immigrants to Flanders vs Wallonia).

  3. Jason Merchant said,

    February 20, 2013 @ 12:45 pm

    (That should be Dutch "zullen", of course; in the cited weather report, it appears in its inflected form "zal" in the fourth sentence.)

  4. Keith Chen said,

    February 20, 2013 @ 4:17 pm

    On the question of Brazilian Portuguese vs. European Portuguese:

    That's a question I am not sure I have a good answer to, but the difference in FTR frequency is quite stark in weather forecast data. So, for example, a Brazilian weather forecast will read:

    "Neste sábado, o sol aparece entre muitas nuvens no leste de Santa Catarina e do Paraná…"

    where using "aparece" in an unmarked form is much less common in EU Portuguese forecasts…

  5. Eugene van der Pijll said,

    February 20, 2013 @ 6:46 pm

    @Jason: The German weather forecast does support the quoted percentage; in the entire Wetteraussichten of 20.02.2013, 23:09, I haven't found a single future tense in more than 2 minutes. Note that 'werden' is not only used to mark the future tense; it also means 'to become', and is used that way several times. E.g. at 1:05, "wobei dann allerdings die Wolkenluecken hier im Westen auch kleiner werden", "in which case then, however, the cloud gaps become smaller here in the west as well". If it had been "…kleiner sein werden", it would have been a future tense "will be smaller".

    Your Dutch link goes to a Belgian site; I don't know if any distinction has been made between Dutch and Belgian sources; the Flemish often use different phrases; I can imagine that forecasts by the official weather institute (in Brussels) would be heavily influenced by (or even translated from) the local French language. The savings statistics are on a national level, and so the analysis should keep the Netherlands and Belgium separate.

    At this moment, under the heading "vannacht" (tonight) on the page you link to, there are two future markers: "Het gaat overal vriezen", "It is going to freeze everywhere"; and "Door de matige wind uit oostnoordoost tot noordoost zal de gevoelstemperatuur in het centrum rond -5 graden liggen. ", "Because of moderate wind from ENE to NE, the wind chill temperature will lie around -5 degrees in the center."

    I haven't found any such future marker in the current forecast on the site of the Dutch official institute, but there is one at one of the largest commercial weather offices (Meteo Consult). I have no idea if these sites are representative of a typical Dutch weather report.

  6. Jason Merchant said,

    February 20, 2013 @ 7:36 pm

    @Eugene: The report you watched is a different (newer) Wettervorhersage: the one I watched had a woman in the studio talking to a weather reporter outside in the wind, and the reporter used "werden" as the future auxiliary at least three times (one was in "Donnerstag wird es Schnee geben" "Thursday there will be snow" and the other two I don't remember, and I didn't watch the bulletin to the end). While I didn't count the total number of sentences or verbs, the percentage wasn't 0%. The Belgian one has also changed since this morning, but the one I remember was the "…zal … liggen." This would be perfectly normal Dutch in the Netherlands, just as well as in Belgium (which is good, since Chen reports that they narrowed the Google searches by using Google's language filter, which lumps Belgian and Dutch webpages together). In either case, there certainly is a difference between Dutch/German on the one hand and Greek on the other: it's just not as categorical as Chen's numbers might lead someone who doesn't speak all three languages to believe.

  7. David J. Littleboy said,

    February 20, 2013 @ 11:30 pm

    So does all this mean that when a people's savings rate changes significantly (e.g. as Japan's has over the period I've lived here), you'll see a change in how they use future tenses?

    (Personally, this sounds to me like fodder for an Ig Nobel, but that's just my opinion.)

  8. Adrian said,

    February 23, 2013 @ 7:17 am

    Today's BBC News article: I still find the whole thing quite ridiculous.
