Gelman and Picasso


I very much enjoyed Andrew Gelman's post "Bayesian statistical pragmatism" (4/15/2011) on his blog Statistical Modeling, Causal Inference, and Social Science. And one aspect of that post struck me as especially relevant to some recent LL discussions:

I am surprised to see Kass write that scientists believe that the theoretical and real worlds are aligned. It is from acknowledging the discrepancies between these worlds that we can (a) feel free to make assumptions without being paralyzed by fear of making mistakes, and (b) feel free to check the fit of our models (those hypothesis tests again! Although I prefer graphical model checks, supplanted by p-values as necessary). All models are false, etc.

I assume that Kass is using the word "aligned" in a loose sense, to imply that scientists believe that their models are appropriate to reality even if not fully correct. But I would not even want to go that far. Often in my own applied work I have used models that have clear flaws, models that are at best "phenomenological" in the sense of fitting the data rather than corresponding to underlying processes of interest–and often such models don't fit the data so well either. But these models can still be useful: they are still a part of statistics and even a part of science (to the extent that science includes data collection and description as well as deep theories).

I wrote the passage above nearly two months ago, in the middle of April, as the beginning of a post that I never finished. The "recent LL discussions" that I had in mind then were "Word order 'universals' are lineage-specific?", 4/15/2011, and "Phonemic diversity decays 'out of Africa'?", 4/16/2011; but Gelman's post also obviously applies to "Norvig channels Shannon contra Chomsky", 5/31/2011, and "Straw men and bee science", 6/4/2011.

And all of this reminds me of something I wrote seven years ago about an earlier controversy in computational phylogeny — "More on Gray and Atkinson", 4/28/2004:

In thinking about the general problem, an analogy with physics may be helpful. If we assume that the sun, planets and other heavenly bodies are point masses in calculating their orbital dynamics, our model is obviously false to fact. But does this simplification invalidate our conclusions? Well, it might or might not, depending on what calculations we do and what conclusions we want to draw. Any model of orbital dynamics will be simplified — and therefore false — to one extent or another. The question is whether this matters with respect to some specific quantitative or qualitative prediction. Giving a correct answer to that question requires a mixture of detailed mathematical reasoning, relevant empirical testing and luck.

One of Russell Gray's slides made this point by quoting the well-known scientific proverb that "A model is a lie that leads us to the truth". I believe that this was originally adapted (by whom?) from a remark made by Picasso:

"We all know that art is not truth. Art is a lie that makes us realize truth, at least the truth that is given us to understand. The artist must know the manner whereby to convince others of the truthfulness of his lies." (The Arts, Picasso Speaks, 1923)

Yesterday Russell Gray made considerable headway in convincing me of the validity of his approach. His talk, and the discussion around it, clarified for me the nature of the simplifying assumptions that he's making, and the (empirical and logical) questions to be addressed in determining whether those simplifications invalidate his conclusions about the dating of Indo-European. He also convinced me that he's continuing a serious program of efforts to test the effects of his assumptions, and that he's serious about understanding and addressing objections. In other words, he's doing science.


  1. jmmcd said,

    June 9, 2011 @ 8:26 am

    > In other words, he's doing science.

    Exactly. This is why the conflict between Chomsky and Norvig is unnecessary: there's room for both approaches. Science is a broad church.

  2. R. Wright said,

    June 9, 2011 @ 11:05 am

    Although I prefer graphical model checks, supplanted by p-values as necessary.

    I wonder if he meant "supplemented."

  3. Ben Hemmens said,

    June 9, 2011 @ 3:07 pm

    p-values "supplanting" things is genius. That's what they do. Usually they supplant what any fool can see with the naked eye, the unfortunate fact that the experiment has not revealed a clear effect, and is only significant if there is no unaccounted-for confounding effect that could change things by 10% (which of course in the real world, there always is).

    But on the subject of complicated vs. oversimplified hypotheses, the classic from my old field is the Monod-Wyman-Changeux model of allostery / cooperativity and the Koshland (for symmetry the Koshland-Némethy-Filmer) model.

    The phenomenon is about how the binding of one ligand to a protein affects the binding of further ligands (or additional molecules of the same ligand) to other sites on the same protein. There are plenty of examples where a protein flips fairly sharply from empty to full over a very narrow concentration range of the ligand, and this sharp switching is very important for many physiological processes as we know them.

    Long story short: the first model had the subunits of the protein flipping between two states. It had a fairly simple equation such that you actually had a prayer of estimating the constants involved experimentally. The other model allowed for a few intermediate states, which if you multiply them by the number of subunits, each taking its own path between them, leads to a lot of states of the whole protein. In theory it's truer and could cope with a wider range of mechanisms (modeling them more precisely), but in practice you don't have a chance of estimating the plethora of constants involved. 40-odd years later, the MWC model is a workhorse used in labs everywhere and the other one is a footnote in undergraduate textbooks.
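    As a rough illustration of how few constants the two-state model needs, here is a minimal sketch of its standard textbook saturation function. The function name, the parameter names (n, L, c, K_R) and all default values are assumptions chosen for illustration; none of them come from the comment above.

```python
def mwc_saturation(x, n=4, L=1000.0, c=0.01, K_R=1.0):
    """Fractional saturation Y of an n-site protein under the two-state (MWC) model.

    x   : free ligand concentration (same units as K_R)
    n   : number of binding sites (illustrative default)
    L   : ratio [T]/[R] of the two states with no ligand bound (illustrative)
    c   : K_R / K_T, ratio of the two states' dissociation constants (illustrative)
    K_R : dissociation constant of the high-affinity R state (illustrative)
    """
    alpha = x / K_R  # ligand concentration normalised by K_R
    numerator = (alpha * (1 + alpha) ** (n - 1)
                 + L * c * alpha * (1 + c * alpha) ** (n - 1))
    denominator = (1 + alpha) ** n + L * (1 + c * alpha) ** n
    return numerator / denominator


if __name__ == "__main__":
    # Print the binding curve at a few concentrations to see the switch-like rise.
    for x in (0.1, 1.0, 3.0, 10.0, 30.0, 100.0):
        print(f"x = {x:6.1f}   Y = {mwc_saturation(x):.3f}")
```

    With L set to 0 the expression collapses to the plain hyperbolic curve alpha / (1 + alpha), which is one way to see that the sharp switching comes from the equilibrium between the two states.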

    Back to language. Descriptivism is all fine and good. But when it comes to people learning (say) English as a foreign language, it's no good at all trying to tell them that well, you know, there's a spectrum of what is acceptable Standard English and normally it's this, but that is quite alright in its way and the other occurs in long-established varieties X and Y. No by god they want RULES and they want them NOW and they want them nice and simple. Which means: consciously telling them things that are, well, just wrong when viewed from up in the Pullumosphere is actually a necessary activity.

    Of course, we shall remember to tell the very advanced students that certain style guides are toxic little compendia of wrong grammatical advice. But the mass of people learning to use English for their everyday business have a long way to go before even picking up a real live style guide written for native speakers, and that way is plastered with many rough rules of thumb that (hopefully) help them for a little while before being outgrown.

    It's great that the learners' dictionaries, etc., now have an empirical basis, but for the learners, they are effectively bibles.

    So here's to the art of simplification.

  4. Ben Hemmens said,

    June 9, 2011 @ 5:29 pm

    for plastered (hic!) read paved

  5. Hermann Burchard said,

    June 9, 2011 @ 5:51 pm

    Regarding MYL quoting Andrew Gelman citing Stephen Cass(?): . . . that scientists believe that the theoretical and real worlds are aligned. It is from acknowledging the discrepancies between these worlds that we can . . .:

    This goes well with quoting myself (with abject apologies): "Reality is not what you think it is," from Found Sci [2005] where I write about the two worlds "L" for language and "U" for Universe. There is a double entendre here, which I still don't fully comprehend.

    Further quoting Stephen Cass referenced above by MYL (in "Norvig channels Shannon contra Chomsky") from "Unthinking Machines", Technology Review 5/4/2011, with an obligatory nod to Noam Chomsky: Some of the founders and leading lights in the fields of artificial intelligence and cognitive science gave a harsh assessment last night of the lack of progress in AI over the last few decades.

    The area of AI is too far from my own interests for me to contribute, but a possible suggestion lies in the above ideas concerning the two worlds, sharpened by the observation that our brain has an internal model of the world of U, located precisely in the world of L. This model has long been known in different neuro-psychological disciplines under various names, such as "global context" or "world model", and neural correlates for it may have become known recently; for a detailed survey see my 2nd paper in Found Sci [Jan 2011]. There I propose the name "noumenal cosmos" for our internal world model, in view of Kant's terminology of noumena vs. phenomena for mental vs. external things.

    To make further progress in AI, such a world model, even a primitive one, would have to be included and programmed, possibly as a kind of database with advanced features for mutual interrelations, perhaps using reconfigurable computing (FPGAs).

  6. jer said,

    June 9, 2011 @ 8:24 pm

    My first-year physics lecturer told us "Everything we teach you this year will be lies. Then next year we will show you how they were lies, then we'll tell you some more lies, but they'll be closer to the truth. And this will continue…" So this approach is explicitly acknowledged in some sciences.

    Every model we learned was presented with a clear analysis of the situations it applied to. No model is complete without its limits.

  7. David Green said,

    June 9, 2011 @ 9:44 pm

    A (perhaps) relevant quotation from Aristotle: "It is the mark of an amateur to insist on a greater degree of accuracy than the subject permits."

  8. Ben Hemmens said,

    June 10, 2011 @ 4:37 am

    The great majority of engineering works are based on "lies": Newtonian mechanics patched up with a ragbag of heuristic fudges.

    Just think about that next time you get in a plane or drive across a big bridge ;-)

