{"id":429,"date":"2008-08-01T14:55:53","date_gmt":"2008-08-01T18:55:53","guid":{"rendered":"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=429"},"modified":"2008-08-02T11:16:32","modified_gmt":"2008-08-02T15:16:32","slug":"pharyngula-minutes","status":"publish","type":"post","link":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=429","title":{"rendered":"Pharyngula minutes"},"content":{"rendered":"<p>A graph of the current Google hit counts for \"N minutes\", 2 \u2264 N \u2264 66, expressed as a proportion of the total hits for all 65 searches, looks like this:<\/p>\n<p><a href=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/Minutes0.gif\"><img decoding=\"async\" src=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/Minutes0.gif\" alt=\"\" width=\"475\" \/><\/a><\/p>\n<p>(As usual, click on the image for a larger version.)<\/p>\n<p><!--more--><\/p>\n<p>There seem to be several different things going on here: a preference for multiples of 5, 10, and 15; a preference for smaller numbers; and perhaps some other factors as well.<\/p>\n<p>You could model this sort of distribution with the kind of approach that Tenenbaum and Griffiths used to predict how people generalize from (small) sets of numbers (\"<a href=\"http:\/\/web.mit.edu\/cocosci\/Papers\/tenenbaum_griffiths01.pdf\">Generalization, similarity, and Bayesian inference<\/a>\", <em>Behavioral and Brain Sciences<\/em>, 24: 629-641, 2001). In fact, for exploring the structure of certain kinds of cognitive\/cultural spaces, web-text counts might be better than lab-subject responses, since web text samples are certainly larger and arguably more representative. (There are plenty of obstacles &#8212; certainly Google counts are not really reliable enough for this purpose &#8212; but never mind for now&#8230;)<\/p>\n<p>Distributional patterns of this kind can also be used to explore cognitive and cultural differences. 
For example, if we compare the overall web frequency of N = 5, 10, 15, &#8230;, 65 in \"N minutes\" with the relative frequency of N in the Google search patterns {X \"N minutes or less\"} for X = \"recipe\", \"learn\", and \"muscles\", we get this:<\/p>\n<p><a href=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/Minutes1.gif\"><img decoding=\"async\" src=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/Minutes1.gif\" alt=\"\" width=\"475\" \/><\/a><\/p>\n<p>The distributions for N in {learn \"N minutes or less\"} (e.g. \"Tech savvy in 15 minutes or less\") and {muscles \"N minutes or less\"} (e.g. \"instant fitness workout in 15 minutes or less\") follow the general web distribution for {\"N minutes\"} (the yellow line) pretty closely, except for a bit of enrichment at N=30 (and maybe a slight muscle bulge at N=20). But {recipe \"N minutes or less\"} differs more from the background: the counts for N=5 and N=10 are quite a bit lower, while N=30 is much higher.<\/p>\n<p>The dearth of recipes at N=5 and N=10, I speculate, reflects a victory of realism over marketing: readers really will try out recipes, and notice if they take a lot more time than advertised. And the recipe sweet spot at N=30 probably reflects the fact that 30 minutes is the largest quantum of time that most people still generally view as small.<\/p>\n<p>But anyhow, it's not because of recipe marketing that I decided to devote this morning's Breakfast Experiment\u2122 to the contextual distribution of N in \"N minutes\" &#8212;\u00a0 instead, you can blame it on PZ Myers. Reading the Pharyngula archives a few days ago, it seemed to me that Prof. 
Myers (and his commenters) used a few values of \"N minutes\" &#8212; especially <a href=\"http:\/\/www.google.com\/search?hl=en&amp;safe=off&amp;q=site%3Ascienceblogs.com\/pharyngula+\">N=5<\/a> and <a href=\"http:\/\/www.google.com\/search?hl=en&amp;safe=off&amp;q=site%3Ascienceblogs.com%2Fpharyngula+%2210+minutes%22&amp;btnG=Search\">N=10<\/a> &#8212; unusually often and in a somewhat characteristic way.<\/p>\n<p>Now, I've pointed out in the past that the verbal tics that we perceive as \"characteristic\" of particular people are often very low-frequency events (see e.g. \"<a href=\"http:\/\/itre.cis.upenn.edu\/~myl\/languagelog\/archives\/000648.html\">And yet<\/a>\", 3\/28\/2004; \"<a href=\"http:\/\/itre.cis.upenn.edu\/~myl\/languagelog\/archives\/001093.html\">Per usual<\/a>\", 6\/21\/2004;  \"<a href=\"http:\/\/itre.cis.upenn.edu\/~myl\/languagelog\/archives\/002100.html\">Strange bookfellows<\/a>\", 5\/26\/2005; \"<a href=\"http:\/\/itre.cis.upenn.edu\/~myl\/languagelog\/archives\/003123.html\">Deep in the Hookergate weeds<\/a>\", 5\/8\/2006; \"<a href=\"http:\/\/itre.cis.upenn.edu\/~myl\/languagelog\/archives\/005059.html\">Cold comfort for whomever<\/a>\", 10\/26\/2007). On the other hand, we've also seen that subjective estimates of relative frequency can be way off the mark, quantitatively (\"<a href=\"http:\/\/itre.cis.upenn.edu\/~myl\/languagelog\/archives\/001518.html\">What 'a hundred times' means<\/a>\", 10\/2\/2004) and even qualitatively (\"<a href=\"http:\/\/itre.cis.upenn.edu\/~myl\/languagelog\/archives\/001778.html\">Near? 
Not even close<\/a>\", 1\/2\/2005).<\/p>\n<p>In this case, a quick check suggests that my subjective reaction fell into the \"complete crock\" category:<\/p>\n<p><a href=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/Minutes2.gif\"><img decoding=\"async\" src=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/Minutes2.gif\" alt=\"\" width=\"475\" \/><\/a><\/p>\n<p>Overall, Pharyngula's distribution of values for N in  \"N minutes\" seems to be reasonably close to the overall web norm &#8212; certainly closer than the distribution of N for recipes executable in \"N minutes or less\" is. There's a small pharyngular excess at N=5 and N=10, and a small deficit at N=30, but the differences don't seem very impressive.<\/p>\n<p>But wait &#8212; if we plot the data a little differently, the differences look bigger, and it seems more plausible that I might have been picking up on something real:<\/p>\n<p><a href=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/Minutes3.gif\"><img decoding=\"async\" src=\"http:\/\/languagelog.ldc.upenn.edu\/myl\/Minutes3.gif\" alt=\"\" width=\"475\" \/><\/a><\/p>\n<p>OK, looking at graphs is all very well, but what's the Right Way to evaluate a hypothesis like \"text at scienceblogs.com\/pharyngula has a different distribution of values for N in 'N minutes' than the web at large does\"? (I mean, besides saying \"who cares?\" and moving on.) 
This is trickier than you might think, especially if at this point you're trying to remember how to perform a Kolmogorov-Smirnov two-sample test, because it's not easy to be precise enough about what question(s) we really want to answer.<\/p>\n<p>But I've already run out of breakfast time for today &#8212; in fact, I had to finish this write-up over lunch &#8212; so my (attempt to provide an) answer will have to wait for another morning.<\/p>\n<p>[Update:\u00a0 PZ Myers responds with that characteristic artefact of the internet age, an overt attempt to elicit the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Observer%27s_paradox\">Observer's Paradox<\/a> (\"<a href=\"http:\/\/scienceblogs.com\/pharyngula\/2008\/08\/its_42_minutes_after_7.php\">It's 42 minutes after 7<\/a>\", 8\/2\/2008).\u00a0 But it's too late, PZ, unless you go back and edit your copious archives! And really, it could be worse: consider the <a href=\"http:\/\/itre.cis.upenn.edu\/~myl\/languagelog\/archives\/000361.html\">sad fate of Ronald Frobisher<\/a>.]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A graph of the current Google hit counts for \"N minutes\", 2 \u2264 N \u2264 66, expressed as a proportion of the total hits for all 65 searches, looks like this: (As usual, click on the image for a larger
version.)<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-429","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/429","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=429"}],"version-history":[{"count":0,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/429\/revisions"}],"wp:attachment":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=429"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=429"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=429"}],"curies":[{"name":"wp","href":"https:\/\/api
.w.org\/{rel}","templated":true}]}}