I'm still mulling over the blockbuster "culturomics" paper published in Science last week and ably addressed here by Geoff Nunberg and Mark Liberman. I'll have more to say about aspects of the paper having to do with the size of the English lexicon, but in the meantime let me direct you to my latest Word Routes column on the Visual Thesaurus, which takes up the more superficial question of nomenclature: both culturomics and ngram (as in the Ngram Viewer) are less than transparent to non-specialists (and even trouble some specialists). An excerpt follows.
The authors of the Science paper, "Quantitative Analysis of Culture Using Millions of Digitized Books" (free registration required), define culturomics as "the application of high-throughput data collection and analysis to the study of human culture." The culture part of culturomics is straightforward enough, but what about the -omics? Many observers in the wake of last week's publicity barrage have been stymied by that. The esteemed language expert David Crystal, for instance, initially surmised on his blog that culturomics is "presumably based on ergonomics, economics, and suchlike." Dan Clayton, a British language researcher (and friend of the VT), similarly speculated that the new word is "a blend of culture and economics, with a bit of linguistics thrown in."
Full disclosure: I was lucky enough to get a preview of the Science paper a couple of months ago from a presentation by the lead researchers, the young Harvard scholars Jean-Baptiste Michel and Erez Lieberman-Aiden, so by the time the paper was published last week I had advance warning about culturomics. And I already knew that it was intended to be pronounced with a long "o" (cultur-OH-mics), a clue that it has nothing to do with economics or ergonomics. Rather, the model is genomics: the study of organisms in terms of their full DNA sequences, or genomes.
Further disclosure: among my other comments on their paper presentation, I told Jean-Baptiste and Erez that I didn't think culturomics was the most felicitous choice for the new field of study they envisioned. The connection to genomics might be apparent to those in the biosciences who have already seen the proliferation of other words ending in -omics, such as proteomics, the study of the proteome (the full set of proteins encoded by a genome). This Wikipedia page lists a raft of other -omics topics, such as connectomics, interferomics, and transcriptomics. But despite the large number of -omics coinages in biology and allied sciences, a lay audience would not immediately pick up on the meaning of the suffix, especially if they only see the word in print rather than hearing the tell-tale long "o" sound.
Read the rest here.