## Industrial bullshitters censor linguists

A bullshit lie detector company run by a charlatan has managed to semi-successfully censor a peer-reviewed academic article. And I don't like it one bit. But first, some background, and then we'll get to the censorship stuff.

Five years ago I wrote a Language Log post entitled "BS conditional semantics and the Pinocchio effect" about the nonsense spouted by a lie detection company, Nemesysco. I was disturbed by the marketing literature of the company, which suggested a 98% success rate in detecting evil intent of airline passengers, and included crap like this:

The LVA uses a patented and unique technology to detect "Brain activity finger prints" using the voice as a "medium" to the brain and analyzes the complete emotional structure of your subject. Using wide range spectrum analysis and micro-changes in the speech waveform itself (not micro tremors!) we can learn about any anomaly in the brain activity, and furthermore, classify it accordingly. Stress ("fight or flight" paradigm) is only a small part of this emotional structure

The 98% figure, as I pointed out, and as Mark Liberman made even clearer in a follow-up post, is meaningless. There is no type of lie detector in existence whose performance can reasonably be compared to the performance of fingerprinting. It is meaningless to talk about someone's "complete emotional structure", and there is no interesting sense in which any current technology can analyze it. It is not the case that looking at speech will provide information about "any anomaly in the brain activity": at most it will tell you about some anomalies. Oh, the delicious irony, a lie detector company that engages in wanton deception.

## Another departure

I learn here that John McIntyre (whose name has often come up in these parts) has now left the Baltimore Sun. Yet another language writer on a newspaper (who was not merely retailing peeves; quite far from that, in John's case) to bite the dust. I hope that we will hear from him in another venue soon.

[(myl) It didn't take long: as of April 30, 2009, John was blogging again at http://johnemcintyre.blogspot.com/, still under the title "You Don't Say". Welcome back! ]

## A BIG baseball book

A little while back, a representative of the publishers of the third edition of Paul Dickson's Baseball Dictionary wrote to offer me a free copy, in the hope that I would review the book on Language Log. I replied that I was an idiot about baseball — yes, I know, this totally undercuts any claim I might have to being a real American man, but I coped with that long ago — and so was not the person they wanted to take on this task.

But I did buy the book, because I knew that Dickson's dictionary was a work of serious lexicographic scholarship (with careful citations and thoughtful definitions, the sort of thing that could be accommodated in a revision of the OED). Many specialized dictionaries are not like this, and for good reason: in many domains, the evidence for usages in written texts is very hard to come by, and very spotty.

Irving John "Jack" Good, who died on April 5 at the age of 92, is best known to linguists as the author of a paper on mathematical ecology. The paper is I.J. Good, "The Population Frequencies of Species and the Estimation of Population Parameters", Biometrika 40(3-4) 237-264 (1953), and its abstract reads as follows:

A random sample is drawn from a population of animals of various species. (The theory may also be applied to studies of literary vocabulary, for example.) If a particular species is represented r times in the sample of size N, then r/N is not a good estimate of the population frequency, p, when r is small. Methods are given for estimating p, assuming virtually nothing about the underlying population. The estimates are expressed in terms of smoothed values of the numbers n_r (r = 1, 2, 3, …), where n_r is the number of distinct species that are each represented r times in the sample. (n_r may be described as 'the frequency of the frequency r'.) Turing is acknowledged for the most interesting formula in this part of the work. An estimate of the proportion of the population represented by the species occurring in the sample is an immediate corollary. Estimates are made of measures of heterogeneity of the population, including Yule's 'characteristic' and Shannon's 'entropy'. Methods are then discussed that do depend on assumptions about the underlying population. It is here that most work has been done by other writers. It is pointed out that a hypothesis can give a good fit to the numbers n_r but can give quite the wrong value for Yule's characteristic. An example of this is Fisher's fit to some data of Williams's on Macrolepidoptera.
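For readers who want to see the idea in action, here is a minimal Python sketch of the estimate usually credited to Turing in this abstract, using the standard Good-Turing adjusted count r* = (r + 1) · n_{r+1} / n_r, where n_r is the "frequency of the frequency r". The function names and the toy sample are my own, and the sketch uses raw counts rather than the smoothed values the abstract calls for, so treat it as an illustration and not as Good's actual procedure.

```python
from collections import Counter


def frequency_of_frequencies(sample):
    """Map r -> n_r, the number of distinct species seen exactly r times."""
    species_counts = Counter(sample)
    return Counter(species_counts.values())


def adjusted_count(r, n):
    """Unsmoothed Turing estimate r* = (r + 1) * n_{r+1} / n_r."""
    if n.get(r, 0) == 0:
        raise ValueError(f"no species was observed exactly {r} times")
    return (r + 1) * n.get(r + 1, 0) / n[r]


def unseen_mass(sample):
    """Estimated total probability of species absent from the sample: n_1 / N."""
    n = frequency_of_frequencies(sample)
    return n.get(1, 0) / len(sample)


def coverage(sample):
    """Estimated share of the population belonging to species in the sample."""
    return 1.0 - unseen_mass(sample)


if __name__ == "__main__":
    # Toy "vocabulary" sample (hypothetical data): 10 tokens, 6 types.
    words = "a a a b b c c d e f".split()
    n = frequency_of_frequencies(words)
    print("n_r:", dict(n))
    print("unseen mass:", unseen_mass(words))
    for r in sorted(n):
        print(r, "->", adjusted_count(r, n))
```

On the toy sample the adjusted count for the most frequent type collapses to zero, because no type occurs four times. That is exactly the sort of problem that makes smoothing the n_r necessary in practice.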

## Ask Language Log: "The first" ambiguity

James Dreier wrote:

Your posting [about and ambiguity] made me remember that I had a question, also involving ambiguity, though I think this one is quite a bit harder. "Who was the first president born in the twentieth century?"

JFK was born 5/29/1917
LBJ was born 8/27/1908

Thus JFK was president first, but LBJ was born first.

The sentence is, of course, a trivia question — it appeared in GAMES magazine. A reader wrote in to complain that the magazine had given the wrong answer (they said it was JFK). My view is that the question is genuinely ambiguous, but I don't know how to argue for this conclusion.

## Popular perceptions of lexicography: MADtv edition

Last December, an episode of Comedy Central's "Sarah Silverman Program" revolved around fanciful neologisms, culminating in a scene where the editors of the Oxford English Dictionary anoint their latest entries in a "Word Induction Ceremony." The FOX sketch comedy show "MADtv" (now in its final season) imagines the lexicographers of "Webster's Dictionary" announcing new words in a far less celebratory mood. Here (for the time being, at least) is a YouTube clip bringing together the three-part sketch and one outtake:

## Ask Language Log: an and ambiguity

In this morning's mail:

My friend and I are avid Language Log readers. We were recently conversing over IM, and she was telling me about her boyfriend's great-aunt. Among the things she mentioned:

"She worked when women didn't work very much and never got married."

I interpreted her statement as my friend alluding to a time when women both didn't work and did not get married. After a few moments, I realized she was telling me that the great-aunt had a job and never got married; "when women" only modified "didn't work very much." We are unsure which reading is technically correct and therefore decided to ask. Any insight you could provide would be greatly appreciated.

I'm not a syntactician, but I usually take the morning shift here at Language Log Plaza, so I'll do my best with this one — luckily, it seems pretty straightforward.

## The syntacticians' hotel

… or possibly the computational complexity theorists'. In any case, the NP Hotel (also known as the N.P. Hotel), on 6th Ave. S. in Seattle:

## Preventing Explanatory Neurophilia

A paper that I've recommended several times: Deena Skolnick Weisberg, Frank C. Keil, Joshua Goodstein, Elizabeth Rawson, & Jeremy R. Gray, "The seductive allure of neuroscience explanation", Journal of Cognitive Neuroscience 20(3): 470-477, 2008. Popular presentations can be found in an article by Paul Bloom in Seed Magazine, "Seduced by the Flickering Lights of the Brain", 6/27/2006, and in two LL posts, "Blinded by neuroscience", 6/28/2006, and "Distracted by the brain", 6/6/2007.

## Now anyone can watch The Linguists

As I announced on Thursday, David Harrison was just here in the San Diego wing of Language Log Plaza to screen and discuss the film The Linguists, at UC San Diego on Thursday and at San Diego State University on Friday. Both events were hugely successful — a fantastic turnout of around 150 people at each screening. David then headed to Rutgers University (my graduate school alma mater, as it happens) for a similar event during Rutgers Day on Saturday, where I'm sure the turnout was also great.

In case you missed all of these screenings, or if your PBS station didn't air it (or you don't even have a PBS station!), or if you just want to see it again, the film is streaming for a limited time at Babelgum. Click and watch!

## Comic profanity

Two items: a Rubes cartoon (by Leigh Rubin) on avoidance characters in cartoons, and a story from a while back on taboo vocabulary in a Batman comic.

## Misnegation in the Encyclopedia Britannica

Breffni O'Rourke has contributed a lovely specimen to our growing collection of cases where combinations of negations and scalar predicates leave writers and readers in a state of confusion. This one is from the EB section on the 14th and 15th centuries in Ireland (full path "Ireland:History:First centuries of English rule (1166-1600):The 14th and 15th centuries"):

Although both the Gaels and the Anglo-Irish had supported the Yorkist side in the Wars of the Roses, the Yorkist king Edward IV found them no less easy to subjugate than had his Lancastrian predecessors. Succeeding in 1468 in bringing about the attainder and execution for treason of Thomas, earl of Desmond, Edward was nevertheless obliged to yield to aristocratic power in Ireland. The earls of Kildare, who thereafter bore the title of lords deputy (for the English princes who were lords lieutenant), were in effect the actual rulers of Ireland until well into the 16th century.

## Conditional entropy and the Indus Script

A recent publication (Rajesh P. N. Rao, Nisha Yadav, Mayank N. Vahia, Hrishikesh Joglekar, R. Adhikari, and Iravatham Mahadevan, "Entropic Evidence for Linguistic Structure in the Indus Script", Science, published online 23 April 2009; also supporting online material) claims a breakthrough in understanding the nature of the symbols found in inscriptions from the Indus Valley Civilization.

Two major types of nonlinguistic systems are those that do not exhibit much sequential structure (“Type 1” systems) and those that follow rigid sequential order (“Type 2” systems). [...] Linguistic systems tend to fall somewhere between these two extremes [...] This flexibility can be quantified statistically using conditional entropy, which measures the amount of randomness in the choice of a token given a preceding token. [...]

We computed the conditional entropies of five types of known natural linguistic systems [...], four types of nonlinguistic systems [...], and an artificially-created linguistic system [...]. We compared these conditional entropies with the conditional entropy of Indus inscriptions from a well-known concordance of Indus texts.

We found that the conditional entropy of Indus inscriptions closely matches those of linguistic systems and remains far from nonlinguistic systems throughout the entire range of token set sizes.
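To make the quantity concrete, here is a small Python sketch of conditional entropy as described in the passage above: the average uncertainty, in bits, about a token given the token that precedes it, H(Y|X) = −Σ p(x, y) log₂ p(y|x). The function name and the toy sequences are my own, and the sketch uses plain relative-frequency estimates from bigram counts, which is an assumption on my part and not necessarily the estimator the authors used.

```python
import math
from collections import Counter


def conditional_entropy(sequences):
    """Estimate H(next token | previous token) in bits from bigram counts."""
    pair_counts = Counter()   # counts of (previous, current) pairs
    prev_counts = Counter()   # counts of each token in the "previous" slot
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1

    total = sum(pair_counts.values())
    if total == 0:
        raise ValueError("need at least one sequence with two or more tokens")

    h = 0.0
    for (prev, cur), count in pair_counts.items():
        p_joint = count / total             # p(prev, cur)
        p_cond = count / prev_counts[prev]  # p(cur | prev)
        h -= p_joint * math.log2(p_cond)
    return h


if __name__ == "__main__":
    # Rigid order: every token fully determines its successor.
    print(conditional_entropy(["abcabcabc"]))
    # Looser order: after "a", either "b" or "c" is equally likely.
    print(conditional_entropy(["ab", "ac"]))
```

A rigidly ordered system comes out at zero bits, and a system where each token tells you nothing about the next comes out near the maximum for its token set; the paper's claim is about where the Indus inscriptions fall between those two poles.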

So proclaims the cover of Michel Brûlé's "Essai sociologique" Anglaid: Une langue irrémédiablement vouée à l’impérialisme et à l’ethnocentrisme ("English: A language irremediably devoted to imperialism and ethnocentrism"), in a photographed scrawl that reminds me of the shots in the movie A Beautiful Mind of John Nash's study walls during his descent into schizophrenia.

