## More clbuttic idiocy from lexical censors on the web

According to Matthew Moore in the Daily Telegraph:

Google searches turn up 3,810 results for "clbuttic", 5,120 for "consbreastution", and 1,450 for "Buttociated Press".

Well, Language Log readers who had already read about the athletic feats of Olympic star Tyson Homosexual will immediately recognize the clbuttic symptoms, and will know what has gone on here. Surely, I was moved to think (but see the update below), surely someone who is being paid for writing filtering software should be able to distinguish instances of ass preceded and followed by other letters from instances flanked by non-letters such as spaces or punctuation. Not to get too nerdy about it, but for those acquainted with Unix editors like vi or sed, shouldn't a programmer know the difference between the `s/ass/butt/g` command (wrong) and the perhaps slightly more reasonable `s/\([^a-z]\)ass\([^a-z]\)/\1butt\2/g` instruction? This much was within the competence of even rank beginners by the sixth week of the linguistically-based freshman course on Unix that I used to teach at UC Santa Cruz.
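For readers who would rather see the difference than take it on trust, here is a minimal sketch in Python rather than sed. The function names are made up for illustration, and the third variant, which uses lookarounds, is my own addition and not something any actual filter is known to use. It is there only to show why the flanked version deserves no more than "perhaps slightly more reasonable": that pattern consumes the neighboring characters, so it misses a word at the very start or end of the text, and the second of two adjacent occurrences.

```python
import re


def naive_censor(text: str) -> str:
    """Replace every occurrence of the string, like sed's s/ass/butt/g."""
    return text.replace("ass", "butt")


def flanked_censor(text: str) -> str:
    """Replace only when flanked by non-lowercase-letters on both sides,
    like sed's s/\\([^a-z]\\)ass\\([^a-z]\\)/\\1butt\\2/g.
    The flanking characters are captured and put back."""
    return re.sub(r"([^a-z])ass([^a-z])", r"\1butt\2", text)


def boundary_censor(text: str) -> str:
    """Hypothetical refinement: lookarounds test the neighbors without
    consuming them, so start/end of text and adjacent matches work."""
    return re.sub(r"(?<![a-z])ass(?![a-z])", "butt", text)


if __name__ == "__main__":
    for sample in ["a classic pass", "what an ass.", "ass", "an ass ass here"]:
        print(repr(sample))
        print("  naive:   ", naive_censor(sample))
        print("  flanked: ", flanked_censor(sample))
        print("  boundary:", boundary_censor(sample))
```

All three are case-sensitive, and the character class `[^a-z]` treats capital letters as non-letters, so even the last variant would mangle "Pass" at the start of a sentence. A serious filter would need more care than any of these.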

Yet Moore mentions sites on which you can see discussions of embbutties dealing with pbuttport holders and even unconsbreastutional laws pbutted by a Congress butterting powers to buttbuttinate foreign leaders.

(He also brings up at the end the question of what is more offensive than what. Does it really sound less offensive for me to call these incompetent lexical censors buttholes rather than assholes?)

There is a link at the end of Moore's article where you can cite other cases you have seen. I leave that as a place for your comments rather than open comments below.

Update: I will just append this comment from Roger Lustig, however, since he has done a little investigation that very significantly undercuts Moore's article:

The "clbuttics" story may be a little exaggerated if not actually a web-legend. Sure, Google returns 4,000 hits–but by the time one reaches p.2 (in search of a page that isn't reporting on the silliness, or reporting on the reports, etc.) we're down to 200 hits.

Almost all of those 200 seem to have a "clbuttic mistake" by Apple at their core. Google's redundancy-compacting routines are only invoked when requested, it seems, and even then, the variety of information in 200 hits may be small.

In short, it's an echo chamber. 200 or 4,000 or however many hits today aren't as impressive as the same number last year, etc. All the more so as web sites of all kinds put randomly chosen (even Googled!) words out there just to game Google.

Many thanks to Roger for reminding us that (Daily Telegraph please note) one really must not believe everything one Googles! So much of what is out there turns out on closer examination to be humor, duplicate pages, quotations, mentions of errors in blog posts like this one, and so on. See this page on The Daily WTF (which Rachael Churchill also pointed out to me) for some information on where clbuttic came from. (The phrase "regex scripts", which appears there, means editing scripts that use the device of regular expressions in the manner of sed, mentioned above.) It begins to look as if Matthew Moore may have taken the idea for his article from this WTF page (which dates from last February) without acknowledgment, and did no investigation of the hits turned up by his web searches. Another case of lazy journalist syndrome.

Unfortunately there is now a new thread on The Volokh Conspiracy, triggered by the pre-update version of this very post, on which people are beginning to repeat dubious old stories, like the apocryphal case of a newspaper talking about getting the budget back in the African American, which seems never to have happened. For a thorough and scholarly post on this and related topics, see Ben Zimmer's "Incorrections in the newsroom: Cupertino and beyond".