I don't understand spell-checkers


Steffi Lewis asked whether this sentence (which, as she says, is attributed to Chico Marx) is well analyzed: Time flies like an arrow; fruit flies like a banana.

I answered as follows (with apologies to syntacticians for the casual low-class nontechnical description):

In the sensical version of the sentence, "time" is a noun phrase and "flies like an arrow" is a verb phrase (with "like an arrow" an adverbial modifier of the verb "flies"), while "fruit flies" is a noun phrase and "like a banana" is a verb phrase (with "a banana" as the object of the verb "like").  In the nonsensical version of the sentence, you just reverse those two analyses.


The system I was typing the response on uses a spell-checker, which objected to sensical, and I can't really blame it for that, because I sort of made it up…although I got three hits for it when I googled it just now, so (as I already knew) I'm obviously not the only person to make up that word, and besides, I find that there's an obsolete word sensical in the Oxford English Dictionary.  Anyway, the spell-checker's complaint about sensical didn't bother me.  But it also objected to analyses, and this seems very weird.  I assume it wanted analysis instead; but can someone more expert in spell-checking than I am tell me why on earth the spell-checker wouldn't be trained to recognize the plural?  What did it expect, analysises?  Maybe it did: I just googled that, and got 76,100 hits for it.  But at least Google asked if I meant to google analyses instead, and for analyses I got 68,000,000 hits.  So if Google knows about analyses, why doesn't the spell-checker?  (I am sorry to have to report that the Language Log spell-checker is also objecting to analyses, which it has underlined in red on every occurrence in my draft of this post.  The shame!)  (It doesn't like analysises either.  So I conclude that spell-checkers don't want you to have more than one analysis.)



40 Comments

  1. Mark Liberman said,

    September 4, 2009 @ 10:53 am

    WordPress is one of the rare pieces of free software that's worth every penny you pay for it.

    At least, that's true of the portions of code that manage the interface for posting and commenting.

  2. Mark P said,

    September 4, 2009 @ 10:56 am

    Do your spell checkers like theses?

  3. Oskar said,

    September 4, 2009 @ 10:58 am

    I can't imagine this being anything other than a word-list issue: the dictionary simply didn't have an entry for "analyses" (which I can happily report that the Google Chrome spell-check dictionary does), and "analysis" was right there at a Levenshtein distance of 1, so that's what it suggested.

    While I don't know this for a fact, I seriously doubt that spell-checkers try to pluralize a word algorithmically to check its correctness. There are so many exceptions and different ways to do that that it is simpler to just have one big master list, which includes both the singular and plural forms of the nouns.
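The word-list explanation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `WORD_LIST` is a made-up toy list that, like the checker in the post, happens to lack "analyses", and `suggestions` is a hypothetical helper, not the code of any real spell-checker.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Hypothetical toy word list (an assumption for illustration): no "analyses".
WORD_LIST = {"analysis", "analyze", "analyzes", "thesis", "theses"}


def suggestions(word: str, max_distance: int = 1) -> list[str]:
    """Listed words within max_distance edits of an unlisted word."""
    if word in WORD_LIST:
        return []  # already accepted, nothing to suggest
    return sorted(w for w in WORD_LIST if levenshtein(word, w) <= max_distance)


print(suggestions("analyses"))  # → ['analysis', 'analyzes']
```

With this toy list, both "analysis" and "analyzes" sit one substitution away from "analyses", so a checker that ranks candidates purely by edit distance has no principled way to prefer one over the other.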

  4. Bloix said,

    September 4, 2009 @ 11:12 am

    Does your nose run? Do your feet smell? You're built upside-down!

  5. Jonathan Lundell said,

    September 4, 2009 @ 11:19 am

    What Oskar said.

    What spellchecker was this? Does it propose alternative spellings? I tried a couple (the OS X system checker and what I believe is a separate checker in MS Office) and neither objected.

    [(myl) I believe that Sally means the spell-checker built into the WordPress posting interface, which (like pretty much everything else about that posting interface) sucks. ]

  6. Stephen Jones said,

    September 4, 2009 @ 11:30 am

    You haven't mentioned which software you're using. Mozilla Firefox basically uses the same dictionary as Open Office, and the way it works is that there's an affix file which states the most common morphological changes. So there will be a rule that says that you can add s for a plural and one that says when you add 'es', and another one for when you change 'y' to 'ies', and a rule for when you add -ed and another for when you can prefix 'un-' and so on. Now they must remember to add the letter for the affix rule to the dictionary entry.

    However the US dic has a minimum of affix rules, and in general opts for having a separate entry for both words, which is why you have theses and antitheses and syntheses but not chemosyntheses. The UK spell check had a much longer affix file and as a result a much shorter dictionary but the affix file was never seriously completed so in fact the US spell checker is more accurate.
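The affix-file scheme described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the one-letter flags, the rule table and the tiny dictionary are invented, and the real .aff and .dic file formats are not reproduced.

```python
# Hypothetical affix rules keyed by a one-letter flag (invented for this sketch).
# Each rule returns the derived form, or None if it does not apply to the stem.
AFFIX_RULES = {
    "S": lambda w: w + "s",                                      # plural in -s
    "E": lambda w: w + "es",                                     # plural in -es
    "Y": lambda w: w[:-1] + "ies" if w.endswith("y") else None,  # y -> ies
    "D": lambda w: w + "ed",                                     # past tense in -ed
    "U": lambda w: "un" + w,                                     # prefix un-
}

# Hypothetical dictionary: stem -> flags naming the rules allowed for that stem.
# Irregular plurals get their own entries, as in the US word list described above.
DICTIONARY = {
    "banana": "S",
    "fox": "E",
    "fly": "Y",
    "check": "SDU",
    "thesis": "",
    "theses": "",
    "analysis": "",  # nobody added "analyses", and no flag generates it
}


def expand(dictionary: dict[str, str], rules: dict) -> set[str]:
    """All forms the checker would accept: each stem plus its flagged derivations."""
    accepted = set()
    for stem, flags in dictionary.items():
        accepted.add(stem)
        for flag in flags:
            form = rules[flag](stem)
            if form is not None:
                accepted.add(form)
    return accepted


ACCEPTED = expand(DICTIONARY, AFFIX_RULES)
print("flies" in ACCEPTED)     # → True
print("analyses" in ACCEPTED)  # → False
```

The sketch applies one rule at a time, so it does not combine a prefix with a suffix. It does show the failure mode under discussion: a form is rejected when it has neither its own entry nor a flag on its stem that generates it.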

  7. Stephen Jones said,

    September 4, 2009 @ 11:42 am

    What is needed is for someone (any poor sods like to volunteer?) to deal with the US dictionary and affix files. As no one individual will notice enough of the missing words, it would need people from various sites such as Language Log to post their suggestions as to missing words or rules to a central site, and then the amended file to be uploaded to Open Office and Mozilla.

    The only thing I don't know is the state of play regarding ownership of the dictionaries. Open Office keeps the files more updated than Mozilla does, so maybe there has been some kind of branching. The guy in charge of the dictionaries seemed to throw in the towel in what looked suspiciously like a fit of pique some time ago, so I'm wary of venturing in.

  8. James said,

    September 4, 2009 @ 11:52 am

    If the spellchecker in question is targeting US English, it could be a deliberate omission, with the intention of capturing "analyses" as a misspelling of "analyzes". It's possible that instances of the third person singular present of the verb "analyze" so vastly outnumber instances of the plural for the noun "analysis" that this is a rational strategy, but I haven't looked at any numbers.

  9. Jonathan Badger said,

    September 4, 2009 @ 11:54 am

    I've run into this issue myself. I always get worried that "analyses" is misspelled (and as I type this in Firefox, I see it underlined). It's quite annoying, as it is quite a common word!

  10. Sili said,

    September 4, 2009 @ 12:06 pm

    An argument can be made for not putting a word in the wordlist if it's identical to a common misspelling. As usual my induced Alzheimer's ensures that I can't recall the example I've come across, myself.

    But I doubt that using "analyses" for "analysis" is common. (Particularly for people with the non-spelling pronunciation).

  11. Theo Vosse said,

    September 4, 2009 @ 1:16 pm

    Indeed, low frequency words are usually omitted from less advanced spell checkers, because of the false positives. There are several ways to overcome the problem, but they require more intelligence (which usually brings other problems along, such as computational power) or more user interaction. For this case (morphological variation), a context-aware spell checker could be useful.

  12. Boris said,

    September 4, 2009 @ 1:45 pm

    I always disliked the Time flies like an arrow; fruit flies like a banana example. First, with the intended meaning of the first clause, in what way does time fly in a similar way to an arrow? This always bugged me.

    But the bigger problem is the grammatical structure of the second clause. Is it really grammatical to use it in the sense that fruit flies eat bananas? I know that subject-verb-object agreement is flexible for some people, but even putting that aside, I don't think I would ever say "I like a banana". If I were making a general statement, I would say "I like bananas". I could say "I like this banana" for a specific banana. I could say "I'd like a banana" if I wanted one now. I could even say "I like a [nice?] banana in the morning". The only way I can sort of see it is in a larger context:

    Person 1: "Don't you like bananas?"
    Person 2: "Oh, I like a banana. That one over there."

    or "I like a banana when it's yellow"

    But even this is somewhat contrived.

    Doing a Google search and checking the first 20 of the 85 real results (as opposed to the 3,070,000 fake ones),

    8 are irrelevant: "I like a banana bread made with oil and this one has a delightful texture"
    1 is a discussion similar to one I am posting here: "I like a banana. — This sounds ilke there is a particular banana that you are fond of – this is not a very likely scenario."
    3 are like my second "somewhat contrived" example: "I like a banana when the skin starts to look like a Jackson Pollock painting"
    2 are clearly incoherent: "i like a banana smoothies… do somebody does?"
    1 is going for some sort of poetic feel.
    2 work because of context like my "I like a [nice?] banana in the morning": "fuel 'em! the morning of the race, lots of folks like half a bagel with peanut butter. I like a banana myself."
    3 seem to be of the type I don't like, though it's a bit better in context: "I like a banana b/c it has both fast and slow digesting carbs"
    1 comes from a site trying to teach English and written by a non-native speaker. It's the only one that has "I like a banana" as a complete sentence.

    "I like bananas" has a reported 476,000 hits and I verified up to 400, even after removing the name of a song I hadn't heard of before. The first result is "ilikebananas.com". Six of the top 20 are complete sentences.

  13. Mark P said,

    September 4, 2009 @ 1:54 pm

    @Boris – Physicists talk about the "arrow of time."

  14. Boris said,

    September 4, 2009 @ 2:06 pm

    @Mark P – The arrow of time has nothing to do with the manner in which time passes (whatever that means). It's just an arrow of the axis.

  15. Ellen said,

    September 4, 2009 @ 2:07 pm

    Time flies like an arrow: Swiftly.

  16. Philip Spaelti said,

    September 4, 2009 @ 2:16 pm

    Fruit flies like a banana. Especially when you throw it.

  17. Ray Girvan said,

    September 4, 2009 @ 2:17 pm

    which, as she says, is attributed to Chico Marx

    Just to be boring: doubtful. It's more commonly attributed to Groucho Marx, but that too is doubtful. The comparison of "time flies … fruit flies" first appears in the mid-1960s in various computing texts (see Google Books) as an example of syntactic ambiguity in computing. The Yale Book of Quotations tracks the first attribution to Groucho Marx to Usenet's net.jokes in 1982, after which it spread all around the Net until it got set in stone by its inclusion in Stefan Kanfer's The Essential Groucho (2000).

  18. John Roth said,

    September 4, 2009 @ 2:46 pm

    I've seen that example enough that it didn't occur to me until just now that fruit does, indeed, fly like a banana. Of course, that tangles up one of the usual restrictions on metaphor – at least, I find comparing a collection to one of its members a bit doubtful, but at least it's not a syntactic ambiguity.

    John Roth

  19. Mr Fnortner said,

    September 4, 2009 @ 3:05 pm

    It is axiomatic that the very robustness of a spell checker is its weakness. After a point, more words in the lexicon mean more typos accepted as correct.

    I commend this page to you for all your plural needs: http://www.straightdope.com/columns/read/2139/what-is-the-plural-of-penis.
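The claim above that a larger lexicon accepts more typos can be made concrete with a small experiment. This is a sketch under assumptions: both lexicons are invented toy sets, and "typo" is restricted to a single deletion, transposition, substitution or insertion.

```python
import string


def single_edit_typos(word: str) -> set[str]:
    """All strings one keystroke error away from word (the word itself excluded)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    substitutes = {l + c + r[1:] for l, r in splits if r for c in letters}
    inserts = {l + c + r for l, r in splits for c in letters}
    return (deletes | transposes | substitutes | inserts) - {word}


def undetected_typos(word: str, lexicon: set[str]) -> list[str]:
    """Typos of word that the checker would silently accept as real words."""
    return sorted(single_edit_typos(word) & lexicon)


# Hypothetical lexicons (assumptions for illustration only).
SMALL = {"form", "from", "for"}
LARGE = SMALL | {"fore", "forme", "fora", "firm", "farm", "dorm",
                 "norm", "worm", "forms", "fort", "fork", "ford"}

print(undetected_typos("form", SMALL))       # → ['for', 'from']
print(len(undetected_typos("form", LARGE)))  # → 14
```

Every word added to the larger set is a legitimate entry, yet each one is also a mistyping of "form" that the checker can no longer flag.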

  20. Peter Taylor said,

    September 4, 2009 @ 3:35 pm

    Boris, how is 8 "I like a banana bread made with oil and this one has a delightful texture" irrelevant? If that makes sense then so does "I like a banana bread made with oil"; and here "banana bread made with oil" has exactly the same role as "banana" in the phrase to which you object, doesn't it?

    As for time and arrows: this could be a reference to arrows having a relatively flat trajectory (because they spin fast, and so there's a gyroscopic effect). Or it could be a reference to the speed with which time appears to pass.

  21. Boris said,

    September 4, 2009 @ 3:56 pm

    @Peter:

    It is the "made with oil" part that makes it ok because that defines "banana bread" as a category with "made with oil" a subcategory. It can be expanded to "I like a certain type of banana bread; the type made with oil". Once that's established, even "I like a banana bread" by itself becomes ok if it is interpreted as "I like a certain type of banana bread". I suppose, if we can have subcategories of "banana", that would be ok too. I sort of did that with the "yellow" example, but it's a stretch.

  22. Faldone said,

    September 4, 2009 @ 4:17 pm

    It's the idea of all those little time flies hovering around that one particular arrow that fascinates me. I haven't read Bill Cosby's book but I would hope that he has dedicated a chapter to that arrow.

  23. Ray said,

    September 4, 2009 @ 5:04 pm

    Firefox's default en-US dictionary is also missing "indices", "vertices", "combinations", and "parallelepipeds", judging from the contents of my persdict.dat file.

  24. Stephen Jones said,

    September 4, 2009 @ 6:03 pm

    persdict.dat is your personal dictionary; that is the additions to Firefox you have made to the spell check. That is kept in a separate file to the en-US.dic file and en-US.aff files (just to add to the fun they are called en_US.dic and en_US.aff in Open Office), which are the files used by everybody's spell check. I am presuming you know this and have added them to persdict.dat at some stage.

    The truth is nobody seems to have worked on the spell checks for some years; the quality is abysmal compared to the Word spell checks.

    After a point, more words in the lexicon mean more typos accepted as correct.

    An interesting point, but few or no spell checks take that into account, as the existence of cupertinos makes clear.

  25. Jean-Sébastien Girard said,

    September 4, 2009 @ 6:17 pm

    Regarding sensical, try the quotes I compiled at Wiktionary for size.

  26. Jerry Friedman said,

    September 4, 2009 @ 6:24 pm

    @Boris: Try searching for "I like a banana with" or "I like a banana now and then". They're not the same thing as just plain banana, but they're close enough for my sense of humor. Humor is very individual, though.

  27. Jonathan Lundell said,

    September 4, 2009 @ 8:04 pm

    @myl, wrt WordPress and spellchecking. Does WordPress have a spellchecker? I'm fairly sure that the spell checker I see right now (typing into the WP comment window) is part of Safari (which is to say: it's the OS X spellchecker), not WordPress. Ditto the posting window.

  28. dwight said,

    September 4, 2009 @ 8:15 pm

    @Sili

    I think James meant that an American English spellchecker would assume "analyses" is a misspelling of the present tense, third-person singular of the verb "to analyze" (this "misspelling" is probably common, given that it is correct for the British English verb "to analyse"). I don't think he meant that "analyses" is a common misspelling of "analysis."

  29. Nick Lamb said,

    September 5, 2009 @ 5:20 am

    Firefox isn't necessarily using its own dictionary. On Fedora and presumably some other Linux distributions, it is hooked into the system-wide dictionary and that knows about indices, analyses, vertices and so on. This also means that if you teach the spell-checker in the web browser about the word sensical then it is carried over to your email software and word processor.

  30. chris said,

    September 5, 2009 @ 10:10 am

    Fairly irrelevant but amusing: the line about the arrows and the bananas has been translated at least once into another language with total ignorance of the intended joke—a music journalist for German news magazine Der Spiegel once appreciatively cited a Townes Van Zandt lyric as follows:

    die lyrischen Texte verbinden tiefsinnige Erkenntnisse und ironische Schlichtheit: "Die Zeit fliegt wie ein Pfeil, und Obst fliegt wie eine Banane."

    Translation (somewhat less literal than the journalist's own effort):

    the lyrics combine profound insights with ironic simplicity: "Time flies the way an arrow does, and fruit flies the way a banana does."

    So it seems the nonsensical meaning of the second clause has a certain amount of appeal in its own right.

  31. Pat said,

    September 5, 2009 @ 11:03 am

    The plural of anecdote is not data; the plural of analysis is not analysiseses.

    Also: The arrow metaphor for time is to express its unidirectionality.

    Also also: The animal fruit fly interpretation never occurred to me. Shows my preference for slapstick, I guess, even if it's oblique.

  32. Sili said,

    September 5, 2009 @ 12:17 pm

    Thanks, dwight, but James' post hadn't gone up when I typed mine.

    I couldn't imagine writing "analyses" for "analysis" since I mistakenly have a schwa in the former.

    Once I saw what James had written it clicked into place that I (not surprisingly) hadn't thought about the verb.

  33. Dmajor said,

    September 5, 2009 @ 2:35 pm

    Spell checkers like theses. Spell baggers every couple of hours.

  34. Rijk said,

    September 5, 2009 @ 2:44 pm

    The word 'analyses' is simply not present in the version of the en-US dictionary file used by Firefox, which hasn't been updated since 2004. OpenOffice.org (and anyone else who bothers to keep up-to-date with the OOo version) uses a dictionary that was updated in 2006, and this version adds a handful of words, including the subject of this blog. It also marks some dirty words so they will not show up as suggestions when correcting misspelled words.

    BTW, is no-one here bothered by the word 'spell-checker'? Shouldn't it be 'spelling-checker', to remove any Harry Potter thoughts?

  35. Tim said,

    September 5, 2009 @ 7:24 pm

    I've been using the word "sensical" for years, and it never occured to me that it might have been a product of my own imagination.

    And now the spellchecker has just informed me that, in fact, it is "occurred". Apparently, it is determined to destroy my worldview piece by piece.

  36. unekdoud said,

    September 5, 2009 @ 11:49 pm

    I am using a current version of the Firefox spellchecker. (spell-checker)
    The British English version accepts all the words mentioned above, and the American English version rejects 'analyses', 'chemosyntheses', 'vertices' and 'indices'. Apparently 'combinations' and 'parallelepipeds' have been fixed.

    I only use 'sensical' in the sense of 'common-sensical'.

    Oh, character-wise word verifiers! (character-group-wise?)

  37. andrew cave said,

    September 6, 2009 @ 9:54 am

    I like a banana with my cereal and after a hard day's work, men like a beer. It's a rather common form in Australia and I presume the quote uses it. BTW I am sure I have heard a recording of Groucho Marx signing off his TV show with the string. I believe it was his rote ending.

  38. Glen Goffin said,

    September 10, 2009 @ 1:24 pm

    "Time flies like an arrow; fruit flies like a banana"

    … and fruit flies follow. :)

  39. J said,

    September 10, 2009 @ 4:45 pm

    Sensical is a perfectly cromulent word, whatever spell-checker says.

  40. Jenn said,

    February 16, 2012 @ 8:36 am

    Sally,

    I ran this through the spell checker at http://www.spellcheckonline.com/ and it couldn't find any errors. Or did I miss something?

    Jenn
