Text corrections


Today's xkcd:

Mouseover title: "I like trying to make it as hard as possible. 'I'd love to meet up, maybe in a few days? Next week is looking pretty empty. *witchcraft'"

This is why word embeddings, and other applications of distributional semantics, often work but sometimes don't.



33 Comments

  1. SP said,

    July 25, 2020 @ 7:19 am

    Sometimes I wonder if Munroe makes comics like this hoping they’ll be picked up by Language Log.

  2. Yerushalmi said,

    July 25, 2020 @ 7:11 pm

    Am I the only one who initially slotted "couch" into the "horse" position before realizing?

  3. Philip Taylor said,

    July 26, 2020 @ 4:09 am

    At least you understood that the starred words were intended to be slotted in somewhere, Yerushalmi — until your comment, I had no idea what the cartoon meant at all.

    [(myl) How much texting do you do, and with correspondents of what age?]

  4. Frans said,

    July 26, 2020 @ 4:20 pm

    As a small aside on the age thing above, we already chatted like this 20 years ago.

  5. Andrew Usher said,

    July 26, 2020 @ 7:16 pm

    Even if you have never seen this before, the caption at the bottom pretty much explains it. This is obviously an extreme, invented case, and I too was not sure where the words should go at first.

    However, the order matters – if you think about the corrections in the order they are listed, it's hard to go wrong. This doesn't reflect how it would work in real life, but it follows:

    I'm gonna ride a horse on the beach at dawn ->
    *eat
    I'm gonna eat a horse on the beach at dawn ->
    *3am
    I'm gonna eat a horse on the beach at 3am ->
    *couch
    I'm gonna eat a horse on the couch at 3am ->
    *pizza
    I'm gonna eat a pizza on the couch at 3am.

    The error cited by Yerushalmi can't happen because 'ride' has already been removed before 'couch' arrives.

    k_over_hbarc at yahoo.com
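
The step-by-step reading in comment 5 can be sketched in a few lines of Python. This is only an illustration, not anything from the comic or the commenters: the function name `apply_corrections` is made up, and the word each correction replaces is supplied by hand, since the asterisk convention itself never says which word is meant.

```python
def apply_corrections(message, corrections):
    """Apply (wrong_word, right_word) pairs one at a time, in the order given."""
    steps = [message]
    for wrong, right in corrections:
        words = message.split()
        # Replace only the first whole-word match; raises ValueError if absent.
        index = words.index(wrong)
        words[index] = right
        message = " ".join(words)
        steps.append(message)
    return steps


# Assumption: the targets below are chosen by the reader, as in comment 5.
corrections = [
    ("ride", "eat"),
    ("dawn", "3am"),
    ("beach", "couch"),
    ("horse", "pizza"),
]
steps = apply_corrections("I'm gonna ride a horse on the beach at dawn", corrections)
for step in steps:
    print(step)
```

The hard part, deciding that "couch" replaces "beach" rather than "horse", is exactly what the sketch leaves to a human.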

  6. Philip Taylor said,

    July 27, 2020 @ 1:21 am

    Mark — " How much texting do you do, and with correspondents of what age?". None, absolutely none.

  7. GH said,

    July 27, 2020 @ 4:24 am

    An example, possibly fake, that's been going around the Internet for a few years is this text conversation (https://imgur.com/gallery/Awzvr):

    A: Big Dicks in his little brothers bum.
    A: Cat*
    B: what
    B: which word is supposed to be cat

  8. Philip Taylor said,

    July 27, 2020 @ 7:33 am

    Well, the "cat" example simply leaves me even more confused. Not just because there is no possible place where "cat" can be substituted for the existing word in order for the sentence to make sense, but because a Google search suggests that a very large number of people find it funny. Is it possible to explain what is funny about it ?

  9. GH said,

    July 27, 2020 @ 8:53 am

    In text conversations, typos and autocorrect occasionally combine to produce unintentionally inappropriate or absurd sentences. This is a frequent subject of Internet humor.

    The exchange appears at first glance to be an instance of this. We chuckle at the inappropriate, obviously garbled message, and expect the correction to resolve the confusion. Instead, it only adds to it, prompting us to attempt various substitutions, each of which is as absurd or horrific as the original and paints a different mental image. That's what's funny.

    The effect is heightened by the understated hysteria of the response ("which word is supposed to be cat"), where the lack of any punctuation paradoxically acts as a kind of exclamation, and the total lack of further explanation from the initial sender.

    I was reminded of it by the XKCD mouseover text: "I like trying to make it as hard as possible. 'I'd love to meet up, maybe in a few days? Next week is looking pretty empty. *witchcraft'"

  10. Alexander Browne said,

    July 27, 2020 @ 9:49 am

    I believe using an asterisk like this was already au courant on AOL Instant Messenger 15–20 years ago when I used it. I think I usually saw/see it prepended rather than at the end, as it is in the "Cat*" example.

    When I learned the linguistics use of marking reconstructed or ungrammatical words/expressions with an asterisk, I tried to switch to using a caret (^) instead for corrections (so "^Cat"). I thought it looks like an arrow pointing up at the mistake or maybe even an editor's mark. No one seemed to misunderstand me, but it also didn't catch on, even among my group of friends I chatted with from linguistics classes.

  11. Idran said,

    July 27, 2020 @ 10:54 am

    @Philip Taylor: The humor is that the two obvious (and probably only viable) options are replacing "Big" or "bum", neither of which you would hope is the sentence intended.

  12. Philip Taylor said,

    July 27, 2020 @ 3:16 pm

    Well, my genuine and sincere thanks to those who have tried to explain the "cat" thing to me, but I begin to think that they and I must occupy parallel universes. Not even as a child of five would I have found the idea of replacing either "Big" or "bum" by "cat" funny, so I will add nothing further on this topic.

  13. Michael said,

    July 27, 2020 @ 5:35 pm

    @Frans: The way I remember things, however, it was rarely necessary to do more than minor corrections before the advent of the autocorrect.

  14. Frans said,

    July 28, 2020 @ 3:56 am

    @Philip Taylor
    My reading is that, whether real or not, cat itself is a typo or autocorrect for car.

    Big Dick's in his little brothers car.

    One could come up with additional hypotheses, but "big car's in his little brother's room," for example, stacks additional autocorrects, each of which makes the final result less plausible.

    The joke itself is just five-year-old toilet humor, no matter what's in what.

    @Michael
    Not necessarily, slips of the fingers can include fairly big replacements such as:

    I would of done this and that.
    *have

    But sure, unless it involves numbers you'd generally only do it to signal that you do in fact know how to spell and not for intelligibility. Which is also how just about everyone uses it on their phones in my anecdotal experience. Everyone turns off autocorrect ASAP the first time they create a monstrosity like in the xkcd.

  15. IMarvinTPA said,

    July 28, 2020 @ 11:47 am

    Big Dicks in his little brothers bum
    *Cat
    Candidate words, Dicks, brothers, and bum. (Possibly his?)
    Big Cat is in his little brother's bum. (That's gotta be uncomfortable.)
    Big Dick is in his little cat's bum. (This is alarming.)
    Big Dick is in his little brother's cat. (Nope, not any better than the last one.)
    None of these end well for the cat.
    The humor is that the disturbing error was supposed to make it be not-disturbing by changing a word to cat. The problem is, there is no solution that is not disturbing.

    Bonus tries:
    Big Dick is in cat's little brother's bum. (Now we have two cats involved?)
    Cat Dick is in his little brother's bum. (…)

  16. Philip Taylor said,

    July 28, 2020 @ 12:51 pm

    Frans, thank you, a glimmer of sense starts to emerge. Perhaps those who enjoy cryptic crosswords would have spotted the "cat" -> "car" possibility, but I did not. What is still unclear to me is how the "*" convention arose. I assume that the requirement is to retrospectively amend erroneous content in an SMS message once the possibility of editing it has been lost, but were I to want to do such a thing I would probably write "for 'bum' read 'car'", "'bum' -> 'car'" or even "s/bum/car/". Someone must have invented this new use for the asterisk — is it known who he or she was ?
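
For readers unfamiliar with the "s/old/new/" notation mentioned in comment 16, here is a minimal Python sketch of how such a correction could be applied to the previous message. The function name `apply_substitution` is hypothetical, the sketch treats both halves as literal text rather than as patterns, and the sample message is borrowed from a later comment in this thread.

```python
import re


def apply_substitution(previous_message, correction):
    """Apply a correction written as s/old/new/ to the previous message."""
    match = re.fullmatch(r"s/([^/]+)/([^/]*)/?", correction.strip())
    if match is None:
        raise ValueError(f"not an s/old/new/ correction: {correction!r}")
    old, new = match.groups()
    if old not in previous_message:
        raise ValueError(f"{old!r} does not occur in the message")
    # Literal text, first occurrence only; nothing is interpreted as a pattern.
    return previous_message.replace(old, new, 1)


fixed = apply_substitution("There's a car on my table.", "s/car/cat/")
print(fixed)
```

Unlike a bare "*cat", this form names the word being replaced, so the reader has nothing to guess.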

  17. Gregory Kusnick said,

    July 28, 2020 @ 5:28 pm

    I'm not sure I'd call this a new invention. It basically follows the same convention as footnotes called out by asterisks, except that in this case it's too late to insert an asterisk at the point of reference.

    It also has the virtue of brevity, which in the early days of flip-phone texting mattered a great deal.

  18. Andrew Usher said,

    July 28, 2020 @ 7:32 pm

    None the less there must have _been_ a first person to use asterisks in this way, though we'll never identify who. As I think internet chat predates texting, the origin must have been in the former, of which there are likely to be no records.

    Frans:
    But 'would of' for 'would have' is not a slip of the fingers; no muscle memory can want to put 'of' after 'would' since it never occurs there in correct English. It's just a grammatical error, even if it someday becomes standard.

  19. Gregory Kusnick said,

    July 28, 2020 @ 10:32 pm

    I grant that there must have been an instance of this usage that chronologically preceded all others. But it doesn't follow that it must have causally preceded them. This adaptation of the footnote convention might have occurred to many people independently.

  20. Andreas Johansson said,

    July 30, 2020 @ 9:08 am

    Despite having used text messages, chatrooms, etc. for a long time, I had the apparently erroneous impression this is a fairly recent convention. I'd guestimate I became aware of it sometime in the last 10 years.

  21. Andrew Usher said,

    July 30, 2020 @ 9:48 pm

    And that is evidence that it's far from intuitively obvious to adapt the 'asterisk for note' convention in this way, if most people that use it only learned of it recently.

    So it's still plausible there was a single invention, at least in the 'but for' sense.

  22. Frans said,

    July 31, 2020 @ 2:13 pm

    @Philip Taylor

    I assume that the requirement is to retrospectively amend erroneous content in an SMS message once the possibility of editing it has been lost, but were I to want to do such a thing I would probably write "for 'bum' read 'car'", "'bum' -> 'car'" or even "s/bum/car/". Someone must have invented this new use for the asterisk — is it known who he or she was ?

    Presumably s/original/replacement/ is mostly limited to certain circles.

    I have no idea as to when or where the asterisk convention may have originated. As I said, it was already common when I first started going online 20 years ago, so it could've been brand new or it could've been decades old already. In any case, the online chat (MSN, ICQ, Yahoo) → texting direction seems much more plausible. We had to be quite frugal with our texts.

    @Andrew Usher

    But 'would of' for 'would have' is not a slip of the fingers; no muscle memory can want to put 'of' after 'would' since it never occurs there in correct English. It's just a grammatical error, even if it someday becomes standard.

    I must admit I'm quite puzzled as to what point you're trying to make. There have been numerous posts about examples of this phenomenon on this very weblog, including this one just a few weeks ago: https://languagelog.ldc.upenn.edu/nll/?p=47259

    More formally, it might be worth reading "Rules or regularities? The homophone dominance effect in spelling and reading regular Dutch verb forms" by Nina Verhaert. It uses fancier terms like "homophone dominance effect" instead of "slip of the finger."
    https://doc.anet.be/docman/docman.phtml?file=.irua.1d90f7.131661.pdf

  23. Andrew Usher said,

    August 1, 2020 @ 7:09 am

    I'm not going to read that whole thing – it's 500 pages. My point is that writing "would of" is a 'slip of the brain', not of the fingers. In other words it happens during, not after, the brain's language processing. It's not really a homophone effect, either, since as I pointed out, "would of" _never occurs_ in standard English. It's not like the other homophone examples where both are legitimate words.

  24. Frans said,

    August 1, 2020 @ 9:56 am

    A slip of the tongue or fingers is just an idiom. It's all a slip of the brain.

    However, I think I may understand which distinction you're insisting on, even if I don't understand why you think it's important within this context. It seems you'd prefer if the idiom were used exclusively when someone accidentally typed king instead of kin, for example, perhaps because the suffix -ing is more common. (NB I don't know if that actually happens. Suffice it to say that there are examples that follow this general pattern, regardless if this is actually one of them.) We can consider this a motoric error, post language processing.

    We can regard this as distinct from an orthographic error, as caused by the homophone dominance effect. However, it's not always easily distinguishable from one. Examples include typing its for it's, hair for hare, of for 've, and so forth.

    Assuming I've described the above correctly, where we differ in opinion is the claim that of constitutes a grammatical error here. It seems quite implausible that the mistake would represent a misanalysis of of as a preposition. It's an incorrect orthographic transcription of part of the verb phrase often pronounced as /əv/ or /əf/, which would be correctly rendered as have or 've. By contrast, an actual grammatical error would be something along the lines of the child are.

    Even granting for the sake of argument that of is a grammatical error just like the child are, I'd still be perfectly happy to call it a slip of the fingers. It's not what one intended to type; it just so happened to come out of the fingers.

    Regardless, my point was that such substitutions can occasionally be quite large, whatever you like to call them. And the substitutions needn't be literally what was originally meant. The "correct" substitution would look like:

    I would of done this and that.
    *'ve

    But, at least in my online social circles, that's not how anyone corrects such a slip of the brain. The correction takes the full form.

  25. Philip Taylor said,

    August 1, 2020 @ 1:04 pm

    "Even granting for the sake of argument that [could] of is a grammatical error just like the child are, I'd still be perfectly happy to call it a slip of the fingers. It's not what one intended to type; it just so happened to come out of the fingers" — there I respectfully disagree, Frans. In the main, those who write "could of" do not do so occasionally but consistently — they appear to genuinely believe that "could of" is correct idiomatic English. This, it seems to me, is almost certainly the result of their having acquired language primarily through the oral/aural route, with little or no subsequent correction resulting from reading.

    Furthermore, with the advent of Web 2.0 and the concomitant opening up of "publishing" to all and sundry, the feedback and correction that might have resulted from regular reading pre Web 2.0, when, in the main, authors were literate and wrote good grammatical English, is now increasingly unlikely to take place because catachreses such as "could of" are ever more frequent in (e.g.,) comments on Youtube videos, comments on newspaper articles and so on. Even contributors to this highly erudite forum have been known to use "could of" where "could have" was required : name redacted, December 18, 2018 @ 3:23 am, "I could of written more, but I'll leave it there, because work"; name redacted, October 22, 2012 @ 8:00 am, "Oh my, WHOM at Reuters' could of missed such a 'istoric subjunctival opportunity?";

    Google 'ngrams' shows an early peak in "I could of done" circa 1943/44 but that peak was passed circa 2012 after a fairly deep trough in the intervening period.

  26. Frans said,

    August 1, 2020 @ 1:25 pm

    In the main, those who write "could of" do not do so occasionally but consistently — they appear to genuinely believe that "could of" is correct idiomatic English.

    You may well be correct about that. Contrary to the kin → king error which I made up on the spot, however, this example was drawn from memory. :)

  27. GH said,

    August 2, 2020 @ 7:25 am

    @ Philip Taylor:

    October 22, 2012 @ 8:00 am, "Oh my, WHOM at Reuters' could of missed such a 'istoric subjunctival opportunity?"

    This appears clearly facetious rather than a genuine mistake (whether grammatical or slip of whatever body part).

  28. Philip Taylor said,

    August 2, 2020 @ 7:40 am

    GH — quite possibly, but instances of "could of" here are (fortunately) very sparse, whilst on Youtube and newspaper comments (etc) they are increasingly (and depressingly) ubiquitous …

  29. Frans said,

    August 2, 2020 @ 8:48 am

    Here's a sentence conspired by my brain and my fingers today:

    Ik weet dat het in principe iets is als (i)t wiene moet wezen i.p.v. 't waren.

    The typed sentence contains two functionally equivalent sentences, and instead of choosing one or the other my fingers typed both.

    Ik weet dat het in principe iets is als (i)t wiene
    I know that in principle it is like "it wiene."

    Ik weet dat het in principe iets als (i)t wiene moet wezen
    I know that in principle it has to be like "it wiene."

    In chat, if I opted for a correction it would take the form of:

    Ik weet dat het in principe iets is als (i)t wiene moet wezen i.p.v. 't waren.
    -is

    It's quite clear to me that it's a grammatical error and a slip of the fingers.

  30. Philip Taylor said,

    August 2, 2020 @ 9:17 am

    At a more meta-level, is it invariably the case that corrections such as "*car", "-is", etc., always form (the start of) a new contribution in its own right, or are there actually "chat" systems which do not allow corrections (perhaps before the current line) to be made in real time, and which therefore force their users to add "footnotes" to indicate what they had intended to say whenever an error has been made ?

  31. Andrew Usher said,

    August 2, 2020 @ 9:25 am

    I think you can both be right on the cause.

    Certainly, Philip's is the traditional explanation for 'could of' etc., and must be right for most cases: people do genuinely have it as part of their grammar and this is because of a mistaken mental impression derived from hearing the reduced forms. This would be further strengthened if, as I have heard occurs in British English, one actually hears people speak 'could of' as two distinct words. This was the one I assumed, because I believe I never make homophonic substitution errors, and have a hard time intuitively accepting it over something I do readily understand.

    But I don't doubt that Frans is right that some people do and 'could of' is a possible example – although it is probably not a good example for him to have chosen because of the dominance of the other cause in that case. Also, the reason a correction for it would be *have and not *'ve has nothing to do with the source of the error – it is simply that English doesn't permit something like 've to stand on its own, a complete word is required.

  32. Frans said,

    August 2, 2020 @ 10:53 am

    @Philip Taylor
    I don't think it's necessarily about what the chat system permits, even if being able to edit the message in chat systems is probably a relatively recent phenomenon. Chatting is more like speaking, fairly stream of consciousness-based. It's a quickly typed correction while you're barely even consciously aware of it, as opposed to the comparatively gargantuan effort of taking your hand over to your pointing device and searching for the message edit function. That breaks the flow of conversation.

    There's a car on my table.
    *cat

    There's cat on my table.
    +a

    There's a cat on on my table.
    -on

    @Andrew Usher
    There are two separate points. The one is that the Brit in question clearly knew "have" is the correct form, or they wouldn't have typed and sent "have" in a split second. It was simply the biggest example of a correction that came to mind in a second or two. As such, to me it was clearly a slip of the fingers, a phrase I would apply to pretty much any error so swiftly noticed. Including grammatical errors. Consequently, in my view you could say little more than that it was an unrealistic example even if I'd made it up from scratch. Or is that what you were trying to convey? If so, I must once again emphasize that it was real and not made up.

    More broadly, I don't see an incorrect alternative spelling as a grammatical error. It may be unintentionally suggestive of a grammatical error, but that doesn't actually make it one. I see no substantive difference with substituting, e.g., flower for flour or union for onion, except that out of all of those union is probably the most likely to result in seemingly adequate sentences.

    I could've done more.
    *I could of done more.

    Take a tablespoon of flour.
    *Take a tablespoon of flower.
    ?Take a tablespoon of flowers.

    ?Now cut the union.

    But I'd be hard-pressed to call the underlying sentence ungrammatical. In contrast to, for example:

    *I could've did more.

    *Take a tablespoon of flours.
    ?Take a tablespoon of flowers.

    *Yesterday cut the onion.

  33. Andrew Usher said,

    August 3, 2020 @ 7:31 am

    Frans, I do not disbelieve you. I did not say the example was unrealistic or made up (though I didn't understand that it was real, either). It was simply not a very good choice of example to illustrate what you were getting at (which wasn't obvious from your first post) because of the unrelated grammatical issue with it.

    Whether an error is one of spelling or grammar depends on the user's intent. Slips of the finger in either sense are just orthographic, while for the people that consistently write 'could of', it's grammar. The difference is usually obvious, as in most of the examples you just came up with; but here it isn't without knowing additional information.

    By the way, I myself think one of the examples, 'Yesterday cut the onion' is grammatical, though nonsensical, in either syntactical reading i.e. as an imperative with 'yesterday' the (impossible) modifier, or a declarative with 'yesterday' the subject.
