Big Data vs. Amateur Linguistics


Neil Dolinger sent in the following banner ad that popped up on his computer screen one day:

If you're wondering what this is about, I'll leave it to Neil to tell his own story, because he tells it best:

When I visited a website I frequent, the attached banner ad appeared for Toyota. A banner ad from Toyota would not be strange in and of itself, except that it was in Vietnamese. I noticed there was a question mark at the top right, and when I clicked on it, the following message appeared:

This ad was provided to you by Conversant, one of the world's largest integrated online advertising companies. Conversant uses anonymous information about your online activity in order to customize advertisements to be more relevant and useful to you. We do not collect any personal information, nor display more ads than you would normally receive. You simply receive ads that are more relevant to you. You may have arrived from an AdChoices icon on one of our banner advertisements. If so, the icon you clicked is utilized by members of the Digital Advertising Alliance (DAA) such as Conversant to provide you with information and the ability to opt out.

As you might remember from previous discussion on Language Log, I was a student in Penn's Oriental Studies department in the early '80s, where I studied Mandarin. I also consider myself an amateur linguist (duh, I visit Language Log daily!), and have maintained an interest in all of the East Asian languages and language families. It would not surprise me if my various Google searches on these topics led some data-crunching algorithm to decide I must be a native speaker of Vietnamese. That decision would not be even close to correct, but it makes me wonder whether this is one of the ways demographic researchers are tying our interests to our language use.

Now, what does the ad say?

"Who says an intelligent choice cannot be the start of a bold journey?"
"Who can say that an intelligent choice is not an audacious beginning?"

In terms of meaning and style, it's not very distinguished, but what would one expect of advertising copy?

[Thanks to Bill Hannas and Eric Henry]


  1. D.O. said,

    December 10, 2014 @ 12:40 pm

    It might be a Bayesian thing. Suppose the algorithm thinks that the chance of the viewer being Vietnamese is 35%, but that ads in the native language are twice as effective as in English. Then it should go for Vietnamese. Maybe a Vietnamese ad is thought of as more effective because it's more exotic. Or maybe the ad agency figured out that Mr. Dolinger is a curious type and played on that.

  2. blahedo said,

    December 10, 2014 @ 1:37 pm

    It's certainly paid off in terms of getting more free eyeballs on the ad.

  3. AntC said,

    December 10, 2014 @ 1:39 pm

    @D.O. more effective because it's more exotic

    Toyota now have a bazillion LLog eyeballs. Effective enough for you?

  4. David J. Littleboy said,

    December 10, 2014 @ 5:12 pm

    I suspect it's more of a side effect of the advertising industry grasping at straws. My best guess is that internet advertising is failing miserably at having even the slightest effect on consumer behavior. This is in contrast to television advertising, which was (and presumably still is) enormously effective. At least here, if I search for something, I'll see ads for that exact thing for a week, long after I've made my purchase decision. But none of the other ads that appear on my screen are things that I'll ever purchase.

  5. Peter said,

    December 10, 2014 @ 5:53 pm

    D.O. wrote: “Suppose the algorithm thinks that the chance of the viewer being Vietnamese is 35%, but that ads in the native language are twice as effective as in English. Then it should go for Vietnamese.”

    Those numbers don’t add up at all. Given the assumptions you set out, the optimal choice is undetermined — it’ll depend on how effective different ads are the other 65% of the time. And it seems safe to bet that for most non-Vietnamese consumers, the English ad will be significantly more effective; so the optimal choice will probably end up as English.

  6. D.O. said,

    December 10, 2014 @ 7:53 pm

    Peter, you are right. I was sloppy and didn't think this through. But it is reasonable to target a less likely target if you expect a much bigger impact in case you hit it. I won't bother everybody with trivial details.
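The exchange above can be worked through as a small expected-value calculation. What follows is only a sketch under stated assumptions, not a description of how Conversant or any ad network actually chooses a language. The function names, the `base_rate` value, and the `foreign_penalty` parameter (how well a Vietnamese ad works on a viewer who does not read Vietnamese, relative to an English ad) are hypothetical; only the 35% figure and the "twice as effective" factor come from the comments.

```python
def expected_response(p_viet, rate_if_viet, rate_if_other):
    """Expected response rate of one ad, averaged over who the viewer might be."""
    return p_viet * rate_if_viet + (1 - p_viet) * rate_if_other


def best_language(p_viet, native_boost, foreign_penalty, base_rate=0.01):
    """Compare an English ad with a Vietnamese ad.

    Assumptions (all hypothetical, for illustration only):
      - an English ad gets `base_rate` from every viewer;
      - a Vietnamese ad gets `native_boost * base_rate` from Vietnamese speakers;
      - a Vietnamese ad gets `foreign_penalty * base_rate` from everyone else.
    """
    english = expected_response(p_viet, base_rate, base_rate)
    vietnamese = expected_response(
        p_viet, native_boost * base_rate, foreign_penalty * base_rate
    )
    choice = "Vietnamese" if vietnamese > english else "English"
    return choice, english, vietnamese


def break_even_penalty(p_viet, native_boost):
    """Value of foreign_penalty at which the two ads perform equally well."""
    return (1 - p_viet * native_boost) / (1 - p_viet)


if __name__ == "__main__":
    # D.O.'s numbers: 35% chance of a Vietnamese speaker, 2x native-language boost.
    for penalty in (0.1, 0.9):
        print(penalty, best_language(0.35, 2.0, penalty))
    print("break-even penalty:", break_even_penalty(0.35, 2.0))
```

Under these assumptions the answer turns on the other 65% of viewers, which is Peter's point: the Vietnamese ad comes out ahead only if viewers who do not read Vietnamese still respond to it at more than roughly 46% of the rate they would to the English ad.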

  7. Matt said,

    December 10, 2014 @ 8:04 pm

    Is the copy maybe a riff on a Vietnamese proverb or something?

  8. Dan Lufkin said,

    December 10, 2014 @ 8:35 pm

    For me the most puzzling aspect of Web ads is how many of them cue on something you've just bought. Sure, I spent a few minutes looking at hiking boots, but I bought a pair and am not going to be in the market for years. Yet my screen is garlanded with boot ads. Do they think they've caught a hiking-boot fetishist?

  9. Victor Mair said,

    December 10, 2014 @ 9:41 pm

    @Dan Lufkin and David J. Littleboy:

    Exactly the same thing happens to me when I buy anything where I leave the slightest trace: socks, shoes, vitamins…. You name it. After I buy it, I'll be flooded with ads for the same thing for weeks.

    Can't they be a bit more imaginative? At least send me ads for something related to but different from the thing I already purchased.

  10. Ron said,

    December 11, 2014 @ 1:21 am

    @Victor Mair @Dan Lufkin:

    The simplest (and least expensive) way to do contextual advertising tracks page views, not purchases. If you visit the page for down parkas you will see ads for down parkas for the next 30-90 days whether or not you clicked through and bought the item. I don't think ad networks such as Google can detect when a purchase has been made during the same session as a page visit.

    With "big data" tools a marketer can serve ads for gloves to users who just bought a parka, but they are fairly new and still pretty expensive. I would expect that only large companies (or clients of large agencies) are using them.

  11. Ben Hemmens said,

    December 11, 2014 @ 3:29 am

    It's a new, emerging form of Whorfianism, in which the languages we use dictate the thoughts that the Internet tells us to think.

  12. Victor Mair said,

    December 11, 2014 @ 7:51 am


    What you describe is a much better explanation of what actually happens to me than the unsophisticated understanding outlined in my own comment. Even when I make my purchases in stores (which is almost always the case — I seldom, almost never, buy anything online because I really am not confident that I know how to do it), I do extensive online research on products that I'm interested in learning more about.

  13. Coby Lubliner said,

    December 11, 2014 @ 9:15 am

    I use to make hotel reservations, and after I make one the pages I visit are flooded with ads from for hotels in precisely the cities where I just booked. Recently, after a single visit to the Hebrew site of Haaretz, all these ads have been in Hebrew!

    With Internet advertising so obviously lame, how is it that companies like Google and Facebook make so much money?

  14. Rose Eneri said,

    December 11, 2014 @ 9:35 am

    Even if I see an internet ad for something I am interested in, I never click on the ad. I get to the website of interest through a search engine. Even though I know websites must make money, I do not want to encourage ads of any kind.

    And now to ensure my comment has a linguistic element, I recently read a grammatically interesting sentence in P.D. James's "The Children of Men." In Chapter 11 at the end of the 11th (?) paragraph is: "I suspect that Xan finds her useful in ways I can't guess. I also think her extremely dangerous." (The next paragraph begins, "People who bother to cogitate about the personalities of the Council say that Carl Inglebach …")

    I would imagine that for a person not fluent in English, "I think her dangerous" would sound ungrammatical. Why is it not ungrammatical? Is it a shorthand way of saying "I think of her as being dangerous"?

  15. Dan Lufkin said,

    December 11, 2014 @ 10:19 am

    @Rose: That's an interesting observation. Note that you can't say "*I suspect her dangerous." "I consider her dangerous" is OK, but not "*I fear her dangerous." "I think her dangerous" is questionable.

  16. Bill said,

    December 11, 2014 @ 12:15 pm

    @Dan: I'm used to seeing things like "I think her dangerous" in print, but I've looked at a fair number of older texts. So I got curious and did a quick search on the Corpus of Historical American English for the sequence "think" + pronoun + adjective ("think [p*] [j*]"). There is doubtless noise in the results, but the pattern looks like a clear decrease from 1810 (the early end of that corpus) onwards. Lots of the examples are things like "I think it necessary that we…", but there are plenty of ones like "I know you think him stupid."

    Maybe it's in a state of zombie-grammaticality?

  17. Bill S. said,

    December 11, 2014 @ 12:19 pm

    Apologies — just noticed there are other Bills mentioned in this thread; I should have listed my name as Bill S in the message above to distinguish.

  18. Coby Lubliner said,

    December 11, 2014 @ 12:52 pm

    I don't know where Rose Eneri gets the idea that "for a person not fluent in English, 'I think her dangerous' would sound ungrammatical." On the contrary, la creo peligrosa and je la crois dangereuse are quite idiomatic in Spanish and French, respectively.

  19. Maureen Coffey said,

    December 12, 2014 @ 6:58 am

    Oh, that's nothing to what I get. I have to do a lot of research on varying topics (and when I say varying – I mean it!). Google itself and advertisers constantly try to second-guess me and this has the most annoying and only sometimes hilarious effects. When I am in an Internet cafe and I do not accept cookies or have deleted previous cookies, the search engine and advertising software "guesses" my predilections from the various guests that may have used the same IP address as I (it's all the same cafe …) and tells me, oh boy, something about what other guests frequently search. It is not always scientific interests they pursue. Some look up tourist information and then I get all kinds of localized search results. Other stuff is of a more emotional (to put it broadly) nature including how to marry women by mail from various destinations. But the worst scourge nowadays is remarketing/retargeting: via cookies and other underhanded methods that are difficult to circumvent, the ad agencies "know" what you searched for yesterday and then they see you did not yet buy. So they keep following you around on ALL other websites (that use the same network for displaying ads and earning money from them) and display the SAME nauseating ad for days (in one case it was about cloud storage and it used a full 30 day period/cookies – at an enormous cost to advertisers who pay for each showing!). And had I clicked once, then the 30 days would probably have started all over …

  20. Charles N said,

    December 12, 2014 @ 1:09 pm

    @Ron — I think your comment about big data is right on the money. The more sophisticated the algorithm, the more expensive it is. Thus Amazon will recommend items related, but not identical, to the item you just bought, but most search engine advertising doesn't do that yet. But it's only a matter of time.
