"Finding a voice"


An excellent article by Lane Greene: "Language: Finding a voice", The Economist 1/5/2017.



  1. Bob Moore said,

    January 5, 2017 @ 5:02 pm

    One sentence in the article makes me cringe:

    "According to Macduff Hughes, chief engineer on Google Translate, a simple approach using vast amounts of data seemed more promising than a clever one with fewer data."

    The first occurrence of "data" is clearly an instance of a singular mass noun, as is now routine. With the second occurrence, however, the author seems to have been hit over the head with a sadly out-of-date style guide and forced to use "fewer data", implying a plural count noun, rather than "less data", which would be consistent with the previous occurrence.

  2. Coby Lubliner said,

    January 5, 2017 @ 7:55 pm

    Geoff Pullum has often written about The Economist's stodgy usage rules. This may be an example.

  3. sumelic said,

    January 5, 2017 @ 8:44 pm

    @Bob Moore:

    I don't understand why "data" must be a singular mass noun in the phrase "vast amounts of data". People say things like "amount of taxes", and "taxes" is clearly plural in Standard English, so why wouldn't this construction be possible with plural "data"?

  4. Bob Moore said,

    January 5, 2017 @ 9:18 pm


    Well, yes, people do say things like "vast amounts of taxes", so I guess my analysis won't be clear to those who insist on the outdated rule that "data" must always be a plural count noun. Note, however, that in general "amounts of" is preferred with mass nouns, and "numbers of" is preferred with count nouns, e.g., "vast numbers of birds" vs. "vast amounts of birds". People might say either one, but only the first would generally be regarded as standard English.

    Regarding "data", according to the Google Books ngram corpus, "how many data are" was more common than "how much data is" until the 1950s, but by 2000, the latter was almost 22 times more frequent than the former. Checking a number of such constructions that are diagnostic of the mass/count difference suggests that "data" started being used as a mass noun around the 1950s, and now is overwhelmingly used that way. My own hypothesis is this change came about as a result of the development of information theory, which showed that information (and hence data) is a quantity to be measured, rather than a collection of well-individuated things to be counted.

    [(myl) Hey folks, Lane Greene creates one of the best pieces of science journalism ever, and all we can do is peeve about "fewer data"? Really?]

  5. Jon said,

    January 6, 2017 @ 3:54 am

    One problem in machine translation not mentioned here is caused by people's names (Fish, Carpenter, Miles) being translated as if they were ordinary words. The same problem arises in other areas, such as street names and trade names.
    As machine translation becomes more reliable, there should be no need to provide multiple versions of a web page in different languages, and instead let MT take the strain, with the bonus that all versions will be up to date. In such a world, there is a need for an html tag for the examples above.

  6. Jon said,

    January 6, 2017 @ 3:58 am

    In the above, the words 'html tag' were followed by 'do not translate', inside angled brackets, and therefore omitted by WordPress in presenting my comment.

  7. AntC said,

    January 6, 2017 @ 6:58 am

    I can't help but feel that the 'cleverer' machines get at these tasks formerly requiring human intelligence, the more we'll redefine "intelligence" to exclude those capabilities exhibited by machines.

    I'm in a foreign city and having an alternately amazed and pereplexed battle with Google Maps.

  8. AntC said,

    January 6, 2017 @ 7:20 am

    (And working on dodgy wi-fi that apparently posts my comment when what I wanted to do was correct the spelling.)

    Maps is smart enough to find me the nearest coffee shop, but suggests I need to take a bus to a street overseas. (No: I meant the street with that name in the city where I'm catching the bus.) I suspect that our amazement will wear off and our tolerance for its 'dumbness' will wear thin.

    What the article says about how machines could grok 'common sense' and apply knowledge of 'the real world' is very pertinent.

    On a linguistic point: I'm in Taiwan, whose road signs exhibit a mix of Wade-Giles and pinyin (and the same street/town might be written with a different system in different places). Then there are the Mandarin vs. Hokkien vs. Hakka transcriptions. Maps can't cope with multiple spellings for the same place. I wonder about (say) the mix of place-names in northern West Europe, with its language salad: Cologne/Köln.

  9. Catanea said,

    January 6, 2017 @ 12:41 pm

    Well, if the very next sentence is explaining that "corpora" is the plural of "corpus", the nitpick of data vs data seems understandable.

  10. Gregory Kusnick said,

    January 6, 2017 @ 4:18 pm

    Never mind "data"; I'm pretty sure Groucho Marx didn't spell "pajamas" with a y.
