Advances in topic modeling

In the middle to late 1990s, "Topic Detection and Tracking" was an active research area (see also this). And by the early 2000s, the technology was good enough to support the creation of Google News. Twenty years later, these and other innovations have transformed the mass media, for good or ill. I don't know what algorithms the AI in charge of Topic Modeling at Google News is using these days, but I'm happy to see it developing a sense of humor:

If you've been vacationing on Mars, you may need some background, which of course you can get from Google News as long as the developing story is hot…

Update — in case anyone is still puzzled about why it's amusing to see the marijuana legalization story included in the Gaetz coverage, see e.g. "OK, Who's Writing The Matt Gaetz Screenplay, Because This Sh*t Is WILD".


  1. Philip Taylor said,

    April 1, 2021 @ 7:57 am

    I don't think that one needs to have been on holiday to Mars to be blissfully unaware of whatever lies behind these articles — simply residing in a country other than North America is probably sufficient. FWIW, as a not atypical Briton, I have not the faintest idea who Matt Gaetz, Tucker Carlson, Joel Greenberg, Jonathan Turley, or any of the other protagonists named in the story are, nor (to be brutally honest) do I really care …

  2. Cervantes said,

    April 1, 2021 @ 8:47 am

    Well, Mr. Taylor, if you are concerned at all about the fate of the colonies, you probably do want to know something about Tucker Carlson, who unfortunately is an influential white supremacist TV ranter. Matt Gaetz is a right-wing extremist member of Congress who is famous for spouting insane conspiracy theories. Joel Greenberg is a minor Republican official who is accused of child sex trafficking, and Jonathan Turley is a publicity-hogging law professor who gets attention by being absurdly contrarian in the public defense of obviously guilty right-wing politicians, such as Donald J. Trump. Weed is the slang term for Cannabis sativa, a plant containing psychoactive chemicals.

  3. John Swindle said,

    April 1, 2021 @ 9:09 am

    Dramatis personae: Matt Gaetz, pro-Trump politician. Tucker Carlson, right-wing television news commentator. Google News, internet news aggregator. Greenberg, Turley, and others, supernumeraries.

    Synopsis: Gaetz is accused of sexual misconduct. Google News thinks the story is about legalization of cannabis. Hilarity ensues, or not.

  4. Philip Taylor said,

    April 1, 2021 @ 9:14 am

    Thank you for enlightening me, Cervantes. Of course, one man's terrorist is another man's freedom fighter, so there may well be some who disagree with your characterisation of the four persons named, but as you and I at least agree that Donald J Trump was probably not the finest president that the US have ever known, I have no reason to doubt your value judgements. "Weed" is a term with which I am familiar (familiar with the term, that is, not with the substance), but it is not immediately clear what rôle it is alleged to have played in the proceedings.

  5. rpsms said,

    April 1, 2021 @ 10:35 am

    Regarding the role that weed plays in this, I am instantly reminded of the '70s commercial catchphrase "Calgon, Take Me Away".

  6. Shad Daly said,

    April 1, 2021 @ 12:09 pm

    I don't get it. I see that the weed story is the one that doesn't belong, but I don't see why that is humorous.

  7. Aaron said,

    April 1, 2021 @ 12:48 pm

    I thought the joke was meant to be that after hearing about sex trafficking white supremacist pols all day, you'll want a hit of the good stuff.

  8. Cervantes said,

    April 1, 2021 @ 1:51 pm

    My interpretation was that the whole Gaetz story is so bizarre, you have to be smoking something to believe it, or perhaps to have imagined it. But no, I didn't really get the joke. How the topic classifier made that mistake is a mystery, but ML is always a black box that occasionally produces bizarre output.

  9. David Morris said,

    April 1, 2021 @ 2:30 pm

    Vacationing on Mars or living in one of the 99.5% of the world's countries which aren't the USA …

  10. Philip Taylor said,

    April 1, 2021 @ 2:56 pm

    Be careful, David — wasn't Galileo Galilei threatened with the most unspeakable tortures for maintaining that the earth did not rotate around Washington, D.C. ? I fear that, even today, he would almost certainly be incarcerated in Guantánamo Bay for holding such indisputably heretical views …

  11. AntC said,

    April 1, 2021 @ 4:22 pm

    @Philip as a not atypical Briton, …

    Speaking as a Brit (who now lives in New Zealand), I find you atypical not only in your frequent attempts to post first and draw attention to yourself and your lack of worldliness on this forum, but also, it seems, in going out of your way to remain unworldly.

    You seldom have points to make of Linguistic observation. You are of course entitled to your ignorance and prejudice; and in the British liberal tradition I defend to the death your right to be a grumpy old fart. You are not entitled to waste everybody's time by telling us merely what you are ignorant of and what you don't care about.

  12. Philip Taylor said,

    April 1, 2021 @ 6:33 pm

    I think you meant "[…] by telling us merely of what you are ignorant, and about which you do not care".

  13. Bathrobe said,

    April 2, 2021 @ 1:47 am

    You are not entitled to waste everybody's time by telling us merely of what you are ignorant, and about which you don't care.

    This is actually grammatically peculiar. At any rate, it has a different structure and meaning from what AntC wrote. Would you really say "You are not entitled to waste everybody's time by telling us about which you don't care"?

    If you believe in rules like "don't end a sentence with a preposition", I suggest that you are not a typical Briton. It's an awfully old-fashioned rule, even when implemented properly.

    PS: if you want to stay up to speed on American issues, there is always that staunch rock of British journalism, The Guardian, to turn to.

  14. John Walden said,

    April 2, 2021 @ 3:31 am

    It's The Guardian which often concludes that I am preternaturally interested in its Australian content. It's not that I'm not, just that I'm not that much. So I wonder what algorithm leads it to think that I am.

    Its US news is why I know who the protagonists of this sordid tale are.

  15. F said,

    April 2, 2021 @ 3:48 am

    Bathrobe: 'If you believe in rules like "don't end a sentence with a preposition"': shouldn't that be "never use a preposition to end a sentence with"?

  16. Doreen said,

    April 2, 2021 @ 4:18 am

    Thank you, AntC. Well said.

  17. Batchman said,

    April 2, 2021 @ 1:18 pm

    Unfortunately, some sentences are hard to remove the ending preposition from.

  18. maidhc said,

    April 3, 2021 @ 4:36 am

    It seems that the UK tabloids may be picking up on this story, so perhaps there will be a properly British presentation of SEX! DRUGS! TEEN PROSTITUTES!

    However, legalizing weed does not really seem to have much to do with Gaetz. Not that he wasn't right in on it, but reports suggest he was more into Ecstasy.

  19. Michael Watts said,

    April 3, 2021 @ 8:00 pm

    'If you believe in rules like "don't end a sentence with a preposition"': shouldn't that be "never use a preposition to end a sentence with"?

    Huh? Where is the 'with' coming from? Your "correction" seems to parallel the example sentence "This sentence has cabbage six words".

  20. Batchman said,

    April 4, 2021 @ 5:15 pm

    Of course, the fix for "A preposition is a bad thing to end a sentence with" is "A preposition is a bad thing to end a sentence with, turkey."

  21. Michael 1962 said,

    April 6, 2021 @ 6:26 am

    @Bathroom, @AntC – I think Philip was sending himself up, which is v. typical British. And gently teasing others; likewise so. Because of course it is only we British who understand humour, except in Britain where it is only we Liverpudlians.

    Evidently, I'm the opposite of Philip Taylor – I never comment, and if I do, I do so late. As a non-linguist, and therefore merely a taker not a giver, I do very much appreciate this blog, so I would just like to express my appreciation of Victor Mair and Mark Liberman in being so assiduous in keeping it going.

