AI triumphs… and also fails.


Google has created an experimental — and free — system called NotebookLM. Here's its current welcome page:


So I gave it a link to a LLOG post that I happened to have open for an irrelevant reason: "Dogless in Albion", 9/12/2011.

And here's what it showed me next:

That Summary is OK, though it leaves out the main point of the post, which was to discuss Martin Kay's observation about the puzzling role of phrasal stress in disambiguating the sentence "Dogs must be carried".

But one of the three options under "Audio Overview" was

What is the relationship between phrasal stress and the interpretation of signs using the "X must be Y" construction?

So I clicked on that option. The result was an automatically-generated podcast-style discussion:

Both the LM-generated dialog and its audio realization are really impressive. And I'm not the only one who's impressed with NotebookLM's autopodcasts — on ZDNET, David Gewirtz wrote (10/1/2024):

I am not at all religious, but when I discovered this tool, I wanted to scream, "This is the devil's work!"

When I played the audio included below for you to my editor, she slacked back, "WHAT KIND OF SORCERY IS THIS?" I've worked with her for 10 years, during which time we have slacked back and forth just about every day, and that's the first all-caps I've ever seen from her.

Later, she shared with me, "This is 100% the most terrifying thing I've seen so far in the generative AI race."

If you are at all interested in artificial intelligence, what I've found could shake you up as much as it did us. We may be at a watershed moment.

A stunningly lifelike speech and dialog system, yes. Even voice-quality variation and laughter at appropriate times.

And some of the content is good — for example the robot podcasters do a good job of explaining the ambiguity under discussion in my blog post:

But there are still problems. For example, the robots' attempt to explain the phrasal stress issue goes completely off the rails:

Zeroing in on the system's performance of the stress difference:

Where did the system get the weird idea that the way to put phrasal stress on the subject of "Dogs must be carried" is to pronounce "dogs" as /ˈdɔgz.ɛs/? Inquiring minds want to know, but are unlikely ever to learn, given the usual black-box unexplainability of contemporary AI systems.

Still, "podcasters" and similar talking-head roles may be among the jobs threatened by AI, either through complete replacement or a major increase in productivity. (And of course, human talking heads get things wrong a fair fraction of the time…)


Note: The original LLOG post should have included audio examples of Martin Kay's stress distinction, but didn't. So just in case it wasn't clear to you, here's my performance of phrasal stress on the subject:

And on the verb:
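
If you'd like to experiment with the contrast in a speech synthesizer yourself, here's a minimal Python sketch of one way to mark the two performances, using SSML's standard emphasis element. The helper function is purely illustrative (mine, not anything NotebookLM exposes), and how audibly a given engine realizes the markup varies:

    # Build SSML for the two readings of "Dogs must be carried",
    # marking phrasal stress with SSML's standard <emphasis> element.
    # (Illustrative helper only; engine support for emphasis varies.)
    def ssml_with_emphasis(words, stressed):
        marked = [
            f'<emphasis level="strong">{w}</emphasis>' if w == stressed else w
            for w in words
        ]
        return "<speak>" + " ".join(marked) + "</speak>"

    sentence = ["Dogs", "must", "be", "carried"]

    # Phrasal stress on the subject:
    print(ssml_with_emphasis(sentence, "Dogs"))
    # -> <speak><emphasis level="strong">Dogs</emphasis> must be carried</speak>

    # Phrasal stress on the verb:
    print(ssml_with_emphasis(sentence, "carried"))
    # -> <speak>Dogs must be <emphasis level="strong">carried</emphasis></speak>

SSML-aware synthesizers such as Google Cloud Text-to-Speech and Amazon Polly will accept strings like these, though none is guaranteed to reproduce Kay's contrast exactly as a human speaker would.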

This is the only thing I've tried to do with NotebookLM so far — future experiments will probably bring additional triumphs and additional failures.



5 Comments

  1. Jon W said,

    October 3, 2024 @ 2:22 pm

    Folks might be interested in Henry Farrell's impression of the same tech at https://www.programmablemutter.com/p/after-software-eats-the-world-what (the whole thing is worth reading):

    The result was superficially very impressive. Two generic podcast voices with American accents, one female and one male, chatting with each other about this newsletter’s contents! Perhaps you can tell that the voices were artificially generated: I really couldn’t (maybe they were a little too smooth, but I probably wouldn’t have noticed – they dropped in plenty of phatics for camouflage).

    The actual content was an entirely different story. The discussion dropped plenty of generic compliments in amidst the equally generic talk-show-host chitchat, but it didn't accurately summarize what I had said in the posts that it talked about. My post on why it would be a disaster if Trump replaced sanctions with tariffs becomes an “argument for replacing sanctions with tariffs,” because sanctions “often hurt citizens more than leaders.” The podcast ‘hosts’’ discussion of “shitposting,” “shitmining” and “shitfarming” defined those terms in wildly different ways than the post did. And so on.

    It was remarkable to see how many errors could be stuffed into 5 minutes of vacuous conversation. What was even more striking was that the errors systematically pointed in a particular direction. In every instance, the model took an argument that was at least notionally surprising, and yanked it hard in the direction of banality. A moderately unusual argument about tariffs and sanctions (it got into the FT after all) was replaced by the generic criticism of sanctions that everyone makes. And so on for everything else. The large model had a lot of gaps to fill, and it filled those gaps with maximally unsurprising content.

  2. David McAlister said,

    October 3, 2024 @ 3:06 pm

    On the issue of dogs being pronounced /ˈdɔgz.ɛs/, it seems to me that the word being pronounced is dachshund – the first speaker uses a particular dog breed to make their point, and the second speaker clearly picks up on that.

  3. Scott P. said,

    October 3, 2024 @ 5:24 pm

    I tried that out a week or two ago, feeding it scholarly articles on a subject my colleague teaches. The summaries and timeline it produced were mediocre at best, missing most of the key ideas. And the podcast it generated was more like a parody of a podcast — I was howling with laughter; it reproduced all of the quirks of diction and commentary that amateur podcasts have, but the content was almost completely vapid.

  4. Rod Johnson said,

    October 3, 2024 @ 5:40 pm

    Tangentially, I was briefly thrown by Gewirtz's "slacked back" thing. I guess Slack has become a verb, but it's less transparent to me than "tweet" for Twitter.

  5. David L said,

    October 3, 2024 @ 5:52 pm

    Somewhat OT, but as an indication of Google AI's fallibility: I started watching the Netflix series Kaos (quite entertaining, IMO), and I wanted to know the name of the actress playing "Riddy" (aka Eurydice). The first thing Google came up with was an AI summary saying that the character is played by Eva Noblezada, along with a picture that didn't look much like the person in the show. Then I found a cast list online and discovered that the actress in question is Aurora Perrineau, an entirely different person.

    It's perplexing how AI can get simple facts wrong — like the number of r's in strawberry.

