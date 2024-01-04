

"The Times Sues OpenAI and Microsoft Over A.I. Use of Copyrighted Work", NYT 12/27/2023:

The New York Times sued OpenAI and Microsoft for copyright infringement on Wednesday, opening a new front in the increasingly intense legal battle over the unauthorized use of published work to train artificial intelligence technologies.

The Times is the first major American media organization to sue the companies, the creators of ChatGPT and other popular A.I. platforms, over copyright issues associated with its written works. The lawsuit, filed in Federal District Court in Manhattan, contends that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information.

The suit does not include an exact monetary demand. But it says the defendants should be held responsible for “billions of dollars in statutory and actual damages” related to the “unlawful copying and use of The Times’s uniquely valuable works.” It also calls for the companies to destroy any chatbot models and training data that use copyrighted material from The Times.

The lawsuit includes nearly 30 pages of persuasive examples in which OpenAI programs parrot large chunks of NYT material, essentially verbatim. Here's the start of the first example:

That same example is featured by Gary Marcus in "Things are about to get a lot worse for Generative AI: A full of spectrum of infringement", 12/29/2023, along with a selection of image examples like this one:

He concludes:

In all likelihood, the New York Times lawsuit is just the first of many. On a multiple choice X poll today I asked people whether they thought the case would settle (most did) and what the likely value of such a settlement might be. Most answers were $100 million or more, 20% expected the settlement to be a billion dollars. When you multiply figures like these by the number of film studios, video game companies, other newspapers etc, you are soon talking real money.

A multiple-choice X poll may not be an accurate predictor of settlement value, but it seems clear that copyright infringement will be a serious problem for generative AI in the near future. Effective decelerationism, even.

Ironically, this may turn out to be a Good Thing for Google. According to Miles Kruppa, "Jeff Bezos Bets on a Google Challenger Using AI to Try to Upend Internet Search", WSJ 1/4/2024:

Perplexity, a startup going after Google’s dominant position in web search, has won backing from Jeff Bezos and venture capitalists betting that artificial intelligence will upend the way people find information online.

Started less than two years ago, Perplexity has fewer than 40 employees and is based out of a San Francisco co-working space. The company’s product, which it calls an answer engine, is used by about 10 million people monthly.

Those ingredients were enough to persuade Institutional Venture Partners, Bezos and other tech executives to invest $74 million in the company, the largest sum raised by an internet search startup in recent years. The investment valued Perplexity at $520 million, including the new money, said Chief Executive Officer Aravind Srinivas. […]

Perplexity’s founders said their advantage is using advances in AI to provide direct answers, instead of website links, in response to search queries, without some of the limitations felt by larger companies.

“If you can directly answer somebody’s question, nobody needs those 10 blue links,” Srinivas said.

But based on today's "Stochastic Parrot" AI technology, those direct answers are likely to be mostly copied from other people's published texts, and so Perplexity may run into the same sort of copyright caltrops that are in OpenAI's pathway.

Of course, Google has been doing the same thing for a while — see "News Publishers See Google’s AI Search Tool as a Traffic-Destroying Nightmare", WSJ 12/24/2023:

Shortly after the launch of ChatGPT, the Atlantic drew up a list of the greatest threats to the 166-year-old publication from generative artificial intelligence. At the top: Google’s embrace of the technology.

About 40% of the magazine’s web traffic comes from Google searches, which turn up links that users click on. A task force at the Atlantic modeled what could happen if Google integrated AI into search. It found that 75% of the time, the AI-powered search would likely provide a full answer to a user’s query and the Atlantic’s site would miss out on traffic it otherwise would have gotten.

What was once a hypothetical threat is now a very real one. Since May, Google has been testing an AI product dubbed “Search Generative Experience” on a group of roughly 10 million users, and has been vocal about its intention to bring it into the heart of its core search engine.

But if "Search Generative Experience" is blocked, that leaves Google where it is today — in control of web search, and safe from an OpenAI-powered Microsoft invasion or guerilla raids by upstarts like Perplexity.

By the way, "perplexity" is an important concept in Information Theory, part of the foundations of "language models" large and small, which is no doubt why the start-up's founders chose the name.

Interestingly, perplexity's technical meaning, 2 raised to the power of the entropy (with the entropy measured in bits), doesn't seem to have made it into dictionaries yet. As the Wikipedia article informs us, the earliest published example is in the abstract for a presentation at the 1977 Acoustical Society of America annual meeting: Fred Jelinek, Robert Mercer, Lalit Bahl, and James Baker, "Perplexity—a measure of the difficulty of speech recognition tasks":

Perhaps there are other examples of words in widespread use that were first published in a conference abstract, but this is the only one that I know of.
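
For readers who want the technical sense of "perplexity" made concrete, here is a minimal Python sketch of the definition given above: 2 raised to the power of the entropy, with the entropy in bits. The function names and the two toy distributions are my own illustrative choices, not anything taken from the 1977 abstract or from a particular library. In language-model evaluation the entropy would normally be a cross-entropy estimated on held-out text; this sketch only covers the simpler case of a known discrete distribution.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution.

    Zero-probability outcomes are skipped, since they contribute nothing.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)


def perplexity(probs):
    """Perplexity: 2 raised to the power of the entropy in bits."""
    return 2 ** entropy_bits(probs)


# Illustrative distributions (assumed for this example only).
uniform8 = [1 / 8] * 8                  # eight equally likely outcomes
skewed = [0.5, 0.25, 0.125, 0.125]      # four outcomes, unevenly likely

print(perplexity(uniform8))  # 8.0
print(perplexity(skewed))
```

A uniform choice among eight outcomes has an entropy of 3 bits and so a perplexity of 8, which is why perplexity is often glossed as an "effective number of equally likely choices". The skewed four-way distribution has an entropy of 1.75 bits, so its perplexity comes out between 3 and 4.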

