AI plagiarism again

Along with concerns about hallucinations and learned bias, there's increasing evidence that generative AI systems sometimes commit what would obviously be plagiarism if a human did it. One particularly striking example is discussed in a recent article by Randall Lane, editor of Forbes Magazine: "Why Perplexity’s Cynical Theft Represents Everything That Could Go Wrong With AI", 6/11/2024:

For most of this year, two of our best journalists, Sarah Emerson and Rich Nieva, have been reporting on former Google CEO Eric Schmidt’s secretive drone project, including a June 6 story detailing the company’s ongoing testing in Silicon Valley suburb Menlo Park as well as the frontlines of Ukraine. The next day, Perplexity published its own “story,” utilizing a new tool they’ve developed that was extremely similar to Forbes’ proprietary article. Not just summarizing (lots of people do that), but with eerily similar wording, some entirely lifted fragments — and even an illustration from one of Forbes’ previous stories on Schmidt. More egregiously, the post, which looked and read like a piece of journalism, didn’t mention Forbes at all, other than a line at the bottom of every few paragraphs that mentioned “sources,” and a very small icon that looked to be the “F” from the Forbes logo – if you squinted. It also gave similar weight to a “second source” — which was just a summary of the Forbes story from another publication.

I haven't found a systematic comparison between the Forbes stories and the Perplexity version. But if Lane's description is accurate, and if Perplexity were a human, it could be in serious trouble, although the criteria for what counts as plagiarism are fuzzy at best. And presumably Perplexity-the-company might also get in legal trouble over behavior like this, though it's not clear to me whether the consequences would be serious enough for them to care. The NYT sued OpenAI last December ("AI plagiarism", 1/4/2024), claiming “billions of dollars in statutory and actual damages”. Perhaps someone can point us to what's happening in that case, or evaluate what's likely to happen.

Some other news coverage:

"New report: 60% of OpenAI model's responses contain plagiarism", Axios 2/22/2024
"AI startup Perplexity accused of ‘directly ripping off’ news outlets like Forbes, CNBC without proper credit", New York Post 6/10/2024
"Perplexity’s Plagiarism Problem", Forbes 6/11/2024
"Perplexity was planning revenue-sharing deals with publishers when it came under media fire", Semafor 6/12/2024



2 Comments

  1. AntC said,

    June 14, 2024 @ 1:56 am

    Forbes stories and the Perplexity version.

    Perplexity's version (I think). Bemusingly, Forbes' version is behind a paywall for me, so I can't compare.

    What is Perplexity AI? [via their Home page, I won't link because wordpress will barf]


    6. Real-Time Information: Perplexity indexes the web daily, allowing it to provide up-to-date information on recent news, game scores, and other timely topics

    So how do you summarise "recent news" without "eerily similar wording"?

    a very small icon that looked to be the “F” from the Forbes logo – if you squinted.

    Methinks Forbes doth protest too much. Click on the 'very small icon' and you get fuller disclosure/full attribution to 2 Forbes articles (and a couple of other sources, which could very well be derivative). And each has a link to the original — behind the same paywall.

    So this is Perplexity's news aggregation service, I'm guessing. I remember Keesings news archive which arrived on dead trees to our school library in the 1970's. How did they avoid plagiarism charges?

    More egregiously, the post, which looked and read like a piece of journalism, didn’t mention Forbes at all, …

    Since Lane/'Forbes staff' claim to be Journalists, I call bullshit! factcheck fail!

    That is, unless Perplexity have revised their attribution since Lane et al first looked at it.

  2. F said,

    June 14, 2024 @ 7:03 am

    I too remember Keesing's; I was the school (student) librarian responsible for filing it, for several years (late 60s). As I recall, it summarised and (therefore) paraphrased its source material, and I have no doubt it complied with copyright law.
