Watermarking AI output, again


Deepa Seetharaman & Matt Barnum, "There’s a Tool to Catch Students Cheating With ChatGPT. OpenAI Hasn’t Released It.", WSJ 8/4/2024:

OpenAI has a method to reliably detect when someone uses ChatGPT to write an essay or research paper. The company hasn’t released it despite widespread concerns about students using artificial intelligence to cheat.

The project has been mired in internal debate at OpenAI for roughly two years and has been ready to be released for about a year, according to people familiar with the matter and internal documents viewed by The Wall Street Journal. “It’s just a matter of pressing a button,” one of the people said.

What OpenAI wrote on 5/7/2024 — "Understanding the source of what we see and hear online" ("Update on August 4, 2024") — suggests that the method is relatively easy to circumvent:

  • Our teams have developed a text watermarking method that we continue to consider as we research alternatives.
  • While it has been highly accurate and even effective against localized tampering, such as paraphrasing, it is less robust against globalized tampering, like using translation systems, rewording with another generative model, or asking the model to insert a special character in between every word and then deleting that character, making it trivial for bad actors to circumvent.
  • Another important risk we are weighing is that our research suggests the text watermarking method has the potential to disproportionately impact some groups. For example, it could stigmatize use of AI as a useful writing tool for non-native English speakers.
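
The last of those circumvention tricks is worth making concrete: since the watermark rides on the model's token-level choices, asking for a special character between words makes the model emit a differently-tokenized sequence, and stripping the character afterwards leaves clean text that never carried the watermark pattern. A sketch, with an arbitrary choice of "§" as the marker and a made-up sample string:

```python
# The "insert a special character, then delete it" attack from the list
# above. The watermark rides on the model's token-level choices; asking
# for "§" between words makes the model emit a different token sequence,
# so the pattern a detector looks for never appears in the cleaned text.

MARKER = "§"

def strip_marker(marked_text: str) -> str:
    """Delete the marker (and its surrounding spaces), leaving plain prose."""
    return marked_text.replace(f" {MARKER} ", " ").replace(MARKER, "")

# Suppose the model, prompted to interleave "§", produced this:
marked = "The § quick § brown § fox § jumps § over § the § lazy § dog."
print(strip_marker(marked))  # The quick brown fox jumps over the lazy dog.
```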

See also Umar Shakir, "Google’s invisible AI watermark will help identify generative text and video", The Verge 5/14/2024:

Google’s DeepMind CEO, Demis Hassabis, took the stage for the first time at the Google I/O developer conference on Tuesday to talk not only about the team’s new AI tools, like the Veo video generator, but also about the new upgraded SynthID watermark imprinting system. It can now mark video that was digitally generated as well as AI-generated text.

Some details are available on Google DeepMind's SynthID page:

Finding a robust solution to watermarking AI-generated text that doesn’t compromise the quality, accuracy and creative output has been a great challenge for AI researchers. To solve this problem, our team developed a technique that embeds a watermark directly into the process that a large language model (LLM) uses for generating text.

An LLM generates text one token at a time. These tokens can represent a single character, word or part of a phrase. To create a sequence of coherent text, the model predicts the next most likely token to generate. These predictions are based on the preceding words and the probability scores assigned to each potential token. […]

This process is repeated throughout the generated text, so a single sentence might contain ten or more adjusted probability scores, and a page could contain hundreds. The final pattern of scores, combining the model’s word choices with the adjusted probability scores, is considered the watermark. This technique can be used for as few as three sentences, and as the text increases in length, SynthID’s robustness and accuracy increase.
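
Google hasn't published SynthID's exact algorithm, but this description matches the general recipe in the research literature, e.g. Kirchenbauer et al.'s "green list" watermark: use a secret key plus the preceding context to pseudorandomly favor part of the vocabulary at each sampling step, then detect by counting how often the emitted tokens land in the favored part. Here is a minimal Python sketch of that recipe; the vocabulary, key, and bias are toy stand-ins, and SynthID's actual method surely differs in detail:

```python
import hashlib
import math
import random

# Toy stand-ins: a real LLM has ~100k tokens and model-produced logits.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under", "mat", "rug"]
SECRET_KEY = "watermark-key"   # hypothetical; a real system keeps this private
GREEN_FRACTION = 0.5           # fraction of the vocabulary favored at each step
BIAS = 2.0                     # logit boost given to "green" tokens

def green_list(prev_token: str) -> set:
    """Pseudorandomly partition the vocabulary, seeded by key + context."""
    seed = hashlib.sha256((SECRET_KEY + prev_token).encode()).hexdigest()
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(VOCAB) * GREEN_FRACTION)])

def sample_next(prev_token: str, logits: dict) -> str:
    """Nudge the probability scores toward green tokens, then sample.
    This small, keyed bias is the watermark."""
    greens = green_list(prev_token)
    adjusted = {t: v + (BIAS if t in greens else 0.0) for t, v in logits.items()}
    total = sum(math.exp(v) for v in adjusted.values())
    r, acc = random.random(), 0.0
    for tok, v in adjusted.items():
        acc += math.exp(v) / total
        if r <= acc:
            return tok
    return tok  # guard against float rounding

def detect(tokens: list) -> float:
    """Score = fraction of tokens in their context's green list.
    Ordinary text scores near GREEN_FRACTION; watermarked text, well above."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:]) if tok in green_list(prev))
    return hits / max(len(tokens) - 1, 1)

# Demo: tokens sampled from flat logits should score well above 0.5.
text = ["the"]
for _ in range(50):
    text.append(sample_next(text[-1], {t: 0.0 for t in VOCAB}))
print(round(detect(text), 2))
```

Note that detect() needs the same key and partition function the generator used, which is why detection of this kind is tied to a particular provider and model release.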

Obviously this method depends on knowing which release of which generative model was used. And even then, the same “globalized tampering” interventions will presumably work here as well, though if and when these detection methods are generally accessible and used, naive cheaters will be caught.

Another approach — alas even easier for "bad actors" to circumvent — is the Content Authenticity Initiative and the C2PA protocol. This protocol is analogous to (various countries') food labeling requirements.
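
Where watermarking hides a signal in the content itself, C2PA attaches signed provenance metadata alongside it: an ingredients label with a tamper-evident seal. Real C2PA manifests are a standardized binary structure signed with X.509 certificates; the HMAC "signature" and field names in this Python sketch are simplified stand-ins for the idea:

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"publisher-secret"  # stand-in; real C2PA uses X.509 certificates

def make_manifest(content: bytes, generator: str) -> dict:
    """Bind a provenance claim to the content via its hash, then sign the claim."""
    claim = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "generator": generator,   # e.g. the AI tool that produced the content
        "actions": ["created"],   # how the content came to be
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": sig}

def verify(content: bytes, manifest: dict) -> bool:
    """A manifest is good only if the signature checks out *and* the
    recorded hash still matches the content (i.e., nothing was altered)."""
    payload = json.dumps(manifest["claim"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, manifest["signature"])
            and manifest["claim"]["content_sha256"]
                == hashlib.sha256(content).hexdigest())
```

The easy circumvention is also visible here: nothing in the content itself stops a bad actor from simply deleting the manifest, any more than the contents of a jar enforce what its label says.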

Last year's LLOG AI Watermarking posts:

“Watermarking text”, 7/25/2023
“ROT-LLM?”, 7/28/2023

4 Comments »

  1. David Marjanović said,

    August 5, 2024 @ 9:47 am

    For example, it could stigmatize use of AI as a useful writing tool for non-native English speakers.

    Good. LLMs are not a useful writing tool for anyone.

  2. Gregory Kusnick said,

    August 5, 2024 @ 10:43 am

    There's a bill (AB 3211) moving through the California legislature that would effectively mandate watermarking on all generative AI output, even if the best available watermarking schemes are easily defeated. So AI purveyors are busily building watermarking systems they know won't do any good in anticipation of such pointless mandates.

  3. AntC said,

    August 5, 2024 @ 6:06 pm

    @DM Good.

    Seconded. Using AI would be particularly dangerous for non-native speakers: they may well not realise when the tool is producing nonsense/non-English/hallucinating — not that native speakers are necessarily entirely immune.

  4. Wanda said,

    August 6, 2024 @ 10:20 pm

    I think you're under-estimating what types of writing ChatGPT can be useful for. I have a colleague whom I respect very much who is fluent in English but not a native English speaker. He's a professor at an American university in a STEM discipline. He, like all of us, has to generate various kinds of routine writing all day, like emails. He says that previously, these types of writing would take him a long time to produce. However, ChatGPT has apparently been a great help for him for generating grammatically correct drafts for him to edit. He knows English well enough for him to know how to take what ChatGPT spits out and change it appropriately. He says it's made him a lot more productive and allows him to spend more time doing actual work. Honestly, if ChatGPT can let people who aren't good at writing in English write things that aren't supposed to be particularly original or interesting, I don't see the harm.
