Ashley Belanger, "OpenAI, Google will watermark AI-generated content to hinder deepfakes, misinfo", ars technica 7/21/2023:

Seven companies — including OpenAI, Microsoft, Google, Meta, Amazon, Anthropic, and Inflection — have committed to developing tech to clearly watermark AI-generated content. That will help make it safer to share AI-generated text, video, audio, and images without misleading others about the authenticity of that content, the Biden administration hopes.

The link goes to a 7/21 White House fact sheet with the title "FACT SHEET: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI". One of that document's many bullet points:

The companies commit to developing robust technical mechanisms to ensure that users know when content is AI generated, such as a watermarking system. This action enables creativity with AI to flourish but reduces the dangers of fraud and deception.

Belanger's ars technica article notes that

It's currently unclear how the watermark will work, but it will likely be embedded in the content so that users can trace its origins to the AI tools used to generate it.

There's actually a fair amount of stuff Out There about how textual watermarking might work, e.g. Keith Collins, "How ChatGPT Could Embed a ‘Watermark’ in the Text It Generates", NYT 2/17/2023:

Identifying generated text, experts say, is becoming increasingly difficult as software like ChatGPT continues to advance and turns out text that is more convincingly human. OpenAI is now experimenting with a technology that would insert special words into the text that ChatGPT generates, making it easier to detect later. The technique is known as watermarking.

The watermarking method that OpenAI is exploring is similar to one described in a recent paper by researchers at the University of Maryland, said Jan Leike, the head of alignment at OpenAI.

The "recent paper" link goes to John Kirchenbauer et al., "A Watermark for Large Language Models", for which arxiv.org gives a publication date (6/6/2023) nearly four months after the NYT article that cites it. That's of course due to the magic of arxiv.org, which successively updated the .pdf link from the v1 version of 1/24 to the v2 version of 1/27 and finally to the v3 version of 6/6. Here's the abstract:

Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of “green” tokens before a word is generated, and then softly promoting use of green tokens during sampling. We propose a statistical test for detecting the watermark with interpretable p-values, and derive an information-theoretic framework for analyzing the sensitivity of the watermark. We test the watermark using a multi-billion parameter model from the Open Pretrained Transformer (OPT) family, and discuss robustness and security.
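
For readers who think better in code, here's a minimal sketch of the embedding step the abstract describes. It is my own toy illustration, not the authors' released implementation: the function names, the SHA-256 seeding, and the flat "model" logits are all assumptions, and the values γ = 0.25 and δ = 2 are simply the ones the paper's Figure 1 reports.

```python
import hashlib
import math
import random

GAMMA = 0.25  # fraction of the vocabulary placed on the green list
DELTA = 2.0   # bonus added to the logit of every green token

def green_list(prev_token, vocab_size, gamma=GAMMA):
    """Pseudo-randomly pick the green tokens, seeded by the previous token.

    Hypothetical seeding scheme: anyone who knows the hash can rebuild
    the same list later without access to the language model.
    """
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    ids = list(range(vocab_size))
    random.Random(seed).shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def sample_watermarked(logits, prev_token, rng, gamma=GAMMA, delta=DELTA):
    """Softly promote green tokens: add delta to their logits, then sample."""
    green = green_list(prev_token, len(logits), gamma)
    biased = [x + delta if i in green else x for i, x in enumerate(logits)]
    top = max(biased)  # subtract the max for a numerically stable softmax
    weights = [math.exp(x - top) for x in biased]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def demo(n_tokens=200, vocab_size=500, seed=0):
    """Generate tokens from a stand-in model and return the green fraction."""
    rng = random.Random(seed)
    tokens = [0]
    for _ in range(n_tokens):
        logits = [0.0] * vocab_size  # flat logits stand in for a real model
        tokens.append(sample_watermarked(logits, tokens[-1], rng))
    pairs = zip(tokens, tokens[1:])
    green = sum(tok in green_list(prev, vocab_size) for prev, tok in pairs)
    return green / n_tokens
```

With flat logits, unwatermarked sampling would land on a green token about a quarter of the time; with the bias applied, `demo()` should return a fraction well above one half. A real model's logits are far from flat, so the promotion matters most where the model has several plausible next tokens.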

Here's their illustrative example:

And its caption:

Figure 1. Outputs of a language model, both with and without the application of a watermark. The watermarked text, if written by a human, is expected to contain 9 “green” tokens, yet it contains 28. The probability of this happening by random chance is ≈ 6×10⁻¹⁴, leaving us extremely certain that this text is machine generated. Words are marked with their respective colors. The model is OPT-6.7B using multinomial sampling. Watermark parameters are γ, δ = (0.25, 2).
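
The arithmetic behind a caption like this can be roughly reconstructed as a one-proportion z-test. The sketch below makes two assumptions of mine: that the passage has 36 scored tokens (inferred from 9 expected green tokens at γ = 0.25, not stated in the caption), and that a normal approximation to the binomial is good enough. The function names are hypothetical.

```python
import math

def watermark_z_score(green_count, total_tokens, gamma):
    """Standard deviations by which the green count exceeds chance."""
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std

def one_sided_p_value(z):
    """Upper-tail probability of a standard normal at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

gamma = 0.25
total_tokens = 36   # assumption: 9 expected green tokens / 0.25
green_count = 28    # from the caption

z = watermark_z_score(green_count, total_tokens, gamma)
p = one_sided_p_value(z)
print(f"z = {z:.2f}, one-sided p = {p:.1e}")
```

Under these assumptions z comes out a bit above 7 and the tail probability is on the order of 10⁻¹³, the same ballpark as the caption's figure; the exact value depends on the true token count and on which tail computation the authors used.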

Read the paper if you want details. They list the obvious attacks:

Three types of attacks are possible. Text insertion attacks add additional tokens after generation that may be in the red list and may alter the red list computation of downstream tokens. Text deletion removes tokens from the generated text, potentially removing tokens in the green list and modifying downstream red lists. This attack increases the monetary costs of generation, as the attacker is “wasting” tokens, and may reduce text quality due to effectively decreased LM context width. Text substitution swaps one token with another, potentially introducing one red list token, and possibly causing downstream red listing. This attack can be automated through dictionary or LM substitution, but may reduce the quality of the generated text.
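
To get a feel for the substitution attack, here's a small simulation. It is a toy under loud assumptions: a flat-logit stand-in for the language model, a hash-based green test that treats each (previous token, token) pair independently rather than building an explicit list, and random replacement instead of the dictionary or LM substitution the authors mention. All names are hypothetical.

```python
import hashlib
import math
import random

GAMMA, DELTA, VOCAB = 0.25, 2.0, 1000

def is_green(prev_token, token):
    """Hash the (prev_token, token) pair to [0, 1); green if below GAMMA."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA

def generate(n, rng):
    """Toy generator with flat logits: only the green bonus shapes sampling."""
    tokens = [0]
    for _ in range(n):
        prev = tokens[-1]
        weights = [math.exp(DELTA) if is_green(prev, t) else 1.0
                   for t in range(VOCAB)]
        tokens.append(rng.choices(range(VOCAB), weights=weights, k=1)[0])
    return tokens

def z_score(tokens):
    """z-statistic for the number of green tokens among the scored positions."""
    n = len(tokens) - 1
    green = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

def substitute(tokens, fraction, rng):
    """Replace roughly `fraction` of the tokens with random vocabulary items."""
    attacked = list(tokens)
    for i in range(1, len(attacked)):
        if rng.random() < fraction:
            attacked[i] = rng.randrange(VOCAB)
    return attacked

rng = random.Random(0)
clean = generate(200, rng)
attacked = substitute(clean, 0.3, rng)
z_clean, z_attacked = z_score(clean), z_score(attacked)
print(f"z before attack: {z_clean:.1f}, after 30% substitution: {z_attacked:.1f}")
```

Each substituted token changes its own color and also reseeds the green test for the token after it, which is the "downstream red listing" the quoted passage describes. In this toy setup the z-score should drop after the attack, and how far it drops depends on the substitution rate and the text length.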

They assert, with some empirical support, that "the watermark is computationally simple to verify without access to the underlying model, false positive detections are statistically improbable, and the watermark degrades gracefully under attack."

I'm not entirely convinced that these approaches are safe against substitution and paraphrase attacks. But the most obvious issue is that there will be lots of LLM systems out there in the wild, developed and extended by people and companies whose goal is precisely to provide undetectable automatic ghostwriting services.

This post says nothing about watermarking techniques for image, audio and video data. There are well known watermarking techniques for those applications, but again, there are going to be available systems without any such safeguards.

[And for why watermarks have anything to do with water, see the Wikipedia page.]

