
There's a puzzling new proposal for watermarking AI-generated text — Alistair Croll, "To Watermark AI, It Needs Its Own Alphabet", Wired 7/27/2023:

We need a way to distinguish things made by humans from things made by algorithms, and we need it very soon. […]

Fortunately, we have a solution waiting in plain sight. […]

If the companies who pledged to watermark AI content at the point of origin do so using Unicode—essentially giving AI its own character set—we’ll have a ready-made, fine-grained AI watermark that works across all devices, platforms, operating systems, and websites.

What's proposed here is a character-for-character substitution — like ROT13 encryption but using character codes that are digitally different while being visually the same. As Croll explains:

In Unicode, every character has a number. The Latin Capital Letter A, for example, is hexadecimal number 41. But there are plenty of other A’s in Unicode: There’s Fullwidth Latin Capital Letter A (Ａ, number EF BC A1), Mathematical Bold Capital A (𝐀, number F0 9D 90 80), Mathematical Sans-Serif Capital A (𝖠, F0 9D 96 A0), and plenty of others. Each A has its own name, its own Unicode value, and in some cases, its own font shape. Why not create a letter A just for AI?
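To see the numbers involved, the short Python 3.9+ snippet below uses only the standard library. It prints each of the quoted A's with its code point, its UTF-8 bytes, and its official Unicode name. The output suggests that only the "41" figure in the quote is a code point (U+0041). The other three figures are UTF-8 byte sequences, for U+FF21, U+1D400 and U+1D5A0 respectively.

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, UTF-8 bytes, Unicode name) for one character."""
    code_point = f"U+{ord(ch):04X}"
    utf8_bytes = ch.encode("utf-8").hex(" ").upper()
    return code_point, utf8_bytes, unicodedata.name(ch)

# The four capital A's mentioned in the quoted passage.
for ch in ["A", "\uFF21", "\U0001D400", "\U0001D5A0"]:
    print(*describe(ch), sep="  ")
```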

If the AI-specific character sets were created — and we'd need many of them, to support all the world's writing systems — then the watermarking process would be a trivial computer program.
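Here is a rough sketch of how small such a watermarking program could be. It rests on a loudly hypothetical assumption: no AI-specific alphabet exists, so the Mathematical Sans-Serif letters stand in for one. Unlike the lookalikes the proposal imagines, these often render visibly differently. The sketch also only covers the 52 ASCII letters, whereas a real scheme would need counterparts for every writing system.

```python
import string

# HYPOTHETICAL: Unicode has no AI-specific alphabet, so this sketch borrows the
# Mathematical Sans-Serif letters (U+1D5A0..U+1D5D3) as a stand-in for one.
SANS_CAPITAL_A = 0x1D5A0  # MATHEMATICAL SANS-SERIF CAPITAL A
SANS_SMALL_A = 0x1D5BA    # MATHEMATICAL SANS-SERIF SMALL A

# One-to-one table: each ASCII letter maps to its stand-in "AI" letter.
# Digits, spaces and punctuation are left untouched.
WATERMARK_TABLE = str.maketrans({
    **{c: chr(SANS_CAPITAL_A + i) for i, c in enumerate(string.ascii_uppercase)},
    **{c: chr(SANS_SMALL_A + i) for i, c in enumerate(string.ascii_lowercase)},
})

def watermark(text: str) -> str:
    """Replace every ASCII letter with its (hypothetical) AI-alphabet twin."""
    return text.translate(WATERMARK_TABLE)

print(watermark("Hello, world"))
```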

Of course, de-watermarking would be an equally trivial program, so what's the point?
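To illustrate how little effort removal takes, here is a minimal sketch. As before, it assumes the Mathematical Sans-Serif letters as a hypothetical stand-in for an AI alphabet. One function inverts the substitution with an explicit lookup table. The other shows that for lookalikes Unicode already has (fullwidth, bold, sans-serif and so on), the standard NFKC normalization in Python's `unicodedata` module folds them back to plain letters on its own.

```python
import string
import unicodedata

# HYPOTHETICAL "AI alphabet": Mathematical Sans-Serif letters standing in
# for the proposed AI-specific characters.
SANS_CAPITAL_A = 0x1D5A0
SANS_SMALL_A = 0x1D5BA

# Reverse table (code point -> ordinary letter) for str.translate.
UNMARK_TABLE = {
    **{SANS_CAPITAL_A + i: c for i, c in enumerate(string.ascii_uppercase)},
    **{SANS_SMALL_A + i: c for i, c in enumerate(string.ascii_lowercase)},
}

def strip_watermark(text: str) -> str:
    """Undo the substitution with an explicit reverse lookup table."""
    return text.translate(UNMARK_TABLE)

def strip_with_nfkc(text: str) -> str:
    """For existing lookalikes, NFKC compatibility normalization suffices."""
    return unicodedata.normalize("NFKC", text)

print(strip_watermark("\U0001D5A0\U0001D5A8 text"))
print(strip_with_nfkc("\uFF21\U0001D400\U0001D5A0"))
```

A genuinely new AI character set might not be given compatibility decompositions, in which case NFKC would leave it alone. Even so, the explicit reverse table is still only a few lines.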

Croll's answer:

It’s important to note that this proposed markup is not an enforcement mechanism. Bad actors could easily convert AI text to look like it was written by a human. A recipient still needs to trust a sender in order to believe what is marked up. But that’s one of the strengths of this approach. Once text is marked, a human has to actively remove the AI marker at some stage between the LLM and the consumer. We have legal mechanisms to investigate and deal with negligence or wrongdoing. The proposed protocol simply lets us apply these to AI.

This really puzzles me. It assumes that a student (for example) who's willing to use an LLM as a ghost-writer, despite this being against the rules, will draw the line at running a trivial character-substitution program to disguise their violation.
