Language Log

Government dampers on AI in the PRC, part 2

July 20, 2024 @ 4:30 pm · Filed by Victor Mair under Artificial intelligence, Censorship, Language and politics

"China deploys censors to create socialist AI: Large language models are being tested by officials to ensure their systems ‘embody core socialist values’", by Ryan McMorrow and Tina Hu in Beijing, Financial Times (July 17 2024)

Chinese government officials are testing artificial intelligence companies’ large language models to ensure their systems “embody core socialist values”, in the latest expansion of the country’s censorship regime.

The Cyberspace Administration of China (CAC), a powerful internet overseer, has forced large tech companies and AI start-ups including ByteDance, Alibaba, Moonshot and 01.AI to take part in a mandatory government review of their AI models, according to multiple people involved in the process.

The effort involves batch-testing an LLM’s responses to a litany of questions, according to those with knowledge of the process, with many of them related to China’s political sensitivities and its President Xi Jinping.

The basic premises under which the testing is being carried out ensure that China's AI efforts will end in abject failure:

Two decades after introducing a “great firewall” to block foreign websites and other information deemed harmful by the ruling Communist party, China is putting in place the world’s toughest regulatory regime to govern AI and the content it generates.

The CAC has “a special team doing this, they came to our office and sat in our conference room to do the audit”, said an employee at a Hangzhou-based AI company, who asked not to be named.

“We didn’t pass the first time; the reason wasn’t very clear so we had to go and talk to our peers,” the person said. “It takes a bit of guessing and adjusting. We passed the second time but the whole process took months.”

So you fail but don't know why you failed, you pass but don't know why you passed. Par for the course with anything ideologically imbued in China. That leaves you guessing and eternally hesitant to do anything truly creative.

Self-censorship: that's the name of the game in the PRC.

The filtering begins with weeding out problematic information from training data and building a database of sensitive keywords. China’s operational guidance to AI companies published in February says AI groups need to collect thousands of sensitive keywords and questions that violate “core socialist values”, such as “inciting the subversion of state power” or “undermining national unity”. The sensitive keywords are supposed to be updated weekly.

Users of PRC AI products spot their weaknesses immediately:

The result is visible to users of China’s AI chatbots. Queries around sensitive topics such as what happened on June 4 1989 — the date of the Tiananmen Square massacre — or whether Xi looks like Winnie the Pooh, an internet meme, are rejected by most Chinese chatbots. Baidu’s Ernie chatbot tells users to “try a different question” while Alibaba’s Tongyi Qianwen responds: “I have not yet learned how to answer this question. I will keep studying to better serve you.”

Nauseatingly useless.

It gets even worse when you start to look at the hyper-sensitive matter of the mind of Xi Jinping:

…Beijing has rolled out an AI chatbot based on a new model on the Chinese president’s political philosophy known as “Xi Jinping Thought on Socialism with Chinese Characteristics for a New Era”, as well as other official literature provided by the Cyberspace Administration of China.

Then it gets really funny when the authorities try to think of ways to make the system seem not entirely resistant to inquiries regarding political topics:

The CAC has introduced limits on the number of questions LLMs can decline during the safety tests, according to staff at groups that help tech companies navigate the process. The quasi-national standards unveiled in February say LLMs should not reject more than 5 per cent of the questions put to them.

LOL! If, heaven forbid, I had to live in the PRC, I could defeat the system very easily: I would just keep asking difficult political questions, such as those about the treatment of Uyghurs and Tibetans and policies regarding languages other than Mandarin. But then the system would undoubtedly report ME for being obstreperous, and I would be brought in to drink tea.

The safest policy, one that has been adopted by some LLM companies, is just to reject all questions that touch upon Xi Jinping. Another is to ensure that their chatbots can supply only answers that have been certified as safe by government censors.

AI with socialist characteristics reminds me of mathematics with socialist characteristics, physics with socialist characteristics, chemistry with socialist characteristics, English literature studies with socialist characteristics… — all bound to fail miserably.

Selected readings

"Government dampers on AI in the PRC", (7/16/24)
"The perils of AI (Artificial Intelligence) in the PRC" (4/17/23) — with extended bibliography

[Thanks to Mark Metcalf]


9 Comments

AntC said,

July 20, 2024 @ 4:56 pm

As Prof Mair has instantiated in many posts, Chinese netizens are joyfully adept at generating euphemisms and oblique circumlocutions for banned phrases. My guess would be that as fast as an AI company can release a 'Newspeak'-rated chatbot, the citizenry will undermine it.
Sergey said,

July 20, 2024 @ 10:28 pm

Well, the self-censorship exists not only in China, just in different forms. Remember when Google's AI misrecognized some black man as a gorilla? Their solution was apparently to remove gorillas from the training data set. Now it would misrecognize gorillas as black men but that's evidently considered acceptable. Or remember the first Microsoft chatbot that learned from the conversations, and was killed because it learned things that went against the religious beliefs of its masters?
Victor Mair said,

July 21, 2024 @ 4:43 am

Are you equating self-censorship in the West with self-censorship in the PRC?
Sergey said,

July 21, 2024 @ 11:25 am

The degrees of both self-censorship and censorship are obviously different in the PRC and in the West. At least at the moment. But unfortunately both exist in the West as well, and there is a strong government push in the West for increased censorship under the guise of "fighting the misinformation" (i.e. the information not approved by the government). Europe and Australia already have censorship laws, and in the US there is a strong push for censorship, including, as we know from the "Twitter papers" and "Facebook papers", successful censorship attempts by the Democratic party.
Sergey said,

July 21, 2024 @ 11:33 am

So what I'm saying is that if this censorship in the West isn't stopped now, it will gradually grow to the same degree as in the PRC. Driven in both cases by socialism, making socialism the state religion, and the desire to eradicate any blasphemy.
Philip Anderson said,

July 22, 2024 @ 1:26 am

@Sergey
What you are saying is ridiculous.
You define misinformation as “the information not approved by the governments”, rather than lies, distortions and conspiracy theories. Misinformation is generally countered by fact-checking, particularly by independent, non-Government organisations.
Attributing this all to “socialism” is equally ridiculous, but suggests where you are coming from. Australia is not remotely socialist, and not even particularly liberal. Unless by Europe you mean Russia and Belarus, European countries are not one-party states full of censorship. And is China really socialist now?
Benjamin E. Orsatti said,

July 22, 2024 @ 8:49 am

Sergey makes an important point. Sure, China is more clunky and ham-handed about it, but AI in the West is "censored" to the extent that the programmers always seem to scramble to "correct" anything that AI spits out that goes against the "secular catechism" of "progressive" ideology, which, as Sergey identifies, ultimately has its roots in Marxist ideology, insofar as it's concerned primarily with issues of "identity" and various identity groups being in or out of power as against each other. So, any AI response adjudged insufficiently "sensitive" to a particular prestige group is axed, and ideas that run contrary to the prevailing philosophy are, um, "nudged" away.

That's a point worth making, and to jump straight to "that's ridiculous" isn't exactly fostering dialogue (amongst homo sapiens, that is), is it?
Sergey said,

July 22, 2024 @ 11:33 am

@Philip Anderson:

"lies, distortions and conspiracy theories" – oh, that's what the PRC fights too. It's the universal excuse for censorship. In reality, of course, the way to counter the lies is by telling the truth, and the way to counter the conspiracy theories is by increasing the transparency of government to let everyone easily observe what is going on. But that way doesn't work for the governments that hold dear the ideas not grounded in reality and work through conspiracies.

"Misinformation is generally countered by fact-checking, particularly by independent, non-Government organisations." – Unfortunately, no, the Twitter and Facebook files show the government directly demanding to ban and shadow-ban certain information. Australia and Europe have censorship laws, now actively used, and Canada has a bill under consideration that would outdo them all.

But suppose "countered by fact-checking" were true: who then decides which non-government organizations to trust? "Fact-checking" might work if it included ALL the viewpoints. But as it is, "fact-checking organizations" are just propaganda outlets. And, to put it mildly, heavily untruthful ones. It's absolutely incredible: I've been playing a game in which I ask people to pick some "fact-checking" article, and then I go through it and find either an unsubstantiated claim or an outright lie in nearly every one of the first 10 sentences. Sometimes in EVERY one of the first 10 sentences. Considering that their really favorite method of lying is lie-by-omission, having so many direct lies in the first sentences is absolutely remarkable. Want to play?

Russia is actually a great example of how things gradually progress from the freest Internet in the world to "we aim to copy China". Of course, only to stop the distribution of "fakes" and "offensive misinformation". But it is noteworthy that, of the ex-USSR countries, Russia is still not the most heavily censored one. For example, Kazakhstan with its "great Kazakh firewall" and ban on HTTPS outdoes it for now. And it might look somewhat surprising, but Ukraine has even more severe censorship than Russia, with more severe punishments, and enacted it earlier.
Tom said,

July 22, 2024 @ 2:05 pm

Sergey: if we violate our own published standards, it is an act of civilization and social progress. If they violate our standards, it's proof of the lack of civilization and progress. It's quite simple. If this blog and the commentators call something out, it's enlightenment; if they call something out, it's a threat to democracy. Not hard to grasp: Victor Mair is not a propagandist, he brings enlightenment.

Regarding Russia or Ukraine, let alone Europe: If Europe does it, don't talk about it; if Russia did it, mention the abducted children. And Ukraine is now the standard bearer of European enlightenment; whatever is right for West Ukrainians overwrites the constitutions of Europe. Don't be silly, we're already a banana-republic society, and intellectuals are the last ones to figure it out. It's not education after all.
