Attribution of the WannaCry ransomware to Chinese speakers


The notorious WannaCry malware infestation began on Friday, May 12, 2017 and spread rapidly throughout the world, infecting hundreds of thousands of computers and causing major damage.  Speculation concerning the identity of the perpetrators focused on North Korea, but the supposed connection was never convincingly demonstrated, and there were no other serious suspects.

Yesterday, Jon Condra, John Costello, and Sherman Chu published a stunning report which suggests that the authors of WannaCry — or someone they hired — spoke fluent Chinese:

“Linguistic Analysis of WannaCry Ransomware Messages Suggests Chinese-Speaking Authors” (Flashpoint [5/25/17])

The Flashpoint analysts examined translated ransom messages for the following 28 languages:

1. Bulgarian
2. Chinese (Simplified)
3. Chinese (Traditional)
4. Croatian
5. Czech
6. Danish
7. Dutch
8. English
9. Filipino
10. Finnish
11. French
12. German
13. Greek
14. Indonesian
15. Italian
16. Japanese
17. Korean
18. Latvian
19. Norwegian
20. Polish
21. Portuguese
22. Romanian
23. Russian
24. Slovak
25. Spanish
26. Swedish
27. Turkish
28. Vietnamese

The analysts astutely point out:

Analysis revealed that nearly all of the ransom notes were translated using Google Translate and that only three, the English version and the Chinese versions (Simplified and Traditional), are likely to have been written by a human instead of machine translated. Though the English note appears to be written by someone with a strong command of English, a glaring grammatical error* in the note suggest the speaker is non-native or perhaps poorly educated.

[*VHM:  “But you have not so enough time.”]

Flashpoint determined that the English ransom note was used as the source text for the other languages via Google Translate (100% identical or nearly so in all cases) — except for the Chinese versions.
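
The kind of comparison described here (machine-translate the English note, then check how closely the result matches each shipped note) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Flashpoint's actual procedure: the function names, the 0.95 threshold, and the toy strings are all hypothetical, and the machine-translated text is assumed to have been obtained separately and pasted in by hand.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences do not affect the score."""
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Return a character-level similarity ratio between 0.0 and 1.0."""
    # autojunk=False turns off difflib's heuristic for long inputs,
    # which could otherwise skew the ratio on longer notes.
    return SequenceMatcher(None, normalize(a), normalize(b), autojunk=False).ratio()


def likely_machine_translated(shipped_note: str, mt_output: str,
                              threshold: float = 0.95) -> bool:
    """Flag a note whose text is identical or nearly identical to the MT output.

    The threshold is an arbitrary assumption chosen for illustration.
    """
    return similarity(shipped_note, mt_output) >= threshold


if __name__ == "__main__":
    # Toy placeholder strings, not text from the actual ransom notes.
    shipped = "placeholder sentence one"
    machine = "placeholder  sentence one"
    print(likely_machine_translated(shipped, machine))
```

A note that scores at or near 1.0 against the machine output is consistent with having been generated from the English source; a note that scores far lower, as the Chinese versions reportedly did, calls for a different explanation.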

The two Chinese ransom notes differ substantially from other notes in content, format, and tone. Google Translate fails in both Chinese-English and English-Chinese tests, producing inaccurate results that suggests the Chinese text was likely not have been similarly generated by the English text.

A number of unique characteristics in the note indicate it was written by a fluent Chinese speaker. A typo in the note, “帮组” (bang zu) instead of “帮助” (bang zhu) meaning “help,” strongly indicates the note was written using a Chinese-language input system rather than being translated from a different version. More generally, the note makes use of proper grammar, punctuation, syntax, and character choice, indicating the writer was likely native or at least fluent. There is, however, at least one minor grammatical error which may be explained by autocomplete, or a copy-editing error.

The text uses certain terms that further narrow down a geographic location. One term, “礼拜” for “week,” is more common in South China, Hong Kong, Taiwan, and Singapore; although it is occasionally used in other regions of the country. The other, “杀毒软件” for “anti-virus,” is more common in the Chinese mainland.

Perhaps most compelling, the Chinese note contains substantial content not present in any other version of the note, is lengthier, and differs slightly in format.

I agree with the reasoning of the Flashpoint authors on all their substantive points.  In particular, their noticing the typo of bāngzǔ 帮组 for bāngzhù 帮助 (“help”) is crucial, since it indicates that the malware authors were actively inputting the Chinese text and not relying on machine translation.

Here I would like to offer additional evidence that supports the analysts’ pinpointing the location of the language variety in the south.  Namely, the specific substitution of z- for zh- is a common southernism, which Neil Kubler, a specialist on Sinitic topolects, says is typical of “native speakers of Lower Yangtze Mandarin and native speakers of Southwestern Mandarin; plus native speakers of Wu, Yue, Min, and Hakka topolects when they are speaking their second language ‘Southern Mandarin’.”

Jonathan Smith notes:

I suppose this just involves typing pinyin “bangzu” and then the IME produces “帮组” since that is actually a thing (the p- [bang] group). I tried it with google and indeed 帮组 is the only suggestion for “bangzu”. And this onset merger (zh > z, etc.) is of course ubiquitous in the south… so in my view probably impossible to specify any further.
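
The mechanism Smith describes can be mimicked with a toy model. The sketch below is an assumption-laden illustration, not a real input method: the two-entry candidate table and both function names are hypothetical, and a real IME draws on a large lexicon with candidate ranking. It only shows how merging zh-, ch-, sh- into z-, c-, s- at the keyboard turns an intended "bangzhu" into "bangzu", for which the IME then offers 帮组.

```python
from typing import Optional

# Hypothetical two-entry candidate table; both pairs come from the discussion above.
CANDIDATES = {
    "bangzhu": "帮助",  # "help", the intended word
    "bangzu": "帮组",   # the form that actually appears in the note
}


def merge_retroflex(pinyin: str) -> str:
    """Spell pinyin the way a speaker who merges zh/ch/sh into z/c/s might type it."""
    # No standard pinyin syllable ends in z, c, or s, so within a pinyin string
    # these letter pairs can only be syllable initials and plain replacement is safe.
    for retroflex, dental in (("zh", "z"), ("ch", "c"), ("sh", "s")):
        pinyin = pinyin.replace(retroflex, dental)
    return pinyin


def ime_output(typed: str) -> Optional[str]:
    """Return the toy IME's candidate for the typed pinyin, or None if unknown."""
    return CANDIDATES.get(typed)


if __name__ == "__main__":
    typed = merge_retroflex("bangzhu")
    print(typed, ime_output(typed))  # prints: bangzu 帮组
```

The point of the toy is that the error arises at the pinyin keyboard rather than in any translation step, which is why the typo is evidence of a human typing Chinese.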

So I think that the Flashpoint analysts are indeed justified in arguing that — unlike all the other languages of the ransom note — not only was the Chinese not machine translated from the English, it was almost certainly produced in the south of China or by individuals of southern Chinese extraction.

The analysts conclude:

Flashpoint assesses with high confidence that the author(s) of WannaCry’s ransomware notes are fluent in Chinese, as the language used is consistent with that of Southern China, Hong Kong, Taiwan, or Singapore. Flashpoint also assesses with high confidence that the author(s) are familiar with the English language, though not native. This alone is not enough to determine the nationality of the author(s).

I concur.

[h.t. Gábor Ugray; thanks to Julian Wheatley, Anwei Feng, Minglang Zhou, Grace Wu, Melvin Lee, and Brendan O’Kane]



17 Comments

  1. liuyao said,

    May 26, 2017 @ 11:50 pm

    Very interesting, and it seems that the perpetrators were not very sophisticated.

    I wouldn’t give the evidence of 礼拜 much weight. I say it all the time and I’m from Beijing. Only in writing would I switch to 星期 or 周.

  2. Victor Mair said,

    May 27, 2017 @ 12:03 am

    “Only in writing would I switch to 星期 or 周”

    But this IS in writing.

    And why WOULD you switch?

    People in the south do not feel compelled to switch. For them it is quite natural to WRITE 礼拜. Why do you feel it is unnatural to write it?

  3. Bathrobe said,

    May 27, 2017 @ 12:39 am

    I agree with liuyao on 礼拜. This word is used commonly in both north and south China, although in written contexts 星期 is preferred. I have heard Cantonese speakers consciously use 星期 in Mandarin even though 礼拜 would be more “natural” or “expected”.

    I would therefore expect a Mainlander from either north or south to be happy using 礼拜 in speech and 星期 in writing.

    In Taiwan, on the other hand, 禮拜 is also found in writing. Possibly also in Hong Kong/Macao.

    That would suggest that the perpetrators are from Hong Kong or Taiwan. But this is a pretty slender thread to hang any conclusions from. I don’t honestly think that the use of 礼拜 is a terribly reliable pointer to north China, south China, or Taiwan.

  4. Gnoey said,

    May 27, 2017 @ 1:12 am

    I agree with liuyao and Bathrobe. I’m from Singapore and we say 礼拜 all the time, but I would prefer to use 星期 in writing too. 礼拜 feels a bit too colloquial, and I would find it somewhat jarring in formal writing.

  5. B.Ma said,

    May 27, 2017 @ 1:15 am

    There’s a “glaring grammatical error” (or two) in the report text: inaccurate results that suggests the Chinese text was likely not have been similarly generated by the English text.

    On the topic of 禮拜 vs 星期: in Cantonese in Hong Kong, I say 禮拜 in all situations. It’s fine with friends and family, but when talking to people I don’t know personally, I recall several instances where it has been awkward and the other party repeated what I said but changing my 禮拜 into 星期. E.g. “I’ll have it done by Monday” “Great, so you’ll have it done by Monday?”

    I think I would choose to handwrite 星期 just because it feels less cumbersome in traditional, and subconsciously this carries over to writing simplified and typing, even though libai is actually one keystroke less in pinyin…

  6. Victor Mair said,

    May 27, 2017 @ 7:54 am

    During the 60s and 70s, my exposure to Chinese was completely in a Taiwan context. I, and everybody around me in that environment, used lǐbài 禮拜 (“week” < "worship" < "ceremony / ritual + worship") in speaking and in writing. It was only when I started to go to the mainland in the 80s that I began to encounter people saying and writing xīngqí 星期 for "week", and I distinctly recall that it was jarring to my ear. I remember inquiring, "What is this 'star period' that people are talking about?" I knew about zhōu 周 with the meaning "week", but that was strictly in literary language.

  7. Lai Ka Yau said,

    May 27, 2017 @ 8:47 am

    @Bathrobe: That’s exactly me! I say 星期 in Mandarin all the time, and I can’t kick the habit even after learning that 禮拜 is accepted in spoken Mandarin. I have also never written 禮拜 in standard written Chinese.

  8. liuyao said,

    May 27, 2017 @ 9:55 am

    Could it be that the communists thought that libai in writing had too much religious association? If one grows up only speaking libai, he may not be aware of it.

    I’m reminded of an érgē (children’s song, but spoken) that goes “Jīntiān libaiyi, shàng jiē mai dàyī, dàyī de jiàqian yi kuai yi mao Yi. Jīntiān libaier…” on and on.

  9. Victor Mair said,

    May 27, 2017 @ 11:07 am

    After I realized that xīngqí 星期 (“star period”) meant “week”, I just chalked it up as one more piece of evidence for the massive language engineering that went on under the Communists.

  10. Alex said,

    May 27, 2017 @ 11:19 am

    “During the 60s and 70s, my exposure to Chinese was completely in a Taiwan context. I, and everybody around me in that environment, used lǐbài 禮拜 (“week” < "worship" < "ceremony / ritual + worship") in speaking and in writing. It was only when I started to go to the mainland in the 80s that I began to encounter people saying and writing xīngqí 星期 for "week", and I distinctly recall that it was jarring to my ear."

    I guess it was the same for me. I only heard and used libai growing up in the US, as my parents and Chinese school were Taiwanese, and it wasn't until coming to Shenzhen that I heard xingqi used much more often. There were many such terms, for example the word choice for bicycle: jiaotache vs. zixingche.

  11. hanmeng said,

    May 27, 2017 @ 1:38 pm

    I remember back around 1990 my friends from Beijing always used 礼拜 in speaking. Then around 10-15 years ago I started to hear someone from northern China use 周一,周二, 周三 etc. for weekdays.

    As for differences in writing, surely there can be a different register for writing. (You don’t think I talk like this, do you?)

  12. Victor Mair said,

    May 27, 2017 @ 5:54 pm

    The occurrence of written lǐbài 禮拜 for “week” in the Chinese ransom notes by itself is not probative of a southern origin, but it does constitute collateral evidence for such an origin.

  13. Victor Mair said,

    May 27, 2017 @ 9:19 pm

    [VHM: I have had to make some adaptations to post this because WordPress won’t allow the use of left- and right-facing arrowheads, and it strips out underlining. I’ve replaced the arrowheads with | and the underlining with italics.]

    From Jonathan Smith:

    Didn’t see this linked; maybe I missed it? From online, not sure if 100% accurate.

    I don’t think it was linked through Flashpoint

    I just underlined the stuff Flashpoint did in your link.

    电脑 is a mainland word
    付款费用 is highly marked, perhaps TW?
    the sentence with …越长… is arguably ungrammatical.

    我的电脑出了什么问题?

    您的一些重要文件被我加密保存了。

    照片、图片、文档、压缩包、音频、视频文件、exe文件等,几乎所有类型的文件都被加密了,因此不能正常打开。

    这和一般文件损坏有本质上的区别,您大可在网上找找恢复文件的方法,我敢保证,没有我们的解密服务,就算老天爷来了也不能恢复这些文档。

    有没有恢复这些文档的方法?

    当然有可恢复的方法。只能通过我们的解密服务才能恢复。我以人格担保,能提供安全有效的恢复服务。

    但这是收费的,也不能无限期的推迟。

    请点击 | Decrypt | 按钮,就可以免费恢复一些文档。请您放心,我是绝不会骗你的。

    但想要恢复全部文档,需要付款点费用

    是否随时都可以固定金额付款,就会恢复的吗,当然不是,推迟付款时间越长对你不利。

    最好3天之内付款费用,过了三天费用就会翻倍。

    还有,一个礼拜之内未付款,将会永远恢复不了。

    对了,忘了告诉你,对半年以上没钱付款的穷人,会有活动免费恢复,能否轮到你,就要看您运气怎么样了。

    付款方法

    我们只会接收比特币。不懂比特币是什么,请点击查看详情 | About bitcoin |

    不会购买比特币,请点击查看购买方法。| How to buy bitcoin |

    要注意:付款金额不能低于在窗口上显示的金额。

    付款后,请点击 | Check Payment | 按钮。因为比特币的到账,所需要的时间有点长,付款后请耐心等待。

    最好的确认时间为周一到周五,从上午9点到11点

    到账成功后,可立刻开始恢复工作。

    联系方式

    如果需要我们的帮组,请点击 | Contact Us |,发给我们消息吧。

    我强烈建议,为了避免不必要的麻烦,恢复工作结束之前,请不要关闭或者删除该软件,并且暂停杀毒软件。不管由于什么原因,万一该软件被删除了,很可能会导致付款后也不能恢复信息的情况。

  14. Bathrobe said,

    May 27, 2017 @ 11:04 pm

    I don’t want to beat my own drum, but I have a web page about 礼拜, 星期, and 周 at Days of the week in Chinese.

    星期 was coined by an outstanding bureaucrat and scholar during the Qing as a more culturally “acceptable” way of saying week, given that 礼拜 was associated with religion (both Christian and Muslim). 星期 is actually a revival of an old term that originally referred to 七夕. The name 星期 was adopted because the names of the days of the week are based on the names of the planets in many languages. It was adopted as the official word for ‘week’ after the fall of the Qing dynasty.

    If 星期 became the strongly preferred term in Mainland China, presumably that was a result of efforts to enforce standardisation as a part of China’s ongoing language planning.

  15. Bathrobe said,

    May 27, 2017 @ 11:10 pm

    This article in Chinese gives more detailed background: 从七曜说到“礼拜”、“星期”、“周”的语源.

  16. Victor Mair said,

    May 29, 2017 @ 3:30 pm

    While the cumulative linguistic evidence presented by Flashpoint and Language Log commenters indicates that the WannaCry hackers are speakers of Chinese who most likely have a background in south China, Hong Kong, Taiwan, or Singapore, here are a few bits of evidence that would tend to exonerate Taiwan:

    1. The suspected culprit most likely uses pinyin for typing. Taiwanese people prefer to use bopomofo (i.e., 注音符號). A Mandarin speaker typing with Pinyin is more prone to enter zhu (> zu) than is someone using bopomofo ㄓㄨˋ (> ㄗㄨˇ).

    2. Another telltale sign is shādú ruǎnjiàn 殺毒軟件 (“antivirus software”). “Software” is called ruǎntǐ 軟體 in Taiwan, not ruǎnjiàn 軟件. If you use Microsoft Chinese Traditional (Taiwan) for entering Chinese characters and type in ruǎn 軟 (“soft; flexible”), you will get suggested phrases such as ruǎntǐ 軟體 (“software”), ruǎntǐ yè 軟體業 (“software industry”), and ruǎntǐ gōngsī 軟體公司 (“software company”), but not ruǎnjiàn 軟件, etc.

    For the record, Taiwan media is aware of Flashpoint’s major contribution to the identification of the perpetrators of the WannaCry ransomware attack:

    “WannaCry創作者 母語可能是中文”

    http://news.ltn.com.tw/news/world/breakingnews/2082824

  17. Eidolon said,

    May 31, 2017 @ 4:50 pm

    “Very interesting, and it seems that the perpetrators were not very sophisticated.”

    Linguistically, obviously not, but I doubt that was ever the point. Knowing the language they spoke is not equivalent to catching them. This was never designed to be a false flag operation, such that the revelation that it was Chinese hackers, not North Korean hackers, who perpetrated the attack, defeats the purpose. It was either a get-rich-quick scheme or a proof of prestige in one of the many underground hacking communities. And more likely than not, they got away with it.
