Astonishing new Google Translate, with the help of generative AI
Google Translate adds Cantonese support, thanks to AI advancement: “Cantonese has long been one of the most requested languages for Google Translate. Because Cantonese often overlaps with Mandarin in writing, it’s tricky to find data and train models,” Google said. By Tom Grundy, Hong Kong Free Press (June 30, 2024).
The Google Translate app has been expanded to include Cantonese, thanks to generative Artificial Intelligence (AI) advancements.
In 2022, Google began using Zero-Shot Machine Translation to expand its pool of supported languages. The machine learning model learns to translate into another language without ever seeing an example, Google said in a Thursday blog post. Now it is using AI to expand the number of supported languages.
It added 110 new languages this week, in its largest-ever expansion, thanks to its PaLM 2 large language model.
Users of the app may now translate between Cantonese – spoken in Chinese communities across the world, and in Hong Kong – and 243 other languages.
“Cantonese has long been one of the most requested languages for Google Translate. Because Cantonese often overlaps with Mandarin in writing, it’s tricky to find data and train models,” it said.
Google aims to support the 1,000 most spoken languages around the world.
It also added Punjabi on Thursday, and African languages such as Fon, Kikongo, Luo, Ga, Swati, Venda and Wolof.
“As technology advances, and as we continue to partner with expert linguists and native speakers, we’ll support even more language varieties and spelling conventions over time,” Google added.
When will the PRC ever achieve such wonders? Never mind when the PRC will let its people utilize the wonders of GT.
Selected readings
- "Is this authentic Cantonese?" (2/26/24)
- "Token Cantonese" (5/16/15)
- "The interplay between Cantonese and Mandarin as an index of sociopolitical tensions in Hong Kong" (4/30/23)
- "Google Translate sabotage" (6/14/19)
- "Google Translate Sabotage, part 2" (1/17/21)
- "A Japanese-French Google Translate mixup" (7/13/20)
- "More Google Translate hallucinations on YouTube" (6/3/18)
- "The elegance of Google Translate" (3/10/18)
- "The wonders of Google Translate" (9/22/17)
- "Don't blame Google Translate" (2/4/18)
- "Google Translate is even better now" (9/27/16)
- "Google Translate is even better now, part 2" (5/12/22)
- "Google is scary good" (7/31/17)
- "Google Translate Chinese inputting" (1/27/13)
- "Can't find on Google" (8/12/14)
- "Cantonese novels" (8/20/13)
- "Spoken Hong Kong Cantonese and written Cantonese" (8/29/13)
- Snow, Don. 2004. Cantonese as Written Language: The Growth of a Written Chinese Vernacular. Hong Kong: Hong Kong University Press. Appendix 1 of this remarkable book gives 14 Cantonese texts, each of which Snow carefully analyzes for the degree to which it adheres to the norms of spoken Cantonese rather than of written Modern Standard Mandarin (MSM). The 14 texts, which cover a wide range of genres, date from around the 17th century to the contemporary period. It is striking that the percentages of overtly marked Cantonese (and Snow is referring here not just to special Cantonese characters) in these 14 texts range from only 3% to 36%: 3, 4, 6, 7, 10, 11, 12, 20, 23, 23, 23, 28, 32, 36, for an average of 17%.
- Kwan-hin Cheung and Robert S. Bauer, "The Representation of Cantonese with Chinese Characters", Journal of Chinese linguistics: Monograph series (18); Project on Linguistic Analysis, University of California, 2002.
- "Colloquial Cantonese and Taiwanese as mélange languages" (3/15/21)
[Thanks to Don Keyser]
John Swindle said,
June 30, 2024 @ 6:39 pm
Their complete list of languages to be added is here:
https://support.google.com/translate/answer/15139004
Cantonese was surely one of the most-spoken languages that they didn't support. It was about time. Several others that they added also have speakers here in Hawaii, most notably Chamorro, Chuukese, Marshallese, and Tongan but also Fijian and Tahitian, probably some of the newly added languages from the Philippines, and no doubt others I've missed.
Jenny Chu said,
June 30, 2024 @ 7:19 pm
An important addition was European Portuguese. All "Portuguese" was Brazilian, leading to many a frustrated student.
Chas Belov said,
June 30, 2024 @ 9:17 pm
¡Yes! Tested with the lyrics (lyrics website, not verified for safety) to 對錯 by Hong Kong hiphop group LMF.
Copied the lyrics into Google Translate with Detect language set. It detected Cantonese and correctly translated "唔好" as "do not". "冇" (not have) in this case was translated as "nothing," which I can accept in the context.
Not so sure about "付出 回報又得唔返" being translated as "Pay in return and get back" due to the lack of a negative in the English. But I can't read Cantonese or MSM, so I could be mistaken.
More pleasingly, nothing in the translation came across as gibberish and very little as awkward.
Chas Belov said,
June 30, 2024 @ 9:22 pm
Additionally, the transcriptions have the necessary final consonant stops and are broken up into words (chih) not characters (jih), such as "得" as "dak1" or "問題" as "man6tai4". I do wish they also added the line breaks as it makes it hard to find my place.
Chas Belov said,
June 30, 2024 @ 9:30 pm
Hmm, I see my first post was deleted. Guess the lyric site had some evil code. Sorry about that, and guess I'd better virus-scan my computer.
Okay, I'll do it without links (at least what I can remember).
I pasted the lyrics for 對錯 by Hong Kong hip-hop group LMF into Google Translate with detect language set. It correctly identified the song as being in Cantonese. It mostly correctly identified 唔 as being negative and while it translated 冇 (not have) as nothing, I can accept it in the context.
Refreshingly, there didn't seem to be any gibberish, and there was a minimum of awkward phrasing.
Chas Belov said,
July 1, 2024 @ 12:30 am
Working my way through Google's list with the goal of adding new languages to my global playlist, with varying results. One technique I'm using is to translate "I love you" into the target language to get words likely to be in song titles. I also try the opposite direction, copying song titles into GT and asking it to translate them into English. The latter has very mixed results: for instance, an Indonesian group that sings in Balinese is shown as Javanese, and a song from an Indonesian-language playlist is shown as being in Sesotho, which, being an African language, is highly unlikely. Sometimes it will say the title is in English, but not translate anything.
That said, I'm adding quite a few songs to my Incubator playlist, which I will then curate for my Infectious playlist. Thank you for calling my attention to Google's list.
Anyway
Chas Belov said,
July 1, 2024 @ 12:40 am
It does seem to depend on getting a large enough slice, such as a couple lines of lyrics. I gave it a couple lines I found in Google search results for a song (to avoid going to possibly dangerous lyric websites) and it detected that the words were in Batak Toba, which matches that Siantar Rap Foundation has an album Tobanese, and now the catchy Boru Ni Raja is on its way to my mega-playlist.
Isaac Caswell said,
July 2, 2024 @ 1:15 pm
Here are playlists covering the new Google Translate languages added over the last two years:
Playlist for the new 110 languages added in 2024: https://www.youtube.com/watch?v=G1QEi6RitCw&list=PLXFtMv-aATMXRyFmX7hw2D2j2LtmFW5un
Playlist for the 24 languages added in 2022: https://www.youtube.com/watch?v=tADgP7GSRMw&list=PLXFtMv-aATMVMzu9LlRHl7YuKLpLFqaah
Chas Belov said,
July 2, 2024 @ 9:06 pm
@Isaac Caswell: ¡Thank you!