Astonishing new Google Translate, with the help of generative AI

Google Translate adds Cantonese support, thanks to AI advancement:  “Cantonese has long been one of the most requested languages for Google Translate. Because Cantonese often overlaps with Mandarin in writing, it’s tricky to find data and train models,” Google said.  By Tom Grundy, Hong Kong Free Press (June 30, 2024).

The Google Translate app has been expanded to include Cantonese, thanks to generative Artificial Intelligence (AI) advancements.

In 2022, Google began using Zero-Shot Machine Translation to expand its pool of supported languages. The machine learning model learns to translate into another language without ever seeing an example, Google said in a Thursday blog post. Now it is using AI to expand the number of supported languages.

It added 110 new languages this week, in its largest-ever expansion, thanks to its PaLM 2 large language model.

Users of the app may now translate between Cantonese – spoken in Chinese communities across the world, and in Hong Kong – and 243 other languages.

“Cantonese has long been one of the most requested languages for Google Translate. Because Cantonese often overlaps with Mandarin in writing, it’s tricky to find data and train models,” it said.

Google aims to support the 1,000 most spoken languages around the world.

It also added Punjabi on Thursday, and African languages such as Fon, Kikongo, Luo, Ga, Swati, Venda and Wolof.

“As technology advances, and as we continue to partner with expert linguists and native speakers, we’ll support even more language varieties and spelling conventions over time,” Google added.

When will the PRC ever achieve such wonders?  Never mind when the PRC will let its people utilize the wonders of GT.

 

[Thanks to Don Keyser]



9 Comments

  1. John Swindle said,

    June 30, 2024 @ 6:39 pm

    Their complete list of languages to be added is here:
    https://support.google.com/translate/answer/15139004

    Cantonese was surely one of the most-spoken languages that they didn't support. It was about time. Several others that they added also have speakers here in Hawaii, most notably Chamorro, Chuukese, Marshallese, and Tongan but also Fijian and Tahitian, probably some of the newly added languages from the Philippines, and no doubt others I've missed.

  2. Jenny Chu said,

    June 30, 2024 @ 7:19 pm

    An important addition was European Portuguese. All "Portuguese" was Brazilian, leading to many a frustrated student.

  3. Chas Belov said,

    June 30, 2024 @ 9:17 pm

    ¡Yes! Tested with the lyrics (lyrics website, not verified for safety) to 對錯 by Hong Kong hiphop group LMF.

    Copied the lyrics into Google Translate with Detect language set. It detected Cantonese and correctly translated "唔好" as "do not". "冇" (not have) in this case was translated as "nothing," which I can accept in the context.

    Not so sure about "付出 回報又得唔返" being translated as "Pay in return and get back" due to the lack of a negative in the English. But I can't read Cantonese or MSM, so I could be mistaken.

    More pleasingly, nothing in the translation came across as gibberish and very little as awkward.

  4. Chas Belov said,

    June 30, 2024 @ 9:22 pm

    Additionally, the transcriptions have the necessary final consonant stops and are broken up into words (chih) not characters (jih), such as "得" as "dak1" or "問題" as "man6tai4". I do wish they had also added the line breaks, as their absence makes it hard to find my place.

  5. Chas Belov said,

    June 30, 2024 @ 9:30 pm

    Hmm, I see my first post was deleted. Guess the lyric site had some evil code. Sorry about that, and guess I'd better virus-scan my computer.

    Okay, I'll do it without links (at least what I can remember).

    I pasted the lyrics for 對錯 by Hong Kong hip-hop group LMF into Google Translate with detect language set. It correctly identified the song as being in Cantonese. It mostly correctly identified 唔 as being negative and while it translated 冇 (not have) as nothing, I can accept it in the context.

    Refreshingly, there didn't seem to be any gibberish, and there was a minimum of awkward phrasing.

  6. Chas Belov said,

    July 1, 2024 @ 12:30 am

    Working my way through Google's list with the goal of adding new languages to my global playlist, with varying results. One technique I'm using is to translate "I love you" into the target language to get words likely to be in song titles. I also try the opposite direction, copying song titles into GT and asking it to translate them into English. This latter has very mixed results: for instance, an Indonesian group that sings in Balinese is shown as Javanese, and a song from an Indonesian language playlist is shown as being in Sesotho, which, being an African language, is highly unlikely. Sometimes it will say the title is in English, but not translate anything.

    That said, I'm adding quite a few songs to my Incubator playlist, which I will then curate for my Infectious playlist. Thank you for calling my attention to Google's list.

    Anyway

  7. Chas Belov said,

    July 1, 2024 @ 12:40 am

    It does seem to depend on getting a large enough slice, such as a couple lines of lyrics. I gave it a couple lines I found in Google search results for a song (to avoid going to possibly dangerous lyric websites) and it detected that the words were in Batak Toba, which fits, since Siantar Rap Foundation has an album called Tobanese, and now the catchy Boru Ni Raja is on its way to my mega-playlist.

  8. Isaac Caswell said,

    July 2, 2024 @ 1:15 pm

    Here are playlists covering the new Google Translate languages added over the last two years:

    Playlist for the new 110 languages added in 2024: https://www.youtube.com/watch?v=G1QEi6RitCw&list=PLXFtMv-aATMXRyFmX7hw2D2j2LtmFW5un

    Playlist for the 24 languages added in 2022: https://www.youtube.com/watch?v=tADgP7GSRMw&list=PLXFtMv-aATMVMzu9LlRHl7YuKLpLFqaah

  9. Chas Belov said,

    July 2, 2024 @ 9:06 pm

    @Isaac Caswell: ¡Thank you!
