2020-12-19

Some more details, from Manish Singh, "Google expands languages push to serve non-English speakers in India", TechCrunch 12/16/2020:

Google executives also detailed a new language AI model, which they are calling Multilingual Representations for Indian Languages (MuRIL), that delivers more efficiency and accuracy in handling transliteration, spelling variations and mixed languages and other nuances of languages. MuRIL provides support for transliterated text when writing Hindi using Roman script, which was something missing from previous models of its kind, said Partha Talukdar, research scientist at Google Research India, at a virtual event Thursday. […]

Talukdar noted that the previous model Google relied on proved unscalable as the company had to build models for each language separately. “Building such language-specific modeling for each and every task is not resource efficient as we often don’t have training data for tasks like this,” he said. MuRIL significantly outperforms the earlier model — by 10% on native text and 27% on transliterated text. MuRIL, which was developed by Google executives in India and has been in use for about a year, is now open-source.

The MuRIL models are available here, along with some additional discussion and explanations.

