Archive for Artificial intelligence

Artificial Intelligence in Language Education: with a note on GPT-3

Registration is open for Artificial Intelligence in Language Education

Please join us for Penn Language Center's annual Language Educator Symposium, co-sponsored by Educational Linguistics at Penn GSE
 
ARTIFICIAL INTELLIGENCE in LANGUAGE EDUCATION
 
Symposium: Saturday, March 25, 2023 at the Kislak Center, Van Pelt Library
Pre-Symposium Workshop: Friday, March 24, 2023 in the Collaborative Classroom, Van Pelt Library
 
Featured Speakers
  • Eleni Miltsakaki, Department of Computer & Information Science, University of Pennsylvania
  • Gareth Roberts, Department of Linguistics, University of Pennsylvania
  • Per Urlaub, Global Languages, Massachusetts Institute of Technology
  • Eva Dessein, Global Languages, Massachusetts Institute of Technology
  • Iryna Kozlova, Graduate School of Education, University of Pennsylvania
Visit our symposium website for a detailed program and registration information. This is an in-person-only event. Space is limited, so register today!

Read the rest of this entry »

Comments (4)

GLM-130B: An Open Bilingual Pre-Trained Model

Description of the General Language Model (GLM) project based at Tsinghua University in Beijing, but with users and collaborators around the world.

Homepage (August 4, 2022)

This prospectus is difficult for outsiders to understand because of its large number of unexplained acronyms, abbreviations, initialisms, and other insiders' terminology.

GLM-130B is an open bilingual (English & Chinese) bidirectional dense model with 130 billion parameters, pre-trained using the General Language Model (GLM) algorithm. It is designed to support inference tasks with the 130B parameters on a single A100 (8 × 40 GB) or V100 (8 × 32 GB) server. As of July 3rd, 2022, GLM-130B has been trained on over 400 billion text tokens (200B each for Chinese and English) and exhibits the following unique features:

    • Bilingual: supports both English and Chinese.
    • Performance (EN): better than GPT-3 175B (+5.0%), OPT-175B (+6.5%), and BLOOM-176B (+13.0%) on LAMBADA and slightly better than GPT-3 175B (+0.9%) on MMLU.
    • Performance (CN): significantly better than ERNIE TITAN 3.0 260B on 7 zero-shot CLUE datasets (+24.26%) and 5 zero-shot FewCLUE datasets (+12.75%).
    • Fast Inference: supports fast inference on both SAT (SwissArmyTransformer) and FasterTransformer (up to 2.5X faster) with a single A100 server.
    • Reproducibility: all results (>30 tasks) can be easily reproduced with open-sourced code and model checkpoints.
    • Cross-Platform: supports training and inference on NVIDIA, Hygon DCU, Ascend 910, and Sunway.

The model checkpoints of GLM-130B and code for inference are publicly available at our GitHub repo. The code for pre-training and fine-tuning as well as the research paper are coming soon.
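
For readers who want to poke at a model like this, here is a minimal sketch of what prompting it might look like, assuming a Hugging Face-style interface. The model identifier below is only a placeholder, not a confirmed hub name; the actual checkpoints and loading scripts are the ones distributed through the project's GitHub repo.

```python
# Hypothetical sketch: prompting a large bilingual language model through the
# Hugging Face transformers API. The model ID is a placeholder; the real
# GLM-130B checkpoints are distributed via the project's own repository.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "THUDM/glm-130b"  # placeholder identifier, not a confirmed hub name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# One English prompt and one Chinese prompt, since the model is bilingual.
for prompt in ["The capital of Pennsylvania is", "清华大学位于"]:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```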

Read the rest of this entry »

Comments

The Dead Sea Scrolls: every little dot counts

In a masterful Smithsonian Magazine (January-February 2023) article, Chanan Tigay documents:

How an Unorthodox Scholar Uses Technology to Expose Biblical Forgeries:  Deciphering ancient texts with modern tools, Michael Langlois challenges what we know about the Dead Sea Scrolls

This engrossing account is so rich that I can only touch on a few of the highlights.  It's about Michael Langlois, a would-be rock musician (and to some extent he still is one) who looks like the bassist from Def Leppard.  But, at 46, "he is also perhaps the most versatile—and unorthodox—biblical scholar of his generation."

What makes Langlois so special?  Reading through Tigay's article, the answer is his relentless quest to get to the bottom of puzzles posed by tiny details of the Dead Sea Scrolls, and his creativity in devising unconventional tools and approaches for doing so.

Read the rest of this entry »

Comments (10)

Mirabile scriptu: fake kanji created by AI

Read the rest of this entry »

Comments (1)

ChatGPT writes Haiku

[This is a guest post by Bill Benzon]

I’ve been spending a LOT of time with ChatGPT. So naturally, I decided to have it create some haiku.  [VHM:  See the link to Bill's blogpost after the page break.]  This post is about that, but also about a most remarkable woman, Margaret Masterman (1910-1986). She’d studied with Wittgenstein in the 1930s and then went on to found the Cambridge Language Research Unit in the 1950s. There she became one of the founders of computational linguistics and had a computer generate haiku in 1969. As far as I know, it’s the first time that’s been done.
 
Take a look at the very end. I’ve taken to closing my dialogs by thanking ChatGPT. I know it’s not conscious, nor sentient, but why not? It’s fun. This time I decided to thank it in Japanese. Except that I neither speak nor read Japanese. But I can use Google Translate. I thought ChatGPT would have no trouble, but I do think its reply was rather clever.
 
Best of the season to you, and the rest of the Log.

Read the rest of this entry »

Comments (15)

Speech to speech translation of unwritten languages: Hokkien

Everybody's talking about it.

"Meta has developed an AI translator for a primarily-spoken language

It only translates between Hokkien and English for now, but offers potential for thousands of languages without official written systems."

By Amanda Yeo, Mashable (October 20, 2022)

If true, this technology could be an enormous boon for illiterates everywhere.  It also has important theoretical and linguistic implications.

Read the rest of this entry »

Comments (3)

"Collapsed" calligraphy, part 2

New article by Nyri Bakkalian in Unseen Japan (9/17/22):

"New App Promises Greater Convenience in Reading Old Japanese Cursive:

Kuzushiji, the 'crushed letters' found in historical Japanese documents, have long been the bane of scholars. A new app may change all that."

The author bemoans:

During my graduate education in Japanese history, interpreting handwritten primary source material from the 19th century and earlier was one of my greatest challenges. Typeset historic documents exist, especially in my period of focus during the Bakumatsu-Meiji transition. But the further back in time one’s research focus is situated, the rarer these documents become. There is a plethora of handwritten documents, written in historic cursive, but learning how to read them is a significant investment of time and resources beyond the means of most people who might otherwise have the inclination to learn.

Read the rest of this entry »

Comments (1)

Google Translate is even better now, part 2

"Google Translate learns 24 new languages"
Isaac Caswell, Google blog (5/11/22)

==========

[Image: illustrated green globe with the word "hello" translated into different languages.]

For years, Google Translate has helped break down language barriers and connect communities all over the world. And we want to make this possible for even more people — especially those whose languages aren’t represented in most technology. So today we’ve added 24 languages to Translate, now supporting a total of 133 used around the globe.

Over 300 million people speak these newly added languages — like Mizo, used by around 800,000 people in the far northeast of India, and Lingala, used by over 45 million people across Central Africa. As part of this update, Indigenous languages of the Americas (Quechua, Guarani and Aymara) and an English dialect (Sierra Leonean Krio) have also been added to Translate for the first time.
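
For what it's worth, here is a rough sketch of how one might query a couple of the newly added languages through the Google Cloud Translation API (basic v2 client). The language codes are my assumptions (Lingala is standardly "ln"; Mizo is assumed here to use "lus"), and the snippet presupposes that google-cloud-translate is installed and credentials are configured.

```python
# Rough sketch: translating a word into two of the newly added languages via
# the Google Cloud Translation API (basic/v2 client). Language codes below
# are assumptions: "ln" for Lingala, "lus" for Mizo.
from google.cloud import translate_v2 as translate

client = translate.Client()  # requires application default credentials

for code in ["ln", "lus"]:
    result = client.translate("hello", target_language=code)
    print(code, result["translatedText"])
```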

Read the rest of this entry »

Comments (24)

Why is Facebook's Chinese translation still so terrible?

[This is a guest post by Jenny Chu]

Has Language Log been following up on the great sorrow that is Facebook's (Chinese) translation feature? The last reference I found was this one.

It came up today when I was reading this somewhat viral post on Facebook.

I switched on the auto-translate option to help me understand. The results were not just astonishingly bad, but also had a surprisingly medical bent.

今天這個主權政府作承諾的時候大辭炎炎，七情上面，結果又是如何？ → "Today, when the private government is working, the weather is colon inflammation, above the sentiment, what is the result?"

[A more faithful rendering would be something like: "Today, when this sovereign government made its promises, it spoke in grandiose terms with emotion written all over its face, and what was the result?"]

Read the rest of this entry »

Comments (11)

AI cat and mouse robot censorship war

Now it's getting interesting:

"China’s internet police losing man-versus-machine duel on social media"

Stephen Chen, SCMP (11/14/21)

    Hordes of bot accounts using clever dodging tactics are causing burnout among human censors, police investigative paper finds
    Authorities may respond by raising a counter-army of automated accounts or even an AI-driven public opinion leader

Read the rest of this entry »

Comments (3)

Difficult languages and easy languages, part 3

There may well be a dogma out there stating that all languages are equally complex, but I don't believe it, especially not if it has to be "drummed" into our minds.  I have learned many languages.  Some of them are exceedingly hard (because of their complexity) and some of them are relatively easy (because they are comparatively simple).  I have often said that Mandarin is the easiest language I ever learned to speak, but the hardest to read and write in characters (though very easy in Romanization).  And remember these posts:

"Difficult languages and easy languages" (3/4/17)

"Difficult languages and easy languages, part 2" (5/28/19)

Read the rest of this entry »

Comments (33)

The implications of Chinese for AI development, part 2

From the previous post in this series, we are already acquainted with Inspur's Yuan 1.0, "one of the most advanced deep learning language models that can generate coherent Chinese texts."  Now, with the present article, we will delve more deeply into the potentials and pitfalls of Inspur's deep learning language model:

"Inspur unveils GPT-3 equivalent for Chinese language", by Wei Sheng, TechNode (1026/21)

The model is trained with 245.7 billion parameters—the number of weights in an artificial neural network, according to the company. This is more than the Elon Musk-backed GPT-3 language model for English, which has 175 billion parameters. Inspur said the Yuan model was trained with 5 terabytes of datasets.
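
For readers wondering what a "parameter" is in these comparisons: it is simply a count of every trainable weight in the network, whether it sits in an embedding table or a linear layer. The figures for GPT-3 and Yuan 1.0 are the same kind of count, just at a vastly larger scale; the toy PyTorch model below is purely illustrative and has no connection to either system.

```python
# Toy illustration of what a "parameter count" measures: the total number of
# trainable weights in a network. Models like GPT-3 (175B) or Yuan 1.0
# (245.7B) are counted the same way, just with far more and larger layers.
import torch.nn as nn

toy_model = nn.Sequential(
    nn.Embedding(50_000, 512),   # token embeddings: 50,000 x 512 weights
    nn.Linear(512, 2048),        # 512 x 2048 weights + 2,048 biases
    nn.ReLU(),
    nn.Linear(2048, 512),        # 2048 x 512 weights + 512 biases
)

n_params = sum(p.numel() for p in toy_model.parameters())
print(f"{n_params:,} parameters")  # a few tens of millions here, vs. hundreds of billions for Yuan 1.0
```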

Read the rest of this entry »

Comments (4)

The implications of Chinese for AI development

New article in EnterpriseAI (October 21, 2021):

"Language Model Training Gets Another Player: Inspur AI Research Unveils Yuan 1.0",  by Todd R. Weiss

From Pranav Mulgund:

This article introduces an interesting new advance in an artificial intelligence (AI) model for Chinese. As you probably know, Chinese has long been held to be one of the hardest languages for AI to crack. Baidu and Google have both been trying for a long time, but have had a lot of difficulty given the complexity of the language. But the company Inspur just came out with a model called Yuan 1.0 that shows significant advances over previous companies' AI models.

Read the rest of this entry »

Comments (5)