Language Log

Archive for Artificial intelligence

This is the 4th time I've gotten Jack and his beanstalk

March 15, 2023 @ 7:45 pm· Filed by Victor Mair under Artificial intelligence, Computational linguistics, Writing

Bill Benzon shares the response he got from ChatGPT to the prompt, "Tell me a story."


ChatGPT writes VHM

February 28, 2023 @ 5:47 am· Filed by Victor Mair under Artificial intelligence, Computational linguistics, Writing systems

[This is a guest post by ChatGPT]

Write a Language Log blog post by Victor H. Mair

Title: "The Wonders of the Chinese Language"

As one of the world's oldest and most complex languages, Chinese has captivated linguists and language enthusiasts for centuries. From its unique writing system to its vast vocabulary and intricate grammar, Chinese is a fascinating and challenging language that continues to surprise and delight those who study it.

Perhaps the most distinctive feature of Chinese is its writing system. Chinese characters are visual symbols that convey meaning, and they are used in both written and spoken language. Unlike the phonetic alphabets used in many other languages, Chinese characters do not correspond to specific sounds, but rather to concepts or ideas. This makes the learning process much more complex, as students must memorize thousands of characters and learn their meanings and pronunciations.


Vignettes of quality data impoverishment in the world of PRC AI

February 23, 2023 @ 8:00 am· Filed by Victor Mair under Artificial intelligence, Computational linguistics, Data bases

Some snippets:

Limited data sets a hurdle as China plays catch-up to ChatGPT

Lack of high-quality Chinese texts on Internet a barrier to training AI models.

Ryan McMorrow, Nian Liu, Eleanor Olcott, and Madhumita Murgia, FT, Ars Technica (2/21/23)

…

Baidu struggled with its previous attempt at a chatbot, known as Plato, which analysts said could not even answer a simple question such as: “When is Alibaba co-founder Jack Ma’s birthday?”

Analysts point to the lack of high-quality Chinese-language text on the Internet and in other data sets as a barrier for training AI software.

GPT, the program underlying ChatGPT, sucked in hundreds of thousands of English academic papers, news articles, books, and social media posts to learn the patterns that form language. Meanwhile, Baidu’s Ernie has been trained primarily on Chinese-language data as well as English-language data from Wikipedia and Reddit.

…


Uh-oh! DeepL in the classroom; it's already here

February 22, 2023 @ 1:22 pm· Filed by Victor Mair under Artificial intelligence, Computational linguistics

Yesterday in my Classical Chinese class, we were reading Ouyang Xiu's (1007-1072) "Discussion on 'Biographies of Eunuchs'" in the New History of the Five Dynasties (written 1036-1039, published 1072). Here's the relevant passage:

Móu zhī ér bùkě wéi. Wéi zhī ér bùkě chéng. Zhì qí shèn zé jù shāng ér liǎng bài. ——“Xīn wǔdài shǐ huàn zhě zhuàn lùn”

謀之而不可為。為之而不可成。至其甚則俱傷而兩敗。 ——《新五代史宦者傳論》

[Because of the special circumstances of this post, I will not adhere to my usual custom of providing Pinyin Romanization, Hanzi transcription, and English translation all three together.]


ChatGPT: Theme and Variations

February 21, 2023 @ 5:29 am· Filed by Victor Mair under Artificial intelligence, Computational linguistics

[This is a guest post by Conal Boyce]

Here I’ll recount some recent exchanges I had with ChatGPT. Given the scope of ChatGPT, and the fact that it’s in a self‑described intermediate state, our various impressions of it as of February 2023 must be like those of the three blind men examining an elephant — except the elephant is running. In the heart of the professional programmer, ChatGPT creates existential dread since it can spit out in a few seconds a page of code which would have required hours or days for him/her to write and debug — and that only after a lifetime of coding. For the rest of us, for the moment at least, it just provokes curiosity perhaps.


DeepL Translator

February 16, 2023 @ 10:18 am· Filed by Victor Mair under Artificial intelligence, Translation

I have often sung the praises of Google Translate (see "Selected readings" below for a few sample posts), but now I've learned about an online translator that, for many languages, may be even better. Since we've been discussing phenomenal developments in AI quite a bit lately (see also under "Selected readings" below), now seems as good a time as any to introduce DeepL to the collective Language Log readership.

In truth, we've barely mentioned DeepL before (see comments here, here, here, and here), so I really didn't notice it until this past week when my students and auditors from East Asia told me about it. Seeing what DeepL could do, I was simply overwhelmed. Let me explain how that happened.

Most of the participants in my Middle Vernacular Sinitic (MVS) seminar (all attendees are from China, Japan, and Korea) said that they've been using it regularly for years. They also mentioned that they use OCR apps on their phones. The scanned texts they use can then be fed into various applications for translation. Many of them also use Grammarly to improve the quality of their writing. Lately I myself have noticed that when I write papers, essays, and letters in word processing programs (e.g., Microsoft Word), the processor gives me mostly good suggestions for getting rid of superfluous, redundant, or awkward wording.

Specifically, what impressed me so much about DeepL in this instance is that we were faced with a Dutch translation of a rare, medieval Chinese text with a lot of esoteric vocabulary. The Dutch translator had done a commendable job of getting from the difficult Chinese to Dutch, but then we had to use OCR on his limited-circulation Dutch publication to produce a document to feed into DeepL. When I read the resulting English translation, I was amazed at how faithfully the English conveyed the sense and the feeling of the extremely recondite medieval Chinese text. Of course, the English wasn't perfect, but it made a tremendous contribution toward getting a handle on what was happening in the medieval Chinese text, which had seldom been read by anyone (it was lost for more than a thousand years) and had never been translated into any language other than Dutch.


Digitization of Babylonian fragments

February 12, 2023 @ 8:42 am· Filed by Victor Mair under Artificial intelligence, Decipherment, Philology

Once again, DH (digital humanities) to the rescue:

AI Deciphers Ancient Babylonian Texts And Finds Beautiful Lost Hymn

Eat your heart out, ChatGPT.

Tom Hale, IFLScience (2/7/23)

It used to be that paleographers and philologists labored mightily trying to piece together bits and pieces of old manuscripts, using only their own mental and visual powers. Now they can call on AI allies to provide decisive assistance.

Researchers have crafted an artificial intelligence (AI) system capable of deciphering fragments of ancient Babylonian texts. Dubbed the “Fragmentarium,” the algorithm holds the potential to piece together some of the oldest stories ever written by humans, including the Epic of Gilgamesh.


Artificial Intelligence in Language Education: with a note on GPT-3

February 4, 2023 @ 7:31 am· Filed by Victor Mair under Announcements, Artificial intelligence, Language teaching and learning, Pedagogy

Registration is open for Artificial Intelligence in Language Education

Please join us for Penn Language Center's annual Language Educator Symposium, co-sponsored by Educational Linguistics at Penn GSE

ARTIFICIAL INTELLIGENCE in LANGUAGE EDUCATION

Symposium: Saturday, March 25, 2023 at the Kislak Center, Van Pelt Library

Pre-Symposium Workshop: Friday, March 24, 2023 in the Collaborative Classroom, Van Pelt Library

Featured Speakers

Eleni Miltsakaki, Department of Computer & Information Science, University of Pennsylvania
Gareth Roberts, Department of Linguistics, University of Pennsylvania
Per Urlaub, Global Languages, Massachusetts Institute of Technology
Eva Dessein, Global Languages, Massachusetts Institute of Technology
Iryna Kozlova, Graduate School of Education, University of Pennsylvania

Visit our symposium website for a detailed program and registration information. This is an in-person only event. Space is limited so register today!


GLM-130B: An Open Bilingual Pre-Trained Model

January 25, 2023 @ 9:10 am· Filed by Victor Mair under Artificial intelligence, Computational linguistics

Description of a General Language Model (GLM; also GLaM) project based at Tsinghua University in Beijing, but with users and collaborators around the world.

Homepage (August 4, 2022)

This prospectus is difficult for outsiders to understand because of the large number of unexplained acronyms, abbreviations, initialisms, and other insider terminology.

GLM-130B is an open bilingual (English & Chinese) bidirectional dense model with 130 billion parameters, pre-trained using the General Language Model (GLM) algorithm¹. It is designed to support inference tasks with the 130B parameters on a single A100 (40G * 8) or V100 (32G * 8) server. As of July 3rd, 2022, GLM-130B has been trained on over 400 billion text tokens (200B each for Chinese and English) and exhibits the following unique features:

- Bilingual: supports both English and Chinese.
- Performance (EN): better than GPT-3 175B (+5.0%), OPT-175B (+6.5%), and BLOOM-176B (+13.0%) on LAMBADA and slightly better than GPT-3 175B (+0.9%) on MMLU.
- Performance (CN): significantly better than ERNIE TITAN 3.0 260B on 7 zero-shot CLUE datasets (+24.26%) and 5 zero-shot FewCLUE datasets (+12.75%).
- Fast Inference: supports fast inference on both SAT and FasterTransformer (up to 2.5X faster) with a single A100 server.
- Reproducibility: all results (>30 tasks) can be easily reproduced with open-sourced code and model checkpoints.
- Cross-Platform: supports training and inference on NVIDIA, Hygon DCU, Ascend 910, and Sunway.

The model checkpoints of GLM-130B and code for inference are publicly available at our GitHub repo. The code for pre-training and fine-tuning as well as the research paper are coming soon.
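
As a rough sanity check on the single-server claim quoted above, the following Python sketch estimates how much memory the 130 billion weights alone would occupy at a few common precisions and compares that with the two server configurations named. The bytes-per-parameter figures, the reading of "G" as 10^9 bytes, and the helper names `weight_gb` and `fits` are assumptions made for illustration; none of them comes from the GLM-130B homepage, and activations and other runtime overhead are ignored.

```python
# Back-of-the-envelope estimate of weight memory for a 130B-parameter model.
# Assumptions (not from the quoted text): weights dominate memory use,
# "G" means 1e9 bytes, and the precisions listed are merely common choices.

PARAMS = 130e9  # 130 billion parameters, as quoted

# Hypothetical storage cost per parameter at each precision, in bytes.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Total accelerator memory of the two server configurations quoted.
SERVERS_GB = {
    "A100 (40G * 8)": 40 * 8,
    "V100 (32G * 8)": 32 * 8,
}


def weight_gb(params: float, precision: str) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def fits(params: float, precision: str, capacity_gb: float) -> bool:
    """True if the weights alone fit within the given total capacity."""
    return weight_gb(params, precision) <= capacity_gb


if __name__ == "__main__":
    for server, capacity in SERVERS_GB.items():
        for precision in BYTES_PER_PARAM:
            need = weight_gb(PARAMS, precision)
            verdict = "fits" if fits(PARAMS, precision, capacity) else "does not fit"
            print(f"{server}: {precision} weights need {need:.0f} GB "
                  f"of {capacity} GB -> {verdict}")
```

Under these assumptions, 16-bit weights come to about 260 GB, which is below the 320 GB total of the A100 configuration but slightly above the 256 GB of the V100 one. That would suggest a lower precision is needed on the latter, though the quoted text does not say which precision is actually used, and a different reading of "G" would shift the margin.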


The Dead Sea Scrolls: every little dot counts

January 11, 2023 @ 12:20 am· Filed by Victor Mair under Artificial intelligence, Language and archeology, Language and religion

In a masterful Smithsonian Magazine (January-February 2023) article, Chanan Tigay documents:

How an Unorthodox Scholar Uses Technology to Expose Biblical Forgeries: Deciphering ancient texts with modern tools, Michael Langlois challenges what we know about the Dead Sea Scrolls

This engrossing account is so rich that I can only touch on a few of the highlights. It's about Michael Langlois, a would-be rock musician (to some extent he still is one) who looks like the bassist from Def Leppard. But, at 46, "he is also perhaps the most versatile—and unorthodox—biblical scholar of his generation."

What makes Langlois so special? Judging from Tigay's article, it is his relentless quest to get to the bottom of puzzles posed by tiny details of the Dead Sea Scrolls, and his creativity in devising unconventional tools and approaches for doing so.


Mirabile scriptu: fake kanji created by AI

January 7, 2023 @ 9:21 am· Filed by Victor Mair under Artificial intelligence, Language and computers, Writing, Writing systems

A #StableDiffusion model trained on images of Japanese Kanji characters came up with “Fake Kanji” for novel concepts like Skyscraper, Pikachu, Elon Musk, Deep Learning, YouTube, Gundam, Singularity, etc.

They kind of make sense. Not bad! https://t.co/ibegk4XszN pic.twitter.com/qkNcA9AAWb

— hardmaru (@hardmaru) January 6, 2023


ChatGPT writes Haiku

December 21, 2022 @ 9:36 pm· Filed by Victor Mair under Artificial intelligence, Language and literature

[This is a guest post by Bill Benzon]

I’ve been spending a LOT of time with ChatGPT. So naturally, I decided to have it create some haiku. [VHM: See the link to Bill's blogpost after the page break.] This post is about that, but also about a most remarkable woman, Margaret Masterman (1910-1986). She’d studied with Wittgenstein in the 1930s and then went on to create the Cambridge Research Unit in Linguistics in the 1950s. There she became one of the founders of computational linguistics and had a computer generate haiku in 1969. As far as I know, it’s the first time that’s been done.

Take a look at the very end. I’ve taken to closing my dialogs by thanking ChatGPT. I know it’s not conscious, nor sentient, but why not? It’s fun. This time I decided to thank it in Japanese. Except that I neither speak nor read Japanese. But I can use Google Translate. I thought ChatGPT would have no trouble, but I do think its reply was rather clever.

Best of the season to you, and the rest of the Log.
