Language Log

GLM-130B: An Open Bilingual Pre-Trained Model

January 25, 2023 @ 9:10 am · Filed by Victor Mair under Artificial intelligence, Computational linguistics

Description of a General Language Model (GLM; also GLaM) project based at Tsinghua University in Beijing, but with users and collaborators around the world.

Homepage (August 4, 2022)

This prospectus is difficult for outsiders to understand because of its many unexplained acronyms, abbreviations, initialisms, and other insider terminology.

GLM-130B is an open bilingual (English & Chinese) bidirectional dense model with 130 billion parameters, pre-trained using the General Language Model (GLM) algorithm¹. It is designed to support inference tasks with the 130B parameters on a single A100 (40G * 8) or V100 (32G * 8) server. As of July 3rd, 2022, GLM-130B has been trained on over 400 billion text tokens (200B each for Chinese and English) and exhibits the following unique features:

- Bilingual: supports both English and Chinese.
- Performance (EN): better than GPT-3 175B (+5.0%), OPT-175B (+6.5%), and BLOOM-176B (+13.0%) on LAMBADA and slightly better than GPT-3 175B (+0.9%) on MMLU.
- Performance (CN): significantly better than ERNIE TITAN 3.0 260B on 7 zero-shot CLUE datasets (+24.26%) and 5 zero-shot FewCLUE datasets (+12.75%).
- Fast Inference: supports fast inference on both SAT and FasterTransformer (up to 2.5X faster) with a single A100 server.
- Reproducibility: all results (>30 tasks) can be easily reproduced with open-sourced code and model checkpoints.
- Cross-Platform: supports training and inference on NVIDIA, Hygon DCU, Ascend 910, and Sunway.

The model checkpoints of GLM-130B and code for inference are publicly available at our GitHub repo. The code for pre-training and fine-tuning as well as the research paper are coming soon.
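For readers wondering why 130 billion parameters and an eight-GPU server are mentioned together, a rough back-of-the-envelope calculation may help. The sketch below is mine, not the project's: it counts only the memory needed to hold the weights (ignoring activations and other overhead), uses decimal gigabytes, and assumes the numeric precisions FP32, FP16, and INT8, none of which are stated in the excerpt above. The function names are hypothetical.

```python
# Rough weights-only memory estimate. Assumptions (not from the prospectus):
# decimal gigabytes, no activation or framework overhead, and the listed
# precisions. Function names are illustrative only.

N_PARAMS = 130_000_000_000  # 130 billion parameters (from the prospectus)

# Aggregate GPU memory of the two server types named in the prospectus.
SERVERS_GB = {"A100 (40G * 8)": 40 * 8, "V100 (32G * 8)": 32 * 8}

# Bytes per parameter for some common precisions (assumed, for illustration).
PRECISIONS = {"FP32": 4, "FP16": 2, "INT8": 1}


def weights_gb(n_params: int, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


def fits(bytes_per_param: int, server_gb: int) -> bool:
    """True if the weights alone fit in the server's aggregate GPU memory."""
    return weights_gb(N_PARAMS, bytes_per_param) <= server_gb


if __name__ == "__main__":
    for server, capacity in SERVERS_GB.items():
        for precision, nbytes in PRECISIONS.items():
            need = weights_gb(N_PARAMS, nbytes)
            verdict = "fits" if fits(nbytes, capacity) else "does not fit"
            print(f"{server}, {precision}: {need:.0f} GB of {capacity} GB, {verdict}")
```

Under these assumptions, 16-bit weights would take about 260 GB, which is within the 320 GB of the A100 server but slightly above the 256 GB of the V100 server; how the project actually handles the latter case is not explained in the portion quoted here.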

I treat the above opening portion of the prospectus as a sort of introduction to the project. Here follow these sections:

…

Figure 1. The performance of GLM-130B vs. models of similar scale on MMLU and LAMBADA.

…

Conceiving GLM-130B (describes the early history of the project and the challenges it faced; choice of pre-training algorithm; sponsorship and computational resources)

…

Figure 2. Major Issues Encountered for Training GLM-130B (month-by-month listing from December 2021 to July 2022)

…

The Performance of GLM-130B

…

Figure 3. Zero-shot performance on part of CLUE and FewCLUE benchmark datasets.

…

The GLM-130B Model (VHM: this is actually more of an introduction than the prefatory material above, so I give a brief, three-sentence excerpt from it here):

GLM-130B is a bilingual (English & Chinese) bidirectional language model with 130 billion parameters trained on over 400 billion text tokens. It is based on the General Language Model (GLM)¹ architecture. GLM-130B leverages autoregressive blank infilling as its primary pre-training objective.

Figure 4. Example: How GLM-130B is pre-trained on “Like a complete unknown, like a rolling stone”
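For those who, like me, find "autoregressive blank infilling" opaque, here is a minimal sketch of the general idea as I understand it: some spans of the text are blanked out, and the model is trained to regenerate each missing span one token at a time. The token names `[MASK]`, `[S]`, and `[E]`, the whitespace tokenization, the choice of spans, and the function name are all assumptions made for illustration; they are not taken from the GLM-130B code, and the sketch leaves out details such as how spans are sampled and ordered.

```python
from typing import List, Tuple

# Illustrative special tokens (assumed names, not the project's vocabulary).
MASK, START, END = "[MASK]", "[S]", "[E]"


def blank_infill_example(
    tokens: List[str], spans: List[Tuple[int, int]]
) -> Tuple[List[str], List[str], List[str]]:
    """Build one training example from half-open, sorted, non-overlapping spans.

    Returns:
      corrupted      -- the text with each span replaced by a single MASK
      decoder_input  -- each span preceded by START (what the model reads)
      decoder_target -- each span followed by END (what the model must predict)
    """
    corrupted: List[str] = []
    decoder_input: List[str] = []
    decoder_target: List[str] = []
    cursor = 0
    for start, end in spans:
        corrupted.extend(tokens[cursor:start])  # keep the text before the span
        corrupted.append(MASK)                  # blank out the span itself
        piece = tokens[start:end]
        decoder_input.extend([START] + piece)   # input is shifted right by one
        decoder_target.extend(piece + [END])    # target is the next token
        cursor = end
    corrupted.extend(tokens[cursor:])           # keep the text after the last span
    return corrupted, decoder_input, decoder_target


if __name__ == "__main__":
    words = "Like a complete unknown , like a rolling stone".split()
    # Hypothetical spans: "complete unknown" and "rolling stone".
    corrupted, dec_in, dec_out = blank_infill_example(words, [(2, 4), (7, 9)])
    print(" ".join(corrupted))
    print(" ".join(dec_in))
    print(" ".join(dec_out))
```

The decoder input and target are offset by one position, which is what makes the infilling "autoregressive": at each step the model sees the corrupted text plus the span tokens generated so far and predicts the next one.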

…

Future Work

…

Acknowledgements (click on the arrowhead to expand the list; some essential acronyms (e.g., KEG = Knowledge Engineering Group) are explained here)

…

References (mostly in computational linguistics, machine learning, neural information processing systems, language modeling, scaling, etc.)

War of the machines

Selected readings

"Infinitely malleable electronic brain — software and hardware" (7/29/22)
"Electronic brain" (7/28/22)
"Brain Wars: Tobermory" (3/11/16)
"Reassuring parables" (9/11/13)
"Ideas and actions" (6/22/14)
"My brain hurts" (11/20/08)
"ChatGPT writes Haiku" (12/21/22)
"Translation and analysis" (9/13/04)
"Welcome to China" (3/10/14)
"Alexa down, ChatGPT up?" (12/8/22)
"Detecting LLM-created essays" (12/20/22)

[Thanks to Bill Benzon]
