GLM-130B: An Open Bilingual Pre-Trained Model
Description of a General Language Model (GLM) project based at Tsinghua University in Beijing, but with users and collaborators around the world.
Homepage (August 4, 2022)
This prospectus is difficult for outsiders to understand because of its large number of unexplained acronyms, abbreviations, initialisms, and other insider terminology.
GLM-130B is an open bilingual (English & Chinese) bidirectional dense model with 130 billion parameters, pre-trained using the General Language Model (GLM) algorithm. It is designed to support inference with the 130B parameters on a single A100 (40G * 8) or V100 (32G * 8) server (a back-of-the-envelope sizing sketch follows the feature list below). As of July 3rd, 2022, GLM-130B has been trained on over 400 billion text tokens (200B each for Chinese and English) and exhibits the following unique features:
- Bilingual: supports both English and Chinese.
- Performance (EN): better than GPT-3 175B (+5.0%), OPT-175B (+6.5%), and BLOOM-176B (+13.0%) on LAMBADA and slightly better than GPT-3 175B (+0.9%) on MMLU.
- Performance (CN): significantly better than ERNIE TITAN 3.0 260B on 7 zero-shot CLUE datasets (+24.26%) and 5 zero-shot FewCLUE datasets (+12.75%).
- Fast Inference: supports fast inference on both SAT (SwissArmyTransformer) and FasterTransformer (up to 2.5X faster) with a single A100 server.
- Reproducibility: all results (>30 tasks) can be easily reproduced with open-sourced code and model checkpoints (see the illustrative evaluation sketch below).
- Cross-Platform: supports training and inference on NVIDIA, Hygon DCU, Ascend 910, and Sunway.
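To make the hardware requirement above concrete, here is a back-of-the-envelope estimate (mine, not the project's) of how much GPU memory the 130B parameters alone occupy in FP16, ignoring activations, attention caches, and framework overhead:

```python
# Back-of-the-envelope memory estimate for holding GLM-130B's weights in FP16.
# Assumptions (not from the project): 2 bytes per parameter, weights only,
# evenly sharded across the 8 GPUs of a single server.
NUM_PARAMS = 130e9        # 130 billion parameters
BYTES_PER_PARAM = 2       # FP16
NUM_GPUS = 8              # one A100 (40G * 8) or V100 (32G * 8) server

total_gb = NUM_PARAMS * BYTES_PER_PARAM / 1e9   # ~260 GB for the full model
per_gpu_gb = total_gb / NUM_GPUS                # ~32.5 GB per GPU

print(f"total FP16 weights: ~{total_gb:.0f} GB")
print(f"per-GPU shard:      ~{per_gpu_gb:.1f} GB")
```

A roughly 32.5 GB shard fits comfortably on a 40 GB A100 and sits right at the limit of a 32 GB V100, which suggests the V100 configuration relies on additional memory savings beyond plain FP16 sharding.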
The model checkpoints of GLM-130B and code for inference are publicly available at our GitHub repo. The code for pre-training and fine-tuning as well as the research paper are coming soon.
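Reproducing one of the zero-shot results with those checkpoints, such as the LAMBADA comparison in the feature list, boils down to a loop like the sketch below. This is purely illustrative: the function, the data format, and the `generate_continuation` callable are assumptions of mine, not the project's evaluation harness; it only shows what scoring a last-word-prediction task like LAMBADA amounts to.

```python
# Illustrative sketch of a zero-shot LAMBADA-style evaluation: the model is
# shown a passage with its final word removed and must predict that word
# exactly. `generate_continuation` is a stand-in for whatever inference
# interface the released code exposes -- an assumption, not the real API.
from typing import Callable, Iterable, Tuple

def lambada_accuracy(
    examples: Iterable[Tuple[str, str]],          # (context, missing last word)
    generate_continuation: Callable[[str], str],  # model's continuation of the context
) -> float:
    correct, total = 0, 0
    for context, target in examples:
        continuation = generate_continuation(context).strip()
        prediction = continuation.split()[0] if continuation else ""
        correct += int(prediction == target)
        total += 1
    return correct / max(total, 1)

# Toy usage with a stand-in "model" that always continues with "book":
examples = [("She opened the cover and began to read the", "book")]
print(lambada_accuracy(examples, lambda context: "book and smiled."))  # 1.0
```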