ChatGLM Papers

User 54

Created October 9, 2023

links

GLM: General Language Model Pretraining with Autoregressive Blank Infilling

GLM-130B: An Open Bilingual Pre-trained Model

There are 2 models:

ChatGLM-6B

ChatGLM2-6B

•

Uses GLM's hybrid objective function.

•

FlashAttention extends the context length from 2K to 32K. For even longer contexts, we released the ChatGLM2-6B-32K model.

•

Multi-Query Attention speeds up inference by 42%. Under INT4 quantization, the dialogue length supported by 6 GB of GPU memory increases from 1K to 8K.
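To see why Multi-Query Attention lets the same memory budget hold a longer dialogue, here is a rough back-of-the-envelope sketch of the key/value cache size. The function name `kv_cache_bytes` and all the numbers (28 layers, 32 heads, head size 128, FP16 cache) are illustrative assumptions, not figures taken from these notes or from the ChatGLM2 configuration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV-cache size for one sequence.

    The leading factor 2 accounts for storing both keys and values.
    All arguments are hypothetical example values, not ChatGLM2 settings.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Multi-head attention: every query head has its own K/V head.
mha_per_token = kv_cache_bytes(num_layers=28, num_kv_heads=32, head_dim=128, seq_len=1)

# Multi-query attention: all query heads share a single K/V head.
mqa_per_token = kv_cache_bytes(num_layers=28, num_kv_heads=1, head_dim=128, seq_len=1)

print(mha_per_token, mqa_per_token, mha_per_token // mqa_per_token)
```

Under these assumptions the cache per token shrinks by the number of query heads that share one K/V head, so the same memory can hold proportionally more tokens. The real gain also depends on weights, activations, and quantization, which this sketch ignores.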

[2023/07/31] Released the ChatGLM2-6B-32K model, which improves long-text understanding.

[2023/07/25] Released the CodeGeeX2 model, which adds code pretraining on top of ChatGLM2-6B and improves coding ability across the board.

[2023/07/04] Released P-Tuning v2 and full-parameter fine-tuning scripts; see P-Tuning.

Open-source projects that accelerate ChatGLM2:

•

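fastllm: a cross-platform accelerated inference solution. Single-GPU batched inference can reach 10,000+ tokens/s, and it runs in real time on phones with as little as 3 GB of memory (about 4~5 tokens/s on Snapdragon 865).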

•

chatglm.cpp: a quantized CPU inference solution similar to llama.cpp, enabling real-time chat on Mac laptops.

•

ChatGLM2-TPU: a TPU-accelerated inference solution that runs in real time at about 5 tokens/s on the Sophgo edge chip BM1684X (16T@FP16, 16 GB memory).

Open-source projects based on or using ChatGLM2-6B:

•

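Chuanhu Chat: provides an attractive, easy-to-use, feature-rich, and quickly deployable user interface for various large language models and online model APIs; supports ChatGLM2-6B.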

Example projects that support online training of ChatGLM-6B and related applications:

•

Paper Reading

3 frameworks for pretrained models

•
Autoencoding: BERT

•
Autoregressive: GPT

•
Encoder-decoder models: T5

This paper takes the autoregressive approach, with 2 improvements:

1.
Adding 2D positional encodings

2.
Allowing an arbitrary order to predict spans
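A minimal sketch of how a blank-infilling training example with arbitrary span order could be built, following my reading of the GLM paper: the masked spans are replaced by [MASK] in the corrupted text (Part A), then the spans are shuffled and appended (Part B), each prefixed with [START] on the input side and suffixed with [END] on the target side. The function name `build_blank_infilling_example` and the way spans are passed in are assumptions made for illustration.

```python
import random


def build_blank_infilling_example(tokens, spans, rng):
    """Build Part A / Part B for autoregressive blank infilling.

    tokens: list of str
    spans:  non-overlapping half-open (start, end) index pairs to mask
    rng:    random.Random instance used to shuffle the span order
    """
    spans = sorted(spans)

    # Part A: the original text with each span collapsed to one [MASK].
    part_a, cursor = [], 0
    for start, end in spans:
        part_a.extend(tokens[cursor:start])
        part_a.append("[MASK]")
        cursor = end
    part_a.extend(tokens[cursor:])

    # Part B: the spans in a random order, so the model sees many
    # different prediction orders over training.
    order = list(range(len(spans)))
    rng.shuffle(order)
    part_b_inputs, part_b_targets = [], []
    for i in order:
        start, end = spans[i]
        span = tokens[start:end]
        part_b_inputs.extend(["[START]"] + span)   # shifted-right input
        part_b_targets.extend(span + ["[END]"])    # next-token targets
    return part_a, part_b_inputs, part_b_targets, order


tokens = ["x1", "x2", "x3", "x4", "x5", "x6"]
part_a, b_in, b_out, order = build_blank_infilling_example(
    tokens, spans=[(2, 3), (4, 6)], rng=random.Random(0)
)
print(part_a)
print(b_in)
print(b_out)
```

Part A does not depend on the shuffle; only the order of the spans inside Part B changes from one sample to the next.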

Model architecture: 3 changes to the Transformer

1.
Rearrange the order of layer norm and the residual connection. This is critical for avoiding numerical errors.

2.
Single linear layer for output token prediction

3.
Replace ReLU with GeLU
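Changes 1 and 3 can be sketched in a few lines of plain Python. This assumes the rearrangement refers to the "pre-LN" ordering (normalize the sublayer input and leave the residual path untouched), which is my reading and is not stated in these notes. The helper names `layer_norm`, `post_ln_step`, and `pre_ln_step` are hypothetical, and the layer norm omits the learned scale and bias for brevity.

```python
import math


def gelu(x):
    # Exact GeLU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(xs, eps=1e-5):
    # Normalize a vector to zero mean and unit variance (no learned params).
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


def post_ln_step(xs, sublayer):
    # Original Transformer ordering: add the residual, then normalize.
    return layer_norm([x + y for x, y in zip(xs, sublayer(xs))])


def pre_ln_step(xs, sublayer):
    # Rearranged ordering: normalize the sublayer input, then add the
    # residual, so the skip path carries xs through unchanged.
    return [x + y for x, y in zip(xs, sublayer(layer_norm(xs)))]


def feed_forward(v):
    # Toy stand-in for the feed-forward sublayer: elementwise GeLU.
    return [gelu(t) for t in v]


print(pre_ln_step([1.0, 2.0, 3.0], feed_forward))
print(post_ln_step([1.0, 2.0, 3.0], feed_forward))
```

With the pre-LN ordering, a sublayer that outputs zeros leaves the input exactly unchanged, which is one intuition for why deep stacks stay numerically better behaved.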

2D positional encoding

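Each token is encoded with 2 positional ids: the first gives its position in the corrupted text, and the second gives its position within a masked span.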
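A small sketch of how the two ids could be assigned, assuming the scheme as I understand it from the GLM paper: in the corrupted text (Part A) the first id is the token position and the second id is 0; for the tokens of a filled-in span (Part B) the first id is the position of the corresponding [MASK] and the second id counts 1, 2, ... inside the span, with the [START] token counted as the first position. The function name `two_d_position_ids` and its argument layout are hypothetical, and positions are 1-based.

```python
def two_d_position_ids(part_a, spans_in_order):
    """Return (first_ids, second_ids) for Part A followed by Part B.

    part_a:         corrupted text, one "[MASK]" per masked span
    spans_in_order: (mask_index, span_length) pairs in Part B order, where
                    mask_index says which [MASK] (0-based) the span fills
    """
    # 1-based positions of each [MASK] inside Part A.
    mask_positions = [i + 1 for i, t in enumerate(part_a) if t == "[MASK]"]

    first_ids = list(range(1, len(part_a) + 1))  # inter-span position
    second_ids = [0] * len(part_a)               # intra-span position

    for mask_index, span_length in spans_in_order:
        n_inputs = span_length + 1  # [START] plus the span tokens
        first_ids.extend([mask_positions[mask_index]] * n_inputs)
        second_ids.extend(range(1, n_inputs + 1))
    return first_ids, second_ids


# Part A: x1 x2 [MASK] x4 [MASK]; Part B: [START] x5 x6 [START] x3
part_a = ["x1", "x2", "[MASK]", "x4", "[MASK]"]
first_ids, second_ids = two_d_position_ids(part_a, [(1, 2), (0, 1)])
print(first_ids)   # [1, 2, 3, 4, 5, 5, 5, 5, 3, 3]
print(second_ids)  # [0, 0, 0, 0, 0, 1, 2, 3, 1, 2]
```

Because the span tokens reuse the position of their [MASK], the model is not told the span length in advance, which suits tasks where the answer length is unknown.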
