| Paper | Year | Author | Notes |
|---|---|---|---|
LLaMA: Open and Efficient Foundation Language Models | 2023.02 | Hugo Touvron Meta | |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | 2023.07 | Hugo Touvron, Meta | Not very useful on its own; better to read the GQA paper instead |
| | | |
| Fast Transformer Decoding: One Write-Head is All You Need | 2019.11 | Noam Shazeer, Google | MQA: all query heads share a single K/V head, shrinking the KV cache |
| GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints | 2023.05 | Joshua Ainslie, Google | GQA: query heads share K/V heads in groups, interpolating between MHA and MQA; see the sketch after this table |
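A minimal sketch of the attention variants in the last two rows, assuming `n_kv` key/value heads shared by `n_q` query heads: `n_kv == n_q` gives standard multi-head attention (MHA), `n_kv == 1` gives MQA, and values in between give GQA. The function and variable names (`gqa`, `n_q`, `n_kv`) are illustrative, not taken from either paper's code.

```python
# Grouped-query attention sketch: n_q query heads attend using n_kv shared
# K/V heads (n_kv must divide n_q). Names here are illustrative assumptions.
import math
import torch

def gqa(q, k, v):
    """q: (batch, n_q, seq, d); k, v: (batch, n_kv, seq, d)."""
    b, n_q, s, d = q.shape
    n_kv = k.shape[1]
    assert n_q % n_kv == 0, "query heads must split evenly over KV heads"
    group = n_q // n_kv
    # Broadcast each KV head to every query head in its group.
    k = k.repeat_interleave(group, dim=1)  # (b, n_q, s, d)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # (b, n_q, s, s)
    return torch.softmax(scores, dim=-1) @ v         # (b, n_q, s, d)

# 8 query heads over 2 KV heads (GQA); use 8 for MHA or 1 for MQA.
q = torch.randn(1, 8, 16, 64)
out = gqa(q, torch.randn(1, 2, 16, 64), torch.randn(1, 2, 16, 64))
print(out.shape)  # torch.Size([1, 8, 16, 64])
```

The practical payoff is that the decoding KV cache scales with `n_kv` rather than `n_q`, which is why GQA cuts inference memory with little quality loss.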