GPT-1 paper
Improving Language Understanding by Generative Pre-Training (2018)
GPT-2 paper
Language Models are Unsupervised Multitask Learners (2019)
GPT-3 paper
Language Models are Few-Shot Learners (2020)
Overview
Parameter counts of the models, for comparison:
| Model | Parameters |
|---|---|
| ELMo | 94M |
| BERT-large | 340M |
| GPT-2 | 1542M |
| Turing-NLG | 17B |
| GPT-3 | 175B |
Purpose

- few-shot learning: a task description plus a handful of demonstrations, then the model predicts.
- one-shot learning: a task description plus exactly one demonstration, then the model predicts.
- zero-shot learning: a task description with no demonstrations, then the model predicts.
These three settings are collectively called in-context learning; a prompt sketch follows below. GPT-3 can do closed-book QA and text generation (news generation, sentence construction), and the same generative approach has also been extended to image generation.
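To make the difference concrete, here is a minimal sketch of how the three prompt formats differ, using the English-to-French translation example from the GPT-3 paper. The `build_prompt` helper and the `=>` separator are illustrative assumptions; no gradient updates happen in any of the three settings, the "shots" are just text placed in the context window.

```python
# Sketch of zero-/one-/few-shot prompts for a single task.
# The demonstrations are plain text in the context; the model is
# conditioned on them at inference time, with no fine-tuning.

TASK_DESCRIPTION = "Translate English to French:"

def build_prompt(demonstrations, query):
    """Assemble: task description, k demonstrations, then the query."""
    lines = [TASK_DESCRIPTION]
    for source, target in demonstrations:         # k = 0, 1, or a few
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")                   # model completes after "=>"
    return "\n".join(lines)

demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]

zero_shot = build_prompt([], "plush toy")         # description only
one_shot = build_prompt(demos[:1], "plush toy")   # one demonstration
few_shot = build_prompt(demos, "plush toy")       # several demonstrations
print(few_shot)
```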
Training data: the crawled training corpus was supposed to be strictly deduplicated against the development and test sets of the benchmarks, but a bug left part of the detected overlap in the training data; the authors analyze the remaining overlap and report that it does not materially affect the results. From the paper:
> We initially tried to address the issue of contamination by proactively searching for and attempting to remove any overlap between our training data and the development and test sets of all benchmarks studied in this paper. Unfortunately, a bug resulted in only partial removal of all detected overlaps from the training data. Due to the cost of training, it wasn’t feasible to retrain the model. To address this, we investigate in detail how the remaining detected overlap impacts results.
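For intuition, here is a minimal sketch of the kind of overlap check described above: a training document is flagged when it shares an N-gram with any test example. The paper matched on 13-grams; the whitespace tokenization and the `build_test_index` / `filter_training_docs` helpers are simplifying assumptions, not the authors' actual pipeline.

```python
# Sketch of N-gram-based contamination filtering between a training
# corpus and benchmark test sets. A document is dropped if any of its
# N-grams also appears in a test example.

N = 13  # the GPT-3 paper matched on 13-grams

def ngrams(text, n=N):
    """All word-level n-grams of a text, as a set of token tuples."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_test_index(test_examples):
    """Union of every n-gram that occurs anywhere in the test sets."""
    index = set()
    for example in test_examples:
        index |= ngrams(example)
    return index

def filter_training_docs(train_docs, test_index):
    """Keep only training documents with no n-gram in the test index."""
    return [doc for doc in train_docs if not (ngrams(doc) & test_index)]
```

Holding the test-set n-grams in a set keeps each membership check cheap, which is what makes a scan over a crawl-scale corpus feasible. Note that in the paper the bug was in the removal step, not in detection, which is why the authors could still measure how the residual overlap affected the benchmark scores.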