2020-08-21 An Introduction to GPT-3

GPT-1 paper

Improving Language Understanding by Generative Pre-Training (2018)

GPT-2 paper

Language Models are Unsupervised Multitask Learners (2019)

GPT-3 paper

Language Models are Few-Shot Learners (2020)

Overview

Parameter counts of the major models:

Model         Parameters
ELMo          94M
BERT-large    340M
GPT-2         1542M
Turing NLG    17B
GPT-3         175B
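As a quick sanity check on the 175B figure: most of a GPT-style decoder's parameters sit in the attention and feed-forward blocks, roughly 12 · n_layers · d_model² in total. The snippet below plugs in the layer count, hidden size, and vocabulary size published in the GPT-3 paper; it is only a back-of-envelope sketch that ignores biases, layer norms, and positional embeddings.

```python
# Back-of-envelope parameter estimate for GPT-3
# (layer count, hidden size, and vocabulary size as published in the GPT-3 paper).
n_layers = 96      # transformer decoder layers
d_model = 12288    # hidden size
n_vocab = 50257    # BPE vocabulary size (GPT-2 tokenizer family)

# Per layer: ~4*d^2 for the attention projections (Q, K, V, output)
# plus ~8*d^2 for the feed-forward block (d -> 4d -> d).
block_params = n_layers * 12 * d_model ** 2
embed_params = n_vocab * d_model  # token embedding matrix

total = block_params + embed_params
print(f"~{total / 1e9:.0f}B parameters")  # -> ~175B parameters
```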

Purpose

(Figure: the goal of GPT)
few-shot learning

A task description plus a small number of demonstration examples, after which the model makes its prediction.

one-shot learning

A task description plus exactly one demonstration example, after which the model makes its prediction.

zero-shot learning

A task description with no demonstration examples, after which the model makes its prediction.
The three settings above are collectively called in-context learning; a sketch of the corresponding prompt formats is given below. GPT-3 can do closed-book QA and text generation (news articles, using a new word in a sentence), and the same approach has even been applied to image generation.
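The only difference between the three settings is how the prompt is assembled; the model's weights are never updated. The sketch below builds the three prompt variants for the English-to-French translation example shown in the GPT-3 paper; it is plain string construction for illustration, with no real API calls.

```python
# Illustrative prompt construction for in-context learning.
# Zero-shot: task description only; one-shot: one demo; few-shot: several demos.

task_description = "Translate English to French:"
examples = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("cheese", "fromage"),
]
query = "plush giraffe =>"

def build_prompt(task, demos, query):
    """Task description, k demonstrations (k=0 zero-shot, k=1 one-shot,
    k>1 few-shot), then the query the model must complete."""
    lines = [task]
    lines += [f"{src} => {tgt}" for src, tgt in demos]
    lines.append(query)
    return "\n".join(lines)

zero_shot = build_prompt(task_description, [], query)
one_shot  = build_prompt(task_description, examples[:1], query)
few_shot  = build_prompt(task_description, examples, query)

print(few_shot)
```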
On the training data: the crawled training data was supposed to be strictly separated from the benchmark dev/test data, but in practice a bug crept in (see the passage quoted from the paper below); the paper's analysis indicates it did not materially affect the reported results.


Figure 1. In-context learning

Figure 2. Closed-book QA

Figure 3. Model comparison on SuperGLUE

Figure 4. Turing-test-style human evaluation of generated text

Figure 5. GPT-3 using a new word in a sentence

Figure 6. GPT-3 solving arithmetic problems

Figure 7. GPT-3 solving arithmetic problems

Figure 8. GPT-3's weakness on NLI tasks

Figure 9. Explanation of the data-contamination bug

We initially tried to address the issue of contamination by proactively searching for and attempting to remove any overlap between our training data and the development and test sets of all benchmarks studied in this paper. Unfortunately, a bug resulted in only partial removal of all detected overlaps from the training data. Due to the cost of training, it wasn’t feasible to retrain the model. To address this, we investigate in detail how the remaining detected overlap impacts results.
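On the contamination check itself: the paper describes searching the training data for n-gram overlaps with benchmark dev/test sets (a 13-gram match in the initial filter). The sketch below is a simplified illustration of that idea, assuming plain whitespace tokenization; the actual pipeline described in the paper is considerably more involved.

```python
# Simplified sketch of n-gram-based contamination filtering
# (the GPT-3 paper describes a 13-gram overlap check; this only
#  illustrates the idea and is not the paper's actual pipeline).

def ngrams(text, n=13):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def filter_training_docs(train_docs, benchmark_texts, n=13):
    """Drop training documents sharing any n-gram with a benchmark example."""
    benchmark_ngrams = set()
    for text in benchmark_texts:
        benchmark_ngrams |= ngrams(text, n)

    clean = []
    for doc in train_docs:
        if ngrams(doc, n) & benchmark_ngrams:
            continue  # contaminated: skip this document
        clean.append(doc)
    return clean
```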
