【NLP】-02-BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

【唐宇迪】Deep Learning Paper Walkthrough Series: The BERT Model

Bilibili link: https://www.bilibili.com/video/BV1vg4y1i7zX?p=2

Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Lesson 1: Overview of the Paper Walkthrough

BERT: Bidirectional Encoder Representations from Transformers

Lesson 2: Overview of the Abstract

0 Abstract

BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks. >> Pre-training only needs raw text, no labels; the task is essentially cloze-style fill-in-the-blank from context. The pre-trained model can then be reused directly by adding a single output layer.

BERT is conceptually simple and empirically powerful.

Lesson 3: How the Model Performs Across NLP Tasks

1 Introduction

The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked word based only on its context. Unlike left-to-right language model pre-training, the MLM objective enables the representation to fuse the left and the right context, which allows us to pre-train a deep bidirectional Transformer.
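
To make the MLM objective concrete, here is a minimal sketch of what the pre-trained model learns to do. It assumes the Hugging Face transformers library and its publicly released bert-base-uncased checkpoint, neither of which is part of the paper itself.

```python
# A minimal sketch of what MLM pre-training teaches the model, assuming the
# Hugging Face `transformers` library and the public bert-base-uncased
# checkpoint (neither is part of the paper itself).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT fills in [MASK] using both the left and the right context.
for prediction in fill_mask("The man went to the [MASK] to buy a gallon of milk."):
    print(prediction["token_str"], round(prediction["score"], 3))
```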

Lesson 4: The Role of the Pre-trained Model

2 Related Work

3 BERT

We introduce BERT and its detailed implementation in this section. There are two steps in our framework: pre-training and fine-tuning. During pre-training, the model is trained on unlabeled data over different pre-training tasks. For fine-tuning, the BERT model is first initialized with the pre-trained parameters, and all of the parameters are fine-tuned using labeled data from the downstream tasks. >> The pre-trained parameters can be used as-is; fine-tuning is the step that requires labeled data.

[CLS] is a special token whose final vector is used for classification; it serves as an aggregate representation of the whole sequence. [SEP] is a separator token placed between sentences. "Downstream tasks" refers to the many applications the pre-trained model can be plugged into.

Lesson 5: The Special Tokens in the Input Encoding

A distinctive feature of BERT is its unified architecture across different tasks. There is minimal difference between the pre-trained architecture and the final downstream architecture. >> The model architecture stays the same across applications; pre-training and downstream use differ only slightly.

We denote the number of layers (i.e., Transformer blocks) as L, the hidden size as H, and the number of self-attention heads as A. We primarily report results on two model sizes: BERT_BASE (L=12, H=768, A=12, total parameters=110M) and BERT_LARGE (L=24, H=1024, A=16, total parameters=340M). >> In BERT_BASE the 768-dimensional output at each position comes from 12 attention heads (64 dimensions each) whose results are concatenated; for most applications the base model is already sufficient.
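
As a sketch of what L, H, and A mean in practice, the snippet below builds an untrained model with the BERT_BASE shape and counts its parameters. It assumes the Hugging Face transformers library, whose BertConfig field names differ from the paper's notation.

```python
# Sketch: instantiate an (untrained) model with the BERT_BASE shape and count
# its parameters. Assumes the Hugging Face `transformers` library; the config
# field names below are BertConfig's, not the paper's notation.
from transformers import BertConfig, BertModel

config = BertConfig(
    num_hidden_layers=12,    # L: number of Transformer blocks
    hidden_size=768,         # H: hidden size
    num_attention_heads=12,  # A: self-attention heads (768 / 12 = 64 dims each)
    intermediate_size=3072,  # feed-forward size, 4*H in the paper
)
model = BertModel(config)
print(sum(p.numel() for p in model.parameters()))  # roughly 110M, i.e. BERT_BASE
```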

To make BERT handle a variety of downstream tasks, our input representation is able to unambiguously represent both a single sentence and a pair of sentences (e.g., ⟨Question, Answer⟩; for question answering the input contains both the question and the passage) in one token sequence.
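
A small sketch of how a sentence pair is packed into one token sequence with the special tokens, assuming the Hugging Face tokenizer for bert-base-uncased (the example text is made up):

```python
# Sketch: pack a (question, passage) pair into one token sequence with the
# [CLS]/[SEP] special tokens. Assumes the Hugging Face tokenizer for
# bert-base-uncased; the example text is made up.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer("Who wrote BERT?", "BERT was written by researchers at Google.")

print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# ['[CLS]', 'who', 'wrote', 'bert', '?', '[SEP]', 'bert', 'was', ..., '[SEP]']
print(encoded["token_type_ids"])  # 0 for sentence A tokens, 1 for sentence B tokens
```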

Lesson 6: How the Input Embeddings Are Built

For a given token, its input representation is constructed by summing the corresponding token, segment, and position embeddings. A visualization of this construction can be seen in Figure 2. >> The segment embedding records which sentence a token belongs to, and the position embedding records where it appears (the same word can carry a different meaning in a different position); see rows 3 and 4 of Figure 2. Other information, such as domain-specific signals, could be encoded in the same way. The three embeddings are summed elementwise, not concatenated.
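
A minimal pure-PyTorch sketch of the summation, with sizes following BERT_BASE; the variable names and example token ids are illustrative, not taken from the paper's code.

```python
# Sketch of the input representation: token + segment + position embeddings are
# summed elementwise. Sizes follow BERT_BASE; the names and ids are illustrative.
import torch
import torch.nn as nn

vocab_size, max_len, hidden = 30522, 512, 768

token_emb    = nn.Embedding(vocab_size, hidden)
segment_emb  = nn.Embedding(2, hidden)        # sentence A vs. sentence B
position_emb = nn.Embedding(max_len, hidden)  # learned positions, not sinusoidal

input_ids      = torch.tensor([[101, 2040, 2626, 102]])  # e.g. [CLS] who wrote [SEP]
token_type_ids = torch.zeros_like(input_ids)             # everything in sentence A here
positions      = torch.arange(input_ids.size(1)).unsqueeze(0)

# One vector per token, shape [batch, seq_len, hidden]; the real model also
# applies LayerNorm and dropout after the sum.
embeddings = token_emb(input_ids) + segment_emb(token_type_ids) + position_emb(positions)
print(embeddings.shape)  # torch.Size([1, 4, 768])
```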

Lesson 7: BERT Pre-training Strategy

We pre-train BERT using two unsupervised tasks, described in this section. This step is presented in the left part of Figure 1.

Task #1: Masked LM. Intuitively, it is reasonable to believe that a deep bidirectional model is strictly more powerful than either a left-to-right model or the shallow concatenation of a left-to-right and a right-to-left model. >> This is the cloze-style fill-in-the-blank task from context described earlier.

In order to train a deep bidirectional representation, we simply mask some percentage of the input tokens at random, and then predict those masked tokens. We refer to this procedure as a “masked LM” (MLM). >> 15% of the tokens are masked at random. Because the [MASK] token never appears during fine-tuning, this creates a pre-train/fine-tune mismatch; to mitigate it, a selected token is replaced with [MASK] only 80% of the time, with a random token 10% of the time, and left unchanged 10% of the time.
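
A sketch of the masking rule as a standalone helper, in pure PyTorch; the function name and the use of -100 as the ignored-label value are conventions borrowed from common implementations, not from the paper's released code.

```python
# Sketch of the MLM corruption rule: 15% of positions are selected; of those,
# 80% become [MASK], 10% become a random token, 10% keep the original token.
# Pure PyTorch; the helper and the -100 "ignore" label are common conventions,
# not the paper's released code, and special tokens such as [CLS]/[SEP] would
# normally be excluded from masking.
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    labels = input_ids.clone()
    selected = torch.rand(input_ids.shape) < mlm_prob    # pick ~15% of positions
    labels[~selected] = -100                              # loss only on selected positions

    roll = torch.rand(input_ids.shape)
    to_mask   = selected & (roll < 0.8)                   # 80% -> [MASK]
    to_random = selected & (roll >= 0.8) & (roll < 0.9)   # 10% -> random token
    # the remaining 10% of selected positions are left unchanged

    corrupted = input_ids.clone()
    corrupted[to_mask] = mask_token_id
    corrupted[to_random] = torch.randint(vocab_size, input_ids.shape)[to_random]
    return corrupted, labels

ids = torch.tensor([[101, 2040, 2626, 14324, 102]])       # made-up token ids
corrupted, labels = mask_tokens(ids, mask_token_id=103, vocab_size=30522)
```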

Task #2: Next Sentence Prediction (NSP), i.e., predicting whether the second sentence actually follows the first. Many important downstream tasks such as Question Answering (QA) and Natural Language Inference (NLI) are based on understanding the relationship between two sentences, which is not directly captured by language modeling.

Specifically, when choosing the sentences A and B for each pre-training example, 50% of the time B is the actual next sentence that follows A (labeled as IsNext), and 50% of the time it is a random sentence from the corpus (labeled as NotNext). >> B genuinely follows A half of the time; the [CLS] vector is fed to a binary classifier to make this IsNext/NotNext decision.
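
A plain-Python sketch of how IsNext/NotNext pairs could be assembled from a list of consecutive sentences; the helper is illustrative and simplified relative to the paper's document-level sampling.

```python
# Sketch of NSP example construction: half the time B really follows A
# (IsNext, label 1), half the time B is a random sentence (NotNext, label 0).
# Plain Python and simplified: the paper samples the random sentence from a
# different document.
import random

def make_nsp_examples(sentences):
    examples = []
    for i in range(len(sentences) - 1):
        a = sentences[i]
        if random.random() < 0.5:
            b, label = sentences[i + 1], 1          # IsNext
        else:
            b, label = random.choice(sentences), 0  # NotNext
        examples.append((a, b, label))
    return examples
```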

Lesson 8: Summary and Analysis (Fine-tuning BERT)

For each task, we simply plug in the task-specific inputs and outputs into BERT and fine-tune all the parameters end-to-end. >> Everything is trained end-to-end, from raw inputs straight to outputs.

At the output, the token representations are fed into an output layer for token-level tasks, such as sequence tagging or question answering, and the [CLS] representation is fed into an output layer for classification, such as entailment or sentiment analysis. >> For named entity recognition (deciding whether each token is a place name, a person name, etc.), a prediction is needed for every token, so every token's representation goes through the output layer; for sentence-level classification only the [CLS] representation is used. Different applications therefore take different representations from the output layer.
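
A sketch of the two kinds of output layers sharing one encoder, assuming the Hugging Face transformers BertModel; the heads and label counts are made up.

```python
# Sketch: one shared pre-trained encoder, two different output layers.
# Assumes the Hugging Face `transformers` BertModel; the heads and the label
# counts are made up. During fine-tuning, encoder and head are updated together.
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

token_head    = nn.Linear(768, 9)  # e.g. per-token NER tags (token-level task)
sentence_head = nn.Linear(768, 2)  # e.g. sentiment classes ([CLS]-level task)

inputs = tokenizer("BERT is conceptually simple.", return_tensors="pt")
outputs = bert(**inputs)

token_logits    = token_head(outputs.last_hidden_state)           # [batch, seq_len, 9]
sentence_logits = sentence_head(outputs.last_hidden_state[:, 0])  # [CLS] vector -> [batch, 2]
```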

All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre-trained model. >> Fine-tuning takes a few hours at most.
