Kaggle code using TensorFlow, bronze medal.
In this code the encoding (tokenizer) uses pretrained weights, but the BERT model itself is not pretrained (it is trained from scratch).
Dropout does not necessarily improve the score. It is a regularization method that randomly deactivates a subset of neurons to prevent overfitting, essentially trading some of the model's fitting capacity for better generalization.
Zhihu column on dropout
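A quick sketch of the mechanism (p=0.3 is an arbitrary choice for illustration): nn.Dropout zeroes each element with probability p during training and rescales the survivors, and does nothing at eval time.

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.3)      # each element is zeroed with probability 0.3
x = torch.ones(2, 4)

dropout.train()
print(dropout(x))                # some entries are 0; survivors are scaled by 1/(1-0.3)

dropout.eval()
print(dropout(x))                # identity at evaluation time
```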
```python
self.transformer = BertModel(bert_config)
self.nb_features = self.transformer.pooler.dense.out_features
self.pooler = nn.Sequential(
    nn.Linear(self.nb_features, self.nb_features),
    nn.Tanh(),
)
self.logit = nn.Linear(self.nb_features, num_classes)
```
I chose to use tokenizer + BERT + dropout + Linear.
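A minimal sketch of that stack (the dropout rate and num_classes defaults here are my own illustrative choices, not the competition code):

```python
import torch.nn as nn
from transformers import BertConfig, BertModel

class BertClassifier(nn.Module):
    def __init__(self, bert_config: BertConfig, num_classes: int, dropout: float = 0.1):
        super().__init__()
        # BERT built from the config only, i.e. no pretrained weights, as in these notes
        self.transformer = BertModel(bert_config)
        self.nb_features = self.transformer.pooler.dense.out_features
        self.dropout = nn.Dropout(dropout)
        self.logit = nn.Linear(self.nb_features, num_classes)

    def forward(self, input_ids, attention_mask=None):
        out = self.transformer(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.pooler_output          # tanh-pooled CLS representation
        return self.logit(self.dropout(pooled))
```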
On init.normal_: it can speed up convergence.
Fills the input Tensor with values drawn from the normal distribution N(mean,std^2)
Code source 1 using std == 0.02
Code source 2 using std == 0.02
Usage: torch.nn.init.normal_(tensor, mean=0.0, std=1.0)
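Putting the pieces together, a sketch of BERT-style initialization with std=0.02 (the helper name and module layout are assumptions):

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    # BERT-style initialization: N(0, 0.02^2) for linear weights, zeros for biases
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

head = nn.Linear(768, 6)
head.apply(init_weights)   # applies init_weights recursively to all submodules
```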
Reason: "Take the first hidden-state from the BERT output (corresponding to the CLS token) and feed it into a Dense layer with 6 neurons and sigmoid activation (the classifier). The outputs of this layer can be interpreted as probabilities for each of the 6 classes."
Why BERT uses the CLS token: by the way self-attention is computed, the CLS representation should not be tied too strongly to any single position; it should reflect the features of the whole sentence rather than depend heavily on one position.
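A sketch of the quoted recipe (assuming a HuggingFace-style BertModel with the default hidden size of 768):

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

model = BertModel(BertConfig())                              # hidden_size defaults to 768
classifier = nn.Sequential(nn.Linear(768, 6), nn.Sigmoid())  # 6-class sigmoid head

input_ids = torch.randint(0, model.config.vocab_size, (2, 16))
out = model(input_ids=input_ids)
cls_state = out.last_hidden_state[:, 0]   # first hidden state = CLS token
probs = classifier(cls_state)             # per-class probabilities, shape (2, 6)
```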
In a function signature, parameters without default values must come before parameters with default values.
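A quick illustration:

```python
def f(a, b=1):      # OK: the non-default parameter comes first
    return a + b

# def g(a=1, b):    # SyntaxError: non-default argument follows default argument
```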
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.
This seems also related: spacy-transformers
This was a data issue. I removed all non-alphanumeric data from my examples and managed to train
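The more direct fix is the one the error message suggests: pass padding=True and truncation=True to the tokenizer so every example in the batch ends up the same length (a sketch assuming a HuggingFace tokenizer):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["a short sentence", "a much longer sentence " * 20],
    padding=True,         # pad to the longest example in the batch
    truncation=True,      # cut off anything beyond max_length
    max_length=128,
    return_tensors="pt",  # all sequences now share one length, so tensors can be built
)
```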
Error: Input, output and indices must be on the current device
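This usually means the model parameters and the input tensors are on different devices; moving both to the same device resolves it. A minimal reproduction-and-fix sketch with nn.Embedding, where this error commonly originates:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

embedding = nn.Embedding(100, 16).to(device)   # model weights on the target device
indices = torch.tensor([1, 2, 3]).to(device)   # indices must live on the same device
out = embedding(indices)                       # no device-mismatch error now
```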
Error from using size_average:
UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
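The fix is exactly what the warning says: replace the deprecated size_average/reduce arguments with reduction. A sketch with nn.CrossEntropyLoss:

```python
import torch
import torch.nn as nn

# old (deprecated): nn.CrossEntropyLoss(size_average=True)
criterion = nn.CrossEntropyLoss(reduction="mean")  # options: 'mean', 'sum', 'none'

logits = torch.randn(4, 6)
labels = torch.tensor([0, 2, 1, 5])
loss = criterion(logits, labels)
```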
Using the output of the sequence classifier (SequenceClassifierOutput):
A SequenceClassifierOutput (if return_dict=True is passed or when config.return_dict=True) or a tuple of torch.FloatTensor comprising various elements depending on the configuration (BertConfig) and inputs.

- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.
- logits (torch.FloatTensor of shape (batch_size, config.num_labels)) – Classification (or regression if config.num_labels==1) scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
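A sketch of reading those fields off the output (randomly initialized model, illustrative shapes):

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

config = BertConfig(num_labels=6)
model = BertForSequenceClassification(config)   # randomly initialized, as in these notes

input_ids = torch.randint(0, config.vocab_size, (2, 16))
labels = torch.tensor([0, 3])

out = model(input_ids=input_ids, labels=labels,
            output_hidden_states=True, output_attentions=True)
print(out.loss)                 # scalar, present because labels were given
print(out.logits.shape)         # (batch_size, num_labels) = (2, 6)
print(len(out.hidden_states))   # embeddings + one per layer = 13 for the default 12-layer config
print(out.attentions[0].shape)  # (batch_size, num_heads, sequence_length, sequence_length)
```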