1. Using Callback Functions
from kashgari.corpus import SMP2018ECDTCorpus
import keras
import kashgari
from kashgari.tasks.classification import BiLSTM_Model
from kashgari.callbacks import EvalCallBack
import logging
logging.basicConfig(level='DEBUG')
# Load the built-in dataset
train_x, train_y = SMP2018ECDTCorpus.load_data('train')
valid_x, valid_y = SMP2018ECDTCorpus.load_data('valid')
test_x, test_y = SMP2018ECDTCorpus.load_data('test')
# You can also use your own dataset; the format looks like this:
# train_x = [['Hello', 'world'], ['Hello', 'Kashgari']]
# train_y = ['a', 'b']
# Here the training set is reused as validation and test data for a quick smoke test
valid_x, valid_y = train_x, train_y
test_x, test_y = train_x, train_y
tf_board_callback = keras.callbacks.TensorBoard(log_dir='./logs', update_freq=1000)
model = BiLSTM_Model()
# This is Kashgari's built-in callback; it computes precision, recall and F1 during training
eval_callback = EvalCallBack(kash_model=model,
                             valid_x=valid_x,
                             valid_y=valid_y,
                             step=1)
model.fit(train_x,
          train_y,
          valid_x,
          valid_y,
          batch_size=100,
          epochs=150,
          callbacks=[eval_callback, tf_board_callback])
Two callbacks are used here: eval_callback and tf_board_callback.
1. eval_callback is Kashgari's built-in callback; it computes precision, recall and F1 during training. Its step parameter defaults to 5, i.e. by default the metrics are computed every 5 epochs (the example above passes step=1 to evaluate after every epoch).
2. tf_board_callback writes TensorBoard log files under the current directory. Run tensorboard in the background pointed at that directory to bring up the TensorBoard page:
# tensorboard --logdir=logs
Then open http://localhost:6006/ in a browser.
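Since model.fit accepts a plain list of callbacks, stock Keras callbacks can be mixed in alongside Kashgari's own. A minimal sketch, reusing the same keras import as above; EarlyStopping and ModelCheckpoint are standard Keras callbacks, but the file name best_model.h5 and the patience value are illustrative choices, and how ModelCheckpoint interacts with Kashgari's wrapped model has not been verified here:

import keras

# Stop training once val_loss has not improved for 5 consecutive epochs
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
# Keep only the weights from the best epoch on disk
checkpoint = keras.callbacks.ModelCheckpoint('best_model.h5',
                                             monitor='val_loss',
                                             save_best_only=True)
model.fit(train_x,
          train_y,
          valid_x,
          valid_y,
          callbacks=[eval_callback, tf_board_callback, early_stop, checkpoint])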
2. Using Pre-trained Models
Download a pre-trained model; let's start with the best-known Chinese BERT model.
Download address:
https://github.com/google-research/bert
Find the BERT-Base, Chinese model:
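For reference, the checkpoint can be fetched and unpacked from the command line. The URL below is the one listed for BERT-Base, Chinese in the google-research/bert README; check there if it has moved:
# wget https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip
# unzip chinese_L-12_H-768_A-12.zip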
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
"""
@Author : Yang Song
@Time : 2020/5/20 15:52
"""
from kashgari.corpus import SMP2018ECDTCorpus
import keras
import kashgari
from kashgari.tasks.classification import BiLSTM_Model
from kashgari.callbacks import EvalCallBack
import logging
logging.basicConfig(level='DEBUG')
# Load the built-in dataset
train_x, train_y = SMP2018ECDTCorpus.load_data('train')
valid_x, valid_y = SMP2018ECDTCorpus.load_data('valid')
test_x, test_y = SMP2018ECDTCorpus.load_data('test')
# You can also use your own dataset; the format looks like this:
# train_x = [['Hello', 'world'], ['Hello', 'Kashgari']]
# train_y = ['a', 'b']
# As above, reuse the training set as validation and test data for a quick smoke test
valid_x, valid_y = train_x, train_y
test_x, test_y = train_x, train_y
from kashgari.embeddings import BERTEmbedding

bert_embed = BERTEmbedding('chinese_L-12_H-768_A-12',
                           task=kashgari.CLASSIFICATION,
                           sequence_length=100)
model = BiLSTM_Model(bert_embed)
model.fit(train_x, train_y, valid_x, valid_y)
chinese_L-12_H-768_A-12 is the directory of the downloaded Chinese BERT model. The script above sits in the 0_YS_TEST directory, so the model directory is referenced by a path relative to it.
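If the script is run from a different working directory, that relative path will no longer resolve; passing an absolute path avoids the problem. A small sketch, where the path below is hypothetical and should point at wherever the checkpoint was actually unpacked:

import os

# Hypothetical location of the unpacked checkpoint; adjust to your machine
bert_path = os.path.expanduser('~/models/chinese_L-12_H-768_A-12')
bert_embed = BERTEmbedding(bert_path,
                           task=kashgari.CLASSIFICATION,
                           sequence_length=100)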
Training results:
Epoch 1/5
30/30 [==============================] - 32s 1s/step - loss: 1.5468 - acc: 0.6119 - val_loss: 0.6305 - val_acc: 0.8628
Epoch 2/5
30/30 [==============================] - 26s 856ms/step - loss: 0.5488 - acc: 0.8767 - val_loss: 0.3349 - val_acc: 0.9335
Epoch 3/5
30/30 [==============================] - 26s 850ms/step - loss: 0.2987 - acc: 0.9389 - val_loss: 0.1907 - val_acc: 0.9670
Epoch 4/5
30/30 [==============================] - 26s 862ms/step - loss: 0.2027 - acc: 0.9607 - val_loss: 0.1159 - val_acc: 0.9841
Epoch 5/5
30/30 [==============================] - 26s 865ms/step - loss: 0.1377 - acc: 0.9761 - val_loss: 0.0758 - val_acc: 0.9947
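After training, the model can be scored on the held-out test split and persisted to disk. A minimal sketch using Kashgari's evaluate/save/load_model API; the directory name saved_bert_model is an illustrative choice:

# Report precision, recall and F1 on the test split
model.evaluate(test_x, test_y)

# Save the trained model, then reload it for inference
model.save('saved_bert_model')
loaded_model = kashgari.utils.load_model('saved_bert_model')
# predict() expects tokenized input; the SMP2018 corpus is character-tokenized
print(loaded_model.predict([list('你好')]))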