Run directly in the notebook
!echo $PYTHONPATH
%cd /home/aistudio/Speech-Transformer/egs/aishell
!python /home/aistudio/Speech-Transformer/src/bin/train.py \
--train-json dump/train/deltafalse/data_simplify.json \
--valid-json dump/dev/deltafalse/data_simplify.json \
--dict data/lang_1char/train_chars.txt \
--LFR_m 1 --LFR_n 1 \
--d_input 80 --n_layers_enc 1 --n_head 2 --d_k 64 --d_v 64 \
--d_model 256 --d_inner 512 --dropout 0.1 --pe_maxlen 5000 \
--d_word_vec 256 --n_layers_dec 1 --tgt_emb_prj_weight_sharing 1 \
--label_smoothing 0.1 \
--epochs 60 --shuffle 1 \
--batch-size 64 --batch_frames 0 \
--maxlen-in 800 --maxlen-out 150 \
--num-workers 4 --k 0.2 --warmup_steps 4000 \
--save-folder exp/train_result \
--checkpoint 0 --continue-from "" \
--print-freq 10 --visdom 0 --visdom_lr 0 --visdom_epoch 0 --visdom-id "Transformer Training"
Run in the terminal
cd Speech-Transformer/egs/aishell/
source path.sh
python /home/aistudio/Speech-Transformer/src/bin/train.py --train-json dump/train/deltafalse/data_simplify.json --valid-json dump/dev/deltafalse/data_simplify.json --dict data/lang_1char/train_chars.txt --LFR_m 1 --LFR_n 1 --d_input 80 --n_layers_enc 1 --n_head 2 --d_k 64 --d_v 64 --d_model 256 --d_inner 512 --dropout 0.1 --pe_maxlen 5000 --d_word_vec 256 --n_layers_dec 1 --tgt_emb_prj_weight_sharing 1 --label_smoothing 0.1 --epochs 60 --shuffle 1 --batch-size 64 --batch_frames 0 --maxlen-in 800 --maxlen-out 150 --num-workers 4 --k 0.2 --warmup_steps 4000 --save-folder exp/train_result --checkpoint 0 --continue-from "" --print-freq 10 --visdom 0 --visdom_lr 0 --visdom_epoch 0 --visdom-id "Transformer Training"
Install the debugger ipdb (it gives colored output; the default pdb prints without color)
pip install ipdb -i https://pypi.tuna.tsinghua.edu.cn/simple
How to set a breakpoint:
import ipdb
...
ipdb.set_trace()  # --> insert this line where you want the breakpoint
...
Execution pauses there and drops you into the ipdb debugger.
Common commands:
- n: execute the next line
- q: quit the debugger
- l: list the source code around the current line
- p: print the value of a variable
- a: print the arguments of the current function
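As a quick illustration, the minimal sketch below shows where a breakpoint would typically be dropped inside a training loop to inspect each batch; the loop and the fake data loader are illustrative stand-ins, not the actual solver.py code.

import ipdb

def run_one_epoch(data_loader):
    """Toy stand-in for the real epoch loop."""
    total = 0.0
    for i, batch in enumerate(data_loader):
        ipdb.set_trace()      # pause here; try `p batch`, `n`, `q`
        total += sum(batch)
    return total

if __name__ == "__main__":
    fake_loader = [[1.0, 2.0], [3.0, 4.0]]   # stand-in for the real DataLoader
    print(run_one_epoch(fake_loader))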
Running the training script then failed with the following error:
2021-03-31 17:28:33,902 - WARNING - DataLoader reader thread raised an exception.
Traceback (most recent call last):
File "/home/aistudio/Speech-Transformer/src/bin/train.py", line 181, in <module>
main(args)
File "/home/aistudio/Speech-Transformer/src/bin/train.py", line 175, in main
solver.train()
File "/home/aistudio/Speech-Transformer/src/solver/solver.py", line 80, in train
tr_avg_loss = self._run_one_epoch(epoch)
File "/home/aistudio/Speech-Transformer/src/solver/solver.py", line 176, in _run_one_epoch
for i, (data) in enumerate(data_loader):
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 351, in __next__
return self._reader.read_next_var_list()
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:158)
Exception in thread Thread-1:
Traceback (most recent call last):
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 346, in _thread_loop
six.reraise(*sys.exc_info())
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 317, in _thread_loop
batch = self._dataset_fetcher.fetch(indices)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/fetcher.py", line 65, in fetch
data = self.collate_fn(data)
TypeError: 'tuple' object is not callable
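One possible cause of the final TypeError (this is an assumption, not something confirmed from the project code) is that, when the torch.utils.data.DataLoader call is ported, collate_fn lands in the wrong positional slot of paddle.io.DataLoader, whose argument order differs from PyTorch's. Passing it by keyword avoids the mismatch; ToyDataset and pad_collate below are illustrative stand-ins for the project's dataset and padding function.

import numpy as np
from paddle.io import Dataset, DataLoader

class ToyDataset(Dataset):
    """Illustrative stand-in for the project's audio Dataset."""
    def __len__(self):
        return 8
    def __getitem__(self, idx):
        return np.float32(idx), np.int64(idx)

def pad_collate(samples):
    """Illustrative stand-in for the project's padding collate function."""
    feats, labels = zip(*samples)
    return np.stack(feats), np.stack(labels)

loader = DataLoader(ToyDataset(), batch_size=4, shuffle=False,
                    collate_fn=pad_collate,   # keyword, not positional
                    num_workers=0)
for feats, labels in loader:
    print(feats.shape, labels.shape)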
Since the original code is written for PyTorch, several calls need to be replaced after porting to Paddle:
Issue 1: the single optimization step, step()
self.optimizer.step()
AttributeError: 'Adam' object has no attribute 'step'
Check the class paddle.optimizer.Adam and rename the corresponding arguments:
paddle.optimizer.Adam(parameters=model.parameters(), beta1=0.9, beta2=0.98, epsilon=1e-09)
self.optimizer.clear_grad()
# self.optimizer.zero_grad()
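Put together, the ported update step looks roughly like the sketch below; the tiny Linear model and random input are stand-ins for the real model and loss, and the PyTorch equivalents are noted in the comments.

import paddle

model = paddle.nn.Linear(4, 2)                                    # stand-in model
optimizer = paddle.optimizer.Adam(parameters=model.parameters(),  # torch: params=
                                  beta1=0.9, beta2=0.98,          # torch: betas=(0.9, 0.98)
                                  epsilon=1e-09)                  # torch: eps=1e-09

loss = model(paddle.randn([8, 4])).mean()
loss.backward()
optimizer.step()         # available on paddle.optimizer.Adam in Paddle >= 2.0
optimizer.clear_grad()   # replaces optimizer.zero_grad()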
Issue 2: replace item() with numpy().item()
loss.numpy().item()
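A minimal check of the replacement (the constant tensor is just an illustration):

import paddle

loss = paddle.to_tensor(0.5)
value = loss.numpy().item()   # replaces loss.item() from the PyTorch version
print(value)                  # 0.5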
With num_workers = 0 and tgt_emb_prj_weight_sharing = 0, training now runs successfully.
With num_workers > 0 the error above still occurs.