Kaldi GPU configuration
CUDA will not be used! If you have already installed cuda drivers and cuda toolkit, try using --cudatk-dir=... option. Note: this is only relevant for neural net experiments
Analysis: If CUDA is already installed, follow the hint and pass the toolkit location to configure.
./configure --cudatk-dir=<CUDA toolkit directory> --shared
cmd.sh settings
queue.pl: Error submitting jobs to queue (return status was 256)
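Analysis: By default, Kaldi recipes are set up to run on a cluster, so queue.pl fails when no queue is available.
1. Running locally: change every queue.pl in cmd.sh to run.pl.
2. Running on a cluster: the machine names must be set correctly.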
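For the local case, the replacement in cmd.sh can be scripted. The following is a minimal sketch, assuming cmd.sh sits in the current recipe directory; the function name use_run_pl is made up for illustration, and a .bak copy of the original is kept.

```shell
#!/usr/bin/env bash
# Replace every queue.pl in a cmd.sh-style file with run.pl (local execution).
# use_run_pl is a hypothetical helper name, not part of Kaldi.
use_run_pl() {
  local file="$1"
  cp "$file" "$file.bak"                        # keep a backup of the original
  sed 's/queue\.pl/run.pl/g' "$file.bak" > "$file"
}

# Usage (from the recipe directory):
#   use_run_pl cmd.sh
```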
Environment configuration problems
utils/prepare_lang.sh: line 502: fstaddselfloops: command not found
ERROR: FstHeader::Read: Bad FST header: standard input
Analysis: If Kaldi compiled without problems, the cause is that the OpenFst path has not been added to egs/s5/path.sh. Add it there.
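A quick way to confirm the fix is to check whether the tool is visible on PATH after sourcing path.sh. This is a small sketch, not part of Kaldi; missing_tools is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report which of the given tools are missing from PATH.
# missing_tools is a hypothetical helper name, not part of Kaldi.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not on PATH: $tool"
      status=1
    fi
  done
  return "$status"
}

# Usage (from the recipe directory, after sourcing path.sh):
#   . ./path.sh && missing_tools fstaddselfloops
```

The function returns non-zero when at least one tool is missing, so it can also guard the start of a run script.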
IRSTLM
INSTALLATION of IRSLTM finished successfully.
please source the tools/extras/env.sh in your path.sh to enable it.
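Analysis: IRSTLM is used to build language models. Because IRSTLM is downloaded manually, the contents of tools/extras/env.sh must likewise be copied into egs/s5/path.sh.
chain-tdnn errors
Analysis: If the original run.sh also fails, set num-jobs-initial and num-jobs-final to smaller values. They must not exceed the number of GPUs in the cluster.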
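The limit on the job counts can be expressed as a small helper. This is a sketch under assumptions: cap_jobs is a made-up name, the example values 2 and 12 are placeholders, and the GPU count is taken from a single machine with nvidia-smi -L, which prints one line per GPU.

```shell
#!/usr/bin/env bash
# Cap a requested job count at the number of available GPUs.
# cap_jobs is a hypothetical helper name, not part of Kaldi.
cap_jobs() {
  local requested="$1" gpus="$2"
  if [ "$requested" -gt "$gpus" ]; then
    echo "$gpus"
  else
    echo "$requested"
  fi
}

# Usage on a single machine (the values 2 and 12 are placeholders):
#   num_gpus=$(nvidia-smi -L | wc -l)
#   num_jobs_initial=$(cap_jobs 2 "$num_gpus")
#   num_jobs_final=$(cap_jobs 12 "$num_gpus")
```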
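Reference link: tdnn-chain training errors
CUDA problems
Analysis: An error occurred while training the neural network, and the log named in the message showed that the GPU was the cause. To find the error in a log, search for ERROR.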
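Errors during training iterations
Analysis: Most likely a problem with the machine. Two things to try:
1. Set use-gpu=wait so that each job waits for a free GPU.
2. Set train-stage to the iteration at which the error occurred, so that training resumes from there.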
copy-feats
-bash: copy-feats: command not found
Analysis: The Kaldi environment is not set up correctly. Add the relevant paths to s5/path.sh and source it.
Data preparation problems
steps/make_mfcc_pitch.sh --pitch-config conf/pitch.conf --cmd queue.pl --mem 2G --nj 20 data/train exp/make_mfcc/train mfcc
utils/validate_text.pl: The line for utterance IC0007W0001 contains CR (0x0D) character
utils/validate_text.pl: ERROR: text file 'data/train/text' contains disallowed UTF-8 whitespace character(s)
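Analysis: The text file contains disallowed whitespace (full-width as well as half-width characters). The first message points at a carriage return (0x0D), which usually means the file has Windows line endings.
align errors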
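The cleanup can be done with standard tools. This is a minimal sketch, assuming the offending characters are carriage returns and full-width spaces (U+3000); clean_text is a hypothetical helper name, and a .bak copy of the original is kept.

```shell
#!/usr/bin/env bash
# Strip carriage returns and turn full-width spaces (U+3000) into ASCII spaces.
# clean_text is a hypothetical helper name, not part of Kaldi.
clean_text() {
  local file="$1"
  local fullwidth
  fullwidth=$(printf '\343\200\200')            # UTF-8 bytes of U+3000
  cp "$file" "$file.bak"                        # keep a backup of the original
  tr -d '\r' < "$file.bak" | sed "s/$fullwidth/ /g" > "$file"
}

# Usage:
#   clean_text data/train/text
```

After cleaning, rerunning the feature extraction step should show whether validate_text.pl still complains.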
queue.pl: 1 / 20 failed, log is in exp_new/mono/log/align.1.*.log
Run wc -l ./* in the log directory to see the length of each log.
Then open the shortest one and search for ERROR/error.
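The two steps above can be combined into one helper. This is a sketch that follows the note's heuristic of opening the shortest log first; shortest_log is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print the path of the log with the fewest lines among the arguments.
# shortest_log is a hypothetical helper name, not part of Kaldi.
shortest_log() {
  local f n best="" best_n=""
  for f in "$@"; do
    n=$(wc -l < "$f")
    if [ -z "$best" ] || [ "$n" -lt "$best_n" ]; then
      best="$f"
      best_n="$n"
    fi
  done
  echo "$best"
}

# Usage (log path taken from the queue.pl message):
#   log=$(shortest_log exp_new/mono/log/align.1.*.log)
#   grep -in 'error' "$log"
```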
You provided the "cs" option but are not calling with keys in sorted order
Analysis:
The keys are not in sorted order. Open the corresponding .scp file in the editor and run :sort.
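The same fix can be applied from the command line. Kaldi expects table keys in C-locale (byte) order, so the sketch below sets LC_ALL=C explicitly; sort_table is a hypothetical helper name and the feats.scp path is only an example.

```shell
#!/usr/bin/env bash
# Sort a Kaldi table file (e.g. an .scp file) by key in C-locale byte order.
# sort_table is a hypothetical helper name, not part of Kaldi.
sort_table() {
  local file="$1"
  LC_ALL=C sort -k1,1 -o "$file" "$file"
}

# Usage (the path is an example):
#   sort_table data/train/feats.scp
#   LC_ALL=C sort -k1,1 -c data/train/feats.scp   # non-zero exit if unsorted
```

Kaldi also ships utils/fix_data_dir.sh, which re-sorts the files of a whole data directory.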
mono align stalls without reporting an error
Analysis:
Try changing stage so that the script resumes from that step.
Out of memory
(nnet3-chain-train[5.5.123~2-d5bd]:AllocateNewRegion():cu-allocator.cc:519) Failed to allocate a memory region of 1992294400 bytes. Possibly this is due to sharing the GPU. Try switching the GPUs to exclusive mode (nvidia-smi -c 3) and using the option --use-gpu=wait to scripts like steps/nnet3/chain/train.py. Memory info: free:3798M, used:239M, total:4037M, free/total:0.940803