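Compiled by: WeChat official account 【深度学习每日摘要】 (Deep Learning Daily Digest)
Speech recognition has been studied for more than thirty years: from the hidden Markov models of the 1980s, to frame-level deep neural network models in the early 2000s, the CTC model in 2006, deep recurrent neural network models in 2012, the application of attention mechanisms to speech recognition in 2014, seq2seq-based speech recognition systems proposed in 2015, and deep convolutional neural networks applied to large-scale speech recognition systems in 2016. Speech recognition systems have moved from hand-engineered feature extraction to today's end-to-end neural network models, and reported accuracy is now close to 97%.
This article lists papers in the speech recognition field from 1982 to the present, covering all of the models above, together with first-author information and PDF download links.
The list is sorted by publication year and then by the first letter of the title. For the complete list and download links, visit: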
https://github.com/zzw922cn/awesome-speech-recognition-papers
An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition(1982), S. E. Levinson et al. [pdf]
A Maximum Likelihood Approach to Continuous Speech Recognition(1983), Lalit R. Bahl et al. [pdf]
Maximum Mutual Information Estimation of Hidden Markov Model Parameters for Speech Recognition(1986), Lalit R. Bahl et al. [pdf]
Hidden Markov Models for Speech Recognition(1991), B. H. Juang et al. [pdf]
Heterogeneous Acoustic Measurements and Multiple Classifiers for Speech Recognition(1998), Andrew K. Halberstadt. [pdf]
Framewise phoneme classification with bidirectional LSTM and other neural network architectures(2005), Alex Graves et al. [pdf]
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks(2006), Alex Graves et al. [pdf]
Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition(2012), Ossama Abdel-Hamid et al. [pdf]
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition(2012), George E. Dahl et al. [pdf]
Deep Neural Networks for Acoustic Modeling in Speech Recognition(2012), Geoffrey Hinton et al. [pdf]
Sequence Transduction with Recurrent Neural Networks(2012), Alex Graves. [pdf]
Deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al. [pdf]
Improving deep neural networks for LVCSR using rectified linear units and dropout(2013), George E. Dahl et al. [pdf]
Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training(2013), Yajie Miao et al. [pdf]
Improvements to deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al. [pdf]
Machine Learning Paradigms for Speech Recognition: An Overview(2013), Li Deng et al. [pdf]
Recent advances in deep learning for speech research at Microsoft(2013), Li Deng et al. [pdf]
Speech recognition with deep recurrent neural networks(2013), Alex Graves et al. [pdf]
Convolutional deep maxout networks for phone recognition(2014), László Tóth. [pdf]
Convolutional Neural Networks for Speech Recognition(2014), Ossama Abdel-Hamid et al. [pdf]
Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition(2014), László Tóth. [pdf]
Deep Speech: Scaling up end-to-end speech recognition(2014), Awni Y. Hannun et al. [pdf]
End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results(2014), Jan Chorowski et al. [pdf]
First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs(2014), Andrew L. Maas et al. [pdf]
Long short-term memory recurrent neural network architectures for large scale acoustic modeling(2014), Hasim Sak et al. [pdf]
Robust CNN-based speech recognition with Gabor filter kernels(2014), Shuo-Yiin Chang et al. [pdf]
Stochastic pooling maxout networks for low-resource speech recognition(2014), Meng Cai et al. [pdf]
Towards End-to-End Speech Recognition with Recurrent Neural Networks(2014), Alex Graves et al. [pdf]
Attention-Based Models for Speech Recognition(2015), Jan Chorowski et al. [pdf]
Analysis of CNN-based speech recognition system using raw speech as input(2015), Dimitri Palaz et al. [pdf]
Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks(2015), Tara N. Sainath et al. [pdf]
Deep convolutional neural networks for acoustic modeling in low resource languages(2015), William Chan et al. [pdf]
Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition(2015), Chao Weng et al. [pdf]
Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition(2015), Hasim Sak et al. [pdf]
Listen, Attend and Spell(2015), William Chan et al. [pdf]
Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification(2015), Kyuyeon Hwang et al. [pdf]
Advances in All-Neural Speech Recognition(2016), Geoffrey Zweig et al. [pdf]
Advances in Very Deep Convolutional Neural Networks for LVCSR(2016), Tom Sercu et al. [pdf]
End-to-end attention-based large vocabulary speech recognition(2016), Dzmitry Bahdanau et al. [pdf]
Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention(2016), Dong Yu et al. [pdf]
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin(2016), Dario Amodei et al. [pdf]
End-to-end attention-based distant speech recognition with Highway LSTM(2016), Hassan Taherian. [pdf]
Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning(2016), Suyoun Kim et al. [pdf]
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition(2016), William Chan et al. [pdf]
Latent Sequence Decompositions(2016), William Chan et al. [pdf]
Segmental Recurrent Neural Networks for End-to-End Speech Recognition(2016), Liang Lu et al. [pdf]
Towards better decoding and language model integration in sequence to sequence models(2016), Jan Chorowski et al. [pdf]
Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition(2016), Yanmin Qian et al. [pdf]
Very Deep Convolutional Networks for End-to-End Speech Recognition(2016), Yu Zhang et al. [pdf]
Very deep multilingual convolutional neural networks for LVCSR(2016), Tom Sercu et al. [pdf]
Wav2Letter: an End-to-End ConvNet-based Speech Recognition System(2016), Ronan Collobert et al. [pdf]
WaveNet: A Generative Model for Raw Audio(2016), Aäron van den Oord et al. [pdf]
An enhanced automatic speech recognition system for Arabic(2017), Mohamed Amine Menacer et al. [pdf]
A network of deep neural networks for distant speech recognition(2017), Mirco Ravanelli et al. [pdf]
An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems(2017), Hany Ahmed et al. [pdf]
Building DNN acoustic models for large vocabulary speech recognition(2017), Andrew L. Maas et al. [pdf]
Direct Acoustics-to-Word Models for English Conversational Speech Recognition(2017), Kartik Audhkhasi et al. [pdf]
English Conversational Telephone Speech Recognition by Humans and Machines(2017), George Saon et al. [pdf]
ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA(2017), Song Han et al. [pdf]
Deep LSTM for Large Vocabulary Continuous Speech Recognition(2017), Xu Tian et al. [pdf]
Multichannel End-to-end Speech Recognition(2017), Tsubasa Ochiai et al. [pdf]
Multi-task Learning with CTC and Segmental CRF for Speech Recognition(2017), Liang Lu et al. [pdf]
Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition(2017), Tara N. Sainath et al. [pdf]
Residual Convolutional CTC Networks for Automatic Speech Recognition(2017), Yisen Wang et al. [pdf]