Chinese Command-Word Recognition with Sphinx4

CMU Sphinx

CMU Sphinx is a leading speech recognition toolkit with a range of tools for building speech applications. It contains several packages for different tasks and applications, and choosing among them can be confusing. Here is what each package is for:

Pocketsphinx: lightweight recognizer library written in C
Sphinxtrain: acoustic model training tools
Sphinxbase: support library required by Pocketsphinx and Sphinxtrain
Sphinx4: adjustable, modifiable recognizer written in Java

Sphinx-4

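Sphinx-4 is a pure-Java speech recognition library. It provides a quick and simple API for speech recognition using CMUSphinx acoustic models. Beyond speech recognition, Sphinx-4 can also be used to identify speakers, adapt models, and align an existing transcription to audio to obtain timestamps. It supports English and many other languages.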

Using Sphinx-4 in a Maven Project

Importing the Sphinx-4 Library

First, add the Sonatype snapshots repository to pom.xml:

<project>
...
  <repositories>
    <repository>
      <id>snapshots-repo</id>
      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
  </repositories>
...
</project>

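Then add the sphinx4-core dependency. Note that the en-us model resources referenced in the example further down are packaged separately in the sphinx4-data artifact, so add it the same way if you want to run that demo as-is.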

<dependency>
  <groupId>edu.cmu.sphinx</groupId>
  <artifactId>sphinx4-core</artifactId>
  <version>5prealpha-SNAPSHOT</version>
</dependency>

Example

With the Sphinx-4 packages imported, we can start using Sphinx-4 in the project. The Sphinx project provides some demos for reference, such as the following:

package com.example;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class TranscriberDemo {

    public static void main(String[] args) throws Exception {

        Configuration configuration = new Configuration();

        // Acoustic model, dictionary, and language model for US English.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        InputStream stream = new FileInputStream(new File("test.wav"));

        recognizer.startRecognition(stream);
        SpeechResult result;
        // getResult() returns null once the end of the stream is reached.
        while ((result = recognizer.getResult()) != null) {
            System.out.format("Hypothesis: %s\n", result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
}
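
A common reason the demo above returns empty or nonsensical hypotheses is an input file in the wrong audio format. The Sphinx-4 tutorial describes the expected stream as 16 kHz, 16-bit, mono, little-endian PCM WAV. The sketch below is not part of Sphinx-4; it uses only the JDK's javax.sound.sampled package, and the class and method names (WavFormatCheck, isSphinxFriendly) are made up for illustration. It assumes the model you load was trained on 16 kHz audio, which should be double-checked for any other model.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper, not part of Sphinx-4: checks a WAV header before recognition.
class WavFormatCheck {

    /** True if the format is 16 kHz, 16-bit, mono, signed little-endian PCM. */
    static boolean isSphinxFriendly(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && !f.isBigEndian();
    }

    /** Reads only the file header; throws if the file is not a supported audio file. */
    static boolean isSphinxFriendly(File wav) throws Exception {
        return isSphinxFriendly(AudioSystem.getAudioFileFormat(wav).getFormat());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        System.out.println(isSphinxFriendly(new File(args[0]))
                ? "format looks OK"
                : "convert to 16 kHz, 16-bit, mono PCM before recognition");
    }
}
```

Running it against test.wav before calling startRecognition(stream) is a cheap way to rule out format problems early.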

To use the device's microphone as the audio source instead, set up the recognizer as follows:

LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);

// Start recognition process pruning previously cached data.
recognizer.startRecognition(true);
SpeechResult result = recognizer.getResult();

// Pause recognition process. It can be resumed then with startRecognition(false).
recognizer.stopRecognition();

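Chinese Speech Model

The demo above only recognizes English. The ready-made Chinese acoustic models found in various online resources are semi-continuous models intended for Pocketsphinx, whereas Sphinx4 uses continuous acoustic models, so a Chinese acoustic model downloaded directly from the official site will not run in Sphinx4.
By comparing against the configuration of the English acoustic model used in the demo and experimenting with changes to the various configuration files, I eventually got a Chinese acoustic model running in Sphinx4.
Download: https://download.csdn.net/download/jason_yuan123/10943161
It can be used as-is after downloading. The Chinese language model and dictionary can be generated with the online language model and dictionary tool at http://www.speech.cs.cmu.edu/tools/lmtool-new.html ; I won't go into the details here. Do pay attention to the file encoding, though, otherwise recognition will fail or the output will be garbled.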
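
The encoding issue is worth a small illustration. Assuming the recognizer reads the dictionary as UTF-8 (an assumption to verify against your setup), one way to avoid garbled output is to always write and read the file with an explicit charset instead of the platform default. The sketch below uses only the JDK; the class name Utf8DictWriter is hypothetical, and the pronunciation "k ai d eng" is a placeholder that must be replaced with phones from the phone set of the acoustic model you actually use.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: writes "word phone phone ..." lines with an explicit UTF-8 charset.
class Utf8DictWriter {

    /** Writes one dictionary line per entry, always encoded as UTF-8. */
    static void writeDictionary(Path out, Map<String, String> entries) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            lines.add(e.getKey() + " " + e.getValue());
        }
        Files.write(out, lines, StandardCharsets.UTF_8);
    }

    /** Reads the dictionary back, decoding as UTF-8 regardless of the platform default. */
    static List<String> readDictionary(Path in) throws IOException {
        return Files.readAllLines(in, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        // Unicode escapes keep this source file safe under any compiler encoding.
        // \u5f00\u706f is "kai deng" (turn on the light); the phones are placeholders.
        entries.put("\u5f00\u706f", "k ai d eng");

        Path dict = Files.createTempFile("commands", ".dic");
        writeDictionary(dict, entries);
        System.out.println(readDictionary(dict));
    }
}
```

The same rule applies to the text you submit to the online tool and to the files it returns: check which encoding they are in, and convert them if it differs from what the recognizer expects.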