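Sphinx-4 is a pure-Java speech recognition library. It provides a quick and easy API for speech recognition using CMUSphinx acoustic models. Beyond speech recognition, Sphinx-4 can also identify speakers, adapt models, and align an existing transcription to audio to produce timestamps. Sphinx-4 supports English and many other languages.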
How to use Sphinx-4 in your project
Importing the Sphinx-4 library
Sphinx-4 is easy to add to a project built with Apache Maven or Gradle. With Gradle, add the following to build.gradle:
repositories {
    mavenLocal()
    maven { url "https://oss.sonatype.org/content/repositories/snapshots" }
}
dependencies {
    // On Gradle 7 and later, use 'implementation' instead of 'compile'.
    compile group: 'edu.cmu.sphinx', name: 'sphinx4-core', version:'5prealpha-SNAPSHOT'
    compile group: 'edu.cmu.sphinx', name: 'sphinx4-data', version:'5prealpha-SNAPSHOT'
}
With Maven, add the following to pom.xml:
<project>
    ...
    <repositories>
        <repository>
            <id>snapshots-repo</id>
            <url>https://oss.sonatype.org/content/repositories/snapshots</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>
    ...
</project>
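Then add sphinx4-core to the project dependencies. The Gradle snippet above also pulls in sphinx4-data, which contains the default en-us models that the resource: paths in the examples below refer to; with Maven, add it as a dependency in the same way if you use those models.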
<dependency>
    <groupId>edu.cmu.sphinx</groupId>
    <artifactId>sphinx4-core</artifactId>
    <version>5prealpha-SNAPSHOT</version>
</dependency>
Example
Once the Sphinx-4 packages are imported, we can use Sphinx-4 in the project. The Sphinx project provides several demos for reference, for example:
package com.example;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class TranscriberDemo {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        InputStream stream = new FileInputStream(new File("test.wav"));
        recognizer.startRecognition(stream);

        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.format("Hypothesis: %s\n", result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
}
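Configuration
Let's look at the configuration first. It sets the three models that Sphinx requires for speech recognition: the acoustic model, the dictionary, and the language model. For English, ready-made models can be downloaded from the official site. To recognize text from a specific domain, you need to build these three models yourself; the Sphinx project provides tutorials on building them, which we will cover later.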
Configuration configuration = new Configuration();
// Set path to acoustic model.
configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
// Set path to dictionary.
configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
// Set language model.
configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");
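A wrong model path is an easy mistake to make here, so it can help to check the paths before creating a recognizer. The following is a minimal sketch using only the JDK. It assumes that a path starting with resource: refers to an entry on the classpath and that any other path is a plain file path; the class name ModelPathCheck and the method exists are hypothetical helpers, not part of Sphinx-4.

```java
import java.io.File;

public class ModelPathCheck {
    private static final String RESOURCE_PREFIX = "resource:";

    /** Returns true if the path resolves to a classpath resource or to an existing file. */
    static boolean exists(String path) {
        if (path.startsWith(RESOURCE_PREFIX)) {
            // Assumption: "resource:/a/b" names the classpath entry "/a/b".
            String resource = path.substring(RESOURCE_PREFIX.length());
            return ModelPathCheck.class.getResource(resource) != null;
        }
        return new File(path).exists();
    }

    public static void main(String[] args) {
        // Only the two single-file models are checked here; the acoustic model
        // is a directory, and directory entries are not always present in a jar.
        String[] paths = {
            "resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict",
            "resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin"
        };
        for (String path : paths) {
            System.out.println(path + " -> " + (exists(path) ? "found" : "missing"));
        }
    }
}
```

If both paths are reported as missing, the likely cause is that the sphinx4-data artifact is not on the classpath.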
Data sources
1. LiveSpeechRecognizer
LiveSpeechRecognizer uses the microphone of the current device as the speech source. Set it up as follows:
LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
// Start recognition process pruning previously cached data.
recognizer.startRecognition(true);
SpeechResult result = recognizer.getResult();
// Pause recognition process. It can be resumed then with startRecognition(false).
recognizer.stopRecognition();
2. StreamSpeechRecognizer
StreamSpeechRecognizer uses an input stream as the speech source. The stream can come from a file, a network socket, or an existing byte array. Set it up as follows:
StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
recognizer.startRecognition(new FileInputStream("speech.wav"));
SpeechResult result = recognizer.getResult();
recognizer.stopRecognition();
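The three kinds of stream source mentioned above can be sketched with plain JDK classes. This is only an illustration: the class AudioSources and its method names are hypothetical, and the resulting InputStream is what would be passed to recognizer.startRecognition(...).

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

public class AudioSources {
    /** Audio stored in a file on disk. */
    static InputStream fromFile(String path) throws IOException {
        return new FileInputStream(path);
    }

    /** Audio received over a TCP connection; closing the stream also closes the socket. */
    static InputStream fromSocket(String host, int port) throws IOException {
        return new Socket(host, port).getInputStream();
    }

    /** Audio that is already in memory. */
    static InputStream fromBytes(byte[] audio) {
        return new ByteArrayInputStream(audio);
    }

    public static void main(String[] args) throws IOException {
        byte[] placeholder = {1, 2, 3, 4}; // stand-in bytes, not real audio
        try (InputStream in = fromBytes(placeholder)) {
            System.out.println(in.available() + " bytes ready to be recognized");
        }
    }
}
```

Whichever source is used, the bytes in the stream still have to be in one of the audio formats the decoder accepts.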
3. SpeechAligner
SpeechAligner aligns audio with its text transcript.
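// Depending on the Sphinx-4 version, the constructor may instead take the
// acoustic model, dictionary, and G2P model paths as three strings.
SpeechAligner aligner = new SpeechAligner(configuration);
// align() is called on the aligner. A URL needs a protocol, so it is built from a java.io.File.
aligner.align(new File("101-42.wav").toURI().toURL(), "one oh one four two");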
4. SpeechResult
SpeechResult provides access to the recognition result, such as the recognized text and the sequence of words with their timings.
// Print utterance string without filler words.
System.out.println(result.getHypothesis());
// Get individual words and their times.
for (WordResult r : result.getWords()) {
System.out.println(r);
}
// Save lattice in a graphviz format.
result.getLattice().dumpDot("lattice.dot", "lattice");
Note also that the audio must be encoded in one of the following two formats:
1. RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz
2. RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz
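The Sphinx decoder does not support other audio formats. If the audio is in any other format, recognition may return no results at all, so the audio has to be converted to one of the two formats above before decoding. For example, to decode audio sampled at 8000 Hz, call: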
configuration.setSampleRate(8000);
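Because a wrong format tends to produce empty results rather than an error, it can be worth checking a file before decoding it. Below is a minimal sketch that uses the JDK's javax.sound.sampled package to test for the two formats listed above. The class WavFormatCheck and the method isSupported are hypothetical helpers, not part of Sphinx-4.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class WavFormatCheck {
    /** True for 16-bit signed little-endian mono PCM at 16000 Hz or 8000 Hz. */
    static boolean isSupported(AudioFormat format) {
        boolean pcm = AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding());
        boolean rateOk = format.getSampleRate() == 16000f || format.getSampleRate() == 8000f;
        return pcm
                && rateOk
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1
                && !format.isBigEndian();
    }

    public static void main(String[] args) throws Exception {
        File wav = new File(args.length > 0 ? args[0] : "test.wav");
        if (!wav.isFile()) {
            System.out.println("No such file: " + wav);
            return;
        }
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat format = in.getFormat();
            System.out.println(format + " -> " + (isSupported(format) ? "OK" : "needs conversion"));
        }
    }
}
```

A file reported as "needs conversion" has to be converted with an external audio tool or library first; for an 8000 Hz file, the sample rate also has to be set on the configuration as shown above.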
To get all the results for a file, loop until getResult() returns null:
while ((result = recognizer.getResult()) != null) {
System.out.println(result.getHypothesis());
}
sphinx4-samples also provides the following demos:
- Transcriber - demonstrates how to transcribe a file
- Dialog - demonstrates how to lead a dialog with a user
- SpeakerID - speaker identification
- Aligner - demonstration of audio to transcription timestamping