Audio Recording and Speech Recognition with the Speech Framework (iOS 10)

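A previous project of mine needed speech recognition, which I implemented with the iFlytek (讯飞) speech SDK. Many vendors offer speech-to-text services, some paid and some free.
In 2016 Apple added speech recognition to iOS 10 and exposed it through the Speech framework. It uses the same speech recognition technology as Siri on the iPhone.
Let's write some code and try out both audio recording and the Speech framework.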

1. Import Speech and AVFoundation

#import <AVFoundation/AVFoundation.h>  // recording
#import <Speech/Speech.h>    // speech recognition

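2. Request user permissions

Recording and speech recognition both require entries in Info.plist: add the microphone and speech recognition usage descriptions.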

<key>NSMicrophoneUsageDescription</key>
<string>The app needs your permission to use the microphone</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>The app needs your permission to use speech recognition</string>

Once added, both entries show up in the Info.plist editor.
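
The usage description only supplies the text of the system prompt; the app still has to ask for microphone access at runtime. A minimal sketch of that request follows. RequestMicrophoneAccess is a hypothetical helper name, not part of the original demo, and requestRecordPermission: is the AVAudioSession call available on the iOS versions this article targets.

```objectivec
#import <AVFoundation/AVFoundation.h>

// Hypothetical helper (not in the original demo): asks for microphone access
// and reports the answer on the main queue so the caller can update UI.
static void RequestMicrophoneAccess(void (^completion)(BOOL granted))
{
    [[AVAudioSession sharedInstance] requestRecordPermission:^(BOOL granted) {
        // The response block may run on an arbitrary queue.
        dispatch_async(dispatch_get_main_queue(), ^{
            if (completion) {
                completion(granted);
            }
        });
    }];
}
```

A caller would typically start recording only inside the completion block when granted is YES.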

3. Recording

Activate the AVAudioSession

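_session = [AVAudioSession sharedInstance];
NSError *categoryError = nil;
// Use the play-and-record category so the recording can be played back afterwards
BOOL categorySet = [_session setCategory:AVAudioSessionCategoryPlayAndRecord error:&categoryError];
if (categorySet) {
    NSError *activationError = nil;
    // Activate the session manually
    if (![_session setActive:YES error:&activationError]) {
        NSLog(@"Error activating session: %@", activationError);
    }
}
else {
    NSLog(@"Error setting session category: %@", categoryError);
}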

Create the recorder and set its delegate (so the recording state can be observed)

- (void)createAudioRecorder
{
    // Instantiate the recorder
    NSError *errorRecord = nil;
    _recorder = [[AVAudioRecorder alloc] initWithURL:[NSURL fileURLWithPath:_filePath] settings:[self getAudioSetting] error:&errorRecord];
    if (!_recorder) {
        NSLog(@"Error creating recorder: %@", errorRecord);
        return;
    }
    _recorder.delegate = self;
    _recorder.meteringEnabled = YES; // must be YES to read the audio level meters
    
    // Prepare to record
    [_recorder prepareToRecord];
}
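
Since meteringEnabled is switched on above, here is a rough sketch of how the level could be read while recording. CurrentInputLevel is a hypothetical helper, and the conversion from decibels to a 0 to 1 value is just one common choice for driving a level indicator, not something the original demo does.

```objectivec
#import <AVFoundation/AVFoundation.h>
#include <math.h>

// Hypothetical helper: returns the current input level of channel 0 as a
// linear value between 0.0 (silence) and 1.0 (full scale).
static float CurrentInputLevel(AVAudioRecorder *recorder)
{
    if (![recorder isRecording] || ![recorder isMeteringEnabled]) {
        return 0.0f;
    }
    // Meter values are only refreshed when updateMeters is called.
    [recorder updateMeters];
    // Average power is reported in decibels; 0 dB is full scale.
    float decibels = [recorder averagePowerForChannel:0];
    float level = powf(10.0f, decibels / 20.0f);
    return fminf(fmaxf(level, 0.0f), 1.0f);
}
```

Calling this from a repeating timer (for example every 0.1 s) would be enough to animate a simple volume bar.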

Instantiating the recorder requires the audio encoding settings

- (NSDictionary *)getAudioSetting
{
    // Recording settings
    NSMutableDictionary *recordSettings = [[NSMutableDictionary alloc] init];
    
    // Encoder audio quality
    [recordSettings setValue:[NSNumber numberWithInt:AVAudioQualityMax] forKey:AVEncoderAudioQualityKey];
    
    // Number of channels
    [recordSettings setValue:[NSNumber numberWithInt:2] forKey: AVNumberOfChannelsKey];
    
    // Recording format (disabled: it did not work in this demo)
    //    [recordSettings setValue:[NSNumber numberWithInt:kAudioFormatLinearPCM] forKey: AVFormatIDKey];
    // Linear PCM is an uncompressed, lossless format, but the files are large
    
    // Sample rate in Hz
    [recordSettings setValue:[NSNumber numberWithFloat:44100.0] forKey: AVSampleRateKey];//44100.0
    // Linear PCM bit depth (only applies to Linear PCM formats)
    [recordSettings setValue:[NSNumber numberWithInt:32] forKey: AVLinearPCMBitDepthKey];
    
    // Encoder bit rate in bits per second (bps), an integer; a higher bit rate means more data per second of audio
    [recordSettings setValue:[NSNumber numberWithInt:128000] forKey:AVEncoderBitRateKey];
    
    return recordSettings;
}

Start recording

- (void)recorderSoundStart:(NSString *)path
{
    // Stop playback
    [self stopPlayRecorderSound];
    
    // Stop any recording in progress
    if ([_recorder isRecording]) {
        [_recorder stop];
    }
    
    // Delete the old recording file
    [APPUtil deleteFile:path];
    // Deleting is optional: a recording at the same path is overwritten
    
    if (!_recorder) {
        
        // Instantiate the recorder
        [self createAudioRecorder];
    }
    
    if (![_recorder isRecording]){
        
        // Record for at most 60 s. recordForDuration: stops on its own, and
        // unlike a dispatch_after timer it cannot cut off a later recording.
        [_recorder recordForDuration:60];
    }
}

Because the recording is capped at 60 s, also implement the delegate callback that fires when recording finishes

#pragma mark - AVAudioRecorderDelegate

- (void)audioRecorderDidFinishRecording:(AVAudioRecorder *)recorder successfully:(BOOL)flag
{
    NSLog(@"Recording finished!");
    
    [self recorderSoundEnd];
}

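Stop recording. A recording may also be paused before it is stopped; that case is skipped here.

- (void)recorderSoundEnd
{
    // Stop recording
    if ([_recorder isRecording]) {
        [_recorder stop];
//        [_recorder pause]; // pause recording instead
    }
    
    // Update the record button
    _recodeSound.selected = NO;
}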

Start and stop playback

// Play
- (void)recorderSoundPlay:(NSString *)path
{
    // Stop recording first
    if (_recorder) {
        
        [_recorder stop];
    }
    
    if (!_player) {
        // Create the player
        [self createAudioPlayer];
    }
    
    [_session setCategory:AVAudioSessionCategoryPlayback error:nil];
    // Play
    [_player play];
}
// Stop
- (void)stopPlayRecorderSound
{
    if ([_player isPlaying]) {
        [_player stop];
    }
    
    // Update the play button
    _playSound.selected = NO;
}

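4. Speech recognition

Speech recognition was introduced in iOS 10, and Xcode versions before 8 do not ship the Speech framework, so this feature requires Xcode 8 or later and iOS 10 or later.

Request speech recognition authorization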

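// Request speech recognition authorization
// (the handler is not guaranteed to run on the main queue)
[SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
    NSLog(@"status %@", status == SFSpeechRecognizerAuthorizationStatusAuthorized ? @"Speech recognition authorized" : @"Speech recognition not authorized");
}];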
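
The snippet above only distinguishes authorized from everything else. As a sketch of how the individual states could be told apart, the hypothetical helper below (RequestSpeechAuthorization is an illustrative name, not from the demo) switches over the four SFSpeechRecognizerAuthorizationStatus values and reports back on the main queue.

```objectivec
#import <Speech/Speech.h>

// Hypothetical helper: requests speech recognition authorization and calls
// completion on the main queue with YES only when the user has authorized it.
static void RequestSpeechAuthorization(void (^completion)(BOOL authorized))
{
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
        BOOL authorized = NO;
        switch (status) {
            case SFSpeechRecognizerAuthorizationStatusAuthorized:
                authorized = YES;
                break;
            case SFSpeechRecognizerAuthorizationStatusDenied:
                NSLog(@"The user denied speech recognition");
                break;
            case SFSpeechRecognizerAuthorizationStatusRestricted:
                NSLog(@"Speech recognition is restricted on this device");
                break;
            case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                NSLog(@"Speech recognition has not been authorized yet");
                break;
        }
        dispatch_async(dispatch_get_main_queue(), ^{
            if (completion) {
                completion(authorized);
            }
        });
    }];
}
```

Enabling the record button only inside the completion block keeps the UI consistent with the permission state.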

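Ideally, speech recognition means turning what the user says into text as they speak. Recognizing a local audio file is supported as well.

Open the Speech framework and look at the recognition request header, SFSpeechRecognitionRequest.h. There are two kinds of recognition request, SFSpeechAudioBufferRecognitionRequest and SFSpeechURLRecognitionRequest, both subclasses of SFSpeechRecognitionRequest.

SFSpeechAudioBufferRecognitionRequest: recognizes a live audio stream, i.e. transcribes as you speak
SFSpeechURLRecognitionRequest: recognizes the audio file at a given URL

With the speech recognition setup above in place, let's look at these two request types in turn.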

5. Speech recognition -- live transcription

Activate the AVAudioSession

_session = [AVAudioSession sharedInstance];
[_session setCategory:AVAudioSessionCategoryRecord mode:AVAudioSessionModeMeasurement options:AVAudioSessionCategoryOptionDuckOthers error:nil];
[_session setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:nil];

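AVAudioSessionCategoryRecord: recording only (live transcription needs to capture what is said)
AVAudioSessionModeMeasurement: minimizes the signal processing the system applies to the app's input and output audio
AVAudioSessionCategoryOptionDuckOthers: lowers the volume of other apps' audio while this session is active. For example, when a video call starts while QQ Music is playing, the music gets quieter; that is this option ducking the other app
AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation: when this session is deactivated, notifies other apps whose audio was interrupted so that they can resume. It only takes effect when deactivating the session (setActive:NO)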
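
To illustrate the last option, here is a small sketch of deactivating the session once recognition is over, so that other apps get the chance to resume their audio. DeactivateAudioSession is a hypothetical helper name; the original demo does not show this step.

```objectivec
#import <AVFoundation/AVFoundation.h>

// Hypothetical helper: deactivates the shared audio session and tells other
// apps that they may resume. Call it after audio input/output has stopped.
static BOOL DeactivateAudioSession(void)
{
    NSError *error = nil;
    BOOL ok = [[AVAudioSession sharedInstance] setActive:NO
                                             withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation
                                                   error:&error];
    if (!ok) {
        NSLog(@"Error deactivating session: %@", error);
    }
    return ok;
}
```

Deactivation can fail while audio is still running, which is why the sketch returns the result instead of ignoring it.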

Setting up the audio engine

- (void)createAudioEngine
{
    if (!_speechRecognizer) {
        // Set the recognition language
        NSLocale *locale = [NSLocale localeWithLocaleIdentifier:@"zh-CN"];
        _speechRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
    }
    // Create the audio engine
    if (!_audioEngine) {
        _audioEngine = [[AVAudioEngine alloc] init];
    }
}
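
A recognizer cannot be created for every locale, and a supported recognizer can still be temporarily unavailable (for instance without a network connection). The sketch below shows one way to check both before starting; CanRecognizeLocale is a hypothetical helper that is not part of the demo.

```objectivec
#import <Speech/Speech.h>

// Hypothetical helper: returns YES when a recognizer exists for the locale
// identifier and is currently available.
static BOOL CanRecognizeLocale(NSString *identifier)
{
    NSLocale *locale = [NSLocale localeWithLocaleIdentifier:identifier];
    // initWithLocale: returns nil when the locale is not supported.
    SFSpeechRecognizer *recognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
    if (!recognizer) {
        NSLog(@"Locale %@ is not supported for speech recognition", identifier);
        return NO;
    }
    return [recognizer isAvailable];
}
```

For example, CanRecognizeLocale(@"zh-CN") could gate the button that starts live transcription.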

Create the speech recognition request, then create and start the recognition task

//  Create the speech recognition request
- (void)createSpeechRequest
{
    if (_speechRequest) {
        [_speechRequest endAudio];
        _speechRequest = nil;
    }
    
    _speechRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    _speechRequest.shouldReportPartialResults = YES; // report partial results while the user is speaking
    
    __weak typeof(self) weakSelf = self;
    
    // Create and start the recognition task. The block is the result callback
    [_speechRecognizer recognitionTaskWithRequest:_speechRequest resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
        
        // Recognition result callback
        __strong typeof(weakSelf) strongSelf = weakSelf;
        
        if (error) {
            NSLog(@"Speech recognition failed, %@",error);
        }
        else {
            // Recognized text
            NSString *text = result.bestTranscription.formattedString;
            
            // Log what is being said in real time
            NSLog(@"is final: %d  result: %@", result.isFinal, text);
            
            if (result.isFinal) { // when recognition finishes, show the text
                
                // Show what was said
                strongSelf.content.text = text;
                
                // Or append it to the text from earlier utterances
//                strongSelf.content.text = [NSString stringWithFormat:@"%@%@", strongSelf.content.text, text];
            }
        }
    }];
}

A recognition task can deliver its results in two ways, through a delegate or through a block. The block version is used here.

// Recognize speech utterance with a request
// If request.shouldReportPartialResults is true, result handler will be called
// repeatedly with partial results, then finally with a final result or an error.
- (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                          resultHandler:(void (^)(SFSpeechRecognitionResult * __nullable result, NSError * __nullable error))resultHandler;

// Advanced API: Recognize a custom request with with a delegate
// The delegate will be weakly referenced by the returned task
- (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                               delegate:(id <SFSpeechRecognitionTaskDelegate>)delegate;
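
For comparison, the delegate variant could look roughly like the sketch below. TranscriptLogger is a hypothetical class used only for illustration; the three methods are optional callbacks of SFSpeechRecognitionTaskDelegate.

```objectivec
#import <Speech/Speech.h>

// Hypothetical delegate object that just logs what the task reports.
@interface TranscriptLogger : NSObject <SFSpeechRecognitionTaskDelegate>
@end

@implementation TranscriptLogger

// Called repeatedly with partial (hypothesized) transcriptions.
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task
  didHypothesizeTranscription:(SFTranscription *)transcription
{
    NSLog(@"partial: %@", transcription.formattedString);
}

// Called once with the final result.
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task
         didFinishRecognition:(SFSpeechRecognitionResult *)recognitionResult
{
    NSLog(@"final: %@", recognitionResult.bestTranscription.formattedString);
}

// Called when the task ends; on failure the task carries the error.
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task
        didFinishSuccessfully:(BOOL)successfully
{
    if (!successfully) {
        NSLog(@"recognition failed: %@", task.error);
    }
}

@end
```

Because the task references its delegate weakly, as the header comment notes, the logger has to be kept alive elsewhere (for example in an instance variable) before it is passed to recognitionTaskWithRequest:delegate:.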

Start speech recognition

- (IBAction)stardRecorder:(UIButton *)sender
{
    // Clear the displayed text before recording. To concatenate the text of several recordings, do not clear it
    _content.text = @"";
    
    // Create a new speech recognition request
    [self createSpeechRequest];
    
    __weak typeof(self) weakSelf = self;
    
    // Recording format: listen to the input node's output and append its buffers to the request
    AVAudioFormat *recordingFormat = [[_audioEngine inputNode] outputFormatForBus:0];
    // Install a tap. Any old tap on the bus must be removed first
    // From the header docs: Create a "tap" to record/monitor/observe the output of the node.
    [[_audioEngine inputNode] removeTapOnBus:0];
    [[_audioEngine inputNode] installTapOnBus:0 bufferSize:1024 format:recordingFormat block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
     
        __strong typeof(weakSelf) strongSelf = weakSelf;
        // Append the audio buffer to the request
        [strongSelf.speechRequest appendAudioPCMBuffer:buffer];
    }];
    
    // Prepare and start the engine
    [_audioEngine prepare];
    
    NSError *error = nil;
    if (![_audioEngine startAndReturnError:&error]) {
        NSLog(@"%@",error.userInfo);
    }
    
    [sender setTitle:@"Recognizing speech..." forState:UIControlStateNormal];
}

Resetting the audio engine

- (void)releaseEngine
{
    // Remove the tap
    [[_audioEngine inputNode] removeTapOnBus:0];
    
    [_audioEngine stop];
    
    [_speechRequest endAudio];
    _speechRequest = nil;
}

That completes live speech recognition.

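6. Speech recognition -- local audio file

This needs an audio file. The recording feature above already produces a caf file, so let's add speech recognition on top of it. That gives us recording, playback, and speech recognition in one place.
Here is the code: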

- (IBAction)speechSound:(UIButton *)sender
{
    // Check that the recording to recognize exists
    NSFileManager* manager = [NSFileManager defaultManager];
    if (![manager fileExistsAtPath:_filePath]){
        
        NSLog(@"The audio file does not exist");
        return ;
    }
    
    _speechContent.text = @"";
   
    // Location of the converted MP3 file
//    NSString *mp3Path = [NSString stringWithFormat:@"%@/%@", [APPUtil speechPath], @"lame.mp3"];
//    [APPUtil lameCafToMp3:_filePath mp3:mp3Path];
//    [self speechSoundRecord:mp3Path]; // speech recognition failed
    
    // No MP3 conversion is needed; the caf file is recognized directly
    [self speechSoundRecord:_filePath]; // recognized successfully
}

- (void)speechSoundRecord:(NSString *)path
{
    // Set the recognition language to Chinese
    NSLocale *local = [[NSLocale alloc] initWithLocaleIdentifier:@"zh_CN"];
    SFSpeechRecognizer *localRecognizer =[[SFSpeechRecognizer alloc] initWithLocale:local];
    
    NSURL *url = [NSURL fileURLWithPath:path];
    if (!url) return;

    SFSpeechURLRecognitionRequest *res =[[SFSpeechURLRecognitionRequest alloc] initWithURL:url];

    __weak typeof(self) weakSelf = self;
    
    [localRecognizer recognitionTaskWithRequest:res resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {

         __strong typeof(weakSelf) strongSelf = weakSelf;

        if (error) {
            NSLog(@"Speech recognition failed, %@",error);
        }
        else {
            // Show the recognized text
            NSString *text = result.bestTranscription.formattedString;

            strongSelf.speechContent.text = text;
        }
    }];
}
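
Recognized text can contain wrong characters. One way to see where the recognizer was unsure is to inspect the per-segment confidence of the final result. The following is only a sketch: LogLowConfidenceSegments is a hypothetical helper, and the threshold is an arbitrary value chosen by the caller.

```objectivec
#import <Speech/Speech.h>

// Hypothetical helper: logs every segment of the best transcription whose
// confidence is below the given threshold (confidence ranges from 0.0 to 1.0).
static void LogLowConfidenceSegments(SFSpeechRecognitionResult *result, float threshold)
{
    for (SFTranscriptionSegment *segment in result.bestTranscription.segments) {
        if (segment.confidence < threshold) {
            NSLog(@"low confidence %.2f for \"%@\" at %.2f s",
                  segment.confidence, segment.substring, segment.timestamp);
        }
    }
}
```

It is meant to be called from the result handler once result.isFinal is YES, for example LogLowConfidenceSegments(result, 0.5f), since confidence values of partial results are not a reliable signal.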

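After a day and a half, the recording and speech recognition code is finally done. Let's run it and see the result.

Screenshots:

[Screenshot: recording]

[Screenshot: speech recognition]

As the screenshots show, recording and speech recognition both work well, apart from occasional wrong characters in the recognized text.

Demo for this article: 音频录制和Speech语音识别 (Audio Recording and Speech Recognition)

Last edited: 2018.09.05 15:07:33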
