Here are some notes on the live-streaming (push) technology I have been studying recently.
The main iOS push-streaming pipeline is as follows:
Capture produces a CMSampleBufferRef for every frame; the next step is to encode that CMSampleBufferRef. Encoding comes in two flavors: software encoding and hardware encoding. Software encoding (for example with ffmpeg) runs on the CPU and is less efficient than hardware encoding, which offloads the work to dedicated hardware; on Apple platforms hardware encoding is done through AudioToolbox and VideoToolbox. Only hardware encoding is covered below.
Video Encoding
In short, VTCompressionSessionEncodeFrame does the encoding: you feed it raw frame data and it produces encoded data. Before calling VTCompressionSessionEncodeFrame, however, you have to initialize the relevant objects, prepare the data to encode, and set the encoding parameters. Concretely, first initialize a VTCompressionSessionRef (an opaque pointer type). The initialization call looks like this:
VTCompressionSessionCreate(NULL, width, height, kCMVideoCodecType_H264, NULL, NULL, NULL, didCompressH2641, (__bridge void*)self, &compressSession);
didCompressH2641 is the encoding callback; it is invoked every time a frame finishes encoding.
The VTCompressionSessionRef compressSession here is the object being initialized. Next, set the relevant properties on it, such as the frame rate of the input, the bit rate, the GOP size, and the compression codec (H.264), so VideoToolbox knows how to encode. Finally call
VTCompressionSessionPrepareToEncodeFrames(compressSession); to mark the end of configuration. Here is an example:
/// Called when this Objective-C object (the encoder) is initialized.
- (void)setupCompressionSession {
    aQuene = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    // 1. Frame counter
    _frameID = 0;
    // 2. Video width and height
    int width = _configuration.width, height = _configuration.height;
    // 3. Create the compression session used to encode the frames
    //    kCMVideoCodecType_H264 : encode with H.264
    //    didCompressH2641       : called back every time an encode finishes; the data can be written to a file there
    VTCompressionSessionCreate(NULL, width, height, kCMVideoCodecType_H264, NULL, NULL, NULL, didCompressH2641, (__bridge void *)self, &compressSession);
    // 4. Real-time encoding output (live streaming must be real time, otherwise latency builds up)
    VTSessionSetProperty(compressSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
    // 5. Expected frame rate (frames per second; too low a frame rate makes the picture stutter)
    int fps = _configuration.fps;
    CFNumberRef fpsRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &fps);
    VTSessionSetProperty(compressSession, kVTCompressionPropertyKey_ExpectedFrameRate, fpsRef);
    // 6. Bit rate (a higher bit rate gives a clearer picture, a low one causes blocking artifacts;
    //    high bit rates preserve the original picture better but are harder to transmit)
    int bitRate = _configuration.bitRate;
    CFNumberRef bitRateRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &bitRate);
    VTSessionSetProperty(compressSession, kVTCompressionPropertyKey_AverageBitRate, bitRateRef);
    // Hard limit: at most bitRate * 1.5 / 8 bytes per 1-second window
    NSArray *limit = @[@(bitRate * 1.5 / 8), @(1)];
    VTSessionSetProperty(compressSession, kVTCompressionPropertyKey_DataRateLimits, (__bridge CFArrayRef)limit);
    // 7. Keyframe (GOP size) interval
    int frameInterval = _configuration.keyframeInterval;
    CFNumberRef frameIntervalRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &frameInterval);
    VTSessionSetProperty(compressSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, frameIntervalRef);
    // 8. Configuration done, get ready to encode
    VTCompressionSessionPrepareToEncodeFrames(compressSession);
}
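As a quick sanity check on the DataRateLimits pair above: it is @[bytes, seconds], a hard cap on throughput over a sliding window, while AverageBitRate is the long-term target. A back-of-the-envelope illustration with an assumed bitRate of 800000 (800 kbps):
// DataRateLimits = @[bytes, seconds] (illustration only, bitRate value is assumed)
int bitRate = 800000;                      // 800 kbps average bit rate
double hardCapBytes = bitRate * 1.5 / 8.0; // 150000 bytes allowed ...
NSArray *limit = @[@(hardCapBytes), @(1)]; // ... per 1-second window, i.e. short peaks up to ~1.2 Mbps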
Once this preparation is done, call the encoding function VTCompressionSessionEncodeFrame. First extract the CVImageBufferRef, which wraps the pixel data of each frame, from the raw CMSampleBufferRef.
A timestamp is also set, mainly to keep frames in the correct order when pushing the stream and when the player pulls it.
The code is as follows:
// The caller passes in the current time (in milliseconds) on every call
- (void)encode:(CMSampleBufferRef)sampleBuffer timeStamp:(uint64_t)timestamp
{
    // Dispatch synchronously so frames keep their encoding order
    dispatch_sync(aQuene, ^{
        _frameID++;
        // Get the CVImageBuffer out of the sample buffer
        CVImageBufferRef imageBuffer = (CVImageBufferRef)CMSampleBufferGetImageBuffer(sampleBuffer);
        // Create properties
        CMTime presentationTimeStamp = CMTimeMake(_frameID, 1000);
        //CMTime duration = CMTimeMake(1, DURATION);
        VTEncodeInfoFlags flags;
        NSDictionary *properties = nil;
        // Force a keyframe once per GOP
        if (_frameID % (int32_t)_configuration.keyframeInterval == 0) {
            properties = @{(__bridge NSString *)kVTEncodeFrameOptionKey_ForceKeyFrame: @YES};
        }
        NSNumber *timeNumber = @(timestamp);
        // Pass it to the encoder. The timestamp is handed over with __bridge_retained so it
        // stays alive until the callback takes ownership of it with __bridge_transfer.
        OSStatus statusCode = VTCompressionSessionEncodeFrame(compressSession,
                                                              imageBuffer,
                                                              presentationTimeStamp,
                                                              kCMTimeInvalid,
                                                              (__bridge CFDictionaryRef)properties,
                                                              (__bridge_retained void *)timeNumber,
                                                              &flags);
        // Check for error
        if (statusCode != noErr) {
            NSLog(@"H264: VTCompressionSessionEncodeFrame failed with %d", (int)statusCode);
            return;
        }
        NSLog(@"H264: VTCompressionSessionEncodeFrame Success");
    });
}
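For context, encode:timeStamp: is normally driven from the capture callback. A minimal sketch of what that might look like, assuming an AVCaptureVideoDataOutput delegate and a videoEncoder property holding the VideoHWEncoder (those names are my assumptions, not part of the code above):
// AVCaptureVideoDataOutputSampleBufferDelegate (sketch; property name is an assumption)
// CACurrentMediaTime() comes from QuartzCore.
- (void)captureOutput:(AVCaptureOutput *)output
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection {
    // Millisecond timestamp used to keep push/pull playback in order
    uint64_t nowMs = (uint64_t)(CACurrentMediaTime() * 1000);
    [self.videoEncoder encode:sampleBuffer timeStamp:nowMs];
}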
When encoding finishes, the callback registered earlier, didCompressH2641, is invoked. Note that Apple's encoder does not necessarily encode each frame the moment it arrives: to keep encoding efficient, and because I/P/B frames depend on one another, it may accumulate a number of frames before encoding them.
Once you have the encoded data, first check whether it is a keyframe. If it is, the SPS & PPS data must be inserted in front of it; the SPS and PPS are pulled out of the encoded data. If you want to write the stream to a file, you also have to write the NALU header (start code) first.
For example, when writing the SPS and PPS data:
- (void)gotSpsPps:(NSData *)sps pps:(NSData *)pps
{
    // 1. The NALU header (Annex-B start code)
    const char bytes[] = "\x00\x00\x00\x01";
    size_t length = (sizeof bytes) - 1;
    NSData *ByteHeader = [NSData dataWithBytes:bytes length:length];
    // 2. Write the NALU header and NALU payload to the file
    //    (self.fileHandle is an already-open NSFileHandle)
    [self.fileHandle writeData:ByteHeader];
    [self.fileHandle writeData:sps];
    [self.fileHandle writeData:ByteHeader];
    [self.fileHandle writeData:pps];
}
Writing the I/P/B frames:
- (void)gotEncodedData:(NSData *)data isKeyFrame:(BOOL)isKeyFrame
{
    if (self.fileHandle != NULL)
    {
        const char bytes[] = "\x00\x00\x00\x01";
        size_t length = (sizeof bytes) - 1; // string literals have an implicit trailing '\0'
        NSData *ByteHeader = [NSData dataWithBytes:bytes length:length];
        [self.fileHandle writeData:ByteHeader];
        [self.fileHandle writeData:data];
    }
}
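The self.fileHandle used in the two methods above is just an NSFileHandle opened for writing. A sketch of how it might be created (the file path is an arbitrary example):
// Create/truncate the output file and open a handle for writing (path is an example)
NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:@"test.h264"];
[[NSFileManager defaultManager] removeItemAtPath:path error:nil];
[[NSFileManager defaultManager] createFileAtPath:path contents:nil attributes:nil];
self.fileHandle = [NSFileHandle fileHandleForWritingAtPath:path];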
If you are only pushing the stream, the two file-writing steps above can be skipped. Below is the encoding-completion callback:
// Encoding-completion callback; sampleBuffer holds the encoded data
void didCompressH2641(void *outputCallbackRefCon, void *sourceFrameRefCon, OSStatus status, VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer) {
    // 1. Bail out if encoding reported an error
    if (status != noErr) {
        return;
    }
    // 2. Recover the object passed in as the callback refCon (the encoder itself)
    VideoHWEncoder *encoder = (__bridge VideoHWEncoder *)outputCallbackRefCon;
    // Take ownership of the timestamp NSNumber passed with __bridge_retained in encode:timeStamp:
    uint64_t timeStamp = [((__bridge_transfer NSNumber *)sourceFrameRefCon) longLongValue];
    // 3. Check whether this sample is a keyframe
    bool isKeyframe = !CFDictionaryContainsKey((CFDictionaryRef)CFArrayGetValueAtIndex(CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, true), 0), kCMSampleAttachmentKey_NotSync);
    // For a keyframe, extract the SPS & PPS (only needs to happen once)
    if (isKeyframe && !encoder->sps)
    {
        // The encoded stream information lives in the CMFormatDescriptionRef
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
        // SPS
        size_t sparameterSetSize, sparameterSetCount;
        const uint8_t *sparameterSet;
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, &sparameterSet, &sparameterSetSize, &sparameterSetCount, 0);
        // PPS
        size_t pparameterSetSize, pparameterSetCount;
        const uint8_t *pparameterSet;
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 1, &pparameterSet, &pparameterSetSize, &pparameterSetCount, 0);
        // Wrap the SPS/PPS in NSData so they are easy to write out
        NSData *sps = [NSData dataWithBytes:sparameterSet length:sparameterSetSize];
        NSData *pps = [NSData dataWithBytes:pparameterSet length:pparameterSetSize];
        encoder->sps = sps;
        encoder->pps = pps;
        // Write them to the file
        [encoder gotSpsPps:sps pps:pps];
    }
    // Get the data block
    CMBlockBufferRef dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t length, totalLength;
    char *dataPointer;
    OSStatus statusCodeRet = CMBlockBufferGetDataPointer(dataBuffer, 0, &length, &totalLength, &dataPointer);
    if (statusCodeRet == noErr) {
        size_t bufferOffset = 0;
        static const int AVCCHeaderLength = 4; // the first four bytes of each NALU are not a 0001 start code but a big-endian length
        // Loop over the NALUs, because one sample buffer may contain several
        while (bufferOffset < totalLength - AVCCHeaderLength) {
            uint32_t NALUnitLength = 0;
            // Read the NAL unit length
            memcpy(&NALUnitLength, dataPointer + bufferOffset, AVCCHeaderLength);
            // Convert from big-endian to host byte order
            NALUnitLength = CFSwapInt32BigToHost(NALUnitLength);
            NSData *data = [[NSData alloc] initWithBytes:(dataPointer + bufferOffset + AVCCHeaderLength) length:NALUnitLength];
            [encoder gotEncodedData:data isKeyFrame:isKeyframe];
            // Pack the data into an LFVideoFrame for the push-streaming stage
            LFVideoFrame *frame = [LFVideoFrame new];
            // Keyframe flag
            frame.isKeyFrame = isKeyframe;
            // PPS data
            frame.pps = encoder->pps;
            // SPS data
            frame.sps = encoder->sps;
            // I/P/B NAL data
            frame.data = data;
            // Current timestamp
            frame.timestamp = timeStamp;
            // Hand the frame to the delegate (usually a view controller), which then pushes it to the server
            if (encoder.delegate) {
                [encoder.delegate encodedVideo:encoder videoFrame:frame];
            }
            // Move to the next NAL unit in the block buffer
            bufferOffset += AVCCHeaderLength + NALUnitLength;
        }
    }
}
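The delegate call above, together with encodedAudio:audioFrame: used in the audio section below, implies a protocol roughly like the following. This is my own sketch; only the selector names and the frame/encoder class names come from the code in these notes:
// Sketch of the encoder delegate protocol (the protocol name is an assumption)
@class VideoHWEncoder, AudioHWEncoder, LFVideoFrame, LFAudioFrame;

@protocol HWEncoderDelegate <NSObject>
// Called once per encoded video NALU; the receiver (usually a view controller) pushes it to the server
- (void)encodedVideo:(VideoHWEncoder *)encoder videoFrame:(LFVideoFrame *)frame;
// Called once per encoded AAC packet
- (void)encodedAudio:(AudioHWEncoder *)encoder audioFrame:(LFAudioFrame *)frame;
@end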
That's it for video encoding; next up is audio encoding.
Audio Encoding
The video-encoding code above is wrapped in a VideoHWEncoder class; the audio encoding is likewise wrapped in its own class, here named AudioHWEncoder. The flow is very similar to video: first set the encoding parameters and initialize the relevant objects, then call the encoding function, and finally handle the encoded data in the callback.
The encoding parameters to set are the channel count, the sample rate, the output format (AAC), and the bits per sample. The code is as follows:
// Configure the encoding parameters
- (void)setupEncoderFromSampleBuffer:(CMSampleBufferRef)sampleBuffer
{
    NSLog(@"Configuring the audio encoder...");
    // Format of the incoming (source) audio
    AudioStreamBasicDescription inAudioStreamBasicDescription = *CMAudioFormatDescriptionGetStreamBasicDescription((CMAudioFormatDescriptionRef)CMSampleBufferGetFormatDescription(sampleBuffer));
    AudioStreamBasicDescription outAudioStreamBasicDescription = {0};
    // Parameters of the output, i.e. encoded, audio
    // Sample rate
    outAudioStreamBasicDescription.mSampleRate = inAudioStreamBasicDescription.mSampleRate;
    sampleRate = (NSInteger)inAudioStreamBasicDescription.mSampleRate;
    channelsCount = (NSInteger)inAudioStreamBasicDescription.mChannelsPerFrame;
    // Format: kAudioFormatMPEG4AAC = 'aac '
    outAudioStreamBasicDescription.mFormatID = kAudioFormatMPEG4AAC;
    // Format flags: the AAC-LC (Low Complexity) profile
    outAudioStreamBasicDescription.mFormatFlags = kMPEG4Object_AAC_LC;
    // Bytes per packet; 0 means variable packet size
    outAudioStreamBasicDescription.mBytesPerPacket = 0;
    // Frames per packet; AAC uses a fixed 1024 frames per packet
    outAudioStreamBasicDescription.mFramesPerPacket = 1024;
    // Bytes per frame; 0 for a compressed format
    outAudioStreamBasicDescription.mBytesPerFrame = 0;
    // 1: mono, 2: stereo
    outAudioStreamBasicDescription.mChannelsPerFrame = 1;
    // Bits per sample; 0 for a compressed format
    outAudioStreamBasicDescription.mBitsPerChannel = 0;
    // Reserved (alignment/padding)
    outAudioStreamBasicDescription.mReserved = 0;
    // Pick a specific encoder (optional)
    //AudioClassDescription * description = [self getAudioClassDescriptionWithType:kAudioFormatMPEG4AAC fromManufacturer:kAppleSoftwareAudioCodecManufacturer];
    // Create the converter (encoder)
    /*
     inAudioStreamBasicDescription  : source audio format
     outAudioStreamBasicDescription : target audio format
     third parameter                : number of encoder class descriptions passed in
     description                    : encoder description; if none is given, the system picks a default encoder
     */
    //OSStatus status = AudioConverterNewSpecific(&inAudioStreamBasicDescription, &outAudioStreamBasicDescription, 1, description, &_audioConverter);
    OSStatus status = AudioConverterNew(&inAudioStreamBasicDescription, &outAudioStreamBasicDescription, &_audioConverter);
    if (status != noErr) {
        NSLog(@"Failed to create the audio converter");
        return;
    }
    // Allocate the output (AAC) buffer using the converter's maximum output packet size
    UInt32 value = 0;
    UInt32 size = sizeof(value);
    AudioConverterGetProperty(_audioConverter, kAudioConverterPropertyMaximumOutputPacketSize, &size, &value);
    _aacBufferSize = value;
    _aacBuffer = malloc(value);
}
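The commented-out AudioConverterNewSpecific path needs an AudioClassDescription picked from the encoders installed on the system. A sketch of the getAudioClassDescriptionWithType:fromManufacturer: helper referenced above, assuming the usual kAudioFormatProperty_Encoders lookup (this exact implementation is my assumption, not the original):
// Find an encoder of the given format from the given manufacturer
// (e.g. kAudioFormatMPEG4AAC / kAppleSoftwareAudioCodecManufacturer). Sketch only.
- (AudioClassDescription *)getAudioClassDescriptionWithType:(UInt32)type fromManufacturer:(UInt32)manufacturer {
    static AudioClassDescription desc;
    UInt32 encoderSpecifier = type;
    UInt32 size = 0;
    // Ask how much encoder-description data is available for this format
    if (AudioFormatGetPropertyInfo(kAudioFormatProperty_Encoders, sizeof(encoderSpecifier), &encoderSpecifier, &size) != noErr) {
        return nil;
    }
    unsigned int count = size / sizeof(AudioClassDescription);
    AudioClassDescription descriptions[count];
    if (AudioFormatGetProperty(kAudioFormatProperty_Encoders, sizeof(encoderSpecifier), &encoderSpecifier, &size, descriptions) != noErr) {
        return nil;
    }
    // Pick the entry whose subtype and manufacturer match
    for (unsigned int i = 0; i < count; i++) {
        if (descriptions[i].mSubType == type && descriptions[i].mManufacturer == manufacturer) {
            memcpy(&desc, &descriptions[i], sizeof(desc));
            return &desc;
        }
    }
    return nil;
}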
After that, encoding can begin:
// Encode one sample buffer
- (void)encode:(CMSampleBufferRef)sampleBuffer timeStamp:(uint64_t)timeStamp {
    CFRetain(sampleBuffer);
    dispatch_sync(_encoderQueue, ^{
        if (!self.audioConverter) {
            // Configure the encoder on first use
            [self setupEncoderFromSampleBuffer:sampleBuffer];
        }
        // Get the CMBlockBufferRef that holds the PCM data
        CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
        CFRetain(blockBuffer);
        // Fill _pcmBufferSize and _pcmBuffer
        OSStatus status = CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &self->_pcmBufferSize, &self->_pcmBuffer);
        if (status != kCMBlockBufferNoErr) {
            NSLog(@"Failed to get the PCM buffer");
            CFRelease(blockBuffer);
            CFRelease(sampleBuffer);
            return;
        }
        // Clear the output buffer
        memset(self->_aacBuffer, 0, self->_aacBufferSize);
        // Set up the output buffer list
        AudioBufferList outAudioBufferList = {0};
        // Number of buffers
        outAudioBufferList.mNumberBuffers = 1;
        // Number of channels
        outAudioBufferList.mBuffers[0].mNumberChannels = 1;
        // Buffer size
        outAudioBufferList.mBuffers[0].mDataByteSize = (int)self->_aacBufferSize;
        // Buffer contents
        outAudioBufferList.mBuffers[0].mData = self->_aacBuffer;
        // Encode
        AudioStreamPacketDescription *outPD = NULL;
        UInt32 inPutSize = 1;
        /*
         inInputDataProc    : our own callback that feeds PCM data to the converter
         self               : user data handed to that callback
         inPutSize          : in/out number of output packets (one AAC packet here)
         outAudioBufferList : the output buffers
         outPD              : output packet descriptions (not needed here)
         */
        status = AudioConverterFillComplexBuffer(self->_audioConverter,
                                                 inInputDataProc,
                                                 (__bridge void *)self,
                                                 &inPutSize,
                                                 &outAudioBufferList,
                                                 outPD
                                                 );
        // Encoding finished
        NSData *data = nil;
        if (status == noErr) {
            // Raw AAC data straight from the output buffer
            NSData *rawAAC = [NSData dataWithBytes:outAudioBufferList.mBuffers[0].mData length:outAudioBufferList.mBuffers[0].mDataByteSize];
            // Prepend the ADTS header. This is only needed for writing to a file;
            // for push streaming the raw AAC without ADTS is what gets sent.
            NSData *adtsHeader = [self adtsDataForPacketLength:rawAAC.length];
            NSMutableData *fullData = [NSMutableData dataWithData:adtsHeader];
            [fullData appendData:rawAAC];
            data = fullData;
            // Put rawAAC (not fullData) into LFAudioFrame.data for the push-streaming stage
            LFAudioFrame *frame = [LFAudioFrame new];
            frame.data = rawAAC;
            frame.timestamp = timeStamp;
            // exeData is the AudioSpecificConfig used later when pushing the stream
            char exeData[2];
            NSInteger sampleRateIndex = [self sampleRateIndex:sampleRate];
            exeData[0] = 0x10 | ((sampleRateIndex >> 1) & 0x7);
            exeData[1] = ((sampleRateIndex & 0x1) << 7) | ((channelsCount & 0xF) << 3);
            frame.audioInfo = [NSData dataWithBytes:exeData length:2];
            // Hand the frame to the delegate (usually the controller)
            if (self.delegate) {
                [self.delegate encodedAudio:self audioFrame:frame];
            }
            NSLog(@"Output AAC data length: %lu", (unsigned long)rawAAC.length);
        } else {
            NSLog(@"Audio encoding failed");
        }
        // Optional completion callback
        // if (completionBlock) {
        //     dispatch_async(_callBackQueue, ^{
        //         completionBlock(data, nil);
        //     });
        // }
        // Optionally write the ADTS-framed data to a file //TODO:
        //[self.audioFileHandle writeData:data];
        CFRelease(sampleBuffer);
        CFRelease(blockBuffer);
    });
}
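AudioConverterFillComplexBuffer pulls its PCM input through inInputDataProc, which the snippet above references but does not show. A minimal sketch of what it might look like, assuming _pcmBuffer and _pcmBufferSize are ivars of AudioHWEncoder that are visible to the callback (the exact implementation is my assumption):
// Input callback for AudioConverterFillComplexBuffer: hands the converter the PCM data
// captured in encode:timeStamp: (sketch; assumes the ivars above are accessible here)
static OSStatus inInputDataProc(AudioConverterRef inAudioConverter,
                                UInt32 *ioNumberDataPackets,
                                AudioBufferList *ioData,
                                AudioStreamPacketDescription **outDataPacketDescription,
                                void *inUserData) {
    AudioHWEncoder *encoder = (__bridge AudioHWEncoder *)inUserData;
    if (encoder->_pcmBufferSize == 0) {
        // Nothing left to feed: report zero packets so the converter stops asking
        *ioNumberDataPackets = 0;
        return -1;
    }
    ioData->mBuffers[0].mData = encoder->_pcmBuffer;
    ioData->mBuffers[0].mDataByteSize = (UInt32)encoder->_pcmBufferSize;
    ioData->mBuffers[0].mNumberChannels = 1;
    // The whole buffer is consumed in one call
    encoder->_pcmBufferSize = 0;
    return noErr;
}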
// Map the sample rate (in Hz) to the AAC sampling-frequency index
- (NSInteger)sampleRateIndex:(NSInteger)frequencyInHz {
    NSInteger sampleRateIndex = 0;
    switch (frequencyInHz) {
        case 96000:
            sampleRateIndex = 0;
            break;
        case 88200:
            sampleRateIndex = 1;
            break;
        case 64000:
            sampleRateIndex = 2;
            break;
        case 48000:
            sampleRateIndex = 3;
            break;
        case 44100:
            sampleRateIndex = 4;
            break;
        case 32000:
            sampleRateIndex = 5;
            break;
        case 24000:
            sampleRateIndex = 6;
            break;
        case 22050:
            sampleRateIndex = 7;
            break;
        case 16000:
            sampleRateIndex = 8;
            break;
        case 12000:
            sampleRateIndex = 9;
            break;
        case 11025:
            sampleRateIndex = 10;
            break;
        case 8000:
            sampleRateIndex = 11;
            break;
        case 7350:
            sampleRateIndex = 12;
            break;
        default:
            sampleRateIndex = 15;
    }
    return sampleRateIndex;
}
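As a concrete check of the exeData bytes built in encode:timeStamp:, take 44.1 kHz mono AAC-LC: the mapping above gives sampleRateIndex = 4, so the two AudioSpecificConfig bytes work out as follows (just the arithmetic from the code above, spelled out):
// AudioSpecificConfig layout: 5 bits object type | 4 bits frequency index | 4 bits channel config
// AAC-LC (object type 2), 44.1 kHz (index 4), mono (channel config 1):
// exeData[0] = 0x10 | (4 >> 1)           = 0x12   // bits: 00010 010
// exeData[1] = ((4 & 1) << 7) | (1 << 3) = 0x08   // bits: 0 0001 000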
// Build the 7-byte ADTS header required by the ADTS format
- (NSData *)adtsDataForPacketLength:(NSUInteger)packetLength {
    int adtsLength = 7;
    char *packet = malloc(sizeof(char) * adtsLength);
    int profile = 2;  // AAC-LC
    int freqIdx = 4;  // 44.1 kHz (hardcoded here; should match the actual sample-rate index)
    int chanCfg = 1;  // mono
    NSUInteger fullLength = adtsLength + packetLength; // header + AAC payload
    packet[0] = (char)0xFF; // syncword (high bits)
    packet[1] = (char)0xF9; // syncword (low bits), MPEG-2 ID, no CRC
    packet[2] = (char)(((profile-1)<<6) + (freqIdx<<2) +(chanCfg>>2));
    packet[3] = (char)(((chanCfg&3)<<6) + (fullLength>>11));
    packet[4] = (char)((fullLength&0x7FF) >> 3);
    packet[5] = (char)(((fullLength&7)<<5) + 0x1F);
    packet[6] = (char)0xFC;
    NSData *data = [NSData dataWithBytesNoCopy:packet length:adtsLength freeWhenDone:YES];
    return data;
}
And with that, audio encoding is complete as well.