Here are some notes on the live-streaming (push) technology I have been studying recently.
The main iOS push-streaming pipeline is as follows:
Capture produces a CMSampleBufferRef for every frame; the next step is to encode that CMSampleBufferRef. Encoding comes in two flavors: software encoding and hardware encoding. Software encoding (for example with ffmpeg) runs on the CPU and is less efficient than hardware encoding, which offloads the work to dedicated hardware; on Apple platforms hardware encoding is done through AudioToolbox and VideoToolbox. Only hardware encoding is covered below.
Video Encoding
In short, VTCompressionSessionEncodeFrame does the encoding: you feed it raw frame data and it produces encoded data. Before calling VTCompressionSessionEncodeFrame, however, you have to initialize the relevant objects, prepare the data to encode, and set the encoding parameters. Concretely, first initialize a VTCompressionSessionRef (an opaque pointer type). The initialization call looks like this:
VTCompressionSessionCreate(NULL, width, height, kCMVideoCodecType_H264, NULL, NULL, NULL, didCompressH2641, (__bridge void*)self, &compressSession);
didCompressH2641 is the encoding callback; it is invoked every time a frame finishes encoding.
The VTCompressionSessionRef compressSession here is the object being initialized. Next, set the relevant properties on it, such as the frame rate of the input, the bit rate, the GOP size, and the compression codec (H.264), so VideoToolbox knows how to encode. Finally call
VTCompressionSessionPrepareToEncodeFrames(compressSession); to mark the end of configuration. Here is an example:
/// Called when this Objective-C object (the encoder) is initialized.
- (void)setupCompressionSession {
    aQuene = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    // 1. Frame counter
    _frameID = 0;
    // 2. Video width and height
    int width = _configuration.width, height = _configuration.height;
    // 3. Create the compression session used to encode the frames
    //    kCMVideoCodecType_H264 : encode with H.264
    //    didCompressH2641       : called back every time an encode finishes; the data can be written to a file there
    VTCompressionSessionCreate(NULL, width, height, kCMVideoCodecType_H264, NULL, NULL, NULL, didCompressH2641, (__bridge void *)self, &compressSession);
    // 4. Real-time encoding output (live streaming must be real time, otherwise latency builds up)
    VTSessionSetProperty(compressSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
    // 5. Expected frame rate (frames per second; too low a frame rate makes the picture stutter)
    int fps = _configuration.fps;
    CFNumberRef fpsRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &fps);
    VTSessionSetProperty(compressSession, kVTCompressionPropertyKey_ExpectedFrameRate, fpsRef);
    // 6. Bit rate (a higher bit rate gives a clearer picture, a low one causes blocking artifacts;
    //    high bit rates preserve the original picture better but are harder to transmit)
    int bitRate = _configuration.bitRate;
    CFNumberRef bitRateRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &bitRate);
    VTSessionSetProperty(compressSession, kVTCompressionPropertyKey_AverageBitRate, bitRateRef);
    // Hard limit: at most bitRate * 1.5 / 8 bytes per 1-second window
    NSArray *limit = @[@(bitRate * 1.5 / 8), @(1)];
    VTSessionSetProperty(compressSession, kVTCompressionPropertyKey_DataRateLimits, (__bridge CFArrayRef)limit);
    // 7. Keyframe (GOP size) interval
    int frameInterval = _configuration.keyframeInterval;
    CFNumberRef frameIntervalRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &frameInterval);
    VTSessionSetProperty(compressSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, frameIntervalRef);
    // 8. Configuration done, get ready to encode
    VTCompressionSessionPrepareToEncodeFrames(compressSession);
}
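As a quick sanity check on the DataRateLimits pair above: it is @[bytes, seconds], a hard cap on throughput over a sliding window, while AverageBitRate is the long-term target. A back-of-the-envelope illustration with an assumed bitRate of 800000 (800 kbps):
// DataRateLimits = @[bytes, seconds] (illustration only, bitRate value is assumed)
int bitRate = 800000;                      // 800 kbps average bit rate
double hardCapBytes = bitRate * 1.5 / 8.0; // 150000 bytes allowed ...
NSArray *limit = @[@(hardCapBytes), @(1)]; // ... per 1-second window, i.e. short peaks up to ~1.2 Mbps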
Once this preparation is done, call the encoding function VTCompressionSessionEncodeFrame. First extract the CVImageBufferRef, which wraps the pixel data of each frame, from the raw CMSampleBufferRef.
A timestamp is also set, mainly to keep frames in the correct order when pushing the stream and when the player pulls it.
The code is as follows:
// The caller passes in the current time (in milliseconds) on every call
- (void)encode:(CMSampleBufferRef)sampleBuffer timeStamp:(uint64_t)timestamp
{
    // Dispatch synchronously so frames keep their encoding order
    dispatch_sync(aQuene, ^{
        _frameID++;
        // Get the CVImageBuffer out of the sample buffer
        CVImageBufferRef imageBuffer = (CVImageBufferRef)CMSampleBufferGetImageBuffer(sampleBuffer);
        // Create properties
        CMTime presentationTimeStamp = CMTimeMake(_frameID, 1000);
        //CMTime duration = CMTimeMake(1, DURATION);
        VTEncodeInfoFlags flags;
        NSDictionary *properties = nil;
        // Force a keyframe once per GOP
        if (_frameID % (int32_t)_configuration.keyframeInterval == 0) {
            properties = @{(__bridge NSString *)kVTEncodeFrameOptionKey_ForceKeyFrame: @YES};
        }
        NSNumber *timeNumber = @(timestamp);
        // Pass it to the encoder. The timestamp is handed over with __bridge_retained so it
        // stays alive until the callback takes ownership of it with __bridge_transfer.
        OSStatus statusCode = VTCompressionSessionEncodeFrame(compressSession,
                                                              imageBuffer,
                                                              presentationTimeStamp,
                                                              kCMTimeInvalid,
                                                              (__bridge CFDictionaryRef)properties,
                                                              (__bridge_retained void *)timeNumber,
                                                              &flags);
        // Check for error
        if (statusCode != noErr) {
            NSLog(@"H264: VTCompressionSessionEncodeFrame failed with %d", (int)statusCode);
            return;
        }
        NSLog(@"H264: VTCompressionSessionEncodeFrame Success");
    });
}
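For context, encode:timeStamp: is normally driven from the capture callback. A minimal sketch of what that might look like, assuming an AVCaptureVideoDataOutput delegate and a videoEncoder property holding the VideoHWEncoder (those names are my assumptions, not part of the code above):
// AVCaptureVideoDataOutputSampleBufferDelegate (sketch; property name is an assumption)
// CACurrentMediaTime() comes from QuartzCore.
- (void)captureOutput:(AVCaptureOutput *)output
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection {
    // Millisecond timestamp used to keep push/pull playback in order
    uint64_t nowMs = (uint64_t)(CACurrentMediaTime() * 1000);
    [self.videoEncoder encode:sampleBuffer timeStamp:nowMs];
}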
When encoding finishes, the callback registered earlier, didCompressH2641, is invoked. Note that Apple's encoder does not necessarily encode each frame the moment it arrives: to keep encoding efficient, and because I/P/B frames depend on one another, it may accumulate a number of frames before encoding them.
Once you have the encoded data, first check whether it is a keyframe. If it is, the SPS & PPS data must be inserted in front of it; the SPS and PPS are pulled out of the encoded data. If you want to write the stream to a file, you also have to write the NALU header (start code) first.
For example, when writing the SPS and PPS data:
- (void)gotSpsPps:(NSData *)sps pps:(NSData *)pps
{
    // 1. The NALU header (Annex-B start code)
    const char bytes[] = "\x00\x00\x00\x01";
    size_t length = (sizeof bytes) - 1;
    NSData *ByteHeader = [NSData dataWithBytes:bytes length:length];
    // 2. Write the NALU header and NALU payload to the file
    //    (self.fileHandle is an already-open NSFileHandle)
    [self.fileHandle writeData:ByteHeader];
    [self.fileHandle writeData:sps];
    [self.fileHandle writeData:ByteHeader];
    [self.fileHandle writeData:pps];
}
Writing the I/P/B frames:
- (void)gotEncodedData:(NSData *)data isKeyFrame:(BOOL)isKeyFrame
{
    if (self.fileHandle != NULL)
    {
        const char bytes[] = "\x00\x00\x00\x01";
        size_t length = (sizeof bytes) - 1; // string literals have an implicit trailing '\0'
        NSData *ByteHeader = [NSData dataWithBytes:bytes length:length];
        [self.fileHandle writeData:ByteHeader];
        [self.fileHandle writeData:data];
    }
}
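The self.fileHandle used in the two methods above is just an NSFileHandle opened for writing. A sketch of how it might be created (the file path is an arbitrary example):
// Create/truncate the output file and open a handle for writing (path is an example)
NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:@"test.h264"];
[[NSFileManager defaultManager] removeItemAtPath:path error:nil];
[[NSFileManager defaultManager] createFileAtPath:path contents:nil attributes:nil];
self.fileHandle = [NSFileHandle fileHandleForWritingAtPath:path];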
If you are only pushing the stream, the two file-writing steps above can be skipped. Below is the encoding-completion callback:
// Encoding-completion callback; sampleBuffer holds the encoded data
void didCompressH2641(void *outputCallbackRefCon, void *sourceFrameRefCon, OSStatus status, VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer) {
    // 1. Bail out if encoding reported an error
    if (status != noErr) {
        return;
    }
    // 2. Recover the object passed in as the callback refCon (the encoder itself)
    VideoHWEncoder *encoder = (__bridge VideoHWEncoder *)outputCallbackRefCon;
    // Take ownership of the timestamp NSNumber passed with __bridge_retained in encode:timeStamp:
    uint64_t timeStamp = [((__bridge_transfer NSNumber *)sourceFrameRefCon) longLongValue];
    // 3. Check whether this sample is a keyframe
    bool isKeyframe = !CFDictionaryContainsKey((CFDictionaryRef)CFArrayGetValueAtIndex(CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, true), 0), kCMSampleAttachmentKey_NotSync);
    // For a keyframe, extract the SPS & PPS (only needs to happen once)
    if (isKeyframe && !encoder->sps)
    {
        // The encoded stream information lives in the CMFormatDescriptionRef
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
        // SPS
        size_t sparameterSetSize, sparameterSetCount;
        const uint8_t *sparameterSet;
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, &sparameterSet, &sparameterSetSize, &sparameterSetCount, 0);
        // PPS
        size_t pparameterSetSize, pparameterSetCount;
        const uint8_t *pparameterSet;
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 1, &pparameterSet, &pparameterSetSize, &pparameterSetCount, 0);
        // Wrap the SPS/PPS in NSData so they are easy to write out
        NSData *sps = [NSData dataWithBytes:sparameterSet length:sparameterSetSize];
        NSData *pps = [NSData dataWithBytes:pparameterSet length:pparameterSetSize];
        encoder->sps = sps;
        encoder->pps = pps;
        // Write them to the file
        [encoder gotSpsPps:sps pps:pps];
    }
    // Get the data block
    CMBlockBufferRef dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t length, totalLength;
    char *dataPointer;
    OSStatus statusCodeRet = CMBlockBufferGetDataPointer(dataBuffer, 0, &length, &totalLength, &dataPointer);
    if (statusCodeRet == noErr) {
        size_t bufferOffset = 0;
        static const int AVCCHeaderLength = 4; // the first four bytes of each NALU are not a 0001 start code but a big-endian length
        // Loop over the NALUs, because one sample buffer may contain several
        while (bufferOffset < totalLength - AVCCHeaderLength) {
            uint32_t NALUnitLength = 0;
            // Read the NAL unit length
            memcpy(&NALUnitLength, dataPointer + bufferOffset, AVCCHeaderLength);
            // Convert from big-endian to host byte order
            NALUnitLength = CFSwapInt32BigToHost(NALUnitLength);
            NSData *data = [[NSData alloc] initWithBytes:(dataPointer + bufferOffset + AVCCHeaderLength) length:NALUnitLength];
            [encoder gotEncodedData:data isKeyFrame:isKeyframe];
            // Pack the data into an LFVideoFrame for the push-streaming stage
            LFVideoFrame *frame = [LFVideoFrame new];
            // Keyframe flag
            frame.isKeyFrame = isKeyframe;
            // PPS data
            frame.pps = encoder->pps;
            // SPS data
            frame.sps = encoder->sps;
            // I/P/B NAL data
            frame.data = data;
            // Current timestamp
            frame.timestamp = timeStamp;
            // Hand the frame to the delegate (usually a view controller), which then pushes it to the server
            if (encoder.delegate) {
                [encoder.delegate encodedVideo:encoder videoFrame:frame];
            }
            // Move to the next NAL unit in the block buffer
            bufferOffset += AVCCHeaderLength + NALUnitLength;
        }
    }
}
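The delegate call above, together with encodedAudio:audioFrame: used in the audio section below, implies a protocol roughly like the following. This is my own sketch; only the selector names and the frame/encoder class names come from the code in these notes:
// Sketch of the encoder delegate protocol (the protocol name is an assumption)
@class VideoHWEncoder, AudioHWEncoder, LFVideoFrame, LFAudioFrame;

@protocol HWEncoderDelegate <NSObject>
// Called once per encoded video NALU; the receiver (usually a view controller) pushes it to the server
- (void)encodedVideo:(VideoHWEncoder *)encoder videoFrame:(LFVideoFrame *)frame;
// Called once per encoded AAC packet
- (void)encodedAudio:(AudioHWEncoder *)encoder audioFrame:(LFAudioFrame *)frame;
@end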
That's it for video encoding; next up is audio encoding.
Audio Encoding
The video-encoding code above is wrapped in a VideoHWEncoder class; the audio encoding is likewise wrapped in its own class, here named AudioHWEncoder. The flow is very similar to video: first set the encoding parameters and initialize the relevant objects, then call the encoding function, and finally handle the encoded data in the callback.
The encoding parameters to set are the channel count, the sample rate, the output format (AAC), and the bits per sample. The code is as follows:
// Configure the encoding parameters
- (void)setupEncoderFromSampleBuffer:(CMSampleBufferRef)sampleBuffer
{
    NSLog(@"Configuring the audio encoder...");
    // Format of the incoming (source) audio
    AudioStreamBasicDescription inAudioStreamBasicDescription = *CMAudioFormatDescriptionGetStreamBasicDescription((CMAudioFormatDescriptionRef)CMSampleBufferGetFormatDescription(sampleBuffer));
    AudioStreamBasicDescription outAudioStreamBasicDescription = {0};
    // Parameters of the output, i.e. encoded, audio
    // Sample rate
    outAudioStreamBasicDescription.mSampleRate = inAudioStreamBasicDescription.mSampleRate;
    sampleRate = (NSInteger)inAudioStreamBasicDescription.mSampleRate;
    channelsCount = (NSInteger)inAudioStreamBasicDescription.mChannelsPerFrame;
    // Format: kAudioFormatMPEG4AAC = 'aac '
    outAudioStreamBasicDescription.mFormatID = kAudioFormatMPEG4AAC;
    // Format flags: the AAC-LC (Low Complexity) profile
    outAudioStreamBasicDescription.mFormatFlags = kMPEG4Object_AAC_LC;
    // Bytes per packet; 0 means variable packet size
    outAudioStreamBasicDescription.mBytesPerPacket = 0;
    // Frames per packet; AAC uses a fixed 1024 frames per packet
    outAudioStreamBasicDescription.mFramesPerPacket = 1024;
    // Bytes per frame; 0 for a compressed format
    outAudioStreamBasicDescription.mBytesPerFrame = 0;
    // 1: mono, 2: stereo
    outAudioStreamBasicDescription.mChannelsPerFrame = 1;
    // Bits per sample; 0 for a compressed format
    outAudioStreamBasicDescription.mBitsPerChannel = 0;
    // Reserved (alignment/padding)
    outAudioStreamBasicDescription.mReserved = 0;
    // Pick a specific encoder (optional)
    //AudioClassDescription * description = [self getAudioClassDescriptionWithType:kAudioFormatMPEG4AAC fromManufacturer:kAppleSoftwareAudioCodecManufacturer];
    // Create the converter (encoder)
    /*
     inAudioStreamBasicDescription  : source audio format
     outAudioStreamBasicDescription : target audio format
     third parameter                : number of encoder class descriptions passed in
     description                    : encoder description; if none is given, the system picks a default encoder
     */
    //OSStatus status = AudioConverterNewSpecific(&inAudioStreamBasicDescription, &outAudioStreamBasicDescription, 1, description, &_audioConverter);
    OSStatus status = AudioConverterNew(&inAudioStreamBasicDescription, &outAudioStreamBasicDescription, &_audioConverter);
    if (status != noErr) {
        NSLog(@"Failed to create the audio converter");
        return;
    }
    // Allocate the output (AAC) buffer using the converter's maximum output packet size
    UInt32 value = 0;
    UInt32 size = sizeof(value);
    AudioConverterGetProperty(_audioConverter, kAudioConverterPropertyMaximumOutputPacketSize, &size, &value);
    _aacBufferSize = value;
    _aacBuffer = malloc(value);
}
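The commented-out AudioConverterNewSpecific path needs an AudioClassDescription picked from the encoders installed on the system. A sketch of the getAudioClassDescriptionWithType:fromManufacturer: helper referenced above, assuming the usual kAudioFormatProperty_Encoders lookup (this exact implementation is my assumption, not the original):
// Find an encoder of the given format from the given manufacturer
// (e.g. kAudioFormatMPEG4AAC / kAppleSoftwareAudioCodecManufacturer). Sketch only.
- (AudioClassDescription *)getAudioClassDescriptionWithType:(UInt32)type fromManufacturer:(UInt32)manufacturer {
    static AudioClassDescription desc;
    UInt32 encoderSpecifier = type;
    UInt32 size = 0;
    // Ask how much encoder-description data is available for this format
    if (AudioFormatGetPropertyInfo(kAudioFormatProperty_Encoders, sizeof(encoderSpecifier), &encoderSpecifier, &size) != noErr) {
        return nil;
    }
    unsigned int count = size / sizeof(AudioClassDescription);
    AudioClassDescription descriptions[count];
    if (AudioFormatGetProperty(kAudioFormatProperty_Encoders, sizeof(encoderSpecifier), &encoderSpecifier, &size, descriptions) != noErr) {
        return nil;
    }
    // Pick the entry whose subtype and manufacturer match
    for (unsigned int i = 0; i < count; i++) {
        if (descriptions[i].mSubType == type && descriptions[i].mManufacturer == manufacturer) {
            memcpy(&desc, &descriptions[i], sizeof(desc));
            return &desc;
        }
    }
    return nil;
}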
After that, encoding can begin:
// Encode one sample buffer
- (void)encode:(CMSampleBufferRef)sampleBuffer timeStamp:(uint64_t)timeStamp {
    CFRetain(sampleBuffer);
    dispatch_sync(_encoderQueue, ^{
        if (!self.audioConverter) {
            // Configure the encoder on first use
            [self setupEncoderFromSampleBuffer:sampleBuffer];
        }
        // Get the CMBlockBufferRef that holds the PCM data
        CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
        CFRetain(blockBuffer);
        // Fill _pcmBufferSize and _pcmBuffer
        OSStatus status = CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &self->_pcmBufferSize, &self->_pcmBuffer);
        if (status != kCMBlockBufferNoErr) {
            NSLog(@"Failed to get the PCM buffer");
            CFRelease(blockBuffer);
            CFRelease(sampleBuffer);
            return;
        }
        // Clear the output buffer
        memset(self->_aacBuffer, 0, self->_aacBufferSize);
        // Set up the output buffer list
        AudioBufferList outAudioBufferList = {0};
        // Number of buffers
        outAudioBufferList.mNumberBuffers = 1;
        // Number of channels
        outAudioBufferList.mBuffers[0].mNumberChannels = 1;
        // Buffer size
        outAudioBufferList.mBuffers[0].mDataByteSize = (int)self->_aacBufferSize;
        // Buffer contents
        outAudioBufferList.mBuffers[0].mData = self->_aacBuffer;
        // Encode
        AudioStreamPacketDescription *outPD = NULL;
        UInt32 inPutSize = 1;
        /*
         inInputDataProc    : our own callback that feeds PCM data to the converter
         self               : user data handed to that callback
         inPutSize          : in/out number of output packets (one AAC packet here)
         outAudioBufferList : the output buffers
         outPD              : output packet descriptions (not needed here)
         */
        status = AudioConverterFillComplexBuffer(self->_audioConverter,
                                                 inInputDataProc,
                                                 (__bridge void *)self,
                                                 &inPutSize,
                                                 &outAudioBufferList,
                                                 outPD
                                                 );
        // Encoding finished
        NSData *data = nil;
        if (status == noErr) {
            // Raw AAC data straight from the output buffer
            NSData *rawAAC = [NSData dataWithBytes:outAudioBufferList.mBuffers[0].mData length:outAudioBufferList.mBuffers[0].mDataByteSize];
            // Prepend the ADTS header. This is only needed for writing to a file;
            // for push streaming the raw AAC without ADTS is what gets sent.
            NSData *adtsHeader = [self adtsDataForPacketLength:rawAAC.length];
            NSMutableData *fullData = [NSMutableData dataWithData:adtsHeader];
            [fullData appendData:rawAAC];
            data = fullData;
            // Put rawAAC (not fullData) into LFAudioFrame.data for the push-streaming stage
            LFAudioFrame *frame = [LFAudioFrame new];
            frame.data = rawAAC;
            frame.timestamp = timeStamp;
            // exeData is the AudioSpecificConfig used later when pushing the stream
            char exeData[2];
            NSInteger sampleRateIndex = [self sampleRateIndex:sampleRate];
            exeData[0] = 0x10 | ((sampleRateIndex >> 1) & 0x7);
            exeData[1] = ((sampleRateIndex & 0x1) << 7) | ((channelsCount & 0xF) << 3);
            frame.audioInfo = [NSData dataWithBytes:exeData length:2];
            // Hand the frame to the delegate (usually the controller)
            if (self.delegate) {
                [self.delegate encodedAudio:self audioFrame:frame];
            }
            NSLog(@"Output AAC data length: %lu", (unsigned long)rawAAC.length);
        } else {
            NSLog(@"Audio encoding failed");
        }
        // Optional completion callback
        // if (completionBlock) {
        //     dispatch_async(_callBackQueue, ^{
        //         completionBlock(data, nil);
        //     });
        // }
        // Optionally write the ADTS-framed data to a file //TODO:
        //[self.audioFileHandle writeData:data];
        CFRelease(sampleBuffer);
        CFRelease(blockBuffer);
    });
}
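AudioConverterFillComplexBuffer pulls its PCM input through inInputDataProc, which the snippet above references but does not show. A minimal sketch of what it might look like, assuming _pcmBuffer and _pcmBufferSize are ivars of AudioHWEncoder that are visible to the callback (the exact implementation is my assumption):
// Input callback for AudioConverterFillComplexBuffer: hands the converter the PCM data
// captured in encode:timeStamp: (sketch; assumes the ivars above are accessible here)
static OSStatus inInputDataProc(AudioConverterRef inAudioConverter,
                                UInt32 *ioNumberDataPackets,
                                AudioBufferList *ioData,
                                AudioStreamPacketDescription **outDataPacketDescription,
                                void *inUserData) {
    AudioHWEncoder *encoder = (__bridge AudioHWEncoder *)inUserData;
    if (encoder->_pcmBufferSize == 0) {
        // Nothing left to feed: report zero packets so the converter stops asking
        *ioNumberDataPackets = 0;
        return -1;
    }
    ioData->mBuffers[0].mData = encoder->_pcmBuffer;
    ioData->mBuffers[0].mDataByteSize = (UInt32)encoder->_pcmBufferSize;
    ioData->mBuffers[0].mNumberChannels = 1;
    // The whole buffer is consumed in one call
    encoder->_pcmBufferSize = 0;
    return noErr;
}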
// Map the sample rate (in Hz) to the AAC sampling-frequency index
- (NSInteger)sampleRateIndex:(NSInteger)frequencyInHz {
    NSInteger sampleRateIndex = 0;
    switch (frequencyInHz) {
        case 96000:
            sampleRateIndex = 0;
            break;
        case 88200:
            sampleRateIndex = 1;
            break;
        case 64000:
            sampleRateIndex = 2;
            break;
        case 48000:
            sampleRateIndex = 3;
            break;
        case 44100:
            sampleRateIndex = 4;
            break;
        case 32000:
            sampleRateIndex = 5;
            break;
        case 24000:
            sampleRateIndex = 6;
            break;
        case 22050:
            sampleRateIndex = 7;
            break;
        case 16000:
            sampleRateIndex = 8;
            break;
        case 12000:
            sampleRateIndex = 9;
            break;
        case 11025:
            sampleRateIndex = 10;
            break;
        case 8000:
            sampleRateIndex = 11;
            break;
        case 7350:
            sampleRateIndex = 12;
            break;
        default:
            sampleRateIndex = 15;
    }
    return sampleRateIndex;
}
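As a concrete check of the exeData bytes built in encode:timeStamp:, take 44.1 kHz mono AAC-LC: the mapping above gives sampleRateIndex = 4, so the two AudioSpecificConfig bytes work out as follows (just the arithmetic from the code above, spelled out):
// AudioSpecificConfig layout: 5 bits object type | 4 bits frequency index | 4 bits channel config
// AAC-LC (object type 2), 44.1 kHz (index 4), mono (channel config 1):
// exeData[0] = 0x10 | (4 >> 1)           = 0x12   // bits: 00010 010
// exeData[1] = ((4 & 1) << 7) | (1 << 3) = 0x08   // bits: 0 0001 000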
// Build the 7-byte ADTS header required by the ADTS format
- (NSData *)adtsDataForPacketLength:(NSUInteger)packetLength {
    int adtsLength = 7;
    char *packet = malloc(sizeof(char) * adtsLength);
    int profile = 2;  // AAC-LC
    int freqIdx = 4;  // 44.1 kHz (hardcoded here; should match the actual sample-rate index)
    int chanCfg = 1;  // mono
    NSUInteger fullLength = adtsLength + packetLength; // header + AAC payload
    packet[0] = (char)0xFF; // syncword (high bits)
    packet[1] = (char)0xF9; // syncword (low bits), MPEG-2 ID, no CRC
    packet[2] = (char)(((profile-1)<<6) + (freqIdx<<2) +(chanCfg>>2));
    packet[3] = (char)(((chanCfg&3)<<6) + (fullLength>>11));
    packet[4] = (char)((fullLength&0x7FF) >> 3);
    packet[5] = (char)(((fullLength&7)<<5) + 0x1F);
    packet[6] = (char)0xFC;
    NSData *data = [NSData dataWithBytesNoCopy:packet length:adtsLength freeWhenDone:YES];
    return data;
}
And with that, audio encoding is complete as well.