This article records what I learned, and the pitfalls I ran into, while building a new feature: letting viewers record short clips of a live stream. Sharing it here.
Requirements
On the viewer side, the user can record a short clip of the stream that is currently playing. The on-screen interactions (gifts, chat, danmaku and so on) must be captured at the same time and composited together with the video stream.
Background
The player is Qiniu's PLPlayerKit. While playing a stream, the framework exposes the decoded data through two callback methods.
/**
 Callback with the frame that is about to be rendered.
 Only available for live streaming.

 @param player The PLPlayer object issuing the callback.
 @param frame YUV data of the frame about to be rendered.
        Use CVPixelBufferGetPixelFormatType to get the YUV type:
        software decode: kCVPixelFormatType_420YpCbCr8Planar
        hardware decode: kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
 @param pts Presentation timestamp, in ms.
 @param sarNumerator
 @param sarDenominator
        SAR stands for storage aspect ratio; sarNumerator / sarDenominator is the
        display aspect ratio of the video stream.

 @discussion sarNumerator = 0 means the parameter is invalid.

 @since v2.4.3
 */
- (void)player:(nonnull PLPlayer *)player willRenderFrame:(nullable CVPixelBufferRef)frame pts:(int64_t)pts sarNumerator:(int)sarNumerator sarDenominator:(int)sarDenominator;
/**
 Callback with audio data.

 @param player The PLPlayer object issuing the callback.
 @param audioBufferList The audio data.
 @param audioStreamDescription The audio format description.
 @param pts Presentation timestamp: the time, relative to the SCR (system clock reference), at which the decoder should present the frame. The SCR can be thought of as the moment the decoder should start reading the data from disk.
 @param sampleFormat Sample format, see the PLPlayerAVSampleFormat enum.

 @return audioBufferList The audio data.

 @since v2.4.3
 */
- (nonnull AudioBufferList *)player:(nonnull PLPlayer *)player willAudioRenderBuffer:(nonnull AudioBufferList *)audioBufferList asbd:(AudioStreamBasicDescription)audioStreamDescription pts:(int64_t)pts sampleFormat:(PLPlayerAVSampleFormat)sampleFormat;
Analysis
For the requirement of rendering the user interactions into the clip, the first thing that came to mind was multi-texture blending in OpenGL: blend the texture created from the video stream with a texture created from the on-screen elements, turn the blended texture back into video frames, synchronize them with the audio using the pts from the callbacks, and write everything into a video file.
GPUImage, which is built on OpenGL ES, already has solid implementations of both the blending and the video writing in this pipeline, so rather than reinventing the wheel I decided to build the feature on top of GPUImage.
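The resulting GPUImage chain looks roughly like the sketch below. pixelRender and layerRender are the custom sources built later in this post; the property names here are otherwise assumptions, and the real wiring appears again in the recording code further down.
[self.pixelRender addTarget:self.blendFilter];   // texture decoded from the video stream
[self.layerRender addTarget:self.blendFilter];   // texture captured from gifts / chat / danmaku
[self.blendFilter addTarget:self.movieWriter];   // blended frames go to the movie writer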
Video data
The video data we get is a CVPixelBufferRef in either kCVPixelFormatType_420YpCbCr8Planar (software decode) or kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange (hardware decode) format.
/**
 @abstract Whether to use VideoToolbox hardware decoding.
 @discussion When enabled, the player tries hardware decoding first and falls back to software decoding on failure.
 @warning Only effective for rtmp/flv live streams. Disabled by default. Requires iOS 8.0 or later.
 @since v2.1.4
 */
extern NSString * _Nonnull PLPlayerOptionKeyVideoToolbox;
Although GPUImage already has good, well-exercised support for kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange data, for compatibility we also have to accept kCVPixelFormatType_420YpCbCr8Planar video data as input.
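A quick way to check which layout the callback is delivering, inside the willRenderFrame callback and using nothing but CoreVideo (a minimal sketch):
OSType type = CVPixelBufferGetPixelFormatType(frame);
if (type == kCVPixelFormatType_420YpCbCr8Planar) {
    // software decode: three separate Y / U / V planes
} else if (type == kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange) {
    // hardware decode: NV12, one Y plane plus one interleaved CbCr plane
}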
Code
GPUImagePixelRender subclasses GPUImageOutput and is the class that feeds the video texture into the GPUImage filter chain. Its initialization largely mirrors GPUImageMovie; the key change is the shader. kCVPixelFormatType_420YpCbCr8Planar stores Y, U and V in three separate planes, so uploading a frame necessarily uses three texture samplers.
// DTVRecordVideoFrame: a model class that holds one decoded video frame
- (DTVRecordVideoFrame *)creatTextureYUV:(CVPixelBufferRef)pixelBuffer
{
    OSType pixelType = CVPixelBufferGetPixelFormatType(pixelBuffer);
    NSAssert(pixelType == kCVPixelFormatType_420YpCbCr8Planar, @"pixelType error ...");
    int pixelWidth = (int)CVPixelBufferGetWidth(pixelBuffer);
    int pixelHeight = (int)CVPixelBufferGetHeight(pixelBuffer);
    CVPixelBufferLockBaseAddress(pixelBuffer, 0);
    DTVRecordVideoFrame *yuv = [[DTVRecordVideoFrame alloc] init];
    // Width and height of the video frame
    yuv.width = pixelWidth;
    yuv.height = pixelHeight;
    // Copy the three YUV planes (this assumes bytes-per-row equals the plane width, i.e. no
    // row padding; otherwise copy row by row using CVPixelBufferGetBytesPerRowOfPlane)
    size_t y_size = pixelWidth * pixelHeight;
    uint8_t *yuv_y_frame = malloc(y_size);
    uint8_t *y_frame = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
    memcpy(yuv_y_frame, y_frame, y_size);
    yuv.Y = yuv_y_frame;
    size_t u_size = y_size / 4;
    uint8_t *yuv_u_frame = malloc(u_size);
    uint8_t *u_frame = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1);
    memcpy(yuv_u_frame, u_frame, u_size);
    yuv.U = yuv_u_frame;
    size_t v_size = y_size / 4;
    uint8_t *yuv_v_frame = malloc(v_size);
    uint8_t *v_frame = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 2);
    memcpy(yuv_v_frame, v_frame, v_size);
    yuv.V = yuv_v_frame;
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
    return yuv;
}
After extracting the planes, creating the FBO and uploading the vertex and texture-coordinate data works the same way as in GPUImageMovie, so I won't go into detail. Below is the code that creates the texture objects.
- (void)setupTexture:(DTVRecordVideoFrame *)videoFrame
{
    if (0 == _textures[0]) glGenTextures(3, _textures);
    const uint8_t *pixelByte[3] = { videoFrame.Y, videoFrame.U, videoFrame.V };
    const int widths[3]  = { videoFrame.width,  videoFrame.width / 2,  videoFrame.width / 2 };
    const int heights[3] = { videoFrame.height, videoFrame.height / 2, videoFrame.height / 2 };
    for (int i = 0; i < 3; i++) {
        glBindTexture(GL_TEXTURE_2D, _textures[i]);
        glTexImage2D(GL_TEXTURE_2D,
                     0,
                     GL_LUMINANCE,
                     widths[i],
                     heights[i],
                     0,
                     GL_LUMINANCE,
                     GL_UNSIGNED_BYTE,
                     pixelByte[i]);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
        glBindTexture(GL_TEXTURE_2D, 0);
    }
}
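The draw pass then binds each plane to its own texture unit and wires it to the matching sampler uniform in the shader below. A minimal sketch; the _yuvTextureUniforms handles and the attribute setup are assumptions, obtained the same way GPUImageMovie does with glGetUniformLocation after linking the program.
- (void)drawVideoFrame:(DTVRecordVideoFrame *)videoFrame
{
    [self setupTexture:videoFrame];
    for (int i = 0; i < 3; i++) {
        glActiveTexture(GL_TEXTURE0 + i);
        glBindTexture(GL_TEXTURE_2D, _textures[i]);
        glUniform1i(_yuvTextureUniforms[i], i); // s_texture_y / s_texture_u / s_texture_v
    }
    // Vertex and texture-coordinate attributes are set up as in GPUImageMovie
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}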
Now the shader. The shader code was adapted from kxmovie; it samples the three Y/U/V planes and converts them to RGB.
NSString *const kGPUImageYUVPlanarFragmentShaderString = SHADER_STRING
(
 varying highp vec2 textureCoordinate;
 uniform sampler2D s_texture_y;
 uniform sampler2D s_texture_u;
 uniform sampler2D s_texture_v;

 void main()
 {
     highp float y = texture2D(s_texture_y, textureCoordinate).r;
     highp float u = texture2D(s_texture_u, textureCoordinate).r - 0.5;
     highp float v = texture2D(s_texture_v, textureCoordinate).r - 0.5;

     highp float r = y + 1.402 * v;
     highp float g = y - 0.344 * u - 0.714 * v;
     highp float b = y + 1.772 * u;

     gl_FragColor = vec4(r, g, b, 1.0);
 }
);
View data
GPUImageUIElement generates a texture from a view or a layer, so in theory it could be used directly. However, not all of our on-screen animations are driven by layer.contents; some are UIView animations or CAAnimation-based layer animations, and if you capture the view or layer directly, those animations simply don't show up. If you know how CALayer works, the reason is obvious: when a property changes, the modelLayer jumps straight to its final value, while the presentationLayer is the one that goes through the gradual, interpolated transition. The view.layer we normally work with is the modelLayer, so its values are already the end values, which is why the animations in the captured texture look wrong.
Knowing this, I modified GPUImageUIElement so that every time it captures data to build a texture, it renders the presentationLayer instead.
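A minimal sketch of the patched update, assuming the method is the updateWithPresentationLayer: used later in this post and that the bitmap upload stays the same as in the stock GPUImageUIElement update:
- (void)updateWithPresentationLayer:(CALayer *)presentationLayer
{
    UIGraphicsBeginImageContextWithOptions([self layerSizeInPixels], NO, 1.0);
    CGContextRef context = UIGraphicsGetCurrentContext();
    // Render the in-flight, interpolated layer tree instead of the model layer,
    // so UIView / CAAnimation animations actually appear in the captured texture.
    [presentationLayer renderInContext:context];
    // ... upload the bitmap into the output framebuffer's texture with
    // glTexImage2D, exactly as the stock -update does ...
    UIGraphicsEndImageContext();
}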
I also noticed that GPUImageUIElement fetches a new FBO every time it updates its texture and never returns it to the framebuffer cache, so I changed the code below to release the FBO once it has been used.
for (id<GPUImageInput> currentTarget in targets)
{
    if (currentTarget != self.targetToIgnoreForUpdates)
    {
        NSInteger indexOfObject = [targets indexOfObject:currentTarget];
        NSInteger textureIndexOfTarget = [[targetTextureIndices objectAtIndex:indexOfObject] integerValue];

        [currentTarget setInputSize:layerPixelSize atIndex:textureIndexOfTarget];
        [currentTarget setInputFramebuffer:outputFramebuffer atIndex:textureIndexOfTarget];
        [currentTarget newFrameReadyAtTime:kCMTimeIndefinite atIndex:textureIndexOfTarget];
    }
}
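One way the patch can look, as a sketch relying on GPUImageFramebuffer's reference-counted unlock; whether it is safe to release right after the target loop depends on when your targets lock or copy the texture:
// After every target has been informed of the new frame, hand the
// framebuffer back to the shared cache instead of leaking one per update.
[outputFramebuffer unlock];
outputFramebuffer = nil;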
Compositing
Plan A: composite and draw one frame for every video frame received. Our stream frame rate is usually 24 or 36 fps while the screen refreshes at 60 fps. Testing showed that refreshing at the video frame rate saves CPU, but when the video stalls the on-screen elements freeze with it, and the animations are not smooth enough.
Plan B: refresh the on-screen elements with a CADisplayLink, and refresh the video frame whenever new frame data arrives.
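The display link driving the screen-element refresh can be set up roughly like this (a sketch using the startDisplayLinkTimer / stopDisplayLinkTimer / writerFrame names that appear in the code below):
- (void)startDisplayLinkTimer
{
    if (self.displayLink) return;
    self.displayLink = [CADisplayLink displayLinkWithTarget:self selector:@selector(writerFrame)];
    [self.displayLink addToRunLoop:[NSRunLoop mainRunLoop] forMode:NSRunLoopCommonModes];
}

- (void)stopDisplayLinkTimer
{
    [self.displayLink invalidate];
    self.displayLink = nil;
}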
GPUImageMovieWriter writes the video data out to a local file.
// Buffer an incoming video frame
- (void)addVideoPixelBuffer:(CVPixelBufferRef)pixelBuffer pts:(int64_t)videoPts fps:(int)videoFPS
{
    // Enough frames have already been buffered
    if (_hasFillFrame) {
        return;
    }
    // Remember the pts of the first frame
    if (!_firstFramePTS) _firstFramePTS = videoPts;

    DTVRecordVideoFrame *videoframe = [self creatTextureYUV:pixelBuffer];
    if (videoframe.Y == NULL || videoframe.U == NULL || videoframe.V == NULL) {
        NSLog(@"invalid video frame");
        return;
    }
    videoframe.pts = videoPts;
    // How long this frame stays on screen
    videoframe.duration = _previousFrame ? (videoPts - _previousFrame.pts) : (1 / 24.f * 1000);
    // The frame's presentation time within our recorded clip
    videoframe.frameTime = CMTimeMake((videoPts - _firstFramePTS) * 600, 600 * 1000);
    // Buffer it
    [self.videoBuffer addObject:videoframe];
    _previousFrame = videoframe;

    if (self.videoBuffer.count > 3 && !self.displayLink) {
        // Start cycling through the buffered frames
        [self tick];
    }
}
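This buffer is fed from the PLPlayer delegate callback; a sketch of the wiring (self.recorder, self.isRecording and the hard-coded fps are assumptions, not PLPlayerKit API):
- (void)player:(nonnull PLPlayer *)player willRenderFrame:(nullable CVPixelBufferRef)frame pts:(int64_t)pts sarNumerator:(int)sarNumerator sarDenominator:(int)sarDenominator
{
    if (self.isRecording && frame) {
        [self.recorder addVideoPixelBuffer:frame pts:pts fps:24];
    }
}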
In tick, the current frame is advanced based on how many frames are buffered and how long each frame should be displayed, and the result is written into the video through GPUImageMovieWriter.
https://github.com/BradLarson/GPUImage/issues/1729 explains how to fix the problems GPUImageMovieWriter has when writing AVFileTypeMPEG4.
- (void)tick
{
    if (self.videoBuffer.count < 1) {
        if (_hasFillFrame) {
            [self stopDisplayLinkTimer];
            if (_movieWriter) {
                [_movieWriter finishRecording];
                [_blendFilter removeTarget:_movieWriter];
                _movieWriter = nil;
            }
            if (self.completeBlock) self.completeBlock(_coverImage);
        }
        else {
            _renderVideoFrame = NO;
            NSLog(@"stalled...");
        }
    }
    else
    {
        _renderVideoFrame = YES;
        DTVRecordVideoFrame *frameTexture = self.videoBuffer.firstObject;
        if (!self.movieWriter) {
            unlink([DefaultFuckVideoPath UTF8String]);
            _movieWriter = [[GPUImageMovieWriter alloc] initWithMovieURL:[NSURL fileURLWithPath:DefaultFuckVideoPath] size:CGSizeMake(540, 960) fileType:AVFileTypeMPEG4 outputSettings:nil];
            _movieWriter.encodingLiveVideo = YES;
            _movieWriter.hasAudioTrack = YES;
            _movieWriter.assetWriter.movieFragmentInterval = kCMTimeInvalid;
            [self.pixelRender addTarget:self.blendFilter];
            [self.layerRender addTarget:self.blendFilter];
            [self.blendFilter addTarget:_movieWriter];
            [_movieWriter startRecording];
        }
        [self startDisplayLinkTimer];
        runAsynchronouslyOnVideoProcessingQueue(^{
            [self.pixelRender processVideoFrame:frameTexture];
        });
        // Schedule the switch to the next frame after this frame's duration
        dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(frameTexture.duration * NSEC_PER_MSEC)), dispatch_get_main_queue(), ^{
            [self.videoBuffer removeObjectAtIndex:0];
            [self tick];
        });
        // Grab a cover image once we are past the first half second
        if (CMTimeGetSeconds(_previousFrame.frameTime) > 0.5f && !_coverImage) {
            [self.blendFilter useNextFrameForImageCapture];
            _coverImage = [self.blendFilter imageFromCurrentFramebuffer];
        }
    }
}
The CADisplayLink callback writerFrame refreshes the capture of the on-screen elements; the result is blended with the current video frame _currentFrame through a GPUImageAlphaBlendFilter to produce the final composited frame.
- (void)writerFrame
{
    [self.layerRender updateWithPresentationLayer:_renderView.layer.presentationLayer];
}
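The blend filter itself needs no special configuration beyond its mix value; a sketch of how it could be created (the lazy getter is an assumption):
- (GPUImageAlphaBlendFilter *)blendFilter
{
    if (!_blendFilter) {
        _blendFilter = [[GPUImageAlphaBlendFilter alloc] init];
        // mix = 1.0 lays the UI texture fully on top of the video texture,
        // letting its alpha channel decide what shows through
        _blendFilter.mix = 1.0;
    }
    return _blendFilter;
}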
Audio
The audio has to stay in sync with the video. The Qiniu callback hands us an AudioBufferList, which has to be converted into the CMSampleBufferRef that GPUImageMovieWriter expects.
// Recompute the audio pts relative to the first video frame's pts
CMTime time = CMTimeMake((audioPts - _firstFramePTS) * 600, 600 * 1000);

// Convert the AudioBufferList into a CMSampleBufferRef
CMSampleBufferRef audioBuffer = NULL;
CMFormatDescriptionRef format = NULL;
CMSampleTimingInfo timing = { CMTimeMake(1, audioStreamDescription.mSampleRate), time, kCMTimeInvalid };
// Sample count, assuming 4-byte samples as delivered here
UInt32 size = audioBufferList->mBuffers->mDataByteSize / sizeof(UInt32);
UInt32 mNumberChannels = audioBufferList->mBuffers->mNumberChannels;
CMItemCount numSamples = (CMItemCount)size / mNumberChannels;

OSStatus status;
status = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &audioStreamDescription, 0, NULL, 0, NULL, NULL, &format);
if (status != noErr) {
    // format was not created, nothing to release
    return;
}
status = CMSampleBufferCreate(kCFAllocatorDefault, NULL, false, NULL, NULL, format, numSamples, 1, &timing, 0, NULL, &audioBuffer);
if (status != noErr) {
    CFRelease(format);
    return;
}
status = CMSampleBufferSetDataBufferFromAudioBufferList(audioBuffer, kCFAllocatorDefault, kCFAllocatorDefault, 0, audioBufferList);
if (status != noErr) {
    CFRelease(format);
    CFRelease(audioBuffer);
    return;
}
if (_movieWriter && audioBuffer) {
    [_movieWriter processAudioBuffer:audioBuffer];
}
CFRelease(format);
CFRelease(audioBuffer);
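This conversion runs inside the PLPlayer audio callback; a sketch of the wiring, assuming the code above is wrapped in a helper such as -processAudioBufferList:asbd:pts: on the recorder:
- (nonnull AudioBufferList *)player:(nonnull PLPlayer *)player willAudioRenderBuffer:(nonnull AudioBufferList *)audioBufferList asbd:(AudioStreamBasicDescription)audioStreamDescription pts:(int64_t)pts sampleFormat:(PLPlayerAVSampleFormat)sampleFormat
{
    if (self.isRecording) {
        [self.recorder processAudioBufferList:audioBufferList asbd:audioStreamDescription pts:pts];
    }
    // Hand the original buffer back so the player keeps rendering audio
    return audioBufferList;
}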
Summary
Building this feature taught me a lot: the CALayer layer tree, video pixel formats, audio conversion, basic audio/video synchronization, and a deeper understanding of GPUImage. I got a great deal out of it.