Preface
Our product currently needs object recognition, so we had to integrate Google's TensorFlow Lite machine-learning framework. The official demo is written in Swift; the Objective-C samples are old, lean heavily on C++ functions, and are neither complete nor clear. Handling the output of a non-quantized (float) model is especially painful. So, following the Swift demo, I wrote an Objective-C version, and I hope it helps anyone who is unfamiliar with the framework or simply needs an OC implementation...
I. What TensorFlow Lite offers
TensorFlow Lite is a lightweight machine-learning framework aimed at mobile devices; its main capabilities are summarized in the feature overview figure in the official documentation.
The feature our product needs is object detection.
1. What is object detection
Given an image or a video stream, the object detection model can identify which of a known set of objects are present and report where each one is located in the image (see the example image on the official site).
2. Object detection output
When we feed an image to the model, it returns a list of detection results, each containing the bounding-box coordinates of a detected object and a confidence score.
- Location: the array at output index 0. For each detected object the model returns a float array of [top, left, bottom, right]. These four numbers describe a rectangle around the object (the docs call them coordinates, but in practice they are ratios of the image size and have to be converted to your own dimensions yourself; a small conversion sketch follows this list).
- Class: an index that you map to a class name using the labels file. For the official pretrained model the returned index needs a +1 offset.
- Confidence score: together with the coordinates this describes a detection. The score reflects how confident the model is in the result, ranging from 0 to 1 with 1 as the maximum; the higher the value, the more trustworthy the detection.
- Number of detections: the model can identify and locate at most 10 objects in a single image, so the returned count is at most 10.
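To make the coordinate conversion concrete, here is a minimal sketch (the helper name is my own, not part of the official API) that turns one [top, left, bottom, right] result into a CGRect in image pixels, which is the same scaling the demo code performs later on:

// Hypothetical helper: converts one normalized [top, left, bottom, right] box
// (all values between 0 and 1) into a CGRect in image pixel coordinates.
static CGRect HFRectFromNormalizedBox(float top, float left, float bottom, float right,
                                      CGFloat imageWidth, CGFloat imageHeight) {
    CGRect normalized = CGRectMake(left, top, right - left, bottom - top);
    return CGRectApplyAffineTransform(normalized, CGAffineTransformMakeScale(imageWidth, imageHeight));
}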
3. Input
The model takes a single image as input. The expected image size is 300x300 pixels with three channels (red, green, and blue) per pixel, which is fed to the model as a flattened buffer of 270,000 bytes (300 x 300 x 3). Because the model is quantized, each byte represents a value between 0 and 255.
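As a sanity check on those numbers: a quantized model expects exactly 300 x 300 x 3 = 270,000 bytes, and a float model expects four times that, because each channel becomes a 4-byte float. A purely illustrative helper (my own, not part of the demo) for validating the buffer before copying it into the input tensor:

// Hypothetical check that the prepared data matches what the model expects.
static BOOL HFValidateInputLength(NSData *imageData, BOOL isQuantized) {
    NSUInteger expected = 300 * 300 * 3;        // 270,000 bytes for a UInt8 (quantized) model
    if (!isQuantized) {
        expected *= sizeof(float);              // 1,080,000 bytes for a float32 model
    }
    return imageData.length == expected;
}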
4. Output
The model outputs four arrays, mapped to output indices 0 to 3. The first three arrays describe up to 10 detected objects, with one element in each array corresponding to each object. The fourth array holds the number of detections, which this model always reports as 10.
II. Hands-on: actually working through it step by step (just reading along without trying it is cheating)
1. Initialize the interpreter
- (void)setupInterpreter
{
    NSError *error;
    NSString *path = [[NSBundle mainBundle] pathForResource:@"detect" ofType:@"tflite"];
    // Create the interpreter with the path to the model file; an options object can also be passed in.
    self.interpreter = [[TFLInterpreter alloc] initWithModelPath:path error:&error];
    if (![self.interpreter allocateTensorsWithError:&error]) {
        NSLog(@"Create interpreter error: %@", error);
    }
}
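The comment above mentions that an options object can also be passed in. A minimal sketch using TFLInterpreterOptions (the thread count here is just an example value, tune it for your device):

TFLInterpreterOptions *options = [[TFLInterpreterOptions alloc] init];
options.numberOfThreads = 2; // example value
NSError *error;
NSString *path = [[NSBundle mainBundle] pathForResource:@"detect" ofType:@"tflite"];
self.interpreter = [[TFLInterpreter alloc] initWithModelPath:path options:options error:&error];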
2. Set up the camera
- (void)setupCamera
{
    self.session = [[AVCaptureSession alloc] init];
    [self.session setSessionPreset:AVCaptureSessionPresetHigh]; // high-quality preset, usually 16:9
    // [self.session setSessionPreset:AVCaptureSessionPreset640x480]; // better to use this if you need 4:3 and want to avoid cropping yourself
    self.inputDevice = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo]; // default camera
    // self.inputDevice = [AVCaptureDevice defaultDeviceWithDeviceType:AVCaptureDeviceTypeBuiltInWideAngleCamera mediaType:AVMediaTypeVideo position:AVCaptureDevicePositionBack]; // explicitly pick the back wide-angle camera
    NSError *error;
    self.deviceInput = [AVCaptureDeviceInput deviceInputWithDevice:self.inputDevice error:&error];
    if ([self.session canAddInput:self.deviceInput]) {
        [self.session addInput:self.deviceInput];
    }
    self.previewLayer = [[AVCaptureVideoPreviewLayer alloc] initWithSession:self.session];
    [self.previewLayer setVideoGravity:AVLayerVideoGravityResizeAspectFill];
    // [self.previewLayer setVideoGravity:AVLayerVideoGravityResizeAspect]; // aspect-fit instead of fill
    CALayer *rootLayer = [[self view] layer];
    [rootLayer setMasksToBounds:YES];
    CGRect frame = self.view.frame;
    [self.previewLayer setFrame:frame];
    // [self.previewLayer setFrame:CGRectMake(0, 0, frame.size.width, frame.size.width * 4 / 3)];
    [rootLayer insertSublayer:self.previewLayer atIndex:0];
    // Add the overlay layer used for drawing detection boxes
    self.overlayView = [[OverlayView alloc] initWithFrame:self.previewLayer.bounds];
    [self.view addSubview:self.overlayView];
    self.overlayView.clearsContextBeforeDrawing = YES; // clear the drawing context before each redraw
    AVCaptureVideoDataOutput *videoDataOutput = [AVCaptureVideoDataOutput new];
    NSDictionary *rgbOutputSettings = [NSDictionary
        dictionaryWithObject:[NSNumber numberWithInt:kCMPixelFormat_32BGRA]
                      forKey:(id)kCVPixelBufferPixelFormatTypeKey];
    [videoDataOutput setVideoSettings:rgbOutputSettings];
    [videoDataOutput setAlwaysDiscardsLateVideoFrames:YES];
    dispatch_queue_t videoDataOutputQueue = dispatch_queue_create("VideoDataOutputQueue", DISPATCH_QUEUE_SERIAL);
    [videoDataOutput setSampleBufferDelegate:self queue:videoDataOutputQueue];
    if ([self.session canAddOutput:videoDataOutput]) {
        [self.session addOutput:videoDataOutput];
    }
    // [[videoDataOutput connectionWithMediaType:AVMediaTypeVideo] setEnabled:YES];
    [videoDataOutput connectionWithMediaType:AVMediaTypeVideo].videoOrientation = AVCaptureVideoOrientationPortrait; // fix the video orientation
    [self.session startRunning];
}
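Note that setupCamera assumes camera permission has already been granted (and that Info.plist contains an NSCameraUsageDescription entry). In a real app you would typically gate it on the authorization status first, roughly like this:

// Ask for camera access before configuring the capture session.
[AVCaptureDevice requestAccessForMediaType:AVMediaTypeVideo completionHandler:^(BOOL granted) {
    if (granted) {
        dispatch_async(dispatch_get_main_queue(), ^{
            [self setupCamera];
        });
    } else {
        NSLog(@"Camera access denied");
    }
}];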
3. In the video output delegate callback we perform the whole series of rotate / crop / scale acrobatics. A quiet aside: the memory issues and the scaling, cropping, and transform problems have thinned out my already sparse hair and turned a worrying amount of it white...
#pragma mark ------ AVCaptureVideoDataOutputSampleBufferDelegate
- (void)captureOutput:(AVCaptureOutput *)output didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    NSTimeInterval currentInterval = [[NSDate date] timeIntervalSince1970] * 1000;
    if (currentInterval - self.previousTime < self.delayBetweenMs) {
        return;
    }
    self.previousTime = currentInterval; // remember when this frame was processed so the throttle above actually takes effect
    /*
    if (connection.videoOrientation != self.videoOrientation) {
        // Switch the video orientation; not needed for the official pretrained model, so this can stay commented out
        connection.videoOrientation = self.videoOrientation;
    }
    */
    CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    size_t imageWidth = CVPixelBufferGetWidth(pixelBuffer);
    size_t imageHeight = CVPixelBufferGetHeight(pixelBuffer);
    // If the image needs to be rotated before recognition, the code below can be used,
    // but on iOS 13.4 it has problems releasing memory
    /*
    CVPixelBufferRef rotatePixel = pixelBuffer;
    switch (self.videoOrientation) {
        case 1:
            rotatePixel = [self rotateBuffer:pixelBuffer withConstant:0];
            break;
        case 2:
            rotatePixel = [self rotateBuffer:pixelBuffer withConstant:2];
            break;
        case 3:
            rotatePixel = [self rotateBuffer:pixelBuffer withConstant:1];
            break;
        case 4:
            rotatePixel = [self rotateBuffer:pixelBuffer withConstant:3];
            break;
        default:
            break;
    }
    */
    // If the image needs to be cropped and scaled before recognition, the code below can be used;
    // you have to define the crop rect yourself and compute the affine transform accordingly
    /*
    CGRect videoRect = CGRectMake(0, 0, imageWidth, imageHeight);
    CGSize scaledSize = CGSizeMake(300, 300);
    // Create a rectangle that meets the output size's aspect ratio, centered in the original video frame
    CGSize cropSize = CGSizeZero;
    if (imageWidth > imageHeight) {
        cropSize = CGSizeMake(imageWidth, imageWidth * 3 / 4);
    }
    else
    {
        cropSize = CGSizeMake(imageWidth, imageWidth * 4 / 3);
    }
    CGRect centerCroppingRect = AVMakeRectWithAspectRatioInsideRect(cropSize, videoRect);
    CVPixelBufferRef croppedAndScaled = [self createCroppedPixelBufferRef:pixelBuffer cropRect:centerCroppingRect scaleSize:scaledSize context:self.context];
    */
    // The official pretrained model expects 300 x 300 input, so here we simply scale the frame
    CVPixelBufferRef scaledPixelBuffer = [self resized:CGSizeMake(300, 300) cvpixelBuffer:pixelBuffer];
    // To check whether the scaled image looks right, it can be saved to the photo album
    /*
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(1 * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
        UIImage *image = [self imageFromSampleBuffer:scaledPixelBuffer];
        UIImageWriteToSavedPhotosAlbum(image, self, @selector(image:didFinishSavingWithError:contextInfo:), (__bridge void *)self);
    });
    */
    // TensorFlow Lite input and output handling
    NSError *error;
    TFLTensor *inputTensor = [self.interpreter inputTensorAtIndex:0 error:&error];
    NSData *imageData = [self rgbDataFromBuffer:scaledPixelBuffer isModelQuantized:inputTensor.dataType == TFLTensorDataTypeUInt8];
    [inputTensor copyData:imageData error:&error];
    if (![self.interpreter invokeWithError:&error]) {
        NSLog(@"Error++: %@", error);
    }
    // Output bounding boxes, as [top, left, bottom, right] ratios of the image size
    TFLTensor *outputTensor = [self.interpreter outputTensorAtIndex:0 error:&error];
    // Output class indices
    TFLTensor *outputClasses = [self.interpreter outputTensorAtIndex:1 error:nil];
    // Output confidence scores
    TFLTensor *outputScores = [self.interpreter outputTensorAtIndex:2 error:nil];
    // Output number of detected objects
    TFLTensor *outputCount = [self.interpreter outputTensorAtIndex:3 error:nil];
    // Convert the raw output into model objects
    NSArray<HFInference *> *inferences = [self formatTensorResultWith:[self transTFLTensorOutputData:outputTensor] indexs:[self transTFLTensorOutputData:outputClasses] scores:[self transTFLTensorOutputData:outputScores] count:[[self transTFLTensorOutputData:outputCount].firstObject integerValue] width:imageWidth height:imageHeight];
    NSLog(@"+++++++++++++");
    for (HFInference *inference in inferences) {
        NSLog(@"rect: %@ index %ld score: %f className: %@\n", NSStringFromCGRect(inference.boundingRect), inference.index, inference.confidence, inference.className);
    }
    NSLog(@"+++++++++++++");
    // Switch to the main thread to draw the results
    dispatch_async(dispatch_get_main_queue(), ^{
        [self drawOverLayWithInferences:inferences width:imageWidth height:imageHeight];
    });
}
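The debugging helper imageFromSampleBuffer: used in the commented-out save-to-album block above is not included in this post; a minimal Core Image based sketch of what it could look like (the signature is my assumption, based on how it is called with a CVPixelBufferRef):

// Hypothetical debug helper: converts a CVPixelBufferRef into a UIImage.
// Requires <CoreImage/CoreImage.h>.
- (UIImage *)imageFromSampleBuffer:(CVPixelBufferRef)pixelBuffer
{
    CIImage *ciImage = [CIImage imageWithCVPixelBuffer:pixelBuffer];
    CIContext *context = [CIContext contextWithOptions:nil];
    CGImageRef cgImage = [context createCGImage:ciImage fromRect:ciImage.extent];
    UIImage *image = [UIImage imageWithCGImage:cgImage];
    CGImageRelease(cgImage);
    return image;
}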
4. Preparing the data fed to the interpreter
Because the official pretrained model only accepts 300 x 300 x 3 image data, the CMSampleBufferRef coming from the video stream has to be scaled down to that size.
- 1. Scaling
// Scale a CVPixelBufferRef to the given size
- (CVPixelBufferRef)resized:(CGSize)size cvpixelBuffer:(CVPixelBufferRef)pixelBuffer
{
    CVPixelBufferLockBaseAddress(pixelBuffer, 0);
    // CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
    size_t imageWidth = CVPixelBufferGetWidth(pixelBuffer);
    size_t imageHeight = CVPixelBufferGetHeight(pixelBuffer);
    OSType pixelBufferType = CVPixelBufferGetPixelFormatType(pixelBuffer);
    assert(pixelBufferType == kCVPixelFormatType_32BGRA);
    size_t sourceRowBytes = CVPixelBufferGetBytesPerRow(pixelBuffer);
    NSInteger imageChannels = 4;
    unsigned char *sourceBaseAddr = (unsigned char *)(CVPixelBufferGetBaseAddress(pixelBuffer));
    vImage_Buffer inbuff = {sourceBaseAddr, (NSUInteger)imageHeight, (NSUInteger)imageWidth, sourceRowBytes};
    // NSInteger scaledImageRowBytes = ceil(size.width / 4) * 4 * imageChannels;
    NSInteger scaledImageRowBytes = vImageByteAlign(size.width * imageChannels, 64);
    unsigned char *scaledVImageBuffer = malloc((NSInteger)size.height * scaledImageRowBytes);
    if (scaledVImageBuffer == NULL) {
        CVPixelBufferUnlockBaseAddress(pixelBuffer, 0); // don't leave the source buffer locked on failure
        return nil;
    }
    vImage_Buffer outbuff = {scaledVImageBuffer, (NSUInteger)size.height, (NSUInteger)size.width, scaledImageRowBytes};
    vImage_Error scaleError = vImageScale_ARGB8888(&inbuff, &outbuff, nil, kvImageHighQualityResampling);
    if (scaleError != kvImageNoError) {
        free(scaledVImageBuffer);
        scaledVImageBuffer = NULL;
        CVPixelBufferUnlockBaseAddress(pixelBuffer, 0); // don't leave the source buffer locked on failure
        return nil;
    }
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
    CVPixelBufferRef scaledPixelBuffer = NULL;
    // CVReturn status = CVPixelBufferCreateWithBytes(nil, (NSInteger)size.width, (NSInteger)size.height, pixelBufferType, scaledVImageBuffer, scaledImageRowBytes, releaseCallback, nil, nil, &scaledPixelBuffer);
    NSDictionary *options = @{(NSString *)kCVPixelBufferCGImageCompatibilityKey: @YES,
                              (NSString *)kCVPixelBufferCGBitmapContextCompatibilityKey: @YES,
                              (NSString *)kCVPixelBufferMetalCompatibilityKey: @YES,
                              (NSString *)kCVPixelBufferWidthKey: [NSNumber numberWithInt:size.width],
                              (NSString *)kCVPixelBufferHeightKey: [NSNumber numberWithInt:size.height],
                              (id)kCVPixelBufferBytesPerRowAlignmentKey: @(32)};
    CVReturn status = CVPixelBufferCreateWithBytes(kCFAllocatorDefault, size.width, size.height, pixelBufferType, scaledVImageBuffer, scaledImageRowBytes, releaseCallback, nil, (__bridge CFDictionaryRef)options, &scaledPixelBuffer);
    options = NULL;
    if (status != kCVReturnSuccess) {
        free(scaledVImageBuffer);
        return nil;
    }
    return scaledPixelBuffer;
}
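The releaseCallback passed to CVPixelBufferCreateWithBytes above is not shown in the post. A minimal sketch of what it might look like, assuming its only job is to free the malloc'd vImage buffer once the pixel buffer is done with it:

// Assumed release callback: frees the scaled buffer allocated in resized: when
// the CVPixelBuffer created from it is released.
static void releaseCallback(void *releaseRefCon, const void *baseAddress)
{
    if (baseAddress != NULL) {
        free((void *)baseAddress);
    }
}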
- 2. Converting to the model's input data
- (NSData *)rgbDataFromBuffer:(CVPixelBufferRef)pixelBuffer isModelQuantized:(BOOL)isQuantized
{
    CVPixelBufferLockBaseAddress(pixelBuffer, 0);
    unsigned char *sourceData = (unsigned char *)(CVPixelBufferGetBaseAddress(pixelBuffer));
    if (!sourceData) {
        CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
        return nil;
    }
    size_t width = CVPixelBufferGetWidth(pixelBuffer);
    size_t height = CVPixelBufferGetHeight(pixelBuffer);
    size_t sourceRowBytes = CVPixelBufferGetBytesPerRow(pixelBuffer);
    int destinationChannelCount = 3;
    size_t destinationBytesPerRow = destinationChannelCount * width;
    vImage_Buffer inbuff = {sourceData, height, width, sourceRowBytes};
    unsigned char *destinationData = malloc(height * destinationBytesPerRow);
    if (destinationData == NULL) {
        CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
        return nil;
    }
    vImage_Buffer outbuff = {destinationData, height, width, destinationBytesPerRow};
    if (CVPixelBufferGetPixelFormatType(pixelBuffer) == kCVPixelFormatType_32BGRA)
    {
        vImageConvert_BGRA8888toRGB888(&inbuff, &outbuff, kvImageNoFlags);
    }
    else if (CVPixelBufferGetPixelFormatType(pixelBuffer) == kCVPixelFormatType_32ARGB)
    {
        vImageConvert_ARGB8888toRGB888(&inbuff, &outbuff, kvImageNoFlags);
    }
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
    CVPixelBufferRelease(pixelBuffer); // remember to release the scaled buffer created in resized:
    NSData *data = [[NSData alloc] initWithBytes:outbuff.data length:outbuff.rowBytes * height];
    if (destinationData != NULL) {
        free(destinationData);
        destinationData = NULL;
    }
    if (isQuantized) {
        return data;
    }
    // For a non-quantized (float) model the bytes have to be converted to floats in the 0..1 range
    Byte *bytesPtr = (Byte *)[data bytes];
    NSMutableData *rgbData = [[NSMutableData alloc] initWithCapacity:0];
    for (int i = 0; i < data.length; i++) {
        Byte byte = (Byte)bytesPtr[i];
        float bytf = (float)byte / 255.0;
        [rgbData appendBytes:&bytf length:sizeof(float)];
    }
    return rgbData;
}
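The byte-by-byte appendBytes loop above works, but it grows an NSMutableData four bytes at a time for 270,000 elements. A sketch of a drop-in replacement for that final loop (same local variable names assumed) that fills a preallocated float buffer in a single pass:

// Alternative float conversion: allocate the destination once and fill it directly.
NSUInteger count = data.length;
float *floatBuffer = malloc(count * sizeof(float));
if (floatBuffer == NULL) {
    return nil;
}
const Byte *src = (const Byte *)data.bytes;
for (NSUInteger i = 0; i < count; i++) {
    floatBuffer[i] = src[i] / 255.0f; // normalize each channel to 0..1
}
return [NSData dataWithBytesNoCopy:floatBuffer length:count * sizeof(float) freeWhenDone:YES];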
5. Processing the detection results
- The model returns four arrays, each of which has to be handled separately; this code lives in the video output delegate callback shown above
    // Output bounding boxes, as [top, left, bottom, right] ratios of the image size
    TFLTensor *outputTensor = [self.interpreter outputTensorAtIndex:0 error:&error];
    // Output class indices
    TFLTensor *outputClasses = [self.interpreter outputTensorAtIndex:1 error:nil];
    // Output confidence scores
    TFLTensor *outputScores = [self.interpreter outputTensorAtIndex:2 error:nil];
    // Output number of detected objects
    TFLTensor *outputCount = [self.interpreter outputTensorAtIndex:3 error:nil];
    // Convert the raw output into model objects
    NSArray<HFInference *> *inferences = [self formatTensorResultWith:[self transTFLTensorOutputData:outputTensor] indexs:[self transTFLTensorOutputData:outputClasses] scores:[self transTFLTensorOutputData:outputScores] count:[[self transTFLTensorOutputData:outputCount].firstObject integerValue] width:imageWidth height:imageHeight];
- (NSArray<HFInference *> *)formatTensorResultWith:(NSArray *)outputBoundingBox indexs:(NSArray *)indexs scores:(NSArray *)scores count:(NSInteger)count width:(CGFloat)width height:(CGFloat)height
{
    NSMutableArray<HFInference *> *arry = [NSMutableArray arrayWithCapacity:count];
    for (NSInteger i = 0; i < count; i++) {
        CGFloat confidence = [scores[i] floatValue];
        if (confidence < 0.5) {
            continue;
        }
        NSInteger index = [indexs[i] integerValue] + 1; // the official model's indices need the +1 offset
        CGRect rect = CGRectZero;
        UIEdgeInsets inset;
        [outputBoundingBox[i] getValue:&inset];
        rect.origin.y = inset.top;
        rect.origin.x = inset.left;
        rect.size.height = inset.bottom - rect.origin.y;
        rect.size.width = inset.right - rect.origin.x;
        CGRect newRect = CGRectApplyAffineTransform(rect, CGAffineTransformMakeScale(width, height));
        // For a custom model whose input has a fixed orientation, use the method below instead
        // CGRect newRect = [self fixOriginSizeWithInset:inset videoOrientation:self.videoOrientation width:width height:height];
        HFInference *inference = [HFInference new];
        inference.confidence = confidence;
        inference.index = index;
        inference.boundingRect = newRect;
        inference.className = [self loadLabels:@"labelmap"][index];
        [arry addObject:inference];
    }
    return arry;
}
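The loadLabels: helper used above, which maps a class index back to a name, is not shown in the post. A minimal sketch under the assumption that labelmap.txt ships in the bundle with one class name per line (the official file starts with a placeholder entry, which is also why the index is shifted by +1 above):

// Hypothetical implementation: reads labelmap.txt from the main bundle,
// returning one class name per line.
- (NSArray<NSString *> *)loadLabels:(NSString *)fileName
{
    NSString *path = [[NSBundle mainBundle] pathForResource:fileName ofType:@"txt"];
    NSString *content = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:nil];
    return [content componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
}

In practice you would cache the returned array instead of re-reading the file, since the formatting method above calls it once per recognized object.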
- (NSArray *)transTFLTensorOutputData:(TFLTensor *)outpuTensor
{
    NSMutableArray *arry = [NSMutableArray array];
    float output[40U];
    [[outpuTensor dataWithError:nil] getBytes:output length:(sizeof(float) * 40U)];
    if ([outpuTensor.name isEqualToString:@"TFLite_Detection_PostProcess"]) {
        for (NSInteger i = 0; i < 10U; i++) {
            // top left bottom right
            UIEdgeInsets inset = UIEdgeInsetsMake(output[4 * i + 0], output[4 * i + 1], output[4 * i + 2], output[4 * i + 3]);
            [arry addObject:[NSValue valueWithUIEdgeInsets:inset]];
        }
    }
    else if ([outpuTensor.name isEqualToString:@"TFLite_Detection_PostProcess:1"] || [outpuTensor.name isEqualToString:@"TFLite_Detection_PostProcess:2"])
    {
        for (NSInteger i = 0; i < 10U; i++) {
            [arry addObject:[NSNumber numberWithFloat:output[i]]];
        }
    }
    else if ([outpuTensor.name isEqualToString:@"TFLite_Detection_PostProcess:3"])
    {
        // NSNumber *count = output[0] ? [NSNumber numberWithFloat:output[0]] : [NSNumber numberWithFloat:0.0];
        NSNumber *count = @10;
        [arry addObject:count];
    }
    return arry;
}
6. Rendering and drawing the detection boxes
The formatted results still have to be converted into drawing data for the overlay view
- (void)drawOverLayWithInferences:(NSArray<HFInference *> *)inferences width:(CGFloat)width height:(CGFloat)height
{
    [self.overlayView.overlays removeAllObjects];
    [self.overlayView setNeedsDisplay];
    if (inferences.count == 0) {
        return;
    }
    NSMutableArray<Overlayer *> *overlays = @[].mutableCopy;
    for (HFInference *inference in inferences) {
        CGRect convertedRect = CGRectApplyAffineTransform(inference.boundingRect, CGAffineTransformMakeScale(self.overlayView.bounds.size.width / width, self.overlayView.bounds.size.height / height));
        if (convertedRect.origin.x < 0) {
            convertedRect.origin.x = 5;
        }
        if (convertedRect.origin.y < 0) {
            convertedRect.origin.y = 5;
        }
        if (CGRectGetMaxY(convertedRect) > CGRectGetMaxY(self.overlayView.bounds)) {
            convertedRect.size.height = CGRectGetMaxY(self.overlayView.bounds) - convertedRect.origin.y - 5;
        }
        if (CGRectGetMaxX(convertedRect) > CGRectGetMaxX(self.overlayView.bounds)) {
            convertedRect.size.width = CGRectGetMaxX(self.overlayView.bounds) - convertedRect.origin.x - 5;
        }
        Overlayer *layer = [Overlayer new];
        layer.borderRect = convertedRect;
        layer.color = UIColor.redColor;
        layer.name = [NSString stringWithFormat:@"%@ %.2f%%", inference.className, inference.confidence * 100];
        NSDictionary *dic = @{NSFontAttributeName: [UIFont systemFontOfSize:14]};
        layer.nameStringSize = [layer.name boundingRectWithSize:CGSizeMake(MAXFLOAT, 20) options:NSStringDrawingUsesLineFragmentOrigin attributes:dic context:nil].size;
        layer.font = [UIFont systemFontOfSize:14];
        layer.nameDirection = self.videoOrientation;
        [overlays addObject:layer];
    }
    self.overlayView.overlays = overlays;
    [self.overlayView setNeedsDisplay];
}
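The OverlayView / Overlayer drawing code itself is not part of this post. A rough sketch of what its drawRect: might do with the overlays built above (the property names follow the usage in the code; the drawing details are my own):

// Rough sketch of OverlayView's drawing pass: stroke each bounding box and
// draw the "className confidence%" label just above it.
- (void)drawRect:(CGRect)rect
{
    for (Overlayer *overlay in self.overlays) {
        UIBezierPath *path = [UIBezierPath bezierPathWithRect:overlay.borderRect];
        path.lineWidth = 2.0;
        [overlay.color setStroke];
        [path stroke];

        NSDictionary *attributes = @{NSFontAttributeName: overlay.font,
                                     NSForegroundColorAttributeName: overlay.color};
        CGPoint textOrigin = CGPointMake(CGRectGetMinX(overlay.borderRect),
                                         CGRectGetMinY(overlay.borderRect) - overlay.nameStringSize.height);
        [overlay.name drawAtPoint:textOrigin withAttributes:attributes];
    }
}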
Summary
The library itself is actually fairly simple to use (especially with a thoroughly trained, mature model, which recognizes objects at any orientation and angle and saves you a lot of work; our own model only supports landscape input, which made the handling far more annoying). The points to watch out for are memory management, the byte-to-float precision conversion for non-quantized models, and mapping the output coordinates back onto the screen. I'll stop here (the rest is tears; this is only the simplified demo I extracted, and plenty of uglier work remains). Here is the demo: 传送门