This example extends the previous one, but it first requires downloading a YouTube video.
For the rest of the environment setup, refer to the previous post: https://www.jianshu.com/p/3ac3f54636f8
Download yt-dlp, which is used to fetch the YouTube video file.
yt-dlp download address:
https://gitlab.com/zhuge20100104/cpp_practice/-/blob/master/simple_learn/deep_learning/15_tensorflow_object_detection_api_in_video/yt-dlp?ref_type=heads
Write a download.sh script to simplify the download step. The code of download.sh is as follows:
#!/bin/bash
if [[ "$#" -lt 1 ]]; then
    echo "Usage: ./download.sh {YouTube video link}"
    exit 1
fi
./yt-dlp "${1}" --proxy http://10.224.0.110:3128 --yes-playlist -f best
Use download.sh to download the cat video needed for detection:
chmod a+x ./download.sh
./download.sh https://www.youtube.com/watch?v=IzluNxh-8_o
When the download finishes, rename the downloaded video to cats.mp4 (the code below reads cats.mp4).
Note that all of the steps above need special network access to work; don't ask why.
Also, download.sh has to run in a Linux environment; running it directly inside the TensorFlow docker container works well.
The code for detecting objects in video is shown below.
The full notebook is available at:
https://gitlab.com/zhuge20100104/cpp_practice/-/blob/master/simple_learn/deep_learning/15_tensorflow_object_detection_api_in_video/15.%20Tensorflow%20Objection%20API%20in%20Video.ipynb?ref_type=heads
# imports
import os
os.environ['KERAS_BACKEND'] = 'jax'

import tensorflow as tf
from tensorflow import data as tf_data
import tensorflow_datasets as tfds
import keras
import keras_cv
import numpy as np
from keras_cv import bounding_box
from keras_cv import visualization
import tqdm
# 3. env setup
%matplotlib inline
# For more details, see: https://keras.io/guides/keras_cv/object_detection_keras_cv/
# Let's get started by constructing a YOLOV8Detector pretrained on the pascalvoc dataset.
pretrained_model = keras_cv.models.YOLOV8Detector.from_preset(
    "yolo_v8_m_pascalvoc", bounding_box_format="xywh"
)

# Resize the image to the model-compatible input size
inference_resizing = keras_cv.layers.Resizing(
    640, 640, pad_to_aspect_ratio=True, bounding_box_format='xywh'
)
# keras_cv.visualization.plot_bounding_box_gallery() supports a class_mapping parameter to
# highlight what class each box was assigned to. Let's assemble a class mapping now.
class_ids = [
    "Aeroplane",
    "Bicycle",
    "Bird",
    "Boat",
    "Bottle",
    "Bus",
    "Car",
    "Cat",
    "Chair",
    "Cow",
    "Dining Table",
    "Dog",
    "Horse",
    "Motorbike",
    "Person",
    "Potted Plant",
    "Sheep",
    "Sofa",
    "Train",
    "Tvmonitor",
    "Total",
]
class_mapping = dict(zip(range(len(class_ids)), class_ids))
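For reference, YOLOV8Detector.predict returns a dict with "boxes", "confidence" and "classes", and unused detection slots are padded with -1. A small sketch of turning one frame's class ids into readable names via a mapping like class_mapping above (the prediction row here is hand-made mock data, not real model output):

```python
def decode_classes(pred_classes, mapping):
    """Drop the -1 padding slots and map the remaining class ids to names."""
    return [mapping[c] for c in pred_classes if c != -1]

# Mock of one row of y_pred["classes"]: two detections followed by padding.
mock_row = [7, 14, -1, -1]
print(decode_classes(mock_row, {7: "Cat", 14: "Person"}))  # ['Cat', 'Person']
```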
import imageio
from datetime import datetime

input_video = 'cats'
video_reader = imageio.get_reader('{}.mp4'.format(input_video))
video_writer = imageio.get_writer('{}_annotated.mp4'.format(input_video), fps=10)
t0 = datetime.now()
n_frames = 0

for frame in video_reader:
    if n_frames > 10000:
        break
    n_frames += 1
    # print(frame.shape)
    # This can be used as our inference preprocessing pipeline:
    image_batch = inference_resizing([frame])
    y_pred = pretrained_model.predict(image_batch)
    # The annotated image can now be written to the output video
    image_with_boxes = visualization.draw_bounding_boxes(
        image_batch,
        bounding_boxes=y_pred,
        color=(0, 255, 0),
        bounding_box_format="xywh",
        class_mapping=class_mapping,
    )
    image_with_boxes = image_with_boxes.reshape(640, 640, 3)
    video_writer.append_data(image_with_boxes)

fps = n_frames / (datetime.now() - t0).total_seconds()
print('Frames processed: {}, speed: {} fps'.format(n_frames, fps))

# Clean up
video_writer.close()
video_reader.close()
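A note on the bounding_box_format="xywh" used throughout: in keras_cv this means (left, top, width, height). keras_cv.bounding_box.convert_format can translate between formats; for "xywh" to corner coordinates the underlying arithmetic is just this (a pure-Python sketch, not the library code itself):

```python
def xywh_to_xyxy(box):
    """Convert (left, top, width, height) to corner form (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)

print(xywh_to_xyxy((100, 50, 200, 120)))  # (100, 50, 300, 170)
```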
The annotated video produced by the program looks like this: