This article belongs to 沈庆阳. Please contact the author before reposting.
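Consider how people perceive the world. When our eyes capture an image of the real world, the image travels along the optic nerve to the visual center of the brain, where it becomes vision. A digital camera works on the same principle: light reflected from the real world reaches the camera's image sensor, and the sensor passes its signal to the camera's processing chip, which forms the image.
But forming an image without understanding it would leave us no better off than a plant. Our brain processes what we see, and that is how we come to recognize flowers and grass, our pet cats, and all kinds of other objects. Over years of living, the neural networks in the human brain form specialized circuits that let us recognize the things around us.
So how can we get an artificial intelligence program to recognize the world?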
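What Is Object Detection
Object detection is a branch of computer vision. It usually means drawing a box around each object in an image and then identifying what that object is. In other words, object detection combines classification and localization.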
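To make "classification plus localization" concrete, here is a minimal sketch of what a single detection result carries. The names `Detection` and `keep_confident` are hypothetical, and the sample values are made up for illustration. The box order `(ymin, xmin, ymax, xmax)`, normalized to the range 0 to 1, follows the convention used by the TensorFlow Object Detection API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detected object (hypothetical container, for illustration only)."""
    label: str    # classification: what the object is
    score: float  # confidence, between 0 and 1
    box: Tuple[float, float, float, float]  # localization: (ymin, xmin, ymax, xmax)


def keep_confident(detections: List[Detection], min_score: float = 0.5) -> List[Detection]:
    """Return only the detections whose confidence is at least min_score."""
    return [d for d in detections if d.score >= min_score]


# Made-up sample values, not real model output.
detections = [
    Detection("person", 0.92, (0.10, 0.20, 0.80, 0.45)),
    Detection("kite", 0.81, (0.05, 0.60, 0.20, 0.75)),
    Detection("kite", 0.30, (0.30, 0.10, 0.35, 0.15)),
]
confident = keep_confident(detections)
print([d.label for d in confident])  # prints ['person', 'kite']
```

Dropping low-confidence results like this is a common way to keep spurious boxes out of the final picture.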
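In the image above, we can see that object detection lets the AI program box the people and kites in the picture and list each object's class name and confidence score.
With object detection, we can let AI recognize the world. It can sort photos quickly, let self-driving cars recognize the people, vehicles, and roads they encounter, identify key customers walking into a bank, and more. Object detection opens up a wide range of possibilities for AI applications.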
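TensorFlow's Object Detection API
TensorFlow's Object Detection API provides accurate machine learning models that can localize and identify multiple objects in a single image. Classifying and localizing objects has long been a hard problem in computer vision and a challenge for machine learning. With Google's AI framework TensorFlow, however, object classification and localization can run fast enough for real-time use.
In the previous section, we downloaded the Models repository from TensorFlow's GitHub page: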
https://github.com/tensorflow/models/
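The TensorFlow Models repository contains a variety of trained models.
Among them, the Official Models are provided officially: they are implemented with TensorFlow's high-level APIs and maintained by the official team, so they tend to be more stable.
The Research Models are models described in papers published by AI researchers, who release and maintain them themselves.
The Object Detection code we need is in Research Models.
If you are told that some software is missing as you continue, install it with the apt-get install command or with pip.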
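As a rough aid for spotting missing Python packages before you start, the sketch below checks whether the modules imported by the program in this article can be found. The helper name `find_missing` is hypothetical. Keep in mind that a module's import name can differ from its pip package name (for example, `PIL` comes from the `pillow` package and `cv2` from `opencv-python`).

```python
import importlib.util


def find_missing(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names used by the program in this article.
REQUIRED = ["numpy", "tensorflow", "matplotlib", "PIL", "cv2", "six"]

missing = find_missing(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

The check only looks the modules up and does not import them, so it runs quickly even when TensorFlow is installed.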
Go to the Object Detection directory and start Jupyter Notebook there.
cd <Models Path>/research/object_detection
jupyter notebook
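This opens a web page listing your current directory. Find object_detection_tutorial.ipynb in the working directory and click to open it; you will see the following interface.
About Jupyter Notebook:
Jupyter Notebook (formerly IPython notebook) is an interactive notebook that supports running more than 40 programming languages.
With Jupyter Notebook you can write attractive interactive documents in which you can edit code, run it, and inspect its output.
Click the Cell menu at the top, choose Run All, and wait patiently. While it runs, scroll to the bottom of the page; before long the result of the object detection program appears.
Good. The program recognized the two dogs in the picture, boxed their positions, and labeled each with a confidence score. But we don't just want to recognize the dogs in the tutorial image; our goal is to have the AI program recognize the things around us through a camera.
Modifying the Program
Back in Jupyter Notebook, choose File->Download as->Python(.py) from the menu at the top left to export the notebook as Python code.
The modified Python code is as follows: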
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
import cv2
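cap = cv2.VideoCapture(0)  # open the default camera (device index 0)
# Compare version numbers numerically; a plain string comparison would
# wrongly treat e.g. '1.10.0' as older than '1.4.0'.
if tuple(int(part) for part in tf.__version__.split('.')[:2]) < (1, 4):
    raise ImportError('Please upgrade your tensorflow installation to v1.4.* or later!')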
# ## Env setup
# In[2]:
# This is needed to display the images.
#get_ipython().magic(u'matplotlib inline')
# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
# ## Object detection imports
# Here are the imports from the object detection module.
# In[3]:
from utils import label_map_util
from utils import visualization_utils as vis_util
# # Model preparation
# ## Variables
#
# Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_CKPT` to point to a new .pb file.
#
# By default we use an "SSD with Mobilenet" model here. See the [detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) for a list of other models that can be run out-of-the-box with varying speeds and accuracies.
# In[4]:
# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
NUM_CLASSES = 90
# ## Download Model
# In[5]:
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
    file_name = os.path.basename(file.name)
    if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())
# ## Load a (frozen) Tensorflow model into memory.
# In[6]:
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
# In[7]:
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
# ## Helper code
# In[8]:
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)
# # Detection
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 3) ]
# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)
# In[10]:
with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        # Define input and output Tensors for detection_graph
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # Each box represents a part of the image where a particular object was detected.
        detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Each score represents the level of confidence for each of the objects.
        # Score is shown on the result image, together with the class label.
        detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        while True:
            ret, image_np = cap.read()
            if not ret:
                # No frame was returned (camera missing or disconnected).
                break
            # OpenCV delivers frames in BGR channel order, while the model expects RGB,
            # so convert a copy for inference and keep the BGR frame for display.
            image_rgb = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_rgb, axis=0)
            # Actual detection.
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
            cv2.imshow('object detection', cv2.resize(image_np, (800, 600)))
            # Press 'q' in the display window to quit.
            if cv2.waitKey(25) & 0xFF == ord('q'):
                break
# Release the camera and close the display window.
cap.release()
cv2.destroyAllWindows()
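A side note on the `boxes` array returned by `sess.run`: each detection is stored as four values in the order `[ymin, xmin, ymax, xmax]`, normalized to the range 0 to 1, which is why the visualization call passes `use_normalized_coordinates=True`. If you want to crop or measure a detected object yourself, you need pixel coordinates instead. The helper `box_to_pixels` below is a hypothetical sketch of that conversion and is not part of the Object Detection API.

```python
def box_to_pixels(box, frame_width, frame_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = box
    left = int(round(xmin * frame_width))
    top = int(round(ymin * frame_height))
    right = int(round(xmax * frame_width))
    bottom = int(round(ymax * frame_height))
    return left, top, right, bottom


# A box covering the right half of an 800x600 frame, vertically centered.
print(box_to_pixels([0.25, 0.5, 0.75, 1.0], 800, 600))  # prints (400, 150, 800, 450)
```

Note the order: the model reports y before x, while most drawing and cropping code expects x first.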
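In the modified program, we added python-opencv to grab frames from the computer's camera and pass them to the original program for processing, and at the end we use OpenCV's imshow function to display the result. The program runs as follows:
Console output: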
root@jack-QTJ5:~/Dev/tensorflow/models/research/object_detection# python object_detection_CONVERTED.py
2018-01-12 16:09:56.439397: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-01-12 16:09:56.672274: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-12 16:09:56.672731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 860M major: 5 minor: 0 memoryClockRate(GHz): 1.0195
pciBusID: 0000:01:00.0
totalMemory: 1.96GiB freeMemory: 1.74GiB
2018-01-12 16:09:56.672748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 860M, pci bus id: 0000:01:00.0, compute capability: 5.0)
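At this point the program is running: it reads frames from the camera and labels the names and positions of the objects known to the common-objects model, fast enough to count as real time. Building on TensorFlow's Object Detection API, we can add all kinds of remarkable features to our smart cars, smart homes, drones, and so on.
In the next section, we will start learning how to train our own model for the objects we want to recognize.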
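If you want to put a number on "real time", one simple approach is to measure frames per second inside the capture loop. The class `FpsMeter` below is a hypothetical sketch, not part of OpenCV or TensorFlow. It averages over the most recent frames, and the short demo loop uses `time.sleep` as a stand-in for the actual capture and detection work.

```python
import collections
import time


class FpsMeter:
    """Rolling frames-per-second estimate over the most recent frames."""

    def __init__(self, window=30, clock=time.perf_counter):
        self._clock = clock
        # Keep only the timestamps of the last `window` frames.
        self._stamps = collections.deque(maxlen=window)

    def tick(self):
        """Record one frame and return the current FPS estimate (0.0 until two frames exist)."""
        self._stamps.append(self._clock())
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0


# Demo: simulate a few frames; time.sleep stands in for capture + detection.
meter = FpsMeter()
fps = 0.0
for _ in range(5):
    time.sleep(0.01)
    fps = meter.tick()
print("approximate FPS: {:.1f}".format(fps))
```

In the program above, you would call `meter.tick()` once per iteration of the `while True` loop and print or draw the returned value.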