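Today I'm writing a bit about lane detection.
Lane detection is one of the most basic functions of an in-vehicle camera, and a very important one. Under ideal conditions the algorithm is simple, but depending on road conditions it can get quite complicated.
This post skips the complicated cases and starts from the simplest one.
I actually think the order of the Udacity projects is well chosen. Before this project I had never touched anything like OpenCV, and it was through this project that I learned how OpenCV is actually used. Another fun detail: the whole project is done in Python, and before taking this course I didn't even know what Python was. So the beginning was incredibly painful. I didn't know Python syntax and couldn't even use append. Lists, tuples, classes: I couldn't use any of them. You can imagine how hard it was.
Some people who, like me, are just starting out with Python wonder why they have to use Jupyter. My answer is that Jupyter lets you run and inspect code one piece at a time, which is a great feature (one of many Jupyter has). After writing a pile of code I would find there was an error somewhere but have no idea where it came from. Frustrating. The only way out was to start from the top in Jupyter and run the cells one by one, bit by bit, until I found the error and had fixed every one of them.
I'm getting a bit off topic.
Back to business: let's start on the simple lane detection code.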
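1 Link to the code
One reminder: the whole first term of the Udacity course is done in Python. Without a Python background it is relatively hard to do, and the code is hard to follow as well. So I recommend learning some Python before you start.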
https://github.com/Fred159/My-Udacity-Project1-Lane_Extraction/blob/master/Ming-Project1-final.ipynb
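2 Code environment
I worked in Jupyter Notebook, installed through Anaconda. For the specific steps, please see the installation guides online. (I may later write up how to install TensorFlow and OpenCV; I found the installation process rather painful.)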
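3 Purpose of the project
The purpose of the project is simple: feed a video into the algorithm, let the algorithm do its computation, and output images or a video with the lines on both sides of the lane marked (the lines are, of course, produced as x, y coordinates).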
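4 Knowledge involved
Using the OpenCV library (see the official documentation):
cv2.inRange()  # for color selection
cv2.fillPoly()  # for region selection
cv2.line()  # to draw lines on an image given endpoints
cv2.addWeighted()  # to co-add / overlay two images
cv2.cvtColor()  # to convert to grayscale or change color
cv2.imwrite()  # to output images to file
cv2.bitwise_and()  # to apply a mask to an image
A basic understanding of RGB
Understanding x, y coordinates in an image
Grayscale images
ROI (region of interest)
Edge detection
Hough transform (a line detection algorithm)
Understanding what each operator means
Understanding what threshold comparisons such as x > low_threshold and x < high_threshold mean in code
The Jupyter notebook template provided by Udacity also uses an HTML library, but we only need to know its basic usage, so there is no need to worry about it too much.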
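To make the threshold item in the list above concrete, here is a minimal sketch of color selection by thresholding, written with NumPy only. The function name `select_bright_pixels` and the threshold value 200 are made up for illustration; they are not taken from the project notebook.

```python
import numpy as np

def select_bright_pixels(image, rgb_threshold=(200, 200, 200)):
    """Return a copy of `image` where pixels below any channel threshold are blacked out."""
    thresholds = np.array(rgb_threshold)
    # True where at least one channel falls below its threshold
    below = (image < thresholds).any(axis=2)
    selected = image.copy()
    selected[below] = 0
    return selected

# Tiny 1x2 test image: one near-white pixel, one dark gray pixel
demo = np.array([[[250, 250, 250], [90, 90, 90]]], dtype=np.uint8)
result = select_bright_pixels(demo)
print(result[0, 0].tolist(), result[0, 1].tolist())
```

The near-white pixel survives and the gray one is set to black, which is roughly why white lane paint is easy to pick out on dark asphalt.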
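5 Code walkthrough
5.1 Importing libraries
The very first step is importing libraries. Importing a library serves the same purpose as include in C: it brings in all the function packages we need so they are ready to use. Here we import the libraries below. matplotlib and numpy are easy to install and import, but cv2 is really hard to get installed, and installing cv2 later alongside the GPU version of TensorFlow is even more of a struggle. Off topic again.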
#importing some useful packages
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import cv2
%matplotlib inline
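Here %matplotlib inline is a special Jupyter notebook feature called a magic command. What does it do? By default matplotlib does not display plots between the notebook's code cells, so %matplotlib inline tells matplotlib to render each plot inline, right below the cell that produced it.
import ... as ... binds the imported package or module to a short alias of our choosing.
Python packages are addressed as a.b.c: package a contains b, and b contains c. For example, matplotlib.pyplot is the pyplot module inside the matplotlib package. It is a nested structure, much like the namespaces or nested classes you see in other languages.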
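5.2 Reading an image
Why do we read an image when the project is supposed to take video as input? The explanation is that every video is a sequence of images. FPS stands for frames per second, that is, how many images are shown each second. At 30 FPS, 30 images are shown per second. So processing images and processing video are basically the same thing: our algorithm processes 30 images per second, and some other code then plays them back one after another or assembles them into a video.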
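The frame-rate arithmetic can be sketched in a few lines. The helper names `frame_budget_ms` and `frames_in_clip` are hypothetical and only illustrate the calculation.

```python
def frame_budget_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps

def frames_in_clip(fps, seconds):
    """Number of frames the pipeline must process for a clip of the given length."""
    return int(round(fps * seconds))

print(frame_budget_ms(30))     # roughly 33.3 ms per frame
print(frames_in_clip(30, 10))  # 300 frames in a 10-second clip
```

For real-time use, everything done per image has to fit inside that per-frame budget.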
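Here we use mpimg. Note that both mpimg and cv2 can read images, but they store the channels in a different order: mpimg returns the data in RGB order, while cv2 returns it in BGR order. Nothing about the data changes except the order. RGB means red, green, blue, and BGR means blue, green, red. These three components are usually called the three channels. Why these three colors? Because they are the additive primary colors: mixing them in different proportions produces the colors a display can show.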
#reading in an image
image = mpimg.imread('test_images/solidWhiteRight.jpg')
#printing out some stats and plotting
print('This image is:', type(image), 'with dimensions:', image.shape)
plt.imshow(image) #call as plt.imshow(gray, cmap='gray') to show a grayscaled image
The image that was read in
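The RGB/BGR difference is just a reversed channel axis, which a short NumPy sketch can show. The helper `bgr_to_rgb` is a made-up name for illustration; in OpenCV itself the equivalent call is `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`.

```python
import numpy as np

def bgr_to_rgb(image):
    """Reverse the channel axis: BGR -> RGB (the same operation also maps RGB -> BGR)."""
    return image[..., ::-1]

# One pure-red pixel as cv2.imread would store it (B=0, G=0, R=255)
bgr_pixel = np.array([[[0, 0, 255]]], dtype=np.uint8)
rgb_pixel = bgr_to_rgb(bgr_pixel)
print(rgb_pixel[0, 0].tolist())  # [255, 0, 0]
```

If an image read with cv2 is shown with plt.imshow without this conversion, red and blue appear swapped.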
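5.4 Building the helper functions
What are helper functions?
They are simply a bunch of functions whose job is to keep the main code short.
Here is what they cover:
Grayscale conversion. Turns a color image into a black-and-white one. Note that "black and white" here does not mean 0 or 1 but a range of 0 to 255, because there are shades of gray in between. The larger the number, the whiter the pixel: 0 is completely black, around 125 is gray, and 255 is completely white. That is why it is called a grayscale image rather than a black-and-white image. Keyword: color to grayscale
Canny operator, used for edge detection. Keyword: edge detection
Gaussian blur (smooths the image by averaging each pixel with its neighbors, which suppresses noise before edge detection). Keyword: noise suppression
Region of interest, ROI for short. As the name suggests, we only consider and only compute on what lies inside the region we care about. Keyword: region we care about
Draw lines: given the coordinates of two points (x1, y1, x2, y2), draw a line on the image. Keyword: drawing lines
Weighted img is there for visualization. We have computed the lane lines, so the marked lane lines need to be overlaid on the original image. Keyword: overlay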
import math
def grayscale(img):
    """Applies the Grayscale transform
    This will return an image with only one color channel
    but NOTE: to see the returned image as grayscale
    you should call plt.imshow(gray, cmap='gray')"""
    return cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    # Or use BGR2GRAY if you read an image with cv2.imread()
    # return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
def canny(img, low_threshold, high_threshold):
    """Applies the Canny transform"""
    return cv2.Canny(img, low_threshold, high_threshold)

def gaussian_blur(img, kernel_size):
    """Applies a Gaussian blur (smoothing) kernel; kernel_size must be odd"""
    return cv2.GaussianBlur(img, (kernel_size, kernel_size), 0)

def region_of_interest(img, vertices):
    """
    Applies an image mask.
    Only keeps the region of the image defined by the polygon
    formed from `vertices`. The rest of the image is set to black.
    """
    # defining a blank mask to start with
    mask = np.zeros_like(img)
    # defining a 3 channel or 1 channel color to fill the mask with depending on the input image
    if len(img.shape) > 2:
        channel_count = img.shape[2]  # i.e. 3 or 4 depending on your image
        ignore_mask_color = (255,) * channel_count
    else:
        ignore_mask_color = 255
    # filling pixels inside the polygon defined by "vertices" with the fill color
    cv2.fillPoly(mask, vertices, ignore_mask_color)
    # returning the image only where mask pixels are nonzero
    masked_image = cv2.bitwise_and(img, mask)
    return masked_image

def draw_lines(img, lines, color=[255, 0, 0], thickness=10):
    """
    NOTE: this is the function you might want to use as a starting point once you want to
    average/extrapolate the line segments you detect to map out the full
    extent of the lane (going from the result shown in raw-lines-example.mp4
    to that shown in P1_example.mp4).
    Think about things like separating line segments by their
    slope ((y2-y1)/(x2-x1)) to decide which segments are part of the left
    line vs. the right line. Then, you can average the position of each of
    the lines and extrapolate to the top and bottom of the lane.
    This function draws `lines` with `color` and `thickness`.
    Lines are drawn on the image inplace (mutates the image).
    If you want to make the lines semi-transparent, think about combining
    this function with the weighted_img() function below
    """
    for line in lines:
        for x1, y1, x2, y2 in line:
            cv2.line(img, (x1, y1), (x2, y2), color, thickness)

def hough_lines(img, rho, theta, threshold, min_line_len, max_line_gap):
    """
    `img` should be the output of a Canny transform.
    Returns an image with hough lines drawn.
    """
    lines = cv2.HoughLinesP(img, rho, theta, threshold, np.array([]),
                            minLineLength=min_line_len, maxLineGap=max_line_gap)
    line_img = np.zeros((*img.shape, 3), dtype=np.uint8)
    draw_lines(line_img, lines)
    return line_img

# Python 3 has support for cool math symbols.
def weighted_img(img, initial_img, α=0.8, β=1., λ=0.):
    """
    `img` is the output of the hough_lines(), An image with lines drawn on it.
    Should be a blank image (all black) with lines drawn on it.
    `initial_img` should be the image before any processing.
    The result image is computed as follows:
    initial_img * α + img * β + λ
    NOTE: initial_img and img must be the same shape!
    """
    return cv2.addWeighted(initial_img, α, img, β, λ)

# Removing noise slopes from the averaging performed below in process_image
def remove_noise(slopes, m=2):
    mean_value = np.mean(slopes)
    stand_deviation = np.std(slopes)
    # Build a new list: removing items from a list while iterating over it skips elements
    return [slope for slope in slopes
            if abs(slope - mean_value) <= (m * stand_deviation)]
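As a standalone illustration of the slope outlier filtering idea, here is a minimal sketch using only the standard library. The name `filter_slopes` and the sample slope values are invented for the example; `pstdev` is used because it is the population standard deviation, which is what `np.std` computes by default.

```python
from statistics import mean, pstdev

def filter_slopes(slopes, m=2):
    """Keep slopes within m population standard deviations of the mean."""
    center = mean(slopes)
    spread = pstdev(slopes)
    # Build a new list instead of removing items from the list being iterated
    return [s for s in slopes if abs(s - center) <= m * spread]

# Nine plausible left-lane slopes plus one stray segment
slopes = [-0.70, -0.72, -0.68, -0.71, -0.69, -0.70, -0.73, -0.67, -0.70, 0.10]
kept = filter_slopes(slopes)
print(len(kept), 0.10 in kept)
```

One design note: with very few segments a single outlier inflates the standard deviation so much that it may survive the filter, so this kind of test works better when many segments are detected.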
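5.5 Main program
Below is the main program, process_image (its input is one image).
To keep up with the video in real time, all the code in this function has to finish within 1/FPS seconds. Honestly, since this was my first introduction to CV, I was amazed that a computer could do so much in such a short time. Only when I wrote more complex CV code later did I realize that a computer can handle even more. Still, processing time really is an obstacle to applying computer vision in self-driving cars.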
def process_image(img):
    # Find the size of the image (use the frame passed in, not the global `image`)
    xsize, ysize = [img.shape[1], img.shape[0]]
    # Copy the image to modify
    origin_image = np.copy(img)
    # Make a gray image
    gray = grayscale(img)
    # Smooth with Gaussian blur
    kernel_size = 9
    blur_gray = gaussian_blur(gray, kernel_size)
    # Use the Canny operator to extract edges
    low_threshold = 90
    high_threshold = 180
    edges = canny(blur_gray, low_threshold, high_threshold)
    # Define the region of interest
    imshape = img.shape
    vertices = np.array([[(0, imshape[0]), (470, 320), (550, 320), (imshape[1], imshape[0])]], dtype=np.int32)
    masked_edges = region_of_interest(edges, vertices)
    # Define the Hough transform parameters
    # Make a blank the same size as our image to draw on
    rho = 6  # distance resolution in pixels of the Hough grid
    theta = np.pi/180  # angular resolution in radians of the Hough grid
    threshold = 50  # minimum number of votes (intersections in Hough grid cell)
    min_line_len = 25  # minimum number of pixels making up a line
    max_line_gap = 25  # maximum gap in pixels between connectable line segments
    line_image = np.copy(img)*0  # creating a blank to draw lines on
    # Run Hough on edge detected image
    # Output "lines" is an array containing endpoints [x1,y1,x2,y2] of detected line segments
    lines = cv2.HoughLinesP(masked_edges, rho, theta, threshold, np.array([]), min_line_len, max_line_gap)
    # Make lists of the lines and slopes for averaging
    left_lines = []
    left_slopes = []
    right_slopes = []
    right_lines = []
    for line in lines:
        for x1, y1, x2, y2 in line:
            if x2 == x1:
                continue  # skip vertical segments: their slope is undefined
            slope = (y2 - y1) / (x2 - x1)
            if slope < 0:
                left_lines.append(line)
                left_slopes.append(slope)
            else:
                right_lines.append(line)
                right_slopes.append(slope)
    # Average line positions. zip(*list) groups the elements column by column; * unpacks the list
    mean_left_pos = [sum(column)/len(column) for column in zip(*left_lines)]
    mean_right_pos = [sum(column)/len(column) for column in zip(*right_lines)]
    # Remove slope outliers, and take the average
    mean_left_slope = np.mean(remove_noise(left_slopes))
    mean_right_slope = np.mean(remove_noise(right_slopes))
    # Extrapolate the left and right lines to the mask boundaries: y_top = 320, y_bottom = 540
    mean_left_line = []
    mean_right_line = []
    for x1, y1, x2, y2 in mean_left_pos:
        x = int(np.mean([x1, x2]))  # Midpoint x
        y = int(np.mean([y1, y2]))  # Midpoint y
        slope = mean_left_slope
        # Based on y = mx + b, calculate b = y - mx
        b = y - (slope * x)
        mean_left_line = [int((320 - b) / slope), 320, int((540 - b) / slope), 540]
    for x1, y1, x2, y2 in mean_right_pos:
        x = int(np.mean([x1, x2]))
        y = int(np.mean([y1, y2]))
        slope = mean_right_slope
        b = y - (slope * x)
        mean_right_line = [int((320 - b) / slope), 320, int((540 - b) / slope), 540]
    # The final lines of the lane
    lines = [[mean_left_line], [mean_right_line]]
    # Draw the lines on line_image
    draw_lines(line_image, lines)
    # Overlay the processed line image on the original
    weighted_image = weighted_img(line_image, img)
    # Return the weighted image from process_image
    return weighted_image
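A jumbled mess that's hard to follow? Yes. So the important parts need to be pulled out and explained. (For anything not covered here, reading the code a few more times should make it clear. If it still doesn't make sense, leave me a comment.)
The line below gets the number of pixels in the x and y directions of the image. With these we can locate a pixel by its position and edit it.
xsize,ysize = [img.shape[1],img.shape[0]]
The next lines use the function we defined in the helper functions to turn the color image into a grayscale image.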
#make a gray image
gray = grayscale(img)
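For intuition, grayscale conversion can be sketched as a weighted sum of the three channels. This is a NumPy sketch using the common luma weights 0.299, 0.587 and 0.114; `rgb_to_gray` is a made-up name, and the exact rounding inside `cv2.cvtColor` may differ by one gray level.

```python
import numpy as np

def rgb_to_gray(image):
    """Weighted sum of R, G, B using the common luma weights 0.299, 0.587, 0.114."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = image.astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# One white, one black and one pure-red pixel
pixels = np.array([[[255, 255, 255], [0, 0, 0], [255, 0, 0]]], dtype=np.uint8)
gray = rgb_to_gray(pixels)
print(gray.tolist())  # [[255, 0, 76]]
```

The output has a single channel per pixel, which is why plt.imshow needs cmap='gray' to display it as a grayscale picture.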
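This also uses a helper function, this time to smooth the image. Gaussian blur replaces each pixel with a weighted average of its neighbors, so small pixel-level noise is suppressed and the image becomes slightly blurry. That keeps the edge detector from reacting to every speck. kernel_size is a hand-picked value (it must be odd); I set it by feel.
#Smooth with gaussian blur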
kernel_size = 9
blur_gray = gaussian_blur(gray, kernel_size)
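What the blur does can be sketched in one dimension with plain Python. The helper names and the sigma value of 1.0 are assumptions for illustration; OpenCV derives sigma from the kernel size when 0 is passed and handles image borders differently from the simple clamping used here.

```python
import math

def gaussian_kernel(size, sigma):
    """1-D Gaussian weights of odd length `size`, normalised to sum to 1."""
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, kernel):
    """Convolve `signal` with `kernel`, repeating the edge values at the borders."""
    half = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), last)  # clamp index to the valid range
            acc += w * signal[j]
        out.append(acc)
    return out

row = [10, 10, 10, 200, 10, 10, 10]  # a single noisy spike in a flat row
blurred = smooth(row, gaussian_kernel(5, 1.0))
print([round(v, 1) for v in blurred])
```

The spike is spread over its neighbors and its peak drops well below 200, which is the noise-suppressing effect that matters before running Canny.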
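Next we use the Canny operator to extract edges. Why detect edges? Because understanding an object's edges is the most basic way we recognize objects, and computer vision is no different. Canny applies a small operator (for example a 3*3 matrix) to every point of the image by convolution to estimate how sharply the brightness changes there, that is, the gradient. A pixel whose gradient is above high_threshold is accepted as an edge, a pixel below low_threshold is rejected, and a pixel in between is kept only if it connects to an accepted edge pixel. low_threshold and high_threshold are therefore also tuned by hand: if the result looks good, the values are good.
#Use canny operator to extract edges
low_threshold = 90
high_threshold = 180
edges = canny(blur_gray, low_threshold, high_threshold)
The extracted edges
Edge extraction over the whole image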
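The role of the two Canny thresholds is easier to see in a simplified one-dimensional sketch of hysteresis thresholding. This is not OpenCV's implementation, only an illustration of the idea; `hysteresis_1d` and the sample gradient values are invented.

```python
def hysteresis_1d(gradient, low, high):
    """Keep strong samples, plus weak samples connected to a strong one."""
    strong = [g >= high for g in gradient]
    weak = [low <= g < high for g in gradient]
    keep = strong[:]
    changed = True
    while changed:  # grow edges outward from strong samples through weak neighbours
        changed = False
        for i, is_weak in enumerate(weak):
            if is_weak and not keep[i]:
                left = i > 0 and keep[i - 1]
                right = i + 1 < len(keep) and keep[i + 1]
                if left or right:
                    keep[i] = True
                    changed = True
    return keep

gradient = [10, 120, 200, 130, 20, 110, 15]
edges = hysteresis_1d(gradient, low=90, high=180)
print(edges)  # [False, True, True, True, False, False, False]
```

The value 110 lies between the thresholds but is isolated, so it is dropped, while 120 and 130 are kept because they touch the strong sample 200.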
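The code below defines the ROI. A self-driving car does not need to pay attention to everything the camera captures. It only needs to care about the road ahead and the objects around it. Setting an ROI therefore reduces the amount of computation. The ROI is specified by the coordinates of its vertices, and the algorithm discards every pixel outside the ROI.
In plain words: I only care about what I care about, and everyone else can do as they please.
#define the region of interest
imshape = img.shape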
vertices = np.array([[(0,imshape[0]),(470, 320), (550, 320), (imshape[1],imshape[0])]], dtype=np.int32)
masked_edges = region_of_interest(edges, vertices)
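To see which pixels the trapezoid keeps, here is a small pure-Python point-in-polygon test based on ray casting. It only illustrates the geometry; `cv2.fillPoly` is what actually builds the mask in the project. The 960x540 frame size is an assumption, and `point_in_polygon` is a hypothetical helper.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Trapezoid from the post, assuming a 960x540 frame (the frame size is an assumption)
roi = [(0, 540), (470, 320), (550, 320), (960, 540)]
print(point_in_polygon(480, 400, roi))  # True: road surface ahead of the car
print(point_in_polygon(100, 350, roi))  # False: upper-left corner, outside the lane area
```

Pixels for which the test is False are the ones the mask sets to black.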
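Below is the Hough transform code. With the edge points and the ROI in place, it is time to find the lane in the image. Lane lines are straight, and a human sees that at a glance. But have you ever thought about how a human decides that something is straight? And how should a computer vision algorithm do it? Humans rely on perspective and on the assumption that lanes are mostly straight. It is the same for computer vision: the computer has to find points in the image that follow a regular pattern and mark them all. How? Edge points that lie along the same direction with only small gaps between them are treated as a line. That is what the Hough transform does. A line in the xy coordinate system can be represented by a single point in Hough space, as in the pictures below: the line y = mx + b is represented by its slope and intercept. Conversely, each point in the image becomes a line in Hough space, and points that lie on the same image line produce Hough-space lines that all cross in one place. So a cell of Hough space where many of them intersect corresponds to a line in the image. In the code, rho is the distance resolution of the Hough grid in pixels, theta is its angular resolution in radians, and threshold is the minimum number of votes a cell needs. I can't explain it very clearly, so I suggest finding an animation online or reading the article "第三章 霍夫变换(Hough Transform)" (Chapter 3: Hough Transform). (If its author sees this and objects, please let me know.)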
# Define the Hough transform parameters
# Make a blank the same size as our image to draw on
rho = 6 # distance resolution in pixels of the Hough grid
theta = np.pi/180 # angular resolution in radians of the Hough grid
threshold = 50 # minimum number of votes (intersections in Hough grid cell)
min_line_len = 25 #minimum number of pixels making up a line
max_line_gap = 25 # maximum gap in pixels between connectable line segments
line_image = np.copy(img)*0 # creating a blank to draw lines on
# Run Hough on edge detected image
# Output "lines" is an array containing endpoints [x1,y1,x2,y2] of detected line segments
lines = cv2.HoughLinesP(masked_edges, rho, theta, threshold, np.array([]), min_line_len, max_line_gap)
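Which picture on the right corresponds to the points on the left? Answer: A
Which one does this correspond to? Answer: C
Line detection result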
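The voting idea behind the Hough transform can be sketched in plain Python using the polar form rho = x*cos(theta) + y*sin(theta). This is a toy accumulator for intuition only, not how `cv2.HoughLinesP` is implemented (that function is probabilistic and returns segments); `hough_votes` is a hypothetical name.

```python
import math
from collections import Counter

def hough_votes(points, theta_step_deg=1):
    """Vote in (rho, theta) space: each point votes for every line that could pass through it."""
    votes = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, theta_deg)] += 1
    return votes

# Ten edge pixels on the horizontal line y = 2
points = [(x, 2) for x in range(10)]
votes = hough_votes(points)
best_count = max(votes.values())
print(votes[(2, 90)], best_count)  # 10 10
```

All ten points vote for the cell rho = 2, theta = 90 degrees, which is exactly the line y = 2. Because rho is rounded to whole pixels, neighboring angles can tie with it, which is one reason the grid resolution and the vote threshold need tuning.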
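The rest is simple. Thanks to the ROI we only see lines inside the lane. But there are many short line segments, and they are not continuous, so what do we do? We first assign each segment to the left or the right side and then compute one continuous line per side.
First, split the segments (each given by its two endpoints) into two groups by slope: left and right. In image coordinates y grows downward, so segments of the left lane line have a negative slope and those of the right lane line have a positive slope.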
#Make lists of the lines and slopes for averaging
left_lines = []
left_slopes = []
right_slopes = []
right_lines = []
for line in lines:
    for x1, y1, x2, y2 in line:
        if x2 == x1:
            continue  # skip vertical segments: their slope is undefined
        slope = (y2 - y1) / (x2 - x1)
        if slope < 0:
            left_lines.append(line)
            left_slopes.append(slope)
        else:
            right_lines.append(line)
            right_slopes.append(slope)
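The sign convention is worth checking on concrete numbers, since the image y axis points down. A minimal sketch follows; `classify_segment` and the sample coordinates are made up for illustration.

```python
def classify_segment(x1, y1, x2, y2):
    """Return 'left', 'right', or None for a vertical segment whose slope is undefined."""
    if x2 == x1:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return 'left' if slope < 0 else 'right'

# A left-lane segment rises toward the image center: x increases while y decreases
print(classify_segment(100, 500, 400, 350))  # left  (slope -0.5)
# A right-lane segment falls away from the center: x and y both increase
print(classify_segment(600, 350, 900, 520))  # right
print(classify_segment(300, 320, 300, 400))  # None
```

Horizontal segments have slope 0 and land in the right group under this rule, so a stricter version might also discard slopes close to zero.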
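Finally, the slope on each side is fixed by averaging, and with that slope we compute the corresponding line in the image coordinate system. (Once the slope is known, and since the ROI gives the top and bottom y values, we can solve for the x coordinates of the two endpoints.)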
#Average line positions. zip(*list) groups the elements column by column; * unpacks the list
mean_left_pos = [sum(column)/len(column) for column in zip(*left_lines)]
mean_right_pos = [sum(column)/len(column) for column in zip(*right_lines)]
#Remove slope outliers, and take the average
mean_left_slope = np.mean(remove_noise(left_slopes))
mean_right_slope = np.mean(remove_noise(right_slopes))
#Extrapolate the left and right lines to the mask boundaries: y_top = 320, y_bottom = 540
mean_left_line = []
mean_right_line = []
for x1, y1, x2, y2 in mean_left_pos:
    x = int(np.mean([x1, x2]))  #Midpoint x
    y = int(np.mean([y1, y2]))  #Midpoint y
    slope = mean_left_slope
    #Based on y = mx + b, calculate b = y - mx
    b = y - (slope * x)
    mean_left_line = [int((320 - b) / slope), 320, int((540 - b) / slope), 540]
for x1, y1, x2, y2 in mean_right_pos:
    x = int(np.mean([x1, x2]))
    y = int(np.mean([y1, y2]))
    slope = mean_right_slope
    b = y - (slope * x)
    mean_right_line = [int((320 - b) / slope), 320, int((540 - b) / slope), 540]
#The final lines of the lane
lines = [[mean_left_line], [mean_right_line]]
#Draw the lines on line_image
draw_lines(line_image, lines)
#Overlay the processed line image on the original
weighted_image = weighted_img(line_image, img)
#weighted_image is what process_image returns
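The extrapolation step is plain line algebra: from a midpoint and a slope, recover the intercept b, then solve y = mx + b for x at the top and bottom of the ROI. A worked sketch with made-up numbers follows; `extrapolate` is a hypothetical helper, not a function from the notebook.

```python
def extrapolate(slope, x_mid, y_mid, y_top=320, y_bottom=540):
    """Extend the line through (x_mid, y_mid) with the given slope to y_top and y_bottom."""
    b = y_mid - slope * x_mid             # intercept from y = m*x + b
    x_top = int((y_top - b) / slope)      # solve y = m*x + b for x
    x_bottom = int((y_bottom - b) / slope)
    return [x_top, y_top, x_bottom, y_bottom]

# Left lane: slope -0.5 through the midpoint (200, 455) gives b = 555
print(extrapolate(-0.5, 200, 455))  # [470, 320, 30, 540]
```

A slope of exactly 0 would divide by zero here, so a robust version would need to reject near-horizontal averages before extrapolating.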
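Once we have the line coordinates, draw_lines draws them on a blank image and weighted_img overlays that image on the original, giving us the lane detection result on top of the original picture.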
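The overlay formula from the weighted_img docstring, initial_img * α + img * β + λ, can be checked on single pixel values. This sketch uses the ASCII names alpha, beta and gamma in place of the Greek parameter names, and `blend_pixel` is a made-up helper; the clamping to 0..255 mirrors how 8-bit results saturate.

```python
def blend_pixel(initial, overlay, alpha=0.8, beta=1.0, gamma=0.0):
    """initial*alpha + overlay*beta + gamma, rounded and clamped to the 8-bit range."""
    value = initial * alpha + overlay * beta + gamma
    return max(0, min(255, int(round(value))))

print(blend_pixel(100, 0))    # 80: a road pixel under the black part of the overlay is dimmed
print(blend_pixel(100, 255))  # 255: a pixel under a drawn line saturates to full intensity
```

With α = 0.8 the original frame is slightly darkened everywhere, which makes the drawn lane lines stand out.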
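That covers all of the code.
6 Results
In the end we get the result shown below. (image)
With a separate encoder, all the processed images can be combined into the final video file with the lane markings.
7 Summary
This project implements simple lane detection using edge detection and the Hough transform. In real applications, however, the detection rate varies a lot with the environment, for example with lane color or faded markings. In later projects we will use better algorithms to get more stable output.
Related content will be added in future updates.
Thanks for your support. Your attention is what keeps the updates coming~
If you've read this far, don't be stingy with a like and a follow~
I also hope friends will submit posts to our column, so that we can all keep growing in self-driving car algorithms~!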