Andrew Ng's Deep Learning Course: Autonomous driving - Car detection (Translation)

2 - YOLO

2.1 - Model Details

First things to know:

  • The input is a batch of images of shape (m, 608, 608, 3)
  • The output is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 numbers (p_c, b_x, b_y, b_h, b_w, c) as explained above. If you expand c into an 80-dimensional vector, each bounding box is then represented by 85 numbers.

We will use 5 anchor boxes. So you can think of the YOLO architecture as the following: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85).

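  • The input is a batch of data of shape (m, 608, 608, 3).

Here m is the number of examples, each image is 608x608 pixels, and 3 is the number of channels (RGB).

  • The output is a list of bounding-box vectors, each carrying a class label. Each bounding-box vector consists of 6 numbers; if you expand c into an 80-dimensional vector, each bounding-box vector consists of 85 numbers.

The 6 numbers are: the probability p_c that an object is present, the x and y coordinates of the object's center (b_x, b_y), the height and width of its bounding box (b_h, b_w), and the class label c.

We use 5 anchor boxes, so the final output of the YOLO architecture has shape (m, 19, 19, 5, 85).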
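The following minimal NumPy sketch makes the shapes concrete. It goes from the CNN's raw output to the (m, 19, 19, 5, 85) encoding and then reads one box back in the 6-number form. It rests on several assumptions:

* The last layer emits 5 × 85 = 425 channels per cell, with the 5 anchors stored one after another.
* The 85 numbers are ordered (p_c, b_x, b_y, b_h, b_w, c_1 ... c_80).
* The random array stands in for real network output.
* c is given as a 0-based index.

```python
import numpy as np

m = 2  # hypothetical batch size
# Stand-in for the deep CNN's raw output: 5 anchors * 85 numbers = 425 channels per cell
raw = np.random.rand(m, 19, 19, 5 * 85)

# ENCODING (m, 19, 19, 5, 85): one 85-number vector per anchor box per grid cell
encoding = raw.reshape(m, 19, 19, 5, 85)

# Read one box (example 0, cell (3, 7), anchor 2) back in the 6-number form
box = encoding[0, 3, 7, 2]
pc, bx, by, bh, bw = box[:5]
c = int(np.argmax(box[5:]))  # 0-based index of the most probable of the 80 classes
print(encoding.shape, (pc, bx, by, bh, bw, c))
```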


Now, for each box (of each cell) we will compute the following elementwise product and extract a probability that the box contains a certain class.


Figure 4 : Find the class detected by each box

Here's one way to visualize what YOLO is predicting on an image:

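In short: multiply the probability p_c by each of the 80 class probabilities to get one score per class. That score is how likely the algorithm thinks it is that this box contains that object.
(b_x, b_y, b_h, b_w) locate the object, and the value of c identifies its class.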
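The Figure 4 computation for a single box fits in a few lines of NumPy. The value of p_c and the 80 class probabilities below are made up for illustration.

```python
import numpy as np

# Hypothetical values for a single anchor box
pc = 0.6                      # probability that the box contains some object
class_probs = np.zeros(80)    # stand-in class probabilities c_1 ... c_80
class_probs[2] = 0.73         # pretend class index 2 is the most likely
class_probs[10] = 0.27

scores = pc * class_probs     # one score per class: p_c * c_i
best_class = int(np.argmax(scores))
best_score = float(scores[best_class])
print(best_class, round(best_score, 3))  # 2 0.438
```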

  • For each of the 19x19 grid cells, find the maximum of the probability scores (taking a max across both the 5 anchor boxes and across different classes).
  • Color that grid cell according to what object that grid cell considers the most likely.
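Here is a sketch of the per-cell maximum behind this visualization. It assumes the box scores for one image are already in a (19, 19, 5, 80) array; random values stand in for real scores. Taking the max over anchors and classes at once is the same as flattening those two axes. The plotting step that maps class indices to colors is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in scores for one image: 19x19 cells, 5 anchors, 80 classes
box_scores = rng.random((19, 19, 5, 80))

# Flatten (anchor, class) into 400 candidates per cell, then pick the best one
flat = box_scores.reshape(19, 19, 5 * 80)
winner = flat.argmax(axis=-1)     # flat index = anchor * 80 + class
cell_class = winner % 80          # class index used to color each cell
cell_score = flat.max(axis=-1)    # the winning score itself

print(cell_class.shape, cell_score.shape)  # (19, 19) (19, 19)
```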

Doing this results in this picture:


Figure 5 : Each of the 19x19 grid cells colored according to which class has the largest predicted probability in that cell.

Note that this visualization isn't a core part of the YOLO algorithm itself for making predictions; it's just a nice way of visualizing an intermediate result of the algorithm.
Another way to visualize YOLO's output is to plot the bounding boxes that it outputs. Doing that results in a visualization like this:


Figure 6 : Each cell gives you 5 boxes. In total, the model predicts: 19x19x5 = 1805 boxes just by looking once at the image (one forward pass through the network)! Different colors denote different classes.

In the figure above, we plotted only boxes that the model had assigned a high probability to, but this is still too many boxes. You'd like to filter the algorithm's output down to a much smaller number of detected objects. To do so, you'll use non-max suppression. Specifically, you'll carry out these steps:
  • Get rid of boxes with a low score (meaning, the box is not very confident about detecting a class)
  • Select only one box when several boxes overlap with each other and detect the same object.

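In short, there are two ways to visualize the anchor boxes:

  1. First: for each of the 19x19 grid cells, color the cell according to the highest-scoring of its 5 candidate boxes.
  2. Second: draw the bounding box of every detected object.

With the second approach, we drew only the boxes with relatively high probability, and there are still too many. So we continue with the following steps:

  • Discard boxes with low scores
  • When several boxes overlap and mark the same object, keep only one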

2.2 - Filtering with a threshold on class scores

You are going to apply a first filter by thresholding. You would like to get rid of any box for which the class "score" is less than a chosen threshold.

The model gives you a total of 19x19x5x85 numbers, with each box described by 85 numbers. It'll be convenient to rearrange the (19,19,5,85) (or (19,19,425)) dimensional tensor into the following variables:

  • box_confidence: tensor of shape (19 x 19, 5, 1) containing pc (confidence probability that there's some object) for each of the 5 boxes predicted in each of the 19x19 cells.
  • boxes: tensor of shape (19 x 19, 5, 4) containing (b_x, b_y, b_h, b_w) for each of the 5 boxes per cell.
  • box_class_probs: tensor of shape (19 x 19, 5, 80) containing the detection probabilities (c_1, c_2, ... c_{80}) for each of the 80 classes for each of the 5 boxes per cell.
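The rearrangement can be sketched in NumPy as follows. The sketch assumes one image's encoding is a (19, 19, 5, 85) array, with the 85 numbers ordered (p_c, b_x, b_y, b_h, b_w, c_1 ... c_80). It only slices the tensor. It ignores any activation functions the real model may apply before these values behave as probabilities.

```python
import numpy as np

# Stand-in for one image's encoding (19, 19, 5, 85)
encoding = np.random.rand(19, 19, 5, 85)

# Merge the two grid axes so the shapes match the (19 x 19, 5, ...) description
flat = encoding.reshape(19 * 19, 5, 85)

box_confidence = flat[..., 0:1]   # (361, 5, 1): p_c, sliced with 0:1 to keep the last axis
boxes = flat[..., 1:5]            # (361, 5, 4): b_x, b_y, b_h, b_w
box_class_probs = flat[..., 5:]   # (361, 5, 80): c_1 ... c_80
```

The sketch slices p_c with 0:1 rather than 0 so that box_confidence keeps a trailing axis of size 1. That axis is what lets box_confidence * box_class_probs broadcast to (361, 5, 80).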

Exercise: Implement yolo_filter_boxes().

  1. Compute box scores by doing the elementwise product as described in Figure 4. The following code may help you choose the right operator:
     a = np.random.randn(19*19, 5, 1)
     b = np.random.randn(19*19, 5, 80)
     c = a * b # shape of c will be (19*19, 5, 80)
  2. For each box, find:
    • the index of the class with the maximum box score (hint: K.argmax; be careful with what axis you choose, and consider using axis=-1)
    • the corresponding box score (hint: K.max; be careful with what axis you choose, and consider using axis=-1)
  3. Create a mask by using a threshold. As a reminder: ([0.9, 0.3, 0.4, 0.5, 0.1] < 0.4) returns [False, True, False, False, True]. The mask should be True for the boxes you want to keep.
  4. Use TensorFlow to apply the mask to box_class_scores, boxes and box_classes to filter out the boxes we don't want. You should be left with just the subset of boxes you want to keep (hint: tf.boolean_mask).

Reminder: Keras backend functions are called through the K prefix, e.g. K.argmax(...) or K.max(...).
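You can try the mask reminder and step 4 in NumPy, where boolean indexing plays the role of tf.boolean_mask. The box coordinates and class indices below are hypothetical values for five boxes.

```python
import numpy as np

scores = np.array([0.9, 0.3, 0.4, 0.5, 0.1])
print(scores < 0.4)    # [False  True False False  True]

# A "keep" mask for threshold 0.4: True where the score is high enough
threshold = 0.4
keep = scores >= threshold

# Parallel arrays describing the same five boxes (hypothetical values)
boxes = np.arange(20, dtype=float).reshape(5, 4)
classes = np.array([2, 7, 2, 0, 15])

# Boolean indexing keeps the same rows from every array, like tf.boolean_mask
kept_scores, kept_boxes, kept_classes = scores[keep], boxes[keep], classes[keep]
print(kept_scores, kept_classes)  # [0.9 0.4 0.5] [2 2 0]
```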
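In other words:

  • box_confidence: each of the 19*19 cells produces 5 anchor boxes, and each anchor box has one p_c
  • boxes: each of the 19*19 cells produces 5 anchor boxes, and this tensor holds the bounding box of each one
  • box_class_probs: for each of the 19*19*5 anchor boxes, the detection probabilities of the 80 classes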

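Implementing yolo_filter_boxes():

  1. Implement the operation in Figure 4 with plain multiplication: box_confidence * box_class_probs. The size-1 last dimension of box_confidence is broadcast automatically, so the result has shape (19x19x5x80).
  2. For each anchor box (19x19x5 of them), find:
    • the index of the class with the highest score (i.e. the maximum among the 80 classes)
    • the score of that class
  3. Create a mask that is True for the anchor boxes you want to keep.
  4. Use TensorFlow to apply the mask to box_class_scores, boxes and box_classes to filter out the boxes we don't want. You should be left with just the subset of boxes you want to keep.
    Note: Keras backend functions are called through the K prefix, e.g. K.argmax(...).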
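# GRADED FUNCTION: yolo_filter_boxes

import tensorflow as tf
from keras import backend as K  # Keras 2.x backend API (provides K.argmax and K.max)
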
def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
    """Filters YOLO boxes by thresholding on object and class confidence.
    
    Arguments:
    box_confidence -- tensor of shape (19, 19, 5, 1)
    boxes -- tensor of shape (19, 19, 5, 4)
    box_class_probs -- tensor of shape (19, 19, 5, 80)
    threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    
    Returns:
    scores -- tensor of shape (None,), containing the class probability score for selected boxes
    boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
    classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes
    
    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold. 
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """
    
    # Step 1: Compute box scores
    ### START CODE HERE ### (≈ 1 line) score = p_c * class probability, broadcast over the 80 classes
    box_scores = box_confidence * box_class_probs
    ### END CODE HERE ###
    
    # Step 2: Find the box_classes thanks to the max box_scores, keep track of the corresponding score
    ### START CODE HERE ### (≈ 2 lines)
    # Index of the highest-scoring class for each box, shape (19, 19, 5)
    box_classes = K.argmax(box_scores, axis=-1)
    # The highest score itself, shape (19, 19, 5)
    box_class_scores = K.max(box_scores, axis=-1, keepdims=False)
    ### END CODE HERE ###
    
    # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
    # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
    ### START CODE HERE ### (≈ 1 line)
    # Build the mask: True for boxes whose best score is at least threshold
    filtering_mask = box_class_scores >= threshold
    ### END CODE HERE ###
    
    # Step 4: Apply the mask to scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines) keep the best score, its bounding box and its class for each selected box
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    ### END CODE HERE ###
    
    return scores, boxes, classes
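The graded function builds TensorFlow tensors, so a plain NumPy mirror can be handy for checking the logic and the output shapes by hand. This is a sketch, not the course's reference solution. yolo_filter_boxes_np is a hypothetical name, and random arrays stand in for real model output.

```python
import numpy as np

def yolo_filter_boxes_np(box_confidence, boxes, box_class_probs, threshold=0.6):
    """NumPy mirror of yolo_filter_boxes (hypothetical helper for checking shapes)."""
    box_scores = box_confidence * box_class_probs     # broadcast (..., 1) * (..., 80)
    box_classes = np.argmax(box_scores, axis=-1)      # (19, 19, 5)
    box_class_scores = np.max(box_scores, axis=-1)    # (19, 19, 5)
    filtering_mask = box_class_scores >= threshold    # (19, 19, 5) booleans
    # Boolean indexing flattens the masked axes: (None,), (None, 4), (None,)
    return (box_class_scores[filtering_mask],
            boxes[filtering_mask],
            box_classes[filtering_mask])

rng = np.random.default_rng(1)
box_confidence = rng.random((19, 19, 5, 1))
boxes = rng.random((19, 19, 5, 4))
box_class_probs = rng.random((19, 19, 5, 80))
scores, kept_boxes, classes = yolo_filter_boxes_np(
    box_confidence, boxes, box_class_probs, threshold=0.5)
print(scores.shape, kept_boxes.shape, classes.shape)
```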

2.3 - Non-max suppression

Even after filtering by thresholding over the class scores, you still end up with a lot of overlapping boxes. A second filter for selecting the right boxes is called non-maximum suppression (NMS).

Figure 7 : In this example, the model has predicted 3 cars, but it's actually 3 predictions of the same car. Running non-max suppression (NMS) will select only the most accurate (highest probability) one of the 3 boxes.

Non-max suppression uses the very important function called "Intersection over Union", or IoU.
Figure 8 : Definition of "Intersection over Union".
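Figure 8's definition can be written out as a formula, where area(·) denotes the area of a region. The union area is the sum of the two box areas minus the overlap, which would otherwise be counted twice. The worked example uses two made-up boxes in (x1, y1, x2, y2) form: A = (2, 1, 4, 3) and B = (1, 2, 3, 4). Their intersection is (2, 2, 3, 3) with area 1, and each box has area 4.

```latex
\mathrm{IoU}(A, B)
  = \frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)}
  = \frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A) + \operatorname{area}(B) - \operatorname{area}(A \cap B)}

\mathrm{IoU}(A, B) = \frac{1}{4 + 4 - 1} = \frac{1}{7} \approx 0.143
```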

Exercise: Implement iou(). Some hints:

  • In this exercise only, we define a box using its two corners (upper left and lower right): (x1, y1, x2, y2) rather than the midpoint and height/width.
  • To calculate the area of a rectangle you need to multiply its height (y2 - y1) by its width (x2 - x1).
  • You'll also need to find the coordinates (xi1, yi1, xi2, yi2) of the intersection of two boxes. Remember that:
    • xi1 = maximum of the x1 coordinates of the two boxes
    • yi1 = maximum of the y1 coordinates of the two boxes
    • xi2 = minimum of the x2 coordinates of the two boxes
    • yi2 = minimum of the y2 coordinates of the two boxes
  • In order to compute the intersection area, you need to make sure the height and width of the intersection are positive, otherwise the intersection area should be zero. Use max(height, 0) and max(width, 0).

In this code, we use the convention that (0,0) is the top-left corner of an image, (1,0) is the upper-right corner, and (1,1) the lower-right corner.
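A minimal Python sketch of iou() follows. It uses the hints above, with boxes as (x1, y1, x2, y2) tuples in the coordinate convention just described.

```python
def iou(box1, box2):
    """IoU of two boxes given as corners (x1, y1, x2, y2), with y growing downward."""
    # Corners of the intersection rectangle
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])
    # Clamp width and height at zero so disjoint boxes get zero intersection area
    inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)

    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = box1_area + box2_area - inter_area
    return inter_area / union_area

print(iou((2, 1, 4, 3), (1, 2, 3, 4)))  # 0.14285714285714285
```

Clamping width and height separately matters. When two boxes miss each other on both axes, the unclamped width and height are both negative, and their product would be a positive, wrong area.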

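Andrew Ng explains non-max suppression very clearly in the lectures, so here is just a brief translation:
Even after filtering the class scores with the mask, you still have many overlapping bounding boxes (see Figure 7). The next filter for selecting the right boxes is called non-maximum suppression (NMS).
Non-max suppression relies on a very important function, Intersection over Union (IoU), shown in Figure 8.
Exercise: implement iou()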

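  • In this exercise only, we define a bounding box by two corners (x1, y1, x2, y2) rather than by its midpoint and width/height.
  • To compute the area of a rectangle, multiply its height (y2 - y1) by its width (x2 - x1).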

In this code, (0,0) is the top-left corner of the image, (1,0) the upper-right corner, and (1,1) the lower-right corner.
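The model itself outputs boxes as (b_x, b_y, b_h, b_w), so using iou() on its predictions requires converting them to corners first. The sketch below assumes (b_x, b_y) is the box center, in the same coordinate system where y grows downward. midpoint_to_corners is a hypothetical helper name.

```python
def midpoint_to_corners(bx, by, bh, bw):
    """Convert a (b_x, b_y, b_h, b_w) box to corner form (x1, y1, x2, y2).

    Assumes (b_x, b_y) is the box center and y grows downward, so y1 is the top edge.
    """
    x1 = bx - bw / 2
    y1 = by - bh / 2
    x2 = bx + bw / 2
    y2 = by + bh / 2
    return (x1, y1, x2, y2)

print(midpoint_to_corners(0.5, 0.5, 0.2, 0.4))
```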

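  • You also need the corner coordinates (xi1, yi1, xi2, yi2) of the intersection of the two bounding boxes:
    • xi1 = maximum of the two boxes' x1 coordinates (upper-left corner)
    • yi1 = maximum of the two boxes' y1 coordinates (upper-left corner)
    • xi2 = minimum of the two boxes' x2 coordinates (lower-right corner)
    • yi2 = minimum of the two boxes' y2 coordinates (lower-right corner)
  • To compute the intersection area, make sure the height and width of the intersection are positive; otherwise the intersection area is zero. Use max(height, 0) and max(width, 0).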
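IoU can now drive a greedy version of non-max suppression. It repeatedly keeps the highest-scoring remaining box and discards every other box whose IoU with it exceeds a threshold. This NumPy sketch assumes the following:

* Boxes are in corner form.
* greedy_nms and its 0.5 default threshold are hypothetical choices.
* The boxes and scores are made up to mimic Figure 7.

TensorFlow also provides tf.image.non_max_suppression for this step.

```python
import numpy as np

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression (hypothetical helper, a sketch).

    boxes: (N, 4) array of corners (x1, y1, x2, y2); scores: (N,) array.
    Returns the indices of the kept boxes, highest score first.
    """
    order = np.argsort(scores)[::-1]   # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Intersection of the best box with every remaining box (vectorized)
        xi1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        yi1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        xi2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        yi2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.maximum(xi2 - xi1, 0) * np.maximum(yi2 - yi1, 0)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        overlap = inter / (area_best + area_rest - inter)
        # Drop every remaining box that overlaps the kept one too much
        order = rest[overlap <= iou_threshold]
    return keep

# Three predictions of the same car plus one separate object (made-up numbers)
boxes = np.array([[0.10, 0.10, 0.40, 0.40],
                  [0.12, 0.11, 0.41, 0.42],
                  [0.09, 0.12, 0.38, 0.39],
                  [0.60, 0.60, 0.90, 0.90]])
scores = np.array([0.88, 0.75, 0.92, 0.60])
print(greedy_nms(boxes, scores))  # [2, 3]
```

With these values, the three overlapping boxes collapse to index 2 (the 0.92 prediction). The separate box at index 3 survives, so the call returns [2, 3].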