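Object Detection (1): Bounding Boxes and Anchor Boxes

1. What is object detection?

Object detection is an important branch of computer vision and is widely used in practice. Face-scan payment in supermarkets, license plate recognition in parking lots, vehicle recognition in autonomous driving, and part defect detection in industrial settings all rely on it. The main task of object detection is to draw a box around each object in an image and to identify what that object is.

Object detection therefore answers two questions: where is the object, and what is it?

Figure 1. Illustration of object detection

As shown in Figure 1, the image contains two people. To detect the class "person", we need to draw a box around the boundary of each person and decide whether the boxed object is in fact a "person".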

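2. What is a bounding box?

In object detection, the location of an object is usually described by a bounding box, which is a rectangle. It can be represented by the coordinates of its top-left corner (x1, y1) and its bottom-right corner (x2, y2). Alternatively, it can be represented by the coordinates of its center (xc, yc) together with its width and height (w, h).

Figure 2. Object detection example

As shown in Figure 2, the image contains two objects, a "dog" and a "cat". Below we represent their boxes in both the "top-left, bottom-right" form and the "center, width, height" form.

(1) Converting "top-left, bottom-right" to "center, width, height"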

import torch

def box_corner_to_center(boxes):
    """Convert from (top-left, bottom-right) to (center, width, height)."""
    # boxes has shape (N, 4), one row per box: (x1, y1, x2, y2)
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    w = x2 - x1
    h = y2 - y1
    boxes = torch.stack((cx, cy, w, h), dim=-1)
    return boxes

(2) Converting "center, width, height" to "top-left, bottom-right"

def box_center_to_corner(boxes):
    """Convert from (center, width, height) to (top-left, bottom-right)."""
    # boxes has shape (N, 4), one row per box: (cx, cy, w, h)
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    x1 = cx - 0.5 * w
    y1 = cy - 0.5 * h
    x2 = cx + 0.5 * w
    y2 = cy + 0.5 * h
    boxes = torch.stack((x1, y1, x2, y2), dim=-1)
    return boxes
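As a quick sanity check on the two conversions, here is a minimal pure-Python sketch that applies the same arithmetic to a single box and confirms that a round trip returns the original coordinates. The helper names `corner_to_center` and `center_to_corner` and the box values are made up for illustration and are not part of the code above.

```python
def corner_to_center(box):
    """(x1, y1, x2, y2) -> (cx, cy, w, h) for a single box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def center_to_corner(box):
    """(cx, cy, w, h) -> (x1, y1, x2, y2) for a single box."""
    cx, cy, w, h = box
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


# Hypothetical box, chosen only for illustration.
box = (10.0, 20.0, 50.0, 100.0)
center = corner_to_center(box)
restored = center_to_corner(center)

print(center)    # (30.0, 60.0, 40.0, 80.0)
print(restored)  # (10.0, 20.0, 50.0, 100.0)
```

The two forms carry the same information, so converting back and forth should never change a box.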

(3) Drawing the bounding boxes (dog_bbox in blue, cat_bbox in red)

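import matplotlib.pyplot as plt

def bbox_to_rect(bbox, color):
    # Convert a bounding box from (top-left x, top-left y, bottom-right x, bottom-right y)
    # to the matplotlib format: ((top-left x, top-left y), width, height)
    return plt.Rectangle(
        xy=(bbox[0], bbox[1]), width=bbox[2] - bbox[0], height=bbox[3] - bbox[1],
        fill=False, edgecolor=color, linewidth=2)

# dog_bbox and cat_bbox are not defined in the original listing. The values below are
# placeholders assumed for illustration only; replace them with the actual
# (x1, y1, x2, y2) pixel coordinates of the objects in your image.
dog_bbox = [60.0, 45.0, 378.0, 516.0]
cat_bbox = [400.0, 112.0, 655.0, 493.0]

img = plt.imread('1.jpg')
fig = plt.imshow(img)  # returns an AxesImage; fig.axes is the Axes it is drawn on
fig.axes.add_patch(bbox_to_rect(dog_bbox, 'blue'))
fig.axes.add_patch(bbox_to_rect(cat_bbox, 'red'))
plt.show()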

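3. What is an anchor box?

As shown in Figure 3, to locate the "dog" and the "cat", a straightforward idea is to generate a large number of candidate regions of different shapes and sizes on the image, and then decide whether each region contains an object. These generated regions are called anchor boxes.

Here we introduce one way to generate them: at every pixel location, generate anchor boxes with different aspect ratios and different scales.

Figure 3. Illustration of anchor boxes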

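Anchor boxes of different scales and aspect ratios are generated at every pixel of the image, as follows.

Assume the image has size $h \times w$ (height $h$, width $w$).
Number of different scales generated: $n$, with scales $s_1, \dots, s_n$.
Number of different aspect ratios generated: $m$, with ratios $r_1, \dots, r_m$.
Only the combinations that contain $s_1$ or $r_1$ are kept, so the total number of anchor box types per pixel is $n + m - 1$.
The number of anchor boxes per image is therefore $w h (n + m - 1)$.
Let an anchor box have width $w_a$ and height $h_a$, so that its scaling relative to the image width and height is $w_a / w$ and $h_a / h$, and define the anchor scale $s$ by

$$s^2 = \frac{w_a h_a}{w h}.$$

The width-to-height ratio of the anchor box is

$$r = \frac{w_a}{h_a}.$$

Solving these two equations gives

$$w_a = s\sqrt{w h r}, \qquad h_a = s\sqrt{\frac{w h}{r}}.$$

Normalizing by the image width and height gives

$$\frac{w_a}{w} = s\sqrt{\frac{h r}{w}}, \qquad \frac{h_a}{h} = s\sqrt{\frac{w}{h r}}.$$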

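Therefore, at each pixel location, an anchor box is fully determined by its scale s and its aspect ratio r.

Figure 4. Anchor box generation

The code for generating anchor boxes is as follows: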
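To make the role of s and r concrete, here is a small pure-Python sketch. It assumes, consistently with the anchor generation code in this post, that an anchor covers s² of the image area and has width-to-height ratio r, which gives a normalized width of s·sqrt(h·r/w) and a normalized height of s·sqrt(w/(h·r)). The function names `anchor_count` and `anchor_shape` and the image size are hypothetical, chosen only for illustration.

```python
import math


def anchor_count(h, w, n, m):
    """Number of anchors for an h x w image with n scales and m aspect ratios."""
    return h * w * (n + m - 1)


def anchor_shape(s, r, h, w):
    """Normalized (width, height) of an anchor with scale s and aspect ratio r."""
    norm_w = s * math.sqrt(h * r / w)
    norm_h = s * math.sqrt(w / (h * r))
    return norm_w, norm_h


# Hypothetical image size and settings, for illustration only.
h, w = 400, 600
n, m = 3, 3
print(anchor_count(h, w, n, m))  # 1200000

norm_w, norm_h = anchor_shape(0.5, 2, h, w)
w_a, h_a = norm_w * w, norm_h * h   # back to pixel units
print(w_a / h_a)                    # close to r = 2
print(w_a * h_a / (w * h))          # close to s**2 = 0.25
```

Even a modest image produces over a million anchors this way, which is why only n + m - 1 of the n·m possible (s, r) combinations are generated per pixel.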

import torch
import matplotlib.pyplot as plt


def multibox_prior(data, sizes, ratios):
    """Generate anchor boxes of different shapes centered on each pixel."""
    in_height, in_width = data.shape[-2:]
    device, num_sizes, num_ratios = data.device, len(sizes), len(ratios)  # e.g. 3, 3
    boxes_per_pixel = (num_sizes + num_ratios - 1)  # number of anchor boxes per pixel
    size_tensor = torch.tensor(sizes, device=device)  # list -> tensor
    ratio_tensor = torch.tensor(ratios, device=device)

    # To move the anchor point to the center of a pixel, we need an offset.
    # Since a pixel has height 1 and width 1, we offset the center by 0.5.
    offset_h, offset_w = 0.5, 0.5
    steps_h = 1.0 / in_height  # scaled step along the y axis
    steps_w = 1.0 / in_width  # scaled step along the x axis

    # Generate all anchor box centers (normalized to [0, 1])
    center_h = (torch.arange(in_height, device=device) + offset_h) * steps_h
    center_w = (torch.arange(in_width, device=device) + offset_w) * steps_w
    # indexing='ij' is the historical default; passing it explicitly
    # (PyTorch >= 1.10) avoids a deprecation warning
    shift_y, shift_x = torch.meshgrid(center_h, center_w, indexing='ij')
    shift_y, shift_x = shift_y.reshape(-1), shift_x.reshape(-1)

    # Generate "boxes_per_pixel" normalized widths and heights, used later to
    # build the anchor box corner coordinates (xmin, ymin, xmax, ymax).
    # First sizes[0] with every ratio, then the remaining sizes with ratios[0].
    w = torch.cat((size_tensor[0] * torch.sqrt(in_height * ratio_tensor / in_width),
                   size_tensor[1:] * torch.sqrt(in_height * ratio_tensor[0] / in_width)))
    h = torch.cat((size_tensor[0] * torch.sqrt(in_width / ratio_tensor / in_height),
                   size_tensor[1:] * torch.sqrt(in_width / ratio_tensor[0] / in_height)))

    # Divide by 2 to get half widths and half heights
    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
        in_height * in_width, 1) / 2

    # Every center point has "boxes_per_pixel" anchor boxes, so build a grid of
    # all anchor centers with each center repeated "boxes_per_pixel" times
    out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y],
                           dim=1).repeat_interleave(boxes_per_pixel, dim=0)
    output = out_grid + anchor_manipulations
    return output.unsqueeze(0)


def bbox_to_rect(bbox, color):
    # Convert a bounding box from (top-left x, top-left y, bottom-right x, bottom-right y)
    # to the matplotlib format: ((top-left x, top-left y), width, height)
    return plt.Rectangle(
        xy=(bbox[0], bbox[1]), width=bbox[2] - bbox[0], height=bbox[3] - bbox[1],
        fill=False, edgecolor=color, linewidth=2)


def show_bboxes(axes, bboxes, labels=None, colors=None):
    """Show all bounding boxes."""
    def _make_list(obj, default_values=None):
        if obj is None:
            obj = default_values
        elif not isinstance(obj, (list, tuple)):
            obj = [obj]
        return obj

    labels = _make_list(labels)
    colors = _make_list(colors, ['b', 'g', 'r', 'm', 'c'])
    for i, bbox in enumerate(bboxes):
        color = colors[i % len(colors)]
        rect = bbox_to_rect(bbox.detach().numpy(), color)
        axes.add_patch(rect)
        if labels and len(labels) > i:
            text_color = 'k' if color == 'w' else 'w'
            axes.text(rect.xy[0], rect.xy[1], labels[i],
                      va='center', ha='center', fontsize=9, color=text_color,
                      bbox=dict(facecolor=color, lw=0))


if __name__ == "__main__":
    img = plt.imread('1.jpg')
    h, w = img.shape[:2]
    X = torch.rand(size=(1, 3, h, w))
    Y = multibox_prior(X, sizes=[0.75, 0.5, 0.25], ratios=[1, 2, 0.5])
    boxes = Y.reshape(h, w, 5, 4)  # 5 = 3 sizes + 3 ratios - 1 anchors per pixel
    bbox_scale = torch.tensor((w, h, w, h))  # normalized -> pixel coordinates
    fig = plt.imshow(img)
    # Show the 5 anchors centered on pixel (row 350, column 450);
    # this assumes the image is at least 351 pixels high and 451 pixels wide
    show_bboxes(fig.axes, boxes[350, 450, :, :] * bbox_scale,
                ['s=0.75, r=1', 's=0.75, r=2', 's=0.75, r=0.5', 's=0.5, r=1',
                 's=0.25, r=1'])
    plt.show()
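The five labels passed to show_bboxes follow the order in which multibox_prior concatenates widths and heights: first sizes[0] paired with every ratio, then each remaining size paired with ratios[0]. The pure-Python sketch below reproduces that ordering so the labels do not have to be typed by hand. The helper name `anchor_pairs` is hypothetical and not part of the script above.

```python
def anchor_pairs(sizes, ratios):
    """List the (s, r) pairs in the order the anchors are generated per pixel."""
    pairs = [(sizes[0], r) for r in ratios]        # first size with every ratio
    pairs += [(s, ratios[0]) for s in sizes[1:]]   # other sizes with first ratio
    return pairs


pairs = anchor_pairs([0.75, 0.5, 0.25], [1, 2, 0.5])
labels = [f"s={s}, r={r}" for s, r in pairs]

print(len(pairs))  # 5
print(labels)
# ['s=0.75, r=1', 's=0.75, r=2', 's=0.75, r=0.5', 's=0.5, r=1', 's=0.25, r=1']
```

Generating the labels this way keeps them in sync with the anchors if the sizes or ratios lists are changed.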