Image object detection algorithm based on YOLOv7-tiny in complex backgrounds

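  • Abstract: Unauthorized ("black-flying") drones that carry bombs or other dangerous items pose a threat to the public, so it is necessary to detect such drones against complex backgrounds such as parks, amusement parks, and schools. The state-of-the-art algorithm YOLOv7-tiny is a lightweight network with a smaller structure and fewer parameters and is better suited to detecting small targets, but when identifying small drone targets it suffers from weak feature extraction, large regression loss, and low detection accuracy. To address these problems, an improved drone image object detection algorithm based on YOLOv7-tiny, named YOLOv7-drone, is proposed. First, a drone image dataset is built. Second, a new attention module, SMSE, is designed and embedded in the feature extraction network to strengthen the focus on drone targets in complex backgrounds. Third, an RFB structure is integrated into the backbone network to enlarge the receptive field of the feature layers and enrich the feature information, which makes feature extraction more robust. Fourth, the feature fusion mechanism is improved by adding a small-target detection layer, which raises detection accuracy for small-scale targets. Fifth, the loss function is changed to speed up convergence and reduce the loss, which makes the model more robust. Finally, deformable convolution (DCN) is introduced so that features are extracted according to the shape of the target itself, which improves detection accuracy. In comparison experiments on the PASCAL VOC public dataset, the improved YOLOv7-drone raises mean average precision (mAP@0.5) by 6% over YOLOv7-tiny. On the self-built drone dataset, YOLOv7-drone raises mAP@0.5 by 6.1% over the original algorithm at a detection speed of 72 frames/s. Compared with the YOLOv5l and YOLOv7 object detection algorithms, the improved algorithm's mAP@0.5 is 4% and 3.1% higher, respectively, which verifies the feasibility of the proposed algorithm.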

     

    Abstract:
      Objective  Unauthorized ("black-flying") drones that carry items such as bombs pose a threat to people. Detecting such drones against complex backgrounds such as parks, amusement parks, and schools is key to anti-drone systems in public areas. This paper addresses the detection of small-scale targets in complex backgrounds. Traditional methods based on hand-crafted image features are not targeted, have high time complexity, generate redundant windows, and give poor detection results with low average precision, so false detections and missed detections occur when detecting small-scale UAVs in complex backgrounds. This paper therefore develops a deep-learning-based detection model for unauthorized UAVs.
      Methods  YOLOv7 is a one-stage, anchor-free object detection algorithm with high detection accuracy and good inference speed. YOLOv7-tiny is its lightweight variant, with fewer parameters and fast operation, which makes it widely used in industry. In the backbone network, the purpose-built multi-scale channel attention module SMSE (Fig.5) is introduced to strengthen attention to UAVs in complex backgrounds. Between the backbone network and the feature fusion layer, the RFB feature extraction module (Fig.6) is introduced to enlarge the receptive field and extract richer feature information. In the feature fusion stage, a small-target detection layer is added to improve the detection of small UAV targets. For the loss calculation, the SIoU loss function is introduced; it redefines the penalty term, which significantly improves training speed and inference accuracy. Finally, ordinary convolution is replaced by deformable convolution (Fig.7), so that feature sampling follows the shape and size of the object more closely.
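The SIoU loss mentioned above can be illustrated with a minimal pure-Python sketch. It follows the commonly cited formulation (IoU plus angle, distance, and shape costs) and is not the authors' implementation; the function name `siou_loss`, the `(x1, y1, x2, y2)` box layout, and the shape exponent `theta=4.0` are assumptions made for illustration.

```python
import math


def siou_loss(box, gt, theta=4.0, eps=1e-9):
    """SIoU loss sketch for two boxes given as (x1, y1, x2, y2).

    Assumption: follows the commonly cited SIoU formulation,
    loss = 1 - IoU + (distance_cost + shape_cost) / 2.
    """
    # Widths, heights, and centres of the predicted and ground-truth boxes
    w, h = box[2] - box[0], box[3] - box[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2

    # Plain IoU
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes
    cw = max(box[2], gt[2]) - min(box[0], gt[0])
    ch = max(box[3], gt[3]) - min(box[1], gt[1])

    # Angle cost: 0 when the centres are axis-aligned, 1 at 45 degrees
    dx, dy = gcx - cx, gcy - cy
    sigma = math.hypot(dx, dy)
    sin_alpha = min(abs(dy) / (sigma + eps), 1.0)
    angle_cost = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost, weighted by the angle cost through gamma
    gamma = 2 - angle_cost
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    distance_cost = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: penalises relative width and height mismatch
    omega_w = abs(w - gw) / (max(w, gw) + eps)
    omega_h = abs(h - gh) / (max(h, gh) + eps)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (distance_cost + shape_cost) / 2


if __name__ == "__main__":
    target = (0.0, 0.0, 10.0, 10.0)
    print(siou_loss((0.0, 0.0, 10.0, 10.0), target))   # identical boxes
    print(siou_loss((1.0, 1.0, 11.0, 11.0), target))   # small offset
    print(siou_loss((5.0, 5.0, 15.0, 15.0), target))   # large offset
```

Unlike a plain IoU loss, the extra terms still vary with centre offset and box shape when two boxes do not overlap at all, which is one reason such losses are used to speed up box regression.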
      Results and Discussions  The dataset used in this article combines a self-made dataset (Fig.1) with the Dalian University of Technology drone dataset (Fig.2). The main evaluation metrics are mAP (mean average precision) and FPS (detection speed), with Params (parameter count) and GFLOPs (computational cost) as secondary metrics. Each module was compared with the original algorithm in an attention comparison experiment (Tab.1), an RFB module comparison experiment (Tab.2), a small-target detection layer comparison experiment (Tab.3), a loss function comparison experiment (Tab.4), and a deformable convolution comparison experiment (Tab.5). Ablation experiments (Tab.6) confirmed the effectiveness and feasibility of the proposed algorithm through mAP comparison, with an accuracy improvement of 6.1%. On this basis, the detection performance of different algorithms was compared (Tab.7), and the generalization of the algorithm was verified on the VOC public dataset (Tab.8).
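To make the mAP@0.5 metric concrete, the following sketch computes average precision for a single class at an IoU threshold of 0.5. It assumes the all-point interpolated variant of AP; the evaluation code actually used in the paper may differ (for example, by sampling the precision-recall curve at fixed recall points). The names `box_iou` and `average_precision` and the input layout are hypothetical. mAP is then the mean of the per-class AP values.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_thr=0.5):
    """Single-class AP sketch.

    detections: list of (image_id, score, box)
    ground_truths: dict mapping image_id to a list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truths.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}

    # Visit detections from most to least confident and mark true positives
    hits = []
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truths.get(img, [])):
            value = box_iou(box, gt_box)
            if value > best_iou:
                best_iou, best_j = value, j
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True   # each ground truth is matched once
            hits.append(1)
        else:
            hits.append(0)                # false positive or duplicate

    # Cumulative precision and recall
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # Make precision non-increasing from right to left (interpolation)
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    gts = {"a": [(0, 0, 10, 10)], "b": [(0, 0, 10, 10)]}
    dets = [("a", 0.9, (0, 0, 10, 10)),
            ("a", 0.8, (50, 50, 60, 60)),
            ("b", 0.7, (1, 1, 11, 11))]
    print(average_precision(dets, gts))
```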
      Conclusions  This article proposes an improved object detection algorithm for anti-drone systems. The multi-scale channel attention module strengthens attention to small targets, the integrated RFB module enlarges the receptive field, the added small-target detection layer improves detection ability, and the improved loss function raises training speed and inference accuracy. Finally, deformable convolution is introduced to better fit the target size. The improved algorithm achieves good detection results on different datasets.
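The claim that an RFB-style module enlarges the receptive field can be checked with simple arithmetic. RFB modules generally combine parallel branches that use different kernel sizes and dilation rates; the sketch below computes the receptive field of a stack of convolutions from standard formulas. The branch configurations in the example are illustrative assumptions and are not taken from the paper.

```python
def effective_kernel(kernel, dilation):
    """Extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of a stack of conv layers.

    layers: list of (kernel, stride, dilation) tuples, applied in order.
    """
    rf, jump = 1, 1   # jump = spacing of adjacent outputs in input pixels
    for kernel, stride, dilation in layers:
        rf += (effective_kernel(kernel, dilation) - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    # Hypothetical branches: a 3x3 conv followed by a 3x3 conv with dilation d
    for d in (1, 3, 5):
        print(d, receptive_field([(3, 1, 1), (3, 1, d)]))
```

With the same number of weights, a larger dilation rate covers a wider input region, which is why dilated branches are a cheap way to widen the context seen by each feature.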

     
