Attention-guided multi-scale infrared real-time detection of pedestrian and vehicle

Zhang Yinhui; Ji Kai; He Zifen; Chen Guangchen

doi:10.3788/IRLA20240063

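Summary: Infrared imaging forms images by capturing the thermal radiation of targets, which enables target monitoring in complex road scenes and filters out cluttered road information. To address the large parameter counts, dependence on high-performance GPU resources, and slow detection speed of infrared pedestrian and vehicle detection models, an attention-guided multi-scale real-time infrared pedestrian and vehicle detection model is proposed. First, to match the anchor box sizes to the scales of infrared pedestrian and vehicle targets, the K-Means++ algorithm is used to recluster the preset prior box parameters on the target scales, and a 128×128 fine-scale detection layer is designed. Second, an attention-guided global feature extraction module is designed to strengthen the model's feature extraction and its ability to focus on spatial and channel information. Next, a cross-space perception module is constructed to introduce spatial information perception and strengthen the feature representation of targets across spatial scales. Finally, for resource-constrained devices, a 4× channel pruning method reduces the number of model parameters and improves suitability for mobile deployment. Experimental results show that, compared with the baseline method, the proposed IRDet algorithm improves mean detection accuracy by 4.3% to 87.4% and compresses the model weights by 60.4% to 5.7 MB.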

Abstract:

Objective Infrared imaging forms images by capturing the thermal radiation of targets, which enables target monitoring and the filtering of redundant information in complex road scenes. Existing infrared pedestrian and vehicle detection models have large parameter counts, depend on high-performance GPU resources, and detect slowly. To address these problems, an attention-guided multi-scale real-time infrared pedestrian and vehicle detection model is proposed, which aims to balance detection accuracy and real-time performance in infrared vehicle and pedestrian detection tasks.

Methods This article improves on the YOLOv5 algorithm and proposes IRDet, an attention-guided multi-scale real-time infrared detection model for pedestrians and vehicles (Fig.1). Firstly, to match the anchor box sizes to the scales of infrared pedestrian and vehicle targets, the K-Means++ algorithm is used to recluster the preset prior box parameters for these targets, and a 128×128 fine-scale detection layer is designed. Secondly, an attention-guided global feature extraction module (Fig.3) is designed to enhance the model's feature extraction ability and its ability to focus on spatial and channel information. Thirdly, a cross-space perception module (Fig.4) is constructed to introduce spatial information perception and strengthen the feature expression of targets at different scales. Finally, the model is made lightweight by channel pruning (Fig.5-6), which reduces the number of model parameters.
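The prior-box reclustering step can be illustrated with a minimal sketch. It assumes that boxes are reduced to (width, height) pairs and that the clustering distance is 1 - IoU, a common choice for anchor clustering that this abstract does not confirm. The function names `wh_iou` and `kmeanspp_anchors`, the toy box sizes, and `k=3` are all hypothetical and are not taken from the paper.

```python
import random


def wh_iou(box, center):
    """IoU of two (width, height) boxes aligned at a shared corner."""
    inter = min(box[0], center[0]) * min(box[1], center[1])
    union = box[0] * box[1] + center[0] * center[1] - inter
    return inter / union


def kmeanspp_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs into k anchors; distance = 1 - IoU (assumption)."""
    rng = random.Random(seed)
    # K-Means++ seeding: first centre uniformly at random, each later centre
    # sampled with probability proportional to its squared distance to the
    # nearest centre chosen so far.
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        d2 = [min(1.0 - wh_iou(b, c) for c in centers) ** 2 for b in boxes]
        if not any(d2):
            raise ValueError("boxes must contain at least k distinct sizes")
        centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    # Lloyd iterations: assign each box to its highest-IoU centre, then move
    # every centre to the mean width and height of its group.
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: wh_iou(b, centers[i]))
            groups[best].append(b)
        new_centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else centers[i]  # an empty group keeps its old centre
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


# Toy box sizes in pixels (hypothetical, not from the FLIR dataset).
boxes = (
    [(8 + i % 3, 20 + i % 5) for i in range(30)]      # pedestrian-like
    + [(30 + i % 4, 40 + i % 6) for i in range(30)]   # mid-size targets
    + [(90 + i % 7, 60 + i % 5) for i in range(30)]   # car-like
)
anchors = kmeanspp_anchors(boxes, k=3)
print(anchors)
```

In a YOLOv5-style model the resulting anchors, sorted by area, would typically be split across the detection layers, with the smallest ones going to the finest-scale layer.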

Results and Discussions To avoid overfitting caused by the similarity between adjacent frames during training, sparse filtering is applied to the FLIR Thermal Starter assisted-driving infrared dataset to remove highly similar images. To evaluate the algorithm from multiple aspects, the evaluation criteria are average precision (AP), mean average precision (mAP), model size (Size), single-image inference time (Time), floating-point operations (FLOPs) and number of model parameters (Parameters). The ablation experiment (Tab.2) shows that the average detection accuracy of the improved infrared pedestrian and vehicle detection model increases from 83.1% to 88%. However, the improvements also significantly increase the model size, so the model must be compressed. The scaling factor comparison experiment (Tab.3) identifies the optimal scaling factor. The pruning experiment (Tab.4) determines the optimal pruning rate to be 0.8 while preserving the accuracy and speed of the model. Comparative experiments (Tab.5) show that the proposed model achieves the best detection performance among the compared models.
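As an illustration of the AP and mAP criteria, the sketch below computes AP as the area under the precision-recall curve with all-point interpolation and takes mAP as the mean over classes. The interpolation scheme is an assumption, since the abstract does not state which one is used. The function names and the toy detections are hypothetical and are not results from the paper.

```python
def average_precision(detections, num_gt):
    """AP for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_gt: number of ground-truth objects of that class.
    """
    if num_gt == 0:
        return 0.0
    tp = fp = 0
    recalls, precisions = [], []
    # Walk through detections from most to least confident.
    for _, hit in sorted(detections, key=lambda d: d[0], reverse=True):
        tp += hit
        fp += not hit
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles between consecutive recall values.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, num_gt)."""
    aps = [average_precision(d, n) for d, n in per_class.values()]
    return sum(aps) / len(aps)


# Toy detections (hypothetical values for illustration only).
toy = {
    "person": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "car": ([(0.95, True), (0.6, True)], 2),
}
ap_person = average_precision(*toy["person"])
m_ap = mean_average_precision(toy)
print(round(ap_person, 4), round(m_ap, 4))
```

Whether a detection counts as a true positive depends on an IoU threshold against the ground truth, which is decided before this function is called.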

Conclusions This article proposes a lightweight infrared pedestrian and vehicle detection algorithm. It combines a fine-scale detection layer with prior boxes reclustered by the K-Means++ algorithm to suit infrared pedestrians and vehicles, which helps the model locate targets more accurately. An attention-guided global feature extraction module is proposed to enhance the model's feature extraction ability and its ability to focus on spatial and channel information. A dynamic detection head is embedded into the original detection head to improve the model's detection ability. A cross-spatial perception module is designed to correlate the spatial feature information of infrared images at different scales. On the basis of the improved detection model, a BN layer channel pruning strategy is used to compress and fine-tune the model, achieving deep compression while maintaining accuracy.
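One common way to realize BN layer channel pruning is to rank channels by the absolute value of their BN scale factors and drop those below a global threshold set by the pruning rate. The sketch below follows that idea under stated assumptions; it is not confirmed to be the authors' exact procedure. The function name `bn_channel_masks`, the `min_keep` safeguard, the layer names, and the scale values are hypothetical.

```python
def bn_channel_masks(gammas_per_layer, prune_rate, min_keep=1):
    """Select channels to keep from BN scale factors.

    gammas_per_layer: dict mapping layer name -> list of BN scale factors.
    prune_rate: fraction of all channels to remove (0 <= prune_rate < 1).
    Returns (masks, threshold); masks[name][i] is True if channel i is kept.
    """
    all_gammas = sorted(
        abs(g) for layer in gammas_per_layer.values() for g in layer
    )
    cut = int(len(all_gammas) * prune_rate)
    # Global threshold: channels whose |gamma| falls below it are pruned.
    threshold = all_gammas[cut] if cut < len(all_gammas) else float("inf")
    masks = {}
    for name, layer in gammas_per_layer.items():
        mask = [abs(g) >= threshold for g in layer]
        if sum(mask) < min_keep:
            # Keep the strongest channels so no layer is removed entirely.
            order = sorted(
                range(len(layer)), key=lambda i: abs(layer[i]), reverse=True
            )
            top = set(order[:min_keep])
            mask = [i in top for i in range(len(layer))]
        masks[name] = mask
    return masks, threshold


# Toy scale factors (hypothetical values for illustration only).
gammas = {
    "bn1": [0.9, 0.01, 0.5, 0.02],
    "bn2": [0.03, 0.04, 0.8, 0.001, 0.6, 0.05],
}
masks, threshold = bn_channel_masks(gammas, prune_rate=0.5)
print(threshold, masks)
```

A different pruning rate is passed through the same `prune_rate` argument. After the masks are applied, the slimmed network is normally fine-tuned to recover accuracy, which matches the compress-then-fine-tune order described above.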
