Objective Infrared imaging captures the thermal radiation of targets, which enables target monitoring and the filtering of redundant information in complex road scenes. Existing infrared pedestrian and vehicle detection models have a large number of parameters, depend on high-performance GPU resources, and detect slowly. To address these problems, an attention-guided multi-scale real-time detection model for infrared pedestrians and vehicles is proposed, which aims to strike a balance between detection accuracy and real-time performance.
Methods This article improves upon the YOLOv5 algorithm and proposes an attention-guided multi-scale infrared real-time detection model for pedestrians and vehicles, IRDet (Fig.1). First, to match the anchor box sizes to the scales of infrared pedestrian and vehicle targets, the K-Means++ algorithm is used to generate the preset anchor parameters, and a 128×128 fine-scale detection layer is designed. In addition, an attention-guided global feature extraction module (Fig.3) is designed to enhance the model's feature extraction ability and its ability to focus on spatial and channel information. Second, a cross-spatial perception module (Fig.4) is constructed to introduce spatial information perception and strengthen the feature representation of targets at different scales. Finally, the model is made lightweight by channel pruning (Fig.5-6), which reduces the number of model parameters.
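The anchor generation step can be illustrated with a minimal sketch. It assumes that each labelled target is reduced to a (width, height) pair and that the clustering distance is 1 - IoU, a common choice for YOLO-style anchors; the abstract does not state the distance used, and the function names `iou_wh` and `kmeanspp_anchors` are hypothetical.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeanspp_anchors(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchors.

    Assumption: distance d = 1 - IoU; the source only names K-Means++.
    """
    rng = random.Random(seed)
    # K-Means++ seeding: the first centre is uniform, later centres are
    # sampled with probability proportional to the squared distance to
    # the nearest centre chosen so far.
    centres = [rng.choice(boxes)]
    while len(centres) < k:
        d2 = [min(1.0 - iou_wh(b, c) for c in centres) ** 2 for b in boxes]
        if sum(d2) == 0:
            # All boxes coincide with existing centres: fall back to uniform.
            centres.append(rng.choice(boxes))
        else:
            centres.append(rng.choices(boxes, weights=d2, k=1)[0])
    # Lloyd iterations: assign each box to its highest-IoU centre, then
    # move each centre to the mean width and height of its cluster.
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centres[i]))
            clusters[best].append(b)
        new = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centres[i]  # keep the old centre for an empty cluster
            for i, c in enumerate(clusters)
        ]
        if new == centres:
            break
        centres = new
    # Sort by area so anchors can be assigned from fine to coarse layers.
    return sorted(centres, key=lambda c: c[0] * c[1])
```

Calling `kmeanspp_anchors(boxes, k)` on the label widths and heights of the training set would give `k` anchors sorted from small to large; how many anchors each detection layer receives is a design decision the abstract does not specify.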
Results and Discussions To avoid overfitting caused by the similarity between adjacent frames during training, sparse filtering is applied to the FLIR Thermal Starter assisted-driving infrared dataset to remove highly similar images. The model is evaluated on six criteria: average precision (AP), mean average precision (mAP), model size (Size), single-image inference time (Time), floating-point operations (FLOPs), and number of model parameters (Parameters). The ablation experiment (Tab.2) shows that the improvements raise the average detection accuracy of the infrared pedestrian and vehicle detection model from 83.1% to 88%. However, these improvements also increase the model size significantly, so the model must be compressed. The scaling factor comparison experiment (Tab.3) identifies the optimal scaling factor. The pruning experiment (Tab.4) determines that a pruning rate of 0.8 is optimal while preserving the accuracy and speed of the model. The comparative experiments (Tab.5) show that the proposed model achieves the best detection performance among the compared models.
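For readers unfamiliar with the AP and mAP criteria, the following is a minimal sketch of one common way to compute them: the area under the precision-recall curve with all-point interpolation. The abstract does not say which AP variant or IoU threshold is used, so this is an assumption, and the function names are hypothetical. Each detection is assumed to be already marked as a true or false positive.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class.

    Assumption: all-point interpolation; detections are pre-matched to
    ground truth, so is_true_positive[i] is known for every detection.
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Replace each precision by the maximum precision at any higher recall.
    for j in range(len(precisions) - 2, -1, -1):
        precisions[j] = max(precisions[j], precisions[j + 1])
    # Sum the rectangles between consecutive recall values.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

With per-class values stored in a dictionary such as `{"person": ..., "car": ...}`, `mean_average_precision` returns the single figure reported as mAP.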
Conclusions This article proposes a lightweight infrared pedestrian and vehicle detection algorithm. It adds a fine-scale detection layer and uses the K-Means++ algorithm to recluster prior boxes suited to infrared pedestrians and vehicles, which helps the model locate targets more accurately. An attention-guided global feature extraction module is proposed to enhance the model's feature extraction ability and its ability to focus on spatial and channel information. A dynamic detection head is embedded into the original detection head to improve the model's detection ability. A cross-spatial perception module is designed to correlate the spatial feature information of infrared images at different scales. On the basis of the improved detection model, a BN-layer channel pruning strategy is used to compress and fine-tune the model, achieving deep compression while maintaining accuracy.
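The channel selection step of BN-layer pruning can be sketched as follows. The sketch assumes the widely used scheme in which channels are ranked globally by the absolute value of their BN scaling factor and the smallest fraction is removed; the abstract does not describe the exact criterion, and the function name and the dictionary layout are hypothetical. The pruning rate of 0.8 is the value reported in Tab.4.

```python
def bn_channel_masks(gammas_per_layer, prune_rate):
    """Return a keep-mask per BN layer for a global pruning rate.

    Assumption: channels are ranked by |gamma| across all BN layers and
    the smallest `prune_rate` fraction is pruned.
    """
    all_abs = sorted(
        abs(g) for layer in gammas_per_layer.values() for g in layer
    )
    n_prune = int(len(all_abs) * prune_rate)
    # Channels with |gamma| at or below the threshold are pruned; ties at
    # the threshold may therefore prune slightly more than n_prune.
    threshold = all_abs[n_prune - 1] if n_prune > 0 else float("-inf")
    masks = {}
    for name, layer in gammas_per_layer.items():
        mask = [abs(g) > threshold for g in layer]
        if not any(mask):
            # Never remove a whole layer: keep its strongest channel.
            keep = max(range(len(layer)), key=lambda i: abs(layer[i]))
            mask[keep] = True
        masks[name] = mask
    return masks
```

In practice the masks would be used to rebuild each convolution with only the kept channels, followed by fine-tuning to recover accuracy, as the conclusions describe.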