Objective Infrared target detection has significant application value in the field of transportation: it helps drivers and systems detect targets promptly and respond under adverse conditions such as glare at night or rain and fog. However, owing to the characteristics of infrared images, including low resolution, absence of color information, poor contrast, and blurred features, existing models achieve limited average detection accuracy on infrared vehicles and pedestrians; the main problem is missed detection of overlapping targets and small targets in traffic scenes. This paper therefore designs an infrared pedestrian and vehicle detection model based on YOLOv8s (You Only Look Once version 8, small), which is important for improving the safety of intelligent driving.
Methods YOLOv8, an advanced object detection model of recent years, is released in five versions (n, s, m, l, and x) according to network depth and width to suit different requirements. YOLOv8s, which offers a good balance between detection precision and model size, is chosen as the base model. This paper introduces four improvements to the YOLOv8s architecture (Fig.2). First, the network is re-engineered with a small target detection layer to improve the detection of distant pedestrians and vehicles (Fig.3). Second, an SPD (space-to-depth) module replaces the original 3×3 stride-2 downsampling convolutions in the backbone and neck networks (Fig.4), preserving fine-grained image information. Third, a hybrid attention mechanism (Fig.5) is designed to strengthen the network's focus on pedestrians and vehicles. Fourth, the Focal-EIoU loss function is adopted; it not only remedies the deficiency of the CIoU loss, which can become ineffective under certain circumstances, but also mitigates the imbalance between positive and negative samples.
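To make the second and third improvements concrete, the sketches below illustrate the general techniques they build on. They are minimal PyTorch illustrations under stated assumptions, not the paper's released code, and the module and parameter names are hypothetical. The first sketch shows how a space-to-depth step can replace a stride-2 3×3 convolution: each 2×2 spatial neighbourhood is folded into the channel dimension before a stride-1 convolution, so downsampling discards no pixels.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth followed by a stride-1 convolution (illustrative).

    Instead of skipping pixels as a stride-2 convolution does, each 2x2
    spatial block is stacked along the channel axis, so fine-grained
    information survives the 2x downsampling.
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # After space-to-depth the channel count is 4x the input.
        self.conv = nn.Conv2d(4 * in_channels, out_channels,
                              kernel_size=3, stride=1, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()

    def forward(self, x):
        # (B, C, H, W) -> (B, 4C, H/2, W/2): gather the four 2x2 offsets
        # and stack them along the channel axis.
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.act(self.bn(self.conv(x)))
```

The paper's hybrid attention design is specified in Fig.5 rather than here; the following CBAM-style composition of a channel gate and a spatial gate is a common instantiation of such hybrids and is shown only to fix ideas, not as the authors' exact design.

```python
class HybridAttention(nn.Module):
    """CBAM-style channel + spatial attention (illustrative only)."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # Channel gate: shared MLP over pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial gate: 7x7 convolution over per-pixel channel statistics.
        self.spatial = nn.Conv2d(2, 1, kernel_size,
                                 padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Re-weight channels using average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Re-weight positions using channel-wise mean and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```

For example, for a 64-channel feature map of size 320×320, SPDConv(64, 128) returns a 128-channel 160×160 map; in principle, both modules drop into the backbone or neck wherever the corresponding original blocks sit.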
Results and Discussions The dataset used in this study is the FLIR ADAS (Advanced Driver Assistance System) v2 dataset, released by Teledyne FLIR in 2022 for environmental perception in autonomous driving (Fig.1). The main evaluation metrics are mAP (mean Average Precision) and model size, with P (precision) and R (recall) as secondary metrics. Ablation experiments (Tab.1) verify the effectiveness of each improvement: the improved network achieves a 5.3% higher mAP than the baseline. The paper compares detection results before and after adding the small object detection layer (Fig.6) and the SPD module (Fig.7), compares detection accuracy across different attention mechanisms (Tab.2), and further demonstrates the effectiveness of the hybrid attention mechanism with heat maps (Fig.8-Fig.9) and before-and-after detection results. It then compares performance under different loss functions (Tab.3) and shows detection results before and after changing the loss function (Fig.11). On this basis, the detection performance of different algorithms is compared (Tab.4), as are detection results before and after the improvements (Fig.12). These experiments show that the improved network delivers excellent detection performance.
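For readers cross-checking the loss comparison in Tab.3, the Focal-EIoU formulation follows the commonly published definition L_Focal-EIoU = IoU^γ · L_EIoU, where L_EIoU adds separate centre-distance, width, and height penalties to 1 - IoU. The sketch below is a minimal PyTorch rendering under the assumption of axis-aligned boxes in (x1, y1, x2, y2) form; the function name and the γ = 0.5 default are illustrative, not taken from the paper.

```python
import torch

def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-7):
    """Focal-EIoU loss for axis-aligned boxes in (x1, y1, x2, y2) form.

    EIoU penalises centre distance and width/height differences
    separately; the IoU**gamma factor down-weights easy, well-overlapping
    samples so that hard examples dominate the gradient.
    """
    # Intersection area.
    ix1 = torch.max(pred[..., 0], target[..., 0])
    iy1 = torch.max(pred[..., 1], target[..., 1])
    ix2 = torch.min(pred[..., 2], target[..., 2])
    iy2 = torch.min(pred[..., 3], target[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)

    # Union and IoU.
    wp, hp = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    wt, ht = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    iou = inter / (wp * hp + wt * ht - inter + eps)

    # Smallest enclosing box: width cw, height ch, squared diagonal c2.
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw**2 + ch**2 + eps

    # Squared distance between box centres.
    dx = (pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) / 2
    dy = (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) / 2
    rho2 = dx**2 + dy**2

    eiou = (1 - iou + rho2 / c2
            + (wp - wt)**2 / (cw**2 + eps)
            + (hp - ht)**2 / (ch**2 + eps))
    return (iou.clamp(0) ** gamma * eiou).mean()
```

Here γ controls how strongly well-overlapping boxes are down-weighted, which is what counteracts the sample-imbalance effect mentioned in the Methods.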
Conclusions This paper presents an improved YOLOv8s-based algorithm for infrared vehicle and pedestrian detection. The added small target detection layer enhances the ability to detect small vehicles and pedestrians; the SPD module preserves fine-grained information during downsampling; the designed hybrid attention mechanism suppresses noise interference and makes the network focus more on the targets themselves; and the improved loss function strengthens the model's learning ability. The refined algorithm demonstrates good detection performance on the test set, with improved detection of overlapping, small, and blurred targets.