Abstract:
Objective This article presents an infrared small target detection model, Embedded Spatial Location Information and Multi-view Feature Extraction (ESLIMFE), which aims to tackle the challenges posed by infrared small target images: low resolution, limited feature information, and low recognition accuracy.
Methods This article proposes an improved version of the YOLOv5 algorithm, yielding a network model for infrared small target detection named ESLIMFENet (Fig.1). To address the loss of detail caused by the decreasing resolution of feature maps as network depth increases, a Spatial Location Information Fusion attention mechanism is embedded in the backbone network (Fig.2). A Multi-view Feature Extraction Module combines the C3 module with Dynamic Snake Convolution (Fig.3) to enhance the feature representation of small targets by extracting features from multiple viewpoints. The Large Selection Kernel module uses convolution kernels of different sizes to capture multi-scale information, improving the localization of infrared small targets. Finally, the Attention-based Intrascale Feature Interaction (AIFI) module is introduced to strengthen the interaction among features within a single scale.
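The abstract does not describe the internals of the Spatial Location Information Fusion attention. As a rough illustration only, the NumPy sketch below uses coordinate attention as a stand-in: the feature map is pooled separately along its width and its height, so every attention weight keeps either a row position or a column position. The function name, the plain ReLU used in place of the original normalization and activation, and the random weights are all assumptions, not the authors' design.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Coordinate-attention-style reweighting of a (C, H, W) feature map.

    Hypothetical stand-in, not the paper's module.
    w_reduce: (C_r, C) shared 1x1 conv; w_h, w_w: (C, C_r) 1x1 convs.
    """
    c, h, w = x.shape
    pooled_h = x.mean(axis=2)                            # (C, H): average over width, keeps row position
    pooled_w = x.mean(axis=1)                            # (C, W): average over height, keeps column position
    y = np.concatenate([pooled_h, pooled_w], axis=1)     # (C, H + W)
    y = np.maximum(w_reduce @ y, 0.0)                    # shared 1x1 conv + ReLU (assumed) -> (C_r, H + W)
    y_h, y_w = y[:, :h], y[:, h:]                        # split back into row and column descriptors
    a_h = sigmoid(w_h @ y_h)                             # (C, H) row attention in (0, 1)
    a_w = sigmoid(w_w @ y_w)                             # (C, W) column attention in (0, 1)
    return x * a_h[:, :, None] * a_w[:, None, :]         # rescale each (row, column) location


# Usage with random (hypothetical) weights.
rng = np.random.default_rng(0)
C, Cr, H, W = 8, 2, 16, 16
x = rng.standard_normal((C, H, W))
out = coordinate_attention(x, rng.standard_normal((Cr, C)),
                           rng.standard_normal((C, Cr)), rng.standard_normal((C, Cr)))
print(out.shape)  # (8, 16, 16)
```

Because each channel is rescaled by a row weight and a column weight, a response at a specific (row, column) location can be emphasized, which is the kind of positional cue a target covering only a few pixels would benefit from.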
Results and Discussions The detection performance of the proposed network model is compared comprehensively using mAP75, mAP50-95, model size, GFLOPs, and inference time. The ablation study in Tab.1 shows that the improved model raises the mean average precision (mAP) from 82.8% to 90.5%, while inference time increases from 6.6 ms to 8.5 ms. Comparative experiments in Tab.2 show that the improved model also achieves higher detection accuracy than the more recent YOLOv8n and YOLOv8s models. Finally, Tab.3 compares the proposed modules with classic attention mechanisms (SA, CBAM, etc.), showing that our attention mechanism, tailored to infrared small target detection, yields higher detection accuracy.
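The plain-Python snippet below makes two points from this paragraph concrete. First, mAP75 counts a detection only when its IoU with the ground truth is at least 0.75, and mAP50-95 averages AP over the ten COCO-style IoU thresholds from 0.50 to 0.95 in steps of 0.05; for a target only a few pixels wide, a one-pixel offset already pushes IoU below 0.5. Second, it works out the ablation trade-off from the numbers quoted above. The box coordinates are hypothetical, and the throughput figures assume the reported inference times are per image at batch size 1.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# The ten IoU thresholds averaged by mAP50-95 (mAP75 uses only 0.75).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Hypothetical 4x4-pixel target and a prediction shifted by 1 px in x and y.
gt_box, pred_box = (10, 10, 14, 14), (11, 11, 15, 15)
overlap = iou(gt_box, pred_box)                      # 9 / 23
matched = [t for t in IOU_THRESHOLDS if overlap >= t]
print(f"IoU = {overlap:.3f}, thresholds passed: {matched}")

# Ablation trade-off from Tab.1 (values quoted in the abstract).
baseline_map, improved_map = 82.8, 90.5              # mAP, %
baseline_ms, improved_ms = 6.6, 8.5                  # inference time, ms
print(f"mAP gain: {improved_map - baseline_map:.1f} points")
print(f"extra latency: {improved_ms - baseline_ms:.1f} ms "
      f"({(improved_ms / baseline_ms - 1) * 100:.1f}%)")
print(f"throughput: {1000 / baseline_ms:.1f} -> {1000 / improved_ms:.1f} images/s")
```

This prints an IoU of 0.391 with no thresholds passed, a gain of 7.7 mAP points, 1.9 ms (28.8%) of extra latency, and a throughput change from 151.5 to 117.6 images/s under the stated batch-size assumption.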
Conclusions An algorithmic model is introduced to address the limited feature information and low detection accuracy of infrared small targets. The YOLOv5s model is improved in four ways. First, a Spatial Location Information Fusion attention mechanism is integrated into the original backbone network, enhancing its ability to extract spatial location information for small targets. Second, a Multi-view Feature Extraction module strengthens the C3 module's feature extraction for small targets by extracting features of the same target from different angles. Third, a Large Selection Kernel module enlarges the model's receptive field and increases its focus on small targets. Finally, an Attention-based Intrascale Feature Interaction module dynamically assesses the importance of different positions in the input sequence, highlighting key small targets and allocating more attention to them. Experimental results show that the model achieves an mAP of 90.5%, a clear precision advantage over the compared models that meets the detection requirements for infrared small targets.
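AIFI originates in RT-DETR, where a standard transformer encoder layer is applied to the highest-level feature map only, so self-attention runs among the positions of a single scale. The NumPy sketch below shows just the core of that idea under simplifying assumptions: one attention head, a residual connection, and no positional encoding, layer normalization, or feed-forward sublayer. The function name and the random projection matrices are hypothetical. Each row of `weights` is the importance one position assigns to every other position, which is how such a module can give extra attention to the few locations a small target occupies.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)     # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def intrascale_self_attention(feat, w_q, w_k, w_v):
    """Single-head self-attention among all positions of one (C, H, W) map.

    Simplified, hypothetical sketch: w_q, w_k are (C, d); w_v is (C, C).
    """
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T           # (H*W, C): one token per spatial position
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (H*W, H*W), rows sum to 1
    out = tokens + weights @ v                  # residual connection, as in an encoder layer
    return out.T.reshape(c, h, w), weights


# Usage with random (hypothetical) projections on a small 5x5 feature map.
rng = np.random.default_rng(0)
C, H, W, d = 16, 5, 5, 8
feat = rng.standard_normal((C, H, W))
out, weights = intrascale_self_attention(
    feat, rng.standard_normal((C, d)), rng.standard_normal((C, d)),
    rng.standard_normal((C, C)) * 0.1)
print(out.shape, weights.shape)  # (16, 5, 5) (25, 25)
```

Restricting attention to one low-resolution map keeps the quadratic cost of the H*W by H*W weight matrix manageable, which is the usual motivation for applying it at a single scale.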