Embedding Spatial Position Information and Multi-view Feature Extraction for Infrared Small Target Detection

  • Abstract: To address the low resolution, limited feature information, and low recognition accuracy of infrared small target images, an infrared small target detection model based on Embedded Spatial Location Information and Multi-view Feature Extraction (ESLIMFE) is proposed. First, because feature map resolution decreases as network depth increases and detail information is lost, a Spatial Location Information Fusion (SLIF) attention mechanism is embedded in the backbone network to compensate for the missing feature information of small targets. Second, a Multi-view Feature Extraction (MVFE) module combining the C3 module and dynamic snake convolution is proposed, which enhances the feature representation of small targets by extracting the same features from different viewpoints. A Large Selection Kernel (LSK) module is adopted to extract multi-scale information of small targets with convolution kernels of different sizes, improving the localization of infrared small targets. Finally, an Attention-based Intrascale Feature Interaction (AIFI) module is introduced to strengthen the interaction between features. Experiments on an infrared small target dataset of aerial targets show that the model achieves a detection accuracy of 90.5% mAP75 and 74.5% mAP50-95, demonstrating accurate detection of infrared small targets.

     

    Abstract:
    Objective This article presents an infrared small target detection model, Embedded Spatial Location Information and Multi-view Feature Extraction (ESLIMFE), which aims to tackle the challenges of low resolution, limited feature information, and low recognition accuracy in infrared small target images.
    Methods This article proposes an improved version of the YOLOv5 algorithm, introducing a network model for infrared small target detection designated ESLIMFENet (Fig.1). To address the loss of detail caused by the decreasing resolution of feature maps as network depth increases, a Spatial Location Information Fusion attention mechanism is embedded in the backbone network (Fig.2). In addition, a Multi-view Feature Extraction module combining the C3 module and Dynamic Snake Convolution (Fig.3) enhances the feature representation of small targets by extracting the same features from different viewpoints. The Large Selection Kernel module uses convolution kernels of different sizes to capture multi-scale information, improving the localization of infrared small targets. Finally, the Attention-based Intrascale Feature Interaction (AIFI) module is introduced to strengthen feature interaction.
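    A minimal PyTorch sketch of the selection idea behind the Large Selection Kernel module is given below. It is an illustration under our own assumptions (the class name LSKSketch, the 5x5 and dilated 7x7 depth-wise branches, and the sigmoid-based spatial selection are illustrative choices, not the paper's configuration): two branches with different receptive fields are fused by per-pixel selection weights, so the context used around a candidate small target can be widened adaptively.

        import torch
        import torch.nn as nn

        class LSKSketch(nn.Module):
            """Illustrative large-selection-kernel block (not the authors' code):
            two depth-wise branches with different receptive fields are merged
            by per-pixel selection weights."""
            def __init__(self, channels: int):
                super().__init__()
                # Branch 1: local context, 5x5 depth-wise convolution.
                self.conv_small = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
                # Branch 2: wider context, dilated 7x7 depth-wise convolution (19x19 effective field).
                self.conv_large = nn.Conv2d(channels, channels, 7, padding=9, dilation=3, groups=channels)
                # Spatial selection: avg- and max-pooled maps -> two per-pixel branch weights.
                self.select = nn.Conv2d(2, 2, 7, padding=3)
                self.proj = nn.Conv2d(channels, channels, 1)

            def forward(self, x):
                a = self.conv_small(x)                               # (B, C, H, W)
                b = self.conv_large(a)                               # cascaded for a larger receptive field
                both = torch.cat([a, b], dim=1)                      # (B, 2C, H, W)
                pooled = torch.cat([both.mean(dim=1, keepdim=True),
                                    both.max(dim=1, keepdim=True)[0]], dim=1)   # (B, 2, H, W)
                w = torch.sigmoid(self.select(pooled))               # per-pixel weight for each branch
                fused = a * w[:, 0:1] + b * w[:, 1:2]
                return x * self.proj(fused)                          # modulate the input features

        # Example: an 80x80 backbone feature map with 64 channels keeps its shape.
        y = LSKSketch(64)(torch.randn(1, 64, 80, 80))
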
    Results and Discussions To evaluate the detection performance of the proposed network model, metrics including mAP75, mAP50-95, model size, GFLOPs, and inference time are used for a comprehensive comparison. The ablation study in Tab.1 shows that the improved infrared small target detection model raises the average detection accuracy from 82.8% to 90.5%, with a slight increase in inference time from 6.6 ms to 8.5 ms. Comparative experiments in Tab.2 show that the improved model outperforms the advanced YOLOv8n and YOLOv8s models in detection accuracy. Furthermore, the proposed modules are compared with classic attention mechanisms (SA, CBAM, etc.) in Tab.3, and the attention mechanism tailored for infrared small target detection achieves superior detection accuracy.
    Conclusions An algorithmic model is introduced to address the deficiencies in feature information and detection accuracy of infrared small targets. The YOLOv5s model is improved by integrating a Spatial Location Information Fusion attention mechanism into the original backbone network, enhancing its capacity to extract spatial location information for small targets. A Multi-view Feature Extraction module is designed on the basis of the C3 module to further bolster feature extraction for small targets by extracting the same features from different viewpoints. Furthermore, a Large Selection Kernel module is incorporated to expand the model's receptive field and increase focus on small targets. Finally, an Attention-based Intrascale Feature Interaction module is implemented to dynamically assess the importance of different positions in the input sequence, highlighting key small targets and allocating additional attention to them. Experimental results demonstrate that the model achieves an mAP value of 90.5%, indicating a clear precision advantage over other models and meeting the detection requirements for infrared small targets.
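    As a rough, self-contained illustration of the intra-scale interaction step (the class name AIFISketch, the head count, and the feed-forward width are assumptions rather than the paper's settings, and positional encoding is omitted), the deepest feature map can be flattened into a token sequence, passed through one self-attention encoder layer, and reshaped back:

        import torch
        import torch.nn as nn

        class AIFISketch(nn.Module):
            """Minimal sketch of attention-based intra-scale feature interaction:
            self-attention over every position of one feature map, so salient
            small targets can be weighted more heavily than background clutter."""
            def __init__(self, channels: int, num_heads: int = 8):
                super().__init__()
                # channels must be divisible by num_heads.
                self.encoder = nn.TransformerEncoderLayer(
                    d_model=channels, nhead=num_heads,
                    dim_feedforward=4 * channels, batch_first=True)

            def forward(self, x):                            # x: (B, C, H, W)
                b, c, h, w = x.shape
                tokens = x.flatten(2).permute(0, 2, 1)       # (B, H*W, C): one token per position
                tokens = self.encoder(tokens)                # every position attends to every other
                return tokens.permute(0, 2, 1).reshape(b, c, h, w)

        # Example: interaction on the lowest-resolution feature map preserves its shape.
        out = AIFISketch(256)(torch.randn(1, 256, 20, 20))
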

     
