Infrared time-sensitive target detection technology based on cross-modal data augmentation

王思宇; 杨小冈; 卢瑞涛; 李清格; 范继伟; 朱正杰

doi:10.3788/IRLA20220876

Infrared time-sensitive target detection technology based on cross-modal data augmentation

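Abstract

Abstract: Infrared time-sensitive target detection is widely used in unmanned cruise, precision strike, battlefield reconnaissance, and related fields, but images of some high-value targets are difficult and expensive to acquire. To address the scarcity of infrared time-sensitive target image data, the lack of multi-scene, multi-target training data, and the resulting poor detection performance, this paper proposes an infrared time-sensitive target detection technology based on cross-modal data augmentation. The cross-modal data augmentation method is a two-stage model. In the first stage, a modality conversion model based on the CUT network converts visible-light images containing time-sensitive targets into infrared images. In the second stage, the coordinate attention mechanism is introduced into the model, which randomly generates a large number of infrared target images and thereby achieves data augmentation. Finally, an improved Yolov5 target detection architecture based on the SE module and the CBAM module is proposed. Experimental results show that, compared with the original network, the proposed Yolov5 (CSP-A) detection technology improves the accuracy rate by 7.36%, the recall rate by 5.43%, and the average accuracy by 2.74%. The method effectively improves detection accuracy for infrared time-sensitive targets and achieves their precise detection.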

Abstract:
Objective Infrared time-sensitive targets are infrared targets, such as ships and aircraft, that have high military value and can be attacked only within a limited time window. Infrared time-sensitive target detection is widely used in military and civilian applications such as unmanned cruise, precision strike, and battlefield reconnaissance. Deep-learning-based detection algorithms have made great progress in target detection thanks to powerful computing resources, deep network structures, and large amounts of labeled data. However, images of some high-value targets are difficult and costly to acquire. As a result, infrared time-sensitive target image data are scarce and multi-scene, multi-target training data are lacking, which makes it difficult to guarantee detection performance. This paper therefore proposes an infrared time-sensitive target detection technology based on cross-modal data enhancement, which generates "new data" by processing existing data, expands the infrared time-sensitive target data set, and improves the detection accuracy and generalization ability of the model.
Methods We propose an infrared time-sensitive target detection technology based on cross-modal data enhancement. The cross-modal data enhancement method is a two-stage model (Fig.1). In the first stage, a modality conversion model based on the CUT network converts visible-light images containing time-sensitive targets into infrared images. In the second stage, the coordinate attention mechanism is introduced into the model, which randomly generates a large number of infrared target images and thereby achieves data enhancement. Finally, an improved Yolov5 target detection architecture based on the SE module and the CBAM module is proposed (Fig.3).
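As an illustration of the channel attention that an SE module applies, the following is a minimal pure-Python sketch of the generic squeeze-and-excitation computation: global average pooling per channel, a bottleneck of two fully connected layers, and a sigmoid gate that rescales each channel. It is not the authors' implementation. The function names, the toy weights, and the omission of bias terms are assumptions made for illustration, and the abstract does not specify where in the CSP module the SE block sits or which reduction ratio is used.

```python
import math


def se_channel_gates(feature_map, w_reduce, w_expand):
    """Return one gate in (0, 1) per channel.

    feature_map: list of C channels, each an H x W list of floats.
    w_reduce:    hypothetical (C/r) x C weight matrix (first FC layer, no bias).
    w_expand:    hypothetical C x (C/r) weight matrix (second FC layer, no bias).
    """
    # Squeeze: global average pooling turns each channel into one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: reduce the channel dimension, then apply ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w_reduce
    ]
    # Excitation, step 2: expand back to C values and squash with a sigmoid.
    return [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]


def se_block(feature_map, w_reduce, w_expand):
    """Rescale every channel of feature_map by its SE gate."""
    gates = se_channel_gates(feature_map, w_reduce, w_expand)
    return [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]


if __name__ == "__main__":
    # Toy input: 2 channels of size 2 x 2, with made-up weights.
    toy_features = [
        [[1.0, 1.0], [1.0, 1.0]],
        [[2.0, 2.0], [2.0, 2.0]],
    ]
    toy_w_reduce = [[0.5, 0.5]]      # 2 channels -> 1 hidden unit
    toy_w_expand = [[2.0], [-2.0]]   # 1 hidden unit -> 2 channel gates
    print(se_channel_gates(toy_features, toy_w_reduce, toy_w_expand))
    print(se_block(toy_features, toy_w_reduce, toy_w_expand))
```

In a trained network the two weight matrices are learned, so the gates emphasize channels that are informative for the target and suppress the rest. CBAM adds a spatial attention step on top of channel attention, which this sketch does not cover.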
Results and Discussions The proposed cross-modal data enhancement method for infrared time-sensitive targets combines a style transfer model with a target generation model and uses a visible-light image data set to enhance infrared time-sensitive target data. Remote sensing visible-light images can be converted into infrared images without loss of size, structure, or field of view, and without distortion, noise, or similar problems. Fig.6 shows that the generated infrared time-sensitive targets have good texture details and infrared characteristics and are clearly distinguished from the background. An improved Yolov5 target detection model is also proposed: SE and CBAM attention mechanisms are added to the CSP network to enhance the feature expression of the network and better detect infrared time-sensitive targets. Tab.2 shows that, compared with training the deep learning detection network on the original data, the proposed data enhancement algorithm significantly improves the detection of positive samples: the accuracy rate, the recall rate, and the average accuracy increase by 14.57%, 5.99%, and 8.82%, respectively. Tab.3 shows that, compared with SSD, Fast R-CNN, and Yolov5, the proposed algorithm achieves large improvements in accuracy, average accuracy, and F1 index. Compared with the original Yolov5 network, the accuracy rate, the recall rate, the average accuracy, and the F1 index increase by 7.36%, 5.43%, 2.74%, and 6.45%, respectively. Some test results are shown in Fig.9.
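For readers unfamiliar with how these indices relate, the sketch below computes precision, recall, and the F1 index from detection counts. It assumes that the "accuracy rate" reported here corresponds to precision in the usual object detection sense, which the abstract does not state explicitly. The function name and the counts in the example are hypothetical and are not taken from the paper's tables.

```python
def detection_metrics(true_pos, false_pos, false_neg):
    """Return (precision, recall, f1) from raw detection counts.

    true_pos:  detections matched to a ground-truth target.
    false_pos: detections with no matching ground-truth target.
    false_neg: ground-truth targets that were not detected.
    """
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    # Guard against division by zero when there are no detections or targets.
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    p, r, f1 = detection_metrics(true_pos=90, false_pos=10, false_neg=30)
    print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

Because F1 is a harmonic mean, it rises only when precision and recall improve together, which is why it is reported alongside the two individual rates.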
Conclusion To address the lack of infrared time-sensitive target data and the resulting poor detection performance, we propose an infrared time-sensitive target detection technology based on cross-modal data enhancement. In the two-stage data enhancement model, visible-light remote sensing images containing time-sensitive targets are first converted into target images with infrared characteristics using the modality conversion network, and the coordinate attention mechanism is then introduced into the random sample generation model. Finally, a Yolov5 detection technology based on the improved CSP module is proposed. Multiple sets of experimental results show that the detection accuracy of the proposed algorithm reaches up to 98.06% on the infrared time-sensitive target data set, which addresses the lack of infrared time-sensitive target data and demonstrates good target detection ability.
