When a target is detected by the infrared detector carried on a hypersonic vehicle, the large relative velocity causes the target to undergo substantial displacement within a short time, along with obvious changes in size and shape. Owing to laboratory constraints, real-time detection of aerial infrared targets from an actual hypersonic vehicle is not feasible. Therefore, a continuous-frame image sequence of 1 500 images of a constant-speed infrared unmanned aerial vehicle (UAV), each 640×512 in size, was constructed, and images sampled at multi-frame intervals containing multi-scale, polymorphic targets were selected as the experimental test set. The backgrounds include buildings, trees, clouds, etc., to simulate aerial target detection scenes with complex backgrounds. To verify the effectiveness of the proposed method, three groups of continuous-frame images from the test set containing interference such as buildings, aerial birds (point noise) and cloud layers were selected for comparative experiments; the compared algorithms were C3D[17], TSN[18], ECO[19], 3DLocalCNN[20] and TAda[21].
Algorithm performance was evaluated in terms of recognition accuracy, real-time performance and computational resource usage, as shown in Tab.1. Recognition accuracy is defined as Accuracy = (TP+TN)/(TP+TN+FP+FN), i.e. the proportion of correctly predicted positive samples and correctly predicted negative samples among all samples; the real-time metric FPS (Frames Per Second) denotes the number of image frames the network can process per second; and runtime memory usage (Run memory) is measured in GB (gigabytes).
Table 1. Comparison of detection performance of different algorithms on self-built dataset
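The metric definitions above can be sketched as follows. This is a minimal illustration with invented sample counts, not the paper's evaluation code; the function names are ours.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def fps(num_frames: int, elapsed_seconds: float) -> float:
    """Real-time metric: image frames processed per second."""
    return num_frames / elapsed_seconds

# Illustrative counts (not from the paper), chosen to land near the
# 89.87% accuracy reported in Tab.1
acc = accuracy(850, 500, 70, 82)
print(round(acc, 4))  # → 0.8988
```

Counting TN alongside TP in the numerator is what distinguishes accuracy from precision or recall; on a test set dominated by background frames, a high TN count alone can inflate this metric, which is why the paper also tracks false alarms separately.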
Combining the recognition results on the continuous-frame images with Tab.1, the four methods TSN, ECO, 3DLocalCNN and TAda recognize the UAV target fairly well, but produce a large number of false detections on aerial point noise and cloud backgrounds, resulting in a high false alarm rate. C3D suppresses background noise well, but cannot track the target in real time across continuous frames, so frames are dropped and its recognition accuracy is low. The proposed target recognition method based on deep spatial-temporal feature fusion effectively suppresses noise in complex backgrounds and greatly reduces the false alarm rate; while maintaining real-time performance, its recognition accuracy reaches 89.87%, outperforming the existing recognition algorithms based on spatial-temporal feature fusion.
To verify the advantage of the proposed deep-learning-based recognition method over traditional methods, four traditional methods, PSTNN[1], NRAM[2], TDLMS[3] and Top-hat[4], were applied to the three groups of continuous-frame images in Fig.6; the experimental results are shown in Fig.7.
From the UAV recognition results of the traditional methods, PSTNN produces few false detections, but only filters out high-temperature regions such as the UAV engine and rotors; it cannot detect the target as a whole, and its detection performance degrades when the target overlaps the background. NRAM likewise cannot detect the whole UAV target, and performs poorly when many high-temperature objects are present in the background. TDLMS extracts the moving target with fairly high accuracy, but leaves an obvious motion trail that degrades recognition. Top-hat filters out the target accurately, but produces a large number of false detections, giving an excessively high false alarm rate.
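The Top-hat baseline discussed above can be illustrated with a minimal NumPy sketch: the white top-hat is the image minus its morphological opening, which passes small bright (hot) targets while suppressing large-scale background structure. This is our own toy implementation with a synthetic frame, not the paper's baseline code.

```python
import numpy as np

def erode(img: np.ndarray, k: int) -> np.ndarray:
    """Grayscale erosion with a flat k×k structuring element (edge-padded)."""
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].min()
    return out

def dilate(img: np.ndarray, k: int) -> np.ndarray:
    """Grayscale dilation with a flat k×k structuring element (edge-padded)."""
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def white_tophat(img: np.ndarray, k: int = 9) -> np.ndarray:
    """Top-hat filter: image minus its opening (erosion followed by dilation)."""
    opening = dilate(erode(img, k), k)
    return img - opening

# Synthetic frame: smooth background gradient plus a small 3×3 hot "target"
frame = np.zeros((64, 64), dtype=np.float32)
frame += np.linspace(0, 50, 64)[None, :]   # large-scale background
frame[30:33, 40:43] += 100.0               # small bright target
response = white_tophat(frame, k=9)
print(response.max() > 50)                 # the small target dominates the output
```

Because the opening removes any bright structure smaller than the structuring element, every small bright blob survives the subtraction, whether it is a real target or point noise; this is exactly why Top-hat yields the high false alarm rate noted above.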
The above analysis of the spatial-temporal fusion methods and the traditional methods demonstrates the effectiveness of the proposed method in hypersonic vehicle guidance scenarios, meeting the requirements of intelligent detection and recognition of infrared targets under highly dynamic conditions.
Highly dynamic aerial polymorphic target detection method based on deep spatial-temporal feature fusion (Invited)
doi: 10.3788/IRLA20220167
- Received Date: 2022-03-10
- Rev Recd Date: 2022-04-07
- Publish Date: 2022-05-06
Key words:
- object detection /
- feature fusion /
- multi-scale pyramid /
- sparse optical flow /
- 3D convolution
Abstract: Aiming at the problem of reliable detection and accurate recognition of highly dynamic aerial targets in complex backgrounds by infrared detectors carried on hypersonic vehicles, an aerial polymorphic target detection method based on deep spatial-temporal feature fusion was proposed. A weighted bidirectional cyclic feature pyramid structure was designed to extract the static features of polymorphic targets, and switchable atrous convolution was introduced to enlarge the receptive field and reduce spatial information loss. For the extraction of temporal motion features, in order to suppress complex background noise and concentrate the corner information in the moving region, a feature point matching method was used to generate a mask image, the optical flow was then calculated, and a sparse optical flow feature map was constructed from the results. Finally, the temporal features contained in multiple continuous frame images were extracted by 3D convolution to generate a 3D temporal motion feature map. By concatenating the static image features and the temporal motion features along the channel dimension, deep spatial-temporal fusion was realized. Extensive comparative experiments showed that this method can significantly reduce the probability of false recognition in complex backgrounds, and the target detection accuracy reached 89.87% with high real-time performance, which can meet the needs of intelligent detection and recognition of infrared targets under highly dynamic conditions.
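The fusion step described in the abstract, concatenating static spatial features with 3D-convolved temporal features along the channel dimension, can be sketched with NumPy. The shapes, channel counts and the naive convolution below are illustrative assumptions, not the paper's actual network dimensions or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tensors: (C, H, W) static features from one frame, and a
# (C_in, T, H, W) clip of sparse-optical-flow maps from T consecutive frames
C_s, C_t, T, H, W = 8, 4, 5, 16, 16
static_feat = rng.standard_normal((C_s, H, W)).astype(np.float32)
clip = rng.standard_normal((1, T, H, W)).astype(np.float32)

def conv3d(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Naive 3D convolution: 'valid' over time, edge-padded over space.
    x: (C_in, T, H, W); weight: (C_out, C_in, kt, kh, kw)."""
    c_out, c_in, kt, kh, kw = weight.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (0, 0), (ph, ph), (pw, pw)), mode="edge")
    t_out = x.shape[1] - kt + 1
    out = np.zeros((c_out, t_out, x.shape[2], x.shape[3]), dtype=x.dtype)
    for o in range(c_out):
        for t in range(t_out):
            for i in range(x.shape[2]):
                for j in range(x.shape[3]):
                    patch = xp[:, t:t + kt, i:i + kh, j:j + kw]
                    out[o, t, i, j] = np.sum(patch * weight[o])
    return out

# Temporal kernel spans all T frames, so the time axis collapses to length 1
w = rng.standard_normal((C_t, 1, T, 3, 3)).astype(np.float32)
temporal_feat = conv3d(clip, w)[:, 0]  # (C_t, H, W)

# Channel-dimension concatenation realizes the spatial-temporal fusion
fused = np.concatenate([static_feat, temporal_feat], axis=0)
print(fused.shape)  # (12, 16, 16): C_s + C_t channels
```

Concatenation (rather than addition) keeps the two feature streams intact and leaves it to the subsequent detection layers to learn how to weight static appearance against motion cues.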