[1] Jiang Guoqing, Wan Lanjun. Detection of dim and small infrared targets based on the most appropriate contrast saliency analysis [J]. Infrared and Laser Engineering, 2021, 50(4): 20200377. (in Chinese) doi: 10.3788/IRLA20200377
[2] Liu Gaoru, Sun Shengli, Lin Changqing. Two-dimensional spatial profile method for infrared dim point target background suppression [J]. Infrared Technology, 2019, 41(4): 329-334. (in Chinese)
[3] Zhang Congcong. Infrared dim small target detection method based on low rank background and sparse target characteristics [D]. Nanjing: Nanjing University of Science and Technology, 2018. (in Chinese)
[4] Huang Yuanyuan. Research on infrared dim small target detection algorithm based on local contrast [D]. Chongqing: Chongqing University of Posts and Telecommunications, 2020. (in Chinese)
[5] Zhao Yan, Liu Di, Zhao Lingjun. Infrared dim and small target detection based on YOLOv3 in complex environment [J]. Aero Weaponry, 2019, 26(6): 29-34. (in Chinese)
[6] Feng Xiaoyu, Mei Wei, Hu Dashuai. Air target detection based on improved fast R-CNN [J]. Acta Optica Sinica, 2018, 38(6): 0615004. (in Chinese) doi: 10.3788/AOS201838.0615004
[7] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 779-788.
[8] Redmon J, Farhadi A. YOLO9000: Better, faster, stronger[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017: 7263-7271.
[9] Redmon J, Farhadi A. YOLOv3: An incremental improvement [J]. arXiv, 2018: 1804.02767.
[10] Bochkovskiy A, Wang C Y, Liao H Y. YOLOv4: Optimal speed and accuracy of object detection [J]. arXiv, 2020: 2004.10934.
[11] Hui B, Song Z, Fan H. A dataset for infrared detection and tracking of dim-small aircraft targets under ground/air background [J]. China Scientific Data, 2020, 5(3): 291-302.
[12] Misra D. Mish: A self-regularized non-monotonic neural activation function [J]. arXiv, 2019: 1908.08681.
[13] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015: 234-241.
[14] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017: 2117-2125.
[15] Yuan W, Wang S, Li X, et al. A skip attention mechanism for monaural singing voice separation [J]. IEEE Signal Processing Letters, 2019, 26(10): 1481-1485. doi: 10.1109/LSP.2019.2935867
[16] Fan Xiangsuo. Research on small target detection and tracking algorithm in image sequences [D]. Chengdu: University of Electronic Science and Technology of China, 2019. (in Chinese)
[17] Huang Z, Wang J, Fu X, et al. DC-SPP-YOLO: Dense connection and spatial pyramid pooling based YOLO for object detection [J]. Information Sciences, 2020, 522: 241-258. doi: 10.1016/j.ins.2020.02.067
[18] Choi J, Chun D, Kim H, et al. Gaussian YOLOv3: An accurate and fast object detector using localization uncertainty for autonomous driving[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019: 502-511.
[19] Chen L, Shi W, Deng D. Improved YOLOv3 based on attention mechanism for fast and accurate ship detection in optical remote sensing images [J]. Remote Sensing, 2021, 13(4): 660. doi: 10.3390/rs13040660
[20] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks [J]. arXiv, 2015: 1506.01497.
[21] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision (ECCV), 2016: 21-37.
[22] Zhang S, Wen L, Bian X, et al. Single-shot refinement neural network for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 4203-4212.
[23] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017: 2980-2988.