Two-stage object tracking method based on Siamese neural network

Zhang Hongwei; Li Xiaoxia; Zhu Bin; Zhang Yang

doi:10.3788/IRLA20200491

The introduction of deep learning has greatly improved the accuracy and robustness of object tracking. Siamese-network-based trackers can handle various target deformations thanks to training on large-scale datasets, but this generalization makes it difficult to eliminate interference from similar targets. Therefore, a two-stage tracking method based on a Siamese network is proposed. First, a modified residual network is used to extract deep features with better performance. By integrating temporal information, the template of the region proposal network is adaptively updated through correlation filter modulation, so as to filter out easily distinguished negative samples. Then, fixed-scale features of the candidate regions are extracted by region-of-interest pooling and fed to the verification network for more refined classification and regression. To improve the network's ability to discriminate hard-to-distinguish samples, a joint training method combining positive and negative samples is adopted to improve feature matching. The proposed method was evaluated on the OTB100 and VOT standard benchmarks and the UAV123 aerial benchmark. The experimental results demonstrate that the proposed method significantly improves on the performance of the baseline.


3. Conclusion and outlook

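Building on the Siamese network, this paper proposes a two-stage tracking method. In the RPN stage, correlation filter modulation and anchor structure design yield preliminary target candidate boxes; features of these candidates are extracted by a region-of-interest pooling layer and fed to the verification network for more precise classification and regression. Compared with the one-stage SiamRPN Siamese network, the proposed method largely resolves the original algorithm's inability to balance generalization ability and resistance to interference, and correlation filter modulation combined with two rounds of bounding-box regression gives the model better accuracy. Evaluations on several standard benchmarks show that, while maintaining a fairly fast tracking speed, the proposed method greatly improves tracking accuracy and the ability to distinguish similar distractors. Because it lacks a long-term tracking strategy, the model cannot search the whole image for the target after a tracking failure, which merits further study.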
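The region-of-interest pooling step mentioned above, which turns candidate boxes of different sizes into fixed-size features for the verification network, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `roi_max_pool`, the single-channel list-of-lists feature map, the 2x2 output size, and the integer box format `(x0, y0, x1, y1)` with exclusive upper bounds are all assumptions made for illustration.

```python
import math

def roi_max_pool(feature_map, box, out_size=(2, 2)):
    """Max-pool the cells inside `box` into a fixed out_h x out_w grid.

    feature_map: 2D list (rows x cols) of numbers; a single channel is assumed.
    box: (x0, y0, x1, y1) in feature-map cells, upper bounds exclusive,
         assumed to lie inside the feature map.
    """
    x0, y0, x1, y1 = box
    out_h, out_w = out_size
    h, w = y1 - y0, x1 - x0
    if h < 1 or w < 1:
        raise ValueError("box must cover at least one cell")
    pooled = []
    for i in range(out_h):
        # Row range of bin i; each bin is forced to hold at least one row.
        r0 = y0 + math.floor(i * h / out_h)
        r1 = max(y0 + math.ceil((i + 1) * h / out_h), r0 + 1)
        row = []
        for j in range(out_w):
            # Column range of bin j, built the same way.
            c0 = x0 + math.floor(j * w / out_w)
            c1 = max(x0 + math.ceil((j + 1) * w / out_w), c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 feature map whose cell (r, c) holds 4*r + c.
fmap = [[4 * r + c for c in range(4)] for r in range(4)]
full = roi_max_pool(fmap, (0, 0, 4, 4))   # 4x4 candidate box
small = roi_max_pool(fmap, (1, 1, 4, 4))  # 3x3 candidate box
print(full, small)
```

Both candidates produce a 2x2 output even though the boxes differ in size, which is the property that lets a single verification head process every proposal.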


Tracker	Baseline overlap (A-R rank)	Baseline failures (A-R rank)	Baseline EAO	Unsupervised overlap (AUC)	Speed (normalized)	Speed (FPS)
Ours	0.6011	14.5159	0.3833	0.5339	3.4961	20.2451
LADCF	0.4911	9.9253	0.3811	0.4182	0.1230	0.5573
MFT	0.4919	10.7662	0.3794	0.3917	0.1945	0.6232
DaSiamRPN	0.5691	18.4415	0.3785	0.4684	17.8183	64.4143
UPDT	0.5154	11.4172	0.3719	0.4444	0.0884	0.4697
RCO	0.4989	10.7004	0.3711	0.3830	0.2046	0.7203
SiamRPN	0.5915	19.6325	0.3691	0.4568	20.3426	86.7843
DRT	0.4958	13.9476	0.3490	0.4191	0.1237	0.4568
DeepSTRCF	0.5062	14.5486	0.3383	0.4333	0.5605	3.1144
CPT	0.4888	16.6207	0.3321	0.3757	0.8771	5.1842
SA_Siam_R	0.5444	16.4030	0.3311	0.4250	6.7761	32.3644
DLSTpp	0.5297	14.9374	0.3213	0.4978	1.2930	8.1759
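To read the table against the one-stage SiamRPN baseline, the relative change of each metric can be computed directly from the two rows. The sketch below copies the "Ours" and "SiamRPN" values from the table. The dictionary keys and the helper `relative_change` are illustrative names, and the column meanings (overlap, failures, EAO, AUC, FPS) are assumed from the table header.

```python
# Values copied from the results table (rows "Ours" and "SiamRPN").
ours = {"overlap": 0.6011, "failures": 14.5159, "eao": 0.3833,
        "auc": 0.5339, "fps": 20.2451}
siamrpn = {"overlap": 0.5915, "failures": 19.6325, "eao": 0.3691,
           "auc": 0.4568, "fps": 86.7843}

def relative_change(new, old):
    """Signed percentage change of `new` with respect to `old`."""
    return (new - old) / old * 100.0

changes = {name: relative_change(ours[name], siamrpn[name]) for name in ours}
for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

A negative value for failures means fewer failures than the baseline, while a negative value for FPS means lower speed.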


Corresponding author: Chen Bin (陈斌), bchen63@163.com
