Abstract:
Deep learning has greatly improved the accuracy and robustness of object tracking. Siamese-network-based trackers can handle various target deformations by training on large-scale datasets, but this also makes it difficult to suppress interference from similar targets. To address this problem, a two-stage tracking method based on a Siamese network is proposed. In the first stage, a modified residual network extracts better-performing deep features, and, by integrating temporal information, the template of the region proposal network is adaptively updated through correlation filter modulation so that easily distinguished negative samples are filtered out. In the second stage, fixed-scale features of the candidate regions are extracted by region-of-interest pooling and fed to a verification network for finer classification and regression. To improve the network's ability to discriminate hard-to-distinguish samples, a joint training method combining positive and negative samples is adopted, which improves feature matching. The proposed method was evaluated on the OTB100 and VOT standard benchmarks and on the UAV123 aerial benchmark. The experimental results demonstrate that it significantly improves on the performance of the baseline.
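
The two-stage idea can be illustrated with a toy sketch. It is not the proposed networks: raw cross-correlation stands in for the region proposal stage, cosine similarity stands in for the verification network, and template updating, region-of-interest pooling and box regression are omitted. All names (`cross_correlate`, `top_candidates`, `track`) and the toy data are hypothetical.

```python
import math

def cross_correlate(search, template):
    """Slide `template` over `search` (2-D lists) and return the response map."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    return [[sum(search[y + i][x + j] * template[i][j]
                 for i in range(th) for j in range(tw))
             for x in range(sw - tw + 1)]
            for y in range(sh - th + 1)]

def top_candidates(response, k):
    """Stage 1 (stand-in): keep the k highest-response positions as (y, x)."""
    scored = [(v, y, x) for y, row in enumerate(response) for x, v in enumerate(row)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(y, x) for _, y, x in scored[:k]]

def cosine(a, b):
    """Cosine similarity between two equally sized 2-D patches."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    na = math.sqrt(sum(v * v for v in fa))
    nb = math.sqrt(sum(v * v for v in fb))
    if na == 0 or nb == 0:
        return 0.0
    return sum(p * q for p, q in zip(fa, fb)) / (na * nb)

def track(search, template, k=5):
    """Stage 2 (stand-in): re-score the stage-1 candidates and pick the best."""
    th, tw = len(template), len(template[0])
    candidates = top_candidates(cross_correlate(search, template), k)
    def verify(pos):
        y, x = pos
        patch = [row[x:x + tw] for row in search[y:y + th]]
        return cosine(patch, template)
    return max(candidates, key=verify), candidates

# Toy data (assumed): the target sits at (1, 1); a brighter distractor at (3, 3).
template = [[1, 0],
            [0, 1]]
search = [[0, 0, 0, 0, 0],
          [0, 1, 0, 0, 0],
          [0, 0, 1, 0, 0],
          [0, 0, 0, 2, 2],
          [0, 0, 0, 2, 2]]

best, candidates = track(search, template, k=5)
print(candidates[0])  # (3, 3): the distractor has the highest raw response
print(best)           # (1, 1): verification recovers the true target
```

In this toy case the first stage alone would select the distractor, while re-scoring the few surviving candidates with a stricter similarity measure selects the target. This mirrors, in a very simplified form, the motivation for coarse filtering followed by finer verification.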