-
The correlation filter method with a target redetection mechanism consists of four steps: (1) extract features from the target given in the initial frame of the video and train the correlation filter; (2) compute the correlation response of the image features with the filter and compare the maximum response against a preset threshold $\theta $; if it is not below $\theta $, take the corresponding location as the tracking result; (3) if the maximum response falls below $\theta $, relocate the target with a particle filter and estimate its scale; (4) retrain the correlation filter on the particle-filter tracking result. The flow chart of the proposed method is shown in Fig.1. -
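The four steps above can be sketched as a minimal, runnable tracking loop on a 1-D toy signal. This is only an illustration of the control flow: the real method uses HOG features, frequency-domain correlation and 2-D image patches, so `correlate` and the Gaussian particle sampling below are simplified stand-ins, not the paper's implementation.

```python
import numpy as np

def correlate(template, window):
    """Peak normalized cross-correlation of the template inside a search window."""
    scores = []
    for i in range(len(window) - len(template) + 1):
        patch = window[i:i + len(template)]
        denom = np.linalg.norm(template) * np.linalg.norm(patch) + 1e-12
        scores.append(float(np.dot(template, patch)) / denom)
    best = int(np.argmax(scores))
    return scores[best], best

def track(frames, init_pos, width, theta=0.8, n_particles=20, seed=0):
    """Steps (1)-(4): correlation tracking with a particle-filter fallback
    whenever the peak score drops below the threshold theta."""
    rng = np.random.default_rng(seed)
    template = frames[0][init_pos:init_pos + width].copy()   # step (1): "train" on frame 1
    pos = init_pos
    for frame in frames[1:]:
        lo = max(pos - width, 0)
        window = frame[lo:lo + 3 * width]
        score, off = correlate(template, window)             # step (2): peak response
        if score >= theta:
            pos = lo + off
        else:                                                # step (3): particle redetection
            cands = np.clip(rng.normal(pos, 2 * width, n_particles).astype(int),
                            0, len(frame) - width)
            pos = max((correlate(template, frame[c:c + width])[0], c) for c in cands)[1]
            template = frame[pos:pos + width].copy()         # step (4): retrain the filter
    return pos
```

As long as the response stays above `theta`, only the cheap local correlation search runs; the wider particle search and retraining are triggered on demand, mirroring Fig.1.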
To alleviate the boundary effect caused by circular shifts and strengthen the discriminative power of the filter, BACF trains the correlation filter on a larger image region. In addition, to keep positive samples from containing too much background information, the background is cropped out of the positive-sample region. This process can be written as:
$$E\left( \alpha \right) = \frac{1}{2}\sum\limits_{j = 1}^D {\left\| {y - \sum\limits_{k = 1}^K {\alpha _k^{\rm{T}}\,{{P}}{x_k}\left[ {\Delta {\tau _j}} \right]} } \right\|_2^2} + \frac{\lambda }{2}\sum\limits_{k = 1}^K {\left\| {{\alpha _k}} \right\|_2^2} $$ (1) where:
$y$ is the desired output; ${\alpha _k}$ is the filter of the $k$-th channel; ${x_k}$ is the sample of the $k$-th channel; P is a $D \times T$ matrix of zeros and ones; $[\Delta {\tau _j}]$ is the circular shift operator; $\lambda $ is the regularization coefficient penalizing the filter. The matrix P allows the positive samples to be extracted exactly, which improves the discriminative power of the filter. In the initial frame of the sequence, the search window has size $M \times N$. For efficiency, the computation is carried out in the frequency domain, and Eq. (1) can be rewritten as:
$$\left\{ \begin{array}{l} E(\alpha ,\hat g) = \dfrac{1}{2}||\hat y - \hat X\,\hat g||_2^2 + \dfrac{\lambda }{2}||\alpha ||_2^2 \\ s.t.\;\hat g = \sqrt S (F{P^{\rm{T}}} \otimes {I_D})\alpha \\ \end{array} \right.$$ (2) where:
$\hat X$ is an $S \times DS$ feature matrix; ${I_D}$ is a $D \times D$ identity matrix; $F$ is the $S \times S$ orthonormal Fourier basis; the auxiliary variable $g$ after the Fourier transform is written $\hat g$. To obtain the global optimum, the Augmented Lagrangian Method (ALM) can be used to further turn Eq. (2) into:
$$\begin{split} L(\alpha ,\hat g,\hat \xi ) = &\dfrac{1}{2}||\hat y - \hat X\,\hat g||_2^2 + \dfrac{\lambda }{2}||\alpha ||_2^2 + \\ & {{\hat \xi }^{\rm{T}}}(\hat g - \sqrt S (F{P^{\rm{T}}} \otimes {I_D})\alpha ) +\\ &\dfrac{\mu }{2}||\hat g - \sqrt S (F{P^{\rm{T}}} \otimes {I_D})\alpha ||_2^2 \\ \end{split} $$ (3) where:
${\hat \xi ^{\rm{T}}}$ is the Fourier-transformed Lagrangian vector, and $\mu $ is the penalty factor controlling the convergence speed of the ALM. Eq. (3) is optimized iteratively with the alternating direction method of multipliers (ADMM)[21], which splits the global optimization into two subproblems and thereby reduces the computational scale. ${\alpha ^*}$ and ${\hat g^*}$ are solved from Eq. (4) and Eq. (5), respectively. $$\begin{split} {\alpha ^*} = &\mathop {\arg \min }\limits_\alpha \bigg\{ \dfrac{\lambda }{2}||\alpha ||_2^2 + {{\hat \xi }^{\rm{T}}}(\hat g - \sqrt S (F{{{P}}^{\rm{T}}} \otimes {I_D})\alpha ) +\\ & \dfrac{\mu }{2}||\hat g - \sqrt S (F{{{P}}^{\rm{T}}} \otimes {I_D})\alpha ||_2^2 \bigg\} =\\ & {\left(\mu + \dfrac{\lambda }{{\sqrt S }}\right)^{ - 1}}(\mu g + \xi ) \\ \end{split} $$ (4) $$\begin{split} {{\hat g}^*} = &\mathop {\arg \min }\limits_{\hat g} \Bigg\{ \dfrac{1}{2}||\hat y - \hat X\,\hat g||_2^2 +\\ & {{\hat \xi }^{\rm{T}}}(\hat g - \sqrt S (F{P^{\rm{T}}} \otimes {I_D})\alpha ) +\\ &\dfrac{\mu }{2}||\hat g - \sqrt S (F{P^{\rm{T}}} \otimes {I_D})\alpha ||_2^2 \Bigg\} \\ \end{split} $$ (5) Simplifying the matrix inversion in Eq. (5) and rearranging gives Eq. (6), from which the optimal solution is finally obtained. The correlation filter is updated with Eq. (7), where $\eta $ denotes the preset learning rate of the filter. The correlation filter can then be used to locate the target. $$\begin{split} \hat g{\left( s \right)^*} = &{\left( {\hat I\left( s \right)\hat I{{\left( s \right)}^{\rm{T}}} + S\mu {I_K}} \right)^{ - 1}}\left( {S\,\hat y\left( s \right)\hat I\left( s \right) - \hat \xi \left( s \right) + \mu \hat \alpha \left( s \right)} \right) = \\ &\dfrac{1}{\mu }\left( {S\,\hat y\left( s \right)\hat I\left( s \right) - \hat \xi \left( s \right) + \mu \hat \alpha \left( s \right)} \right) - \\ &\dfrac{{\hat I\left( s \right)}}{{\mu \left( {\hat I{{\left( s \right)}^{\rm{T}}}\hat I\left( s \right) + S\mu } \right)}}\left( {S\,\hat y\left( s \right)\hat I{{\left( s \right)}^{\rm{T}}}\hat I\left( s \right) - \hat I{{\left( s \right)}^{\rm{T}}}\hat \xi \left( s \right) + \mu \hat I{{\left( s \right)}^{\rm{T}}}\hat \alpha \left( s \right)} \right) \\ \end{split} $$ (6) $$\hat x_{{\rm{model}}}^{\left( t \right)} = (1 - \eta )\,\hat x_{{\rm{model}}}^{\left( {t - 1} \right)} + \eta \,\hat x^{\left( t \right)}$$ (7) -
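The ADMM alternation behind Eqs. (3)-(6) and the moving-average update of Eq. (7) can be sketched on a simplified analogue that drops the Fourier and cropping operators: minimize $\frac{1}{2}\|y - Xg\|^2 + \frac{\lambda}{2}\|a\|^2$ subject to $g = a$. The function names, the penalty schedule constants and the learning rate below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def admm_ridge(X, y, lam=1.0, mu=1.0, beta=1.05, mu_max=1e4, iters=300):
    """Alternate the two subproblems as in Eqs.(4)-(6) on the simplified
    constraint g = a; xi is the Lagrange multiplier, mu the ALM penalty."""
    n = X.shape[1]
    g, a, xi = np.zeros(n), np.zeros(n), np.zeros(n)
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(iters):
        a = (xi + mu * g) / (lam + mu)             # closed form, analogue of Eq.(4)
        g = np.linalg.solve(XtX + mu * np.eye(n),  # linear system, analogue of Eq.(6)
                            Xty - xi + mu * a)
        xi = xi + mu * (g - a)                     # Lagrange multiplier (dual) ascent
        mu = min(beta * mu, mu_max)                # mu controls convergence speed
    return g

def update_model(model_prev, x_new, eta=0.013):
    """Eq.(7): exponential moving average of the appearance model; eta is the
    learning rate (the small default here is an illustrative choice)."""
    return (1.0 - eta) * model_prev + eta * x_new
```

At convergence $g = a$ and $\xi = \lambda g$, so the fixed point satisfies $(X^{\rm T}X + \lambda I)g = X^{\rm T}y$, the unconstrained optimum of the relaxed problem.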
A particle filter can dynamically fit the target state with a large number of weighted particles. Exploiting this property, the particle-filter sampling mechanism is introduced into BACF to mitigate the degraded tracking performance, or even target loss, caused by the weakening of the filter's maximum response when the target rotates, changes scale, and so on.
In the tracking task, the state vector of the target at time $t$ is denoted ${x_t}$ and the observation vector ${z_t}$. Tracking can be viewed as predicting the target state at time $t$ from its state at time $t - 1$: $$\begin{split} {x_t} = &\,{\rm{argmax}}\,p\left( {{x_t}|{z_{1:t}}} \right) = \\ &\,{\rm{argmax}}\int {p\left( {{x_t}|{x_{t - 1}}} \right)p\left( {{x_{t - 1}}|{z_{1:t - 1}}} \right){\rm{d}}{x_{t - 1}}} \\ \end{split} $$ (8) By Bayesian inference, Eq. (8) can further be expressed as:
$$p\left( {{x_t}|{z_{1:t}}} \right) = \dfrac{{p\left( {{z_t}|{x_t}} \right)p\left( {{x_t}|{z_{1:t - 1}}} \right)}}{{p\left( {{z_t}|{z_{1:t - 1}}} \right)}}$$ (9) The particle filter approximates the posterior state distribution
$p\left( {{x_t}|{z_{1:t}}} \right)$ by sampling $n$ weighted particles whose weights sum to 1. The weights are updated as: $$w_t^i = w_{t - 1}^i\dfrac{{p\left( {{z_t}|x_t^i} \right)p\left( {x_t^i|x_{t - 1}^i} \right)}}{{q\left( {x_t^i|x_{0:t - 1}^i,{z_{1:t}}} \right)}}$$ (10) In practice, the transition prior
$p\left( {x_t^i|x_{t - 1}^i} \right)$ is usually taken as the proposal distribution in place of $q\left( {x_t^i|x_{0:t - 1}^i,{z_{1:t}}} \right)$ to simplify the computation, so Eq. (10) reduces to: $$w_t^i = w_{t - 1}^ip\left( {{z_t}|x_t^i} \right)$$ (11) Correlation filter methods locate the target through the maximum filter response. When external disturbances cause unpredictable appearance changes during target motion, the maximum response is also affected. The maximum response is therefore compared with the threshold
$\theta $: if it is greater than or equal to $\theta $, tracking continues with the correlation filter; if it is smaller than $\theta $, the particle filter generates reliable particles to redetect the target (the orange box in Fig.1). Since the target changes little between adjacent frames,
$M$ particles are generated in the current frame according to a Gaussian distribution centered at the tracking result of the previous frame, each corresponding to an image patch of the same size as the search window of the correlation filter. HOG features are extracted from each patch and correlated with the filter in the frequency domain, and the particle with the maximum response is taken as the predicted target center. The particle filter thus searches for the best candidate over a larger area, providing more chances for accurate tracking. -
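The weight update of Eq. (11) and the Gaussian redetection sampling described above can be sketched as follows. `response_fn` is a hypothetical callback standing in for the frequency-domain correlation of HOG features with the trained filter, and the particle count and spread are illustrative.

```python
import numpy as np

def update_weights(weights, likelihoods):
    """Eq.(11): with the transition prior as proposal, each weight is scaled
    by the observation likelihood p(z_t | x_t^i), then renormalized to sum to 1."""
    w = weights * likelihoods
    return w / w.sum()

def redetect(center, response_fn, n_particles=100, sigma=10.0, seed=0):
    """Sample particle positions around the previous center with a Gaussian,
    score each with the filter response, and keep the best one."""
    rng = np.random.default_rng(seed)
    particles = center + rng.normal(0.0, sigma, size=(n_particles, 2))
    scores = np.array([response_fn(p) for p in particles])
    return particles[int(np.argmax(scores))]
```

A larger `sigma` widens the search area at the cost of scoring more distant, less likely candidates.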
After the particle filter gives the target center, the target scale has to be estimated. The target size in the initial frame is written
${{size}_1} = \left( {{h_1},{w_1}} \right)$. Because video sequences are continuous in time and space, the target moves little between two consecutive frames, so its scale can be computed from the scale in the previous frame: $${d_t} = \frac{{\max {R_t}}}{{\max {R_{t - 1}}}} - \frac{{\max {R_{t - 1}}}}{{\max {R_{t - 2}}}}$$ (12) where:
$\max {R_t}$ is the maximum response of the correlation filter in frame $t$. In the particle-filter redetection stage, two scale thresholds $\phi $ and $\psi $ are introduced to judge the trend of the target scale. If ${d_t} > \phi $, the target is smaller than in the previous frame; if ${d_t} < \psi $, the target scale is gradually growing. The scale change is expressed as: $${{size}_t} = {{size}_{t - 1}}\cdot {s_t},\quad {s_t} = \left\{ \begin{array}{l} 0.95,\;{d_t} > \phi \\ 1.05,\;{d_t} < \psi \\ 1,\;{\rm{otherwise}} \end{array} \right.$$ (13) where:
${{size}_t}$ is the target size in frame $t$ and ${s_t}$ is the scale factor. In this paper $\phi $ is set to 0.1 and $\psi $ to −0.1. This completes the scale estimation after the target is relocated and helps to improve the success rate of the tracker. -
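Eqs. (12)-(13) amount to a small decision rule on consecutive peak-response ratios, which can be written directly in code (the 0.95/1.05 factors and the thresholds are the values stated in the text):

```python
def scale_factor(r_t, r_t1, r_t2, phi=0.1, psi=-0.1):
    """Eqs.(12)-(13): d_t compares consecutive peak-response ratios; the target
    box is shrunk (0.95), grown (1.05) or kept depending on the thresholds."""
    d_t = r_t / r_t1 - r_t1 / r_t2
    if d_t > phi:
        return 0.95
    if d_t < psi:
        return 1.05
    return 1.0

# e.g. new size per Eq.(13): size_t = (h * s, w * s) with
# s = scale_factor(maxR_t, maxR_t1, maxR_t2)
```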
To verify the effectiveness of the proposed method, it is evaluated on three standard datasets: OTB2013[22], OTB2015[23] and VOT2016[24]. All experiments were run under Windows 10 with Matlab 2017b, on an Intel i7-8700K CPU (3.7 GHz) with 32 GB of RAM.
-
The OTB2013 dataset contains 50 video sequences; OTB2015 extends this number to 100. Both datasets annotate the sequences with 11 attributes: low resolution (LR), out-of-plane rotation (OPR), scale variation (SV), occlusion (OCC), deformation (DEF), motion blur (MB), fast motion (FM), in-plane rotation (IPR), out of view (OV), background clutter (BC) and illumination variation (IV); a video may carry multiple attributes. On OTB2013 and OTB2015, the six methods are evaluated comprehensively with two metrics, precision and success rate. Precision is based on the Euclidean distance between the center of the tracking result and the ground-truth center, reflecting how far the tracking box drifts from the ground-truth box. The success rate uses the ratio of the intersection to the union of the tracking box and the ground-truth box, describing how well the tracking box covers the target.
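The two OTB metrics reduce to simple geometry on axis-aligned boxes. A standard sketch follows (the OTB protocol then thresholds the center error, typically at 20 pixels, for the precision plot and sweeps an overlap threshold for the success plot):

```python
import numpy as np

def center_error(box_a, box_b):
    """Precision metric: Euclidean distance between box centers; boxes are (x, y, w, h)."""
    ca = (box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0)
    cb = (box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0)
    return float(np.hypot(ca[0] - cb[0], ca[1] - cb[1]))

def overlap(box_a, box_b):
    """Success metric: intersection-over-union of two (x, y, w, h) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2 = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union
```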
The VOT2016 dataset contains 60 video sequences, annotated with six attributes: camera motion, illumination change, motion change, occlusion, size change, and frames with no degradation. Its main evaluation metrics are: accuracy A, which reflects the overlap between the predicted box and the ground-truth box; robustness R, which counts how many times the tracker loses the target; and expected average overlap (EAO), which computes the expected overlap of the tracker over the non-reset segments of a sequence. In addition, VOT2016 uses a failure-restart mechanism when evaluating algorithms: if in some frame the tracker no longer covers the target at all, the algorithm is reinitialized with the ground truth five frames later.
-
The proposed method is an improvement on BACF, so its effectiveness is verified by comparing the experimental results of the two algorithms, with representative sequences analyzed in Fig.2, where the red rectangle denotes the proposed method and the green one the baseline BACF. In the sequence ironman, the target undergoes obvious scale change and rotation. The baseline still tracks the target successfully at frame 11, but after frame 57 it loses the target completely and never recovers the correct position before the sequence ends. This shows the effectiveness of the redetection mechanism in pre-judging the tracking result: deciding the tracking strategy in advance with a preset threshold effectively avoids persistent target loss caused by unreliable correlation filter results.
Figure 2. Comparison results of the proposed method (red) and the baseline (green)
The Matrix sequence shows the behavior under illumination variation, motion blur and in-plane rotation. The target appearance changes in rich and varied ways, which demands a strong ability to capture model changes. At frame 49 of this sequence the proposed method drifts slightly, but this is corrected in the subsequent frames, demonstrating the advantage of the particle-filter sampling mechanism in tracking.
In the Skating sequence, the target is disturbed by similar background information and illumination changes, and exhibits obvious body deformation: it approaches from a distance, enters the crowd and then leaves it. The proposed method copes with these varied appearance changes and yields a relatively stable tracking result. -
-
(1) Tracking results on the OTB2013 dataset
Five classical methods are selected for comparison on OTB2013: ECO[15], BACF[16], SRDCF[13], DSST[10] and KCF[25]. The precision and success curves drawn from the results are shown in Fig.3(a) and 3(b). The proposed method achieves the highest score in both precision and success rate. In precision, it leads ECO by a slim 0.3%, and the margin over the other methods grows in the order BACF, SRDCF, DSST, KCF. In success rate, it leads both BACF and ECO by 0.5%, with a clear advantage over the remaining three. The per-attribute results are given in Tab.1, where each cell contains two numbers: precision first, then success rate. Underline marks the best method on an attribute and bold the second best. On the out-of-plane rotation (OPR) attribute, the proposed method exceeds BACF by 4.8% in precision and 4.3% in success rate; on out of view (OV) it is 1.9% higher in precision and 3% in success rate, which verifies the effectiveness of the particle-filter redetection mechanism.
Table 1. Attributes comparisons on OTB2013 dataset
Attribute/Name Proposed method ECO DSST SRDCF KCF BACF
LR 75.5/68.9 76.0/66.2 54.8/28.8 63.8/60.4 54.6/30.1 70.8/63.9
OPR 77.1/71.1 75.9/69.6 59.5/46.6 69.5/62.8 60.2/47.3 72.3/66.8
SV 78.5/72.4 78.0/71.0 61.6/38.8 72.4/65.8 57.7/37.2 73.8/69.2
OCC 72.2/64.3 77.0/69.3 62.8/43.3 69.2/59.7 59.2/45.1 72.0/64.2
DEF 82.3/70.7 76.5/67.9 62.3/45.9 71.4/62.9 60.5/45.8 76.9/70.5
MB 71.5/68.1 71.0/65.8 52.0/38.2 73.7/66.7 55.4/42.3 69.1/66.2
FM 73.1/68.4 76.1/70.3 49.9/36.1 74.3/68.8 53.9/39.6 73.7/70.3
IPR 73.3/67.7 67.3/59.8 61.6/49.1 62.3/56.6 58.9/46.0 68.5/62.6
OV 71.1/62.0 73.1/61.7 41.6/32.3 57.9/51.9 44.1/37.4 69.2/59.0
BC 74.4/69.1 76.6/71.7 68.9/54.6 73.6/63.6 62.5/50.0 71.5/66.1
IV 76.5/70.0 71.6/66.7 70.8/49.0 74.4/66.4 67.0/46.4 73.3/69.6
(2) Tracking results on the OTB2015 dataset
Fig.4 shows the precision and success curves of the proposed method and ECO[15], BACF[16], SRDCF[13], DSST[10] and KCF[25] on the OTB2015 dataset. The precision curve of the proposed method falls slightly behind ECO, while its success curve performs well.
The results of the six methods on the 11 attributes of OTB2015 are listed in Tab.2; underline marks the best method on an attribute and bold the second best. In precision, the proposed method is best on 6 of the 11 attributes and second best on 3; in success rate, it is best on 6 attributes and second best on 4. On scale variation (SV) sequences, it leads BACF by 4.6% in precision and 3.4% in success rate; on in-plane rotation (IPR), it leads BACF by 4.2% in precision and 4.8% in success rate. These data demonstrate the strong robustness of the correlation filter based target redetection method.
Table 2. Attributes comparisons on OTB2015 dataset
Attribute/Name Proposed method ECO DSST SRDCF KCF BACF
LR 78.4/70.3 80.1/66.2 56.5/32.1 66.8/62.8 57.7/36.5 71.8/65.9
OPR 82.8/76.0 81.7/74.4 66.3/51.2 75.0/67.6 67.6/53.7 78.0/71.8
SV 82.6/75.3 80.8/73.4 66.8/43.1 75.1/67.5 64.6/42.9 78.0/71.9
OCC 75.9/71.5 79.9/74.8 62.8/49.0 72.2/66.8 63.5/53.2 72.9/69.5
DEF 83.1/73.9 81.2/73.9 59.8/46.7 74.8/67.7 63.0/52.0 78.9/71.3
MB 79.5/78.1 77.9/75.2 59.1/50.1 77.5/73.7 60.5/52.4 74.8/73.7
FM 78.3/73.7 80.3/75.1 59.0/46.7 77.3/72.0 62.9/50.4 79.5/76.1
IPR 79.5/72.6 75.3/66.5 71.2/56.3 72.2/64.4 71.0/57.2 75.3/67.8
OV 73.0/66.9 76.3/67.0 45.7/38.0 59.0/53.5 48.7/42.7 72.4/64.6
BC 81.0/76.8 83.6/78.5 72.0/57.2 78.4/70.3 71.6/60.2 77.6/73.5
IV 81.5/77.7 78.7/75.4 73.3/55.3 78.1/73.7 71.3/53.8 80.3/77.9
(3) Tracking results on the VOT2016 dataset
Fig.5(a) shows the EAO curves of seven methods, the proposed method together with BACF, SRDCF, DSST, ECO, KCF and MOSSECA, on the 60 sequences of VOT2016, and Fig.5(b) presents the EAO ranking. Tab.3 and Tab.4 respectively record the accuracy and robustness scores of the seven algorithms on the six video attributes of VOT2016; the best score per attribute is underlined and the second best is in bold.
Figure 5. Comparison results of the proposed method against correlation filter based methods on VOT2016
Table 3. Accuracy scores on VOT2016 dataset
Name/Attribute Camera motion Illumination change Motion change Occlusion Size change Mean Weighted mean
Proposed method 0.5900 0.6627 0.5147 0.4886 0.5215 0.5641 0.5727
BACF 0.4986 0.6924 0.4412 0.4413 0.4586 0.5265 0.4976
SRDCF 0.5909 0.6872 0.4900 0.4206 0.5053 0.5415 0.5377
DSST 0.5544 0.6765 0.4896 0.403 0.5194 0.383 0.5367
KCF 0.5034 0.4540 0.4219 0.4638 0.3625 0.4517 0.4610
ECO 0.5858 0.6597 0.4918 0.4254 0.5151 0.5465 0.5503
MOSSECA 0.4708 0.3911 0.3691 0.3639 0.3392 0.4086 0.4287
Table 4. Robustness scores on VOT2016 dataset
Name/Attribute Camera motion Illumination change Motion change Occlusion Size change Mean Weighted mean
Proposed method 18.00 2.00 22.00 16.00 10.00 13.6667 15.6584
BACF 42.00 9.00 8.00 49.00 18.00 28.00 32.5734
SRDCF 34.00 8.00 31.00 20.00 20.00 21.1667 24.1220
DSST 49.00 6.00 6.00 50.00 18.00 28.00 28.6667
KCF 54.00 8.00 56.00 24.00 28.00 34.00 40.9333
ECO 15.00 0.00 12.00 17.00 7.00 9.1667 10.0788
MOSSECA 55.00 11.00 52.00 20.00 36.00 32.3333 38.1698
From the EAO curves in Fig.5(a), the proposed method (red) clearly outperforms the baseline BACF (green), verifying the effectiveness of the particle-filter redetection mechanism. Fig.5(b) ranks the EAO scores: the proposed method is second only to ECO, which uses deep features, and still holds a clear advantage over the other correlation filter methods based on hand-crafted features. Moreover, the data recorded in Tab.3 and Tab.4 show that the proposed method performs well in both accuracy and robustness, coping well with challenges such as camera shake, target motion and size change in the video sequences.
-
Fig.6 shows some tracking results on the OTB2013 and OTB2015 datasets, with colors distinguishing the different methods.
The Box3 sequence tests tracking under similar-background clutter and target out-of-view. After the sequence runs for a while, only the proposed method still tracks the target accurately, while the boxes of the other methods drift to some extent until the target is lost. The Jumping sequence examines tracking under fast motion and motion blur; KCF and DSST underperform on it. The Human3 and Human9 sequences test performance under deformation and scale variation; the proposed method tracks the target accurately on both. In the Dragonbaby sequence, the target's interaction with a doll involves in-plane rotation, out-of-plane rotation and scale variation; the proposed method captures the feature changes of the rotating target well. In the Bird1 sequence, the migrating bird is fully occluded when passing through clouds and leaves the view, then reappears later; the proposed method tracks it accurately, again verifying the effectiveness of the redetection mechanism[26]. The Tiger2 sequence features occlusion and deformation; all methods track it successfully, though the overlap of the tracking boxes still leaves room for improvement. In the Skiing sequence, the skier spins at high speed against a background with strong illumination change; all methods drift, or even lose the target completely, on this sequence.
-
Fig.7 shows the comparison on the VOT2016 dataset between the proposed method, the baseline BACF, the deep learning methods SiamFC[27], SiamRPN[28], SiamVGG[29] and UpdateNet[31], and the MCPF[30] method. According to the ranking in Fig.7(b), SiamRPN comes first and the proposed method third, still holding an advantage over the remaining methods.
-
To evaluate the effect of the number of particles on speed, experiments were run with different particle numbers, and the average frames per second (FPS) on the OTB2015 dataset is used to measure algorithm speed. The results are listed in Tab.5.
Table 5. Effect of particle numbers on speed performance
Particle numbers 20 30 40 50
Speed/FPS 31.2 28.3 26.8 19.9 -
When the target meets similar background clutter or is occluded, the proposed method fails to track accurately, as shown in Fig.8. There are two main reasons. First, the hand-crafted features used by the correlation filter cannot provide a multi-level representation of the target, which limits the discriminative power of the model. Second, occluder information is treated as part of the target, so biased information propagates into the filter update and leads to tracking failure. Future work could focus more on the target's own features by combining attention mechanisms from deep models, and introduce uncontaminated tracking templates as references, to remedy these shortcomings.
Target redetection method for object tracking based on correlation filter
-
Abstract: In recent years, owing to their speed and robustness, correlation filter based methods have developed rapidly in the tracking community. However, existing models struggle to meet practical requirements in complex scenes. The background-aware correlation filter (BACF) suffers from a weakened maximum response when handling challenges such as rotation of the target, scale variation and out-of-view motion, which degrades tracking accuracy. To tackle these problems, a target redetection method for visual tracking based on correlation filters is proposed. On the basis of BACF, a filter-response detection mechanism is introduced to judge the quality of the tracking result generated by the correlation filter. When the result is judged unreliable, a particle-filter sampling strategy generates abundant particles that help perceive the target state, and the target center is redetected. On this foundation, an adaptive scale estimation mechanism recomputes the size information of the target, from which the final tracking result is obtained.
To validate the effectiveness of the improved algorithm, extensive experiments were conducted on three public datasets: OTB2013, OTB2015 and VOT2016; several state-of-the-art correlation filter and deep learning trackers were chosen for comparison, and the performance of all trackers is shown in terms of annotated video attributes, tracking accuracy and robustness. Experimental results demonstrate that the proposed target redetection tracker achieves favorable performance on all three datasets and effectively improves the precision and success rate of BACF under target rotation, scale variation and out-of-view challenges.
-
Key words:
- object tracking /
- correlation filters /
- particle filters
-
[1] Mei X, Ling H. Robust visual tracking using L1 minimization[C]//IEEE International Conference on Computer Vision, 2009: 1436-1443.
[2] Bao C, Wu Y, Ling H, et al. Real time robust L1 tracker using accelerated proximal gradient approach[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2012: 1830-1837.
[3] Zhang T, Bibi A, Ghanem B. In defense of sparse tracking: Circulant sparse tracker[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 3880-3888.
[4] Zhan J, Wu H, Zhang H, et al. Cascaded probabilistic tracking with supervised dictionary learning[J]. Signal Processing: Image Communication, 2015, 39: 212-225. doi: 10.1016/j.image.2015.09.002
[5] Zhang T, Jia K, Xu C, et al. Partial occlusion handling for visual tracking via robust part matching[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 1258-1265.
[6] Bolme D S, Beveridge J R, Draper B A, et al. Visual object tracking using adaptive correlation filters[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010: 2544-2550.
[7] Danelljan M, Shahbaz Khan F, Felsberg M, et al. Adaptive color attributes for real-time visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014: 1090-1097.
[8] Ma C, Huang J B, Yang X, et al. Hierarchical convolutional features for visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 3074-3082.
[9] Li Y, Zhu J. A scale adaptive kernel correlation filter tracker with feature integration[C]//European Conference on Computer Vision, 2014: 254-265.
[10] Danelljan M, Häger G, Khan F, et al. Accurate scale estimation for robust visual tracking[C]//British Machine Vision Conference, 2014.
[11] Hong Z, Chen Z, Wang C, et al. Multi-store tracker (MUSTer): A cognitive psychology inspired approach to object tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 749-758.
[12] Ma C, Yang X, Zhang C, et al. Long-term correlation tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 5388-5396.
[13] Danelljan M, Hager G, Shahbaz Khan F, et al. Learning spatially regularized correlation filters for visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision, 2015: 4310-4318.
[14] Danelljan M, Robinson A, Khan F S, et al. Beyond correlation filters: Learning continuous convolution operators for visual tracking[C]//European Conference on Computer Vision, 2016: 472-488.
[15] Danelljan M, Bhat G, Khan F S, et al. ECO: Efficient convolution operators for tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[16] Galoogahi H K, Fagg A, Lucey S. Learning background-aware correlation filters for visual tracking[C]//Proceedings of the IEEE International Conference on Computer Vision, 2017: 1144-1152.
[17] Xu T, Feng Z H, Wu X J, et al. Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking[J]. IEEE Transactions on Image Processing, 2019, 28(11): 5596-5609. doi: 10.1109/TIP.2019.2919201
[18] Dai K, Wang D, Lu H, et al. Visual tracking via adaptive spatially-regularized correlation filters[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019: 4670-4679.
[19] Henriques J F, Caseiro R, Martins P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596.
[20] Mueller M, Smith N, Ghanem B. Context-aware correlation filter tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[21] Wu Y, Lim J, Yang M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848.
[22] Wu Y, Lim J, Yang M H. Online object tracking: A benchmark[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013: 2411-2418.
[23] Wu Y, Lim J, Yang M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848.
[24] Xu T, Feng Z H, Wu X J, et al. Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking[J]. IEEE Transactions on Image Processing, 2019, 28(11): 5596-5609.
[25] Henriques J F, Caseiro R, Martins P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596.
[26] Ma C, Yang X, Zhang C, et al. Long-term correlation tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 5388-5396.
[27] Bertinetto L, Valmadre J, Henriques J F, et al. Fully-convolutional siamese networks for object tracking[C]//European Conference on Computer Vision, 2016: 850-865.
[28] Li B, Yan J, Wu W, et al. High performance visual tracking with siamese region proposal network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 8971-8980.
[29] Li Y, Zhang X. SiamVGG: Visual tracking using deeper siamese networks[J]. arXiv preprint, 2019.
[30] Zhang T, Xu C, Yang M H. Multi-task correlation particle filter for robust object tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 4335-4343.
[31] Zhang L, Gonzalez-Garcia A, Weijer J, et al. Learning the model update for siamese trackers[C]//Proceedings of the IEEE International Conference on Computer Vision, 2019: 4010-4019.