Citation: Pei Xiaomin, Fan Huijie, Tang Yandong. Two-person interaction recognition based on multi-stream spatio-temporal fusion network[J]. Infrared and Laser Engineering, 2020, 49(5): 20190552. DOI: 10.3788/IRLA20190552

Two-person interaction recognition based on multi-stream spatio-temporal fusion network

Abstract: A two-person interaction recognition method based on a multi-stream spatio-temporal fusion network was proposed to recognize interactions from two-person skeleton sequences. Firstly, a view-invariant method for describing the two-person skeletons was proposed. Then a two-layer cascaded spatio-temporal fusion network model was designed: in the first layer, spatial correlation features were learned by a one-dimensional convolutional neural network (1DCNN) and a bidirectional long short-term memory network (BiLSTM); in the second layer, spatio-temporal fusion features were obtained by an LSTM. Finally, a multi-stream spatio-temporal fusion network was used to obtain multi-stream fusion features: every stream had the same structure, weights were shared across streams, each stream learned one group of two-person skeleton features, and the features of all streams were fused at the end for interaction recognition. Applied to the NTU-RGBD human interaction skeleton dataset, the algorithm reached an accuracy of 96.42% in the cross-subject evaluation and 97.46% in the cross-view evaluation. Compared with the state-of-the-art methods in this field, the proposed method performed best on two-person interaction recognition.
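
The abstract describes the architecture only at a high level. The following PyTorch sketch shows one plausible reading of the two-layer design: a per-stream module applies a 1DCNN and a BiLSTM across the joints of each frame (spatial layer), then an LSTM across frames (temporal layer), and a multi-stream wrapper runs several feature groups through the same shared module before fusing them for classification. The layer widths, kernel size, number of streams, concatenation-based fusion, and the 11-class output (the two-person interaction classes of NTU-RGBD) are illustrative assumptions, not the authors' published configuration.

```python
# Illustrative sketch only: sizes and the fusion strategy are assumptions,
# not the configuration published in the paper.
import torch
import torch.nn as nn


class SpatioTemporalStream(nn.Module):
    """One stream: 1DCNN + BiLSTM over the joints of each frame (spatial
    layer), then an LSTM over frames (temporal layer)."""

    def __init__(self, in_channels=3, hidden=128):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, 64, kernel_size=3, padding=1)
        self.spatial_lstm = nn.LSTM(64, hidden, batch_first=True,
                                    bidirectional=True)
        self.temporal_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, joints, channels), e.g. 3D joint coordinates
        b, t, j, c = x.shape
        x = x.reshape(b * t, j, c).transpose(1, 2)    # (b*t, channels, joints)
        x = torch.relu(self.conv(x)).transpose(1, 2)  # (b*t, joints, 64)
        x, _ = self.spatial_lstm(x)                   # spatial features per frame
        x = x[:, -1, :].reshape(b, t, -1)             # (b, frames, 2*hidden)
        x, _ = self.temporal_lstm(x)                  # temporal fusion over frames
        return x[:, -1, :]                            # (b, hidden)


class MultiStreamFusion(nn.Module):
    """Runs every feature group through ONE shared stream module (weight
    sharing across streams) and fuses the outputs for classification."""

    def __init__(self, num_streams=3, num_classes=11, hidden=128):
        super().__init__()
        self.stream = SpatioTemporalStream(hidden=hidden)  # shared weights
        self.classifier = nn.Linear(num_streams * hidden, num_classes)

    def forward(self, feature_groups):
        # feature_groups: list of (batch, frames, joints, channels) tensors,
        # one per skeleton-feature group
        fused = torch.cat([self.stream(g) for g in feature_groups], dim=-1)
        return self.classifier(fused)


# Hypothetical usage: 8 clips, 30 frames, 50 joints (2 persons x 25 NTU joints)
model = MultiStreamFusion(num_streams=3, num_classes=11)
groups = [torch.randn(8, 30, 50, 3) for _ in range(3)]
logits = model(groups)  # (8, 11)
```

Because the same module processes every stream, adding feature groups grows only the fusion layer, which matches the weight sharing described in the abstract.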

     
