Research on image interpretation based on deep learning

Yang Nan; Nan Lin; Zhang Dingyi; Ku Tao

doi:10.3788/IRLA201847.0203002


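Yang Nan^1,2,
Nan Lin^1,2,
Zhang Dingyi^1,2,
Ku Tao^1,2

1. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China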

Funding: National Key Technology R&D Program of China (2015BAF02B01); Key Laboratory of Networked Control Systems, Chinese Academy of Sciences (2015BAF02B00)


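About the author:
Yang Nan (b. 1994), male, master's student. Research interests: deep learning and natural language processing. Email: yangnan@sia.cn

Corresponding author: Zhang Dingyi (b. 1981), female, associate research fellow, master's supervisor, Ph.D. Research interests: deep learning, pattern recognition, and natural language processing. Email: Dy202@sia.cn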

CLC number: TP3


Abstract: Convolutional neural networks (CNN) and recurrent neural networks (RNN) have developed rapidly in image classification, computer vision, natural language processing, speech recognition, machine translation, and semantic analysis, which has drawn wide attention to the automatic generation of image descriptions by computers. The main problems in image description at present are sparse input text data, model over-fitting, and an oscillating loss function that is difficult to converge. This paper uses NIC as the baseline model. To address data sparsity, the one-hot text representation in the baseline is replaced with a word2vec mapping of the text. To prevent over-fitting, a regularization term is added to the model and Dropout is applied. As an innovation in word-order memory, the gated recurrent unit (GRU), an associative memory unit, is introduced for text generation. In the experiments, the AdamOptimizer optimizer is used to update the parameters iteratively. The experimental results show that the improved model has fewer parameters and converges much faster, its loss curve is smoother, the loss drops as low as 2.91, and its accuracy is nearly 15% higher than that of NIC. The experiments confirm that mapping the text with word2vec clearly alleviates the data sparsity problem, that adding a regularization term and using Dropout effectively prevents over-fitting, and that introducing the GRU greatly reduces the number of trainable parameters and speeds up convergence, which in turn improves the accuracy of the whole model.

Keywords:
- convolutional neural networks
- recurrent neural networks
- gated recurrent unit
- natural language processing
- image description
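
The abstract's claim that introducing the GRU reduces the number of trained parameters can be illustrated with a rough weight count. The sketch below is a minimal illustration, not the authors' implementation. It assumes that the NIC baseline decodes with an LSTM (as in the original Show and Tell model), that each gate block has a single bias vector, and that the input vector is fed straight into the recurrent layer. The sizes `VOCAB_SIZE`, `EMBED_DIM`, and `HIDDEN_DIM` are hypothetical and are not taken from the paper.

```python
def recurrent_params(input_dim: int, hidden_dim: int, gate_blocks: int) -> int:
    """Count the weights of one recurrent layer.

    Each gate block is assumed to hold an input matrix (hidden x input),
    a recurrent matrix (hidden x hidden), and one bias vector (hidden).
    """
    per_block = hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim
    return gate_blocks * per_block


def lstm_params(input_dim: int, hidden_dim: int) -> int:
    # LSTM blocks: input gate, forget gate, output gate, candidate cell state.
    return recurrent_params(input_dim, hidden_dim, gate_blocks=4)


def gru_params(input_dim: int, hidden_dim: int) -> int:
    # GRU blocks: update gate, reset gate, candidate hidden state.
    return recurrent_params(input_dim, hidden_dim, gate_blocks=3)


# Hypothetical sizes chosen for illustration only (not from the paper).
VOCAB_SIZE = 10_000   # length of a one-hot word vector
EMBED_DIM = 300       # length of a dense word2vec-style vector
HIDDEN_DIM = 512      # recurrent state size

if __name__ == "__main__":
    for label, input_dim in (("one-hot input", VOCAB_SIZE),
                             ("word2vec input", EMBED_DIM)):
        print(f"{label:>14}: LSTM={lstm_params(input_dim, HIDDEN_DIM):,} "
              f"GRU={gru_params(input_dim, HIDDEN_DIM):,}")
```

Under these assumptions a GRU layer holds exactly three quarters of the weights of an LSTM layer of the same size, and a dense input vector shrinks the input matrices further. Frameworks that keep two bias vectors per gate will report slightly different totals.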


