Mixed-precision quantization for neural networks based on error limit (Invited)

Li Yiduo; Guo Zibo; Liu Kai; Sun Xiaoyao

doi:10.3788/IRLA20220166

Volume 51 Issue 4

May 2022

Li Yiduo, Guo Zibo, Liu Kai, Sun Xiaoyao. Mixed-precision quantization for neural networks based on error limit (Invited)[J]. Infrared and Laser Engineering, 2022, 51(4): 20220166. doi: 10.3788/IRLA20220166

School of Computer Science and Technology, Xidian University, Xi'an 710071, China

Received Date: 2022-03-10
Rev Recd Date: 2022-04-11
Accepted Date: 2022-04-11
Publish Date: 2022-05-06

Abstract

Deep learning algorithms based on convolutional neural networks deliver excellent performance, but they also involve large volumes of data and computation. The resulting storage and computing overhead has become the biggest obstacle to deploying such algorithms on hardware platforms. Neural network model quantization replaces the high-precision floating-point numbers of the original model with low-precision fixed-point numbers, which effectively compresses the model size, reduces hardware resource overhead, and improves inference speed at the cost of only a small loss in accuracy. Most existing quantization methods quantize the data of every layer to the same precision, whereas mixed-precision quantization sets a different quantization precision for each layer according to its data distribution, aiming for higher model accuracy at the same compression ratio. Finding a suitable mixed-precision quantization strategy, however, remains very difficult. A mixed-precision quantization strategy based on error limitation was therefore proposed. The scaling factors of all layers of the neural network were limited uniformly and proportionally to determine the quantization precision of each layer, and a truncation method was used to linearly quantize the weights and activations to low-precision fixed-point numbers. At the same compression ratio, this method achieved higher accuracy than unified-precision quantization. The classical object detection algorithm YOLOV5s, which is based on a convolutional neural network, was used as the benchmark model to test the method. On the COCO and VOC datasets, compared with unified-precision quantization, the mean average precision (mAP) of the model compressed to 5 bits improved by 6% and 24.9%, respectively.
Key words:
- deep learning
- mixed precision
- truncated quantization
- YOLOV5
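
The abstract describes truncated linear quantization without giving a formula. One common form is written out below as a reading aid. It is a sketch under assumptions, not the authors' stated definition: the truncation threshold $T$, the scaling factor $s$ derived from it, and the symmetric signed range are all choices made here for illustration, while $b_i$ denotes the bit width of layer $i$.

```latex
\begin{aligned}
s &= \frac{T}{2^{\,b_i-1}-1}, \\
q(w, b_i) &= \operatorname{clip}\!\left(\operatorname{round}\!\left(\frac{w}{s}\right),\; -(2^{\,b_i-1}-1),\; 2^{\,b_i-1}-1\right), \\
\hat{w} &= s \cdot q(w, b_i), \qquad |w-\hat{w}| \le \frac{s}{2} \quad \text{for } |w| \le T .
\end{aligned}
```

Under this form, a smaller scaling factor means a smaller rounding error, so bounding the scaling factor of a layer amounts to demanding a minimum bit width for that layer.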


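4. Conclusion

This paper examined in depth how different quantization forms and quantization methods affect the results of convolutional neural networks, and selected the truncation method based on mean squared error as its quantization method. In addition, by analyzing the rounding error introduced during network quantization, a layer-wise quantization strategy for deep learning based on error limitation was proposed: an error-limiting factor γ proportionally limits the error parameters of the convolutional layers, which yields the quantization precision of each convolutional layer, and the network parameters are then quantized by mixed-precision truncation accordingly. Finally, the YOLOV5s network was tested and validated on the COCO and VOC datasets. Compared with unified-precision quantization, the accuracy of the YOLOV5s network under mixed quantization to 5 bits improved by 6% and 24.9%, respectively. So far, the application of the mixed-precision quantization method to object detection has been implemented and verified at the algorithm level. Future work will consider comparing more quantization methods, optimizing the layer-wise strategy, and implementing mixed-bit-width inference of the object detection network in hardware.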
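
The layer-wise strategy summarized above can be illustrated with a small sketch. The exact error parameter that γ limits is not given on this page, so the criterion below is an assumption: each layer receives the smallest bit width whose rounding-error bound `s / 2` stays within `gamma` times the mean absolute weight of that layer. The names `select_bits` and `average_bits` and the synthetic layers are hypothetical.

```python
import numpy as np

def select_bits(weights, gamma, candidate_bits=(2, 3, 4, 5, 6, 7, 8)):
    """Return the smallest bit width whose rounding-error bound fits the limit.

    Assumption (not from the paper): the bound s / 2 is compared with
    gamma * mean(|w|), where s = max(|w|) / (2**(bits - 1) - 1).
    """
    threshold = float(np.max(np.abs(weights)))   # assumed clipping threshold
    reference = float(np.mean(np.abs(weights)))  # assumed reference magnitude
    for bits in candidate_bits:
        scale = threshold / (2 ** (bits - 1) - 1)
        if scale / 2 <= gamma * reference:
            return bits
    return candidate_bits[-1]  # fall back to the widest candidate

def average_bits(layers, bit_widths):
    """Parameter-weighted average bit width over all layers."""
    sizes = np.array([w.size for w in layers])
    return float(np.sum(sizes * np.array(bit_widths)) / np.sum(sizes))

rng = np.random.default_rng(0)
# Hypothetical layers: one bell-shaped and one heavy-tailed weight distribution.
layers = [rng.normal(0.0, 0.05, 4000), rng.laplace(0.0, 0.05, 8000)]

for gamma in (0.08, 0.125, 0.2):
    bits = [select_bits(w, gamma) for w in layers]
    print(gamma, bits, round(average_bits(layers, bits), 2))
```

With this criterion a larger `gamma` tolerates more error and can only lower the selected bit widths, which matches the direction of the trend reported for γ and the average bit width.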

Quantization method	Operation
$q\left(w,b_i\right)=\mathrm{round}\left(w/s\right)$	Multiplication
$q\left(w,b_i\right)=\mathrm{round}\left(w\times 2^{fl}\right)$	Shift

Network model	Dataset	bit	mAP@0.5-0.95 (shift)	mAP@0.5-0.95 (multiplication)
YOLOV5s	VOC	8	63.4%	77.9%
		7	26.5%	68.8%
		6	4.6%	39.5%
		32	81.8%
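
The two quantization forms compared above differ only in the scaling step: the first divides by a real-valued scaling factor `s`, the second multiplies by a power of two `2**fl`, which hardware can realize as a bit shift. The sketch below is illustrative only. How the authors choose `s` and `fl` is not shown on this page, so the max-abs scale and the "largest `fl` that still fits" rule are assumptions, as are the function names.

```python
import numpy as np

def quantize_multiplication(w, bits):
    """round(w / s) with a real-valued scale; returns (dequantized, step)."""
    qmax = 2 ** (bits - 1) - 1
    s = float(np.max(np.abs(w))) / qmax          # assumed: symmetric max-abs scale
    q = np.clip(np.round(w / s), -qmax, qmax)
    return q * s, s

def quantize_shift(w, bits):
    """round(w * 2**fl) with an integer fl; returns (dequantized, step)."""
    qmax = 2 ** (bits - 1) - 1
    # Assumed rule: largest fl such that max|w| * 2**fl still fits in the range.
    fl = int(np.floor(np.log2(qmax / float(np.max(np.abs(w))))))
    q = np.clip(np.round(w * 2.0 ** fl), -qmax, qmax)
    return q * 2.0 ** (-fl), 2.0 ** (-fl)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=10_000)           # synthetic weights
for bits in (8, 7, 6):
    w_mul, step_mul = quantize_multiplication(w, bits)
    w_shift, step_shift = quantize_shift(w, bits)
    print(bits, step_mul, step_shift,
          np.mean((w - w_mul) ** 2), np.mean((w - w_shift) ** 2))
```

Because the shift form is restricted to power-of-two steps, its step is never smaller than the free scale under these assumptions, which is one plausible reason for the accuracy gap between the two columns.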

Truncation method	mAP (8 bit)	mAP (7 bit)	mAP (6 bit)	mAP (5 bit)	mAP (32 bit)
MAX	78.9%	67.4%	46.7%	4.0%	82.6%
MSE	82.7%	76.0%	69.0%	31.7%	82.6%
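
The MAX and MSE rows above correspond to two ways of picking the truncation threshold. A minimal sketch of the difference follows, assuming a symmetric quantizer and a simple grid search for the MSE threshold. The search procedure, the candidate count, and all function names are assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def quantize_clipped(x, threshold, bits):
    """Clip to [-threshold, threshold], then quantize linearly."""
    qmax = 2 ** (bits - 1) - 1
    s = threshold / qmax
    return np.clip(np.round(x / s), -qmax, qmax) * s

def quantization_mse(x, threshold, bits):
    """Mean squared error between x and its quantized version."""
    return float(np.mean((x - quantize_clipped(x, threshold, bits)) ** 2))

def max_threshold(x):
    """MAX method: keep the full range, no value is truncated."""
    return float(np.max(np.abs(x)))

def mse_threshold(x, bits, num_candidates=100):
    """MSE method: pick the candidate threshold with the lowest error."""
    t_max = max_threshold(x)
    candidates = np.linspace(t_max / num_candidates, t_max, num_candidates)
    errors = [quantization_mse(x, t, bits) for t in candidates]
    return float(candidates[int(np.argmin(errors))])

rng = np.random.default_rng(1)
x = rng.laplace(0.0, 1.0, size=5000)             # heavy-tailed synthetic data
for bits in (8, 7, 6, 5):
    t_max, t_mse = max_threshold(x), mse_threshold(x, bits)
    print(bits, t_max, t_mse,
          quantization_mse(x, t_max, bits), quantization_mse(x, t_mse, bits))
```

Since the full-range threshold is itself one of the candidates, the searched threshold can never give a higher mean squared error than MAX on the same data. Truncating a few outliers frees resolution for the bulk of the values, which matters most at low bit widths.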

γ	Compression ratio	Average bit	mAP
0.08	4.93	6.49	79.6%
0.10	5.13	6.23	77.8%
0.125	5.74	5.57	72.3%
0.142	6.11	5.23	62.8%
0.166	6.31	5.07	63.3%
0.20	7.14	4.48	21.0%
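
The compression ratios in the table above appear to follow from the average bit width alone, as the ratio of a 32-bit baseline to the average bit width. This relation is inferred from the numbers rather than stated on this page. A short check, with values copied from the table:

```python
FULL_PRECISION_BITS = 32  # assumed baseline: 32-bit floating-point weights

# gamma -> (reported compression ratio, reported average bit width)
table = {
    0.08: (4.93, 6.49),
    0.10: (5.13, 6.23),
    0.125: (5.74, 5.57),
    0.142: (6.11, 5.23),
    0.166: (6.31, 5.07),
    0.20: (7.14, 4.48),
}

def compression_ratio(avg_bits, full_bits=FULL_PRECISION_BITS):
    """Ratio of the baseline bit width to the average quantized bit width."""
    return full_bits / avg_bits

for gamma, (reported, avg_bits) in table.items():
    computed = compression_ratio(avg_bits)
    print(gamma, reported, round(computed, 3), round(abs(computed - reported), 3))
```

The computed values agree with the reported ones to within about 0.01, a gap that rounding of the published average bit widths would explain.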

Dataset	Method	bit	γ	mAP@0.5	mAP@0.5-0.95	Model size
COCO	Unified bit	7		0.567	0.345	6.35
		6		0.503	0.301	5.45
		5		0.386	0.215	4.54
	Mixed bit	6.49	0.08	0.602	0.368	5.89
		5.57	0.125	0.546	0.322	5.05
		5.07	0.166	0.446	0.260	4.60
	Original model	32		0.636	0.411	29.07
VOC2011	Unified bit	7		0.950	0.732	6.35
		6		0.925	0.643	5.45
		5		0.533	0.295	4.54
	Mixed bit	6.49	0.08	0.950	0.706	5.89
		5.57	0.125	0.981	0.669	5.05
		5.07	0.166	0.782	0.456	4.60
	Original model	32		0.950	0.786	29.07
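
The headline improvements of 6% and 24.9% can be reproduced from the mAP@0.5 column above as differences in percentage points between the mixed model at γ = 0.166 (average 5.07 bits) and the unified 5-bit model. Reading the improvement as an absolute difference is an interpretation made here; the dictionary and function names are hypothetical.

```python
# mAP@0.5 values copied from the table above.
unified_5bit = {"COCO": 0.386, "VOC2011": 0.533}
mixed_gamma_0166 = {"COCO": 0.446, "VOC2011": 0.782}  # average 5.07 bits

def gain_in_points(mixed, unified):
    """Absolute mAP gain, expressed in percentage points."""
    return round((mixed - unified) * 100, 1)

gains = {
    name: gain_in_points(mixed_gamma_0166[name], unified_5bit[name])
    for name in unified_5bit
}
print(gains)  # {'COCO': 6.0, 'VOC2011': 24.9}
```

Note that the two models are not exactly the same size: the mixed model averages 5.07 bits with a model size of 4.60, against 4.54 for the unified 5-bit model.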

Dataset	Method	bit	mAP@0.5	Aeroplane	Bicycle	Bird	Boat	Bottle	Chair	Dog	Person	Sheep	Train	Tvmonitor
VOC2011	Mixed	5	0.782	0.753	0.435	0.497	0.995	0.801	0.995	0.249	0.897	0.995	0.995	0.995
VOC2011	Unified	5	0.533	0.232	0.324	0.497	0.484	0.209	0.995	0.332	0.455	0.995	0.995	0.34
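
The mAP@0.5 column in the table above is consistent with the plain mean of the eleven per-class values listed in each row. The check below copies the values from the table and identifies each row by its reported mAP. That the reported mAP is computed over exactly these eleven classes is an inference from the numbers, not something stated on this page.

```python
from statistics import fmean

# Per-class AP@0.5, in column order from Aeroplane to Tvmonitor.
row_reported_0782 = [0.753, 0.435, 0.497, 0.995, 0.801, 0.995,
                     0.249, 0.897, 0.995, 0.995, 0.995]
row_reported_0533 = [0.232, 0.324, 0.497, 0.484, 0.209, 0.995,
                     0.332, 0.455, 0.995, 0.995, 0.34]

def mean_average_precision(per_class_ap):
    """mAP as the unweighted mean of per-class average precision."""
    return fmean(per_class_ap)

print(round(mean_average_precision(row_reported_0782), 3))  # 0.782
print(round(mean_average_precision(row_reported_0533), 3))  # 0.533
```

The per-class view shows where the gap comes from: classes such as Aeroplane, Boat, Bottle, Person, and Tvmonitor differ sharply between the two rows, while Bird, Chair, Sheep, and Train are identical.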
