Mixed-precision quantization for neural networks based on error limit (Invited)

  • Abstract: Deep learning algorithms based on convolutional neural networks deliver excellent performance, but they also bring heavy data and computation loads, and the resulting storage and computing overhead has become the biggest obstacle to deploying such algorithms on hardware platforms. Neural network quantization replaces the high-precision floating-point numbers of the original model with low-precision fixed-point numbers, which effectively compresses the model size, reduces hardware resource overhead, and speeds up inference at the cost of a small loss in accuracy. Most existing quantization methods quantize the data of every layer to the same precision, whereas mixed-precision quantization assigns each layer a precision according to its data distribution, aiming for higher model accuracy at the same compression ratio; however, finding a suitable mixed-precision quantization strategy remains difficult. This paper therefore proposes a mixed-precision quantization strategy based on an error limit: a uniform proportional limit is imposed on the scaling factors of the convolutional layers to determine the quantization precision of each layer, and the weights and activations are linearly quantized to low-precision fixed-point numbers with truncation. At the same compression ratio, this method achieves higher accuracy than uniform-precision quantization. The classical convolutional object detection network YOLOV5s was used as the benchmark model to evaluate the method. On the COCO and VOC datasets, compared with uniform-precision quantization, the mean Average Precision (mAP) of the model compressed to 5 bits improved by 6% and 24.9%, respectively.
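
To make the general idea concrete, the following is a minimal NumPy sketch of truncated linear quantization with per-layer bit-width selection under a shared scaling-factor limit. It is an illustrative assumption of how such a scheme could look, not the paper's implementation; the function names linear_quantize and choose_bitwidth, the bit-width search range, and the scale_limit value are all hypothetical.

```python
import numpy as np

def linear_quantize(x, bits, scale):
    """Linearly quantize x to a signed fixed-point grid, truncating (clipping)
    values that fall outside the representable range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale  # dequantized values, convenient for measuring the quantization error

def choose_bitwidth(x, scale_limit, min_bits=4, max_bits=8):
    """Pick the smallest bit-width whose per-layer scaling factor stays below a
    limit shared by all layers, so layers with wider value ranges get more bits."""
    max_abs = float(np.abs(x).max())
    for bits in range(min_bits, max_bits + 1):
        scale = max_abs / (2 ** (bits - 1) - 1)  # scale needed to cover this layer's range
        if scale <= scale_limit:
            return bits, scale
    return max_bits, max_abs / (2 ** (max_bits - 1) - 1)

# Hypothetical usage over a dict of per-layer weight tensors:
# for name, w in layer_weights.items():
#     bits, scale = choose_bitwidth(w, scale_limit=0.01)
#     layer_weights[name] = linear_quantize(w, bits, scale)
```

In practice, activation statistics would be gathered from calibration data rather than from the tensor itself, and the paper's actual error-limit criterion may differ from the simple range-based rule shown here.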

     
