Image highlight removal method based on parallel multi-axis self-attention

Li Pengyue (李鹏越)^{1, 2}, Xu Xinying (续欣莹)¹, Tang Yandong (唐延东)^{3, 4}, Zhang Zhaoxia (张朝霞)¹, Han Xiaoxia (韩晓霞)¹, Yue Haifeng (岳海峰)²

doi: 10.3788/IRLA20230538

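1. College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan 030024, China
2. Taiyuan Heavy Machinery (Group) Company, Taiyuan 030027, China
3. State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
4. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110016, China

Funds: National Natural Science Foundation of China (62203319); Natural Science Foundation of Shanxi Province (202203021212220, 202103021224056); Shanxi Province Science and Technology Cooperation and Exchange Program (202104041101030)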

Corresponding author: Li Pengyue, male, lecturer, master's supervisor, Ph.D. His main research interest is computer vision.

CLC number: TP391.4


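Abstract: The ambiguity of the image highlight layer model and the large dynamic range of highlights make image highlight removal a challenging vision task. Purely local methods tend to produce artifacts in highlight regions, whereas purely global methods tend to distort colors in highlight-free regions. To address these problems, which arise from the imbalance between local and global features, and the ambiguity of highlight layer modeling, this paper proposes a threshold fusion U-shaped deep network for image highlight removal based on a parallel multi-axis self-attention mechanism. The method avoids the problems introduced by the ambiguous highlight layer model through implicit modeling, uses a U-shaped network structure to fuse contextual information with low-level information when estimating the highlight-free image, and introduces a threshold fusion structure between the encoder and decoder of the U-shaped structure to further improve the feature representation capability of the network. In addition, the unit structure of the U-shaped network fuses local and global self-attention to balance the encoding and decoding of local and global features. Qualitative experiments show that the proposed method removes highlights from images more effectively, whereas the compared algorithms tend to produce artifacts and distortion in highlight regions. Quantitative experiments show that the proposed method outperforms five other typical image highlight removal methods in PSNR and SSIM: on the three datasets, its PSNR exceeds that of the second-best method by 4.10, 7.09 and 6.58 dB, and its SSIM improves by 4%, 9% and 3%, respectively.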
Extended abstract:

Objective: Highlights appear as bright spots on the surfaces of glossy materials under illumination, and they obscure background information in an image to varying degrees. The ambiguity of the image highlight layer model and the large dynamic range of highlights make highlight removal a challenging vision task. Purely local methods tend to produce artifacts in highlight areas, whereas purely global methods tend to produce color distortion in highlight-free areas. To address these issues, which stem from the imbalance between local and global features and from the ambiguity of highlight layer modeling, we propose a threshold fusion U-shaped deep network based on a parallel multi-axis self-attention mechanism for image highlight removal.

Methods: Our method avoids the ambiguity of highlight layer modeling through implicit modeling. It uses a U-shaped network to combine contextual information with low-level information when estimating the highlight-free image, and it introduces a threshold fusion structure between the encoder and decoder to further enhance the feature representation capability of the network. The U-shaped network uses a contracting convolution strategy to extract contextual semantic information quickly, gradually recovers the low-level information of the image along an expanding path, and connects the features from each stage of the contracting path to the corresponding stage of the expanding path. The threshold mechanism between the encoder and decoder adjusts the information flow in each encoder channel, so that the encoder extracts highlight-related features as fully as possible at the channel level. The threshold structure first decouples the input features into high- and low-frequency components and extracts features from each, then fuses the two types of features by pixel-wise multiplication, and finally uses a residual connection to learn complementary low-level features. In addition, the parallel multi-axis self-attention mechanism serves as the unit structure of the U-shaped network to balance the learning of local and global features, which suppresses the distortion and artifacts that unbalanced extraction of local and global features causes in the recovered highlight-free images. The local self-attention computes interactions within small P×P windows. After the correlations within each window are computed, the windowed features are mapped back to an output with the same dimensions as the input by inverting the window partition. Similarly, the global self-attention divides the input features into a G×G grid with larger receptive fields. Each grid is a unit for computing correlation, and its window size adapts to the spatial size of the input. The larger receptive field facilitates the extraction of global semantic information. For the loss function, the squared loss and the mean absolute error loss are both widely used in image restoration. The squared penalty magnifies the difference between large and small errors and usually yields over-smoothed restored images, so the mean absolute error loss is used to train our network.

Results and Discussions: Qualitative experiments on real highlight images show that our method removes highlights more effectively, whereas the compared methods usually cannot remove highlights accurately and efficiently and are prone to produce artifacts and distortion in highlight-free areas of the image. Quantitative experiments on real-world highlight image datasets show that our method outperforms five other typical image highlight removal methods in both PSNR and SSIM. Its PSNR is higher than that of the second-best method by 4.10 dB, 7.09 dB, and 6.58 dB on the SD1, RD, and SHIQ datasets, respectively, and its SSIM exceeds that of the second-best method by 4%, 9%, and 3% (0.04, 0.09, and 0.03 in absolute SSIM). Ablation studies on the network structure verify the effectiveness of the threshold fusion module and the parallel multi-axis self-attention module. The threshold fusion module increases the PSNR by 0.68 dB and the SSIM by 1%, and the multi-axis self-attention module increases the average PSNR by 0.55 dB and the SSIM by 1%. The visual results of the ablation models also show that highlight removal improves visually as the network structure is gradually optimized. The outputs of the purely convolutional models M1 and M2 retain more highlight residue and show distortion in the highlight-free areas of the image, whereas models M3, M4 and M5, which combine CNN with the self-attention module, achieve visually better results.

Conclusions: The experimental results show that our method achieves good visual highlight removal results on public natural-image and text-image datasets and outperforms other methods on quantitative evaluation metrics.

Keywords: image processing; highlight removal; multi-axis self-attention; deep learning
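The local P×P windows and the global G×G grid described in the abstract can be sketched in NumPy. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the partition follows the block/grid scheme used in MaxViT-style multi-axis attention, which the abstract's wording resembles. All function names are hypothetical, the attention uses identity Q/K/V projections, and the way the two parallel branches are fused (average plus residual) is an assumption, since the page does not specify it.

```python
import numpy as np


def window_partition(x, p):
    """Split an (H, W, C) map into non-overlapping p x p windows.

    Returns (num_windows, p*p, C); tokens in a window are spatial neighbours.
    """
    h, w, c = x.shape
    assert h % p == 0 and w % p == 0
    x = x.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p, c)


def window_unpartition(windows, p, h, w):
    """Inverse of window_partition: back to an (H, W, C) map."""
    c = windows.shape[-1]
    x = windows.reshape(h // p, w // p, p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)


def grid_partition(x, g):
    """Split an (H, W, C) map with a g x g grid.

    Each group holds g*g tokens sampled with stride (H/g, W/g), so a single
    group spans the whole map and attention inside it mixes global context.
    """
    h, w, c = x.shape
    assert h % g == 0 and w % g == 0
    x = x.reshape(g, h // g, g, w // g, c).transpose(1, 3, 0, 2, 4)
    return x.reshape(-1, g * g, c)


def grid_unpartition(groups, g, h, w):
    """Inverse of grid_partition: back to an (H, W, C) map."""
    c = groups.shape[-1]
    x = groups.reshape(h // g, w // g, g, g, c).transpose(2, 0, 3, 1, 4)
    return x.reshape(h, w, c)


def self_attention(tokens):
    """Single-head attention with identity Q/K/V projections (illustration)."""
    c = tokens.shape[-1]
    scores = tokens @ tokens.swapaxes(-1, -2) / np.sqrt(c)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens


def parallel_multi_axis_attention(x, p, g):
    """Run the local and global branches on the same input and merge them."""
    h, w, _ = x.shape
    local = window_unpartition(self_attention(window_partition(x, p)), p, h, w)
    glob = grid_unpartition(self_attention(grid_partition(x, g)), g, h, w)
    # Assumed fusion rule: average of the two branches plus a residual path.
    return x + 0.5 * (local + glob)
```

For an 8×8 map with P = G = 4, the first local window covers rows and columns 0 to 3, while the first grid group gathers rows and columns 0, 2, 4 and 6. That difference is what lets the two branches capture neighbourhood detail and image-wide context at the same cost per group.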

Figure 1. The overall structure of the proposed network

Figure 2. Visual results of different methods on the SD1 dataset

Figure 3. Visual results of different methods on the RD dataset

Figure 4. Visual results of different methods on the SHIQ dataset

Figure 5. Comparison of visual results of the ablation models for image highlight removal

Table 1. Data distribution of three public highlight datasets

Dataset	RD	SD1	SHIQ
Training set	1800	12000	9825
Testing set	225	2000	1000

Table 2. Comparison of quantitative results of different methods on three public datasets

Method	SD1 PSNR (dB)	SD1 SSIM	RD PSNR (dB)	RD SSIM	SHIQ PSNR (dB)	SHIQ SSIM
DCSR	18.48	0.89	18.60	0.80	27.64	0.92
SHRBF	8.01	0.31	9.66	0.35	21.66	0.75
NMF	18.41	0.58	16.11	0.59	22.82	0.62
SCS	9.77	0.30	12.96	0.28	13.47	0.49
SHRRI	12.92	0.66	12.35	0.61	16.34	0.69
Ours	22.58	0.93	25.69	0.89	34.22	0.95
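The margins quoted in the abstract can be re-derived from Table 2 with a few lines of Python. The sketch below copies the table values verbatim; the dictionary layout and the helper name `margin_over_second_best` are illustrative choices rather than part of the source.

```python
# (PSNR in dB, SSIM) per method and dataset, copied from Table 2.
TABLE2 = {
    "DCSR":  {"SD1": (18.48, 0.89), "RD": (18.60, 0.80), "SHIQ": (27.64, 0.92)},
    "SHRBF": {"SD1": (8.01, 0.31),  "RD": (9.66, 0.35),  "SHIQ": (21.66, 0.75)},
    "NMF":   {"SD1": (18.41, 0.58), "RD": (16.11, 0.59), "SHIQ": (22.82, 0.62)},
    "SCS":   {"SD1": (9.77, 0.30),  "RD": (12.96, 0.28), "SHIQ": (13.47, 0.49)},
    "SHRRI": {"SD1": (12.92, 0.66), "RD": (12.35, 0.61), "SHIQ": (16.34, 0.69)},
    "Ours":  {"SD1": (22.58, 0.93), "RD": (25.69, 0.89), "SHIQ": (34.22, 0.95)},
}


def margin_over_second_best(table, dataset, metric, ours="Ours"):
    """Return (margin, runner_up) for metric index 0 (PSNR) or 1 (SSIM)."""
    baselines = {m: v[dataset][metric] for m, v in table.items() if m != ours}
    runner_up = max(baselines, key=baselines.get)
    margin = table[ours][dataset][metric] - baselines[runner_up]
    return round(margin, 2), runner_up


for dataset in ("SD1", "RD", "SHIQ"):
    psnr_gain, psnr_rival = margin_over_second_best(TABLE2, dataset, 0)
    ssim_gain, ssim_rival = margin_over_second_best(TABLE2, dataset, 1)
    print(f"{dataset}: PSNR +{psnr_gain:.2f} dB over {psnr_rival}, "
          f"SSIM +{ssim_gain:.2f} over {ssim_rival}")
# SD1: PSNR +4.10 dB over DCSR, SSIM +0.04 over DCSR
# RD: PSNR +7.09 dB over DCSR, SSIM +0.09 over DCSR
# SHIQ: PSNR +6.58 dB over DCSR, SSIM +0.03 over DCSR
```

The PSNR margins reproduce the 4.10, 7.09 and 6.58 dB reported in the abstract. The SSIM gains of 4%, 9% and 3% correspond to absolute differences of 0.04, 0.09 and 0.03, not to relative improvements over the runner-up.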

Table 3. Ablation experiments on the contribution of the threshold fusion structure and the parallel multi-axis self-attention mechanism

Model	M0	M1	M2	M3	M4	M5
PSNR (dB)	30.15	30.78	31.46	33.67	33.15	34.22
SSIM	0.91	0.92	0.93	0.94	0.94	0.95
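The abstract attributes PSNR gains of 0.68 dB and 0.55 dB to the threshold fusion module and the multi-axis self-attention module, but this page does not define the configurations M0 to M5. A small arithmetic check over Table 3 shows which pairs of columns differ by exactly those amounts. The helper name `pairs_with_gain` is hypothetical, and the resulting pairing is only an inference from the numbers, not something the source states.

```python
from itertools import combinations

# PSNR values (dB) copied from Table 3.
ABLATION_PSNR = {
    "M0": 30.15, "M1": 30.78, "M2": 31.46,
    "M3": 33.67, "M4": 33.15, "M5": 34.22,
}


def pairs_with_gain(values, gain, ndigits=2):
    """List (earlier, later) model pairs whose PSNR difference equals gain."""
    return [
        (a, b)
        for a, b in combinations(values, 2)
        if round(values[b] - values[a], ndigits) == gain
    ]


print(pairs_with_gain(ABLATION_PSNR, 0.68))  # [('M1', 'M2')]
print(pairs_with_gain(ABLATION_PSNR, 0.55))  # [('M3', 'M5')]
```

Under this reading, the 0.68 dB figure would correspond to the step from M1 to M2 and the 0.55 dB figure to the step from M3 to M5. Confirming this requires the model definitions from the full text.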

[1]	Zhang Z D, Xue Z Y, Chen Y, et al. Boosting verified training for robust image classifications via abstraction[C]//Proc of the IEEE Conference on Computer Vision & Pattern Recognition, 2023: 16251-16260.
[2]	Muhammad F N, Muhammad G Z A K, Xian Y Q, et al. I2mvformer: large language model generated multi-view document supervision for zero-shot image classification[C]//Proc of the IEEE Conference on Computer Vision & Pattern Recognition, 2023: 15169-15179.
[3]	Zhu Y, Tang J, Li S, et al. Derendernet: intrinsic image decomposition of urban scenes with shape-(in)dependent shading rendering[C]//Proc of the 2021 IEEE International Conference on Computational Photography (ICCP), 2021: 1-11.
[4]	Zhang F, Jiang X, Xia Z, et al. Non-local color compensation network for intrinsic image decomposition [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(1): 132-145. doi: 10.1109/TCSVT.2022.3199428
[5]	Minaee S, Boykov Y, Porikli F, et al. Image segmentation using deep learning: a survey [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 44(7): 3523-3542.
[6]	Clough J R, Byrne N, Oksuz I, et al. A topological loss function for deep-learning based image segmentation using persistent homology [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 44(12): 8766-8778.
[7]	Wang Dongdong, Zhang Wei, Jin Guofeng, et al. Application of cusp catastrophic theory in image segmentation of infrared thermal waving inspection [J]. Infrared and Laser Engineering, 2014, 43(3): 1009-1015. (in Chinese) doi: 10.3969/j.issn.1007-2276.2014.03.060
[8]	Zou Z, Shi Z, Guo Y, et al. Object detection in 20 years: a survey[J]. Proceedings of the IEEE, 2023, 111(3): 257-276.
[9]	Li X, Lv C, Wang W, et al. Generalized focal loss: towards efficient representation learning for dense object detection [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3139-3153.
[10]	Kong Y, Fu Y. Human action recognition and prediction: a survey [J]. International Journal of Computer Vision, 2022, 130(5): 1366-1401. doi: 10.1007/s11263-022-01594-9
[11]	Sun Z, Ke Q, Rahmani H, et al. Human action recognition from various data modalities: A review [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3200-3225.
[12]	Seidenschwarz J, Brasó G, Elezi I, et al. Simple cues lead to a strong multi-object tracker[C]//Proc of the IEEE Conference on Computer Vision and Pattern Recognition, 2023: 13813-13823.
[13]	Hu W, Wang Q, Zhang L, et al. Siammask: a framework for fast online object tracking and segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3072-3089.
[14]	Chen Faling, Ding Qinghai, Luo Haibo, et al. Anti-occlusion real time target tracking algorithm employing spatio-temporal context [J]. Infrared and Laser Engineering, 2021, 50(1): 20200105. (in Chinese) doi: 10.3788/IRLA20200105
[15]	Li Bo, Zhang Xinyu. Target tracking algorithm based on adaptive feature fusion in complex scenes [J]. Infrared and Laser Engineering, 2022, 51(10): 20220013. (in Chinese)
[16]	Shafer S A. Using color to separate reflection components [J]. Color Research & Application, 1985, 10(4): 210-218.
[17]	Yang Q, Tang J, Ahuja N. Efficient and robust specular highlight removal [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(6): 1304-1311. doi: 10.1109/TPAMI.2014.2360402
[18]	Fu G, Zhang Q, Song C, et al. Specular highlight removal for real-world images [J]. Computer Graphics Forum, 2019, 38(7): 253-263. doi: 10.1111/cgf.13834
[19]	Kim H, Jin H, Hadap S, et al. Specular reflection separation using dark channel prior[C]//Proc of the IEEE Conference on Computer Vision & Pattern Recognition, 2013: 1460-1467.
[20]	Tan R T, Ikeuchi K. Separating reflection components of textured surfaces using a single image [J]. IEEE Trans on Pattern Analysis and Machine Intelligence, 2005, 27(2): 178-193. doi: 10.1109/TPAMI.2005.36
[21]	Liu Y, Yuan Z, Zheng N, et al. Saturation-preserving specular reflection separation[C]//Proc of the IEEE Conference on Computer Vision & Pattern Recognition, 2015: 3725-3733.
[22]	Suo J, An D, Ji X, et al. Fast and high quality highlight removal from a single image [J]. IEEE Transactions on Image Processing, 2016, 25(11): 5441-5454. doi: 10.1109/TIP.2016.2605002
[23]	Yang Q X, Wang S N, Ahuja N. Real-time specular highlight removal using bilateral filtering[C]//Proc of the 11th European Conference on Computer Vision, 2010: 87-100.
[24]	Fu G, Zhang Q, Lin Q, et al. Learning to detect specular highlights from real-world images[C]//Proc of the ACM International Conference on Multimedia, 2020: 1873-1881.
[25]	Shi J, Dong Y, Su H, et al. Learning non-lambertian object intrinsics across shapenet categories[C]//Proc of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 1685-1694.
[26]	Yi R, Tan P, Lin S. Leveraging multi-view image sets for unsupervised intrinsic image decomposition and highlight separation[C]//Proc of the AAAI Conference on Artificial Intelligence, 2020: 12685-12692.
[27]	Huang Z, Hu K, Wang X. M2-NET: multi-stages specular highlight detection and removal in multi-scenes [DB/OL]. (2022-07-20) [2024-02-20]. https://arxiv.dosf.top/abs/2207.09965.
[28]	Hou S, Wang C, Quan W, et al. Text-aware single image specular highlight removal[C]//Proc of the 4th Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2021: 115-127.
[29]	Jimenez-Martin L, Perez D A V, Asteasuainzarra A S M, et al. Specular reflections removal in colposcopic images based on neural networks: Supervised training with no ground truth previous knowledge [DB/OL]. (2020-06-21) [2024-02-20]. https://doi.org/10.48550/arXiv.2106.02221.
[30]	Fu G, Zhang Q, Zhu L, et al. A multi-task network for joint specular highlight detection and removal[C]//Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021: 7748-7757.
[31]	Li K, Wang Y, Zhang J, et al. Uniformer: unifying convolution and self-attention for visual recognition [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2023(1): 1-18.
[32]	Gulati A, Qin J, Chiu C C, et al. Conformer: convolution-augmented transformer for speech recognition [DB/OL]. (2020-05-16) [2024-02-20]. https://doi.org/10.48550/arXiv.2005.08100.
[33]	Jiang Z H, Yu W, Zhou D, et al. Convbert: improving bert with span-based dynamic convolution [J]. Advances in Neural Information Processing Systems, 2020, 33: 12837-12848.
[34]	Wu H, Xiao B, Codella N, et al. Cvt: introducing convolutions to vision transformers[C]//Proc of the IEEE/CVF International Conference on Computer Vision, 2021: 22-31.
[35]	Saint-Pierre C A, Boisvert J, Grimard G, et al. Detection and correction of specular reflections for automatic surgical tool segmentation in thoracoscopic images [J]. Machine Vision & Applications, 2011, 22(1): 171-180.
[36]	Akashi Y, Okatani T. Separation of reflection components by sparse non-negative matrix factorization[C]//Proc of the Asian Conference on Computer Vision, 2015: 611-625.
[37]	Yamamoto T, Nakazawa A. General improvement method of specular component separation using high-emphasis filter and similarity function [J]. ITE Transactions on Media Technology and Applications, 2019, 7(2): 92-102. doi: 10.3169/mta.7.92

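0. Introduction

Highlights are a common optical phenomenon in everyday life and usually appear as bright spots on the surface of glossy materials under illumination. After a scene is imaged, the bright spots in the image occlude background information to varying degrees; in text images in particular, highlights easily cause the loss of key information. Image highlight removal has therefore long been a fundamental problem in computer vision and image processing. Removing highlights not only restores key information lost from the image but also improves the performance of many computer vision tasks, such as image classification^[1-2], intrinsic image decomposition^[3-4], image segmentation^[5-7], object detection^[8-9], action recognition^[10-11] and object tracking^[12-15].

Early image highlight removal algorithms usually obtained an optimal estimate of the highlight-free image under constraints derived from various priors and then removed the highlights. Such constraints include the dichromatic reflection model^[16-17], sparsity priors^[18], the dark channel prior^[19], chromaticity distribution priors^[20-22] and filtering^{[17, 23]}. Because the prior information is local, these methods cannot remove large-area highlights. They also rely too heavily on the priors and easily mistake white pixels for highlight regions and remove them, which lowers their accuracy. Natural-light images contain rich textures, complex material surfaces and shadows, which makes the models of the highlight and non-highlight layers ambiguous. This ambiguity easily introduces model errors into classical highlight removal algorithms, so image highlight removal remains a challenging image restoration task.

In recent years, with the rapid development of deep learning in vision, several deep-learning-based image highlight removal methods have appeared. Although these methods have made notable progress^[24-30], they still have limitations. First, they are usually trained on synthetic data or on a small amount of real data, and the domain gap between training and test data may limit their generalization to real highlight images. To this end, this paper trains the deep network on a relatively large database of real highlight images to improve its generalization. Second, existing highlight removal networks mainly improve performance by designing new convolutional topologies. Such models cannot avoid the defects of purely convolutional structures (the limits that inductive bias places on the model, that fixed convolution kernels place on the receptive field, and that feature locality places on semantics), and their structures keep growing more complex. For deep-network-based highlight removal, introducing new convolutional structures has reached a bottleneck in both theoretical significance and practical effect. This paper therefore adopts a composite structure that fuses convolution with self-attention to avoid the inherent defects of purely convolutional structures and to improve the performance of deep-network-based highlight removal.

In summary, this paper develops a U-shaped deep highlight removal network based on a multi-axis self-attention mechanism (UMAVTB). The network is trained and comprehensively evaluated on relatively large-scale datasets of real highlight images. To avoid the errors that the ambiguity of the highlight and non-highlight layer models introduces into classical highlight removal algorithms, a nonlinear highlight image model with variable normalization is adopted. To avoid the limitations of purely convolutional networks, the network extracts local and global features from the high and low levels of the highlight image through a U-shaped threshold structure with parallel multi-axis self-attention, so that bright regions of different extents and characteristics are removed accurately. Experimental results on natural-image and text-image datasets show that the proposed deep network outperforms current typical methods in highlight removal.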

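3. Conclusion

The ambiguity of the highlight and non-highlight layer models limits typical prior-based optimization algorithms on the highlight removal task. This paper therefore exploits the strong feature extraction and nonlinear fitting capability of deep learning, combines the strengths of convolution and self-attention in feature extraction and representation, and establishes an image highlight removal method based on a composite deep network structure. The threshold fusion U-shaped network fuses contextual information with low-level information more effectively and improves the accuracy of pixel-level estimation. The parallel multi-axis self-attention mechanism fuses local and global sparse self-attention and balances the extraction and decoding of local and global features. Quantitative and qualitative experiments on real highlight datasets show that the proposed method achieves good visual highlight removal results and outperforms other mainstream methods on quantitative evaluation metrics. Although the method performs well on public databases, these databases do not provide information such as the size of highlight regions or highlight intensity, so the relationship between these factors and highlight removal performance has not been analyzed quantitatively; this remains for future work.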

