Optimization of routing and wavelength optimization algorithm for optical transport network based on reinforcement learning

Kong Yinghui; Yang Jiazhi; Gao Huisheng; Hu Zhengwei

doi:10.3788/IRLA20220084

Volume 51 Issue 11

Nov. 2022


Kong Yinghui, Yang Jiazhi, Gao Huisheng, Hu Zhengwei. Optimization of routing and wavelength optimization algorithm for optical transport network based on reinforcement learning[J]. Infrared and Laser Engineering, 2022, 51(11): 20220084. doi: 10.3788/IRLA20220084


Kong Yinghui^{1, 2}, Yang Jiazhi^{1}, Gao Huisheng^{1, 2}, Hu Zhengwei^{1, 2}

1. Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China
2. Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding 071003, China

Received Date: 2022-02-07
Revised Date: 2022-07-23
Publish Date: 2022-11-30

Abstract

To address the routing and wavelength assignment problem for dynamic services in optical transport networks, a deep routing and wavelength assignment algorithm based on reinforcement learning is proposed. The algorithm is built on a software defined network architecture, uses reinforcement learning to flexibly adjust and control the optical transport network, and optimizes the routing and wavelength assignment strategy. For route selection, the A3C algorithm selects a suitable route based on wavelength usage on the links so as to minimize the blocking rate; for wavelength assignment, the first fit algorithm selects the wavelength. Simulation experiments were carried out on the 14-node NSFNET topology, considering blocking rate, resource utilization, policy entropy, value loss, execution time, and convergence speed. The results show that with 18 wavelengths per channel, compared with the traditional KSP-FF algorithm, the proposed algorithm reduces the blocking rate by 0.06 and increases resource utilization by 0.02, but its execution time is longer. When the number of wavelengths exceeds 45, the proposed algorithm maintains its blocking rate and resource utilization relative to KSP-FF, while its execution time begins to decrease. With 58 wavelengths, its execution time is 0.07 ms shorter than that of KSP-FF. The proposed algorithm therefore improves routing and wavelength assignment.
Key words:
- optical transport network
- routing and wavelength optimization
- reinforcement learning
- routing selection
- wavelength assignment

References

[1]	Nath P K, Venkatesh T. Lightpath routing and wavelength assignment for static demand in translucent optical networks [J]. Photonic Network Communications, 2020, 39(7): 103-119.
[2]	Yang Xiuqing, Chen Haiyan. Application of optical communication technique in the internet of things [J]. Chinese Optics, 2014, 7(6): 889-896. (in Chinese)
[3]	Li Haitao. Technical approach analysis and development prospects of optical communication technology in China deep space TT&C network [J]. Infrared and Laser Engineering, 2020, 49(5): 20201003. (in Chinese) doi: 10.3788/IRLA20201003
[4]	Yang Junbo, Yang Jiankun, Li Xiujian, et al. Choice and control of routes in crossover optical interconnection network [J]. Optics and Precision Engineering, 2010, 18(6): 1249-1257. (in Chinese)
[5]	Sun Zhaowei, Liu Xuekui, Wu Xiande, et al. Path planning based on ant colony and genetic fusion algorithm for communication supporting spacecraft [J]. Optics and Precision Engineering, 2013, 21(12): 3308-3316. (in Chinese) doi: 10.3788/OPE.20132112.3308
[6]	Guo Xiuzhen, Hou Lixin, Yin Zhaotai, et al. All-optical routing control based on coherently induced high reflection band and high transmission band in a medium of cold atoms [J]. Chinese Optics, 2011, 4(4): 355-362. (in Chinese)
[7]	Zhang Min, Xu Bo, Cai Yi, et al. Routing and wavelength assignment based on genetic algorithm in large scale WDM network [J]. Optical Communication Technology, 2018, 42(11): 1-4. (in Chinese)
[8]	Wang Weilong, Li Yongjun, Zhao Shanghong, et al. Routing and wavelength assignment based on load balance for optical satellite network [J]. Laser & Optoelectronics Progress, 2021, 58(7): 0706004. (in Chinese)
[9]	Shi Xiaodong, Li Yongjun, Zhao Shanghong, et al. Ant colony optimization routing and wavelength technology for software-defined satellite optical networks [J]. Infrared and Laser Engineering, 2021, 51(7): 20200125. (in Chinese) doi: 10.3788/IRLA20200125
[10]	Martín I, Troia S, Hernández J A, et al. Machine learning based routing and wavelength assignment in software-defined optical networks [J]. IEEE Transactions on Network and Service Management, 2019, 16(3): 871-883. doi: 10.1109/TNSM.2019.2927867
[11]	Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning [J]. Nature, 2015, 518(7540): 529-533. doi: 10.1038/nature14236
[12]	Li Zhongtao. Wireless communication node coverage optimization based double deep Q-learning [J]. Electronic Technology & Software Engineering, 2021(14): 1-3. (in Chinese)
[13]	Rao Ning, Xu Hua, Qi Zisen, et al. Communication interference resource allocation method of deep reinforcement learning based on maximum policy entropy [J]. Journal of Northwestern Polytechnical University, 2021, 39(5): 1077-1086. (in Chinese) doi: 10.1051/jnwpu/20213951077
[14]	Zhao Zipiao, Zhao Yongli, Ma Haoli, et al. Cost-efficient routing, modulation, wavelength and port assignment using reinforcement learning in optical transport networks [J]. Optical Fiber Technology, 2021, 64: 102571.
[15]	Li Xin, Zhao Yongli, Li Yajie, et al. Multi-objective routing and resource allocation based on reinforcement learning in optical transport networks[C]//2020 Asia Communications and Photonics Conference (ACP) and International Conference on Information Photonics and Optical Communications (IPOC), 2020: 1-3.
[16]	Chen Xiaoliang, Li Baojia, Proietti Roberto, et al. DeepRMSA: A deep reinforcement learning framework for routing, modulation and spectrum assignment in elastic optical networks [J]. Journal of Lightwave Technology, 2019, 37(16): 4155-4163. doi: 10.1109/JLT.2019.2923615

0. Introduction

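The surge in network services places higher demands on the transmission bandwidth of backbone networks. In a network with limited resources, selecting an appropriate route and assigning a suitable wavelength to each service strongly affects resource utilization, management efficiency, and control flexibility. Routing and wavelength assignment (RWA) has therefore become one of the core problems in optical transport networks^[1-3].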

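The RWA problem is generally divided into two sub-problems, routing and wavelength assignment: a suitable route is first selected as the lightpath, and a wavelength is then assigned to it^[4-6]. Common routing algorithms include shortest path (SP) and K shortest paths (KSP). SP computes the shortest path for each source-destination node pair and routes a service request over it when the request arrives. This method has low computational complexity but leads to a high network blocking rate. KSP builds on SP by computing K paths between the source and destination nodes and sorting them by distance; when a service arrives, an available route is selected in order of priority. Common wavelength assignment algorithms include random assignment (RA) and first fit (FF). RA randomly selects one wavelength from the set of available wavelengths; it is simple to implement, but the wavelengths used are highly random. FF searches the wavelengths in priority order and transmits on the first available one it finds. FF has low computational overhead and a low blocking rate. Reference [7] compared commonly used wavelength assignment methods for the RWA problem experimentally; the simulations show that the choice of wavelength assignment algorithm has little effect on the RWA solution, and FF is therefore the typical algorithm most widely used in optical networks today.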
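The KSP routing and FF wavelength assignment described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: links are treated as undirected, wavelength continuity is enforced along the whole path, the K paths are found by a simple best-first search rather than an optimized algorithm, and the names `k_shortest_paths`, `first_fit`, `ksp_ff`, and the four-node `toy_graph` are all hypothetical.

```python
import heapq

def k_shortest_paths(graph, src, dst, k):
    """Return up to k loop-free paths from src to dst, shortest first.

    Best-first search over partial paths; fine for small graphs only.
    """
    heap = [(0, [src])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            found.append(path)
            continue
        for nxt, weight in graph[node].items():
            if nxt not in path:  # keep the path loop-free
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return found

def first_fit(path, occupied, num_wavelengths):
    """Reserve the lowest-indexed wavelength free on every link of path.

    Returns the wavelength index, or None if no wavelength is free end to end.
    """
    links = [frozenset(pair) for pair in zip(path, path[1:])]
    for w in range(num_wavelengths):
        if all(w not in occupied.get(link, set()) for link in links):
            for link in links:
                occupied.setdefault(link, set()).add(w)
            return w
    return None

def ksp_ff(graph, occupied, src, dst, k, num_wavelengths):
    """Try the k shortest paths in order; return (path, wavelength) or None if blocked."""
    for path in k_shortest_paths(graph, src, dst, k):
        w = first_fit(path, occupied, num_wavelengths)
        if w is not None:
            return path, w
    return None

# Hypothetical 4-node topology (adjacency dict with link lengths), for illustration only.
toy_graph = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "D": 1},
    "C": {"A": 2, "D": 2},
    "D": {"B": 1, "C": 2},
}
```

With a single wavelength per link and K = 2, the first A-to-D request takes the shortest path, the second falls back to the longer path, and the third is blocked, which is the behavior the blocking rate measures.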

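Because new services such as cloud computing and data interconnection are dynamic, the routing methods above are no longer suitable owing to their lack of flexibility. Network resources need to be deployed quickly and on demand according to service characteristics, and intelligent algorithms are needed to provide flexible, optimized management and control of route selection in optical networks. Software defined networking (SDN) and deep reinforcement learning can support such a scheme.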

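In recent years, machine-learning-based intelligent routing and wavelength assignment algorithms for optical networks have attracted wide attention. Reference [7] proposed a genetic algorithm for the RWA problem in optical networks that achieves a lower network blocking rate than traditional algorithms. Reference [8] proposed an ant colony RWA algorithm that achieves load balancing in satellite optical networks, but it tends to fall into local optima and converges slowly. Reference [9] considered the transmission delay and wavelength continuity constraints of satellite optical networks and proposed an RWA method based on an improved ant colony algorithm, which reduces computational complexity but increases the blocking rate. Reference [10] used supervised machine learning to solve the RWA problem by mapping it to a classification problem; this greatly reduces the computation time needed to generate an RWA policy, but the training data set is difficult to obtain.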

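Since 2015, deep reinforcement learning has been applied to optimization problems in communication networks^[11]. In Reference [12], for a scenario in which airborne mobile wireless communication nodes exchange information with water-surface communication nodes, the author proposed a deep-reinforcement-learning-based method that guides the movement path of the airborne nodes so as to cover the surface nodes at minimum cost. In Reference [13], the authors addressed the optimization of jamming resource allocation in communication network countermeasures and proposed an allocation method based on maximum-policy-entropy deep reinforcement learning. In Reference [14], the authors proposed a deep reinforcement learning algorithm for routing, modulation, wavelength and port assignment in optical transport networks, with the aim of reducing cost. However, when the algorithm handles complex topologies, the large number of candidate routing paths leads to long model training times. In Reference [15], the authors proposed a deep-reinforcement-learning-based routing and resource allocation algorithm that selects the path with the fewest hops and the fewest wavelength converters, reducing the operation and maintenance cost of the optical network. Experiments on a simple topology verified that the algorithm achieves the expected goals, but the network topology and evaluation metrics used in the experiments need to be extended. In Reference [16], the authors proposed a deep-reinforcement-learning-based routing, modulation and spectrum assignment algorithm for elastic optical networks that allocates bandwidth dynamically according to service demand, thereby improving spectrum utilization and effectively reducing the blocking probability of service requests; it offers a useful reference for solving the RWA problem in optical transport networks.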

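To address route selection and wavelength assignment in optical networks, and drawing on the idea of spectrum optimization in elastic optical networks, a deep routing and wavelength assignment algorithm (DeepRWA) is proposed. The algorithm uses an SDN framework to flexibly control route selection and wavelength assignment in the optical transport network, and handles RWA intelligently through a deep reinforcement learning policy. The deep reinforcement learning component uses the asynchronous advantage actor-critic (A3C) algorithm and takes wavelength usage into account when selecting routes; on this basis, FF is used for wavelength assignment, so that the routing blocking rate is minimized and resource utilization is improved.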
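For readers unfamiliar with A3C, the quantities tracked in the abstract (policy entropy and value loss) come from the standard actor-critic objective. The block below writes out that standard formulation as a reference sketch; it is the textbook form, not necessarily the exact loss used in DeepRWA. All symbols are assumptions introduced here: $s_t$ is the state (for example, the service request together with link wavelength usage), $a_t$ is the chosen candidate route, $r_t$ is the reward, $\gamma$ is the discount factor, $\beta$ is the entropy weight, and $\theta$, $\theta_v$ are the policy and value network parameters.

```latex
\begin{aligned}
R_t &= \sum_{i=0}^{n-1} \gamma^{i} r_{t+i} + \gamma^{n} V(s_{t+n}; \theta_v) \\
A_t &= R_t - V(s_t; \theta_v) \\
H\big(\pi(\cdot \mid s_t; \theta)\big) &= -\sum_{a} \pi(a \mid s_t; \theta) \log \pi(a \mid s_t; \theta) \\
L_{\pi}(\theta) &= -\log \pi(a_t \mid s_t; \theta)\, A_t - \beta\, H\big(\pi(\cdot \mid s_t; \theta)\big) \\
L_{v}(\theta_v) &= \big(R_t - V(s_t; \theta_v)\big)^{2}
\end{aligned}
```

Here $R_t$ is the $n$-step return, $A_t$ the advantage estimate, $L_{\pi}$ the policy loss with its entropy bonus, and $L_{v}$ the value loss. In A3C, several workers compute gradients of these losses in parallel and apply them asynchronously to shared parameters.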

4. Conclusion

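To meet the demands of a large volume of dynamic services, the RWA problem in optical networks is studied, and a reinforcement-learning-based routing and wavelength assignment algorithm, DeepRWA, is proposed with two objectives, blocking rate and resource utilization. An SDN architecture provides flexible control of the optical network, and reinforcement learning optimizes route selection and wavelength assignment. For route selection, the A3C algorithm selects a suitable route based on wavelength usage on the links so as to minimize the blocking rate; for wavelength assignment, the first fit algorithm selects the wavelength. Simulation experiments were carried out on the NSFNET topology. The results show that the proposed DeepRWA algorithm achieves a lower blocking rate, improves resource utilization, and enhances network performance; when the number of wavelengths per link is large, DeepRWA runs faster than the KSP-FF algorithm and adapts better. Future work will carry out further research and testing with real networks and services to support practical application.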


Parameter	Value
Average arrival time of dynamic services (s)	1/12
Holding time of dynamic services (s)	13
Number of available wavelengths per channel	18
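As an illustration of how the parameters in the table above might drive a dynamic-traffic simulation, the sketch below generates a stream of service requests. Several things here are assumptions rather than facts from the paper: the "average arrival time" is read as a mean inter-arrival time of 1/12 s, both inter-arrival and holding times are drawn from exponential distributions, source and destination nodes are picked uniformly at random among 14 nodes, and the names `Request` and `generate_requests` are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Request:
    arrival: float  # arrival time in seconds
    holding: float  # time the lightpath stays set up, in seconds
    src: int        # source node index
    dst: int        # destination node index

def generate_requests(n, num_nodes=14, mean_interarrival=1 / 12,
                      mean_holding=13.0, seed=0):
    """Generate n requests with exponential inter-arrival and holding times.

    The exponential distributions and uniform node choice are assumptions.
    """
    rng = random.Random(seed)
    t = 0.0
    requests = []
    for _ in range(n):
        # expovariate takes the rate, i.e. 1 / mean
        t += rng.expovariate(1.0 / mean_interarrival)
        src, dst = rng.sample(range(num_nodes), 2)  # two distinct nodes
        holding = rng.expovariate(1.0 / mean_holding)
        requests.append(Request(t, holding, src, dst))
    return requests
```

Under this reading of the table, the offered load would be 13 / (1/12) = 156 Erlang. The blocking rate is then the fraction of generated requests for which no route and wavelength can be found.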
