Li Qingge, Yang Xiaogang, Lu Ruitao, Wang Siyu, Fan Jiwei, Xia Hai. Cross-modal geo-localization method based on GCI-CycleGAN style translation[J]. Infrared and Laser Engineering, 2023, 52(7): 20220875. DOI: 10.3788/IRLA20220875

Cross-modal geo-localization method based on GCI-CycleGAN style translation

  • Abstract: In recent years, vision-based autonomous localization technology for aircraft has developed rapidly and is one of the key technologies for aircraft navigation and guidance, situational awareness, and autonomous decision-making. To address the large modal differences, difficult matching, and poor localization robustness in existing cross-modal geo-localization tasks, a cross-modal geo-localization method based on GCI-CycleGAN style transfer is proposed, which combines a style transfer algorithm, feature matching algorithms, and a geo-localization method to achieve cross-modal geo-localization for aircraft. First, real-time downward-looking infrared images captured by a UAV and visible images with known geographic location information are acquired. Second, based on the idea of generative-adversarial-network image style translation, a new adversarial loss function is designed, and a GCI-CycleGAN model is constructed and trained to convert visible images into infrared images. Then, the SIFT, SURF, ORB, LoFTR, and DFM matching algorithms are used to match the generated infrared images against the real-time infrared images. Finally, the position of the real-time infrared image's center point in the generated image is obtained through perspective transformation, and this point is mapped onto the corresponding visible image to obtain the final geo-localization result. Experiments show that GCI-CycleGAN effectively improves style-transfer quality compared with CycleGAN; combined with the DFM intelligent matching algorithm, it achieves a matching success rate of up to 99.48%, 4.73% higher than the original cross-modal matching result, with an average geo-localization error of only 1.37 pixels, yielding more accurate and robust geo-localization results.
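For context, GCI-CycleGAN builds on the standard CycleGAN objective; the baseline objective is sketched below (the paper's specific GCI modification to the adversarial loss is defined in the full text and is not reproduced here). Here $G$ maps visible images to infrared, $F$ maps infrared to visible, and $D_X$, $D_Y$ are the corresponding discriminators:

```latex
% Adversarial loss for the visible-to-infrared generator G:
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) =
  \mathbb{E}_{y \sim Y}\!\left[\log D_Y(y)\right] +
  \mathbb{E}_{x \sim X}\!\left[\log\!\big(1 - D_Y(G(x))\big)\right]

% Cycle-consistency loss tying the two generators together:
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x \sim X}\!\left[\lVert F(G(x)) - x \rVert_1\right] +
  \mathbb{E}_{y \sim Y}\!\left[\lVert G(F(y)) - y \rVert_1\right]

% Full objective, with weight \lambda on the cycle term:
\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) +
  \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X) +
  \lambda\,\mathcal{L}_{\mathrm{cyc}}(G, F)
```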


    Abstract:
      Objective   This research proposes a cross-modal image geo-localization method based on GCI-CycleGAN style translation for vision-based autonomous geo-localization in aircraft. The technology is essential for navigation, guidance, situational awareness, and autonomous decision-making, yet existing cross-modal geo-localization tasks suffer from significant modal differences, difficult matching, and poor localization robustness. In the proposed method, real-time infrared images and visible images with known geo-location information are acquired, and a GCI-CycleGAN model is trained to convert the visible images into infrared images through generative-adversarial-network image style translation. The generated infrared images are matched with the real-time infrared images using various matching algorithms, and the position of the real-time infrared image's center point in the generated image is obtained through perspective transformation. This point is then mapped to the corresponding visible image to obtain the final geo-localization result. The method addresses the challenges of existing cross-modal geo-localization tasks, improving the quality and robustness of geo-localization. Combining GCI-CycleGAN with the DFM intelligent matching algorithm achieves a higher matching success rate and a lower average geo-localization error, which has significant practical implications for vision-based autonomous geo-localization in aircraft.
      Methods   The proposed cross-modal image geo-localization method is based on GCI-CycleGAN style translation (Fig.1). First, real-time downward-looking infrared images captured by the UAV and visible light images are obtained (Fig.10). The GCI-CycleGAN model structure (Fig.3) and the adversarial loss function are designed, and the model is trained on the RGB-NIR scene dataset (Fig.5). The trained GCI-CycleGAN model then performs style transfer on the visible light images, producing more realistic pseudo-infrared images (Fig.8). Using various matching algorithms, including SIFT, SURF, ORB, LoFTR (Fig.6), and DFM (Fig.7), the generated pseudo-infrared image is matched with the real-time infrared image to obtain the feature point correspondences (Fig.9). A homography matrix is estimated from these correspondences, and a perspective transformation of the center point of the real-time infrared image determines the corresponding pixel in the pseudo-infrared image. This pixel is then mapped to the visible light image to determine the mapping point (Fig.11). Finally, the geographic positioning result of the UAV is obtained from the geographic location information associated with that mapping point (Fig.12).
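The final localization step above reduces to applying the estimated homography to the image center. A minimal NumPy sketch of that perspective-transform step is given below; the matrix `H` is a hypothetical stand-in for a homography that would in practice be estimated (e.g. with RANSAC) from the DFM or LoFTR feature matches:

```python
import numpy as np

def warp_point(H, pt):
    """Apply a 3x3 homography H to a 2D point (perspective transform).

    Computes H @ [x, y, 1]^T and divides by the homogeneous coordinate.
    """
    x, y = pt
    v = H @ np.array([x, y, 1.0])
    return v[:2] / v[2]

# Hypothetical homography between the real-time infrared image and the
# generated pseudo-infrared image (illustrative values only).
H = np.array([[ 1.02, 0.01,  30.0],
              [-0.01, 0.98, -12.0],
              [ 1e-5, 2e-5,   1.0]])

# Center point of a 512 x 512 real-time infrared image.
center = (256.0, 256.0)

# Its location in the pseudo-infrared image; since the pseudo-infrared
# image is pixel-aligned with the visible image, this point can be read
# off directly against the visible image's geo-referenced coordinates.
mapped = warp_point(H, center)
```

Since the pseudo-infrared image is a style-transferred copy of the visible image, no second warp is needed to move from the pseudo-infrared frame to the visible frame.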
      Results and Discussions   The experimental results demonstrate that, compared with CycleGAN, GCI-CycleGAN pays more attention to detailed texture features, generates infrared images without distortion, and is closer to the target infrared image in brightness and contrast, effectively improving the quality of image style translation (Tab.1). Combining GCI-CycleGAN with the DFM intelligent matching algorithm achieves a matching success rate of up to 99.48%, 4.73% higher than the original cross-modal matching result, and the average geo-localization error is only 1.37 pixels, yielding a more accurate and robust geo-localization outcome.
      Conclusions   This article studies the geographic positioning problem of matching cross-modal infrared and visible light aerial images through style translation. A cross-modal image intelligent matching method based on GCI-CycleGAN is proposed, which combines generative adversarial networks with matching algorithms to solve geographic positioning based on visible-infrared aerial image matching. First, a new loss function is designed to construct the GCI-CycleGAN model, which transfers the style of visible images; then the LoFTR and DFM intelligent matching algorithms achieve effective matching between the generated images and the real-time infrared images. Finally, the matching relationship is mapped back to the original cross-modal image pair to obtain the final geographic positioning result. The experimental results show that the proposed method effectively achieves cross-modal transformation of images and significantly improves the success rate of the matching algorithms, demonstrating the value of this geographic positioning method. In the future, deploying the proposed algorithm on embedded edge-computing devices while balancing cost, power consumption, and computing power, so that the algorithm remains both effective and real-time, is a challenging problem in practical engineering applications.


