Objective This research proposes a cross-modal image geo-localization method based on GCI-CycleGAN style translation for vision-based autonomous geo-localization in aircraft. Such technology is essential for navigation, guidance, situational awareness, and autonomous decision-making, yet existing cross-modal geo-localization approaches suffer from significant modal differences, complex matching, and poor localization robustness. In the proposed method, real-time infrared images and visible images with known geo-location information are acquired, and a GCI-CycleGAN model is trained to convert the visible images into infrared images through generative-adversarial image style translation. The generated infrared images are matched against the real-time infrared images with several matching algorithms, and the position of the real-time infrared image's center point in the generated image is obtained through perspective transformation. This point is then mapped back to the corresponding visible image to produce the final geo-localization result. The method addresses the challenges of existing cross-modal geo-localization tasks and improves both the quality and the robustness of the localization outcome: combining GCI-CycleGAN with the DFM intelligent matching algorithm yields a higher matching success rate and a lower average geo-localization error, which is of significant practical value for vision-based autonomous geo-localization in aircraft.
Methods A cross-modal image geo-localization method based on GCI-CycleGAN style translation is proposed (Fig.1). First, real-time infrared and visible light images are acquired from the drone's nadir (directly downward-looking) aerial photography (Fig.10). The GCI-CycleGAN model structure (Fig.3) and its generative adversarial loss function are designed, and the model is trained on the RGB-NIR scene dataset (Fig.5). The trained GCI-CycleGAN is then used to perform style transfer on the visible light images, producing more realistic pseudo-infrared images (Fig.8). Various matching algorithms, including SIFT, SURF, ORB, LoFTR (Fig.6), and DFM (Fig.7), are applied to match the generated pseudo-infrared image with the real-time infrared image and obtain feature point correspondences (Fig.9). From these correspondences a homography matrix is estimated, and a perspective transformation of the real-time infrared image's center point determines the corresponding pixel in the pseudo-infrared image. This pixel is then mapped to the visible light image to locate the corresponding mapping point (Fig.11). Finally, the drone's geo-localization result is obtained from the geographic location information associated with that mapping point (Fig.12), as sketched below.
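As a concrete illustration of the matching-and-mapping stage, the following is a minimal Python/OpenCV sketch using classical SIFT features (the paper also evaluates SURF, ORB, LoFTR, and DFM). The file names are placeholders, and the sketch assumes, as the method does, that the pseudo-infrared image shares pixel coordinates with the visible image it was generated from.

```python
# Minimal sketch of the matching -> homography -> center-point mapping
# stage. SIFT is used here for illustration; learned matchers such as
# LoFTR or DFM would replace steps 1-2. File names are placeholders.
import cv2
import numpy as np

real_ir = cv2.imread("real_time_infrared.png", cv2.IMREAD_GRAYSCALE)
pseudo_ir = cv2.imread("gci_cyclegan_output.png", cv2.IMREAD_GRAYSCALE)

# 1) Detect keypoints and compute descriptors in both images.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(real_ir, None)
kp2, des2 = sift.detectAndCompute(pseudo_ir, None)

# 2) Match descriptors; Lowe's ratio test discards ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = []
for pair in matcher.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

# 3) Estimate the homography from real-time IR to pseudo-IR via RANSAC.
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# 4) Perspective-transform the real-time IR image center into the
#    pseudo-IR image. Since style transfer preserves geometry, the same
#    coordinates index the visible reference image and its geo-location
#    information.
h, w = real_ir.shape
center = np.float32([[[w / 2.0, h / 2.0]]])
mapped = cv2.perspectiveTransform(center, H)[0, 0]
print("Localized pixel in the visible reference image:", mapped)
```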
Results and Discussions The experimental results demonstrate that, compared with CycleGAN, GCI-CycleGAN pays more attention to detailed texture features, generates infrared images free of distortion, and comes closer to the target infrared images in brightness and contrast, effectively improving the quality of the image style translation (Tab.1). The combination of GCI-CycleGAN and the DFM intelligent matching algorithm achieves a matching success rate of up to 99.48%, 4.73% higher than the original cross-modal matching result, with an average geo-localization error of only 1.37 pixels, yielding a more accurate and robust geo-localization outcome.
Conclusions This article studies the geo-localization problem of cross-modal image matching through style translation between infrared and visible light images captured in aerial photography from aircraft. A cross-modal image intelligent matching method based on GCI-CycleGAN is proposed, which combines a generative adversarial network with matching algorithms to solve geo-localization based on matching visible light and infrared aerial images. First, a new loss function is designed to construct the GCI-CycleGAN model, which transfers the style of visible images (a schematic sketch of such an objective is given below); then the LoFTR and DFM intelligent matching algorithms achieve effective matching between the generated images and the real-time infrared images. Finally, the matching relationship is mapped back to the original cross-modal image pair to obtain the final geo-localization result. The experimental results show that the proposed method effectively achieves cross-modal translation of images and significantly improves the success rate of the matching algorithms, demonstrating the value of this geo-localization approach. In the future, deploying the proposed algorithm on embedded edge computing devices while balancing cost, power consumption, and computing power, so that the algorithm meets both effectiveness and real-time requirements, remains a challenging problem in practical engineering applications.
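For orientation, the following is a minimal PyTorch sketch of a CycleGAN-style generator objective. The abstract does not specify the form of the newly designed GCI-CycleGAN loss, so the `gradient_term` below is a purely hypothetical stand-in for the added component; only the adversarial and cycle-consistency terms follow the standard CycleGAN formulation.

```python
# Illustrative only: standard CycleGAN generator objective plus a
# hypothetical placeholder for GCI-CycleGAN's newly designed term,
# whose actual form is not given in this abstract.
import torch
import torch.nn as nn

adv_loss = nn.MSELoss()  # least-squares GAN formulation, as in CycleGAN
cyc_loss = nn.L1Loss()   # cycle-consistency loss

def gradient_term(fake_ir, vis):
    # Hypothetical: penalize edge-structure drift between the input
    # visible image and the generated infrared image.
    if vis.shape[1] != fake_ir.shape[1]:
        vis = vis.mean(dim=1, keepdim=True)  # collapse RGB to 1 channel
    def dx(t): return t[..., :, 1:] - t[..., :, :-1]
    def dy(t): return t[..., 1:, :] - t[..., :-1, :]
    return ((dx(fake_ir) - dx(vis)).abs().mean()
            + (dy(fake_ir) - dy(vis)).abs().mean())

def generator_loss(G, F, D_ir, vis, lambda_cyc=10.0, lambda_g=1.0):
    """G: visible->infrared generator, F: its inverse, D_ir: IR critic."""
    fake_ir = G(vis)
    pred = D_ir(fake_ir)
    l_adv = adv_loss(pred, torch.ones_like(pred))  # fool the IR critic
    l_cyc = cyc_loss(F(fake_ir), vis)              # vis -> IR -> vis
    return l_adv + lambda_cyc * l_cyc + lambda_g * gradient_term(fake_ir, vis)
```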