Abstract:
Objective Hyperspectral images capture contiguous spectral bands as a three-dimensional data set that is rich in spectral information and can distinguish different types of materials. They are widely used in remote sensing surveys. With the rapid development of deep learning, hyperspectral image classification has made great progress, but it still faces difficulties. Annotating hyperspectral images requires a significant amount of manpower, financial resources, and time, so the number of available labeled samples is limited, which makes it difficult to reach accurate classification results through training. Classifying hyperspectral images with only a small number of labeled samples therefore remains a challenge, and research on few-sample hyperspectral image classification is of great practical significance for promoting the application of hyperspectral technology.
Methods In recent years, self-supervised learning (SSL) has emerged as an effective way to reduce the reliance of hyperspectral image classification on costly data annotation. SSL methods have achieved high accuracy in natural image classification by learning latent features shared by different views of the same image. To explore the potential of SSL in hyperspectral image classification, a self-supervised hyperspectral image classification method built on the Bootstrap Your Own Latent (BYOL) framework, referred to as BSSL, is proposed. The method uses the self-supervised feature learning framework of BYOL, which can train the network and fine-tune its parameters without negative sample pairs, and it uses spatially and spectrally similar pairs of the same category to extract more discriminative features. The method consists of four parts: pre-training of BYOL, superpixel clustering, re-training of BYOL on similar pairs, and final classification. In the BYOL model, the encoder is a spectral-spatial transformer network that extracts joint spatial and spectral features. The superpixel clustering step uses a global measurement method based on binary edge maps, which clusters more accurately in edge areas. Within the resulting spatial clusters, spectral similarity is computed with the Spectral Angle Distance, which yields a set of similar pairs for re-training BYOL and fine-tuning the network parameters. Finally, classification is performed with a classical Support Vector Machine classifier.
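The similar-pair selection step can be illustrated with a minimal sketch. It assumes that each pixel already carries a superpixel cluster label and that two pixels in the same cluster form a similar pair when their spectral angle falls below a threshold. The function names, the `max_angle` parameter, and its default value are hypothetical and are not taken from the paper, whose exact selection rule is not stated in the abstract.

```python
import math
from itertools import combinations


def spectral_angle(a, b):
    """Spectral angle in radians between two spectra of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("spectral angle is undefined for an all-zero spectrum")
    # Clamp to [-1, 1] so floating-point round-off cannot break acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)


def similar_pairs(spectra, cluster_ids, max_angle=0.1):
    """Index pairs (i, j), i < j, that share a cluster and are spectrally close.

    `max_angle` is an assumed threshold in radians, chosen for illustration.
    """
    # Group pixel indices by their (precomputed) superpixel cluster label.
    by_cluster = {}
    for idx, cid in enumerate(cluster_ids):
        by_cluster.setdefault(cid, []).append(idx)

    pairs = []
    for members in by_cluster.values():
        for i, j in combinations(members, 2):
            if spectral_angle(spectra[i], spectra[j]) <= max_angle:
                pairs.append((i, j))
    return pairs


# Toy data: pixels 0 and 1 are nearly proportional spectra in the same cluster.
spectra = [[1, 2, 3], [2, 4, 6.1], [3, 2, 1], [1, 2, 3]]
cluster_ids = [0, 0, 0, 1]
pairs = similar_pairs(spectra, cluster_ids)
```

Because the spectral angle ignores the overall magnitude of a spectrum, pixels 0 and 1 are paired even though one is roughly twice as bright as the other. Pixel 3 has the same spectrum as pixel 0 but lies in a different cluster, so it is not paired with it.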
Results and Discussions To verify the effectiveness of the proposed method, experiments were conducted on three public datasets, and the method was compared with five advanced unsupervised and self-supervised classification methods: SuperPCA, S3PCA, ContrastNet, SSCL, and N2SSL. On the Indian Pines and Salinas datasets, BSSL achieved superior overall classification accuracy (OA), average classification accuracy (AA), Kappa coefficient, recall, and F1-score (Tab.1, Tab.3). On the Indian Pines dataset, the OA improved by 1.32%, 1.05%, 5.68%, 3.12%, and 1.27% compared with SuperPCA, S3PCA, ContrastNet, SSCL, and N2SSL, respectively. On the University of Pavia dataset, BSSL was less outstanding but still showed the best overall classification performance (Tab.2). The reason is that, although the University of Pavia dataset has a considerable number of samples in each category, the samples are quite scattered and some ground object areas are elongated, which is unfavorable for superpixel segmentation.
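For reference, OA, AA, and the Kappa coefficient can all be derived from a confusion matrix. The sketch below uses the standard definitions; the function name and the toy matrix are illustrative assumptions and do not reproduce the paper's evaluation code or its results.

```python
def classification_metrics(confusion):
    """Return (OA, AA, Kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of true class i
    that were predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    row_sums = [sum(row) for row in confusion]  # true count per class
    col_sums = [sum(confusion[i][j] for i in range(n_classes))
                for j in range(n_classes)]      # predicted count per class

    # OA: fraction of all samples that were classified correctly.
    oa = sum(confusion[k][k] for k in range(n_classes)) / total

    # AA: mean per-class recall, skipping classes with no true samples.
    recalls = [confusion[k][k] / row_sums[k]
               for k in range(n_classes) if row_sums[k] > 0]
    aa = sum(recalls) / len(recalls)

    # Kappa: agreement beyond what class frequencies alone would produce.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    if expected == 1.0:
        kappa = float("nan")  # undefined when chance agreement is total
    else:
        kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa


# Toy, imbalanced two-class example (not data from the paper).
oa, aa, kappa = classification_metrics([[90, 10], [5, 5]])
```

In this toy case the OA is about 0.86 while the AA is 0.70 and Kappa is about 0.33, which shows why the three measures are reported together: OA alone can hide poor performance on small classes.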
Conclusions A BYOL-based self-supervised learning method for hyperspectral image classification (BSSL) is proposed. Building on the self-supervised feature learning framework BYOL, the method trains and fine-tunes the network with spatially and spectrally similar intra-class sample pairs, and thereby extracts more discriminative features. The experimental results demonstrate that BSSL exhibits superior classification performance on all three datasets. They also indicate that the method is better suited to scenes in which ground objects cover relatively large areas and are more concentrated, because such scenes are more favorable for superpixel clustering.