High dimensional feature selection in near infrared spectroscopy classification
-
-
Abstract
Near-infrared (NIR) spectra contain a large number of irrelevant and redundant features. To address this, a novel feature selection method based on random forest and principal component analysis (RF-PCA) is proposed in this paper. Using RF-PCA, a classification model for the qualitative evaluation of cigarettes was developed and compared with other methods. The results show that RF-PCA effectively classifies samples of high-dimensional data and can be used to evaluate the quality and authenticity of cigarettes. RF feature selection removes features that are irrelevant to the classification, while PCA further eliminates the influence of redundant features and reduces the feature dimensionality. The experiments show that RF-PCA effectively removes noise and redundant features from the NIR spectra and improves classification accuracy.
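The two-stage idea described above (an RF importance filter followed by PCA) can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' implementation. The abstract does not specify the importance threshold, the number of retained components, or the final classifier, so the mean-importance cutoff, the 95% explained-variance target, and the RBF support vector classifier below are all assumptions. The helper names `make_synthetic_spectra` and `build_rf_pca_classifier` are hypothetical, and the data are synthetic stand-ins, not NIR measurements.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_spectra(n_samples=120, n_features=300, random_state=0):
    """Synthetic stand-in for high-dimensional spectra (NOT real NIR data)."""
    return make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=10,   # a few truly useful variables
        n_redundant=20,     # linear combinations of the informative ones
        n_classes=2,
        random_state=random_state,
    )


def build_rf_pca_classifier(n_trees=200, variance=0.95, random_state=0):
    """RF importance filter, then PCA, then a classifier.

    Assumptions (not taken from the paper): mean-importance threshold,
    explained-variance target for PCA, and an RBF SVC as the classifier.
    """
    # Stage 1: keep only features whose RF importance is at least the mean.
    rf_filter = SelectFromModel(
        RandomForestClassifier(n_estimators=n_trees, random_state=random_state),
        threshold="mean",
    )
    # Stage 2: PCA decorrelates the surviving features and compresses them.
    pca = PCA(n_components=variance, svd_solver="full")
    return make_pipeline(rf_filter, StandardScaler(), pca, SVC(kernel="rbf"))


if __name__ == "__main__":
    X, y = make_synthetic_spectra()
    model = build_rf_pca_classifier()
    # Selection and PCA are refit inside each fold, which avoids leaking
    # test-fold information into the feature selection step.
    scores = cross_val_score(model, X, y, cv=5)
    print("mean CV accuracy:", scores.mean())

    model.fit(X, y)
    n_kept = model.named_steps["selectfrommodel"].get_support().sum()
    n_pcs = model.named_steps["pca"].n_components_
    print("features kept by RF:", n_kept, "| principal components:", n_pcs)
```

Wrapping both stages in a single pipeline is a design choice for this sketch: it keeps feature selection and projection inside the cross-validation loop, so the reported accuracy is not inflated by selecting features on the full data set.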
-
-