High-dimensional feature selection in near-infrared spectroscopy classification

  • Abstract: To address the high noise and high redundancy of cigarette near-infrared (NIR) spectra, a feature selection method based on random forest (RF) and principal component analysis (PCA), termed RF-PCA, is proposed. Using RF-PCA, a classification model for cigarettes of five quality grades was built and compared with other methods. The method classifies high-dimensional samples effectively and can be used to assess cigarette quality and authenticity. RF feature selection filters out features that are irrelevant to the classification, while PCA removes the adverse influence of redundant features and further reduces the feature dimensionality. Experiments show that RF-PCA effectively removes noise and redundant features from NIR spectral data and improves classification accuracy.
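The two-stage idea described in the abstract (RF-based selection of relevant features, then PCA on the retained features) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the use of scikit-learn, the synthetic data standing in for NIR spectra, the median importance threshold, the number of principal components, and the SVM used as the final classifier.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for spectra (assumption): 5 classes, 300 "wavelengths",
# most of which are noise or linear combinations of informative ones.
X, y = make_classification(
    n_samples=250, n_features=300, n_informative=15, n_redundant=60,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

rf_pca = make_pipeline(
    # Stage 1: keep features whose RF importance is at least the median
    # (threshold choice is an assumption).
    SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=0),
        threshold="median",
    ),
    StandardScaler(),
    # Stage 2: decorrelate the retained features and reduce dimensionality
    # (10 components is an assumption).
    PCA(n_components=10, random_state=0),
    # Final classifier is an assumption; the abstract does not name one.
    SVC(kernel="rbf"),
)

# 5-fold cross-validated accuracy of the whole pipeline.
scores = cross_val_score(rf_pca, X, y, cv=5)

# Fit once on all data to inspect how many features survive stage 1.
rf_pca.fit(X, y)
n_selected = int(rf_pca.named_steps["selectfrommodel"].get_support().sum())
print(f"features kept by RF: {n_selected} of {X.shape[1]}")
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Wrapping both stages in one pipeline means feature selection and PCA are refitted inside each cross-validation fold, so the reported accuracy is not inflated by selecting features on the test folds.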

     
