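Feature selection extracts a subset of relevant features from the raw data. The LIBS data collected in this work contain 12,248 variables, most of which are low-intensity and uninformative. Appropriate feature selection shortens training time, removes redundant dimensions, and improves prediction performance. The threshold method (TM) is a simple feature selection method: it keeps every variable whose value exceeds a given threshold T. Because the peaks of LIBS data carry the most important information, the threshold is set so that only variables with intensity greater than T are extracted, which reduces the dimensionality. In this work, TM is used to extract features from the LIBS data: for each ore type, the mean of every variable is computed, and the 10 variables with the largest means are selected as the feature data for that ore. This yields 10-dimensional feature data for each of the 10 ore types, which serve as the input to the subsequent algorithms. The feature extraction is shown in Fig. 2.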
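The selection described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' code: the function names (`mean_spectrum`, `threshold_select`, `select_top_k`) and the toy spectra are assumptions, and the real data would have 12,248 variables per spectrum rather than 6.

```python
def mean_spectrum(spectra):
    """Average each variable (spectral channel) over all spectra of one ore type."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def threshold_select(spectra, T):
    """Threshold method: keep the indices whose mean intensity exceeds T."""
    means = mean_spectrum(spectra)
    return [j for j, m in enumerate(means) if m > T]


def select_top_k(spectra, k=10):
    """Keep the k variables with the largest mean intensity (indices in ascending order)."""
    means = mean_spectrum(spectra)
    ranked = sorted(range(len(means)), key=lambda j: means[j], reverse=True)
    return sorted(ranked[:k])


# Toy example (hypothetical values): 2 spectra of one ore, 6 variables each.
spectra = [
    [0.1, 5.0, 0.2, 9.0, 0.3, 7.0],
    [0.3, 7.0, 0.0, 11.0, 0.1, 5.0],
]
above = threshold_select(spectra, T=1.0)   # variables whose mean exceeds T
top3 = select_top_k(spectra, k=3)          # the paper keeps k=10
features = [[row[j] for j in top3] for row in spectra]
```

Choosing T so that exactly 10 variables survive is equivalent to taking the 10 largest means, which is why the sketch offers both views of the same step.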
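K-nearest neighbor (KNN) is a machine learning algorithm that classifies samples according to the distances between their feature values. It is a supervised method: every training sample carries a class label. The basic idea is to use the Euclidean distance to determine which class the features of an unknown sample are closest to, and to assign the sample to that class; the principle is shown in Fig. 3. The red circles belong to one class, the blue triangles to another, and the green square is the test sample. When K=2 the green square is assigned to the red-circle class, and when K=3 it is assigned to the blue-triangle class, so the accuracy of KNN classification depends on the choice of K.
The class of a test sample is decided from the classes of the training samples. Besides classification, KNN can also estimate an attribute of a sample: find its K nearest neighbors and assign the average of their attribute values to the sample. The Euclidean distance is:
$$d(x,y) = \sqrt{\sum\limits_{i = 1}^{n} (x_i - y_i)^2}$$ (1)
where $x_i$ and $y_i$ are the $i$-th feature components of a training sample $x$ and the test sample $y$, and $n$ is the dimension of the feature space. The main steps of the KNN classification algorithm are as follows:
(1) Choose K and compute the distances between the unknown sample and the samples in the training set;
(2) Find the K samples at the smallest distances;
(3) Assign the unknown sample the class that occurs most often among its K nearest neighbors.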
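The three KNN steps and Eq. (1) can be sketched directly in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `euclidean` and `knn_predict`, the two-feature toy data, and the ore labels attached to them are all hypothetical.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Eq. (1): Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_X, train_y, query, k):
    """Steps (1)-(3): compute distances, keep the k nearest, take a majority vote."""
    ranked = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    # On a tied vote, Counter.most_common keeps first-seen order,
    # so the class of the nearer neighbor wins.
    return votes.most_common(1)[0][0]


# Toy training set (hypothetical 2-D features instead of the paper's 10-D ones).
train_X = [[1.0, 1.0], [1.2, 0.8], [4.0, 4.0], [4.2, 3.9]]
train_y = ["hematite", "hematite", "magnetite", "magnetite"]

pred_a = knn_predict(train_X, train_y, [1.1, 1.0], k=3)
pred_b = knn_predict(train_X, train_y, [4.1, 4.0], k=3)
```

An odd K avoids tied votes in a two-class problem; with 10 ore classes ties can still occur, which is one reason the result depends on K.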
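Random forest (RF) is an advanced machine learning algorithm. It is a classifier made up of an ensemble of tree classifiers. Each tree classifier builds its own training set by bootstrap sampling, which can be applied repeatedly to generate training and test sets. The training sets are used to grow the classification trees of the RF, and the final prediction of the ensemble is obtained by a simple majority vote over the individual trees. The class information in RF is defined as:
$$ I(X = {x_i}) = - {\log _2}p({x_i}) $$
where $X$ is a random variable, $p({x_i})$ is the probability that $X$ takes the value $x_i$, and $I(X = {x_i})$ is the information carried by that outcome. RF predicts by combining multiple classifiers, as illustrated in Fig. 4. The steps of the RF algorithm are as follows:
(1) Use bootstrap sampling to draw n samples from the original spectral data to form a training set;
(2) Repeat step (1) to obtain T1 training sets and grow one tree on each, producing a "forest";
(3) Use the T1 trees in the forest to classify the test samples;
(4) Decide the class by a vote over the T1 classification results.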
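The information formula and the bootstrap-and-vote skeleton of steps (1) to (4) can be sketched as follows. This is a simplified illustration, not a full random forest and not the authors' code: each "tree" here is a hypothetical one-split stump (`fit_stump`) standing in for a complete decision tree, and all function names and toy data are assumptions.

```python
import math
import random
from collections import Counter


def self_information(p):
    """I(X = x_i) = -log2 p(x_i), measured in bits."""
    return -math.log2(p)


def bootstrap(X, y, rng):
    """Step (1): draw len(X) samples with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_stump(X, y, rng):
    """Stand-in for a tree: split one random feature at its mean value
    and label each side with the majority class found there."""
    j = rng.randrange(len(X[0]))
    cut = sum(row[j] for row in X) / len(X)
    default = Counter(y).most_common(1)[0][0]
    left = Counter(lab for row, lab in zip(X, y) if row[j] <= cut)
    right = Counter(lab for row, lab in zip(X, y) if row[j] > cut)
    left_label = left.most_common(1)[0][0] if left else default
    right_label = right.most_common(1)[0][0] if right else default
    return j, cut, left_label, right_label


def predict_stump(stump, x):
    j, cut, left_label, right_label = stump
    return left_label if x[j] <= cut else right_label


def fit_forest(X, y, n_trees, seed=0):
    """Step (2): one bootstrap training set and one stump per 'tree'."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        Xb, yb = bootstrap(X, y, rng)
        forest.append(fit_stump(Xb, yb, rng))
    return forest


def predict_forest(forest, x):
    """Steps (3)-(4): every tree classifies x, the majority vote decides."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated classes with 2 features.
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y = ["A", "A", "A", "B", "B", "B"]
forest = fit_forest(X, y, n_trees=25)
```

A practical random forest grows full trees and also samples a random subset of features at each split; the stump keeps the sketch short while preserving the bootstrap and voting logic.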
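Support vector machine (SVM) is a powerful learning method based on statistical learning theory. SVM was originally designed for binary classification; to separate the classes as well as possible, it uses a hyperplane defined by a linear function in a multidimensional space. Unlike traditional learning algorithms, which aim to reduce the empirical risk, SVM is built on structural risk and therefore generalizes better. The key to solving a classification problem with SVM is to find a hyperplane that separates samples with different features. For the two-dimensional features in Fig. 5 and a given data set $({x_i},{y_i}),\; i = 1,2,\cdots ,N,\; {x_i} \in {R^n},\; {y_i} \in \{ + 1, - 1\}$, there is a straight line with the samples of the two classes on either side of it, and the standard support vector classifier is:
$$ y_i[w \cdot x_i + b] - 1 \geqslant 0,\quad i = 1,2, \cdots ,N $$ (2)
In practice the samples are often not linearly separable, so the training samples must be mapped into a high-dimensional space and the hyperplane constructed there. This mapping is realized by a kernel function. Four types of kernel function are commonly used in classification:
(1) Linear kernel: $K(x,x') = (x \cdot x')$
(2) Polynomial kernel: $K(x,x') = {((x \cdot x') + 1)^q}$
(3) RBF kernel: $K(x,x') = \exp \left( - \dfrac{{{{\left\| {x - x'} \right\|}^2}}}{{2{\sigma ^2}}}\right)$
(4) Sigmoid kernel: $K(x,x') = \tanh ((x \cdot x') + c)$
For high-dimensional LIBS data, traditional chemometric algorithms (such as PLS and PCA) cannot handle the high-dimensional feature vectors effectively. SVM, by contrast, has good global convergence and can flexibly determine boundaries in a high-dimensional feature space. Because of its strong performance and convenience on nonlinear, high-dimensional, and small-sample data, it is widely used in chemometrics to solve a variety of problems. The main steps of the SVM algorithm are as follows:
(1) For the different ore sample data, build one SVM for every pair of classes, so K classes require K(K−1)/2 classifiers;
(2) Decide the classification result by a vote among the classifiers.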
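The four kernels and the pairwise (one-vs-one) classifier count can be checked numerically with a short sketch. The function names and default parameter values (`q=2`, `sigma=1.0`, `c=0.0`) are assumptions chosen for illustration; the paper does not report which kernel or parameters were used.

```python
import math
from itertools import combinations


def dot(x, z):
    """Inner product x . x' of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, q=2):
    return (dot(x, z) + 1) ** q


def rbf_kernel(x, z, sigma=1.0):
    """exp(-||x - x'||^2 / (2 sigma^2)); equals 1 when x == x'."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def sigmoid_kernel(x, z, c=0.0):
    return math.tanh(dot(x, z) + c)


def one_vs_one_pairs(classes):
    """Step (1): one binary SVM per unordered pair of classes."""
    return list(combinations(classes, 2))


# With the 10 ore types of this study, K(K-1)/2 = 10 * 9 / 2 = 45 classifiers.
pairs = one_vs_one_pairs(range(10))
n_classifiers = len(pairs)
```

The minus sign in the RBF exponent matters: it makes the kernel decay from 1 toward 0 as two spectra move apart, so it behaves as a similarity measure.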
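All of the algorithms above were implemented in Python 3.8.2b in the Jupyter Notebook development environment. The LIBS feature data were randomly divided into two parts, a training set containing 70% of the data and a test set containing 30%, to evaluate the accuracy of the models.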
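The 70%/30% random split and the accuracy measure can be sketched in plain Python as below. This is an assumed illustration of the evaluation protocol, not the authors' notebook: `random_split`, `accuracy`, the fixed seed, and the toy labels are all hypothetical.

```python
import random


def random_split(samples, labels, train_fraction=0.7, seed=0):
    """Shuffle the sample indices and split them into training and test parts."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = round(train_fraction * len(idx))
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([samples[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy data (hypothetical): 10 samples with one feature each.
samples = [[float(i)] for i in range(10)]
labels = ["A"] * 5 + ["B"] * 5
(train_X, train_y), (test_X, test_y) = random_split(samples, labels)
```

Fixing the seed makes the split reproducible; with several ore classes, a stratified split that preserves the class proportions in both parts would be a reasonable refinement.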
Classification of iron ore based on machine learning and laser induced breakdown spectroscopy
Abstract: Iron ore is a very important mineral resource, and its development and utilization have a great impact on the iron and steel industry. The inspection and classification of iron ore is an indispensable step in metallurgy: the type and grade of an iron ore directly affect its blending ratio with other materials, so research on iron ore classification is of great significance to the metallurgical industry. Laser-induced breakdown spectroscopy (LIBS) is a recently developed composition detection technology. It offers non-destructive, fast, in-situ online detection and has certain advantages in chemical composition detection and sample classification. To improve the classification accuracy of iron ores, LIBS was combined with machine learning to classify 10 kinds of natural iron ore: hematite, limonite, siderite, mica hematite, magnetite, maghemite, oolitic hematite, pyrite, cobalt-bearing magnetite, and pyrrhotine. First, the 10 natural iron ore samples were ablated by LIBS to obtain their spectral data; then the 10 spectral features with the largest spectral intensity were selected by setting a threshold; finally, KNN, RF, and SVM machine learning models were trained and tested on the selected spectral features. The results show that the classification accuracies of the KNN, RF, and SVM models are 83.0%, 80.7%, and 90.3%, respectively. These accuracies indicate that combining LIBS with machine learning can achieve rapid and accurate classification of iron ores, which provides a new method for iron ore inspection and classification in the metallurgical industry.
Key words:
- LIBS
- machine learning
- ore classification
- RF
- SVM