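Celestial spectra carry a great deal of important physical and chemical information about celestial objects, such as effective surface temperature, surface gravity, and chemical abundances, so processing and analyzing them is of considerable scientific value to astronomical research. Large sky surveys such as SDSS and LAMOST have produced massive volumes of astronomical spectral data, which has made automatic spectral classification an important research topic. At this data scale, however, some traditional automatic classification methods are no longer applicable, and efficient automatic classification techniques are urgently needed. This paper studies the application of the local mean-based K-nearest centroid neighbor (LMKNCN) algorithm to the spectral classification of stars, galaxies, and quasars (QSOs). The basic idea of LMKNCN is to select, for a query sample, k nearest centroid neighbors from the training samples of each class according to the nearest centroid neighborhood principle, and then to decide the class of the query sample x from the distance between x and the mean of the k nearest centroid neighbors chosen in each class. Using celestial spectra from the US SDSS-DR8, the performance of three algorithms, K-nearest neighbor (KNN), K-nearest centroid neighbor (KNCN), and LMKNCN, is compared on the spectral classification of stars, galaxies, and quasars. The results show that, among the three methods, the recognition rate of LMKNCN on each of the three spectral types is higher than or comparable to that of the other two algorithms, and its average classification accuracy is higher than both, with particularly better performance on quasar recognition. This indicates that the algorithm is of significance for the rapid processing and effective use of large volumes of astronomical spectral data.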
In the present paper, a local mean-based K-nearest centroid neighbor (LMKNCN) technique is used for the classification of stars, galaxies, and quasars (QSOs). The main idea of LMKNCN is that it follows the principle of the nearest centroid neighborhood (NCN): it selects K nearest centroid neighbors of the query pattern from the training samples of each class, and then assigns the query pattern to the class whose local centroid mean vector is closest to it. KNN, KNCN, and LMKNCN were experimentally compared on these three kinds of spectral data, taken from the United States SDSS-DR8. Among the three methods, the correct classification rate of the LMKNCN algorithm on each kind of spectrum is higher than or comparable to that of the other two algorithms, and its average correct classification rate is higher than both, especially for the identification of quasars. The experiments indicate that these results are of significance for the spectral classification of galaxies, stars, and quasars.
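The classification rule described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes Euclidean distance, assumes each spectrum has already been reduced to a fixed-length feature vector, and uses the hypothetical function names `nearest_centroid_neighbors` and `lmkncn_predict`. Nearest centroid neighbors are chosen greedily: the first is the ordinary nearest neighbor, and each later one is the sample whose inclusion brings the centroid of the selected set closest to the query.

```python
import numpy as np


def nearest_centroid_neighbors(X, x, k):
    """Return indices (rows of X) of the k nearest centroid neighbors of x.

    Greedy NCN selection with Euclidean distance (an assumption here).
    """
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    k = min(k, len(X))                # cannot select more points than exist
    remaining = list(range(len(X)))   # candidate indices not yet selected
    selected = []
    running_sum = np.zeros(X.shape[1])
    for i in range(k):
        # Centroid of the already selected points plus each candidate.
        centroids = (running_sum + X[remaining]) / (i + 1)
        dists = np.linalg.norm(centroids - x, axis=1)
        idx = remaining.pop(int(np.argmin(dists)))
        selected.append(idx)
        running_sum += X[idx]
    return selected


def lmkncn_predict(X_train, y_train, x, k):
    """Assign x to the class whose local centroid mean is closest to x."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)
    best_label, best_dist = None, np.inf
    for label in np.unique(y_train):
        Xc = X_train[y_train == label]            # training samples of one class
        nbrs = nearest_centroid_neighbors(Xc, x, k)
        local_mean = Xc[nbrs].mean(axis=0)        # mean of the k NCNs in this class
        dist = np.linalg.norm(local_mean - x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

In practice the rows of `X_train` would be feature vectors derived from star, galaxy, and quasar spectra, with `y_train` holding the corresponding class labels; the choice of features and of k is left open here, since the abstract does not specify them.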