Large-scale spectroscopic surveys produce massive volumes of spectral data and offer an opportunity to search for rare, and even previously unknown, types of spectra. Studying these peculiar objects helps reveal the laws of cosmic evolution and the origin of life, and outlier mining of survey data helps uncover such spectra. Using line indices to reduce the dimensionality of the spectra preserves as much of their physical information as possible. It also effectively addresses the high computational cost of clustering high-dimensional spectral data. This paper proposes a method for outlier mining and analysis of massive stellar spectral survey data based on line-index features. The Lick line indices of each stellar spectrum are used as its features, and outliers are searched for in the survey data with a clustering-based outlier-detection method. On this basis, the paper also gives a method for analyzing the outlier spectra in the line-index feature space. Experimental results show the following. (1) Using line indices as spectral features allows outlier mining of high-dimensional spectral data to be performed quickly, which solves the problem of high computational complexity. (2) Because outlier mining is performed on the clustering results, the method effectively finds relatively rare objects such as emission-line stars, late-type stars, late M-type stars and extremely metal-poor stars, as well as spectra with missing data. (3) Outlier mining in the line-index feature space yields rules for identifying special stars in that space. The proposed line-index-based outlier mining and analysis method can be applied in related studies of survey data.
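The pipeline summarized above has two steps: each spectrum is reduced to a few line indices, and outliers are then found from a clustering of those index vectors. The abstract does not give the clustering algorithm or the outlier rule. The sketch below therefore makes assumptions, all of them hypothetical. It uses plain k-means on standardized indices and flags a spectrum if it lies more than `n_sigma` standard deviations beyond the mean centroid distance, or if it belongs to a cluster smaller than `min_cluster`. It computes only three equivalent-width indices (Hβ, Mg b, Fe5270), and the band edges are quoted as commonly tabulated for the Lick/IDS system, so verify them before real use. The names `LICK_BANDS`, `line_index`, `index_features`, `kmeans`, `cluster_outliers` and `toy_spectrum` are illustrative and are not the authors' code. The data are synthetic.

```python
import numpy as np

# (blue pseudo-continuum, central band, red pseudo-continuum) in Angstrom.
# Assumed values as commonly tabulated for the Lick/IDS system; check them
# against the reference table before using this sketch on real spectra.
LICK_BANDS = {
    "Hbeta":  ((4827.875, 4847.875), (4847.875, 4876.625), (4876.625, 4891.625)),
    "Mgb":    ((5142.625, 5161.375), (5160.125, 5192.625), (5191.375, 5206.375)),
    "Fe5270": ((5233.150, 5248.150), (5245.650, 5285.650), (5285.650, 5318.150)),
}


def line_index(wave, flux, blue, central, red):
    """Equivalent-width index in Angstrom; positive for absorption."""
    def sideband(lo, hi):
        # Mean flux in the sideband, placed at the sideband midpoint.
        m = (wave >= lo) & (wave <= hi)
        return 0.5 * (lo + hi), flux[m].mean()

    (xb, fb), (xr, fr) = sideband(*blue), sideband(*red)
    m = (wave >= central[0]) & (wave <= central[1])
    w = wave[m]
    cont = fb + (fr - fb) * (w - xb) / (xr - xb)  # linear pseudo-continuum
    y = 1.0 - flux[m] / cont
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(w)))  # trapezoid rule


def index_features(wave, spectra):
    """Reduce each spectrum (one row of `spectra`) to a short line-index vector."""
    return np.array([[line_index(wave, f, *b) for b in LICK_BANDS.values()]
                     for f in spectra])


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    return labels, centers


def cluster_outliers(X, k=3, n_sigma=3.0, min_cluster=5, seed=0):
    """Flag points far from their centroid or in a tiny cluster (assumed rule)."""
    std = X.std(axis=0)
    Z = (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)  # standardize indices
    labels, centers = kmeans(Z, k, seed=seed)
    dist = np.linalg.norm(Z - centers[labels], axis=1)
    far = dist > dist.mean() + n_sigma * dist.std()
    small = np.bincount(labels, minlength=k)[labels] < min_cluster
    return far | small


# Toy data: 50 absorption-line "normal" spectra plus one Hbeta emission spectrum.
rng = np.random.default_rng(1)
wave = np.arange(4800.0, 5350.0, 1.0)


def toy_spectrum(depths):
    """Flat continuum with Gaussian lines near Hbeta, Mg b and Fe5270, plus noise."""
    flux = np.ones_like(wave)
    for center, depth in zip((4861.3, 5175.0, 5270.0), depths):
        flux -= depth * np.exp(-0.5 * ((wave - center) / 3.0) ** 2)
    return flux + rng.normal(0.0, 0.01, wave.size)


spectra = [toy_spectrum(rng.normal([0.3, 0.4, 0.2], 0.03)) for _ in range(50)]
spectra.append(toy_spectrum([-2.0, 0.4, 0.2]))  # negative depth = emission line
X = index_features(wave, np.array(spectra))
flags = cluster_outliers(X)
print("flagged spectra:", np.flatnonzero(flags))
```

Each spectrum of 550 pixels collapses to three numbers, so the clustering step scales with the number of indices rather than the number of pixels. This is the complexity argument the abstract makes. The emission-line spectrum gets a strongly negative Hβ index and is flagged either because it forms its own tiny cluster or because it lies far from its centroid.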