Finding unknown celestial objects is one of the main goals of humanity's exploration of the universe, and outlier mining is an effective way of discovering unknown objects in massive spectral data. In the present work, using VC++ and Oracle9i as development tools, an outlier mining system for stellar spectra, oriented toward LAMOST, is designed and implemented, and its software architecture and functional modules are outlined. The system's key techniques are then described in detail: preprocessing of stellar spectra based on median filters, distance-based clustering of stellar spectra, outlier mining of stellar spectra based on distance support, and three-dimensional visualization of stellar spectrum outliers based on principal component analysis (PCA). Finally, preliminary experimental results on SDSS stellar spectra show that the system is workable for mining outliers in celestial spectra, and that it therefore provides a new way of finding unknown and peculiar celestial spectra.
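The first and third stages named above (median-filter preprocessing followed by distance-based outlier detection) can be illustrated with a minimal Python sketch. It is not the system's VC++ implementation, and the abstract does not define "distance support", so the sketch substitutes a generic distance-based rule as an assumption: a spectrum is flagged when fewer than `min_neighbors` other spectra lie within Euclidean distance `radius`. The function names, parameters, and toy flux values are all hypothetical.

```python
import math
from statistics import median


def median_filter(flux, window=3):
    """Replace each sample by the median of a sliding window.

    Near the edges the window is simply truncated (an assumption; the
    abstract does not say how edges are handled).
    """
    half = window // 2
    return [median(flux[max(0, i - half): i + half + 1])
            for i in range(len(flux))]


def euclidean(a, b):
    """Euclidean distance between two equal-length flux vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_outliers(spectra, radius, min_neighbors):
    """Return indices of spectra with fewer than `min_neighbors`
    other spectra within `radius` (generic distance-based rule,
    standing in for the paper's undefined "distance support")."""
    outliers = []
    for i, s in enumerate(spectra):
        neighbors = sum(
            1 for j, t in enumerate(spectra)
            if j != i and euclidean(s, t) <= radius
        )
        if neighbors < min_neighbors:
            outliers.append(i)
    return outliers


# Toy flux vectors (made up for illustration, not SDSS data).
raw = [
    [1.0, 1.1, 9.0, 1.0, 0.9, 1.0, 1.1],  # ordinary spectrum with a one-pixel spike
    [1.0, 1.0, 1.1, 1.0, 1.0, 0.9, 1.0],  # ordinary spectrum
    [0.9, 1.0, 1.0, 1.1, 1.0, 1.0, 1.0],  # ordinary spectrum
    [1.0, 3.0, 3.1, 3.0, 2.9, 3.0, 1.0],  # broad feature unlike the others
]

# Preprocess: the one-pixel spike in the first spectrum is smoothed away.
smoothed = [median_filter(s, window=3) for s in raw]

# Mine outliers: only the spectrum with the broad feature is isolated.
print(distance_outliers(smoothed, radius=1.0, min_neighbors=1))
```

Filtering first matters in this toy case: the narrow spike is removed by the median filter, so the first spectrum is not mistaken for an outlier, while the broad feature survives filtering and is still flagged. The pairwise comparison is quadratic in the number of spectra, which is why a system for massive data would plausibly cluster first, as the abstract describes.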