Gaussian mixture model clustering based on the completed likelihood minimum message length criterion

ISSN: 1003-7985
Journal: Journal of Southeast University (English Edition)
Date: 2013.3.1
Pages: 43-47
Classification: TP181 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliations: [1] School of Instrument Science and Engineering, Southeast University, Nanjing 210096; [2] College of Engineering, Nanjing Agricultural University, Nanjing 210031
Funding: The National Natural Science Foundation of China (No. 61105048, 60972165), the Doctoral Fund of Ministry of Education of China (No. 20110092120034), the Natural Science Foundation of Jiangsu Province (No. BK2010240), the Technology Foundation for Selected Overseas Chinese Scholar, Ministry of Human Resources and Social Security of China (No. 6722000008), and the Open Fund of Jiangsu Province Key Laboratory for Remote Measuring and Control (No. YCCK201005).
Related project: Research on semi-supervised clustering methods for ensemble dimensionality reduction of high-dimensional data

Keywords: Gaussian mixture model, non-Gaussian distribution, model selection, expectation-maximization algorithm, completed likelihood minimum message length criterion

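Abstract (translated from Chinese):

A robust Gaussian mixture model (GMM)-based clustering method is proposed for the case where the true probability distribution of the data does not follow the assumed GMM. First, a new model selection criterion, the completed likelihood minimum message length criterion, is proposed. It measures not only how well the model fits the data but also how well the model partitions the data into groups. The criterion is then used as the clustering cost function, and a new expectation-maximization (EM) algorithm is proposed to estimate the model parameters. Compared with the standard EM algorithm, the new algorithm is better at avoiding poor local optima. Experimental results show that, when the data distribution does not follow the assumed GMM, the proposed method overcomes the over-fitting tendency of existing GMM-based clustering methods and robustly yields accurate clustering results.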

English abstract:

An improved Gaussian mixture model (GMM)-based clustering method is proposed for the difficult case where the true distribution of the data does not follow the assumed GMM. First, an improved model selection criterion, the completed likelihood minimum message length criterion, is derived. It measures both the goodness-of-fit of the candidate GMM to the data and the goodness-of-partition of the data. Second, with the proposed criterion as the clustering objective function, an improved expectation-maximization (EM) algorithm is developed for estimating the model parameters; compared with the standard EM algorithm, it can avoid poor local optimal solutions. The experimental results demonstrate that the proposed method can rectify the over-fitting tendency of representative GMM-based clustering approaches and can robustly provide more accurate clustering results.
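
The abstract does not state the formula of the completed likelihood minimum message length criterion, so it is not reproduced here. As a rough illustration of the general idea (a score that penalises both poor fit and poor partitioning), the Python sketch below uses a different, well-known stand-in: an ICL-style score, taken here as BIC plus twice the entropy of the posterior cluster memberships. The function names, the 1-D setting, the quantile initialisation and the toy data are all assumptions made for this example, and the EM loop is the standard one, not the improved algorithm of the paper.

```python
import numpy as np


def fit_gmm_1d(x, k, n_iter=200, var_floor=1e-6):
    """Standard EM for a 1-D Gaussian mixture (hypothetical helper).

    Returns the log-likelihood and the responsibilities from the last E-step.
    """
    # Deterministic initialisation (an assumption): means at spaced quantiles.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log(weight_j * N(x_i | mean_j, var_j)), shape (n, k).
        log_p = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: weighted maximum-likelihood updates.
        nk = resp.sum(axis=0) + 1e-12  # guard against empty components
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + var_floor
    return float(log_norm.sum()), resp


def classification_entropy(resp):
    """Entropy of the soft assignments; near 0 when clusters are well separated."""
    safe = np.clip(resp, 1e-300, 1.0)  # avoid log(0)
    return float(-(resp * np.log(safe)).sum())


def selection_scores(x, k):
    """Return (BIC, ICL-style score) for a k-component fit; lower is better."""
    loglik, resp = fit_gmm_1d(x, k)
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    bic = -2.0 * loglik + n_params * np.log(x.size)
    icl = bic + 2.0 * classification_entropy(resp)  # extra partition penalty
    return bic, icl


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data (an assumption): one uniform, non-Gaussian cluster and one Gaussian.
    x = np.concatenate([rng.uniform(-3.0, 3.0, 300), rng.normal(10.0, 1.0, 300)])
    for k in range(1, 6):
        bic, icl = selection_scores(x, k)
        print(f"k={k}  BIC={bic:.1f}  ICL-style={icl:.1f}")
```

Because the entropy term is never negative, the ICL-style score is always at least as large as BIC for the same fit, and the gap grows when extra components overlap heavily. This is one common way to discourage a mixture from spending several Gaussians on a single non-Gaussian cluster; the criterion in the paper may differ in both form and behaviour.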
