A Rare-Class Classification Method Based on Clustering and Ripper

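Journal: Journal of Jinan University (Natural Science & Medicine Edition)
Date: 0
Pages: 143-147
Language: Chinese
Classification: TP311.12 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006, Guangdong; [2] Guangdong Lange Technology Co., Ltd. (广东蓝鸽科技有限公司), Guangzhou 510540, Guangdong
Funding: National Natural Science Foundation of China (60673191); Key Natural Science Research Project of Guangdong Higher Education Institutions (062012); Research and Innovation Team Project of Guangdong University of Foreign Studies (GW2006-TA-005)
Related project: Research on Outlier Mining Algorithms for Data Streams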

Keywords: data mining, rare-class classification, one-pass clustering

Abstract (translated from Chinese):

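Rare-class classification has important applications in many fields. Because rare classes make up only a small proportion of the data and are easily overlooked, this paper proposes a rare-class classification method based on clustering and Ripper. Working from the result of one-pass clustering, the method labels every cluster whose share of the whole dataset is below 15% as a minority class, then applies the Ripper classification algorithm to build classification models for the minority and majority classes separately, and adjusts them according to a certain combination scheme to obtain the final rule set for the whole dataset. Test results on UCI datasets show that the method, based on one-pass clustering and Ripper, achieves high-quality classification of rare classes. The method can be applied to the classification of rare data in real-world domains.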

Abstract (original English):

Rare-class classification is an important problem in many real-life applications. Because rare classes account for only a small proportion of a dataset, they are easily ignored during classification. This paper applies a rare-class classification approach based on clustering and Ripper. After clustering, the approach identifies the rare-class data by treating every cluster whose proportion of the whole dataset is lower than 15% as rare-class data. The Ripper algorithm is then used to build classifiers for the rare-class data and the normal-class data separately. The rule set for the whole dataset is then created by combining the rules of these two models in a prescribed way. Experiments on benchmark datasets from the UCI Machine Learning Repository show that the approach produces high-quality classification of rare classes. The approach can also be used to classify rare-class data in practical applications.
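
The cluster-labeling step described in the abstract can be sketched roughly as follows. This is only an illustration under assumptions that the record does not state: Euclidean distance to a running centroid, a hypothetical `radius` parameter that decides when a new cluster is opened, and toy two-dimensional points. The function names are invented for the example. Only the 15% threshold comes from the abstract; the Ripper modeling stage and the rule-set combination are not shown.

```python
import math


def one_pass_cluster(points, radius):
    """Visit each point once: join the nearest cluster if its centroid is
    within `radius` (an assumed parameter), otherwise open a new cluster."""
    centroids = []  # running mean of each cluster
    members = []    # indices of the points assigned to each cluster
    for i, p in enumerate(points):
        best, best_d = None, float("inf")
        for c, centroid in enumerate(centroids):
            d = math.dist(p, centroid)  # Euclidean distance (assumption)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= radius:
            members[best].append(i)
            n = len(members[best])
            # Incremental update of the cluster mean.
            centroids[best] = tuple(
                m + (x - m) / n for m, x in zip(centroids[best], p)
            )
        else:
            centroids.append(tuple(p))
            members.append([i])
    return members


def split_minority_majority(members, total, threshold=0.15):
    """Points in clusters holding less than `threshold` of the data are
    treated as minority; all other points are treated as majority."""
    minority, majority = [], []
    for idx in members:
        target = minority if len(idx) / total < threshold else majority
        target.extend(idx)
    return sorted(minority), sorted(majority)


# Toy data: seven nearby points and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3),
    (0.3, 0.0), (0.1, 0.1), (0.2, 0.2), (5.0, 5.0),
]
members = one_pass_cluster(points, radius=1.0)
minority, majority = split_minority_majority(members, len(points))
print(minority, majority)  # [7] [0, 1, 2, 3, 4, 5, 6]
```

In this toy run the isolated point forms a cluster holding 1/8 = 12.5% of the data, which falls below the 15% threshold, so it is flagged as minority. In the method described above, a rule learner such as Ripper would then be trained on each of the two subsets separately.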
