
A Rough Set-Based Partition Algorithm for Massive Data

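ISSN: 1003-6059
Journal: 《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence)
Date: 0
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065
Funding: Supported by the National Natural Science Foundation of China (No. 60373111), the Key Science and Technology Research Project of the Ministry of Education, the Chongqing Applied Basic Research Fund, and the Science and Technology Research Project of the Chongqing Municipal Education Commission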

Keywords: Rough Set, Data Partition, Distributed Information Processing

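Abstract (translated from the Chinese):

Handling massive data has long been an important problem in data mining. Many parallel and serial algorithms exist for massive data, but they generally fail to resolve the tension between speed and accuracy. Because distributed computation has a clear advantage in data processing, this paper partitions an original massive data set into many independent small data sets for distributed processing. It first defines the best partition according to the characteristics of Rough Set theory, then proposes a massive data partition algorithm to find the best partition. Experiments show that the distributed processing scheme combined with the proposed partition algorithm can process massive data quickly and that, compared with algorithms that process the whole data set, its accuracy is relatively high.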

English abstract:

Processing huge data sets is an important topic in data mining. Although many serial and parallel algorithms have been developed for huge data sets, most of them do not resolve the conflict between speed and accuracy well. In this paper, the whole data set is partitioned into many small subsets to take advantage of distributed computing. First, a definition of the best partition is proposed. Then a rough-set-based partition algorithm is developed to search for the best partition. Experimental results show that the distributed information processing method based on this partition algorithm is effective for huge data sets. It is faster than the original rough-set-based algorithms, and its performance is as good as that of algorithms processing the original data set as a whole.
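
The abstract does not state the definition of the best partition or the steps of the partition algorithm, so the following is only an illustrative sketch of the general setting and not the authors' method. It assumes a decision table stored as tuples, splits it with a simple round-robin rule (an assumption standing in for the paper's algorithm), and computes the standard rough set dependency degree, the fraction of rows in the positive region, for the whole table and for each subset. The names `dependency_degree` and `round_robin_partition` are hypothetical.

```python
from collections import defaultdict


def dependency_degree(rows, cond_idx, dec_idx):
    """Rough set dependency degree: |POS_C(D)| / |U|.

    A row belongs to the positive region when every row sharing its
    condition-attribute values has the same decision value.
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in cond_idx)
        classes[key].append(row[dec_idx])
    consistent = sum(
        len(decisions)
        for decisions in classes.values()
        if len(set(decisions)) == 1
    )
    return consistent / len(rows)


def round_robin_partition(rows, k):
    """Hypothetical stand-in for a partition rule: deal rows into k subsets."""
    return [rows[i::k] for i in range(k)]


# Toy decision table: two condition attributes, one decision attribute.
table = [
    (0, 0, "no"),
    (0, 1, "yes"),
    (1, 0, "yes"),
    (1, 1, "yes"),
    (0, 0, "yes"),  # conflicts with the first row
    (1, 0, "yes"),
]
cond_idx, dec_idx = (0, 1), 2

whole = dependency_degree(table, cond_idx, dec_idx)
parts = round_robin_partition(table, 2)
per_part = [dependency_degree(p, cond_idx, dec_idx) for p in parts]

print("whole table:", whole)
print("per subset:", per_part)
```

In this toy table the two subsets end up with different dependency degrees from each other and from the whole table, which illustrates why the choice of partition matters: a careless split can change what each subset reveals about the data. A partition criterion such as the one the paper defines would presumably be chosen to control this effect.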
