Research on Dataset Feature Structure Based on Interaction Information

ISSN: 1003-6059
Journal: 《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence)
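Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] College of Physics and Electronic Information Engineering, Wenzhou University, Wenzhou 325035
Funding: Supported by the National Natural Science Foundation of China (No. 60970065, 61272018), the Zhejiang Provincial Natural Science Foundation (No. R1110261), and the Wenzhou University Graduate Innovation Fund (No. 31606036010138)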

Keywords: Classification Algorithm, Interaction Information, Dataset Feature Structure

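Abstract (translated from the Chinese):

A large number of classification algorithms have been proposed in machine learning, and finding a suitable classification algorithm for a given dataset has become an important research topic. Reference [8] proposed a new dataset discretization method for characterizing datasets and obtained good results in algorithm recommendation. Building on that work, this paper uses interaction information theory to characterize the cooperative relationships between attributes, and between attributes and the class label, and proposes feature structures based on two-variable and three-variable interaction information. Performance experiments with 12 classification algorithms on 98 datasets from the UCI repository show that, compared with the method of reference [8], both methods clearly improve the precision and hit rate of the recommendations, and that for datasets with poor adaptability the three-variable interaction information method is more effective.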

English abstract:

In machine learning, classification algorithms are widely studied and a large number of different types of algorithms have been proposed. How to select appropriate classification algorithms for a given dataset has become a crucial problem. Recently, a new method was proposed in reference [8] to characterize datasets, and it achieves better results in algorithm recommendation. In this paper, two methods are presented to characterize datasets under the theory of interaction information. The performance of 12 different types of classification algorithms on 98 UCI datasets shows that both the two-variable and the three-variable interaction information methods improve the precision and the hit rate of the recommended algorithms. Furthermore, the latter performs even better on datasets with poor adaptability.
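
The abstract does not give the formulas behind the two-variable and three-variable quantities, so the following is only a minimal sketch under stated assumptions: attributes and class labels are already discrete, the two-variable term is taken to be the mutual information I(X;C), and the three-variable term uses the sign convention I(X;Y;C) = I(X;Y|C) - I(X;Y). The function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((k / n) * log2(k / n) for k in Counter(rows).values())


def mutual_information(x, c):
    """Assumed two-variable term: I(X;C) = H(X) + H(C) - H(X,C)."""
    return entropy(x) + entropy(c) - entropy(x, c)


def interaction_information(x, y, c):
    """Assumed three-variable term: I(X;Y;C) = I(X;Y|C) - I(X;Y).

    The opposite sign convention also appears in the literature.
    """
    conditional = entropy(x, c) + entropy(y, c) - entropy(x, y, c) - entropy(c)
    return conditional - mutual_information(x, y)


# Toy data: the class label is the XOR of two binary attributes.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
c = [a ^ b for a, b in zip(x, y)]

mi_xc = mutual_information(x, c)
ii_xyc = interaction_information(x, y, c)
```

In this toy case each attribute alone carries no information about the class, while the pair determines it completely, so the three-variable term is positive under the assumed convention. This is the kind of attribute cooperation that a pairwise measure cannot see.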
