Feature Selection Algorithm Based on Nearest-Neighbor Mutual Information

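ISSN: 1002-8331
Journal: Computer Engineering and Applications (《计算机工程与应用》)
Date: 0
Classification: TP18 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliations: [1] Department of Computer Engineering, Zhangzhou Institute of Technology, Zhangzhou, Fujian 363000; [2] School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000
Funding: National Natural Science Foundation of China (No. 61303131); Natural Science Foundation of Fujian Province (No. 2013J01028); Science and Technology Project of the Fujian Provincial Department of Education (No. JA14192, No. JAT60866).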

Keywords: feature selection, nearest-neighbor, mutual information, neighborhood mutual information

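Chinese abstract (English translation):

Feature selection models for neighborhood information systems require the neighborhood parameter to be set manually. To address this problem, the distances from each sample to its nearest same-class sample and to its nearest different-class sample are computed and used to define the sample's nearest neighbor, which determines the size of the information granule. The nearest-neighbor concept is then extended to information theory, yielding nearest-neighbor mutual information. On this basis, a feature selection algorithm based on nearest-neighbor mutual information is constructed using a forward greedy search strategy. Experiments are conducted with two different base classifiers on eight UCI data sets. The results show that, compared with several currently popular algorithms, the model achieves higher classification performance with fewer features.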

English abstract:

Feature selection for neighborhood information systems is constrained by a manually chosen neighborhood size. First, this paper computes the distances between a given sample and its nearest samples with the same label and with a different label, uses them to define the nearest neighbor of the sample, and thereby determines the size of the information granule. Second, the notion of nearest neighbor is extended to Shannon information theory, and the concept of nearest-neighbor mutual information is presented. Then, a forward greedy search strategy is used to construct a feature selection algorithm based on nearest-neighbor mutual information. Finally, experiments are conducted on eight UCI data sets with two different base classifiers. The experimental results show that, compared with other popular algorithms, the proposed algorithm selects fewer features while achieving higher classification performance.
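
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so the per-sample radius rule (the mean of the nearest same-class and nearest different-class distances), the neighborhood-style mutual information estimate, the stopping tolerance `tol`, and the function names `nn_mutual_information` and `forward_select` are all assumptions made for illustration.

```python
import numpy as np

def nn_mutual_information(X, y, features):
    """Neighborhood-style mutual information (in bits) between a feature
    subset and the class labels, using a per-sample radius.
    ASSUMPTION: radius = mean of the nearest same-class and nearest
    different-class distances; the abstract does not state the exact rule."""
    n = len(y)
    sub = X[:, features]
    # Pairwise Euclidean distances restricted to the chosen features.
    dist = np.sqrt(((sub[:, None, :] - sub[None, :, :]) ** 2).sum(axis=2))
    same = y[:, None] == y[None, :]
    not_self = ~np.eye(n, dtype=bool)
    d_hit = np.where(same & not_self, dist, np.inf).min(axis=1)
    d_miss = np.where(~same, dist, np.inf).min(axis=1)
    radius = (d_hit + d_miss) / 2
    in_nbhd = dist <= radius[:, None]          # row i: neighborhood of sample i
    n_feat = in_nbhd.sum(axis=1)               # neighborhood size in feature space
    n_class = same.sum(axis=1)                 # size of the sample's own class
    n_joint = (in_nbhd & same).sum(axis=1)     # same-class samples in the neighborhood
    return float(-np.mean(np.log2(n_feat * n_class / (n * n_joint))))

def forward_select(X, y, max_features=None, tol=1e-12):
    """Forward greedy search: repeatedly add the feature that raises the
    score most, and stop once no candidate improves it (assumed stop rule)."""
    n_features = X.shape[1]
    limit = n_features if max_features is None else max_features
    selected, best = [], 0.0
    while len(selected) < limit:
        remaining = [f for f in range(n_features) if f not in selected]
        scores = [nn_mutual_information(X, y, selected + [f]) for f in remaining]
        top = int(np.argmax(scores))
        if scores[top] <= best + tol:
            break
        selected.append(remaining[top])
        best = scores[top]
    return selected

# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = np.array([[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
              [1.0, 0.8], [1.1, 0.2], [1.2, 0.7]])
y = np.array([0, 0, 0, 1, 1, 1])
print(forward_select(X, y))  # [0]
```

On this toy data the search keeps only feature 0, because adding the noisy feature 1 cannot raise the score above the label entropy of 1 bit. Because the radius is derived from the data, no neighborhood parameter has to be supplied by hand, which is the motivation stated in the abstract.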
