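An Improved Co-Training Scheme Combining an Independence Model with Diversity Evaluation
  • ISSN: 1000-1239
  • Journal: Journal of Computer Research and Development (计算机研究与发展)
  • Time: 0
  • Pages: 59-66
  • Language: Chinese
  • Classification: TP18 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
  • Author affiliations: [1] School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026; [2] Department of Computer and Information Engineering, Yantai Vocational College, Yantai, Shandong 264670
  • Funding: National Natural Science Foundation of China (60773084, J0724003, 60603023); Specialized Research Fund for the Doctoral Program of Higher Education (20070151009)
  • Related project: Research on adaptation problems in sign language recognition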
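Chinese abstract:

The Co-Training algorithm requires two feature views that are both compatible and independent. In many applications, however, the features have no natural split into two views that satisfies this assumption. To address this, the authors propose using mutual information (MI) or the CHI statistic to evaluate the mutual independence between features, and build a feature mutual independence model (MID-Model). Based on this model, they propose two new feature subset partitioning methods, the PMID-MI and PMID-CHI algorithms, which effectively divide a feature set into two strongly independent subsets. Several diversity measures are then used to further verify the independence of the two subsets. Diversity between the base classifiers reduces the chance that both of them mislabel the same unlabeled text. Finally, SC-PMID, an improved Co-Training algorithm, is proposed. Experimental results show that SC-PMID clearly improves semi-supervised classification accuracy.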
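
As a rough illustration of scoring the independence between features with mutual information, the following minimal Python sketch estimates the empirical MI between two discrete feature columns and sums it across a candidate two-way split. It is not the paper's PMID-MI algorithm, whose details are not given in this record: the helper names `mutual_information` and `cross_dependence`, and the idea of summing pairwise MI across the split, are assumptions made only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete columns."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def cross_dependence(columns, subset_a, subset_b):
    """Hypothetical score: total MI over all feature pairs that straddle the split.

    A lower value suggests the two subsets are closer to independent.
    """
    return sum(
        mutual_information(columns[a], columns[b])
        for a in subset_a
        for b in subset_b
    )


# Toy data: f1 and f2 are identical, f3 is independent of both.
columns = {
    "f1": [0, 1, 0, 1],
    "f2": [0, 1, 0, 1],
    "f3": [0, 0, 1, 1],
}
good_split = cross_dependence(columns, ["f1", "f2"], ["f3"])
bad_split = cross_dependence(columns, ["f1"], ["f2", "f3"])
```

On this toy data the split that keeps the two identical features together scores lower than the split that separates them, which is the behaviour a partitioning method of this kind would look for.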

English abstract:

The Co-training algorithm is constrained by the assumption that the features can be split into two subsets that are both compatible and independent. However, this assumption is usually violated to some degree in real-world applications. The authors propose two methods for evaluating mutual independence, using conditional mutual information or conditional CHI statistics, and present a method for constructing a mutual independence model (MID-Model) for the initial feature set. Based on the MID-Model, two novel feature partition algorithms, PMID-MI and PMID-CHI, are developed. The former uses conditional mutual information to evaluate the mutual independence between two features; the latter uses conditional CHI statistics. As a result, a feature set can be divided into two conditionally independent subsets using PMID-MI or PMID-CHI. Compared with the random splitting method, both PMID-MI and PMID-CHI achieve better performance. In addition, the conditional independence between the two subsets is verified by several diversity measures: the Q statistic, the correlation coefficient ρ, disagreement, double fault, and the integrative measure DM. Then, combining the MID-Model and the diversity measures, an improved semi-supervised categorization algorithm named SC-PMID is developed, in which two classifiers are co-trained on a pair of independent subsets. The independence of the two subsets reduces the chance that both classifiers agree on an erroneous label for an unlabeled example. Experimental results show that the SC-PMID algorithm can significantly improve semi-supervised categorization precision.
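
The pairwise diversity measures named in the abstract can be sketched as follows, using their standard definitions from the classifier-ensemble literature. Each measure is computed from per-example correctness flags of two classifiers. The function name `pairwise_diversity` and the input format are assumptions for illustration, and the integrative measure DM is omitted because this record does not define it.

```python
import math


def pairwise_diversity(correct_a, correct_b):
    """Q statistic, correlation rho, disagreement and double fault.

    correct_a[i] / correct_b[i] are truthy when classifier A / B labels
    example i correctly.
    """
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1          # both correct
        elif a:
            n10 += 1          # only A correct
        elif b:
            n01 += 1          # only B correct
        else:
            n00 += 1          # both wrong
    n = n11 + n10 + n01 + n00
    if n == 0:
        raise ValueError("need at least one example")

    cross = n11 * n00 - n01 * n10
    q_den = n11 * n00 + n01 * n10
    rho_den = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    return {
        # Undefined ratios are reported as NaN rather than guessed.
        "Q": cross / q_den if q_den else float("nan"),
        "rho": cross / rho_den if rho_den else float("nan"),
        "disagreement": (n01 + n10) / n,
        "double_fault": n00 / n,
    }


# Toy example: the two classifiers err independently of each other.
result = pairwise_diversity([1, 1, 0, 0], [1, 0, 1, 0])
```

Values of Q and ρ near zero, together with a low double-fault rate, indicate that the two classifiers rarely make the same mistake, which is the property the abstract relies on.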

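Journal information
  • Journal of Computer Research and Development (《计算机研究与发展》)
  • China Science and Technology Core Journal
  • Supervising body: Chinese Academy of Sciences
  • Sponsor: Institute of Computing Technology, Chinese Academy of Sciences
  • Editor-in-chief: Xu Zhiwei (徐志伟)
  • Address: Institute of Computing Technology, Chinese Academy of Sciences, 6 Kexueyuan South Road, Beijing
  • Postal code: 100190
  • Email: crad@ict.ac.cn
  • Phone: 010-62620696, 62600350
  • International Standard Serial Number: ISSN 1000-1239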
  • Domestic serial number: CN 11-1777/TP
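  • Postal distribution code: 2-654
  • Awards:
  • 2001-2007 Top 100 Outstanding Academic Journals of China; 2008 中国精品科...; "Double-Effect" journal of the China Periodical Matrix
  • Indexed in domestic and international databases:
  • Abstracts Journal (Russia); Abstract and Citation Database (Netherlands); Engineering Index (USA); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)
  • Times cited: 40349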