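PU learning on uncertain data arises in many real-world applications, such as sensor networks, market analysis, and medical diagnosis. To address it, a decision tree algorithm for PU learning on uncertain data is proposed. Building on the information gain computation of POSC4.5, and introducing the uncertain data interval and probability distribution function that UDT uses to handle uncertain continuous attributes, the paper presents DTU-PU (Decision Tree for Uncertain data with PU-learning), a decision tree algorithm for PU learning on uncertain data with continuous attributes. Experiments on UCI datasets show that DTU-PU achieves good classification accuracy and robustness.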
Uncertain data in PU-learning (positive and unlabeled learning) scenarios are common in many real-world applications, such as sensor networks, market analysis, and medical diagnosis. Based on the information gain computation in POSC4.5 and the uncertain data interval and probability distribution function used in UDT, this paper proposes DTU-PU (Decision Tree for Uncertain data with PU-learning), a decision tree algorithm that handles PU learning on data with uncertain numerical attributes. Experimental results on UCI datasets show that the proposed algorithm achieves good classification accuracy and is robust to data uncertainty.
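As an illustration of how the two ingredients might fit together, the following is a minimal sketch and not the DTU-PU implementation itself. It assumes each uncertain value is an interval with a uniform probability distribution, so a tuple contributes fractionally to both sides of a split (in the spirit of UDT), and it assumes a POSC4.5-style estimate of the positive proportion at a node from positive and unlabeled counts plus a known positive class prior. All function names, the uniform-distribution choice, and the `pos_prior` parameter are assumptions made for this example.

```python
import math


def mass_left(low, high, split):
    """Probability that a value uniform on [low, high] is <= split (assumed pdf)."""
    if high <= low:  # degenerate interval: treat as a certain point value
        return 1.0 if low <= split else 0.0
    return min(1.0, max(0.0, (split - low) / (high - low)))


def fractional_counts(intervals, split):
    """Fractional number of tuples falling left and right of the split."""
    left = sum(mass_left(lo, hi, split) for lo, hi in intervals)
    return left, len(intervals) - left


def pu_entropy(pos_n, unl_n, pos_total, unl_total, pos_prior):
    """Entropy at a node, with the positive proportion estimated from PU counts.

    Assumed estimate: p1 = min(1, (pos_n / pos_total) * pos_prior * (unl_total / unl_n)).
    """
    if unl_n == 0 or pos_total == 0:
        return 0.0
    p1 = min(1.0, (pos_n / pos_total) * pos_prior * (unl_total / unl_n))
    if p1 <= 0.0 or p1 >= 1.0:
        return 0.0
    return -(p1 * math.log2(p1) + (1.0 - p1) * math.log2(1.0 - p1))


def pu_gain(pos, unl, split, pos_prior):
    """Information gain of a split, weighting children by fractional unlabeled counts."""
    n_pos, n_unl = len(pos), len(unl)
    pos_l, pos_r = fractional_counts(pos, split)
    unl_l, unl_r = fractional_counts(unl, split)
    parent = pu_entropy(n_pos, n_unl, n_pos, n_unl, pos_prior)
    children = sum(
        (u / n_unl) * pu_entropy(p, u, n_pos, n_unl, pos_prior)
        for p, u in ((pos_l, unl_l), (pos_r, unl_r))
    )
    return parent - children


# Hypothetical data: each tuple is an (low, high) interval for one attribute.
positive = [(0.0, 1.0), (0.0, 2.0)]
unlabeled = [(0.0, 1.0), (5.0, 6.0), (6.0, 7.0), (0.0, 2.0)]
gain = pu_gain(positive, unlabeled, split=3.0, pos_prior=0.5)
```

In this toy data every interval lies entirely on one side of the split, so the counts are whole numbers; an interval that straddles the split point would contribute a fraction to each side.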