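To address the high computational complexity and poor scalability of traditional classification methods, the concepts of interdependence and equivalent radius are proposed and combined into a new classification algorithm: IER, an easily updatable classification algorithm based on interdependence and equivalent radius. IER uses interdependence as the measure for feature selection, reduces dimensionality by selecting the features with larger values, and builds the classification model from the center of gravity and the equivalent radius. Algorithm analysis shows that IER has relatively low computational complexity and good scalability, making it suitable for large-scale settings. IER is applied to Chinese text classification and compared with the kNN algorithm and the class-center vector method. The results show that, while improving classification accuracy, IER also greatly increases classification speed, which favors real-time, online, automatic classification of large-scale information samples.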
To address the high computational complexity and poor scalability of traditional classification methods, such as those based on the vector space model (VSM), a new classification method called IER is presented. It is built on two new concepts: interdependence and equivalent radius. In IER, attributes are selected according to their interdependence values, and the classification rule is based on the equivalent radius and the center of gravity of each class. Algorithm analysis shows that IER is well suited to classifying large numbers of samples, with higher scalability and lower computational complexity. Experiments on Chinese text classification show that IER outperforms the k-nearest neighbor (kNN) method and classification based on the center of classes (CCC). IER can therefore be used online to classify large numbers of samples automatically while keeping higher precision and recall.
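As a rough illustration of a classification rule built on a center of gravity and a per-class radius, a minimal Python sketch follows. The abstract does not give the formulas for interdependence or equivalent radius, so this sketch makes two assumptions that are not taken from the paper: the radius of a class is the root-mean-square distance of its samples from the class centroid, and a sample is assigned to the class with the smallest centroid distance divided by that radius. The function names `fit_centroid_radius` and `classify` are hypothetical, and feature selection by interdependence is not shown.

```python
import math
from collections import defaultdict


def fit_centroid_radius(samples, labels):
    """Return {label: (centroid, radius)}.

    ASSUMPTION: the radius is the RMS distance of a class's samples
    from its centroid. The paper's own definition of equivalent
    radius may differ.
    """
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)

    model = {}
    for y, xs in groups.items():
        dim = len(xs[0])
        # Center of gravity: per-dimension mean of the class samples.
        centroid = [sum(x[i] for x in xs) / len(xs) for i in range(dim)]
        mean_sq = sum(math.dist(x, centroid) ** 2 for x in xs) / len(xs)
        model[y] = (centroid, math.sqrt(mean_sq))
    return model


def classify(model, x, eps=1e-12):
    """Pick the class whose radius-normalized centroid distance is smallest.

    ASSUMPTION: this decision rule is illustrative only; eps guards
    against division by zero for single-sample classes.
    """
    return min(
        model,
        key=lambda y: math.dist(x, model[y][0]) / (model[y][1] + eps),
    )


# Toy usage with two well-separated classes.
samples = [[0, 0], [0, 2], [10, 10], [10, 12]]
labels = ["a", "a", "b", "b"]
model = fit_centroid_radius(samples, labels)
print(classify(model, [1, 1]))
print(classify(model, [9, 11]))
```

Under these assumptions, classifying one sample needs one distance computation per class, whereas kNN compares the sample against every stored training sample. That is consistent with the speed advantage the abstract reports, though the actual IER procedure may be organized differently.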