Short text differs from ordinary text in ways that complicate classification: each document is short, and its features are sparse and scattered. This paper applies a feature expansion method based on a topic ontology, which takes the semantic relations between features into account and achieves better classification performance. In addition, case-base maintenance learning with the GC (generalization capability) algorithm reduces the number of cases stored for the K-nearest neighbor (K-NN) algorithm, which improves the efficiency of the nearest-neighbor search. Numerical experiments demonstrate the effectiveness of this learning algorithm.
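The two ideas summarized above, expanding sparse features with semantically related concepts and shrinking the case base before the K-NN search, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ontology, the 0.5 weight given to expanded concepts, and all function names are hypothetical. The GC algorithm is not specified here, so the reduction step uses Hart's condensed nearest neighbor rule as a plainly named stand-in, not the paper's method.

```python
import math
from collections import Counter

# Hypothetical toy ontology (an assumption): term -> semantically related concepts.
TOY_ONTOLOGY = {
    "goal": ["football", "sport"],
    "striker": ["football", "sport"],
    "match": ["sport"],
    "stock": ["finance"],
    "shares": ["stock", "finance"],
    "bank": ["finance"],
}

def expand(text, ontology=TOY_ONTOLOGY, concept_weight=0.5):
    """Bag of words plus related ontology concepts at a reduced (assumed) weight."""
    vec = Counter()
    for token in text.lower().split():
        vec[token] += 1.0
        for concept in ontology.get(token, []):
            vec[concept] += concept_weight
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(vec, cases, k=1):
    """Majority vote among the k most similar (vector, label) cases."""
    ranked = sorted(cases, key=lambda c: cosine(vec, c[0]), reverse=True)[:k]
    votes = Counter(label for _, label in ranked)
    return votes.most_common(1)[0][0]

def condense(cases):
    """Stand-in reduction (Hart's condensed NN, not GC): keep a case only if
    1-NN over the cases kept so far would misclassify it."""
    if not cases:
        return []
    kept_idx = [0]
    changed = True
    while changed:
        changed = False
        for i, (vec, label) in enumerate(cases):
            if i in kept_idx:
                continue
            kept = [cases[j] for j in kept_idx]
            if knn_predict(vec, kept, k=1) != label:
                kept_idx.append(i)
                changed = True
    return [cases[j] for j in kept_idx]

# Made-up training texts, used only to exercise the sketch.
train = [
    ("late goal wins match", "sport"),
    ("striker scores twice", "sport"),
    ("match ends in draw", "sport"),
    ("stock prices fall", "finance"),
    ("bank raises rates", "finance"),
    ("shares rally today", "finance"),
]
cases = [(expand(text), label) for text, label in train]
reduced = condense(cases)

print(len(cases), "cases reduced to", len(reduced))
print(knn_predict(expand("striker injured"), reduced))
print(knn_predict(expand("shares drop"), reduced))
```

In this toy setting the query "striker injured" shares no literal word with the retained sport case, so it can only be matched through the expanded concepts; that is the kind of sparsity problem feature expansion is meant to address. The smaller case base also means fewer similarity computations per query.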