Short texts have sparse features and carry only a weak signal about the concepts they describe, so traditional classification methods rarely achieve good results on them. To address this problem, this paper proposes SC-FE, a short text classification method based on expanding a text with features drawn from the corpus itself. The method first uses intra-class dispersion to select, from each class, the features that are most indicative of that class, and combines them into a feature space. Then, for each feature of a sample, it selects the most strongly related feature from that feature space and adds it to the short text as an expansion. Experimental results on real data sets show that the method can effectively improve short text classification performance.
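The two steps of the method can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the specifics here are assumptions for illustration only: intra-class dispersion is approximated by the coefficient of variation of a term's counts across the documents of a class (lower means more evenly spread, taken here as more class-indicative), and relatedness is approximated by document-level co-occurrence counts. The function names `select_features`, `cooccurrence`, and `expand` are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations
from statistics import mean, pstdev


def select_features(docs, labels, k):
    """Per class, keep the k terms with the lowest intra-class dispersion.

    Assumption: dispersion = population std dev / mean of the term's
    counts over the documents of the class (coefficient of variation).
    """
    by_class = defaultdict(list)
    for tokens, label in zip(docs, labels):
        by_class[label].append(Counter(tokens))

    selected = set()
    for counts in by_class.values():
        vocab = set().union(*counts)

        def dispersion(term):
            freqs = [c[term] for c in counts]
            return pstdev(freqs) / mean(freqs)

        # Ties are broken alphabetically so the result is deterministic.
        ranked = sorted(vocab, key=lambda t: (dispersion(t), t))
        selected.update(ranked[:k])
    return selected


def cooccurrence(docs):
    """Count how many documents contain each unordered pair of terms."""
    co = Counter()
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            co[(a, b)] += 1
    return co


def expand(tokens, selected, co):
    """For each term, append the most related selected feature, if any.

    Assumption: relatedness = co-occurrence count; a feature is added
    only if it co-occurs with the term at least once.
    """
    expanded = list(tokens)
    present = set(tokens)
    for term in tokens:
        candidates = [f for f in selected if f not in present]
        best = max(
            candidates,
            key=lambda f: (co[tuple(sorted((term, f)))], f),
            default=None,
        )
        if best is not None and co[tuple(sorted((term, best)))] > 0:
            expanded.append(best)
            present.add(best)
    return expanded
```

In this sketch the expanded token list would then be passed to any standard text classifier; the choice of classifier and the exact dispersion and relatedness measures are left open here.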