This paper studies effective feature generation and feature selection methods for the short texts of case reports submitted through a smart city management application, with the goal of classifying cases automatically, quickly and accurately. Based on the characteristics of case-description short texts, an adjacent feature combination algorithm is proposed to generate combined features with stronger descriptive power. To further reduce and optimize the feature space, a new membership function is proposed to construct a category feature domain for each category in the classification system; the category feature domains are then used to select among both the original and the combined features, yielding the feature representation set that contributes most to classification. The method is validated on case classification in the “城管通” urban management app of Qingxiu District, Nanning. Experiments show that the proposed method achieves higher case classification accuracy than document frequency, mutual information and information gain, and that introducing combined features significantly improves classification accuracy.
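The abstract does not give the combination rule or the membership function, so the following Python sketch only illustrates the general shape of the pipeline under stated assumptions. It assumes that a combined feature is the pair of two neighbouring tokens, and it substitutes a simple stand-in for the membership function: the share of a feature's document frequency that falls in each category. The function names, the `threshold` parameter and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def adjacent_combinations(tokens):
    """Join each pair of neighbouring tokens into one combined feature.

    Assumption: "adjacent" means directly neighbouring tokens in the text.
    """
    return [a + "_" + b for a, b in zip(tokens, tokens[1:])]


def class_membership(docs, labels):
    """Stand-in membership score, NOT the paper's membership function.

    Returns {feature: {label: share of the feature's document frequency
    that falls in that label}} over original and combined features.
    """
    doc_freq = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        features = set(tokens) | set(adjacent_combinations(tokens))
        for feature in features:
            doc_freq[feature][label] += 1
    return {
        feature: {label: n / sum(counts.values()) for label, n in counts.items()}
        for feature, counts in doc_freq.items()
    }


def class_feature_domains(membership, threshold=0.8):
    """Assign a feature to a category's domain when its score reaches threshold."""
    domains = defaultdict(set)
    for feature, scores in membership.items():
        for label, score in scores.items():
            if score >= threshold:
                domains[label].add(feature)
    return dict(domains)


# Toy, already-tokenized case reports (hypothetical data).
docs = [
    ["road", "surface", "damaged"],
    ["road", "lamp", "broken"],
    ["street", "lamp", "broken"],
]
labels = ["road", "lighting", "lighting"]

membership = class_membership(docs, labels)
domains = class_feature_domains(membership)
```

In this toy data the token "road" occurs in both categories and so enters neither domain, while the combined feature "lamp_broken" occurs only in lighting cases and is kept. This is the kind of gain in descriptive power that the abstract attributes to combined features, although the paper's actual scoring may differ.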