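This paper studies feature selection for subjective/objective classification of Chinese microblogs. Eight candidate features, including basic sentiment words, modal particles, and degree words, are extracted by combining lexicons with statistics. For the extracted candidates, a feature selection algorithm based on rough sets and probability weighting is proposed; it finally selects basic sentiment words, the punctuation feature "! or !", internet opinion words, modal particles, adjectives, and degree words as classification features. Experimental results show that the proposed method achieves good classification performance.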
The feature selection for subjective and objective classification of Chinese microblogs has been studied. For the features in Chinese microblogs, a combination of lexicon and statistics is used to extract candidate features. By this method, eight candidate features are extracted. A feature selection algorithm based on rough sets and probability weighting is proposed. Using the algorithm, six features are selected. The experimental results show that the features selected by the algorithm achieve good results in subjective and objective classification of Chinese microblogs.
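
The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of the standard rough-set notion it builds on: the dependency degree of the class label on a feature set, and the significance of a single feature as the drop in dependency when that feature is removed. The probability-weighting step is not reproduced, because its definition is not given here. The toy rows, the `url` feature, and the function names `dependency` and `significance` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def dependency(rows, features, label="label"):
    """Rough-set dependency degree of `label` on `features`.

    Rows with identical values on `features` form one equivalence class.
    The positive region is the union of classes whose rows all share the
    same label; the dependency degree is its share of all rows.
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in features)
        classes[key].append(row[label])
    positive = sum(
        len(labels) for labels in classes.values() if len(set(labels)) == 1
    )
    return positive / len(rows)


def significance(rows, features, feature, label="label"):
    """Drop in dependency degree when `feature` is removed from `features`."""
    rest = [f for f in features if f != feature]
    return dependency(rows, features, label) - dependency(rows, rest, label)


# Hypothetical toy data: binary indicators per microblog post (assumed).
rows = [
    {"sentiment_word": 1, "exclamation": 1, "url": 0, "label": "subjective"},
    {"sentiment_word": 1, "exclamation": 0, "url": 0, "label": "subjective"},
    {"sentiment_word": 0, "exclamation": 1, "url": 1, "label": "subjective"},
    {"sentiment_word": 0, "exclamation": 0, "url": 1, "label": "objective"},
    {"sentiment_word": 0, "exclamation": 0, "url": 0, "label": "objective"},
]
features = ["sentiment_word", "exclamation", "url"]

# Rank candidate features by how much the label depends on each of them.
ranking = sorted(
    features, key=lambda f: significance(rows, features, f), reverse=True
)
```

In this toy table, removing `url` leaves every equivalence class pure, so its significance is zero and a reduct-style selection would drop it, while the other two features each carry information the label depends on.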