Previous text filtering approaches are challenged by short texts, whose limited length restricts feature extraction. Based on the characteristics of short texts, an online short-text filtering approach built on a word-model index is proposed. The main idea is to store labeled short texts in a word-model index. During online training, each newly labeled short text is incrementally added to the index. During online classification, the index is first queried with the words of the current unlabeled short text, the labeled texts most relevant to it are then retrieved, and finally a classification model is quickly trained on the retrieved texts and used to predict the class of the current short text. Experimental results on real mobile short message service (SMS) filtering show that the proposed approach strengthens the content cohesion of the training set and thereby refines the models, and that combining the classification results of multiple refined models improves filtering performance.
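The workflow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the class name `WordIndexFilter`, the plain inverted index (word to ids of labeled texts), whitespace tokenization (real Chinese SMS would need word segmentation), the two labels `spam` and `ham`, the use of a Laplace-smoothed naive Bayes model as the quickly trained local classifier, and the averaging of per-word log-odds as the ensemble step.

```python
import math
from collections import Counter, defaultdict

LABELS = ("spam", "ham")  # assumed two-class setting


class WordIndexFilter:
    """Hypothetical sketch: inverted index from word to labeled short texts."""

    def __init__(self):
        self.docs = []                     # doc id -> (tokens, label)
        self.postings = defaultdict(list)  # word -> ids of texts containing it

    def train(self, text, label):
        """Online training: incrementally add one labeled text to the index."""
        tokens = text.lower().split()      # assumed whitespace tokenization
        doc_id = len(self.docs)
        self.docs.append((tokens, label))
        for word in set(tokens):
            self.postings[word].append(doc_id)

    def _local_log_odds(self, doc_ids, tokens):
        """Train a small naive Bayes model on the retrieved texts only and
        return the log-odds of 'spam' versus 'ham' for the given tokens."""
        word_counts = {label: Counter() for label in LABELS}
        doc_counts = Counter()
        for doc_id in doc_ids:
            doc_tokens, label = self.docs[doc_id]
            word_counts[label].update(doc_tokens)
            doc_counts[label] += 1
        # Vocabulary of the retrieved texts, plus one slot for unseen words.
        vocab_size = len(set(word_counts["spam"]) | set(word_counts["ham"])) + 1
        # Smoothed class prior, then smoothed per-token likelihood ratios.
        score = math.log((doc_counts["spam"] + 1) / (doc_counts["ham"] + 1))
        for label, sign in (("spam", 1.0), ("ham", -1.0)):
            total = sum(word_counts[label].values())
            for token in tokens:
                prob = (word_counts[label][token] + 1) / (total + vocab_size)
                score += sign * math.log(prob)
        return score

    def classify(self, text):
        """Online classification: one local model per indexed query word,
        combined by averaging their log-odds."""
        tokens = text.lower().split()
        scores = [
            self._local_log_odds(self.postings[word], tokens)
            for word in set(tokens)
            if word in self.postings
        ]
        if not scores:
            return "ham"  # assumed default when no query word is indexed
        return "spam" if sum(scores) / len(scores) > 0 else "ham"
```

In use, `train(text, label)` would be called whenever a newly labeled message arrives, and `classify(text)` for each incoming message. Because every local model sees only the texts that share a word with the query, its training set is small and topically close to the message, which is one plausible reading of the "content cohesion" the abstract refers to.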