The information age has greatly improved people's lives, but it has also brought a series of problems. Among them, filtering the large volume of remarks generated on the network is a major research difficulty. Traditional blocking methods are inefficient and not accurate enough, so this paper proposes a new keyword blocking technique. The technique combines a bigram language model with cascaded hidden Markov model word segmentation. First, the bigram model is applied to a large corpus to obtain the formation probabilities of common words and keywords, and a dictionary that classifies entries as common words or keywords is built. The cascaded hidden Markov model is then used to segment each specific sentence, and a keyword blocking probability is computed from the segmentation result. The final result is a well-founded blocking probability, which can greatly improve the accuracy of keyword blocking.
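The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several parts are assumptions made only for illustration: the toy corpus `CORPUS`, the add-one smoothing, the function names, and the noisy-OR formula used for the blocking probability are all hypothetical. The cascaded hidden Markov model segmenter is also replaced here by a much simpler dictionary-based Viterbi search over bigram probabilities. Space-free ASCII strings stand in for unsegmented Chinese text.

```python
import math
from collections import Counter

START = "<s>"  # sentence-start marker (assumed convention)

# Hypothetical toy corpus: pre-segmented sentences of (word, is_keyword) pairs.
CORPUS = [
    [("buy", False), ("cheap", False), ("pills", True)],
    [("buy", False), ("pills", True), ("now", False)],
    [("cheap", False), ("pills", False), ("now", False)],
    [("buy", False), ("now", False)],
]


def train(labeled_sentences):
    """Count unigrams and bigrams, and build a lexicon of P(keyword | word)."""
    unigrams, bigrams, keyword_hits = Counter(), Counter(), Counter()
    for sentence in labeled_sentences:
        words = [START] + [w for w, _ in sentence]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
        keyword_hits.update(w for w, is_kw in sentence if is_kw)
    lexicon = {w: keyword_hits[w] / n for w, n in unigrams.items() if w != START}
    return unigrams, bigrams, lexicon


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))


def segment(text, lexicon, unigrams, bigrams, max_len=5):
    """Return the segmentation with the highest bigram log-probability.

    best[i] maps the last word ending at position i to a pair
    (log-probability, backpointer), where the backpointer is
    (start index of that word, previous word).
    """
    best = [dict() for _ in range(len(text) + 1)]
    best[0][START] = (0.0, None)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            # Unknown single characters are allowed so any input can be segmented.
            if word not in lexicon and end - start > 1:
                continue
            for prev, (score, _) in best[start].items():
                cand = score + math.log(bigram_prob(prev, word, unigrams, bigrams))
                if word not in best[end] or cand > best[end][word][0]:
                    best[end][word] = (cand, (start, prev))
    # Backtrack from the best-scoring final word.
    word = max(best[-1], key=lambda w: best[-1][w][0])
    pos, tokens = len(text), []
    while word != START:
        tokens.append(word)
        pos, word = best[pos][word][1]
    return tokens[::-1]


def blocking_probability(tokens, lexicon):
    """Noisy-OR over tokens (an assumed formula): 1 - prod(1 - P(keyword | token))."""
    p_clean = 1.0
    for token in tokens:
        p_clean *= 1.0 - lexicon.get(token, 0.0)
    return 1.0 - p_clean


if __name__ == "__main__":
    unigrams, bigrams, lexicon = train(CORPUS)
    tokens = segment("buycheappillsnow", lexicon, unigrams, bigrams)
    print(tokens, blocking_probability(tokens, lexicon))
```

In this sketch a word that is sometimes a keyword and sometimes a common word contributes a fractional weight rather than triggering an outright block, which is one simple way to read the abstract's idea of a blocking probability in place of a hard keyword match.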