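Text chunking plays an important role in natural language processing. This paper examines three representative chunking methods, WINNOW, support vector machines (SVM), and the perceptron, analyzing each in terms of its model and its features, and compares their advantages and disadvantages with those of the hidden Markov model. It points out that if only part-of-speech features are used to recognize multiple chunk types in order to avoid data sparseness, accuracy will be very low for phrases that are sensitive to the words themselves. Therefore, adopting different features and strategies for different chunk types, letting the recognition of different phrases inform one another, and integrating the recognition of the different chunk types should yield good results.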
Text chunking plays a critical role in the field of natural language processing. This paper studies three algorithms: WINNOW, SVM, and the perceptron. For each algorithm, the model and the features are analyzed, and the advantages and disadvantages of the three algorithms are compared with those of the hidden Markov model. Issues that deserve more attention in future work on text chunking are pointed out. These results can serve as a reference for researchers in related areas.