To address assumptions in hidden Markov model (HMM) part-of-speech tagging that fit natural language poorly, in particular that successive observations within a state are independent and identically distributed, the HMM is improved by introducing the Markov family model (MFM), which replaces the independence assumption of the HMM with a conditional independence assumption. The MFM is applied to part-of-speech tagging, and tagging is combined with syntactic parsing. From a statistical point of view, independence is a stronger assumption than conditional independence, so a language model based on the MFM fits language and similar real-world processes better than one based on the HMM. Part-of-speech tagging experiments show that, under the same test conditions, MFMs clearly outperform HMMs: tagging accuracy rises from 94.642% to 97.126%.
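
For readers who want to see what the HMM baseline assumes, the following is a minimal sketch of first-order HMM tagging with Viterbi decoding. It is not the Markov family model and not the authors' implementation; the function name `viterbi`, the probability floor, and the toy probability tables are all hypothetical and chosen only for illustration. Note how each word is scored by `emit_p[tag][word]` alone, which is the output independence assumption the abstract criticizes.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most likely tag sequence under a first-order HMM (assumes words is non-empty)."""
    def lg(p):
        # Floor zero probabilities so unseen events do not produce log(0).
        return math.log(max(p, floor))

    # best[i][t]: best log-probability of any tag path that ends in tag t at position i
    best = [{t: lg(start_p.get(t, 0.0)) + lg(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # The transition depends only on the previous tag (Markov assumption).
            prev, score = max(
                ((p, best[i - 1][p] + lg(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            # The emission depends only on the current tag (output independence).
            best[i][t] = score + lg(emit_p[t].get(words[i], 0.0))
            back[i][t] = prev

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy parameters, invented for this example only.
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.8, "NOUN": 0.15, "VERB": 0.05}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.6},
}

print(viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P))
# prints ['DET', 'NOUN', 'VERB']
```

A model that relaxes the independence assumption would replace the per-tag emission lookup with a score conditioned on more context; the abstract does not give the MFM factorization, so it is not reproduced here.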