This paper approaches the segmentation of ambiguous strings from the properties of the strings themselves, namely the distinction between pseudo-ambiguity and true ambiguity. A probabilistic statistical model, supported by a rule base, is built to resolve them. Feature selection for the model takes into account both the neighboring words and the semantic information of those neighboring words. Experimental results show that the model is effective at segmenting ambiguous strings.
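The division of labor described above (rules for pseudo-ambiguity, a statistical model for true ambiguity) can be illustrated with a minimal sketch. The abstract does not specify the form of the probabilistic model, so a naive Bayes classifier with add-one smoothing is swapped in here purely for illustration. The rule base `RULES`, the semantic lookup `SEM_CLASS`, the training samples in `TRAIN`, and every function and class name are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict

# Hypothetical rule base: a pseudo-ambiguous string is assumed to take the
# same segmentation in every context, so it is resolved by lookup.
RULES = {"挨批评": ("挨", "批评")}

# Hypothetical semantic classes for neighboring words (a stand-in for
# whatever semantic resource the real model would consult).
SEM_CLASS = {
    "钢琴": "artifact", "电脑": "artifact", "吉他": "artifact",
    "毕业": "event", "开始": "event",
}


def features(left, right):
    """Neighboring words plus their semantic classes, as feature strings."""
    return [
        f"L={left}",
        f"R={right}",
        f"Lsem={SEM_CLASS.get(left, 'unk')}",
        f"Rsem={SEM_CLASS.get(right, 'unk')}",
    ]


class NaiveBayesDisambiguator:
    """Naive Bayes over context features (an assumed model form)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant
        self.label_counts = defaultdict(int)
        self.feat_counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def fit(self, samples):
        # Each sample is (left neighbor, right neighbor, segmentation).
        for left, right, label in samples:
            self.label_counts[label] += 1
            for f in features(left, right):
                self.feat_counts[label][f] += 1
                self.vocab.add(f)

    def predict(self, left, right):
        total = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)
            for f in features(left, right):
                score += math.log((counts.get(f, 0) + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


def segment(ambiguous, left, right, model):
    """Rules settle pseudo-ambiguity; the model settles true ambiguity."""
    if ambiguous in RULES:
        return RULES[ambiguous]
    return model.predict(left, right)


# Toy training data for the truly ambiguous string "从小学".
TRAIN = [
    ("他", "毕业", ("从", "小学")),
    ("我", "开始", ("从", "小学")),
    ("她", "钢琴", ("从小", "学")),
    ("他", "电脑", ("从小", "学")),
]

model = NaiveBayesDisambiguator()
model.fit(TRAIN)
```

In this sketch, `segment("从小学", "你", "吉他", model)` selects `("从小", "学")` even though "吉他" never appears in the training data, because its assumed semantic class matches that of "钢琴" and "电脑". This is one way the semantic information of neighboring words could help the model generalize beyond the surface forms it has seen.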