To address data sparseness in automatic semantic tagging, this paper applies a genetic algorithm, chosen because its fitness function can be defined flexibly. Based on the hierarchical structure of the thesaurus Tongyici Cilin (《同义词词林》), it proposes lifting senses to higher levels of the semantic hierarchy to improve the estimates of the parameters in the fitness function. Two basic concepts of the semantic hierarchy are defined, and the principle of semantic lifting is explained. A restricted selection strategy is adopted to counter the loss of discriminative power that lifting causes in the model. The genetic algorithm for semantic tagging is implemented, and experiments show that it adapts to different amounts of training data and is reasonably feasible for semantic tagging.
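
As an illustration only, raising a sense code to a coarser level of the thesaurus hierarchy in order to pool sparse counts might be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the code layout (`"Aa01"` with prefix lengths 1, 2 and 4), the names `lift` and `lifted_count`, the `min_count` threshold and the sample counts are all hypothetical, and neither the restricted selection strategy nor the genetic search itself is modelled.

```python
from collections import Counter

# Assumed prefix lengths for three levels of a Cilin-style code such as "Aa01":
# level 1 = "A" (coarsest), level 2 = "Aa", level 3 = "Aa01" (finest).
LEVEL_PREFIX = {1: 1, 2: 2, 3: 4}


def lift(code, level):
    """Truncate a sense code to the given hierarchy level (hypothetical helper)."""
    return code[:LEVEL_PREFIX[level]]


def lifted_count(pair_counts, left, right, min_count=1):
    """Return (level, count) for the finest level at which the pooled
    co-occurrence count of (left, right) reaches min_count.

    If no level reaches the threshold, return (1, 0).
    """
    for level in (3, 2, 1):  # try the finest level first, then lift
        l, r = lift(left, level), lift(right, level)
        total = sum(
            c for (a, b), c in pair_counts.items()
            if lift(a, level) == l and lift(b, level) == r
        )
        if total >= min_count:
            return level, total
    return 1, 0


# Made-up training counts of adjacent sense-code pairs.
pair_counts = Counter({("Aa01", "Hj02"): 2, ("Aa03", "Hj05"): 1})

seen = lifted_count(pair_counts, "Aa01", "Hj02")    # observed at the finest level
unseen = lifted_count(pair_counts, "Aa02", "Hj07")  # needs one step of lifting
```

In this sketch a pair never seen at the finest level borrows evidence from its siblings under the same parent class. The price is that sibling senses become indistinguishable at the lifted level, which is the loss of discriminative power that the abstract's restricted selection strategy is meant to counter.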