This paper analyzes the problems that arise in entity identification. On the basis that schema heterogeneity and context heterogeneity have already been resolved in ontology-based semantic information integration, it proposes a solution based on two-stage feature vector processing to improve the efficiency of entity identification in distributed environments. Finally, because most comparison functions used in entity identification are designed around the characteristics of English strings and therefore give low precision when comparing Chinese strings, a comparison function based on common substrings is designed. Experiments show that, compared with a comparison function based on edit distance, this function achieves higher recall, higher precision, and lower time complexity.
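To make the idea of a common-substring comparison function concrete, the following is a minimal sketch in Python. It is not the function defined in the paper, whose exact formula is not given here. It assumes one familiar variant: repeatedly extract the longest common substring of the two strings, remove it from both, and score the pair by the share of characters matched (a Dice-style ratio). The names `longest_common_substring`, `common_substring_similarity`, and the `min_len` parameter are hypothetical and chosen only for illustration.

```python
def longest_common_substring(a: str, b: str) -> tuple[int, int, int]:
    """Return (start_in_a, start_in_b, length) of the longest common substring.

    Uses the standard dynamic-programming table, kept to two rows.
    If there is no common character, the length is 0.
    """
    best_len = best_a = best_b = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_a = i - best_len
                    best_b = j - best_len
        prev = curr
    return best_a, best_b, best_len


def common_substring_similarity(a: str, b: str, min_len: int = 1) -> float:
    """Similarity in [0, 1] based on repeatedly removed common substrings.

    Assumption (not from the paper): score = 2 * matched / (len(a) + len(b)),
    where `matched` is the total length of the common substrings removed.
    Substrings shorter than `min_len` are ignored.
    """
    if not a and not b:
        return 1.0
    total = len(a) + len(b)
    matched = 0
    x, y = a, b
    while True:
        start_x, start_y, n = longest_common_substring(x, y)
        if n == 0 or n < min_len:
            break
        matched += n
        # Remove the matched substring from both strings and continue.
        x = x[:start_x] + x[start_x + n:]
        y = y[:start_y] + y[start_y + n:]
    return 2 * matched / total


if __name__ == "__main__":
    print(common_substring_similarity("北京大学", "北京大学计算机系"))
    print(common_substring_similarity("上海交通大学", "上海交大"))
```

Because the score is computed over characters rather than whitespace-delimited tokens, it applies to Chinese strings without word segmentation. Each pass of this sketch costs time proportional to the product of the two string lengths, so it does not by itself reproduce the lower time complexity reported for the paper's function.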