To compare the performance of different methods, chunking is usually evaluated on shared public training and test sets. These public sets, however, tend to be small and restricted to a single domain, so chunking precision drops considerably on real cross-domain text. Using real open-domain corpora, this paper designs several groups of experiments to study how different part-of-speech tagging results, corpora from different domains, and different knowledge bases affect chunking, and examines whether a distributed English chunking strategy based on a multi-agent architecture is feasible in a practical system. The experiments show that the strategy achieves an F-measure of 92% on real open-domain corpora, which largely meets the needs of practical applications.
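For reference, the F-measure used to score chunking is conventionally the harmonic mean of precision and recall over exactly matched chunks. The following is a minimal sketch of that computation, not the evaluation code used in the paper: the function name `chunk_f_measure`, the `(start, end, type)` chunk representation, and the sample data are all illustrative assumptions.

```python
def chunk_f_measure(gold, predicted):
    """Return (precision, recall, F1) over exactly matched chunks.

    Each chunk is a hypothetical (start, end, type) tuple; a predicted
    chunk counts as correct only if all three fields match a gold chunk.
    """
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)

    # Guard against empty inputs to avoid division by zero.
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: one of four predicted chunks has a wrong boundary.
gold = [(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP"), (5, 6, "PP")]
pred = [(0, 2, "NP"), (2, 3, "VP"), (3, 4, "NP"), (5, 6, "PP")]
p, r, f = chunk_f_measure(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Exact-match scoring is the stricter choice: a chunk with a single wrong boundary earns no partial credit, which is one reason scores can fall on text from an unfamiliar domain.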