While building an auxiliary-word knowledge base and annotating a large-scale corpus, we applied a rule-based method for automatically annotating auxiliary-word usage. Examining the annotated corpus, we found that this method can also automatically detect some of the part-of-speech tagging and word segmentation errors in the corpus. Detecting these errors helps in developing high-quality Chinese corpora and increases the depth of corpus processing.