This paper converts dependency trees into phrase structure trees and uses a rule-based method to automatically detect errors in manually annotated results. The method is applied to the Peking University Multi-view Chinese Treebank (PMT), which had already undergone two rounds of manual proofreading. It finds 1529 errors in 50275 syntactic trees, with a precision of 100%. The errors fall into three levels: word segmentation errors, mismatches between part of speech (POS) and syntactic role, and mislabeled syntactic roles. The method can effectively improve the quality of a dependency treebank and is applicable to dependency treebanks of all types.
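To make the idea concrete, the following is a minimal sketch of how a rule-based check of this kind might look. It is not the authors' implementation: the `Token` fields, the tag names, and the `ALLOWED_POS` table are hypothetical and chosen only for illustration, and the conversion shown is a simple head projection rather than the conversion rules used for PMT. The sketch projects each head and its dependents into a bracketed phrase and flags tokens whose POS tag is not permitted for their syntactic role, which corresponds to the second error level described above.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # 1-based position in the sentence
    form: str   # word form
    pos: str    # part-of-speech tag
    head: int   # index of the head token, 0 for the root
    role: str   # syntactic role (dependency label)


# Hypothetical compatibility table (an assumption, not taken from PMT):
# the POS tags that are allowed to fill each syntactic role.
ALLOWED_POS = {
    "SBJ": {"n", "r"},       # subject: noun or pronoun
    "OBJ": {"n", "r"},       # object: noun or pronoun
    "ADV": {"d", "a", "p"},  # adverbial: adverb, adjective, preposition
    "ROOT": {"v", "a"},      # root: verb or adjective
}


def to_phrase(tokens, head_idx):
    """Project a head and its dependents into a bracketed phrase, in surface order."""
    head = tokens[head_idx - 1]
    children = [t for t in tokens if t.head == head_idx]
    if not children:
        return f"({head.pos} {head.form})"
    parts = []
    for t in sorted(children + [head], key=lambda t: t.idx):
        if t is head:
            parts.append(f"({t.pos} {t.form})")
        else:
            parts.append(to_phrase(tokens, t.idx))
    return f"({head.pos}P " + " ".join(parts) + ")"


def find_mismatches(tokens):
    """Return tokens whose POS tag is not allowed for their syntactic role."""
    return [
        (t.idx, t.form, t.pos, t.role)
        for t in tokens
        if t.role in ALLOWED_POS and t.pos not in ALLOWED_POS[t.role]
    ]


# Toy sentence in which "apples" carries a deliberately wrong POS tag ("v").
sentence = [
    Token(1, "I", "r", 2, "SBJ"),
    Token(2, "like", "v", 0, "ROOT"),
    Token(3, "apples", "v", 2, "OBJ"),
]

tree = to_phrase(sentence, 2)
errors = find_mismatches(sentence)
print(tree)
print(errors)
```

In a real setting the compatibility table would be derived from the treebank's annotation guidelines, and flagged tokens would be passed to an annotator for correction rather than fixed automatically.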