To enable effective concept clustering on small-scale domain-specific corpora, a domain concept clustering method based on a verb dependency set is proposed. The method builds on the observation that domain concepts of the same class co-occur with specific domain verbs, and the verb dependency set is constructed with the assistance of domain experts. For each concept that co-occurs with the verb dependency set in subject-predicate and verb-object structures, a verb dependency value is calculated, and the concepts whose dependency value exceeds a threshold are clustered into one class. Experimental results show that the method is practical on small-scale domain-specific corpora and that the concept coincidence rate of its clustering results is higher than that of the LSI-based and search engine-based concept clustering methods.
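The thresholding step described above can be illustrated with a minimal sketch. The abstract does not give the formula for the verb dependency value, so the sketch assumes it is the share of a concept's subject-predicate and verb-object relations whose verb belongs to the verb dependency set. The relation labels `SBV` and `VOB`, the function names, and the triple format are all hypothetical choices made for illustration, not the authors' implementation.

```python
from collections import defaultdict

# Assumed labels for subject-predicate and verb-object relations,
# as a dependency parser might emit them (hypothetical choice).
SUBJECT_PREDICATE = "SBV"
VERB_OBJECT = "VOB"


def verb_dependency_values(triples, verb_set):
    """Map each concept to its assumed verb dependency value.

    triples: iterable of (concept, relation, verb) taken from parsed sentences.
    verb_set: the expert-built verb dependency set.
    The value is the fraction of the concept's subject-predicate and
    verb-object relations whose verb is in verb_set (an assumption).
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    for concept, relation, verb in triples:
        # Only the two structures named in the abstract are counted.
        if relation not in (SUBJECT_PREDICATE, VERB_OBJECT):
            continue
        total[concept] += 1
        if verb in verb_set:
            hits[concept] += 1
    return {concept: hits[concept] / total[concept] for concept in total}


def cluster_by_threshold(triples, verb_set, threshold):
    """Return the set of concepts whose value is strictly above threshold."""
    values = verb_dependency_values(triples, verb_set)
    return {concept for concept, value in values.items() if value > threshold}
```

Given a handful of parsed triples and a verb set such as `{"forward", "configure"}`, calling `cluster_by_threshold(triples, verb_set, 0.4)` returns the concepts that mostly appear as subjects or objects of those verbs. In practice the threshold would need tuning on the corpus at hand.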