术语抽取是层次体系构建的首要子任务。目前的术语抽取研究主要集中在文本语料并且混合多个主题,存在知识获取的瓶颈和术语表述的模糊与歧义的问题。为了解决这些问题,本文提出一种基于边权重的主题核心术语抽取方法,从社会化标签中抽取主题核心术语。考虑到社会化标签丰富的语义关联特征,本文提出结合具体主题的局部共现和资源集合中所有主题的全局语义相似度的边权重。新颖的边权重将传统的随机游走方法分解成多个主题相关的随机游走,并针对每个具体主题排序相关的候选术语。排序靠前的术语被抽取作为主题核心术语。实验结果表明本文提出的方法显著优于前人的相关工作。
Term extraction is a primary subtask of hierarchy construction. Existing studies for term extraction mainly focus on text corpora and indiscriminately mix numerous topics,which may lead to a knowledge acquisition bottleneck and misconception. To deal with these problems,this paper proposes a method of topic key term extraction based on edge weight to extract topic key term from folksonomy. In view of semantic association characteristics of folksonomy,the edge weight which combines the local co- occurrence in a specific topic with the global semantic similarity over all the topic dimensions in the whole collection considered is proposed. The new edge weight can decompose a traditional random walk into multiple random walks specific to various topics,and each of these walks outputs a list of terms ordered on the basis of importance score. Then,the top- ranking terms are extracted as the topic key terms for each topic. Experiments show that the proposed method outperforms other state- of- the- art methods.