Folksonomy is a new information classification method that emerged in the Web 2.0 environment, and tags are its core element. However, tags suffer from diversity, ambiguity, and a flat structure, which seriously reduce the efficiency of information retrieval. Taking Douban Reading (豆瓣读书) as an example, this paper analyzes the statistical regularities of tags and mines the relationships between them. It then clusters the tags automatically with a clustering algorithm to construct a tag concept space. This reorganizes the tags and gives users better tag navigation and browsing mechanisms. Experiments show that the proposed algorithm model constructs the tag concept space effectively.
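The summary above does not state which tag-relationship measure or clustering algorithm is used. The following is only a minimal sketch of the general idea under stated assumptions. It counts how often tags co-occur on the same book, scores tag pairs with co-occurrence cosine similarity c(a,b)/sqrt(f(a)·f(b)), and groups tags by single-linkage threshold clustering. The function names, the `threshold` parameter, and the sample tag data are hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence(books):
    """Count per-tag frequencies and tag-pair co-occurrences across books (hypothetical helper)."""
    freq, pairs = Counter(), Counter()
    for tags in books:
        unique = sorted(set(tags))          # each tag counted once per book
        freq.update(unique)
        pairs.update(combinations(unique, 2))  # pairs stored in sorted order
    return freq, pairs


def cosine(a, b, freq, pairs):
    """Assumed similarity measure: co-occurrence count / sqrt(f(a) * f(b))."""
    key = (a, b) if a < b else (b, a)
    return pairs[key] / math.sqrt(freq[a] * freq[b])


def cluster_tags(books, threshold=0.6):
    """Single-linkage clustering via union-find: link tags whose similarity >= threshold."""
    freq, pairs = cooccurrence(books)
    parent = {t: t for t in freq}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]   # path halving
            t = parent[t]
        return t

    for a, b in pairs:
        if cosine(a, b, freq, pairs) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for t in freq:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical tag sets for five books.
books = [
    ["novel", "literature", "foreign literature"],
    ["novel", "literature"],
    ["programming", "python"],
    ["programming", "algorithms"],
    ["python", "programming", "algorithms"],
]
print(cluster_tags(books))
# [['algorithms', 'programming', 'python'], ['foreign literature', 'literature', 'novel']]
```

Raising `threshold` to 1.0 keeps only the pair that always co-occurs ("literature" and "novel") and leaves the other tags as singletons. The threshold therefore controls how coarse the resulting concept groups are. A real system would likely also filter out low-frequency tags before clustering.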