In social tagging systems, data mining techniques such as clustering are often used to address tag redundancy and semantic ambiguity. Most existing tag clustering algorithms compute the similarity between two tags from the number of times they co-occur on the same objects, but the precision and recall of clusterings obtained this way are relatively low. To address this problem, this paper proposes a new tag clustering algorithm that takes full account of tagging information: each tag is characterized by an object-based feature vector, the similarity between two tags is computed with the cosine similarity formula, and the K-Means algorithm is then used to cluster users' tags. Experimental results show that the proposed algorithm yields more accurate clustering results.
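The pipeline described above (object-based tag vectors, cosine similarity, K-Means) can be illustrated with a minimal sketch. The abstract does not specify how the feature vector is weighted, so this sketch assumes that each component is simply the number of times the tag was applied to that object. The function names, the toy `annotations` data, the seeding of centroids from the first `k` tags, and the normalization of vectors to unit length (so that Euclidean K-Means orders neighbours the same way cosine similarity does) are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def tag_vectors(annotations):
    """Build one vector per tag; component i counts uses of the tag on object i.

    Assumption: raw annotation counts are used as weights.
    """
    objects = sorted({obj for _, obj in annotations})
    counts = defaultdict(Counter)
    for tag, obj in annotations:
        counts[tag][obj] += 1
    return {tag: [c[o] for o in objects] for tag, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm else list(v)


def kmeans(vectors, k, iterations=20):
    """Lloyd's K-Means on unit-length tag vectors.

    Assumption: the first k tags (in sorted order) seed the centroids,
    which keeps the sketch deterministic but is not a robust initialization.
    """
    tags = sorted(vectors)
    points = {t: normalize(vectors[t]) for t in tags}
    centroids = [points[t] for t in tags[:k]]
    labels = {}
    for _ in range(iterations):
        # Assign each tag to its nearest centroid (squared Euclidean distance).
        new_labels = {
            t: min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(points[t], centroids[j])),
            )
            for t in tags
        }
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [points[t] for t in tags if labels[t] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical (tag, object) annotations for illustration only.
annotations = [
    ("python", "o1"), ("python", "o1"), ("python", "o2"),
    ("py", "o1"), ("py", "o2"),
    ("cooking", "o3"), ("cooking", "o4"), ("cooking", "o4"),
    ("recipe", "o3"), ("recipe", "o4"),
]
vectors = tag_vectors(annotations)
labels = kmeans(vectors, k=2)
```

In this toy data, "python" and "py" are applied to the same objects in similar proportions, so their vectors point in nearly the same direction and they end up in one cluster, while "cooking" and "recipe" form the other. A production version would likely use a better centroid initialization and a weighting scheme chosen to match the paper's definition of the feature vector.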