To handle post threads that cover multiple topics and to identify outliers during clustering, a hot topic mining algorithm for bulletin board systems (BBS) based on fuzzy clustering is proposed. Fuzzy clustering is used for topic identification, so that a single post thread can belong to several topics, while a post thread whose membership degree is far below the in-class average membership degree is treated as an outlier. In addition, a feature representation method for BBS texts is given, and a topic hotness scoring formula based on the fuzzy partition is given that incorporates the membership degrees. Experimental results verify the effectiveness of the proposed algorithm.
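The three ideas in the abstract (soft membership, membership-based outliers, membership-weighted hotness) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives neither the feature representation nor the scoring formula, so everything below is an assumption. `fcm_memberships` uses the standard fuzzy c-means membership update for fixed centers. `find_outliers` takes a thread's "class" to be its strongest topic and uses a hypothetical threshold `ratio`. `topic_hotness` assumes hotness is a membership-weighted sum of a per-thread activity count, such as replies.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update for fixed centers.

    Returns one row per point; each row sums to 1 across topics.
    The fuzzifier m=2.0 is a common default, not a value from the paper.
    """
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        zeros = [x == 0.0 for x in d]
        if any(zeros):
            # A point sitting exactly on a center gets full membership there.
            n = sum(zeros)
            rows.append([1.0 / n if z else 0.0 for z in zeros])
            continue
        rows.append([
            1.0 / sum((d[k] / d[j]) ** exponent for j in range(len(d)))
            for k in range(len(d))
        ])
    return rows


def find_outliers(U, ratio=0.8):
    """Flag threads whose membership is far below the in-class average.

    Assumption: a thread's class is the topic with its largest membership,
    and "far below" means less than ratio times the class average.
    """
    best = [max(range(len(row)), key=row.__getitem__) for row in U]
    outliers = set()
    for k in range(len(U[0])):
        members = [i for i, b in enumerate(best) if b == k]
        if not members:
            continue
        avg = sum(U[i][k] for i in members) / len(members)
        outliers.update(i for i in members if U[i][k] < ratio * avg)
    return outliers


def topic_hotness(U, activity, outliers):
    """Hypothetical hotness: membership-weighted activity, outliers excluded."""
    return [
        sum(U[i][k] * activity[i] for i in range(len(U)) if i not in outliers)
        for k in range(len(U[0]))
    ]


# Example: four threads, two topics, activity = reply counts (made-up data).
U = [[0.9, 0.1], [0.95, 0.05], [0.55, 0.45], [0.1, 0.9]]
activity = [10, 20, 5, 40]
outliers = find_outliers(U)
hotness = topic_hotness(U, activity, outliers)
```

Because each thread contributes to every topic in proportion to its membership, a thread that straddles two topics raises the hotness of both, which is the behaviour a hard partition cannot express.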