Typical text clustering algorithms produce a hard partition, but the diversity and sheer volume of Chinese text make a soft partition more appropriate, and fuzzy set theory provides a powerful analytical tool for such soft partitioning. Traditional fuzzy clustering methods mostly cluster by iterating on the membership matrix until a fuzzy equivalence matrix or a fuzzy partition is obtained, a process that requires a large amount of storage. The text clustering algorithm based on fuzzy granular computing instead defines a normalized distance function d(d_i, d_j) on the fuzzy granular space of the document set and dynamically clusters the texts whose distance is less than the granularity d_λ. Experiments show that this method reduces both the computational complexity and the space complexity of text clustering, which makes it suitable for clustering large numbers of texts.
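The thresholding idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance function or the grouping rule, so this example assumes cosine distance on term-frequency vectors as the normalized distance, whitespace tokenization (Chinese text would first need word segmentation), and transitive merging of any two documents closer than the granularity. The names `cosine_distance`, `cluster_by_granularity`, and `d_lambda` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(a, b):
    """Normalized distance in [0, 1] between two term-frequency Counters (assumed metric)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    if norm == 0:
        return 1.0  # an empty document is treated as maximally distant
    return max(0.0, 1.0 - dot / norm)


def cluster_by_granularity(docs, d_lambda):
    """Group documents whose pairwise distance is below the granularity d_lambda.

    Documents are merged transitively with a union-find structure, so only the
    term vectors and one parent array are stored, not a full distance or
    membership matrix.
    """
    vectors = [Counter(doc.split()) for doc in docs]  # whitespace tokens (assumption)
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent links to the cluster representative, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if cosine_distance(vectors[i], vectors[j]) < d_lambda:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


docs = [
    "fuzzy set theory",
    "fuzzy set clustering",
    "stock market price",
    "stock market index",
]
coarse = cluster_by_granularity(docs, d_lambda=0.5)
fine = cluster_by_granularity(docs, d_lambda=0.1)
```

Changing `d_lambda` changes the granularity of the result without recomputing any matrix: a larger value yields coarser clusters and a smaller value yields finer ones. The sketch still compares every pair of documents, so it illustrates the reduced storage rather than any reduction in computation reported by the paper.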