Dongli Research Big Data Discovery System (DRDS)


Document Clustering Based on Constructing Density Tree

Classification: TP3 [Automation and Computer Technology / Computer Science and Technology]
Author affiliations: [1] School of Computer Science and Technology, Tianjin University, Tianjin 300072, China; [2] School of Computer Software, Tianjin University, Tianjin 300072, China
Funding: Supported by the Science and Technology Development Project of Tianjin (No. 06FZRJGX02400) and the National Natural Science Foundation of China (No. 60603027).
Related project: Research on dimensionality reduction and information abstraction models based on information geometry methods

Keywords: document handling, clustering, tree structure, vector space model, computer technology

Abstract:

This paper focuses on document clustering using a Clustering Algorithm Based on DEnsity Tree (CABDET) to improve clustering accuracy. For every potential cluster, CABDET constructs a density-based tree structure, dynamically adjusting the neighborhood radius according to local density. This avoids the global density parameters of Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and reduces the number of input parameters to one. Experiments on real documents show that CABDET achieves better clustering accuracy than DBSCAN: CABDET obtains a maximum F-measure of 0.347 with a root-node neighborhood radius of 0.80, higher than the 0.332 obtained by DBSCAN with a neighborhood radius of 0.65 and a minimum number of objects of 6.
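To make the contrast between a global density threshold and a per-node radius concrete, here is a minimal sketch that clusters toy term-frequency vectors two ways: with plain DBSCAN (one global `eps` and `min_pts`) and with a hypothetical density-tree procedure whose only input is the root radius. The abstract does not state CABDET's radius-adjustment rule, so the rule used here is an illustrative assumption, not the authors' method: each child's radius is the mean distance from its parent to the parent's neighborhood, so radii shrink faster in dense regions and never exceed the parent's. Cosine distance, the function names, the toy vectors, and the parameter values 0.1 and 3 are likewise assumptions; the paper's reported settings (0.65 and 6 for DBSCAN, 0.80 for CABDET's root radius) apply to its own document collection.

```python
import math
from collections import deque

NOISE = -1  # label for points assigned to no cluster


def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def neighbors(points, i, radius):
    """Indices of all other points within `radius` of point i."""
    return [j for j in range(len(points))
            if j != i and cosine_distance(points[i], points[j]) <= radius]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: one global eps and min_pts shared by every cluster."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(points, i, eps)
        if len(seeds) + 1 < min_pts:  # assumption: min_pts counts the point itself
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(points, j, eps)
            if len(more) + 1 >= min_pts:  # j is a core point: keep expanding
                queue.extend(more)
        cluster += 1
    return labels


def density_tree(points, root_radius):
    """Hypothetical density-tree clustering with a single input parameter.

    Each unassigned point roots a tree searched with `root_radius`; children
    get a radius equal to the mean distance from their parent to the parent's
    neighborhood (an illustrative rule, NOT CABDET's published one).
    """
    labels = [None] * len(points)
    cluster = 0
    for root in range(len(points)):
        if labels[root] is not None:
            continue
        labels[root] = cluster
        size = 1
        queue = deque([(root, root_radius)])
        while queue:
            node, radius = queue.popleft()
            near = neighbors(points, node, radius)
            children = [j for j in near if labels[j] is None]
            if not children:
                continue  # leaf: nothing new within this node's radius
            # Mean distance is small in dense neighborhoods, so radii shrink there.
            child_radius = sum(cosine_distance(points[node], points[j])
                               for j in near) / len(near)
            for j in children:
                labels[j] = cluster
                queue.append((j, child_radius))
            size += len(children)
        if size == 1:
            labels[root] = NOISE  # an isolated root forms no cluster
        else:
            cluster += 1
    return labels


# Toy term-frequency vectors (assumed data): two topics plus one outlier.
docs = [[3, 1, 0, 0], [2, 1, 0, 0], [4, 2, 0, 0],
        [0, 0, 2, 3], [0, 0, 1, 2], [0, 0, 3, 3],
        [1, 1, 1, 1]]
print(dbscan(docs, eps=0.1, min_pts=3))
print(density_tree(docs, root_radius=0.1))
```

On this toy data both calls print `[0, 0, 0, 1, 1, 1, -1]`. The design point being illustrated is the parameter count: the tree variant is given only `root_radius`, while DBSCAN needs both `eps` and `min_pts` and applies them uniformly to every cluster.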

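The results above are reported as F-measure values, but the record does not define the metric. A common definition for clustering, assumed here rather than taken from the paper, matches each true class to the cluster with the best harmonic mean of precision and recall and weights that score by class size. The function name `clustering_f_measure` and the choice to treat noise points (label `-1`) as belonging to no cluster are also assumptions.

```python
from collections import Counter

NOISE = -1  # assumed label for points left unclustered


def clustering_f_measure(true_classes, cluster_labels):
    """Class-size-weighted best-match F-measure (one common clustering variant)."""
    n = len(true_classes)
    class_sizes = Counter(true_classes)
    cluster_sizes = Counter(c for c in cluster_labels if c != NOISE)
    # Number of points shared by each (class, cluster) pair; noise is excluded,
    # so unclustered points lower recall for their class.
    overlap = Counter((t, c) for t, c in zip(true_classes, cluster_labels)
                      if c != NOISE)
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_i / n * best
    return total


# Hypothetical labels: one "db" document was left as noise.
truth = ["ai", "ai", "ai", "db", "db", "db", "db"]
found = [0, 0, 0, 1, 1, 1, -1]
print(clustering_f_measure(truth, found))
```

Here the "ai" class is recovered perfectly (F = 1), while the "db" class reaches precision 1 and recall 3/4 (F = 6/7), giving a weighted total of 45/49, about 0.918.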