Internet public-opinion monitoring involves large volumes of highly time-sensitive information and tends to concentrate on particular areas, such as social hot topics and focal issues, or reactionary and pornographic speech. To address these characteristics, this paper introduces the idea of density-based clustering into the traditional K-Means algorithm and proposes a new DK clustering algorithm. A Chinese text clustering model is built on the DK algorithm, with the research focused on proactive hotspot discovery in information published by Internet media. Experiments evaluate the performance of the Chinese clustering model and confirm its effectiveness and practicality.
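The abstract does not spell out how DK combines density with K-Means, so the following is only a minimal sketch of one common way to do it, not the authors' algorithm: choose the initial centers from the densest regions of the data, then run ordinary K-Means from those seeds. The function names `density_seeds` and `kmeans`, the `radius` parameter, and the 2-D toy points (standing in for document vectors such as TF-IDF) are all assumptions made for illustration.

```python
import math


def density_seeds(points, k, radius):
    """Pick up to k initial centers (hypothetical seeding rule).

    Repeatedly take the remaining point with the most neighbours within
    `radius`, then discard every point inside that neighbourhood so the
    next seed comes from a different dense region.
    """
    remaining = list(points)
    seeds = []
    while remaining and len(seeds) < k:
        best = max(
            remaining,
            key=lambda p: sum(math.dist(p, q) <= radius for q in remaining),
        )
        seeds.append(best)
        remaining = [q for q in remaining if math.dist(best, q) > radius]
    return seeds


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]


def kmeans(points, centers, iterations=20):
    """Standard K-Means refinement starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New center is the coordinate-wise mean of its members.
                new_centers.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Toy data: two well-separated groups standing in for two "hot topics".
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (11, 11)]
seeds = density_seeds(points, k=2, radius=2.0)
centers, labels = kmeans(points, seeds)
```

Seeding from dense regions makes the result deterministic and less sensitive to poor random starts than plain K-Means; on real text, the points would be high-dimensional document vectors and the distance would more likely be cosine-based, which this sketch does not cover.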