To address the high-dimensional sparsity of text representations in short-text clustering, this paper proposes a short-text feature extraction algorithm based on the stacked denoising autoencoder. Following the deep learning approach, the algorithm stacks multiple denoising autoencoder networks layer by layer to transform high-dimensional, sparse short-text vectors into a new low-dimensional space of essential features. Experimental results show that applying the extracted features to short-text clustering significantly improves clustering performance.
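The layer-by-layer stacking described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the architecture or training details, so the masking noise, sigmoid units, tied weights, squared-error loss, full-batch gradient descent, layer sizes, and the function names `train_dae`, `fit_stacked_dae`, and `encode` are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_dae(X, n_hidden, noise=0.3, lr=0.1, epochs=50, rng=None):
    """Train one denoising autoencoder layer (tied weights, assumed setup).

    Returns the encoder parameters (W, b).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b = np.zeros(n_hidden)  # encoder bias
    c = np.zeros(d)         # decoder bias
    for _ in range(epochs):
        # Corrupt the input by zeroing a random fraction of entries.
        mask = rng.random(X.shape) > noise
        X_tilde = X * mask
        H = sigmoid(X_tilde @ W + b)   # encode the corrupted input
        R = sigmoid(H @ W.T + c)       # reconstruct the clean input
        # Backpropagate the squared reconstruction error against clean X.
        dR = (R - X) * R * (1.0 - R)
        dH = (dR @ W) * H * (1.0 - H)
        grad_W = (X_tilde.T @ dH + dR.T @ H) / n
        W -= lr * grad_W
        b -= lr * dH.mean(axis=0)
        c -= lr * dR.mean(axis=0)
    return W, b


def fit_stacked_dae(X, layer_sizes, **kwargs):
    """Greedily train one layer at a time; each layer sees the previous
    layer's encoding of the uncorrupted data."""
    layers, H = [], X
    for size in layer_sizes:
        W, b = train_dae(H, size, **kwargs)
        layers.append((W, b))
        H = sigmoid(H @ W + b)
    return layers


def encode(X, layers):
    """Map high-dimensional input vectors to the final low-dimensional code."""
    H = X
    for W, b in layers:
        H = sigmoid(H @ W + b)
    return H


# Toy data standing in for sparse short-text vectors (hypothetical sizes).
X = (np.random.default_rng(1).random((40, 200)) < 0.05).astype(float)
layers = fit_stacked_dae(X, layer_sizes=[64, 16])
features = encode(X, layers)
print(features.shape)  # (40, 16)
```

The resulting `features` matrix would then be passed to a clustering algorithm in place of the original sparse vectors. The abstract does not name the clustering method, so any choice here (k-means, for instance) would be a further assumption.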