Transfer learning can effectively improve classifier performance when the existing training data are out of date and very little new data is available. In this paper, we propose a clustering-based transfer learning algorithm for text classification and describe its main idea and implementation steps. We then evaluate the algorithm on a Chinese text corpus and compare it with non-transfer learning algorithms. The experimental results show that the proposed algorithm outperforms these baselines and effectively improves classifier performance.