In recent years, with the rapid growth of the biomedical literature, searching and mining that literature for useful information has become an important research direction in bioinformatics. Clustering, an unsupervised and highly automated machine learning method, has been widely used in information retrieval and bioinformatics. Considering the characteristics of biomedical text, we propose a clustering algorithm based on distance metric learning, and experimental results demonstrate the effectiveness of the proposed method.