An improved label propagation algorithm is proposed and applied to the analysis of gene expression profile data. First, the gene expression data are represented by a probability transition matrix, a small number of samples are marked as labeled, and a label sequence is defined to indicate the class of each sample. Then, the label sequence is updated by an iterative formula until it converges, and the converged solution is proved to be unique. Finally, using positive and negative labels, the data are partitioned into classes according to the signs of the components of the label sequence. Experiments on cancer data sets show that the proposed method clusters gene expression data quickly and effectively.
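The three steps summarized above (transition matrix, iterative update of the label sequence, sign-based partition) can be illustrated with a minimal sketch. The abstract does not state the iterative formula, so the code below substitutes a commonly used damped update, f ← αPf + (1 − α)y, which has a unique fixed point whenever 0 < α < 1 and P is row-stochastic. The Gaussian affinity, the parameters `sigma` and `alpha`, the function names, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math


def transition_matrix(samples, sigma=1.0):
    """Row-normalised Gaussian affinities between samples (assumed kernel)."""
    n = len(samples)
    P = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(0.0)  # no self-transition
            else:
                d2 = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
                row.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        total = sum(row)
        P.append([w / total for w in row])  # each row sums to 1
    return P


def propagate(P, y, alpha=0.9, f0=None, tol=1e-10, max_iter=10000):
    """Iterate f <- alpha * P f + (1 - alpha) * y until the update is below tol.

    This damped update is an assumed stand-in for the paper's formula.
    Because alpha < 1 and P is row-stochastic, the map is a contraction,
    so the limit does not depend on the starting vector f0.
    """
    f = list(y) if f0 is None else list(f0)
    for _ in range(max_iter):
        new_f = [
            alpha * sum(p * v for p, v in zip(row, f)) + (1.0 - alpha) * yi
            for row, yi in zip(P, y)
        ]
        if max(abs(a - b) for a, b in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f


def cluster_by_sign(f):
    """Assign class +1 or -1 from the sign of each label component."""
    return [1 if v >= 0 else -1 for v in f]


# Hypothetical toy "expression profiles": two well-separated groups of samples.
samples = [
    [1.0, 1.2, 0.9], [1.1, 1.0, 1.0], [0.9, 1.1, 1.1],
    [5.0, 5.2, 4.9], [5.1, 4.8, 5.0], [4.9, 5.1, 5.2],
]
# One labeled sample per class (+1 and -1); 0 marks unlabeled samples.
y = [1.0, 0.0, 0.0, -1.0, 0.0, 0.0]

P = transition_matrix(samples)
f = propagate(P, y)
labels = cluster_by_sign(f)
print(labels)
```

In this sketch only two samples carry a label, and the remaining samples receive their class from the sign of the converged label component. Calling `propagate` with a different starting vector `f0` should give the same result up to the tolerance, which mirrors the uniqueness property claimed in the abstract, although for this substituted update rather than for the paper's own formula.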