Biclustering Nonlinearly Correlated Time Series Gene Expression Data

Journal: Journal of Computer Research and Development (计算机研究与发展), 2008, 45(11):1865-1873
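Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] School of Computer Science and Engineering, Southeast University, Nanjing 210096
Funding: National Natural Science Foundation of China (60572112)
Related project: Research on Medical Image Classification Based on Data Mining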

Abstract (translated from Chinese):

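To cluster nonlinearly correlated data objects, quadratic mutual information from generalized information theory is introduced as the similarity measure. Matrix theory is used to reduce the cost of computing it and, combined with a sliding-window technique, a nonlinear correlation model for time series data is established. On this basis, MI-TSB, a deterministic biclustering algorithm for time series gene expression data, is proposed. The algorithm converts time series data into abstract character sequences and inserts them into an MI-generalized suffix tree, which avoids exhaustively enumerating all combinations and therefore indexes all biclustering results quickly. Experimental results show that MI-TSB has good running performance and successfully clusters nonlinearly correlated objects; annotating the clustered genes with Gene Ontology also confirms the biological significance of the results.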
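The abstract does not give the paper's formula for quadratic mutual information (QMI) or its matrix simplification. As a rough illustration only, the sketch below assumes one common form, the Euclidean-distance QMI estimated with Gaussian Parzen windows, which can be written entirely in terms of two pairwise kernel matrices. The function names, the kernel width `sigma`, and the window width are illustrative assumptions, not values from the paper.

```python
import math
import random


def gaussian_gram(values, sigma):
    """Pairwise Gaussian kernel matrix (assumed Parzen-window estimator).

    The variance is 2 * sigma**2 because integrating the product of two
    Parzen kernels of width sigma yields a Gaussian of doubled variance.
    """
    var = 2.0 * sigma ** 2
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    return [[norm * math.exp(-((a - b) ** 2) / (2.0 * var)) for b in values]
            for a in values]


def quadratic_mutual_information(x, y, sigma=0.1):
    """Euclidean-distance QMI estimate: V_joint - 2 * V_cross + V_marginal.

    This is an assumed form of QMI, not necessarily the one used by MI-TSB.
    """
    n = len(x)
    kx, ky = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
    row_x = [sum(row) for row in kx]  # row sums of the x kernel matrix
    row_y = [sum(row) for row in ky]  # row sums of the y kernel matrix
    v_joint = sum(kx[i][j] * ky[i][j]
                  for i in range(n) for j in range(n)) / n ** 2
    v_cross = sum(rx * ry for rx, ry in zip(row_x, row_y)) / n ** 3
    v_marginal = sum(row_x) * sum(row_y) / n ** 4
    return v_joint - 2.0 * v_cross + v_marginal


def windowed_qmi(x, y, width, sigma=0.1):
    """QMI of two series on every contiguous window of `width` time points."""
    return [quadratic_mutual_information(x[s:s + width], y[s:s + width], sigma)
            for s in range(len(x) - width + 1)]


# Toy data: y depends on x nonlinearly, z is independent of x.
x = [-1.0 + 2.0 * i / 199 for i in range(200)]
y = [v * v for v in x]
rng = random.Random(0)
z = [rng.random() for _ in x]
print(quadratic_mutual_information(x, y), quadratic_mutual_information(x, z))
```

In this toy setup the Pearson correlation between `x` and `y` is essentially zero because `x` is symmetric about 0, so a linear measure would miss the dependence that the kernel-based estimate is designed to pick up. The only parts that need the full pairwise matrices are the element-wise product and the row sums, which is the kind of structure a matrix formulation can exploit.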

English abstract:

Biclustering algorithms cluster correlated patterns in subspaces. However, most existing biclustering algorithms address only linearly correlated patterns or particular linearly similar patterns, leaving untouched the nonlinearly correlated patterns that are often hidden in real data sets. In this paper, a novel biclustering algorithm called MI-TSB is proposed to find and report all nonlinearly correlated patterns in time series gene expression data. It first derives an efficient formula for computing quadratic mutual information using matrix theory. Then, based on quadratic mutual information and a sliding-window technique, a nonlinear similarity model for time series data and a simple variant of the generalized suffix tree are introduced. Using the suffix tree as an index structure, the MI-TSB algorithm enumerates all biclusters effectively and efficiently. Compared with general biclustering algorithms, the ability to discover nonlinearly correlated patterns within a sliding window is one of the most important advantages of the MI-TSB algorithm. Additionally, experiments on a real gene expression data set and a synthetic data set show that the MI-TSB algorithm successfully discovers some nonlinearly correlated patterns that cannot be found by other ordinary biclustering algorithms. Finally, gene annotation with Gene Ontology demonstrates that the MI-TSB algorithm can find biologically meaningful results.
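How symbol sequences plus a suffix index can enumerate biclusters without trying every row and column combination can be shown with a deliberately simplified sketch. Two things here are assumptions rather than the paper's method: the up/down/no-change discretization (the abstract does not say how MI-TSB derives its symbols from the quadratic mutual information model), and the plain dictionary of position-tagged substrings, which stands in for the suffix tree variant. All names and the tolerance `tol` are hypothetical.

```python
from collections import defaultdict


def to_symbols(series, tol=0.1):
    """Encode consecutive differences as 'U' (up), 'D' (down), 'N' (no change).

    Hypothetical discretization; MI-TSB's own symbol mapping is not given.
    """
    symbols = []
    for prev, cur in zip(series, series[1:]):
        diff = cur - prev
        symbols.append("U" if diff > tol else "D" if diff < -tol else "N")
    return "".join(symbols)


def index_substrings(sequences, min_len=2):
    """Group row ids by every (start position, contiguous substring) pair.

    A generalized suffix tree holds the same groupings compactly; this
    dictionary stand-in is quadratic in the sequence length.
    """
    index = defaultdict(set)
    for row_id, seq in sequences.items():
        for start in range(len(seq)):
            for end in range(start + min_len, len(seq) + 1):
                index[(start, seq[start:end])].add(row_id)
    return index


def shared_patterns(index, min_rows=2):
    """Keep only patterns shared by at least `min_rows` rows."""
    return {key: rows for key, rows in index.items() if len(rows) >= min_rows}


# Toy expression profiles for three hypothetical genes over five time points.
profiles = {
    "g1": [1.0, 2.0, 3.0, 2.0, 1.0],
    "g2": [0.5, 1.5, 2.5, 2.5, 3.0],
    "g3": [3.0, 2.0, 1.0, 0.0, -1.0],
}
sequences = {gene: to_symbols(values) for gene, values in profiles.items()}
for (start, pattern), genes in sorted(shared_patterns(index_substrings(sequences)).items()):
    print(start, pattern, sorted(genes))
```

Each key pairs a starting time point with a symbol pattern, so every reported group is a set of genes that behave alike over the same contiguous time interval. The sketch reports every shared pattern, including non-maximal ones; restricting the output to maximal biclusters is left out.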
