The combination of chromatin immunoprecipitation with sequencing (ChIP-Seq) is an effective method for obtaining an in vivo, genome-wide profile of protein-DNA interactions. With the dramatic development of high-throughput short-read sequencing technologies, several new algorithms have been developed to process ChIP-Seq data. However, the reported analytical tools for ChIP-Seq based on size selection of immunoprecipitated (IPed) DNA fragments have mainly been applied to the Solexa system. Although SOLiD is the sequencer with the highest throughput, few ChIP-Seq studies based on the SOLiD system have been reported. The main difference between the SOLiD and Solexa systems lies in the length of the DNA fragments used to prepare sequencing libraries. The SOLiD system yields relatively short DNA fragments when the IPed DNA fragments are further sonicated to meet the fragment-length requirement of emulsion PCR (ePCR). This work investigates how DNA fragment length influences the analysis of ChIP-Seq data. Previous studies have shown that typical bimodal peaks can be observed in Solexa ChIP-Seq data. In our analysis of real SOLiD ChIP-Seq data, however, we found no double peaks with an apparent read shift in locally enriched regions, and the local read distribution of each peak was tested against a normal distribution. Using real and simulated ChIP-Seq data, we evaluated three major ChIP-Seq algorithms (CisGenome, SISSRs, and MACS). We found that algorithms developed for ChIP-Seq data generated with the Solexa library protocol cannot efficiently capture the features of ChIP-Seq data from SOLiD libraries. Misuse of these analytical tools may be one reason for failed ChIP-Seq experiments on the SOLiD system. Therefore, new ChIP-Seq analysis strategies need to be developed for IPed DNA fragments that undergo extra sonication.
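The link between fragment length and the bimodal (strand-shifted) peak shape can be illustrated with a toy simulation. This is a hypothetical sketch, not the authors' method and not the internals of CisGenome, SISSRs, or MACS. The helper names `simulate_site`, `strand_shift`, and `looks_bimodal` are illustrative. So are the 200 bp and 60 bp fragment lengths and the 50 bp positional jitter. Reads are placed at the 5' ends of fragments on both strands. The sketch then uses a standard fact: an equal-weight mixture of two Gaussians with a common standard deviation sigma is bimodal only when their means are more than 2 * sigma apart.

```python
import random
from statistics import median, pstdev


def simulate_site(center, frag_len, n_reads, jitter=50.0, seed=0):
    """Hypothetical toy model (not from the paper) of reads around one binding site.

    Each fragment midpoint is drawn from N(center, jitter), and reads alternate
    between strands. A plus-strand read starts at the fragment's left end, and a
    minus-strand read starts (reading leftwards) at the fragment's right end.
    """
    rng = random.Random(seed)
    plus, minus = [], []
    for i in range(n_reads):
        mid = rng.gauss(center, jitter)        # fragment midpoint scatters around the site
        start = round(mid - frag_len / 2)      # left end of the fragment
        end = start + frag_len - 1             # right end of the fragment
        if i % 2 == 0:
            plus.append(start)                 # 5' end of a plus-strand read
        else:
            minus.append(end)                  # 5' end of a minus-strand read
    return plus, minus


def strand_shift(plus, minus):
    """Median-based offset between the minus- and plus-strand read clusters."""
    return median(minus) - median(plus)


def looks_bimodal(plus, minus):
    """Apply the equal-weight, equal-variance Gaussian mixture rule:
    the pooled positions are bimodal only if the shift exceeds 2 * sigma."""
    sigma = (pstdev(plus) + pstdev(minus)) / 2  # average within-strand spread
    return strand_shift(plus, minus) > 2 * sigma


if __name__ == "__main__":
    for label, frag_len in [("long fragments (Solexa-like)", 200),
                            ("short fragments (extra sonication)", 60)]:
        plus, minus = simulate_site(center=10_000, frag_len=frag_len, n_reads=1000)
        print(f"{label}: shift={strand_shift(plus, minus):.0f} bp, "
              f"bimodal={looks_bimodal(plus, minus)}")
```

With these assumed parameters, the long-fragment site should show a strand shift near 200 bp and two separate read clusters. The short-fragment site should show a shift near 60 bp, small relative to the positional spread, so the two strands merge into a single peak. Under these assumptions, this mirrors the single-peak pattern the abstract reports for SOLiD data.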