Microarray datasets have few rows (samples) and many columns (genes), which makes mining frequent closed patterns difficult for traditional algorithms based on the column enumeration space. Working instead in the row enumeration space, this paper proposes a hyperlink structure, HT-struct, and a new frequent closed pattern mining algorithm built on it, HTCLOSE. HTCLOSE searches the row enumeration space depth-first and combines efficient pruning with a carefully organized linked-list layout, reducing both running time and memory use. Experiments on real-life microarray datasets show that HTCLOSE is usually faster than CARPENTER, an algorithm based on the row enumeration space.
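
To make the idea of row enumeration concrete, the following is a minimal Python sketch of a depth-first search over row sets. It is not HTCLOSE: it does not use HT-struct or any of the paper's pruning techniques, and the function name `closed_patterns_by_rows`, the gene labels, and the toy data are hypothetical, chosen only for illustration. It relies on one standard fact: the intersection of the item sets of any group of rows is a closed pattern, so enumerating row combinations visits every closed pattern.

```python
from typing import Dict, FrozenSet, List, Optional, Set


def closed_patterns_by_rows(
    rows: List[Set[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Map each frequent closed itemset to its support (naive row enumeration)."""
    found: Dict[FrozenSet[str], int] = {}

    def dfs(start: int, common: Optional[FrozenSet[str]]) -> None:
        # Extend the current row set with each later row, depth-first.
        for r in range(start, len(rows)):
            if common is None:
                new_common = frozenset(rows[r])
            else:
                new_common = common & rows[r]
            if not new_common:
                # An empty intersection stays empty when more rows are added.
                continue
            # Support = number of rows that contain the whole itemset.
            support = sum(1 for row in rows if new_common <= row)
            if support >= min_support:
                found[new_common] = support
            dfs(r + 1, new_common)

    dfs(0, None)
    return found


# Toy dataset: 3 samples (rows), each a set of expressed genes (items).
samples = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g2", "g3"}]
patterns = closed_patterns_by_rows(samples, min_support=2)
for itemset, support in sorted(patterns.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

The search is exponential in the number of rows rather than the number of columns, which is why this direction suits datasets with few samples and many genes. The sketch revisits the same itemset through different row combinations; avoiding that redundant work is the kind of cost that pruning and a purpose-built data structure are meant to address.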