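Discovering gene expression rules associated with specific diseases from gene expression data is of great significance for aided disease diagnosis. To address the shortcomings of existing interestingness measures for association rules, a max-margin-based strategy for screening gene expression rules is proposed. The strategy jointly considers the distance from a gene expression rule to samples of the same class and to samples of other classes, which gives it strong rule-screening ability. An association rule mining algorithm that combines the max-margin criterion with an incremental association rule mining algorithm efficiently discovers the Top-K max-margin gene expression rules. Experimental results on real gene expression datasets verify the effectiveness of the max-margin screening strategy and the efficiency of the mining algorithm.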
The discovery of disease-related gene expression rules is of great importance to computer-aided diagnosis. A max-margin-based interestingness measure is proposed, which improves the generalization ability of association rules by taking into account both the distance to samples of the same class and the distance to samples of other classes. A Top-K max-margin association rule mining procedure is also devised to efficiently discover interesting rules in high-dimensional gene expression data. Experimental results show the effectiveness of the interestingness measure and the efficiency of the mining procedure.
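
To make the idea of a margin-based interestingness measure concrete, the following is a minimal sketch and not the procedure of the paper. It rests on assumptions that the abstract does not state: samples are sets of discretized expression items (hypothetical names such as `g1_up`), the distance between a rule and a sample is taken to be the fraction of antecedent items the sample lacks, and the margin is taken to be the mean distance to other-class samples minus the mean distance to same-class samples. The function names `rule_distance`, `margin`, and `top_k_rules` are illustrative. Candidate rules are scored exhaustively here; the incremental mining strategy mentioned above is not reproduced.

```python
import heapq
from typing import FrozenSet, List, Sequence, Set, Tuple

# A rule is (antecedent items, predicted class). Hypothetical representation.
Rule = Tuple[FrozenSet[str], str]


def rule_distance(antecedent: FrozenSet[str], sample: Set[str]) -> float:
    """Assumed distance: fraction of antecedent items missing from the sample."""
    if not antecedent:
        return 0.0
    return len(antecedent - sample) / len(antecedent)


def margin(rule: Rule, samples: Sequence[Set[str]], labels: Sequence[str]) -> float:
    """Assumed margin: mean other-class distance minus mean same-class distance."""
    antecedent, target = rule
    same = [rule_distance(antecedent, s) for s, y in zip(samples, labels) if y == target]
    other = [rule_distance(antecedent, s) for s, y in zip(samples, labels) if y != target]
    if not same or not other:
        return float("-inf")  # margin is undefined without both groups
    return sum(other) / len(other) - sum(same) / len(same)


def top_k_rules(
    rules: Sequence[Rule],
    samples: Sequence[Set[str]],
    labels: Sequence[str],
    k: int,
) -> List[Tuple[float, Rule]]:
    """Score every candidate rule and keep the k with the largest margin."""
    scored = [(margin(r, samples, labels), r) for r in rules]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Toy data, invented purely for illustration.
samples = [
    {"g1_up", "g2_down"},
    {"g1_up", "g3_up"},
    {"g2_down", "g3_up"},
    {"g3_up"},
]
labels = ["tumor", "tumor", "normal", "normal"]
rules = [
    (frozenset({"g1_up"}), "tumor"),
    (frozenset({"g3_up"}), "normal"),
    (frozenset({"g2_down"}), "tumor"),
]

best = top_k_rules(rules, samples, labels, k=2)
for score, (antecedent, target) in best:
    print(sorted(antecedent), "->", target, "margin =", score)
```

On this toy data the rule `g1_up -> tumor` matches every tumor sample and no normal sample, so under the assumed definitions it receives the largest margin. A rule that matches both classes equally often, such as `g2_down -> tumor`, receives a margin of zero and is dropped from the top two.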