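Frequent itemset discovery has long been the most critical problem in association rule research. This paper presents a new algorithm for discovering frequent itemsets. Its distinguishing feature is a one-time preprocessing step based on a directed graph: during preprocessing, the database is stored in advance as an itemset adjacency lattice in which every vertex has a field recording its support. This turns the complex problem of discovering frequent itemsets into a simple graph search problem, which greatly improves the efficiency of the discovery process. To compute itemset supports efficiently during preprocessing, a vertical database representation is adopted. Finally, experimental results for the algorithm are given.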
It is well known that finding frequent itemsets in large databases is the bottleneck in association rule mining. This paper proposes a new algorithm for mining frequent itemsets. Drawing on graph theory, the algorithm converts the original transaction database into an itemset adjacency lattice during preprocessing, where each itemset vertex carries a label that stores its support. This reduces the complicated task of mining frequent itemsets in the database to the simpler one of searching for vertices in the lattice, which greatly speeds up the mining process. Furthermore, to compute the support of each itemset, the algorithm uses a vertical tid-list database format, in which each itemset is associated with the list of transactions in which it occurs. Finally, we implement the algorithm and analyze the experimental results.
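The idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`build_tidlists`, `build_lattice`, `frequent_itemsets`), the `base_support` preprocessing threshold, and the level-wise construction order are all assumptions made for illustration. The sketch stores each itemset as a vertex labeled with its support, computes supports by intersecting tid-lists, and then answers a frequent-itemset query by a plain graph search.

```python
def build_tidlists(transactions):
    """Vertical format: map each item to the set of ids of transactions containing it."""
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def build_lattice(transactions, base_support=1):
    """Preprocessing (hypothetical): build a directed graph of itemsets.

    Returns (support, edges): support[v] is the label of vertex v, and
    edges[v] holds the vertices obtained by adding one item to v.
    Only itemsets with support >= base_support are stored.
    """
    item_tids = {i: t for i, t in build_tidlists(transactions).items()
                 if len(t) >= base_support}
    root = frozenset()
    support = {root: len(transactions)}
    edges = {root: set()}
    frontier = {root: set(range(len(transactions)))}
    while frontier:
        next_frontier = {}
        for itemset, tids in frontier.items():
            for item, tids_of_item in item_tids.items():
                if item in itemset:
                    continue
                extended = itemset | {item}
                if extended not in support:
                    # Support of the extension = size of the tid-list intersection.
                    common = tids & tids_of_item
                    if len(common) < base_support:
                        continue
                    support[extended] = len(common)
                    edges[extended] = set()
                    next_frontier[extended] = common
                edges[itemset].add(extended)
        frontier = next_frontier
    return support, edges


def frequent_itemsets(support, edges, min_support):
    """Query: depth-first search from the empty itemset, pruning at any
    vertex whose stored support is below min_support."""
    root = frozenset()
    found, seen, stack = {}, {root}, [root]
    while stack:
        vertex = stack.pop()
        for child in edges[vertex]:
            if child not in seen and support[child] >= min_support:
                seen.add(child)
                found[child] = support[child]
                stack.append(child)
    return found


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
support, edges = build_lattice(transactions)
result = frequent_itemsets(support, edges, min_support=3)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

Because the lattice is built once, queries with different `min_support` values (at or above `base_support`) reuse the stored labels and never rescan the transactions. Pruning during the search is safe because the support of an itemset can never exceed that of any of its subsets.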