Dongli Research Big Data Discovery System (DRDS)


Improved Pattern Tree for Incremental Frequent-Pattern Mining

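ISSN: 1006-4982
Journal: Journal of Tianjin University (English Edition) (《天津大学学报：英文版》)
Date: not given
Classification: TP311.13 [Automation and Computer Technology / Computer Software and Theory; Automation and Computer Technology / Computer Science and Technology]; TP311.12 [Automation and Computer Technology / Computer Software and Theory; Automation and Computer Technology / Computer Science and Technology]
Affiliation: [1] School of Mechanical Engineering, Tianjin University, Tianjin 300072, China
Funding: Supported by the National Natural Science Foundation of China (No. 50975193) and the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20060056016).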

Keywords: incremental updating, mining pattern, data structure, power port, instant messaging, execution time, database, dataset, data mining, association rules, improved pattern tree, incremental mining

Abstract:

By analyzing existing prefix-tree data structures, an improved pattern tree was introduced for processing new transactions. It first stored transactions in a lexicographic-order tree and then restructured the tree by sorting each path in frequency-descending order. When the improved pattern tree was updated, there was no need to rescan the entire updated database or to construct a new tree for incremental updating. Tests were performed on the synthetic dataset T10I4D100K, which contains 100 000 transactions and 870 items. Experimental results show that the smaller the minimum support threshold, the greater the speed advantage of the improved pattern tree over CanTree on all datasets. As the minimum support threshold increased from 2% to 3.5%, the runtime of the improved pattern tree decreased from 452.71 s to 186.26 s, while the runtime required by CanTree decreased from 1 367.03 s to 432.19 s. When the database was updated, the execution time of the improved pattern tree consisted of the construction of the original improved pattern tree and the reconstruction of the initial tree; the experimental results showed that this runtime was about 15% lower than that of CanTree. As the number of transactions increased, the runtime of the improved pattern tree was about 25% shorter than that of FP-tree. The improved pattern tree also required less memory than CanTree.
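The two-phase construction described in the abstract can be illustrated with a small sketch. This is a minimal, hypothetical Python example, not the authors' implementation: the names `PatternTree`, `add_transaction`, `paths` and `restructured` are assumptions, and the restructuring step is simplified to re-inserting each stored path, re-sorted by descending global item support (ties broken lexicographically), into a fresh tree instead of sorting paths in place. It shows why the first phase suits incremental updates: lexicographic order does not depend on item frequencies, so a new transaction can be inserted directly without rescanning earlier ones.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One prefix-tree node: an item label, a count, and children keyed by item."""
    item: Optional[str]
    count: int = 0
    children: dict = field(default_factory=dict)


class PatternTree:
    """Hypothetical two-phase prefix tree (illustrative sketch only)."""

    def __init__(self):
        self.root = Node(None)
        self.item_counts = Counter()  # global item supports, updated on every insert

    def _insert(self, items, count=1):
        # Walk down the tree, creating nodes as needed and adding `count` to each.
        node = self.root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += count
            node = child

    def add_transaction(self, transaction):
        # Phase 1: canonical lexicographic order, independent of item frequencies,
        # so new transactions are added without rescanning earlier ones.
        items = sorted(set(transaction))  # drop duplicate items within a transaction
        self.item_counts.update(items)
        self._insert(items)

    def paths(self):
        """Yield (items, count) pairs that reproduce the stored transactions."""
        def walk(node, prefix):
            # Transactions ending at this node = its count minus its children's counts.
            ending_here = node.count - sum(c.count for c in node.children.values())
            if node.item is not None and ending_here > 0:
                yield prefix, ending_here
            for child in node.children.values():
                yield from walk(child, prefix + [child.item])
        yield from walk(self.root, [])

    def restructured(self):
        # Phase 2 (simplified): sort every stored path in frequency-descending
        # order and rebuild, giving an FP-tree-like layout for mining.
        def key(item):
            return (-self.item_counts[item], item)
        new_tree = PatternTree()
        new_tree.item_counts = self.item_counts.copy()
        for items, count in self.paths():
            new_tree._insert(sorted(items, key=key), count)
        return new_tree


tree = PatternTree()
for t in [["b", "a"], ["a", "c"], ["c", "b", "a"], ["c"]]:
    tree.add_transaction(t)
fp_like = tree.restructured()
print(sorted(fp_like.root.children))  # → ['a', 'c']
```

In this example the lexicographic tree stores {a, b}, {a, c}, {a, b, c} and {c}. After restructuring, the path for {a, b, c} becomes a → c → b, because a and c (support 3) outrank b (support 2), so it now shares the a → c prefix with {a, c}. The lexicographic tree itself is left unchanged, so further transactions can still be appended to it before the next restructuring.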
