To mine valuable sequence data from large transaction databases, this paper proposes an efficient bitmap-based sequential pattern mining algorithm, SMBR. SMBR represents the database as bitmaps and introduces a simplified bitmap structure. The algorithm first generates candidate sequences by sequence extension (SE) and item extension (IE), then obtains the frequent sequences through fast positional operations on the bitmap of the original sequence and the bitmap of the extending item. Experiments show that, on large transaction databases, the method not only improves mining efficiency but also greatly reduces the memory needed for temporary data produced during mining, so that sequential patterns can be mined efficiently.
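The two extension steps can be illustrated with a small Python sketch. The abstract does not specify SMBR's simplified bitmap structure, so this sketch assumes a generic vertical bitmap layout of the kind used in bitmap-based sequence miners: one integer per item per data sequence, with bit t set when the item occurs in transaction t. The function names (`build_item_bitmaps`, `sequence_extend`, `item_extend`, `support`) and the toy database are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set

Sequence = List[Set[str]]  # one customer sequence = ordered list of itemsets


def build_item_bitmaps(db: List[Sequence]) -> Dict[str, List[int]]:
    """Map each item to one int per sequence; bit t is set if the item
    occurs in transaction t of that sequence (assumed layout)."""
    items = sorted({item for seq in db for itemset in seq for item in itemset})
    bitmaps = {item: [0] * len(db) for item in items}
    for sid, seq in enumerate(db):
        for t, itemset in enumerate(seq):
            for item in itemset:
                bitmaps[item][sid] |= 1 << t
    return bitmaps


def sequence_extend(pattern_bm: List[int], item_bm: List[int],
                    lengths: List[int]) -> List[int]:
    """SE: append the item as a new, later itemset.
    Keep item occurrences strictly after the pattern's earliest end position."""
    out = []
    for p, i, n in zip(pattern_bm, item_bm, lengths):
        if p == 0:
            out.append(0)
            continue
        lowest = p & -p  # earliest transaction where the pattern ends
        after = ((1 << n) - 1) & ~((lowest << 1) - 1)  # all later positions
        out.append(after & i)
    return out


def item_extend(pattern_bm: List[int], item_bm: List[int]) -> List[int]:
    """IE: add the item to the pattern's last itemset (same transaction)."""
    return [p & i for p, i in zip(pattern_bm, item_bm)]


def support(bm: List[int]) -> int:
    """Number of data sequences that contain the pattern."""
    return sum(1 for b in bm if b)


# Toy database of three sequences (hypothetical data).
db = [
    [{"a", "b", "d"}, {"b", "c", "d"}, {"b", "c", "d"}],
    [{"b"}, {"a", "b", "c"}],
    [{"a", "b"}, {"b", "c", "d"}],
]
lengths = [len(seq) for seq in db]
bm = build_item_bitmaps(db)

a_then_b = sequence_extend(bm["a"], bm["b"], lengths)   # pattern <(a)(b)>
a_with_b = item_extend(bm["a"], bm["b"])                # pattern <(a b)>
a_then_bc = item_extend(a_then_b, bm["c"])              # pattern <(a)(b c)>

print(support(a_then_b), support(a_with_b), support(a_then_bc))
```

In this layout, each candidate's support comes from a handful of bitwise operations per data sequence, and only the bitmaps of the current pattern and the extending item need to be held while a candidate is evaluated. A full miner would apply a minimum-support threshold and recurse over frequent candidates, which is omitted here.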