To improve the timeliness and accuracy of network user behavior analysis on large data volumes, an improved Apriori algorithm is proposed that introduces a compressed matrix and transaction weights, addressing the repeated database scans and oversized candidate sets of the original Apriori algorithm. The improved algorithm is then applied on Spark, a cloud computing platform. Experimental results show that the improved algorithm achieves better performance and higher efficiency, giving it an advantage in network user behavior analysis.
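The abstract does not spell out how the compressed matrix and the transaction weights are built, so the following is only a minimal single-machine sketch of one common way to combine the two ideas, not the authors' implementation and not a Spark job. It assumes that identical transactions are merged into one row whose weight is its occurrence count, that each row (stored here as a `frozenset`, equivalent to one row of a Boolean transaction matrix) is read from the database only once, and that "compression" means dropping items that are no longer in any frequent itemset and rows too short to contain a k-itemset. The function name `weighted_matrix_apriori` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def weighted_matrix_apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Assumption: min_support is an absolute count, not a fraction.
    """
    # Single pass over the data: identical transactions collapse into one
    # row, and the row's weight records how many times it occurred.
    rows = list(Counter(frozenset(t) for t in transactions).items())

    # Frequent 1-itemsets, counted with the row weights.
    item_counts = Counter()
    for items, weight in rows:
        for item in items:
            item_counts[item] += weight
    frequent = {frozenset([i]): c
                for i, c in item_counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Column compression: keep only items that still appear in some
        # frequent (k-1)-itemset.
        alive = set().union(*frequent)
        # Row compression: drop rows with fewer than k surviving items and
        # re-merge rows that became identical after the column compression.
        merged = Counter()
        for items, weight in rows:
            kept = items & alive
            if len(kept) >= k:
                merged[kept] += weight
        rows = list(merged.items())

        # Candidate generation: join (k-1)-itemsets, then prune candidates
        # that have an infrequent (k-1)-subset (the Apriori property).
        prev = set(frequent)
        candidates = set()
        for a, b in combinations(prev, 2):
            cand = a | b
            if len(cand) == k and all(
                frozenset(sub) in prev for sub in combinations(cand, k - 1)
            ):
                candidates.add(cand)

        # Support counting runs on the compressed rows, not the database.
        counts = Counter()
        for items, weight in rows:
            for cand in candidates:
                if cand <= items:
                    counts[cand] += weight
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical sample data: six transactions, four distinct rows.
transactions = [
    ["a", "b", "c"], ["a", "b"], ["a", "b"],
    ["a", "c"], ["b", "c"], ["a", "b", "c"],
]
result = weighted_matrix_apriori(transactions, min_support=3)
```

In this sketch the original transactions are read once; every later iteration works on the shrinking set of weighted rows. A Spark version would have to distribute the row merging and the support counting, which is outside the scope of this example.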