Collaborative filtering is an effective technique for personalized recommendation and has attracted wide attention. Existing research focuses mainly on the efficiency of the algorithm itself, but in real-world applications another factor that strongly affects performance is the speed at which log data are stored and read. To address this problem, this paper proposes a multi-level distributed storage and access scheme for log data. The scheme combines an in-memory organization based on dual hash tables with distributed persistent storage to relieve the log I/O bottleneck. Experimental results show that the proposed method performs significantly better than both reading data directly from disk and reading data serially.
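The abstract does not say what the two hash tables index or how records are spread across nodes. The sketch below assumes one plausible reading and is not the paper's implementation. The level-1 tables are a user → {item: rating} map and an item → {user: rating} map, which serve user-based and item-based lookups respectively. Level 2 is a set of hash-partitioned shards, each standing in for one node's persistent log storage. Shards are read in parallel rather than serially. The class `MultiLevelLogStore`, its methods, and the shard count are hypothetical names chosen for illustration.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class MultiLevelLogStore:
    """Hypothetical two-level log store (an assumed reading of the scheme):
    level 1 = in-memory dual hash tables, level 2 = hash-partitioned shards."""

    def __init__(self, num_shards=4):
        # Level 2 (assumption): each dict stands in for one node's persistent log storage.
        self.shards = [defaultdict(list) for _ in range(num_shards)]
        # Level 1: dual hash tables, user -> {item: rating} and item -> {user: rating}.
        self.user_index = {}
        self.item_index = {}

    def _shard_of(self, user):
        # Stable hash (unlike built-in hash() on str), so a user always maps to the same shard.
        return zlib.crc32(str(user).encode()) % len(self.shards)

    def append(self, user, item, rating):
        """Persist one log record (user, item, rating) on the shard that owns the user."""
        self.shards[self._shard_of(user)][user].append((item, rating))

    def _read_shard(self, shard_id, users):
        # Simulates reading the requested users' records from one node.
        shard = self.shards[shard_id]
        return [(u, item, r) for u in users for item, r in shard.get(u, [])]

    def load(self, users):
        """Read the users' records from all shards in parallel and index them in memory."""
        by_shard = defaultdict(list)
        for u in users:
            by_shard[self._shard_of(u)].append(u)
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            futures = [pool.submit(self._read_shard, s, us) for s, us in by_shard.items()]
            for fut in futures:
                for u, item, r in fut.result():
                    # Populate both tables so either lookup direction is a single hash probe.
                    self.user_index.setdefault(u, {})[item] = r
                    self.item_index.setdefault(item, {})[u] = r

    def items_of(self, user):
        return self.user_index.get(user, {})

    def users_of(self, item):
        return self.item_index.get(item, {})


store = MultiLevelLogStore(num_shards=3)
for user, item, rating in [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4), ("u3", "i3", 2)]:
    store.append(user, item, rating)
store.load(["u1", "u2", "u3"])
print(store.items_of("u1"))  # {'i1': 5, 'i2': 3}
print(store.users_of("i1"))  # {'u1': 5, 'u2': 4}
```

Threads are used here because reads from real nodes or disks are I/O-bound. In this in-memory simulation they only illustrate the parallel-versus-serial access pattern and do not demonstrate a speedup.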