In a database management system for a distributed information network, how well the data is partitioned affects both the load balance of the system and the communication overhead between nodes. To address this problem, this paper proposes a query-based dynamic data partitioning algorithm. The algorithm mines the latent relevance between data items from historical query information and dynamically moves strongly related data onto the same processing node, so that each query can be completed on fewer nodes and unnecessary communication is avoided. Experimental results show that, while keeping the system load balanced, the algorithm reduces communication overhead, speeds up queries, and improves the overall performance of the distributed environment.
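The idea summarized above can be illustrated with a minimal sketch. It is not the algorithm evaluated in the paper, whose details are not given in this abstract. It assumes that relevance is measured by how often two items appear in the same historical query, that strongly related items are merged greedily into groups, and that groups are placed on the least-loaded node. The names `co_access_weights`, `partition`, `nodes_touched`, and the parameter `max_group_size` are hypothetical and introduced only for illustration.

```python
from collections import Counter
from itertools import combinations


def co_access_weights(query_log):
    """Count how often each pair of items occurs in the same query (assumed relevance measure)."""
    weights = Counter()
    for items in query_log:
        for pair in combinations(sorted(set(items)), 2):
            weights[pair] += 1
    return weights


def partition(query_log, num_nodes, max_group_size):
    """Return a dict mapping each item to a node index.

    Illustrative heuristic only: merge the most frequently co-accessed
    items into groups of at most `max_group_size` items, then place the
    groups, largest first, on the currently least-loaded node.
    Items are assumed to be mutually comparable (for example, strings).
    """
    weights = co_access_weights(query_log)
    items = sorted({x for q in query_log for x in q})
    group_of = {x: {x} for x in items}

    # Visit pairs from strongest to weakest relevance; ties broken by name.
    for (a, b), _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        ga, gb = group_of[a], group_of[b]
        if ga is not gb and len(ga) + len(gb) <= max_group_size:
            merged = ga | gb
            for x in merged:
                group_of[x] = merged

    # Collect the distinct groups (several items share one set object).
    groups = {id(g): g for g in group_of.values()}.values()

    loads = [0] * num_nodes
    placement = {}
    for g in sorted(groups, key=lambda g: (-len(g), min(g))):
        node = min(range(num_nodes), key=loads.__getitem__)
        for x in g:
            placement[x] = node
        loads[node] += len(g)
    return placement


def nodes_touched(query, placement):
    """Number of distinct nodes a query must contact under a placement."""
    return len({placement[x] for x in query})
```

Re-running `partition` on a recent window of the query log and migrating only the items whose node changed would be one way to approximate the dynamic adjustment; this is likewise an assumption, not a description of the paper's procedure. The group-size cap is what keeps the placement from collapsing onto a single node, at the cost of splitting some related items.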