To meet the demands of efficient big data processing and cost saving in cloud computing, and in view of the multi-dimensional attributes of computable node resources in cloud networks, a node selection strategy based on a multi-attribute Entropy-KNN hierarchical clustering algorithm is proposed. Building on a dynamic description mechanism for node resources, node resource information is exchanged and updated in real time by means of P2P dynamic routing. The attribute information entropy and the similarity distance are computed, and the known nodes are first clustered repeatedly by attribute to obtain subclasses that satisfy a given threshold. Next, the K nearest-neighbor nodes with the smallest distances are chosen as candidate nodes. Finally, after classification and credibility verification, the candidates with the smallest similarity distance to the demand node are selected as the final nodes. Experimental results show that the strategy improves, to some extent, the accuracy of locating computing nodes and further improves the efficiency of big data processing.
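The abstract does not state the formulas behind the entropy and distance steps, so the following is only a minimal sketch under stated assumptions. It assumes the standard entropy-weight method for attribute weights, a weighted Euclidean distance as the similarity distance, and attribute values that are non-negative and already normalized. The function names `entropy_weights`, `weighted_distance`, and `k_nearest` are hypothetical, and the hierarchical clustering and credibility verification steps are omitted.

```python
import math


def entropy_weights(nodes):
    """Per-attribute weights from Shannon entropy (assumed entropy-weight method).

    `nodes` is a list of equal-length attribute vectors with non-negative
    values; at least two nodes are required so that log(n) is non-zero.
    """
    n = len(nodes)
    dims = len(nodes[0])
    raw = []
    for j in range(dims):
        col = [row[j] for row in nodes]
        total = sum(col)
        # Share of each node's value within this attribute column.
        probs = [v / total for v in col] if total > 0 else [1.0 / n] * n
        # Normalized entropy in [0, 1]; terms with p == 0 contribute 0.
        h = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # Low entropy means the attribute discriminates well between nodes.
        raw.append(max(0.0, 1.0 - h))
    s = sum(raw)
    # Fall back to uniform weights if no attribute discriminates at all.
    return [w / s for w in raw] if s > 0 else [1.0 / dims] * dims


def weighted_distance(a, b, weights):
    """Entropy-weighted Euclidean distance between two attribute vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def k_nearest(nodes, demand, k, weights):
    """Indices of the k nodes closest to the demand vector, nearest first."""
    ranked = sorted(
        range(len(nodes)),
        key=lambda i: weighted_distance(nodes[i], demand, weights),
    )
    return ranked[:k]
```

In this sketch an attribute whose values are identical across all nodes receives zero weight, so it does not influence which neighbors are chosen. Whether the paper weights attributes in this way is not confirmed by the abstract.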