在现实中的许多领域产生大量不确定的图结构的数据,例如分子化合物、蛋白质交互网络等.同时现实中有很多应用例如推荐系统中的推荐过滤、欺诈检测和社会网络的链接预测等,需要查询给定节点的k个最相似节点,针对这一问题,提出了用基于SimRank度量的方法来求解.由于图的动态演变和不确定性导致用现有的SireRank计算方法求k个最近邻的代价昂贵,因此提出一个有效算法,在保证一定准确性的前提下,通过引入路径闽值,算法只需考虑查询点的邻居区域无需考虑整个图从而达到明显的剪枝效果,该方法在确定图和不确定图上都可以适用.在此基础上为了进一步提高效率,算法在不确定图上引入采样技术.最后从理论、实验说明验证了算法的高效性和有效性.
In many areas, a lot of data have been modeled by graphs which are subject to uncertainties, such as molecular compounds and protein interaction networks. While many real applications, for example, collaborative filtering, fraud detection, and link prediction in social networks etc, rely on efficiently answering k nearest neighbor queries (kNN), which is the problem of computing the most "similar" k nodes to a given query node. To solve the problem, in this paper a novel method based on measurement of SimRank is proposed. However, because graphs evolve over time and are uncertainly, the computing cost can be very high in practice to solve the problem using the existing algorithms of SimRank. So the paper presents an optimization algorithm. Introducing path threshold, which is suitable in both determined graph and uncertain graph,the algorithm merely considers the loeal neighborhood of a given query node instead of whole graph to prune the search space. To further improving efficiency, the algorithrm adopts sample technology in uncertain graph. At the same time, theory and experiments interpret and verify that the optimization algorithm is efficient and effective.