A Fault-Tolerant Generalized-Model Deadlock Detection and Resolution Algorithm for Distributed Systems

ISSN: 1000-1239
Journal: 《计算机研究与发展》 (Journal of Computer Research and Development)
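Classification: TP316.4 [Automation and Computer Technology - Computer Software and Theory; Automation and Computer Technology - Computer Science and Technology]
Author affiliations: [1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; [2] Shanghai Fire Research Institute, Ministry of Public Security, Shanghai 200032
Funding: National Natural Science Foundation of China, Young Scientists Fund (60503015).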

Keywords: deadlock detection, fault tolerance, generalized model, distributed system, AND-OR model

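Chinese abstract:

Distributed systems offer an effective way to build high-performance systems at low cost. However, resource allocation and resource demand can conflict, causing deadlock and bringing the system to a halt. In an unreliable distributed system, failures interfere with normal deadlock detection, yet existing deadlock detection algorithms are not fault tolerant. This paper classifies the forms of failure and proposes a fault-tolerant deadlock detection and resolution algorithm. The algorithm is built on the generalized AND-OR model and uses diffusion computation with centralized reduction; it not only detects deadlocks but also reports every member of the deadlock cycle. When the deadlock topology is static and forms a cycle, the message complexity is bounded above by e + n - 1 and the time complexity is d, where e is the number of edges in the wait-for graph and n and d are the number of nodes forming the deadlock cycle. Analysis shows that the algorithm performs as well as or better than comparable algorithms.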

English abstract:

Distributed systems provide an effective way to build high-performance computing systems at relatively low cost. During distributed computation, deadlock can occur when resource allocation and resource requests conflict: all the processes involved wait for one another to release the resources they hold, and the system stops making progress. In an unreliable distributed system, failures may prevent deadlock detection algorithms from detecting deadlocks properly, and few of the algorithms proposed in the literature address how to handle these failures. In this paper, three types of failures are identified and a fault-tolerant deadlock detection and resolution algorithm is proposed. The algorithm treats failures as distinct detection termination conditions. It is based on a generalized AND-OR model and employs diffusion computation and centralized reduction to detect deadlocks. It distinguishes cycles from knots and reports all members of a cycle. If the deadlock topology is a static cycle, the upper bounds on time and message complexity are d and e + n - 1 respectively, where e is the number of edges and n and d are the number of nodes in the wait-for graph. The performance of the proposed algorithm is equal to or better than that of current algorithms.
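The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of reduction on an AND-OR wait-for graph, not the authors' method. It runs on a single machine and leaves out the diffusion computation, failure handling, and resolution steps. The graph representation and the name `deadlocked_nodes` are assumptions made for illustration: an AND node can proceed only when every node it waits on can proceed, an OR node when at least one can, and a node that waits on nothing is treated as active.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation (not from the paper): node -> (kind, dependencies).
# kind "AND" means the node needs every dependency; any other value is treated
# as "OR", meaning the node needs at least one dependency.
# A node with an empty dependency list is active (not blocked).
# Every dependency is assumed to appear as a key of the graph.
WaitForGraph = Dict[str, Tuple[str, List[str]]]


def deadlocked_nodes(wfg: WaitForGraph) -> Set[str]:
    """Return the nodes that can never be unblocked, found by repeated reduction."""
    # Start from the active nodes, which can always proceed.
    unblocked: Set[str] = {node for node, (_, deps) in wfg.items() if not deps}
    changed = True
    while changed:  # iterate until no further node can be reduced
        changed = False
        for node, (kind, deps) in wfg.items():
            if node in unblocked:
                continue
            if kind == "AND":
                can_proceed = all(d in unblocked for d in deps)
            else:
                can_proceed = any(d in unblocked for d in deps)
            if can_proceed:
                unblocked.add(node)
                changed = True
    # Whatever could not be reduced is deadlocked.
    return set(wfg) - unblocked


# A static cycle of three AND nodes: none of them can ever proceed.
ring: WaitForGraph = {
    "p1": ("AND", ["p2"]),
    "p2": ("AND", ["p3"]),
    "p3": ("AND", ["p1"]),
}
print(sorted(deadlocked_nodes(ring)))  # ['p1', 'p2', 'p3']

# The same cycle, but p1 is an OR node that can also be served by active p4.
escaped: WaitForGraph = {
    "p1": ("OR", ["p2", "p4"]),
    "p2": ("AND", ["p3"]),
    "p3": ("AND", ["p1"]),
    "p4": ("AND", []),
}
print(sorted(deadlocked_nodes(escaped)))  # []
```

The second graph shows why a cycle alone does not imply deadlock in the AND-OR model: the OR node p1 has an alternative outside the cycle, so the whole cycle reduces. As a rough reading of the stated bound, if the wait-for graph were exactly a simple cycle of n nodes (an assumption, giving e = n), e + n - 1 would come to 2n - 1 messages, or 5 for the three-node ring above.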