对于大规模分布式系统的开发而言,其开发周期比较漫长,包括前期的开发、过程中的Debug、后期的维护和测试等.在整个开发周期中,Debug是一个非常关键和重要的环节,如何才能在短时间内找到最可靠的方法来解除bug成为一个重要的挑战.对于系统开发人员来说,bug报告能非常有效地帮助其了解bug的所有特征信息,并找到能修复bug的方法.通过研究发现,许多大规模分布式系统之间具有较强的相关性和相似性,因而其bug的产生情况和修复方法也具有类似特征.开发人员可以利用已存在的修复bug的方案来协助修复与其一致或相近的bug.本文提出一个适用于大规模分布式系统的Debug协助工具——DBugHelper,能为某些大规模分布式系统的开发人员的bug修复提供比较有效、正确的帮助.DBugHelper将最新的bug报告进行文本处理,形成查询向量,并将大量已被修复的bug及其相关信息进行离线处理和缓存,从而为在线查询提供索引机制.通过将大量已修复的bug报告进行离线处理并同时减少在线处理的数据量,从而使其准确并快速地为系统开发人员提供必要的Debug协助工作,以此减少系统开发的周期与成本.
Development of large-scale distributed systems has experienced a long developing period. During the whole development cycle, debug is one of the most important steps. We meet the challenges of finding all the bugs and the corresponding solutions fixing bugs in a short time. Bug reports record bug histories and solutions, which provide a way to understand bug features and help to find solutions for new bugs. After we analyze the bug reports and fixed solutions, we find that there are strong correlation and similarity among many large-scale distributed systems. Thus the developing and fixing scheme of bugs may have similar characteristics. Then existed fixing solutions of bugs can be used to assist fixing new bugs. In this paper, we propose DBugHelper, a debug helping tool which can be applied to boost the development of large-scale distributed systems and provide a more effective way to fix bugs. In DBugHelper, the existed bug reports are processed offiine, and the latest bug report is represented as a query vector. We query the bug report history database and find the similar bugs with their solutions. In such way, we suppose to shorten the whole system development period.