Bug reproducibility is the foundation of program debugging. Parallel programs, however, execute non-deterministically by nature, so reproducing a concurrency bug on a multicore processor (CMP) is a major challenge. Existing approaches either record the state of the entire system or require fine-grained instrumentation, and therefore suffer from poor usability or large runtime overhead. This paper is the first to propose a lightweight, hardware-assisted record-and-replay approach for user-mode parallel programs. Software and hardware cooperate to record the sources of non-determinism, namely system calls, signals, special instructions, and memory conflicts. Software records the ordering associated with signals, system calls, and operating-system scheduling, while hardware records memory-access conflicts. During recording, a directory-based compression technique reduces the log size, and a replay algorithm driven by the recorded log is also proposed. Evaluation on a 16-core simulation platform shows that the approach not only makes debugging user-mode parallel programs more convenient but also reduces log storage overhead by 81%.
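The abstract does not specify the log format or how the directory-based compression works. The sketch below is only an illustration under our own assumptions, not the paper's scheme. Each hardware-detected memory conflict is modeled as an ordering edge `(src_thread, src_event, dst_thread, dst_event)`. As a stand-in for the directory-based compression, the sketch drops any edge that is already implied by an earlier logged edge between the same pair of threads (a simple transitive-reduction filter). Replay then schedules per-thread events so that every logged edge holds. The names `Edge`, `record`, and `replay` are hypothetical, and signals and system calls are not modeled.

```python
from collections import defaultdict

# Hypothetical log entry: (src_thread, src_event, dst_thread, dst_event),
# meaning dst_thread's event dst_event must run after src_thread's src_event.
Edge = tuple[int, int, int, int]


def record(conflicts: list[Edge]) -> list[Edge]:
    """Log only conflict edges not implied by an earlier logged edge.

    Assumption: conflicts for each dst thread arrive in that thread's
    program order (dst_event non-decreasing). Then an earlier logged edge
    (src, s', dst, d') with s' >= s already forces src's event s to
    precede dst's event d, so the new edge is redundant.
    """
    last: dict[tuple[int, int], int] = {}  # (dst, src) -> largest logged src_event
    log: list[Edge] = []
    for src, s, dst, d in conflicts:
        if last.get((dst, src), -1) >= s:
            continue  # implied by an earlier logged edge: skip it
        last[(dst, src)] = s
        log.append((src, s, dst, d))
    return log


def replay(log: list[Edge], events_per_thread: dict[int, int]) -> list[tuple[int, int]]:
    """Return one global order of (thread, event) pairs respecting every logged edge."""
    waits: defaultdict[tuple[int, int], list[tuple[int, int]]] = defaultdict(list)
    for src, s, dst, d in log:
        waits[(dst, d)].append((src, s))  # (dst, d) must wait for (src, s)
    done = {t: -1 for t in events_per_thread}  # last completed event per thread
    order: list[tuple[int, int]] = []
    total = sum(events_per_thread.values())
    while len(order) < total:
        progressed = False
        for t in sorted(events_per_thread):
            nxt = done[t] + 1
            if nxt >= events_per_thread[t]:
                continue  # this thread has finished all its events
            if all(done[src] >= s for src, s in waits[(t, nxt)]):
                done[t] = nxt
                order.append((t, nxt))
                progressed = True
        if not progressed:
            raise RuntimeError("log contains a dependency cycle")
    return order
```

For example, with conflicts `[(0, 2, 1, 1), (0, 1, 1, 3), (1, 0, 0, 4)]`, `record` drops the second edge because thread 1 already waits for thread 0's event 2 before its event 1. Replaying the shorter log still orders all three original conflicts correctly. The filter trades a small amount of bookkeeping during recording for a smaller log, which reflects in spirit the log-size reduction the abstract reports.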