Code produced by existing decompilers often differs from the corresponding source code, and locating and understanding these differences helps to improve and refine decompiler design. This paper presents a differencing algorithm for comparing decompiled C code with its source code. The algorithm builds on syntax tree matching: it defines a new intermediate representation for C programs and matches expressions dynamically, which improves the accuracy of the syntax tree matching. Experimental results show that the algorithm finds most of the differences between decompiled code and source code.
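
To make the idea of syntax tree differencing concrete, the following is a minimal illustrative sketch and not the algorithm of this paper. The abstract does not specify the intermediate representation or the expression matching rules, so everything here is an assumption: the `Node` type, the label scheme (`"var:x"`, `"const:1"`), the `COMMUTATIVE` set, and the choice to tolerate operand reordering by sorting the operands of commutative operators before a top-down comparison.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical tree node; the paper's actual intermediate representation
# is not described in the abstract.
@dataclass(frozen=True)
class Node:
    label: str                          # e.g. "assign", "+", "var:x", "const:1"
    children: Tuple["Node", ...] = ()


# Assumption: these operators may have their operands reordered without
# changing meaning (integer arithmetic, no side effects in operands).
COMMUTATIVE = {"+", "*", "==", "!=", "&", "|", "^"}


def canonical(node: Node) -> Node:
    """Sort operands of commutative operators so a + b and b + a compare equal."""
    kids = tuple(canonical(c) for c in node.children)
    if node.label in COMMUTATIVE:
        kids = tuple(sorted(kids, key=repr))
    return Node(node.label, kids)


def diff(src: Node, dec: Node, path: str = "root") -> List[str]:
    """Top-down comparison that returns a description of each mismatch."""
    if src.label != dec.label:
        return [f"{path}: {src.label} != {dec.label}"]
    if len(src.children) != len(dec.children):
        return [f"{path}: child count {len(src.children)} != {len(dec.children)}"]
    found: List[str] = []
    for i, (a, b) in enumerate(zip(src.children, dec.children)):
        found.extend(diff(a, b, f"{path}/{src.label}[{i}]"))
    return found


def compare(src: Node, dec: Node) -> List[str]:
    """Normalize both trees, then list the differences between them."""
    return diff(canonical(src), canonical(dec))


def assign(target: str, expr: Node) -> Node:
    """Helper for building the statement `target = expr`."""
    return Node("assign", (Node(f"var:{target}"), expr))


# Source: x = a + 1;   decompiled variants: x = 1 + a;  and  x = a + 2;
source = assign("x", Node("+", (Node("var:a"), Node("const:1"))))
reordered = assign("x", Node("+", (Node("const:1"), Node("var:a"))))
changed = assign("x", Node("+", (Node("var:a"), Node("const:2"))))

print(compare(source, reordered))   # operand order alone is not reported
print(compare(source, changed))     # the differing constant is reported
```

In this sketch a reordered operand pair produces no reported difference, while a changed constant does. A purely positional comparison would flag both cases, which is one reason a matcher might treat expressions more flexibly than statements.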