Static software birthmarks based on k-grams cannot reliably distinguish two programs whose code sizes differ greatly, and their robustness is only moderate. To address these problems, this paper proposes a static software birthmark that uses the frequency vector of a program's k-gram fragments. To strike a better balance between credibility and resilience, the frequency vector of one program is pre-transformed using the k-gram fragment set of the other program as the baseline, and the cosine of the angle between the two resulting frequency vectors is taken as the birthmark similarity. Experimental results on Java class files show that both the credibility and the resilience of the birthmark are improved to some extent.
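To make the comparison step concrete, the following is a minimal Python sketch of a k-gram frequency birthmark compared by vector cosine. The abstract does not spell out the transform applied to the frequency vectors before comparison, so the sketch assumes one plausible form: the second program's frequency vector is projected onto the k-gram set of the baseline program. The function names (`kgram_frequencies`, `pretransform`, `cosine`, `birthmark_similarity`), the default `k=3`, and the opcode sequences are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def kgram_frequencies(opcodes, k):
    """Count every run of k consecutive opcodes in the sequence."""
    return Counter(tuple(opcodes[i:i + k]) for i in range(len(opcodes) - k + 1))


def pretransform(base_freq, other_freq):
    """Assumed transform: keep only the k-grams that occur in the baseline.

    Returns two aligned vectors, one entry per baseline k-gram.
    """
    keys = sorted(base_freq)
    base_vec = [base_freq[g] for g in keys]
    other_vec = [other_freq.get(g, 0) for g in keys]
    return base_vec, other_vec


def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def birthmark_similarity(base_opcodes, other_opcodes, k=3):
    """Similarity of other_opcodes to base_opcodes, with the first as baseline."""
    base_freq = kgram_frequencies(base_opcodes, k)
    other_freq = kgram_frequencies(other_opcodes, k)
    return cosine(*pretransform(base_freq, other_freq))


# Hypothetical opcode sequences: `large` embeds `small` inside extra code.
small = ["aload_0", "getfield", "iconst_1", "iadd", "putfield", "return"]
large = ["iconst_0", "istore_1"] + small + ["iload_1", "ireturn"]
unrelated = ["nop"] * 6
```

Under this assumed transform the measure is asymmetric: it depends on which program is chosen as the baseline. With `small` as the baseline, the extra code in `large` does not lower the score, which is one way a size gap between the two programs could be kept from masking shared code.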