The Kad network hosts hundreds of millions of shared resources, a considerable portion of which can be rated as questionable (sensitive) content. To understand the characteristics of resources on the Kad network, questionable ones in particular, we probe and analyze the file resources held by peers using Rainbow, a Kad network crawler. We find that: 1) both file popularity and the number of filenames associated with a file approximately follow a Zipf distribution; 2) whether a file is questionable can be judged more accurately from the words that co-occur across the multiple filenames sharing the same file-content-hash; 3) questionable resources account for 6.34% of a random sample, and 74.8% of them are video files.
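Findings 1 and 2 can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `zipf_exponent` and `cooccurring_words`, the tokenization rule, and the `min_share` threshold are all assumptions introduced here for illustration. The first function estimates a Zipf exponent by a least-squares fit of log(count) against log(rank). The second collects the words shared by most of the filenames observed for a single file-content-hash.

```python
import math
import re
from collections import Counter


def zipf_exponent(counts):
    """Estimate s in count ~ rank^(-s) via a least-squares fit in log-log space."""
    # Rank the positive counts from most to least frequent.
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The fitted slope is negative for a decaying distribution; return its magnitude.
    return -sxy / sxx


def cooccurring_words(filenames, min_share=0.5):
    """Return words present in at least min_share of the filenames of one content hash.

    Assumption: filenames are split into lowercase ASCII alphanumeric tokens.
    """
    doc_freq = Counter()
    for name in filenames:
        # Count each word once per filename, not once per occurrence.
        tokens = set(re.findall(r"[a-z0-9]+", name.lower()))
        doc_freq.update(tokens)
    threshold = min_share * len(filenames)
    return {word for word, n in doc_freq.items() if n >= threshold}
```

An exponent near 1 on a rank-frequency table is consistent with a Zipf-like shape, although a log-log regression alone is a rough check rather than a formal goodness-of-fit test. The co-occurring words could then be passed to whatever keyword-based classifier is in use, in place of the words from a single, possibly misleading, filename.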