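Today's cloud computing platforms and large websites generate large volumes of log files at runtime. These log files are generally worth collecting and analyzing, so collecting them raises the problem of transmitting large log files. This paper addresses how to transmit log files to the receiver quickly. After surveying existing approaches to large-data transmission, we propose a new algorithm for log files that is independent of the transport protocol: the file splitting and prediction algorithm. The algorithm has two main parts. First, the log file is split into a file containing descriptive information and a file containing data, which eliminates redundant information within the file. Then, during transmission, the sender predicts the data cached at the receiver in order to eliminate redundant information in transit. Experiments show that the algorithm reduces the amount of data actually transmitted by more than 90%.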
As networks and data transmission become increasingly important for computational science, we propose a log file transmission algorithm. The new algorithm, which focuses on optimizing the performance of log file transmission, is independent of transport protocols. Our approach has two components: file splitting and buffer memory prediction. File splitting performs file compression and redundancy elimination, and buffer memory prediction avoids transferring redundant data. Experimental results verify that the algorithm effectively eliminates the redundancy in log files and reduces the amount of data transmitted by more than 90%.
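The two components can be illustrated with a minimal sketch. It is not the implementation evaluated in this work; it only shows the general idea under stated assumptions. The splitting rule (any whitespace-delimited token containing a digit is treated as data), the `<*>` placeholder, the message tuples, and the names `split_line`, `Sender`, and `Receiver` are all hypothetical. The sketch also assumes that log lines never contain the literal text `<*>` and that messages arrive in order and without loss.

```python
import re

# Hypothetical splitting rule: a token that contains a digit is variable data;
# everything else is descriptive (template) text.
VARIABLE = re.compile(r"\S*\d\S*")
PLACEHOLDER = "<*>"


def split_line(line):
    """Split one log line into a descriptive template and its data values."""
    values = VARIABLE.findall(line)
    template = VARIABLE.sub(PLACEHOLDER, line)
    return template, values


class Sender:
    """Tracks which templates the receiver is predicted to have cached."""

    def __init__(self):
        self.predicted_cache = {}  # template -> id assigned on first send

    def encode(self, line):
        template, values = split_line(line)
        if template in self.predicted_cache:
            # Receiver is predicted to hold the template: send id and data only.
            return ("ref", self.predicted_cache[template], values)
        tid = len(self.predicted_cache)
        self.predicted_cache[template] = tid
        return ("new", tid, template, values)


class Receiver:
    """Caches templates and rebuilds the original lines."""

    def __init__(self):
        self.cache = {}  # id -> template

    def decode(self, msg):
        if msg[0] == "new":
            _, tid, template, values = msg
            self.cache[tid] = template
        else:
            _, tid, values = msg
            template = self.cache[tid]
        remaining = iter(values)
        # Fill each placeholder with the next data value, in order.
        return re.sub(re.escape(PLACEHOLDER), lambda m: next(remaining), template)
```

In this sketch, two lines that differ only in timestamps, identifiers, or addresses share one template, so the second line travels as a template id plus its data values. How much this saves depends entirely on the log format, and the sketch makes no claim about reproducing the reduction reported above.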