The volume of GNSS data is growing exponentially. The Hadoop Distributed File System (HDFS) removes the storage bottleneck for massive GNSS data, but it suffers from high memory consumption, poor correlation between files, and a lack of optimization mechanisms. To address the low efficiency of HDFS when handling massive numbers of small GNSS files, a new cloud storage method for small GNSS files is proposed on the basis of the types, characteristics, and storage workflow of GNSS data; it optimizes the strategies for writing, reading, adding, and deleting small GNSS files. The method merges observation files and solution products separately by type, builds a compressed Trie index over the merged files, splits the index, and then stores the resulting index blocks across the nodes in a distributed manner according to a matching algorithm. The experiment applies the cloud storage optimization to 28 days of data and products from the International GNSS Service (IGS). The results show that the method reduces the memory consumption of each node and improves the efficiency of writing, direct reading, reading after adding files, concurrent reading, and deleting, thereby achieving efficient cloud storage of massive small GNSS files.
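The merge-and-index step described above can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the class and function names (`CompressedTrie`, `merge_small_files`, `read_file`), the `(offset, length)` index payload, and the example file names are all assumptions made for illustration, and the index splitting, the matching algorithm for distributing index blocks, and any HDFS interaction are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Location = Tuple[int, int]  # assumed payload: (offset, length) in the merged file


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)  # edge label -> child
    value: Optional[Location] = None


def _common_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class CompressedTrie:
    """Trie whose edges carry whole substrings instead of single characters."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, key: str, value: Location) -> None:
        node = self.root
        while True:
            for label, child in list(node.children.items()):
                k = _common_prefix_len(label, key)
                if k == 0:
                    continue
                if k < len(label):
                    # Split the edge at the point where label and key diverge.
                    mid = Node()
                    mid.children[label[k:]] = child
                    del node.children[label]
                    node.children[label[:k]] = mid
                    child = mid
                node, key = child, key[k:]
                break
            else:
                # No edge shares a prefix with the remaining key.
                if key:
                    node.children[key] = Node(value=value)
                else:
                    node.value = value
                return
            if not key:
                node.value = value
                return

    def lookup(self, key: str) -> Optional[Location]:
        node = self.root
        while key:
            for label, child in node.children.items():
                if key.startswith(label):
                    node, key = child, key[len(label):]
                    break
            else:
                return None
        return node.value


def merge_small_files(files: Dict[str, bytes]) -> Tuple[bytes, CompressedTrie]:
    """Concatenate small files of one type and index each by file name."""
    merged = bytearray()
    index = CompressedTrie()
    for name in sorted(files):
        data = files[name]
        index.insert(name, (len(merged), len(data)))
        merged.extend(data)
    return bytes(merged), index


def read_file(merged: bytes, index: CompressedTrie, name: str) -> Optional[bytes]:
    """Return one small file's bytes from the merged blob, or None if absent."""
    loc = index.lookup(name)
    if loc is None:
        return None
    offset, length = loc
    return merged[offset:offset + length]


# Made-up observation file names and contents, for illustration only.
observations = {
    "algo0010.20o": b"obs-algo-day1",
    "algo0020.20o": b"obs-algo-day2",
    "alic0010.20o": b"obs-alic-day1",
}
merged_blob, trie_index = merge_small_files(observations)
```

Because station and day codes share long prefixes, a compressed Trie stores each shared prefix once, which is one plausible reason such an index stays small compared with keeping a separate metadata entry per small file.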