The Hadoop Distributed File System (HDFS) performs well when accessing large files, but is inefficient when accessing massive numbers of small files. To address this, a strategy for optimizing small-file access is proposed. When storing small files, the client classifies them by type and access rights and merges them into MapFiles; the merged large files are then handled by HDFS. When reading small files, a cache module composed of a Nexist-file buffer area, a level-1 cache (Cache L1), and a level-2 cache (Cache L2) is introduced. Experiments show that this strategy effectively reduces NameNode memory consumption when accessing massive numbers of small files, shortens small-file access time, and greatly improves overall access performance.
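Since the abstract only outlines the design, the following is a minimal sketch of the client-side merge step using Hadoop's standard MapFile API. The class name SmallFileMerger, the key/value layout (original file name as key, raw file bytes as value), and the helper read() method are illustrative assumptions, not the paper's actual implementation; the grouping by type and access rights and the Nexist/L1/L2 cache module are not implemented here.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;

/** Sketch (hypothetical): merge a directory of small files into one MapFile on HDFS. */
public class SmallFileMerger {

    public static void merge(Configuration conf, Path smallFileDir, Path mapFilePath)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        FileStatus[] files = fs.listStatus(smallFileDir);
        // MapFile requires keys to be appended in sorted order,
        // so sort the small files by name before writing.
        Arrays.sort(files, Comparator.comparing((FileStatus f) -> f.getPath().getName()));

        try (MapFile.Writer writer = new MapFile.Writer(conf, mapFilePath,
                MapFile.Writer.keyClass(Text.class),
                MapFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus file : files) {
                byte[] content = readFully(fs, file);
                // Key: original file name; value: raw file bytes.
                writer.append(new Text(file.getPath().getName()),
                        new BytesWritable(content));
            }
        }
    }

    private static byte[] readFully(FileSystem fs, FileStatus file) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        IOUtils.copyBytes(fs.open(file.getPath()), out, 4096, true);
        return out.toByteArray();
    }

    /** Read one small file back by name; a null result would feed the Nexist buffer. */
    public static byte[] read(Configuration conf, Path mapFilePath, String fileName)
            throws IOException {
        try (MapFile.Reader reader = new MapFile.Reader(mapFilePath, conf)) {
            BytesWritable value = new BytesWritable();
            if (reader.get(new Text(fileName), value) == null) {
                return null; // file does not exist in this MapFile
            }
            return Arrays.copyOf(value.getBytes(), value.getLength());
        }
    }
}
```

On the read path, the cache module described above would sit in front of read(): a hit in the level-1 or level-2 cache avoids the MapFile lookup entirely, while a null result is recorded in the Nexist-file buffer so that repeated requests for nonexistent files do not reach HDFS.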