Research on a Merging Strategy for Massive Spatio-Temporal Small Files in a Digital Standard Platform

ISSN: 1001-3695
Journal: Application Research of Computers (《计算机应用研究》)
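Classification: TP333 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Hubei Provincial Institute of Standardization, Wuhan 430061; [2] State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079
Funding: National Natural Science Foundation of China (61263040, 61075015)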

Keywords: digital standard platform, HDFS, small file, spatio-temporal data, sequential pattern mining

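Abstract (translated from the Chinese):

To address the low efficiency of HDFS in handling spatio-temporal small files, this work starts from the correlation between users' access patterns and the attributes of the data they access. The user access stream is treated as a sequence of requests for data files. The data are then parameterized by their spatio-temporal attributes, and feature extraction is used to build a new feature sequence. Finally, the PrefixSpan sequential pattern mining algorithm is used to find feature templates for users under different access modes, and the related files are merged. Experimental results show that the merging strategy effectively reduces NameNode memory usage and response time and improves read efficiency.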

English abstract:

To address the low efficiency of small-file processing in HDFS, this paper examines the correlation between users' access patterns and data attributes. It treats user access streams as request sequences over data files and parameterizes the data by their spatial and temporal properties. After generating new feature sequences by feature extraction, it finds feature templates for different access modes through sequential pattern mining with the PrefixSpan algorithm. Experimental results show that the consolidation strategy effectively reduces NameNode memory usage and response time and improves the system's read efficiency.
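
The pipeline described in the abstract (request sequences, feature tokens, PrefixSpan, merge candidates) can be illustrated with a minimal sketch. It is not the authors' implementation: the `to_feature` parameterization (tile id plus time slot), the sample sessions, and the support threshold are all hypothetical, and the miner is a simplified PrefixSpan for sequences of single items.

```python
from collections import defaultdict


def to_feature(request):
    # Hypothetical parameterization: a request is (tile id, time slot)
    # and becomes one feature token. The paper's real features are not given.
    tile, slot = request
    return f"{tile}@{slot}"


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for frequent sequential patterns."""
    results = {}

    def mine(prefix, projected):
        # projected holds one (sequence index, start position) pair per
        # sequence that contains `prefix`; the suffix starts at that position.
        support = defaultdict(set)
        first_pos = {}
        for sid, start in projected:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if sid not in support[item]:
                    support[item].add(sid)
                    first_pos[(item, sid)] = pos
        for item in sorted(support):
            sids = support[item]
            if len(sids) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(sids)
            # Project each supporting sequence past the first match of `item`.
            mine(pattern, [(sid, first_pos[(item, sid)] + 1) for sid in sorted(sids)])

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical user sessions: each is an ordered list of file requests.
sessions = [
    [("A", "t1"), ("B", "t1"), ("C", "t1")],
    [("A", "t1"), ("C", "t1")],
    [("A", "t1"), ("B", "t1"), ("C", "t1"), ("D", "t2")],
    [("D", "t2"), ("A", "t1")],
]
feature_sequences = [[to_feature(r) for r in s] for s in sessions]
patterns = prefixspan(feature_sequences, min_support=3)

# Patterns of length >= 2 name files that are often read together in order;
# in this sketch they are the candidates for merging into one larger file.
merge_groups = [p for p in patterns if len(p) >= 2]
print(merge_groups)
```

In this sketch a merge group is simply any frequent pattern with two or more tokens; how the paper turns feature templates into actual merged HDFS files is not stated in the abstract.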
