File semantics have proven effective in optimizing large-scale distributed file systems. Because the I/O interfaces between upper-layer applications and file systems are elaborate and rich, a file system can provide useful and insightful semantic information. Hence, file semantic mining has become an increasingly important practice in both the engineering and research communities. Unfortunately, exploiting file semantic knowledge is challenging because a variety of factors can affect how this information is explored and processed. Worse, the intricate interdependencies among these factors exacerbate the challenge and make it difficult to fully exploit the potentially important correlations among different kinds of semantic knowledge. This article proposes a file access correlation mining and evaluation reference (FARMER) model, in which each file is treated as a vector in a multivariate vector space and each item in the vector corresponds to a separate factor of the given file. The selection of factors depends on the application; examples of factors are the file path, the creator, and the executing program. If a particular factor occurs in both files, its value is non-zero. The degree of inter-file relationship can therefore be measured by the similarity of the factor values in the files' semantic vectors. Building on this model, FARMER represents files as structured vectors of identifiers, so that basic vector operations can be used to quantify the correlation between two file vectors. FARMER uses a linear regression model to estimate the strength of the relationship between file correlation and a set of influencing factors, so that "bad knowledge" can be filtered out. To demonstrate the capability of the new model, FARMER is incorporated into a real large-scale object-based storage system as a case study to infer file correlations dynamically. In addition, FARMER-enabled optimization services for a metadata prefetching algorithm and an object data layout algorithm are implemented. Experimental results show that the FARMER-enabled prefetching algorithm reduces metadata operation latency by approximately 30-40% compared with a state-of-the-art metadata prefetching algorithm and a commonly used replacement policy.
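The vector view of files described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `FileRecord` type, the rule that two paths share the path factor when they lie in the same directory, and the example weights are all assumptions made for illustration. In FARMER the factor weights would come from the linear regression step; here they are simply passed in.

```python
from dataclasses import dataclass

# Hypothetical factor names, taken from the examples given in the abstract.
FACTORS = ("path", "creator", "program")


@dataclass(frozen=True)
class FileRecord:
    """Illustrative container for the semantic factors of one file."""
    path: str
    creator: str
    program: str


def shared_factor_vector(a: FileRecord, b: FileRecord) -> list[float]:
    """Return one entry per factor: 1.0 if both files share it, else 0.0."""
    vec = []
    for name in FACTORS:
        if name == "path":
            # Assumption: the path factor is shared when both files
            # live in the same directory.
            same = a.path.rsplit("/", 1)[0] == b.path.rsplit("/", 1)[0]
        else:
            same = getattr(a, name) == getattr(b, name)
        vec.append(1.0 if same else 0.0)
    return vec


def correlation(a: FileRecord, b: FileRecord, weights: list[float]) -> float:
    """Weighted dot product of the shared-factor vector.

    With non-negative weights the result lies in [0, 1].
    """
    vec = shared_factor_vector(a, b)
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * v for w, v in zip(weights, vec)) / total
```

A prefetcher built on such a score could, for example, preload the metadata of files whose correlation with the file just accessed exceeds a chosen threshold. The threshold and the weights are tuning choices that this sketch does not prescribe.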