Prefetching is one of the most important methods for improving storage system performance. However, the device layer of existing storage systems has no knowledge of I/O semantics, so it cannot exploit semantic information to prefetch the data that will be accessed next. Many prefetching policies therefore have to rely on simple patterns such as sequential access, temporal locality, and loop references. To address this, and based on the characteristics of storage systems, this paper introduces a practical and efficient sequence degree-based clustering algorithm to find the storage areas that are read intensively, and adopts an ARMA time series model to forecast which storage areas future read requests are likely to target and when. To further improve forecast accuracy, a dynamic parameter estimation policy is applied to the ARMA model. The results of a large number of simulations validate the correctness of the clustering algorithm and the accuracy of the ARMA model with dynamic parameter estimation, and indicate that applying the two can greatly improve the efficiency of cache prefetching in storage systems.
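The two ideas in this abstract can be illustrated with a minimal sketch, which is not the paper's algorithm. The abstract does not define the sequence degree measure or the ARMA order, so the sketch makes two assumptions: a simple gap threshold between sorted block addresses stands in for the clustering step, and an AR(1) model (ARMA with no moving-average term) refitted on a sliding window stands in for dynamic parameter estimation. All function and parameter names (`dense_regions`, `max_gap`, `min_requests`, `fit_ar1`, `forecast_next`, `window`) are hypothetical.

```python
from typing import List, Tuple


def dense_regions(addresses: List[int], max_gap: int,
                  min_requests: int) -> List[Tuple[int, int, int]]:
    """Group read addresses into regions and keep the dense ones.

    Assumption: two reads belong to the same region when their sorted
    addresses differ by at most `max_gap`. Returns (start, end, count).
    """
    regions: List[Tuple[int, int, int]] = []
    if not addresses:
        return regions
    ordered = sorted(addresses)
    start = prev = ordered[0]
    count = 1
    for addr in ordered[1:]:
        if addr - prev <= max_gap:
            count += 1
        else:
            # Gap too large: close the current region, start a new one.
            if count >= min_requests:
                regions.append((start, prev, count))
            start = addr
            count = 1
        prev = addr
    if count >= min_requests:
        regions.append((start, prev, count))
    return regions


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Least-squares (mean, phi) for x_t - mean = phi * (x_{t-1} - mean) + noise."""
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]
    num = sum(a * b for a, b in zip(centered[1:], centered[:-1]))
    den = sum(a * a for a in centered[:-1])
    phi = num / den if den else 0.0
    return mean, phi


def forecast_next(series: List[float], window: int) -> float:
    """Refit on the most recent `window` samples, then predict one step ahead."""
    recent = series[-window:]
    mean, phi = fit_ar1(recent)
    return mean + phi * (recent[-1] - mean)


if __name__ == "__main__":
    reads = [100, 101, 102, 500, 900, 901, 903, 904]
    print(dense_regions(reads, max_gap=2, min_requests=3))
    # Hypothetical inter-arrival times (in seconds) of reads to one region.
    print(forecast_next([1.0, 3.0, 1.0, 3.0, 1.0, 3.0], window=6))
```

In a prefetcher built along these lines, the inter-arrival times observed for each dense region would be fed to `forecast_next` to estimate when that region is likely to be read again. Refitting on a sliding window lets the estimated parameters follow changes in the workload.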