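Sparse matrix-vector multiplication (SpMV) performs very poorly when implemented with the compressed sparse row (CSR) storage format, whereas a register-blocking algorithm improves performance by keeping data accesses in the levels of the memory hierarchy closest to the processor. The RAM(h) model can be used to analyze and compare the memory access complexity of different forms of an algorithm, and so to judge the relative merits of the two algorithms. RAM(h) is used here to analyze the memory access complexity of the two SpMV implementations. In addition, SpMV performance was measured for 7 sparse matrices on a Pentium IV platform, and the L1, L2, and TLB miss rates of both algorithms were recorded. The experimental results are consistent with the data from the model analysis.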
Sparse matrix-vector multiplication (SpMV) is an important computational kernel in scientific applications, but it tends to perform poorly on modern processors. A register-level blocked algorithm can optimize access to the memory hierarchy and thereby improve performance. RAM(h) is a computation model with an h-level memory hierarchy. It shows that different implementation forms of the same algorithm can have different memory access complexity. Using the RAM(h) model, memory access complexity is analyzed for two implementation forms of SpMV: the CSR storage algorithm and the register-level blocked algorithm. Performance figures and the miss rates of L1, L2, and TLB on a Pentium IV platform are reported. The analytical results of the model agree well with the experimental results.
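
To make the two implementation forms concrete, the following is a minimal sketch in C of a CSR kernel and a register-blocked kernel. It is an illustration only, not the implementation measured in the paper: the abstract does not state the block size, so a fixed 2x2 block (a block CSR layout with row-major blocks) is assumed, and the function names `spmv_csr` and `spmv_bcsr_2x2` and all parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* CSR kernel: y = A * x.
 * row_ptr has n_rows + 1 entries; the nonzeros of row i are
 * val[row_ptr[i] .. row_ptr[i+1]-1], with column indices in col_idx.
 * Every nonzero costs one indirect load x[col_idx[k]]. */
void spmv_csr(int n_rows, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    assert(row_ptr && col_idx && val && x && y);
    for (int i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

/* Register-blocked kernel with an assumed 2x2 block size: y = A * x.
 * The matrix is stored as dense 2x2 blocks (row-major, 4 values each,
 * explicit zeros filled in). brow_ptr has n_block_rows + 1 entries and
 * bcol_idx[k] is the block-column index of block k.
 * The two partial sums and the two x entries live in local variables,
 * so a compiler can keep them in registers, and one column index is
 * loaded per four stored values instead of one per value. */
void spmv_bcsr_2x2(int n_block_rows, const int *brow_ptr,
                   const int *bcol_idx, const double *bval,
                   const double *x, double *y)
{
    assert(brow_ptr && bcol_idx && bval && x && y);
    for (int bi = 0; bi < n_block_rows; ++bi) {
        double y0 = 0.0, y1 = 0.0;
        for (int k = brow_ptr[bi]; k < brow_ptr[bi + 1]; ++k) {
            const double *b = bval + 4 * (size_t)k;
            const double x0 = x[2 * bcol_idx[k]];
            const double x1 = x[2 * bcol_idx[k] + 1];
            y0 += b[0] * x0 + b[1] * x1;
            y1 += b[2] * x0 + b[3] * x1;
        }
        y[2 * bi]     = y0;
        y[2 * bi + 1] = y1;
    }
}
```

Both kernels compute the same product. The blocked form trades extra stored zeros for fewer index loads and more reuse of `x` and `y` entries held in registers, which is the kind of difference in memory access behavior that the RAM(h) analysis is meant to capture.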