An efficient intra-cluster parallel-access memory structure is proposed. The structure follows a "logically shared, physically distributed" organization in which multiple memory blocks are accessed in parallel, providing parallel memory access for a 4×4 video array processor. Experimental results show that, in the absence of access conflicts, the structure supports simultaneous read/write operations by 16 lightweight-core processing elements, with a maximum frequency of 200 MHz and a peak memory access bandwidth of 6.25 GB/s. Finally, an 8×8 two-dimensional discrete cosine transform (DCT) algorithm is mapped onto the array and its performance is compared. The intra-cluster memory structure provides this algorithm with a data access bandwidth of 312.2 Msamples/s; compared with an array structure of the same type, the number of execution cycles is reduced by 31.67%, the operating frequency is doubled, and the memory access bandwidth is increased by 192.60%.
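
The "logically shared, physically distributed" idea can be illustrated with a minimal sketch: every processing element sees one flat address space, while each address is decoded to one of several physical banks, and accesses proceed in parallel only when no two elements target the same bank in the same cycle. The bank count, bank depth, block-wise address decoding, and the function names below are assumptions made for illustration; the abstract does not specify them. The bandwidth helper likewise takes the word width as a parameter because the abstract does not state it.

```python
from collections import defaultdict

NUM_BANKS = 16     # assumption: one memory bank per processing element
BANK_DEPTH = 512   # assumption: words per bank (hypothetical value)


def decode(addr):
    """Split a flat shared address into (bank index, offset within bank).

    Assumes block-wise mapping: the high part of the address selects the bank.
    """
    if not 0 <= addr < NUM_BANKS * BANK_DEPTH:
        raise ValueError(f"address {addr} is outside the shared address space")
    return divmod(addr, BANK_DEPTH)


def find_conflicts(requests):
    """Return {bank: [pe_ids]} for every bank targeted by more than one PE.

    `requests` maps a PE id to the flat address it accesses in this cycle.
    An empty result means all accesses can be served in parallel.
    """
    by_bank = defaultdict(list)
    for pe_id, addr in requests.items():
        bank, _offset = decode(addr)
        by_bank[bank].append(pe_id)
    return {bank: pes for bank, pes in by_bank.items() if len(pes) > 1}


def peak_bandwidth(num_pes, freq_hz, word_bytes):
    """Conflict-free peak bandwidth in bytes/s: one word per PE per cycle."""
    return num_pes * freq_hz * word_bytes


# Each PE accesses its own bank: no conflicts, all 16 accesses in parallel.
private = {pe: pe * BANK_DEPTH for pe in range(NUM_BANKS)}
print(find_conflicts(private))

# PE 0 and PE 1 both hit bank 0 in the same cycle: a conflict is reported.
print(find_conflicts({0: 0, 1: 1, 2: BANK_DEPTH}))
```

In this sketch a conflict is only detected, not resolved; how the actual structure arbitrates conflicting accesses is not described in the abstract.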