The heterogeneous multi-core architecture of the Cell processor and its explicitly software-managed multi-level memory hierarchy make the processor difficult to program and make it hard for applications to reach its full performance. Most existing programming models for Cell/B.E.-based systems focus on applications with bulk data transfers, which resemble stream processing, and deliver poor performance for traditional applications whose memory access patterns are irregular or unpredictable. This paper extends the Cell/B.E. memory access library to give the co-processor units greater autonomy and, on that basis, proposes a co-processor-centric multilayer runtime library for the Cell platform that supports both MPI and a release-consistency-based (weakly consistent) Pthread programming model. The layered structure of the runtime and the extended memory access library make the model more efficient and scalable and improve the performance of irregular applications. Within the model, the MPI interface eases both the porting of the large body of existing parallel applications to the new architecture and the development of new ones, while the release-consistency-based Pthread interface provides efficient runtime task management for MPI and gives system-level users a programming interface with full control over the architecture. Experimental results show that the proposed runtime support adapts to the requirements of different applications and that the profile-based optimization mechanism built into the memory access library effectively exploits the performance of the Cell/B.E. architecture.