Recently, deep learning processors have become one of the most promising solutions for accelerating deep learning algorithms. Currently, the only way to program deep learning processors is to write assembly instructions by hand, which costs considerable programming effort and yields low productivity. One solution is to integrate a deep learning processor as a new back-end into a single prevalent high-level deep learning framework (e.g., the TPU (tensor processing unit) is integrated directly into TensorFlow). However, this prevents other frameworks from benefiting from the programming interface. An alternative approach is to design a framework-independent low-level library for deep learning processors (e.g., cuDNN, the deep learning library for GPUs). In this fashion, the library can be conveniently invoked from high-level programming frameworks and offers greater generality. To allow more deep learning frameworks to gain benefits from this environment, we envision such a library as a low-level component that can be easily embedded into current high-level frameworks while providing high performance. We discuss three major issues in designing such a library. The first is the design of the data structures: there should be as few data structures as possible while still supporting all required operations, which allows us to optimize them more easily without compromising generality. The second is the selection of operations, which should cover a sufficiently wide range of operations to support various types of networks with high efficiency. The third is the design of the API, which should provide a flexible, user-friendly programming model and be easy to embed into existing deep learning frameworks. Considering all the above issues, we propose DLPlib, a tensor-filter-based library designed specifically for deep learning processors.
It contains two major data structures, tensor and filter, and a set of operators including basic neural network primitives and matrix/vector operations. It provides a descriptor-based API exposed as a C++ interface. The library achieves a speedup of 0.79x compared with hand-written assembly instructions.