Recently, deep learning processors have become one of the most promising solutions for accelerating deep learning algorithms. Currently, the only way to program deep learning processors is to write assembly instructions by hand, which costs considerable programming effort and yields low productivity. One solution is to integrate a deep learning processor as a new back-end into a single prevalent high-level deep learning framework (e.g., the TPU (tensor processing unit) is integrated directly into TensorFlow). However, this prevents other frameworks from benefiting from the programming interface. An alternative approach is to design a framework-independent low-level library for deep learning processors (e.g., cuDNN, the deep learning library for GPUs). In this fashion, the library can be conveniently invoked from high-level programming frameworks and offers greater generality. To allow more deep learning frameworks to gain benefits from this environment, we envision such a library as a low-level component that can be easily embedded into current high-level frameworks while providing high performance. We discuss three major issues in designing such a library. The first is the design of the data structures: there should be as few data structures as possible while still supporting all required operations, which allows us to optimize them more easily without compromising generality. The second is the selection of operations, which should cover a sufficiently wide range of operations to support various types of networks with high efficiency. The third is the design of the API, which should provide a flexible, user-friendly programming model and be easy to embed into existing deep learning frameworks. Considering all the above issues, we propose DLPlib, a tensor-filter-based library designed specifically for deep learning processors.
It contains two major data structures, tensor and filter, and a set of operators including basic neural network primitives and matrix/vector operations. It provides a descriptor-based API exposed as a C++ interface. The library achieves a speedup of 0.79x compared with hand-written assembly instructions.