Dongli Research Big Data Discovery System (DRDS)

Implementation and Performance Evaluation of Recommendation Algorithms on Multi-core/Many-core Platforms

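ISSN: 1002-137X
Journal: Computer Science (《计算机科学》)
Date: 0
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: College of Computer, National University of Defense Technology, Changsha 410073
Funding: National Natural Science Foundation of China (61170049, 61402488, 61502514, 61602501); National 863 Program (2015AA01A301)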

Keywords: Recommender system, OpenCL, ALS, CCD

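Chinese abstract (translated):

Two classic algorithms from the recommender-system field, alternating least squares (ALS) and cyclic coordinate descent (CCD), were designed and implemented using the OpenCL standard. They were run on multi-core and many-core platforms (CPU, GPU, and MIC), and the factors that affect algorithm performance on these platforms were explored: the latent feature dimension and the number of threads. The OpenCL implementations of the two algorithms were also compared with CUDA and OpenMP implementations, which led to a series of conclusions. Under the same conditions, CCD achieves higher accuracy than ALS and converges faster and more stably, but takes longer to run. The OpenCL implementations of ALS and CCD perform no worse than the CUDA implementations (speedup of 1.03x for CCD and 1.2x for ALS) or the OpenMP implementations (speedup of roughly 1.6~1.7x for both CCD and ALS), and both algorithms perform better on the CPU platform than on the GPU or the MIC.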
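To make the ALS idea mentioned in the abstract concrete, here is a minimal single-threaded sketch in plain Python. It is not the paper's OpenCL implementation: the toy ratings, the regularization weight `lam`, the latent dimension `k`, and every function name are assumptions made for illustration. Each half-step fixes one factor matrix and solves a small ridge-regression system for every row of the other.

```python
# Hypothetical toy data: (user, item, rating) triples, at most one per pair.
RATINGS = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - dot(m[i][i + 1:n], x[i + 1:])) / m[i][i]
    return x


def update_side(fixed, obs, k, lam):
    """For each row, solve (lam*I + sum f f^T) x = sum r f with `fixed` held constant.

    obs[row] is a list of (index into fixed, rating) pairs.
    """
    out = []
    for pairs in obs:
        a = [[lam if p == q else 0.0 for q in range(k)] for p in range(k)]
        b = [0.0] * k
        for j, r in pairs:
            f = fixed[j]
            for p in range(k):
                b[p] += r * f[p]
                for q in range(k):
                    a[p][q] += f[p] * f[q]
        out.append(solve(a, b))
    return out


def loss(ratings, w, h, lam):
    """Regularized squared error over the observed ratings."""
    sq = sum((r - dot(w[u], h[i])) ** 2 for u, i, r in ratings)
    reg = sum(x * x for row in w + h for x in row)
    return sq + lam * reg


def als(ratings, n_users, n_items, k=2, lam=0.1, iters=10):
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    # Deterministic (assumed) initialization so the run is reproducible.
    w = [[0.1 * (u + p + 1) for p in range(k)] for u in range(n_users)]
    h = [[0.1 * (i + 2 * p + 1) for p in range(k)] for i in range(n_items)]
    history = [loss(ratings, w, h, lam)]
    for _ in range(iters):
        w = update_side(h, by_user, k, lam)  # fix items, solve users
        h = update_side(w, by_item, k, lam)  # fix users, solve items
        history.append(loss(ratings, w, h, lam))
    return w, h, history
```

Calling `als(RATINGS, 4, 3)` returns the two factor matrices and the loss after each iteration. Because every row update is independent of the others within a half-step, this is the part a parallel implementation would distribute across threads or work-items.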

English abstract:

In this paper, we design and implement two typical recommender algorithms, alternating least squares (ALS) and cyclic coordinate descent (CCD), in OpenCL. We then evaluate them on Intel CPUs, NVIDIA GPUs, and Intel MIC, and investigate the factors that affect performance: the latent feature dimension and the number of threads. We also compare the OpenCL implementations with CUDA and OpenMP implementations. Our experimental results show that, under the same conditions, CCD converges faster and performs more steadily than ALS, but is more time-consuming. We also observe that the OpenCL implementations outperform CUDA and OpenMP on the same platform: training on the GPU is slightly faster than with the CUDA implementation (1.03x for CCD and 1.2x for ALS), and training on the CPU is 1.6x to 1.7x faster than the OpenMP implementation with 16 threads. When running the OpenCL implementation on different platforms, we notice that the CPU performs better than both the GPU and the MIC.
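For comparison with ALS, the following is a minimal sketch of cyclic coordinate descent for the same regularized factorization objective, again in plain single-threaded Python rather than OpenCL. The toy data, `lam`, `k`, the initialization, and all names are hypothetical. Instead of solving a small linear system per row, CCD updates one scalar at a time in closed form and keeps a table of residuals up to date.

```python
# Hypothetical toy data: (user, item, rating) triples, at most one per pair.
RATINGS = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]


def ccd(ratings, n_users, n_items, k=2, lam=0.1, iters=10):
    # Deterministic (assumed) initialization so the run is reproducible.
    w = [[0.1 * (u + p + 1) for p in range(k)] for u in range(n_users)]
    h = [[0.1 * (i + 2 * p + 1) for p in range(k)] for i in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    res = {}  # residual r_ui - w_u . h_i for every observed pair
    for u, i, r in ratings:
        by_user[u].append(i)
        by_item[i].append(u)
        res[(u, i)] = r - sum(x * y for x, y in zip(w[u], h[i]))

    def loss():
        """Regularized squared error, computed from the maintained residuals."""
        reg = sum(x * x for row in w + h for x in row)
        return sum(e * e for e in res.values()) + lam * reg

    def sweep(a, b, neighbours, key):
        """Update every coordinate a[x][t] in turn while b stays fixed."""
        for x, others in enumerate(neighbours):
            for t in range(k):
                # Closed-form minimizer of the one-variable subproblem.
                num = sum((res[key(x, y)] + a[x][t] * b[y][t]) * b[y][t]
                          for y in others)
                den = lam + sum(b[y][t] ** 2 for y in others)
                new = num / den
                # Keep the residuals consistent with the new value.
                for y in others:
                    res[key(x, y)] -= (new - a[x][t]) * b[y][t]
                a[x][t] = new

    history = [loss()]
    for _ in range(iters):
        sweep(w, h, by_user, lambda u, i: (u, i))  # user coordinates
        sweep(h, w, by_item, lambda i, u: (u, i))  # item coordinates
        history.append(loss())
    return w, h, res, history
```

Calling `ccd(RATINGS, 4, 3)` returns the factors, the final residuals, and the loss after each sweep. Each scalar update is an exact one-dimensional minimization, so the loss cannot increase from one sweep to the next; how this compares with ALS in accuracy and running time on real data is what the paper measures, and this sketch makes no claim about it.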

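Projects associated with this journal paper

Research on Efficient Implementation Techniques for the OpenCL Programming Model on FeiTeng Heterogeneous Parallel Systems

Journal papers: 1

Research on Key Techniques for Memory Access Behavior Analysis and Optimization of GPU Programs

Journal papers: 1

Memory Hierarchy Design and Management for High-Efficiency Heterogeneous Processors

Journal papers: 1

Research on Heterogeneous System Optimization Techniques Based on GPU Performance Models

Journal papers: 19; Conference papers: 14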

Journal papers from the same project

Fast parallel cutoff pair interactions for molecular dynamics on heterogeneous systems

MilkyWay-2 supercomputer: system and application

Accelerating PQMRCGSTAB Algorithm on Xeon Phi

Implementation and Optimization of DGEMM for ARMv8 64-bit Multi-core Processors

OpenMC: Towards Simplifying Programming for TianHe Supercomputers

Exploiting hierarchy parallelism for molecular dynamics on a petascale heterogeneous system

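Research on Laser-Plasma Particle Simulation Based on Intel Xeon Phi

Accelerating 3D GVF Field Computation on the Xeon Phi Platform Based on Stencil Optimization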

Fast weighting method for plasma PIC simulation on GPU-accelerated heterogeneous systems

Programming for scientific computing on peta-scale heterogeneous parallel systems

Parallelizing SOR for GPGPUs using alternate loop tiling

PartialRC: A Partial Recomputing Method for Efficient Fault Recovery on GPGPUs

Energy optimization of representative barrier algorithms

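Energy Consumption Optimization for High-Performance Business Systems Based on Dynamic Voltage Scaling

Profiling-Based System Energy Consumption Optimization for High-Performance Business Applications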

Journal information

Computer Science (《计算机科学》)
PKU Core Journal (2011 edition)

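Supervising body: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)
Sponsor: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)
Editor-in-chief: Chen Guoliang
Address: No. 18 Honghu West Road, Yubei District, Chongqing
Postal code: 401121
Email: jsjkx12@163.com
Phone: 023-63500828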

International standard serial number (ISSN): 1002-137X
Domestic unified serial number (CN): 50-1075/TP
Postal distribution code: 78-68

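Awards:
2001 Chongqing Excellent Journal; 2004 Third Chongqing Excellent Science and Technology Journal; 2005 Chongqing Excellent Journal Editorial Office; 2010 Sixth Chongqing Journal Comprehensive Quality Assessment "Top Ten Science and Technology Journals"; 2012 Chongqing Publishing Special Fund Newspaper and Periodical Funding Project (Chongqing ...; 2013 Chongqing Publishing Special Fund Key Academic Journal Funding Project (...; 2014 Chongqing Publishing Special Fund Journal Funding Project (Chongqing ...; 2015 "China's Outstanding Academic Journal with International Influence"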

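Indexed in domestic and international databases:
Index Copernicus (Poland); Ulrich's Periodicals Directory (USA); Cambridge Scientific Abstracts (USA); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); PKU Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)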

Citations: 41227