Conference Paper Details - Dongli Research Big Data Discovery System (DRDS)


A Performance Model of Dense Matrix Operations on Many-core Architectures

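Institution: Institute of Computing Technology, Chinese Academy of Sciences
Output type: Conference paper
Related project: High-Performance On-Chip Memory System (高性能片上存储系统)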

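Project associated with this conference paper

High-Performance On-Chip Memory System (高性能片上存储系统)

Journal papers: 118; conference papers: 65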

Conference papers from the same project

Software and Hardware Co-designed Multi-level TLBs for Chip Multiprocessors

High Performance Matrix Multiplication on Many Cores

Architectural Support for Cilk Computations on Many-core Architectures

Evaluation Method of Synchronization for Shared-Memory On-Chip Many-Core Processor

A Low-Complexity Synchronization Based Cache Coherence Solution for Many Cores

A Research on an Optimized Adaptive Dynamic Power Management (Department Source

Optimizing Web Browser on Many-Core Architectures

Performance Improvement for Multicore Processors Using Variable Page Technologies

Minimal Multi-Threading: Finding and Removing Redundant Instructions in Multi-Threaded Processors

GPU-Warpsort: A Fast Comparison-based Sorting Algorithm on GPUs

Thread Owned Block Cache: Managing Latency in Many-Core Architecture

Detecting and Eliminating Potential Violation of Sequential Consistency for Concurrent C/C++ Program

Exploiting idle register classes for fast spill destination

An Interconnect-Aware Power Efficient Cache Coherence Protocol for CMPs

Study on Fine-grained Synchronization in Many-Core Architecture

Location Consistency Model Revisited: Problem, Solution and Prospects

Register Relocation to Optimize Clock Network for Multi-Domain Clock Skew Scheduling

Tolerating Memory Latency Using a Hardware-based Active-pushing Technique

A Processor-DMA-Based Memory Copy Hardware Accelerator

Navigating core Assisted Helper Threaded Prefetching

Testing Content Addressable Memories Using Instruction and March-like Algorithms

Formula-Oriented Compositional Minimization in Model Checking

On-the-Fly Reduction of Stimuli for Functional Verification

RIRI scheme: A robust instant-responding ratiochronous interface with zero-latency penalty

Improved texture compression for S3TC

Statistical Performance Comparisons of Computers

Zero-Efficient Buffer Design for Reliable Network-on-Chip in Tiled Chip-Multi-Processor

A General Method to Make Multi-Clock System Deterministic

Design of New Hash Mapping Functions

A Fast Linear-Space Sequence Alignment Algorithm with Dynamic Parallelization Framework

On Mitigating Memory Bandwidth Contention through Bandwidth-Aware Scheduling

A Highly Scalable Embedded Cluster Model Based on Shared Storage (基于共享存储的高可伸缩嵌入式集群模型)

A Quantitative Study of the On-Chip Network and Memory Hierarchy Design for Many-Core Processor

Investigation on Multi-Grain Parallelism in Chip Multiprocessor for Multimedia Application

Design of a Continuous Error Correction Pipeline

Data Management: The Spirit to Pursuit Peak Performance on Many-Core Processor

Efficient Address Mapping of Shared Cache for On-Chip Many-Core Architecture

DMA Cache: Using On-Chip Storage to Architecturally Separate I/O Data from CPU Data for Improving I/

Desynchronize a legacy floating-point adder with operand-dependant delay elements

Effective and Efficient Microprocessor Design Space Exploration Using Unlabeled Design Configuration

Characterizing and Understanding the Bandwidth Behavior of Workloads on Multi-core

Software and Hardware Cooperate for 1-D FFT Algorithm Optimization on Multicore Processors

Efficient Parallelization of a Protein Sequence Comparison Algorithm on Manycore Architecture

An optimized tag sorting circuit in WFQ scheduler based on leading zero counting

VB-DVFS: a new algorithm for power efficiency of CMP with GALS

Design and Effective functional Verification of an Embedded Processor with SIMD extension

A Synchronization-Based Alternative to Directory Protocol

Dynamic Register Promotion of Stack Variables

LReplay: A Pending Period Based Deterministic Replay Scheme

Alpha Compression with Variable Data Formats

Godson-3B: A 1GHz 40W 8-Core 128GFlops Processor in 65nm CMOS

On Improving Heap Memory Layout by Dynamic Pool Allocation

An Evaluation of Misaligned Data Access Handling Mechanisms in Dynamic Binary Translation Systems

Logic simulation acceleration based on GPU

Design of a Reliable Cache Based on Grouped Checking and Data Reloading

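A Hybrid Instruction-Set Simulation Method for ASIP Processors Based on Binary Instrumentation, etc. (基于二进制插桩的ASIP处理器指令集混合仿真方法等)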

Empirical design bugs prediction for verification

Exploiting the Character of Memory Accesses to Achieve Lower Power Consumption of the Data TLB

Design and Performance Analysis of One 32-bit Dual Issue RISC Processor for Embedded Application

The implementation and design methodology of a quad-core version Godson-3 microprocessor

Logical clustering for fast clock skew scheduling

Efficient Binary Translation System with Low Hardware Cost

Efficiency-Aware QoS DRAM Scheduler

An Efficient Methodology for Power Modeling and Simulation of Modern Cell-Based Microprocessors