A Performance Model of Dense Matrix Operations on Many-core Architectures
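Institution: Institute of Computing Technology, Chinese Academy of Sciences
Result type: Conference paper
Related project: High-Performance On-Chip Memory System (高性能片上存储系统)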
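Project of this conference paper
High-Performance On-Chip Memory System (高性能片上存储系统)
Journal papers: 118
Conference papers: 65
Conference papers from the same project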
Software and Hardware Co-designed Multi-level TLBs for Chip Multiprocessors
High Performance Matrix Multiplication on Many Cores
Architectural Support for Cilk Computations on Many-core Architectures
Evaluation Method of Synchronization for Shared-Memory On-Chip Many-Core Processor
A Low-Complexity Synchronization Based Cache Coherence Solution for Many Cores
A Research on an Optimized Adaptive Dynamic Power Management (Department Source
Optimizing Web Browser on Many-Core Architectures
Performance Improvement for Multicore Processors Using Variable Page Technologies
Minimal Multi-Threading: Finding and Removing Redundant Instructions in Multi-Threaded Processors
GPU-Warpsort: A Fast Comparison-based Sorting Algorithm on GPUs
Thread Owned Block Cache: Managing Latency in Many-Core Architecture
Detecting and Eliminating Potential Violation of Sequential Consistency for Concurrent C/C++ Program
Exploiting idle register classes for fast spill destination
An Interconnect-Aware Power Efficient Cache Coherence Protocol for CMPs
Study on Fine-grained Synchronization in Many-Core Architecture
Location Consistency Model Revisited: Problem, Solution and Prospects
Register Relocation to Optimize Clock Network for Multi-Domain Clock Skew Scheduling
Tolerating Memory Latency Using a Hardware-based Active-pushing Technique
A Processor-DMA-Based Memory Copy Hardware Accelerator
Navigating core Assisted Helper Threaded Prefetching
Testing Content Addressable Memories Using Instruction and March-like Algorithms
Formula-Oriented Compositional Minimization in Model Checking
On-the-Fly Reduction of Stimuli for Functional Verification
RIRI scheme: A robust instant-responding ratiochronous interface with zero-latency penalty
Improved texture compression for S3TC
Statistical Performance Comparisons of Computers
Zero-Efficient Buffer Design for Reliable Network-on-Chip in Tiled Chip-Multi-Processor
A General Method to Make Multi-Clock System Deterministic
Design of New Hash Mapping Functions
A Fast Linear-Space Sequence Alignment Algorithm with Dynamic Parallelization Framework
On Mitigating Memory Bandwidth Contention through Bandwidth-Aware Scheduling
A Highly Scalable Embedded Cluster Model Based on Shared Storage (基于共享存储的高可伸缩嵌入式集群模型)
A Quantitative Study of the On-Chip Network and Memory Hierarchy Design for Many-Core Processor
Investigation on Multi-Grain Parallelism in Chip Multiprocessor for Multimedia Application
Design of a Continuous Error Correction Pipeline
Data Management: The Spirit to Pursuit Peak Performance on Many-Core Processor
Efficient Address Mapping of Shared Cache for On-Chip Many-Core Architecture
DMA Cache: Using On-Chip Storage to Architecturally Separate I/O Data from CPU Data for Improving I/
Desynchronize a legacy floating-point adder with operand-dependant delay elements
Effective and Efficient Microprocessor Design Space Exploration Using Unlabeled Design Configuration
Characterizing and Understanding the Bandwidth Behavior of Workloads on Multi-core
Software and Hardware Cooperate for 1-D FFT Algorithm Optimization on Multicore Processors
Efficient Parallelization of a Protein Sequence Comparison Algorithm on Manycore Architecture
An optimized tag sorting circuit in WFQ scheduler based on leading zero counting
VB-DVFS: a new algorithm for power efficiency of CMP with GALS
Design and Effective functional Verification of an Embedded Processor with SIMD extension
A Synchronization-Based Alternative to Directory Protocol
Dynamic Register Promotion of Stack Variables
LReplay: A Pending Period Based Deterministic Replay Scheme
Alpha Compression with Variable Data Formats
Godson-3B: A 1GHz 40W 8-Core 128GFlops Processor in 65nm CMOS
On Improving Heap Memory Layout by Dynamic Pool Allocation
An Evaluation of Misaligned Data Access Handling Mechanisms in Dynamic Binary Translation Systems
Logic simulation acceleration based on GPU
Design of a Reliable Cache Based on Grouped Checking and Data Reloading
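A Hybrid Instruction-Set Simulation Method for ASIP Processors Based on Binary Instrumentation, etc. (基于二进制插桩的ASIP处理器指令集混合仿真方法等)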
Empirical design bugs prediction for verification
Exploiting the Character of Memory Accesses to Achieve Lower Power Consumption of the Data TLB
Design and Performance Analysis of One 32-bit Dual Issue RISC Processor for Embedded Application
The implementation and design methodology of a quad-core version Godson-3 microprocessor
Logical clustering for fast clock skew scheduling
Efficient Binary Translation System with Low Hardware Cost
Efficiency-Aware QoS DRAM Scheduler
An Efficient Methodology for Power Modeling and Simulation of Modern Cell-Based Microprocessors