Using multi-threads to hide deduplication I/O latency with low synchronization overhead
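  • ISSN: 1002-137X
  • Journal: 《计算机科学》 (Computer Science)
  • Time: 0
  • Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology] TP393 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
  • Author affiliations: [1] School of Computer, Huazhong University of Science and Technology, Wuhan 430074, China; [2] Wuhan National Lab for Optoelectronics, Wuhan 430074, China
  • Funding: Project (IRT0725) supported by the Changjiang Innovative Group of Ministry of Education, China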
Abstract:

Data deduplication, as a compression method, is widely used in most backup systems to improve bandwidth and space efficiency. As the volume of data to be backed up has exploded, the two main challenges in data deduplication are the CPU-intensive chunking and hashing work and the I/O-intensive disk-index access latency. Because the CPU-intensive work has been extensively parallelized and accelerated by multi-core and many-core processors, I/O latency is likely to become the bottleneck in data deduplication. To alleviate I/O latency in multi-core systems, a multi-threaded deduplication (Multi-Dedup) architecture is proposed. The main idea of Multi-Dedup is to use parallel deduplication threads to hide the I/O latency. A prefix-based concurrent index is designed to maintain the internal consistency of the deduplication index with low synchronization overhead. In addition, a collisionless cache array is designed to preserve locality and similarity within the parallel threads. In experiments on various real-world datasets, Multi-Dedup achieves a 3-5 times performance improvement when combined with the locality-based ChunkStash and local-similarity-based SiLo methods. Multi-Dedup also dramatically reduces synchronization overhead, achieving a 1.5-2 times performance improvement over traditional lock-based synchronization methods.
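
The abstract does not describe the internals of the prefix-based concurrent index, so the following is only a minimal sketch of one possible reading of the idea: the fingerprint index is split into partitions selected by the first byte of each fingerprint, and each partition has its own lock, so threads handling fingerprints with different prefixes do not contend. The names `PrefixPartitionedIndex`, `insert_if_absent`, and `deduplicate`, the choice of SHA-1, and the 256-way split are all assumptions made for illustration and are not taken from the paper.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


class PrefixPartitionedIndex:
    """Hypothetical fingerprint index: 256 partitions keyed by the first fingerprint byte."""

    def __init__(self):
        # One dictionary and one lock per prefix value (assumed layout).
        self._partitions = [dict() for _ in range(256)]
        self._locks = [threading.Lock() for _ in range(256)]

    def insert_if_absent(self, fingerprint: bytes, location: int) -> bool:
        """Return True if the fingerprint is new, False if it is a duplicate."""
        prefix = fingerprint[0]
        # Only the partition for this prefix is locked; other prefixes stay available.
        with self._locks[prefix]:
            partition = self._partitions[prefix]
            if fingerprint in partition:
                return False
            partition[fingerprint] = location
            return True


def deduplicate(chunks, workers=4):
    """Fingerprint chunks in parallel and return (unique_count, duplicate_count)."""
    index = PrefixPartitionedIndex()

    def process(item):
        position, chunk = item
        fingerprint = hashlib.sha1(chunk).digest()
        return index.insert_if_absent(fingerprint, position)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        is_new = list(pool.map(process, enumerate(chunks)))
    unique = sum(is_new)
    return unique, len(is_new) - unique
```

For example, `deduplicate([b"a", b"b", b"a", b"c", b"b", b"a"])` counts three unique chunks and three duplicates. Which occurrence of a repeated chunk is recorded as the first one may vary with thread scheduling, but the counts do not. The sketch keeps the index in memory, so it illustrates only the partitioned locking and not the disk-index I/O latency that Multi-Dedup is designed to hide.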

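Journal information
  • 《计算机科学》 (Computer Science)
  • Peking University Core Journal (2011 edition)
  • Supervising body: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)
  • Sponsor: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)
  • Editor-in-chief: Chen Guoliang (陈国良)
  • Address: No. 18 Honghu West Road, Yubei District, Chongqing
  • Postal code: 401121
  • Email: jsjkx12@163.com
  • Telephone: 023-63500828
  • International standard serial number: ISSN 1002-137X
  • Domestic unified serial number: CN 50-1075/TP
  • Postal distribution code: 78-68
  • Awards:
  • 2001 Chongqing Outstanding Journal; 2004 Third Chongqing Outstanding Science and Technology Journal; 2005 Chongqing Outstanding Journal Editorial Office; 2010 Sixth Chongqing Journal Comprehensive Quality Assessment "Top Ten Science and Technology Journals"; 2012 Chongqing Publishing Special Fund Newspaper and Periodical Funding Project (重庆市新...; 2013 Chongqing Publishing Special Fund Key Academic Journal Funding Project (...; 2014 Chongqing Publishing Special Fund Journal Funding Project (重庆市文...; 2015 "China's Outstanding Academic Journal with International Influence"
  • Indexed in domestic and international databases:
  • Index Copernicus (Poland), Ulrich's Periodicals Directory (USA), Cambridge Scientific Abstracts (USA), Japan Science and Technology Agency database (Japan), China Science and Technology Core Journals (China), Peking University Core Journals (China, 2004 edition), Peking University Core Journals (China, 2008 edition), Peking University Core Journals (China, 2011 edition), Peking University Core Journals (China, 2014 edition), Peking University Core Journals (China, 2000 edition)
  • Citations: 41227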