Dongli Research Big Data Discovery System (DRDS)

A Fast Single-Data-Disk Failure Reconstruction Method for Triple-Disk-Failure-Tolerant Erasure Codes

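ISSN: 0254-4164
Journal: 《计算机学报》 (Chinese Journal of Computers)
Date: 0
Classification: TP31 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] No. 709 Research Institute, China Shipbuilding Industry Corporation, Wuhan, Hubei 430074; [2] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074
Funding: National Natural Science Foundation of China (grant 61232004)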

Keywords: video, crawler, distributed, Hadoop framework, MapReduce computation model, Hadoop Distributed File System (HDFS)

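Abstract (translated from the Chinese):

To address the large volume, rapid updating, varied content, and difficult downloading of video media on today's networks, as well as the slow processing, low concurrency, and slow download speeds of single-machine video web crawler systems, a video crawler system based on the Hadoop framework is proposed, providing highly concurrent processing and fast crawling for video retrieval. The MapReduce computation model implements the computing tasks of web page fetching, parsing, deduplication, and downloading; the Hadoop Distributed File System (HDFS) stores the results of each stage; and a multi-replica backup mechanism transfers the task set when a node drops out, so that the stability and effectiveness of the whole system are unaffected. Experimental results show that the fully distributed Hadoop-based video crawler system clearly outperforms both the non-Hadoop and the pseudo-distributed video crawler systems in video download rate per unit time and in the number of web pages crawled.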

English abstract:

Video content on today's networks is abundant, varied, and quickly updated, and video content crawler systems based on a single PC (personal computer) suffer from slow processing, low concurrency, and slow download speeds. In view of these problems, a video content crawler system based on the Hadoop framework is proposed to achieve highly concurrent processing and fast crawling. The MapReduce computation model is used to implement crawling, analysis, duplicate removal, downloading, and other computing tasks, and the Hadoop Distributed File System (HDFS) provides the storage that the computation model relies on. Experiments demonstrate that the Hadoop-based video content crawler system achieves a significantly higher download speed and crawls significantly more web pages than the single-machine and pseudo-distributed systems.
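
The duplicate-removal step named in the abstract can be illustrated with a small map/shuffle/reduce pass. The sketch below is not the authors' implementation and does not use Hadoop; it imitates the three phases in plain Python so the data flow is visible. The function names (`normalize`, `map_phase`, `shuffle`, `reduce_phase`, `dedup`), the normalization rules, and the input shape (pairs of source page and discovered links) are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Canonicalize a URL so trivially different spellings share one key.

    Assumed rules: lower-case scheme and host, drop the fragment,
    and treat an empty path as "/".
    """
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def map_phase(records):
    """Map: emit (normalized_url, source_page) for every discovered link."""
    for source_page, links in records:
        for link in links:
            yield normalize(link), source_page


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: keep one entry per distinct URL, so duplicates collapse."""
    return sorted(groups)


def dedup(records):
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records)))
```

Calling `dedup` on a list such as `[("p1", ["http://example.com/a", "http://example.com/a#top"])]` yields a single entry for the shared URL. In a real Hadoop job the grouping would be done by the framework between the mapper and the reducer, and the output would be written to HDFS rather than returned as a list.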

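Projects with papers in the same journal

Theory and Methods of Service-Oriented Large-Scale Network Storage Systems

Journal papers: 30; conference papers: 46; awards: 2; patents: 19

Research on Energy Consumption Optimization Methods for Large-Scale Data Storage Systems

Journal papers: 28; conference papers: 48; patents: 15

Research on the System Architecture and Data Organization of Network Public Storage Services

Journal papers: 4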

Journal papers from the same project

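Ustor: A Cloud Storage System Based on Regenerating Codes

Analysis and Improvement of Memcached

Design and Implementation of an I/O-Based Asynchronous Data Replication System for Wide Area Networks

Heat-Assisted Magnetic Recording Technology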

Exploring and Exploiting the Multi-level Parallelism Inside SSDs for Improved

An Efficient Penalty-Aware Cache to Improve the Performance of Parity-Based Disk Arrays under Faulty

Solving symbolic regression problems with uniform design-aided gene expression programming

Optimizing storage performance in public cloud platforms

Online Availability Upgrades for Parity-Based RAIDs through Supplementary Parity Augmentations

A hybrid collaborative filtering recommendation mechanism for P2P networks

A Cache Replacement Algorithm for Solid-State Drives

A Kernel-Level Encrypted File System Based on Read/Write Optimization

Strip-oriented Asynchronous Prefetching for Parallel Disk Systems

Exploiting Redundancies and Deferred Writes to Conserve Energy in Erasure-Coded Storage Clusters

A Dynamic Performance-Based Flow Control Method for High-Speed Data Transfer

Exploring and Exploiting the Multi-level Parallelism Inside SSDs for Improved Performance and Endura

Hint-K: An Efficient Multi-level Cache Using K-step Hints

S2-RAID: Parallel RAID Architecture for Fast Data Recovery

A New Parity-Based Migration Method to Expand RAID-5

BMCloud: Minimizing Repair Bandwidth and Maintenance Cost in Cloud Storage.

A Survey of Energy-Saving Techniques for Disk Storage Systems

ST-CDP: Snapshots in TRAP for Continuous Data Protection

Extending and analysis of X-Code

RPC-Based Personalized Monitoring and Predictive Analysis of Distributed Systems

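Analysis of a User-Space RPC Protocol and Its Multithreaded Optimization

Research on a High-Performance Multimedia Object Storage System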

Exploring optimal combination of a file system and an I/O scheduler for underlying solid state disks

Optical storage: an emerging option in long-term digital preservation

A Survey of Research on Phase-Change-Memory-Based Storage Technology

Supporting Scalable and Adaptive Metadata Management in Ultralarge-Scale File Systems

Weighted fairness resource allocation of disks in Xen

Exploring and Exploiting the Multilevel Parallelism Inside SSDs for Improved Performance and Enduran

Using Provenance to Efficiently Improve Metadata Searching Performance in Storage Systems

Improving Hybrid FTL by Fully Exploiting Internal SSD Parallelism with Virtual Blocks

A Review of Storage Technology Research Based on Phase Change Memory

SeDas: A Self-Destructing Data System Based on Active Storage Framework

Design and Implementation of Holistic Scheduling and Efficient Storage for FlexRay

Locality-Sensitive Bloom Filter for Approximate Membership Query

Semantic-Aware Metadata Organization Paradigm in Next-Generation File Systems

BLESS: Object level encryption security for object-based storage system

SANE: Semantic-Aware Namespace in Ultra-large-scale File Systems

ANTELOPE: A Semantic-aware Data Cube Scheme for Cloud Data Center Networks

An Integrated Pseudo-Associativity and Relaxed-Order Approach to Hardware Transactional Memory

Extension Algorithms and Performance Study of RAID-6 Codes

SPA:OnLine Availability Upgrades for Parity-based RAIDs through Supplementary Parity Augmentations.

Design and Evaluation of a Provenance-Based Rebuild Framework

ALDM: Adaptive Loading Data Migration in Distributed File Systems

mlock: Building delegable metadata service for the parallel file systems

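Journal information

《计算机学报》 (Chinese Journal of Computers)
Peking University Core Journal (2011 edition)

Supervising body: Chinese Academy of Sciences
Sponsors: China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences
Editor-in-chief: Sun Ninghui (孙凝晖)
Address: No. 6 Kexueyuan South Road, Zhongguancun, Beijing
Postal code: 100190
Email: cjc@ict.ac.cn
Phone: 010-62620695

International standard serial number: ISSN 0254-4164
Domestic serial number: CN 11-1826/TP
Postal distribution code: 2-833

Awards:
China Journal Matrix "Double-Effect" Journal

Indexed in the following databases:
Mathematical Reviews, online edition (USA); abstract and citation database (Netherlands); Engineering Index (USA); Cambridge Scientific Abstracts (USA); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); Peking University Core Journals (China), 2000, 2004, 2008, 2011, and 2014 editions

Times cited: 48433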