Dongli Research Big Data Discovery System (DRDS)


Big Data Stream Computing: Key Technologies and System Instances

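ISSN: 1000-9825
Journal: Journal of Software (软件学报)
Date: 2014-04-15
Pages: 839-862
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]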
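Author affiliations: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084; [2] Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun, Jilin 130012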
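Funding: National Natural Science Foundation of China (61170008, 61272055); National Basic Research Program of China (973 Program) (2014CB340402); Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172012K12)
Acknowledgments: We thank the teachers and students who supported this work and offered suggestions.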
Related project: Research on Performance Testing Methods and Technologies for Large-Scale Storage Systems

Keywords: big data computing, stream computing, stream big data, memory computing, system instance

Chinese abstract (translated):

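Big data computing takes two main forms: batch computing and stream computing. Research and discussion of big data batch computing systems are relatively mature, whereas building big data stream computing systems that offer low latency, high throughput, and continuous, reliable operation remains a pressing open problem, with comparatively few research results and little practical experience. This paper summarizes the characteristics that stream big data exhibits in typical application domains, namely real-time arrival, volatility, burstiness, disorder, and unboundedness. It presents the key technical features that an ideal big data stream computing system should have in system architecture, data transmission, application interfaces, and high-availability techniques; discusses and compares typical existing big data stream computing systems; and finally describes the technical challenges such systems face in scalability, fault tolerance, state consistency, load balancing, and data throughput.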

English abstract:

Batch computing and stream computing are two important forms of big data computing. Research and discussion on batch computing in big data environments are comparatively mature. However, how to handle stream computing efficiently enough to meet requirements such as low latency, high throughput, and continuous reliable operation, and how to build efficient big data stream computing systems, remain major challenges in big data computing research. This paper studies the data computing architecture and the key issues of stream computing in big data environments. First, it briefly summarizes three application scenarios of stream computing: business intelligence, marketing, and public services. It also identifies the distinctive features of stream computing in big data environments, such as real-time arrival, volatility, burstiness, irregularity, and unboundedness. A well-designed stream computing system should be optimized in system structure, data transmission, application interfaces, high availability, and so on. Next, the paper offers detailed analyses and comparisons of five typical open-source stream computing systems for big data environments. Finally, it discusses new challenges facing stream big data systems, such as scalability, fault tolerance, consistency, load balancing, and throughput.

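Projects with papers in this journal

Research on Performance Testing Methods and Technologies for Large-Scale Storage Systems

Journal papers: 25; conference papers: 1

Research on Key Technologies of Highly Reliable and Easily Scalable SSD Arrays

Journal papers: 7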

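Journal papers from the same project

IOmark: An Accurate Storage System Performance Testing Tool

A Method for Optimizing ZFS Synchronous Write Performance in the Linux Environment

Examining Big Data Computing from a Systems Perspective

Examining Big Graph Computing from a Systems Perspective

Text Content Analysis of Web Big Data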

Design and evaluation of a new approach to RAID-0 scaling

Rethinking RAID-5 Data Layout for Better Scalability

Accelerate RDP RAID-6 Scaling by Reducing Disk I/Os and XOR Operations

Redistribute Data to Regain Load Balance during RAID-4 Scaling

An Accurate Storage System Performance Testing Tool

Re-Stream: Real-time and energy-efficient resource scheduling in big data stream computing environme

Understanding data flow graph for improving big data stream computing environments

Boafft: Distributed Deduplication for Big Data Storage in the Cloud

Characterizing and optimizing TPC-C workloads on large-scale systems using SSD arrays

LOCA: A Low-overhead Caching Algorithm for Flash-based SSDs

CaCo: An Efficient Cauchy Coding Approach for Cloud Storage Systems

Optimizing Data Stream Graph for Big Data Stream Computing in Cloud Datacenter Environments

An Optimization Method for ZFS Synchronous Write Performance in the Linux Environment

AIP: a tool for flexible and transparent data management

An Automatic Data Migration Method for Hierarchical Storage Systems

DMStone: A Performance Testing Tool for Hierarchical Storage Systems

Journal Information

Journal of Software (《软件学报》)
Peking University Core Journal (2011 edition)

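Supervising organization: Chinese Academy of Sciences
Sponsors: Institute of Software, Chinese Academy of Sciences; China Computer Federation
Editor-in-chief: Zhao Chen
Address: Institute of Software, Chinese Academy of Sciences, P.O. Box 8718, Beijing
Postal code: 100190
Email: jos@iscas.ac.cn
Tel: 010-62562563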

International Standard Serial Number: ISSN 1000-9825
China Standard Serial Number: CN 11-2560/TP
Postal distribution code: 82-367

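Awards:
Selected for the "Double Hundred Journals" of the China Journal Array in 2001; won the First Prize for Outstanding Scientific and Technical Journals of the Chinese Academy of Sciences in 2000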

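Indexed in domestic and international databases:
Abstract Journal (Russia), Mathematical Reviews (online edition, USA), Index Copernicus (Poland), Zentralblatt MATH (Germany), Scopus abstract and citation database (Netherlands), Engineering Index (USA), Cambridge Scientific Abstracts (USA), INSPEC (UK), Japan Science and Technology Agency (JST) database (Japan), China Sci-Tech Core Journals (China), Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)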

Total citations: 54609