A Real-Time Large-Scale Data Processing Method for High-Speed Data Streams (针对高速数据流的大规模数据实时处理方法)

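ISSN: 0254-4164
Journal: Chinese Journal of Computers (计算机学报)
Date: 2012.3.3
Pages: 477-490
Classification: TP393 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] Graduate University of Chinese Academy of Sciences, Beijing 100190; [3] College of Information Engineering, North China University of Technology, Beijing 100144
Funding: Supported by the National Natural Science Foundation of China (60903137, 61003294)
Related project: Business service object modeling and instant collaboration for decentralized application integration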

Keywords: data stream processing, large scale data processing, MapReduce, Internet of Things, big data, cloud computing

Abstract:

Computation over both real-time sensor data and historical sensing data is becoming central to Internet of Things (IoT) applications, and supporting real-time computation over high-speed data streams together with large-scale historical data poses a new challenge for data processing. Existing large-scale data processing technology based on MapReduce is designed for batch processing and cannot meet the real-time requirements of such computation. Drawing on a real-time collection and processing application for urban vehicle data, and on theoretical and practical analysis, this paper proposes a real-time processing method for large-scale data under high-speed data streams, and improves two key technical bottlenecks in the method: the local staged pipeline and the intermediate result cache. The staged pipeline is controlled according to system parameters so that the CPU is fully and effectively utilized. The in-memory and on-disk data structures, the read/write strategy, and the replacement algorithm are redesigned to optimize highly concurrent reads and writes of local intermediate results. Experiments show that the method significantly improves the real-time performance and scalability of data stream processing over large-scale historical data.
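
The abstract mentions a local cache of intermediate results with a replacement algorithm and separate in-memory and external storage, but gives no details. The following is only a minimal illustrative sketch of that general idea, not the paper's design: it assumes least-recently-used (LRU) replacement, uses a plain dict as a stand-in for external storage, and is single-threaded, so it does not address the concurrency optimizations the paper describes. The names IntermediateResultCache and update_counts are hypothetical.

```python
from collections import OrderedDict


class IntermediateResultCache:
    """Bounded in-memory cache with LRU replacement and spill to a slower store.

    Assumption: `external` is a plain dict standing in for on-disk storage.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.memory = OrderedDict()  # key -> partial result, least recent first
        self.external = {}           # stand-in for external (disk) storage

    def get(self, key, default=None):
        if key in self.memory:
            self.memory.move_to_end(key)    # mark as most recently used
            return self.memory[key]
        if key in self.external:
            value = self.external.pop(key)  # reload a spilled entry
            self._put_in_memory(key, value)
            return value
        return default

    def put(self, key, value):
        self.external.pop(key, None)        # drop any stale spilled copy
        self._put_in_memory(key, value)

    def _put_in_memory(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.capacity:
            old_key, old_value = self.memory.popitem(last=False)
            self.external[old_key] = old_value  # spill the least recently used


def update_counts(cache, records):
    """Fold a batch of stream records into per-key running counts."""
    for key in records:
        cache.put(key, cache.get(key, 0) + 1)
```

A caller would create one cache per worker, for example IntermediateResultCache(capacity=2), and pass each incoming batch of keys (such as vehicle identifiers) to update_counts. Partial results for keys that have not been seen recently move to the external store and are reloaded on the next access.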

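Projects with papers in the same journal

Business service object modeling and instant collaboration for decentralized application integration

Journal papers 17, conference papers 5, patents 3

Research on availability problems of Internet mashup applications based on online monitoring and control

Journal papers 13, conference papers 6, patents 1, books 1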

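Journal papers from the same project

An incremental data partitioning method using tuple clustering

A method for constructing emergency response processes based on digital contingency plans

A distributed service invocation method for use across security management domains

A time-based statistics optimization method for sensing data

A SaaS-based hosting system for scientific and technological information resources

A cache-based update optimization method for composite data services

A method for analyzing the ordered states of service networks

Performance optimization of keyword queries over relational databases

A data query method that guarantees real-time writing of streaming data

A method for situational data integration based on data service hyperlinks

A time-based optimized statistics method for sensing data

A real-time data stream processing method and key technologies for large-scale sensing data

A service community model under the SaaS mode and its application in the National Science and Technology Information Service Network

A model-driven approach for monitoring in service cloud

An Approach to Deploying SOA in Technological Information Integration - A Case Study

Graphical-based data placement algorithm for cloud workflow

A real-time processing system for massive sensing data based on a shared-nothing architecture

A method for improving service availability through business service abstraction

Space optimization for extremum aggregation of data streams over time-based sliding windows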

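Journal information

Chinese Journal of Computers (《计算机学报》)
PKU Core Journal (2011 edition)

Supervising organization: Chinese Academy of Sciences
Sponsors: China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences
Editor-in-chief: Sun Ninghui (孙凝晖)
Address: 6 Kexueyuan South Road, Zhongguancun, Beijing
Postal code: 100190
Email: cjc@ict.ac.cn
Phone: 010-62620695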

International standard serial number: ISSN 0254-4164
Domestic unified serial number: CN 11-1826/TP
Postal distribution code: 2-833

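Awards:
"Double-Effect" journal of the Chinese Journal Phalanx

Indexed in (domestic and international databases):
Mathematical Reviews (online edition, USA), Abstract and Citation Database (Scopus, Netherlands), Engineering Index (USA), Cambridge Scientific Abstracts (USA), Japan Science and Technology Agency database (Japan), China Science and Technology Core Journals (China), PKU Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)

Citations: 48433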