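Geological data are large in volume and highly diverse in type. Efficient fusion and mining of multi-source heterogeneous geological data is an important means of studying the mechanisms of geological processes and of carrying out geological surveys and research. At present, however, such data are commonly fragmented across departments and regions and stored in different formats, which makes effective sharing and interoperation difficult. To address this, this paper proposes a Hadoop-based technical framework for geological big data fusion and mining. The framework comprises a unified method for geological big data acquisition and preprocessing; a storage and management platform based on metadata indexing; a Map/Reduce-based parallel computing model and system for geological big data; reusable geological big data mining services; and a visualization system for analysis results that supports online 3D display. The framework offers good practicality and extensibility. On a five-node test system, we implemented multi-element correlation analysis and achieved a speedup of 3 relative to a single machine.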
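The abstract does not specify how the multi-element correlation analysis is parallelized. A minimal sketch, assuming the analysis is pairwise Pearson correlation between element concentrations and that each input split holds sample records mapping element names to values, could simulate the Map, Shuffle, and Reduce phases in-process as below. The element names, record layout, and function names (`mapper`, `shuffle`, `reducer`, `correlate`) are hypothetical illustrations, not the authors' implementation or the Hadoop API.

```python
import math
from collections import defaultdict
from itertools import combinations


def mapper(record):
    """Map: for one sample, emit ((elemA, elemB), partial sums) for every element pair.

    Pairs are formed only from elements present in the record, so a sample
    with a missing element simply contributes nothing to pairs involving it.
    """
    for a, b in combinations(sorted(record), 2):
        x, y = record[a], record[b]
        yield (a, b), (1, x, y, x * x, y * y, x * y)


def shuffle(splits):
    """Shuffle: run the mapper over every split and group outputs by element pair."""
    groups = defaultdict(list)
    for split in splits:
        for record in split:
            for key, value in mapper(record):
                groups[key].append(value)
    return groups


def reducer(values):
    """Reduce: add up the partial sums and turn them into a Pearson coefficient."""
    n, sx, sy, sxx, syy, sxy = (sum(col) for col in zip(*values))
    cov = sxy - sx * sy / n      # n times the covariance
    var_x = sxx - sx * sx / n    # n times the variance of x
    var_y = syy - sy * sy / n    # n times the variance of y
    if var_x <= 0 or var_y <= 0:
        return float("nan")      # correlation undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def correlate(splits):
    """Pearson r for every element pair, computed from per-sample partial sums."""
    return {key: reducer(vals) for key, vals in shuffle(splits).items()}
```

Because every mapper output is a vector of additive sums, the result does not depend on how samples are distributed across splits (nodes), which is what makes this statistic a natural fit for Map/Reduce. For example, with hypothetical values Cu = 1, 2, 3, Au = 2, 4, 6, and Zn = 5, 3, 1 spread over two splits, `correlate` gives r = 1 for (Au, Cu) and r = -1 for (Au, Zn) and (Cu, Zn). As a point of reference for the reported result, a speedup of 3 on 5 nodes corresponds to a parallel efficiency of 3/5 = 60%.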
Geological studies require large amounts of data of many different types. It is therefore important to implement highly efficient fusion and mining methods for geological big data; such methods will promote research on geological mechanisms and geological exploration. However, existing data are divided by data type, format, and coverage, so it is difficult to share them and to achieve interoperability across datasets. In this paper, we therefore propose a Hadoop-based framework for geological big data fusion and mining. The framework contains a unified data gathering and preprocessing method, a metadata-index-based storage and management platform, a Map/Reduce-based data mining system, and reusability-oriented geological services that support applications such as online 3D visualization. The proposed framework can be easily extended, and a preliminary demonstration shows its efficiency.
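The abstract names a metadata-index-based storage and management platform but does not describe its design. As a minimal sketch of the general idea, assuming each heterogeneous dataset is described by a small metadata record (data type, file format, spatial extent, storage location), an inverted index over those fields lets datasets from different sources be found with one query regardless of their native format. The field names, the `storage_path` values, and the `MetadataIndex` class are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Metadata fields that receive an inverted index (hypothetical choice).
INDEXED_FIELDS = ("data_type", "file_format")


@dataclass(frozen=True)
class DatasetMeta:
    dataset_id: str
    data_type: str      # e.g. "geochemistry", "borehole" (hypothetical categories)
    file_format: str    # e.g. "csv", "shapefile"
    bbox: tuple         # spatial extent: (min_lon, min_lat, max_lon, max_lat)
    storage_path: str   # hypothetical location of the raw file, e.g. an HDFS path


def _intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class MetadataIndex:
    """In-memory inverted index from metadata values to dataset IDs."""

    def __init__(self):
        self._records = {}
        self._index = {f: defaultdict(set) for f in INDEXED_FIELDS}

    def add(self, meta):
        """Register a dataset and index each of its indexed metadata values."""
        self._records[meta.dataset_id] = meta
        for f in INDEXED_FIELDS:
            self._index[f][getattr(meta, f)].add(meta.dataset_id)

    def query(self, bbox=None, **criteria):
        """Return sorted IDs matching all field criteria and, if given, overlapping bbox."""
        ids = set(self._records)
        for f, value in criteria.items():
            if f not in self._index:
                raise KeyError(f"field not indexed: {f}")
            ids &= self._index[f].get(value, set())
        if bbox is not None:
            ids = {i for i in ids if _intersects(self._records[i].bbox, bbox)}
        return sorted(ids)

    def locate(self, dataset_id):
        """Resolve a dataset ID to where its raw data is stored."""
        return self._records[dataset_id].storage_path
```

Separating the small, uniform metadata records from the bulky raw files means queries touch only the index, and the raw data are read only after `locate` resolves the matching datasets. A production platform would persist this index rather than keep it in memory; that part is outside this sketch.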