With the explosive growth of data volume, the diversification of data structures, and the increasing mobility of data, traditional relational database systems can no longer meet the requirements of big data management and analysis. It is therefore necessary to study big data-based management and analysis systems that can rapidly compute statistics on, and analyse, massive structured and unstructured data in specific domains, and ultimately provide support for decision making. In this paper, we propose a big data management and analysis system based on Hadoop and Mahout. The characteristics of the data are analysed, and the data are then decomposed and stored in the corresponding databases for management. The proposed system is implemented and verified in a specific application domain, where it obtains better data analysis results than previously reported related work.