Classification is a central problem in spatial data analysis. Classifying with a Bayesian network makes full use of existing knowledge and yields more accurate classification of targets. As the volume of sample data available for Bayesian network learning grows, the classifier's results become more accurate, but the processing time of every step (structure learning, parameter learning, and classification inference) also becomes long, so parallel computing urgently needs to be introduced into Bayesian network learning and classification prediction. This study develops a parallel Bayesian classifier for massive spatial data. Through a series of designs, including vector data serialization, partitioning by spatial topological relationships, and extended MPI-based parallel primitives, it solves the problems of vector data transmission between nodes, load balancing, and asynchronous I/O that arise in the parallel computation. Experimental results show that the parallel Bayesian classifier greatly shortens the time required for Bayesian classifier learning and classification prediction while keeping the results consistent.
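As a rough illustration of why parallel parameter learning can keep results consistent, the sketch below counts class and feature-value frequencies per data block and then sums the partial counts, much as a reduction across MPI processes would. It is a minimal sketch under stated assumptions, not the classifier described here: the network structure is assumed fixed (naive-Bayes style), the blocks are formed round-robin rather than by spatial topology, the processes are simulated in a single Python process, and the names `local_counts`, `partition`, and `merge` as well as the sample data are hypothetical.

```python
from collections import Counter
from functools import reduce


def local_counts(block):
    """Count class and (feature, value, class) occurrences in one data block."""
    counts = Counter()
    for features, label in block:
        counts[("class", label)] += 1
        for i, value in enumerate(features):
            counts[("feat", i, value, label)] += 1
    return counts


def partition(samples, n_blocks):
    """Assumption: round-robin split; the paper partitions by spatial topology."""
    return [samples[i::n_blocks] for i in range(n_blocks)]


def merge(a, b):
    """Sum two partial count tables, as a sum-reduction across nodes would."""
    return a + b


# Hypothetical toy samples: ((land cover, slope), class label)
samples = [
    (("forest", "steep"), "protected"),
    (("forest", "flat"), "protected"),
    (("urban", "flat"), "developed"),
    (("urban", "steep"), "developed"),
    (("forest", "flat"), "developed"),
]

# Serial baseline: one pass over all samples.
serial = local_counts(samples)

# Simulated parallel run: each block is counted independently, then merged.
blocks = partition(samples, 3)
parallel = reduce(merge, (local_counts(b) for b in blocks), Counter())

print(serial == parallel)
```

Because the sufficient statistics are plain sums, the merged counts do not depend on how the samples are split, so the choice of partitioning affects load balance and communication cost but not the learned parameters.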