In the fields of knowledge discovery and data mining (KDD), the amount of data available for building classifiers or regression models is growing rapidly, so scaling up inductive learning, that is, handling very large datasets while remaining computationally efficient, has become an active research topic. This paper presents a distributed neural network learning algorithm based on the Hebb rule to realize such scaling up. To improve learning speed, the complete dataset is partitioned into disjoint subsets that are learned by independent sub-networks. To keep the accuracy from being degraded compared with running a single algorithm on the entire dataset, a growing and pruning policy is adopted, derived from an analysis of the algorithm's completeness and of the risk bounds of competitive Hebb learning. In the experiments, the learning ability of the algorithm is first tested on the circle-in-the-square benchmark and compared with SVM, ARTMAP and a BP neural network; its performance on large-scale data is then evaluated with the USCensus1990 dataset from the UCI repository.
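To make the partitioning scheme concrete, the following is a minimal sketch, assuming each sub-network is a simple prototype-based competitive Hebb learner; the class and parameter names (SubNet, vigilance, min_hits) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

class SubNet:
    """One sub-network trained on a disjoint subset with competitive Hebb-style updates."""

    def __init__(self, vigilance=0.2, lr=0.1):
        self.vigilance = vigilance  # distance threshold that triggers growing a new node (assumed)
        self.lr = lr                # learning rate for the winner update (assumed)
        self.protos, self.labels, self.hits = [], [], []

    def fit(self, X, y):
        for x, label in zip(X, y):
            if not self.protos:
                self._grow(x, label)
                continue
            d = np.linalg.norm(np.array(self.protos) - x, axis=1)
            j = int(np.argmin(d))
            if d[j] > self.vigilance or self.labels[j] != label:
                self._grow(x, label)                              # growing step: add a prototype node
            else:
                self.protos[j] += self.lr * (x - self.protos[j])  # competitive Hebb update of the winner
                self.hits[j] += 1

    def _grow(self, x, label):
        self.protos.append(np.asarray(x, dtype=float).copy())
        self.labels.append(label)
        self.hits.append(1)

    def prune(self, min_hits=2):
        # pruning step: drop prototypes that were rarely activated
        keep = [i for i, h in enumerate(self.hits) if h >= min_hits]
        self.protos = [self.protos[i] for i in keep]
        self.labels = [self.labels[i] for i in keep]
        self.hits = [self.hits[i] for i in keep]


def train_distributed(X, y, n_parts=4):
    """Split the data into disjoint subsets and train one sub-network per subset."""
    parts = np.array_split(np.random.permutation(len(X)), n_parts)
    nets = []
    for part in parts:
        net = SubNet()
        net.fit(X[part], y[part])
        net.prune()
        nets.append(net)
    return nets


def predict(nets, x):
    """Classify by the nearest prototype over all sub-networks' pooled nodes."""
    protos = np.vstack([p for n in nets for p in n.protos])
    labels = [l for n in nets for l in n.labels]
    return labels[int(np.argmin(np.linalg.norm(protos - x, axis=1)))]


# circle-in-the-square style usage: label points by whether they fall inside a circle
X = np.random.rand(2000, 2)
y = (np.linalg.norm(X - 0.5, axis=1) < 0.4).astype(int)
nets = train_distributed(X, y)
print(predict(nets, np.array([0.5, 0.5])), predict(nets, np.array([0.05, 0.05])))
```

In this sketch, growing adds a node whenever no existing prototype is close enough (or the nearest one carries a different label), and pruning removes rarely activated nodes after each subset is learned; the sub-networks are combined by pooling their surviving prototypes at prediction time. The actual growing and pruning criteria in the paper are derived from its completeness and risk-bound analysis, which this toy example does not reproduce.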