With the rapid development of Internet technology, databases keep growing in size and complexity, and traditional classification methods can no longer meet the demands of classifying complex data. To address this problem, a data classification algorithm based on variational Bayes is proposed. The algorithm introduces variational approximation theory into traditional Bayesian inference, combines it with the idea of the expectation-maximization (EM) algorithm, and uses mean field theory from statistical physics; a Gaussian mixture model serves as the example for the simulation experiments. The experimental results show that, after 382 iterations, the randomly generated data can clearly be seen to be a mixture of three Gaussian components. The lower bound of the likelihood function rises steadily as the number of iterations increases, and the curve levels off after 350 iterations, as expected. The estimated means and inverse covariance matrices are close to those of the true data within the allowable error, so the data are classified successfully. While maintaining high accuracy, the algorithm computes faster and more efficiently, which makes it better suited to practical engineering applications.
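The lower bound on the likelihood function mentioned above can be made explicit with the standard variational identities. The notation here is an assumption for illustration and is not taken from the paper: X denotes the observed data, Z collects the latent variables and model parameters, and q is the variational distribution, factorized under the mean field assumption.

```latex
\begin{align}
\ln p(X) &= \mathcal{L}(q) + \mathrm{KL}\bigl(q(Z) \,\|\, p(Z \mid X)\bigr), \\
\mathcal{L}(q) &= \int q(Z) \, \ln \frac{p(X, Z)}{q(Z)} \, dZ, \\
q(Z) &= \prod_{j} q_j(Z_j), \\
\ln q_j^{*}(Z_j) &= \mathbb{E}_{i \neq j}\bigl[\ln p(X, Z)\bigr] + \text{const}.
\end{align}
```

Because the KL divergence is non-negative, the bound satisfies \mathcal{L}(q) \le \ln p(X), and each coordinate-wise update of a factor q_j cannot decrease \mathcal{L}(q). This is consistent with the reported behavior of the bound, which rises with the iteration count and then flattens as the updates converge.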