E-commerce applications generate large amounts of user rating data that are rich in users' opinions and preferences. To infer user preferences accurately from such data, a method is proposed for constructing, and performing inference with, a latent variable model (i.e., a Bayesian network with a latent variable) oriented to user preference discovery from rating data. First, to address the sparseness of rating data, the unobserved values are filled in with a Biased Matrix Factorization (BMF) model. Second, a latent variable is used to represent user preference, and a method for constructing the latent variable model based on Mutual Information (MI), maximal semi-cliques and the Expectation Maximization (EM) algorithm is given. Finally, a Gibbs sampling based algorithm for probabilistic inference over the latent variable model and for user preference discovery is given. Experimental results show that, compared with collaborative filtering, the proposed method effectively describes the dependence relationships among related attributes in the rating data and the uncertainty of those relationships, and can therefore infer user preferences more accurately.
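The BMF imputation step can be illustrated with a minimal sketch. It assumes the common biased matrix factorization predictor r̂(u, i) = μ + b_u + b_i + p_u · q_i trained by stochastic gradient descent on the observed ratings only; the function name `bmf_impute`, the hyperparameters (`k`, `lr`, `reg`, `epochs`) and the clipping to a 1 to 5 rating scale are illustrative assumptions, not values taken from the paper.

```python
import random

def bmf_impute(R, k=2, lr=0.01, reg=0.05, epochs=200, seed=0, r_min=1.0, r_max=5.0):
    """Hypothetical helper: fill missing entries (None) of rating matrix R with a
    biased MF model r_hat(u, i) = mu + b_u + b_i + p_u . q_i, trained by SGD."""
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    observed = [(u, i, R[u][i]) for u in range(n_users)
                for i in range(n_items) if R[u][i] is not None]
    mu = sum(r for _, _, r in observed) / len(observed)  # global mean rating
    bu = [0.0] * n_users                                 # user biases
    bi = [0.0] * n_items                                 # item biases
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user factors
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item factors

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(P[u][f] * Q[i][f] for f in range(k))

    for _ in range(epochs):
        rng.shuffle(observed)
        for u, i, r in observed:
            e = r - predict(u, i)  # prediction error on an observed rating
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (e * qi - reg * pu)
                Q[i][f] += lr * (e * pu - reg * qi)

    # keep observed ratings unchanged; fill only the gaps, clipped to the rating scale
    return [[R[u][i] if R[u][i] is not None else min(r_max, max(r_min, predict(u, i)))
             for i in range(n_items)] for u in range(n_users)]

# toy 5-user x 4-item matrix, None marks an unobserved rating
R = [[5, 3, None, 1], [4, None, None, 1], [1, 1, None, 5],
     [1, None, None, 4], [None, 1, 5, 4]]
for row in bmf_impute(R):
    print([round(v, 2) for v in row])
```

Only the missing cells are replaced, so the dense matrix handed to the later structure-learning steps still agrees with every rating the users actually gave.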
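The MI part of the structure construction relies on the empirical mutual information I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))] between pairs of discrete rating attributes. The sketch below computes it and, as a simplifying assumption, links every attribute pair whose MI exceeds a chosen threshold. The helper `dependency_edges` and its `threshold` parameter are hypothetical; the paper's maximal semi-clique step that follows is not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in nats, of two discrete columns."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y))
    return sum((c / n) * math.log((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def dependency_edges(columns, threshold):
    """Hypothetical helper: connect attribute pairs whose MI exceeds `threshold`."""
    edges = []
    for a, b in combinations(sorted(columns), 2):
        mi = mutual_information(columns[a], columns[b])
        if mi > threshold:
            edges.append((a, b, mi))
    return edges

# A and B are perfectly dependent; C is independent of both
cols = {"A": [1, 1, 2, 2, 3, 3], "B": [1, 1, 2, 2, 3, 3], "C": [1, 2, 1, 2, 1, 2]}
print(dependency_edges(cols, threshold=0.1))
```

For identical columns the MI equals the entropy of the column (here log 3), while for independent columns it is 0, which is why a threshold on MI can serve as a simple dependency test.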
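Because the latent variable (user preference) is never observed, its parameters have to be estimated with EM. As a minimal sketch, assume the simplest possible structure: a single hidden node L that is the parent of all observed discrete attributes, which are conditionally independent given L (a latent class model). This structure is an assumption for illustration, not the structure the paper derives from MI and maximal semi-cliques; the function `em_latent_class` is hypothetical.

```python
import math
import random

def em_latent_class(data, n_states, n_values, iters=50, seed=0):
    """EM for a hidden node L whose children X_1..X_m are conditionally
    independent given L. data: rows of ints, row[j] in range(n_values[j]).
    Returns (P(L), cpts[j][l][v] = P(X_j = v | L = l), log-likelihood trace)."""
    rng = random.Random(seed)
    m = len(n_values)
    prior = [1.0 / n_states] * n_states
    cpts = []
    for j in range(m):
        table = []
        for _ in range(n_states):
            w = [rng.random() + 0.5 for _ in range(n_values[j])]  # random start breaks symmetry
            s = sum(w)
            table.append([x / s for x in w])
        cpts.append(table)
    trace = []
    for _ in range(iters):
        # E-step: responsibilities P(L = l | row) under the current parameters
        resp, ll = [], 0.0
        for row in data:
            joint = [prior[l] * math.prod(cpts[j][l][row[j]] for j in range(m))
                     for l in range(n_states)]
            z = sum(joint)
            ll += math.log(z)
            resp.append([p / z for p in joint])
        trace.append(ll)
        # M-step: maximum-likelihood re-estimation from expected counts
        totals = [sum(r[l] for r in resp) for l in range(n_states)]
        prior = [t / len(data) for t in totals]
        for j in range(m):
            for l in range(n_states):
                counts = [0.0] * n_values[j]
                for row, r in zip(data, resp):
                    counts[row[j]] += r[l]
                s = sum(counts)
                cpts[j][l] = [c / s for c in counts]
    return prior, cpts, trace

data = ([[0, 0, 0] for _ in range(10)] + [[1, 1, 1] for _ in range(10)]
        + [[0, 0, 1] for _ in range(3)] + [[1, 1, 0] for _ in range(3)])
prior, cpts, trace = em_latent_class(data, n_states=2, n_values=[2, 2, 2], iters=30)
print([round(p, 3) for p in prior], round(trace[-1], 3))
```

Each EM iteration cannot decrease the data log-likelihood, so `trace` is non-decreasing; this gives a quick sanity check when experimenting with other structures or initializations.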
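For the inference step, Gibbs sampling repeatedly resamples each unknown variable from its full conditional and counts how often the latent variable takes each state. The sketch below assumes the same latent class structure used for EM (one hidden node L over conditionally independent attributes) with made-up parameters; `gibbs_posterior_latent`, `sample_index` and the burn-in and sample counts are hypothetical choices, not the paper's algorithm.

```python
import math
import random

def sample_index(weights, rng):
    """Draw an index with probability proportional to the (unnormalized) weights."""
    u = rng.random() * sum(weights)
    acc = 0.0
    for idx, w in enumerate(weights):
        acc += w
        if u < acc:
            return idx
    return len(weights) - 1

def gibbs_posterior_latent(prior, cpts, evidence, n_samples=20000, burn_in=1000, seed=0):
    """Estimate P(L | evidence) by Gibbs sampling in a latent class model.
    evidence[j] is the observed value of X_j, or None if X_j is unobserved."""
    rng = random.Random(seed)
    m, n_states = len(cpts), len(prior)
    x = [v if v is not None else 0 for v in evidence]  # arbitrary start for unobserved X_j
    counts = [0] * n_states
    for t in range(burn_in + n_samples):
        # resample L from P(L | x), proportional to P(L) * prod_j P(x_j | L)
        w = [prior[s] * math.prod(cpts[j][s][x[j]] for j in range(m))
             for s in range(n_states)]
        l = sample_index(w, rng)
        # resample each unobserved X_j from its full conditional P(X_j | L = l)
        for j, v in enumerate(evidence):
            if v is None:
                x[j] = sample_index(cpts[j][l], rng)
        if t >= burn_in:
            counts[l] += 1
    return [c / n_samples for c in counts]

prior = [0.6, 0.4]
cpts = [[[0.9, 0.1], [0.2, 0.8]],   # P(X_0 | L)
        [[0.7, 0.3], [0.1, 0.9]],   # P(X_1 | L)
        [[0.5, 0.5], [0.3, 0.7]]]   # P(X_2 | L)
print(gibbs_posterior_latent(prior, cpts, evidence=[1, None, 1]))
```

For this toy setting the exact posterior is P(L = 1 | X_0 = 1, X_2 = 1) = 0.224 / 0.254 ≈ 0.882, which the sampled estimate should approach; in a larger network where exact enumeration is too expensive, the sampled frequencies are what would be read off as the inferred preference.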