To address two problems with very large data sets, namely that the features of users and products cannot be selected quickly and accurately and that users' behavioral preferences cannot be predicted accurately, a CUR (Column Union Row) matrix decomposition method is proposed. The method selects a small number of columns from the original matrix to form the matrix C and a small number of rows to form the matrix R, and then constructs the matrix U using orthogonal-triangular (QR) decomposition. The resulting matrices C and R are the feature matrices of users and products, respectively; because both consist of actual entries of the original data, they allow specific user and product features to be analyzed. To predict users' behavioral preferences more accurately, the CUR algorithm is improved so that it achieves higher stability and accuracy in matrix recovery. Experiments on a real data set (the Netflix data set) show that, compared with traditional matrix decomposition methods such as singular value decomposition and principal component analysis, the CUR matrix decomposition method offers high accuracy and good interpretability in feature selection, while the improved CUR matrix decomposition method offers high stability and accuracy in matrix recovery, with an accuracy above 90%. CUR matrix decomposition has important application value for user recommendation in recommender systems and for traffic flow prediction in transportation systems.
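The construction of C, U and R described above can be illustrated with a minimal NumPy sketch. It is not the authors' improved algorithm, whose details the abstract does not give. Several choices here are assumptions made only for illustration: the function name `cur_decompose`, sampling columns and rows without replacement with probability proportional to their squared norms, and forming U as the pseudoinverse product C⁺AR⁺ computed through QR factorizations of C and Rᵀ. The sketch also assumes the selected columns and the selected rows are linearly independent, so the triangular factors are invertible.

```python
import numpy as np

def cur_decompose(A, c, r, seed=0):
    """Illustrative CUR sketch (hypothetical helper, not the paper's method)."""
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)

    # Assumption: sample indices with probability proportional to squared norm.
    col_p = np.sum(A ** 2, axis=0)
    col_p = col_p / col_p.sum()
    row_p = np.sum(A ** 2, axis=1)
    row_p = row_p / row_p.sum()
    cols = np.sort(rng.choice(A.shape[1], size=c, replace=False, p=col_p))
    rows = np.sort(rng.choice(A.shape[0], size=r, replace=False, p=row_p))

    C = A[:, cols]   # actual columns of the data
    R = A[rows, :]   # actual rows of the data

    # Reduced QR factorizations: C = Qc @ Tc and R.T = Qr @ Tr.
    Qc, Tc = np.linalg.qr(C)
    Qr, Tr = np.linalg.qr(R.T)

    # U = pinv(C) @ A @ pinv(R) = inv(Tc) @ Qc.T @ A @ Qr @ inv(Tr).T
    M = Qc.T @ A @ Qr
    U = np.linalg.solve(Tc, M)        # apply inv(Tc) on the left
    U = np.linalg.solve(Tr, U.T).T    # apply inv(Tr).T on the right
    return C, U, R, cols, rows
```

For a matrix of exact rank k, calling `cur_decompose(A, k, k)` should give a product `C @ U @ R` that reproduces `A` up to rounding error, while C and R remain literal columns and rows of the data, which is the source of the interpretability the abstract refers to.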