A Positive Definite Matrix Regularization Path Algorithm for Support Vector Machines

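ISSN: 1000-1239
Journal: 计算机研究与发展 (Journal of Computer Research and Development)
Date: 2013.11.1
Pages: 2253-2261
Classification: TP181 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliations: [1] School of Computer Science and Technology, Tianjin University, Tianjin 300072; [2] School of Computer and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang 163318; [3] Internal Systems Development Department, Beijing Dangdang Information Technology Co., Ltd., Beijing 100028
Funding: National Natural Science Foundation of China (61170019); Natural Science Foundation of Tianjin (11JCYBJC00700)
Related project: Kernel matrix approximation analysis methods for model selection and combination in machine learning kernel methods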

Keywords: support vector machine (SVM), regularization path, active set, positive definite matrix, Cholesky decomposition

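Abstract (translated from the Chinese):

The regularization path algorithm is an effective method for numerically solving the support vector machine (SVM) classification problem: it obtains all values of the regularization parameter and the corresponding SVM solutions within a time complexity equivalent to a single SVM solve. Existing SVM regularization path algorithms either cannot handle datasets containing duplicate, nearly duplicate, or linearly dependent data, or incur a high computational cost. To address these problems, a solution method for positive definite systems of linear equations is applied to the SVM regularization path, and the positive definite SVM path (PDSVMP) algorithm is proposed. PDSVMP converts the coefficient matrix of the iteration equations into a positive definite matrix and uses Cholesky decomposition to solve for the Lagrange multiplier increment vector at each inflection point on the path. Unlike existing algorithms, which solve for the regularization parameter directly, PDSVMP determines the parameter increment from the changes in the active set and then computes the regularization parameter from that increment. This guarantees theoretical correctness and numerical stability and can reduce computational complexity. Experiments on an example dataset and on standard benchmark datasets show that PDSVMP correctly handles datasets containing duplicate, nearly duplicate, or linearly dependent data, and does so with high computational efficiency.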
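
As a rough illustration of why duplicate data is troublesome, the sketch below (not taken from the paper) builds a kernel matrix for a small toy dataset and shows that repeating one point makes the matrix rank-deficient. The RBF kernel, the `gamma` value, the toy points, and the helper name `rbf_kernel_matrix` are all assumptions chosen for illustration; the abstract does not say which kernel or which linear system the authors use.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """Gram matrix with K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by rounding before exponentiating.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

# Hypothetical toy data: three distinct points, then the same set
# with one point repeated.
X_distinct = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_duplicate = np.vstack([X_distinct, X_distinct[1]])

K_distinct = rbf_kernel_matrix(X_distinct)
K_duplicate = rbf_kernel_matrix(X_duplicate)

# Distinct points give a full-rank 3x3 matrix; the duplicated point adds
# a row and column identical to an existing one, so the 4x4 matrix
# still has rank 3 and is singular.
rank_distinct = np.linalg.matrix_rank(K_distinct)
rank_duplicate = np.linalg.matrix_rank(K_duplicate)
print(rank_distinct, rank_duplicate)
```

A linear system whose coefficient matrix contains such a block cannot be solved by plain inversion, which is the kind of failure the abstract attributes to existing path algorithms.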

English abstract:

The regularization path algorithm is an efficient method for numerically solving the support vector machine (SVM) classification problem. It fits the entire path of SVM solutions for every value of the regularization parameter at essentially the same computational cost as fitting one SVM model. Existing SVM regularization path algorithms either cannot handle datasets containing duplicate data points, nearly duplicate points, or linearly dependent points, or are computationally expensive. To address these issues, this paper proposes an improved regularization path algorithm based on positive definite matrices, the positive definite SVM path (PDSVMP) algorithm, which provides the accurate path of SVM solutions. The coefficient matrix of the system of iteration equations is transformed into a positive definite matrix, and the Lagrange multiplier increment vector is then computed by Cholesky decomposition. The increment of the regularization parameter is derived from the changes in the active set and is used to compute the regularization parameter at each inflection point. This treatment guarantees theoretical correctness and numerical stability and reduces the computational complexity. Experimental results on an example dataset and on benchmark datasets show that the PDSVMP algorithm can effectively and efficiently handle datasets containing duplicate data points, nearly duplicate points, or linearly dependent points.
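
The Cholesky-based solve step described in the abstract can be sketched generically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the actual coefficient matrix or how it is made positive definite, so the 2x2 matrix `A` and right-hand side `b` are hypothetical stand-ins, and the function names are illustrative.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L y = b for a lower-triangular matrix L."""
    n = L.shape[0]
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_substitution(U, y):
    """Solve U x = y for an upper-triangular matrix U."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

def cholesky_solve(A, b):
    """Solve A x = b for a symmetric positive definite matrix A."""
    # A = L L^T; NumPy raises LinAlgError if the factorization fails,
    # for example when A is not positive definite.
    L = np.linalg.cholesky(A)
    y = forward_substitution(L, b)      # solve L y = b
    return back_substitution(L.T, y)    # solve L^T x = y

# Hypothetical stand-in for a positive definite coefficient matrix and
# the right-hand side of one iteration step.
A = np.array([[4.0, 2.0], [2.0, 3.0]])
b = np.array([2.0, 1.0])

delta = cholesky_solve(A, b)
print(delta)  # [0.5 0. ]
```

Once the matrix is known to be symmetric positive definite, the factorization needs no pivoting and the two triangular solves are cheap, which is consistent with the stability and cost benefits the abstract claims.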