Chinese syntax is structurally complex and its feature space is high-dimensional, and the best reported Chinese parsing results still lag behind those for Western languages. To improve both the efficiency and the accuracy of Chinese parsing, we propose a structural support vector machine (structural SVM) approach with L2-norm soft-margin optimization for phrase-structure parsing of Chinese. We construct a structured feature function ψ(x,y) that captures the input information of the syntactic tree. Because Chinese sentences exhibit strong internal correlations, we add parent-node information from the parse tree to ψ(x,y) so that it carries richer structural information. Experiments on the Penn Chinese Treebank (PCTB) show that the proposed approach achieves better results than both the classical structural SVM approach and the Berkeley Parser.
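To make the idea of a parent-enriched feature function concrete, the following is a minimal sketch of what such a ψ(x,y) might look like. It is an illustration, not the feature set used in the paper. It assumes that a parse tree is a nested `(label, children)` tuple and that each feature is a production rule tagged with the label of the parent node. The names `joint_features` and `score` and the toy sentence are hypothetical.

```python
from collections import Counter

# Assumed tree encoding: (label, children); a leaf child is a plain string (the word).
def joint_features(tree, parent="ROOT", feats=None):
    """Count productions annotated with the parent label (an illustrative psi(x, y))."""
    if feats is None:
        feats = Counter()
    label, children = tree
    # Right-hand side of the production: child labels, or the word itself at a leaf.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    feats[(parent, label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            joint_features(child, label, feats)
    return feats

def score(weights, tree):
    """Linear score w . psi(x, y), the quantity a structural SVM uses to rank trees."""
    return sum(weights.get(f, 0.0) * v for f, v in joint_features(tree).items())

# Toy example (hypothetical): the two NP nodes yield different features
# because one has parent IP and the other has parent VP.
tree = ("IP", [("NP", [("PN", ["我"])]),
               ("VP", [("VV", ["喜欢"]), ("NP", [("NN", ["书"])])])])
psi = joint_features(tree)
```

Without the parent label, the subject NP and the object NP would share a single feature per production; with it, the weight vector can score them separately, which is one way the extra structural information could help.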