Protein-protein interaction (PPI) extraction is an important task in biomedical information extraction, with high application value and practical significance. This paper uses an SVM-based composite kernel method for PPI extraction, combining a feature-based flat kernel (a polynomial kernel) with a structure-based convolution tree kernel. Because a complete syntactic parse tree contains considerable noise, it must be pruned to improve PPI extraction performance. We first examine how different tree pruning strategies affect the experimental results, running experiments with the complete tree, the minimum complete tree, the minimum tree, and the shortest path-enclosed tree; the shortest path-enclosed tree performs best. Building on the shortest path-enclosed tree, we then propose a dynamic extended tree, which clearly outperforms the other parse trees. Finally, using the composite kernel in 10-fold cross-validation on the AIMED corpus, we obtain a precision, recall, and F-score of 82.40%, 51.30%, and 63.23%, respectively.
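To make the kernel combination concrete, the following is a minimal sketch, not the implementation used in this paper. It assumes the two kernels are each normalized and then mixed linearly with a weight `alpha`, and it uses a Collins-Duffy style subset-tree kernel with decay `lam` as the convolution tree kernel. The function names, the tuple encoding of parse trees, and the values of `alpha`, `lam`, `degree`, and `coef0` are all illustrative assumptions.

```python
import math

def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel on flat feature vectors (degree and coef0 are assumed values)."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree

def internal_nodes(tree):
    """Yield every non-leaf node of a tree encoded as (label, child, ...); leaves are strings."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)

def production(node):
    """Return the grammar production at a node, keeping words and labels distinct."""
    rhs = tuple(("word", c) if isinstance(c, str) else ("label", c[0])
                for c in node[1:])
    return (node[0], rhs)

def delta(n1, n2, lam):
    """Decay-weighted count of common tree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Productions match, so c1 and c2 are either both words or both subtrees.
        if not isinstance(c1, str):
            score *= 1.0 + delta(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=0.4):
    """Convolution tree kernel: sum delta over all pairs of internal nodes."""
    return sum(delta(a, b, lam)
               for a in internal_nodes(t1)
               for b in internal_nodes(t2))

def normalized(kernel, a, b):
    """Scale a kernel value so that every instance has self-similarity 1."""
    return kernel(a, b) / math.sqrt(kernel(a, a) * kernel(b, b))

def composite_kernel(inst1, inst2, alpha=0.5):
    """Linear mix of the two normalized kernels; alpha is an assumed weight."""
    (x1, t1), (x2, t2) = inst1, inst2
    return (alpha * normalized(poly_kernel, x1, x2)
            + (1.0 - alpha) * normalized(tree_kernel, t1, t2))

# Toy instances: (flat feature vector, pruned parse tree). Purely illustrative data.
tree_a = ("NP", ("DT", "the"), ("NN", "protein"))
tree_b = ("NP", ("DT", "a"), ("NN", "protein"))
inst_a = ([1.0, 0.0, 2.0], tree_a)
inst_b = ([0.0, 1.0, 2.0], tree_b)
print(composite_kernel(inst_a, inst_b))
```

Under these assumptions, evaluating `composite_kernel` on every pair of training instances would give a Gram matrix that an SVM accepting precomputed kernels could consume. The pruning strategy (for example the shortest path-enclosed tree) would determine which tree is stored in each instance before the kernel is applied.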