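Relation patterns in unstructured free text are complex, and relation extraction performance on such text is low. To address this, the paper extracts attribute relations of domain concept entities from unstructured free-text information using the LM (Levenberg-Marquardt) algorithm, an optimization algorithm for BP neural networks. First, the corpus is preprocessed. Next, a CRFs (conditional random fields) model performs entity recognition on the instances, attributes, and attribute values of domain concepts. Then, features are extracted separately for each type of relation in the domain, according to that relation's characteristics. Finally, a BP neural network model is constructed and the LM algorithm is used to extract the corresponding relations. SVM is suited to binary classification problems. By comparison, the artificial neural network optimization algorithm learns more autonomously, achieves higher recognition accuracy, and is better suited to multi-class problems. Several groups of experiments show that the method performs well in extracting attribute relations of domain concept entities, improving the F-score by 12.8%.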
To address complex relation patterns and low relation extraction performance in unstructured free text, this paper proposes an approach that extracts entity attribute relations from unstructured free-text information by applying the LM optimization algorithm of the BP neural network. The procedure has four steps: corpus preprocessing, named entity recognition of instances, attributes, and attribute values with a CRFs model, construction of a BP neural network over features extracted for each type of domain relation, and application of the LM algorithm to extract the corresponding relations. SVM is suited to binary classification. Compared with SVM, the artificial neural network optimization algorithm is better suited to multi-class classification problems and achieves higher recognition accuracy. Several groups of experiments show that the method performs well in entity attribute relation extraction, improving the F-score by 12.8%.
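The abstract names the LM algorithm but does not give its update rule. In the common textbook form of Levenberg-Marquardt training, all network weights are stacked into one vector w, e is the vector of output errors over the training set, and J is the Jacobian of e with respect to w. Each step solves (JᵀJ + μI)Δw = −Jᵀe. The damping μ decreases when the squared error falls and increases otherwise. This form is assumed here and is not taken from the paper.

The sketch below illustrates this form on a hypothetical toy task: a one-hidden-layer BP network with one-hot targets for a three-class problem. This mirrors the multi-class setting that the abstract contrasts with binary SVM. Several parts of the sketch are illustrative assumptions:

- the synthetic 2-D data
- the layer sizes and damping schedule
- every function name (`train_lm`, `residuals_and_jacobian`, etc.)

The paper's CRFs entity recognition and relation features are not reproduced.

```python
import numpy as np

# Hypothetical sketch (not the paper's code): Levenberg-Marquardt (LM) training of a
# one-hidden-layer BP network with one-hot targets for multi-class classification.
# Layer sizes, damping schedule and toy data are illustrative assumptions.


def init_params(n_in, n_hidden, n_out, rng):
    """Return small random weights flattened into one vector, plus their shapes."""
    sizes = [(n_hidden, n_in), (n_hidden,), (n_out, n_hidden), (n_out,)]
    theta = np.concatenate([0.5 * rng.standard_normal(int(np.prod(s))) for s in sizes])
    return theta, sizes


def unpack(theta, sizes):
    """Split the flat parameter vector back into W1, b1, W2, b2."""
    parts, i = [], 0
    for s in sizes:
        n = int(np.prod(s))
        parts.append(theta[i:i + n].reshape(s))
        i += n
    return parts


def forward(theta, sizes, X):
    """tanh hidden layer, linear output layer; returns (outputs, hidden activations)."""
    W1, b1, W2, b2 = unpack(theta, sizes)
    H = np.tanh(X @ W1.T + b1)
    return H @ W2.T + b2, H


def residuals_and_jacobian(theta, sizes, X, T):
    """Error vector e = outputs - targets and its analytic Jacobian J = de/dtheta."""
    _, _, W2, _ = unpack(theta, sizes)
    Y, H = forward(theta, sizes, X)
    N, n_out = Y.shape
    D = 1.0 - H ** 2                                          # tanh'(a) = 1 - tanh(a)^2
    G = W2[None, :, :] * D[:, None, :]                        # d out[n,k] / d b1[j]
    dW1 = G[:, :, :, None] * X[:, None, None, :]              # d out[n,k] / d W1[j,i]
    dW2 = np.einsum("kl,nj->nklj", np.eye(n_out), H)          # d out[n,k] / d W2[l,j]
    db2 = np.broadcast_to(np.eye(n_out), (N, n_out, n_out))   # d out[n,k] / d b2[l]
    J = np.concatenate([dW1.reshape(N, n_out, -1), G, dW2.reshape(N, n_out, -1), db2], axis=2)
    return (Y - T).ravel(), J.reshape(N * n_out, -1)


def train_lm(X, T, n_hidden=5, iters=100, mu=1e-2, seed=0):
    """Each step solves (J^T J + mu*I) step = -J^T e and adapts the damping mu."""
    rng = np.random.default_rng(seed)
    theta, sizes = init_params(X.shape[1], n_hidden, T.shape[1], rng)

    def sse(th):
        return float(np.sum((forward(th, sizes, X)[0] - T) ** 2))

    history = [sse(theta)]
    for _ in range(iters):
        e, J = residuals_and_jacobian(theta, sizes, X, T)
        A, g = J.T @ J, J.T @ e
        while mu < 1e10:
            step = np.linalg.solve(A + mu * np.eye(theta.size), -g)
            if sse(theta + step) < history[-1]:
                theta = theta + step
                mu = max(mu / 10, 1e-7)   # accepted: move toward a Gauss-Newton step
                break
            mu *= 10                      # rejected: move toward a short gradient step
        history.append(sse(theta))
    return theta, sizes, history


# Usage on synthetic data: three Gaussian clusters stand in for three relation classes.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
labels = np.repeat(np.arange(3), 20)
X = centers[labels] + 0.4 * rng.standard_normal((60, 2))
T = np.eye(3)[labels]                     # one-hot targets, one output unit per class
theta, sizes, history = train_lm(X, T)
pred = forward(theta, sizes, X)[0].argmax(axis=1)
accuracy = float(np.mean(pred == labels))
print(f"SSE {history[0]:.2f} -> {history[-1]:.4f}, training accuracy {accuracy:.2f}")
```

The damping μ controls the step type. A large μ turns the step into a short gradient-descent move, and a small μ approaches a Gauss-Newton step. This adaptive switch is why LM often converges faster than plain gradient-descent backpropagation on small networks. The cost is that JᵀJ has one row and one column per weight, so each step becomes expensive as the network grows.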