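Entity attribute-value extraction is an important component of information extraction. To address the diversity of quantity attribute types and the variability of their values, a meta-property-based system for automatically extracting quantity attribute values is designed and implemented. The system's structure, functional framework, and core techniques are discussed in detail, including the selection of text for extraction, the extraction and evaluation of candidate values, and the automatic verification of results. In extracting quantity attribute values for entities in 5 major categories and 9 subcategories from Baidu Baike, the system achieves an average precision of 71% and an average recall of 89%, higher than both a simple search-based method and the traditional lexico-syntactic pattern-based method. The method is suitable for acquiring quantity attribute values in open domains and readily obtains precise values for single-valued attributes.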
Attribute-value extraction is an important task in information extraction. However, the heterogeneity of attributes and the bottleneck of natural language processing make the problem difficult and complex. In addition, most quantity attributes are single-valued and variable, so it is difficult to find their accurate values. Most existing work relies on semi-supervised methods or lexico-syntactic patterns; these methods overlook the properties of quantity attributes and require considerable effort to ensure the reliability of the extraction results. In this paper, a definition of meta-property is given, and a novel meta-property-based approach to attribute-value extraction is proposed to avoid these drawbacks of traditional methods. The system is implemented, and its overall structure and major components are presented, including textual information source selection, candidate extraction, candidate evaluation, and automatic verification. Experiments are carried out on 5 entity types and 9 of their subtypes from the Baidu Baike encyclopedia. The results show that the new approach achieves an average precision of 71% and an average recall of 89%, significantly higher than general query-based approaches and traditional lexico-syntactic pattern-based methods. The new approach generalizes better to open-domain attribute-value extraction, especially for single-valued attributes.