This paper proposes a weakly supervised method for extracting person attributes from personal homepages. First, the leading words of person attributes are extracted from the homepages according to domain patterns. Second, the attributes introduced by these leading words are collected as initial seeds, from which attribute patterns are derived. Finally, classification is combined with bootstrapping to iteratively extract the attributes that have no leading word. Only a small amount of manual annotation is required throughout the extraction process. Comparative experiments on person attribute extraction from English-language institutional websites show that, relative to a classification-based attribute extraction method, the proposed method improves precision by 7.8% and recall by 7.5%.
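The seed-then-bootstrap idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the leading-word table, the `Label: value` regular expression, the character-shape patterns, and the one-word context patterns are all assumptions introduced for illustration, and the classification step of the method is replaced here by simple pattern lookup.

```python
import re

# Hypothetical leading words mapped to attribute types (an assumption).
LEADING_WORDS = {"email": "email", "e-mail": "email", "phone": "phone", "tel": "phone"}

# Assumed domain pattern: a short label, a colon, then the attribute value.
LABELLED = re.compile(r"^\s*([A-Za-z][A-Za-z\- ]*?)\s*:\s*(.+?)\s*$")


def shape(value):
    """Generalize a value: each letter run becomes 'a', each digit run '9'."""
    value = re.sub(r"[A-Za-z]+", "a", value)
    return re.sub(r"\d+", "9", value)


def extract_seeds(lines):
    """Collect (type, value) seeds from lines that carry a known leading word."""
    seeds, rest = [], []
    for line in lines:
        if not line.strip():
            continue
        m = LABELLED.match(line)
        if m and m.group(1).lower() in LEADING_WORDS:
            seeds.append((LEADING_WORDS[m.group(1).lower()], m.group(2)))
        else:
            rest.append(line.strip())
    return seeds, rest


def candidate(line):
    """Toy candidate: the last token is the value, the token before it the context."""
    tokens = line.split()
    context = tokens[-2].lower() if len(tokens) > 1 else ""
    return context, tokens[-1]


def bootstrap(seeds, unlabelled, max_iter=5):
    """Iteratively label candidates, learning new shape and context patterns."""
    shapes = {shape(value): attr for attr, value in seeds}
    contexts = {}
    found = list(seeds)
    remaining = [candidate(line) for line in unlabelled]
    for _ in range(max_iter):
        newly, still = [], []
        for ctx, val in remaining:
            attr = shapes.get(shape(val)) or contexts.get(ctx)
            if attr:
                newly.append((attr, ctx, val))
            else:
                still.append((ctx, val))
        if not newly:
            break  # no pattern matched anything new, so stop iterating
        for attr, ctx, val in newly:
            found.append((attr, val))
            shapes.setdefault(shape(val), attr)  # learn the value pattern
            if ctx:
                contexts.setdefault(ctx, attr)   # learn the context pattern
        remaining = still
    return found


if __name__ == "__main__":
    page = [
        "Email: bob.lee@cs.edu",
        "Phone: 555-0100",
        "Contact me at jane.doe@uni.edu",
        "Reach me at j_smith@lab.org",
        "Welcome to my homepage",
    ]
    seeds, rest = extract_seeds(page)
    for attr, value in bootstrap(seeds, rest):
        print(attr, value)
```

In this toy run the first unlabelled address is found because its shape matches the seed, and the second is found only in a later iteration through the context learned from the first, which is the bootstrapping effect. One-word contexts are far too permissive for real pages; a classifier that filters candidates, as the method describes, would be needed to keep errors from propagating across iterations.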