Social relations among people are an important class of information on the Web. This paper proposes a lightweight method for extracting such relations at large scale. The method uses simulated annealing to iteratively mine the minimal set of descriptive patterns that express social relations in web pages, and then exploits the redundancy of Web information to extract relations efficiently and accurately. To evaluate the method, six types of social relations are defined and extracted for a large list of person names collected from the Web. Experimental results show that the method achieves an average precision of 84.79% and an average recall of 81.69%.
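The idea of searching for a minimal pattern set with simulated annealing can be illustrated with a small sketch. The abstract does not give the objective function, the neighbourhood move, or the cooling schedule, so everything below is an assumption made for illustration: the cost (number of selected patterns plus a penalty for each uncovered seed relation), the single-pattern toggle move, the geometric cooling, and the names `cost`, `anneal_pattern_set`, `coverage`, and `seeds` are all hypothetical, as is the toy data.

```python
import math
import random


def cost(selected, coverage, seeds, miss_penalty=10.0):
    """Assumed objective: few patterns, with a penalty per uncovered seed pair."""
    covered = set()
    for pattern in selected:
        covered |= coverage[pattern]
    return len(selected) + miss_penalty * len(seeds - covered)


def anneal_pattern_set(coverage, seeds, steps=5000, t0=2.0, cooling=0.999, seed=0):
    """Search for a small pattern subset that still matches every seed pair."""
    rng = random.Random(seed)
    patterns = sorted(coverage)
    current = set(patterns)  # start from the full candidate set
    current_cost = cost(current, coverage, seeds)
    best, best_cost = set(current), current_cost
    temperature = t0
    for _ in range(steps):
        # Neighbour move (assumed): add or drop one randomly chosen pattern.
        candidate = current ^ {rng.choice(patterns)}
        candidate_cost = cost(candidate, coverage, seeds)
        delta = candidate_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = set(current), current_cost
        temperature = max(temperature * cooling, 1e-6)  # geometric cooling
    return best


# Toy data (hypothetical): each candidate pattern maps to the ids of the
# seed person pairs that it matches somewhere on the Web.
coverage = {
    "{A}'s wife {B}": {1, 2, 3},
    "{A} married {B}": {3, 4},
    "{A} and his wife {B}": {1, 2},
    "wife of {A}, {B}": {4},
    "{A} met {B}": {2},
}
seeds = {1, 2, 3, 4}

best = anneal_pattern_set(coverage, seeds)
print(sorted(best))
```

In this toy instance no single pattern matches all four seed pairs, so the smallest covering set has two patterns. Several such pairs exist, and which one the search returns depends on the random seed.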