Conference scripts published in large numbers on university websites contain important information such as person names, place names and conference names. To recognize these entities, a method combining rules and statistics is proposed. First, according to their characteristics, conference scripts are divided into two categories: regular and irregular. For regular scripts, entities are extracted with hand-written rules. For irregular scripts, Conditional Random Fields (CRFs) are first used to obtain a preliminary recognition result, and rule-based recognition is then applied to the entities that the CRFs failed to recognize. Experimental results show that the proposed method performs markedly better than applying rules alone or CRFs alone.
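The two-branch pipeline described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The entity labels, the field-style rules for regular scripts, the free-text fallback rules, the `is_regular` criterion and the English example text are all hypothetical stand-ins, because the paper targets Chinese scripts and does not publish its rules. `crf_tagger` is any callable standing in for a trained CRF model.

```python
import re
from typing import Callable, Dict, Set

# Hypothetical labels for the three entity kinds the abstract names.
ENTITY_TYPES = ("PER", "LOC", "CONF")
Entities = Dict[str, Set[str]]
# Stand-in for a trained CRF model: text in, entities by type out.
Tagger = Callable[[str], Entities]

# Hypothetical field-style rules for "regular" scripts (one field per line).
FIELD_RULES = {
    "PER": re.compile(r"^(?:Speaker|Chair):\s*(.+)$", re.M),
    "LOC": re.compile(r"^Venue:\s*(.+)$", re.M),
    "CONF": re.compile(r"^Conference:\s*(.+)$", re.M),
}

# Hypothetical free-text rules, applied only after the CRF stage.
FALLBACK_RULES = {
    "LOC": re.compile(r"\b([A-Z][a-z]+ (?:Hall|Building|Center))\b"),
    "CONF": re.compile(r"\b((?:[A-Z][a-z]+ )+(?:Conference|Symposium|Workshop))\b"),
}


def is_regular(text: str) -> bool:
    """Hypothetical criterion: a script is regular if every field rule fires."""
    return all(rule.search(text) for rule in FIELD_RULES.values())


def apply_rules(text: str, rules) -> Entities:
    """Collect the captured group of every rule match, grouped by entity type."""
    found: Entities = {t: set() for t in ENTITY_TYPES}
    for etype, rule in rules.items():
        found[etype].update(m.strip() for m in rule.findall(text))
    return found


def recognize(text: str, crf_tagger: Tagger) -> Entities:
    # Regular scripts: rules only, the CRF is never called.
    if is_regular(text):
        return apply_rules(text, FIELD_RULES)
    # Irregular scripts: preliminary result from the CRF stand-in.
    result = {t: set(crf_tagger(text).get(t, set())) for t in ENTITY_TYPES}
    covered = set().union(*result.values())
    # Rule-based pass: keep only hits the CRF did not already tag.
    for etype, names in apply_rules(text, FALLBACK_RULES).items():
        result[etype].update(n for n in names if n not in covered)
    return result
```

For example, with a stub tagger that only finds the person name, `recognize("Zhang San opened the Annual Teaching Conference at Science Hall.", lambda t: {"PER": {"Zhang San"}})` adds the venue and conference name from the fallback rules. Passing the tagger as a callable keeps the sketch independent of any particular CRF toolkit.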