Opinions carry important information in a text, and comparative sentences are a common sentence pattern in opinionated reviews. To address the identification of comparative sentences in Chinese, this paper proposes a method that combines rules with statistics and evaluates it experimentally. The method first normalizes the corpus and its word segmentation results, and then performs a broad extraction that combines a lexicon of comparative feature words with sentence-structure templates and dependency relations. Next, a CSR rule extraction algorithm is designed, and a conditional random field (CRF) is used to mine entity and semantic role information. Finally, a support vector machine (SVM) classifier is trained with feature sets of different dimensionality to find the feature combination that gives the best performance, which completes the precise extraction.
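As an illustration of the lexicon-based part of the broad extraction step, the following is a minimal sketch and not the paper's implementation. It assumes the sentences are already segmented into tokens. The lexicon `COMPARATIVE_WORDS` and the function names `is_candidate` and `broad_extract` are hypothetical, since the abstract does not list the actual dictionary, and the sentence-structure templates and dependency relations are not modeled here.

```python
# Hypothetical lexicon of comparative feature words; the paper's actual
# dictionary is not given in the abstract.
COMPARATIVE_WORDS = {"比", "不如", "相比", "比较", "更", "最", "一样"}


def is_candidate(tokens, lexicon=COMPARATIVE_WORDS):
    """Return True if any segmented token is a comparative feature word."""
    return any(tok in lexicon for tok in tokens)


def broad_extract(segmented_sentences, lexicon=COMPARATIVE_WORDS):
    """Coarse filter: keep sentences containing at least one lexicon word."""
    return [toks for toks in segmented_sentences if is_candidate(toks, lexicon)]


# Toy, pre-segmented sentences (made up for illustration only).
sentences = [
    ["这", "款", "手机", "比", "那", "款", "便宜"],
    ["今天", "天气", "很", "好"],
    ["屏幕", "不如", "上", "一", "代"],
]
candidates = broad_extract(sentences)
print(len(candidates))
```

A filter of this kind favors recall over precision, which is why a later classification stage is still needed to reject candidates that contain a lexicon word but are not comparative.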