Dongli Research Big Data Discovery System (DRDS)


A Chinese-English Statistical Machine Translation Method Integrating Syntactic Phrases

ISSN: 1000-1220
Journal: 《小型微型计算机系统》 (Journal of Chinese Computer Systems)
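Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Affiliation: School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024
Funding: National Natural Science Foundation of China (61672127)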

Keywords: microblog summarization, semantic graph optimization, TF-IDF, sentence similarity

Chinese abstract (translated):

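To efficiently obtain key information on different topics from massive volumes of microblog posts, microblog opinion summarization has recently become one of the research hotspots in natural language processing. The baseline method extracts keywords from microblog sentences with the TF-IDF algorithm, computes an importance score for each post from these keywords, and selects the opinion summary directly. The naive improved method builds on the baseline by adding a sentiment classification step and by using the semantic distance between microblog sentences to remove semantically redundant, less important sentences from the candidate summary set before generating the opinion summary. The method based on a semantic graph optimization algorithm builds on the naive improved method: it uses the importance scores of microblog sentences and the semantic distances between them to construct a semantic graph, and then selects the opinion summary with a graph optimization algorithm. On the test dataset of Task 1 of the COAE2016 evaluation, the naive improved method achieved, averaged over 10 topics, a ROUGE-1 of 26.39%, a ROUGE-2 of 0.68%, and a ROUGE-SU4 of 5.69%, and the officially published results show that it achieved the best performance on 6 of the 9 evaluation metrics. The method based on the semantic graph optimization algorithm was tested on the evaluation's sample dataset; compared with the naive improved method, it improved ROUGE-1, ROUGE-2, and ROUGE-SU4 by 0.63%, 1.51%, and 2.69%, respectively.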
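The abstract does not give the baseline's scoring formula. The following is a minimal Python sketch of TF-IDF keyword scoring, assuming that each topic's posts are treated as one document (so IDF is computed across topics) and that a post's importance score is the sum of the weights of the top keywords it contains. The functions `topic_keywords`, `score_posts`, and `baseline_summary` and the parameters `k_keywords` and `n_select` are hypothetical, and posts are assumed to be already tokenized (for Chinese, after word segmentation).

```python
import math
from collections import Counter


def topic_keywords(topics, topic_id, k):
    """Return the k highest-weighted TF-IDF terms of one topic as {term: weight}.

    Assumption (not from the paper): each topic's posts form one document, so
    tf = term count in the topic / tokens in the topic and
    idf = log(number of topics / number of topics containing the term).
    """
    counts = Counter(term for post in topics[topic_id] for term in post)
    total = sum(counts.values())
    # Document frequency across topics: each topic counts a term at most once.
    df = Counter()
    for posts in topics.values():
        df.update({term for post in posts for term in post})
    n_topics = len(topics)
    weights = {
        term: (c / total) * math.log(n_topics / df[term])
        for term, c in counts.items()
    }
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])


def score_posts(posts, keywords):
    """Assumed importance score: sum of the weights of the keywords a post contains."""
    return [sum(keywords.get(term, 0.0) for term in set(post)) for post in posts]


def baseline_summary(topics, topic_id, k_keywords=10, n_select=5):
    """Select the n_select highest-scoring posts of a topic, returned in original order."""
    posts = topics[topic_id]
    keywords = topic_keywords(topics, topic_id, k_keywords)
    scores = score_posts(posts, keywords)
    ranked = sorted(range(len(posts)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_select])
```

For example, with `topics = {"phone": [["battery", "great"], ["battery", "bad"], ["screen", "great"]], "weather": [["rain", "today"], ["sun", "great"]]}`, the call `baseline_summary(topics, "phone", k_keywords=3, n_select=2)` returns `[0, 1]`. Because "great" occurs in both topics, its IDF is zero and it adds nothing to any score.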

English abstract:

To obtain key information on different topics efficiently, microblog opinion summarization has recently become a hot spot in natural language processing. The baseline method of this paper extracts keywords using the TF-IDF algorithm and calculates the importance scores of microblogs to select the opinion summary directly. The naive improved method adds a sentiment classification step and uses the semantic distance between microblogs to remove posts of low importance and high semantic repetition before generating the opinion summary. The method based on the semantic graph optimization algorithm constructs a complete graph from the importance scores and semantic distances of microblogs and selects the opinion summary with a graph optimization algorithm. According to the official evaluation results on the COAE2016 test dataset, the naive improved method reached average ROUGE-1, ROUGE-2, and ROUGE-SU4 values of 26.39%, 0.68%, and 5.69%, respectively, over 10 topics, and achieved the best value on 6 of the 9 evaluation metrics. In addition, experiments on the COAE2016 sample dataset show that the method based on the semantic graph optimization algorithm increased the ROUGE-1, ROUGE-2, and ROUGE-SU4 values by 0.63%, 1.51%, and 2.69%, respectively.
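The abstract does not say how semantic distance is computed or how much repetition is tolerated. Below is a minimal Python sketch of the redundancy-removal step of the naive improved method. It substitutes bag-of-words cosine similarity for the paper's semantic distance and uses a greedy pass over posts ranked by importance. The sentiment classification step and the graph optimization method are not modeled. `cosine`, `remove_redundant`, `threshold`, and `n_select` are hypothetical names and settings.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two token lists as bag-of-words vectors.

    Stand-in (assumption) for the paper's unspecified semantic distance.
    """
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def remove_redundant(posts, scores, threshold=0.5, n_select=5):
    """Greedy redundancy removal (assumed procedure).

    Visit posts from most to least important and keep a post only if its
    similarity to every already kept post is below `threshold`.
    Returns the kept indices in selection order.
    """
    order = sorted(range(len(posts)), key=lambda i: scores[i], reverse=True)
    selected = []
    for i in order:
        if all(cosine(posts[i], posts[j]) < threshold for j in selected):
            selected.append(i)
        if len(selected) == n_select:
            break
    return selected
```

Because posts are visited in order of importance, the more important post of a repetitive pair is always the one kept. Raising `threshold` tolerates more repetition.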

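Projects with papers in this journal

Research on Machine Translation for Cross-Language Information Retrieval

Journal papers: 50, conference papers: 29, books: 1

Research on Sentence Similarity Computation Based on Deep Learning

Journal papers: 1

Journal papers from the same project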

MT-Oriented English PoS Tagging and Its Application to Noun Phrase Chunking

Chinese Dependency Parsing Combining a Maximum Spanning Tree Algorithm and a Decision-Based Algorithm

Implication operators on the set of V-irreducible element in the linguistic truth-valued intuitionis

Hedge Scope Detection Based on Syntactic Structure Constraints

A Clustering Method Based on a Fuzzy Similarity Matrix of 18-Element Linguistic Values

A Multistage Gene Normalization System Integrating Multiple Effective Methods

A two-phase Bio-NER system based on integrated classifiers and multiagent strategy

A distributed meta-learning system for Chinese entity relation extraction

Creating Chinese-English Comparable Corpora

Protein-Protein Interaction Extraction Based on Transfer Learning

Chinese Temporal Expression Recognition Based on Conditional Random Fields and a Time Lexicon

Protein-Protein Interaction Extraction Based on Composite Kernels

Construction of a Chinese-English Parallel Phrase Dependency Treebank

Extracting Biomedical Event with Dual Decomposition Integrating Word Embeddings

Identifying New Sentiment Words in Microblogs Based on the Generalized Jaccard Coefficient

Co-training for detecting hedges and their scope in biomedical texts

Hedge Scope Detection in Biomedical Texts: An Effective Dependency-Based Method

Research on Chinese Prepositional Phrase Identification Based on Simple Noun Phrases

Identification of English prepositional phrases within business domain for machine translation

Research on Term Extraction Based on Information Entropy and Changes in Word Frequency Distribution

Improving Statistical Machine Translation Performance with Syntactic Phrases

An Unsupervised Graph Based Continuous Word Representation Method for Biomedical Text Mining

Research on Automatic Term Extraction in the Traditional Chinese Medicine Acupuncture Domain

Context Information and Fragments Based Cross-Domain Word Segmentation

Automotive Domain Term Extraction Based on Conditional Random Fields

A Knowledge Representation Method Based on Ten-Element Lattice Implication Algebra

Inference Rules of the Linguistic Truth-Valued Intuitionistic Fuzzy Propositional Logic System

Linguistic Truth-Valued Intuitionistic Fuzzy Multi-Attribute Decision-Making Based on TOPSIS

Extracting Protein Relations Using Word Representations and Deep Neural Networks

Journal information

《小型微型计算机系统》
China Science and Technology Core Journal

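Supervising authority: Chinese Academy of Sciences
Sponsor: Shenyang Institute of Computing Technology, Chinese Academy of Sciences
Editor-in-chief: Lin Hu
Address: No. 16 Nanping East Road, Hunnan New District, Shenyang
Postal code: 110168
Email: xwjxt@sict.ac.cn
Phone: 024-24696120, 024-24696190-8870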

International Standard Serial Number: ISSN 1000-1220
China domestic serial number: CN 21-1106/TP
Postal distribution code: 8-108

Recognition:
Chinese Core Journal of Natural Sciences, source journal of the Chinese Science Citation Database (CSCD)

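Indexed in domestic and international databases:
Russian Abstracts Journal (Russia), Index Copernicus (Poland), Abstract and Citation Database (Netherlands), Cambridge Scientific Abstracts (USA), Science Abstracts database (UK), Japan Science and Technology Agency database (Japan), China Science and Technology Core Journals (China), Peking University Core Journals, 2000, 2004, 2008, 2011, and 2014 editions (China)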

Times cited: 23212