Integrating information about the outputs of a research project indirectly reflects the project's research content, and can compensate for the difficulty of obtaining project proposals during similar-project detection (duplicate checking). This indirect description is therefore of considerable research value. This paper proposes a data model that integrates the output information associated with a research project. The model brings together the scientific and technical reports, academic papers, and research achievements produced by a project, and extracts their keywords, titles, and abstracts to describe the project's research content accurately. It also gives greater weight to auxiliary information, such as the principal investigator and the host organization, in the similarity calculation. The model thereby provides objective data support and lays a foundation for similar-project detection.
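As an illustration only, the kind of integrated record described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's actual schema or algorithm: the class names `Output` and `ProjectRecord`, the field names, the Jaccard keyword overlap, and the `content_weight` parameter are all hypothetical choices made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Output:
    """One output of a project (hypothetical structure)."""
    kind: str                      # e.g. "report", "paper", or "achievement"
    title: str
    abstract: str = ""
    keywords: list[str] = field(default_factory=list)


@dataclass
class ProjectRecord:
    """Integrated view of a project, built from its outputs rather than its proposal."""
    project_id: str
    principal_investigator: str
    organization: str
    outputs: list[Output] = field(default_factory=list)

    def keywords(self) -> set[str]:
        # Pool the keywords of all outputs, normalized to lower case.
        return {kw.strip().lower()
                for o in self.outputs
                for kw in o.keywords
                if kw.strip()}

    def text(self) -> str:
        # Concatenate titles and abstracts as a textual description of the project.
        return " ".join(f"{o.title} {o.abstract}".strip() for o in self.outputs)


def similarity(a: ProjectRecord, b: ProjectRecord, content_weight: float = 0.8) -> float:
    """Assumed scoring: Jaccard keyword overlap plus a bonus for shared PI/organization."""
    ka, kb = a.keywords(), b.keywords()
    union = ka | kb
    content = len(ka & kb) / len(union) if union else 0.0
    aux = (0.5 * (a.principal_investigator == b.principal_investigator)
           + 0.5 * (a.organization == b.organization))
    return content_weight * content + (1 - content_weight) * aux
```

In this sketch the auxiliary information only adjusts a content-based score. How the model actually weights the principal investigator and the organization is not specified in the abstract, so the 0.8/0.2 split is purely a placeholder.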