Plagiarism in scientific papers is currently a serious problem, yet automatic detection of it has received little research attention. Plagiarism is one form of duplication and, by degree of modification, can be divided into full-text copying, section copying, paragraph copying, sentence copying, synonym-replacement copying, and idea plagiarism. This paper addresses cases in which all or part of an original text is copied and then altered by deletion, by rewording such as synonym replacement, or by moving paragraphs; idea plagiarism is out of scope. First, the topic words of each paper are expanded with a bootstrapping algorithm, and papers are divided into candidate groups for duplicate detection according to the intersection of their topic words. Then a weighted similarity algorithm based on a sliding window, with windows divided by chapter, is proposed, and the computed results are presented as similarity curve graphs, which make them relatively intuitive to read. The method achieved fairly good results.
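The abstract does not give the formulas, so the following is only a minimal sketch of the two steps it names: grouping papers by shared topic words, and scoring a suspect text against a source with a sliding window. Every function name, the cosine measure, the `min_shared` threshold, the window `size` and `step`, and the per-window `weights` are illustrative assumptions, not the authors' definitions; the bootstrapping expansion of topic words is not shown.

```python
from collections import Counter
from math import sqrt


def candidate_pairs(topic_words, min_shared=2):
    """Pair papers whose topic-word sets share at least min_shared words (assumed rule)."""
    ids = sorted(topic_words)
    return [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if len(topic_words[a] & topic_words[b]) >= min_shared
    ]


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bags of words (assumed similarity measure)."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def windows(tokens, size, step):
    """Slide a fixed-size window over the tokens; a tail shorter than one step is dropped."""
    if len(tokens) <= size:
        return [tokens]
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, step)]


def similarity_curve(suspect, source, size=5, step=1):
    """One point per suspect window: its best match against any source window.

    Taking the maximum over all source windows keeps the score high when
    paragraphs have merely been moved.
    """
    src = windows(source, size, step)
    return [max(cosine(w, s) for s in src) for w in windows(suspect, size, step)]


def weighted_similarity(curve, weights=None):
    """Weighted mean of the curve; weights (e.g. per chapter) are hypothetical."""
    weights = weights or [1.0] * len(curve)
    return sum(c * w for c, w in zip(curve, weights)) / sum(weights)


if __name__ == "__main__":
    source = "a b c d e f g h".split()
    moved = "e f g h a b c d".split()  # same text with the two halves swapped
    curve = similarity_curve(moved, source, size=4, step=4)
    print(curve, weighted_similarity(curve))
```

Plotting the list returned by `similarity_curve` against the window index gives a curve of the kind the abstract describes: flat, high stretches suggest copied passages, and dips suggest original or heavily rewritten text.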