This paper aims to build a data cleaning framework for Chinese patent data. A general conceptual framework of data cleaning is derived through literature analysis, and the specific requirements of patent data cleaning are identified through field surveys. On this basis, a Chinese patent data cleaning framework focused on patent address information is designed, algorithms for processing patent address information are proposed, and the framework is verified and refined by comparison with the former data cleaning method. Finally, the patent data cleaning system is modeled with UML and implemented to validate the effectiveness of the proposed framework.