With the rapid advance of water conservancy informatization, Internet resources related to water information keep growing. Given the volume of this data and the variety of its formats, manual retrieval and analysis can no longer meet the needs of the water conservancy industry. As big data research deepens, designing and implementing a web crawler for acquiring water information has become the foundation for solving water information retrieval and analysis problems. This paper designs a water information retrieval system based on a focused (topic-specific) web crawler. Through three steps, namely crawling of water-related topic information, conversion and cleaning of data formats, and normalized writing to a database, the system automatically transforms online water data into structured database records. The system offers a new and practical approach to cross-validating information from multiple data sources and to acquiring online data for emergency use.
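The three steps named in the abstract (topic-focused crawling, format conversion, normalized database writing) can be illustrated with a minimal sketch. It is not the system described in the paper: the keyword list `TOPIC_TERMS`, the relevance threshold, the table name `water_info`, and all function names are assumptions made for illustration, and page fetching is left out so that the example runs offline on an HTML string.

```python
import re
import sqlite3
from html.parser import HTMLParser

# Hypothetical topic vocabulary; the paper does not list its keywords.
TOPIC_TERMS = ("water level", "reservoir", "flood", "rainfall", "hydrolog")


class TextExtractor(HTMLParser):
    """Collect the title and the visible text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip:
            self.chunks.append(data)


def relevance(text):
    """Fraction of topic terms found in the text (a crude topic score)."""
    lowered = text.lower()
    return sum(term in lowered for term in TOPIC_TERMS) / len(TOPIC_TERMS)


def to_record(url, html):
    """Step 2: convert raw HTML into a flat, whitespace-normalized record."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    body = re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()
    return {"url": url, "title": title, "body": body,
            "score": relevance(title + " " + body)}


def store(conn, record, threshold=0.2):
    """Step 3: write on-topic records to the database; skip the rest."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS water_info "
        "(url TEXT PRIMARY KEY, title TEXT, body TEXT, score REAL)"
    )
    if record["score"] < threshold:  # assumed cutoff for topic relevance
        return False
    conn.execute(
        "INSERT OR REPLACE INTO water_info "
        "VALUES (:url, :title, :body, :score)",
        record,
    )
    conn.commit()
    return True
```

In a full crawler, the relevance score would also decide which outgoing links to follow, which is what distinguishes a focused crawler from a general one. Using the URL as the primary key keeps repeated crawls of the same page from producing duplicate rows.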