To address the need for targeted tracking of science and technology information on the Internet, this paper applies Web-based information collection techniques to design the framework of a science and technology information collection system, and describes the implementation of two of its key techniques: web page segmentation and data deduplication. The system is simple and convenient to use, and it reduces the workload of scientific and technical personnel.