Dynamic Web pages account for more than 50% of all Web pages nationwide; however, the information collection stage of current network public opinion monitoring systems cannot obtain information from Internet media that use dynamic Web pages as their main form of content distribution. To address this, this paper proposes an overall scheme for parsing JavaScript dynamic Web pages using the Rhino engine. Experimental results show that the scheme enriches the data sources available to network public opinion monitoring, and that it is an effective solution for recursively obtaining the hyperlink addresses within dynamic Web pages and for extracting the main content of those pages.