This paper locates target information by combining relative paths in the document tree with the content features of nodes. The located information blocks are preprocessed using the information-conversion functions of the HTMLParser parser, and extraction rules are then derived from them. On this basis, information extraction from BT torrent web pages is implemented, and an extraction model for BT torrent information is established.
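As a rough illustration of locating information by a content feature plus a relative path, the sketch below finds an anchor node by its text and then walks a short relative path to the target node. It is not the paper's implementation: it uses `html.parser.HTMLParser` from the Python standard library rather than the HTMLParser parser named in the abstract. The `Node`, `TreeBuilder`, `find_by_content`, `follow_relative_path`, and `extract_field` names, the rule format, and the sample table are all hypothetical.

```python
from html.parser import HTMLParser


class Node:
    """Minimal tree node: tag name, parent link, children, and direct text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a small tree from well-formed HTML (assumption: no void tags)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def find_by_content(node, keyword):
    """Depth-first search for the first node whose own text contains keyword."""
    if keyword in node.text:
        return node
    for child in node.children:
        hit = find_by_content(child, keyword)
        if hit is not None:
            return hit
    return None


def follow_relative_path(anchor, steps):
    """Walk from anchor: '..' moves to the parent, (tag, i) to the i-th child with that tag."""
    node = anchor
    for step in steps:
        if step == "..":
            node = node.parent
        else:
            tag, index = step
            node = [c for c in node.children if c.tag == tag][index]
    return node


def extract_field(html, keyword, steps):
    """Apply one hypothetical extraction rule: content anchor plus relative path."""
    builder = TreeBuilder()
    builder.feed(html)
    anchor = find_by_content(builder.root, keyword)
    if anchor is None:
        return None
    return follow_relative_path(anchor, steps).text


# Hypothetical fragment of a torrent detail page.
SAMPLE = (
    "<table>"
    "<tr><td>Name</td><td>example.iso</td></tr>"
    "<tr><td>Size</td><td>700 MB</td></tr>"
    "</table>"
)

# Rule: find the cell labelled "Size", go up to its row, take the second cell.
size = extract_field(SAMPLE, "Size", ["..", ("td", 1)])
```

Anchoring on the label text rather than on an absolute path from the document root means the rule should keep working when unrelated rows or wrappers are added elsewhere on the page, provided the label and its neighbouring cell keep the same relative layout.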