在已有嵌套数据挖掘算法的基础上,加入了数据区域挖掘算法,根据构造出的嵌套数据列表页的标签树,找出所有的数据区域,再对数据区域进行统一处理,对所有子树应用部分树对齐算法进行匹配,生成全局模式,进而抽取出所有数据记录.与原算法相比,改进后的算法在确保准确性的基础上,有效地提高了原算法在处理多数据区域时的效率.
On the basis of the existing algorithms of the nested data, the data mining algorithm was joined. According to the tag trees of constructed nested list pages, all data regions were found and unified handled. Then a global pattern was produced after all the subtrees were matched based on partial tree aligning algorithm. And all the data records were extracted. Compared with the original algorithm, the efficiency was improved by using the new method, and it ensured the accuracy.