Creating customized data services from web pages
  • ISSN: 1006-6748
  • Journal: High Technology Letters (English edition)
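  • Classification: TP393.092 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]; P315.69 [Astronomy and Earth Sciences: Seismology; Astronomy and Earth Sciences: Solid Earth Geophysics; Astronomy and Earth Sciences: Geophysics]
  • Author affiliations: [1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, P. R. China; [2] Graduate University of Chinese Academy of Sciences, Beijing 100190, P. R. China; [3] Research Center for Cloud Computing, North China University of Technology, Beijing 100041, P. R. China
  • Funding: Supported by the National High Technology Research and Development Programme of China (No. 2009AA01Z141), the National Natural Science Foundation of China (No. 60573117) and the Beijing Natural Science Foundation (No. 4131001).
  • Related project: Research on end-user service abstraction, operations, and property guarantees for Internet resource aggregation (面向互联网资源聚合的最终用户服务抽象、运算及其性质保障问题研究)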

Abstract:

To extract structured data from a web page with customized requirements, a user labels some DOM elements on the page with attribute names. The common features of the labeled elements are used both to guide the user through the labeling process, minimizing user effort, and to retrieve attribute values. To turn the attribute values into a structured result, the attribute pattern needs to be induced. For this purpose, a space-optimized suffix tree called an attribute tree is built to transform the document object model (DOM) tree into a simpler form while preserving its useful properties, such as attribute sequence order. The pattern is induced bottom-up on the attribute tree and is then used to build the structured result. Experiments show that the approach performs well in terms of precision, recall and structural correctness.
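
The labeling-and-retrieval idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not say which common features are used or how the attribute tree is built. The sketch assumes that the tag/class path from the root is the common feature, and it replaces suffix-tree pattern induction with a much simpler greedy rule (a repeated attribute name starts a new record). The sample page and the names `signature_index` and `extract` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page used only for illustration.
PAGE = """
<html><body><ul>
  <li><span class="title">Book A</span><span class="price">10</span></li>
  <li><span class="title">Book B</span><span class="price">12</span></li>
  <li><span class="title">Book C</span><span class="price">9</span></li>
</ul></body></html>
"""

def signature_index(root):
    """Return (path, element) pairs in document order.

    The path joins tag names (plus the class attribute, if any) from the
    root down to the element. Assumption: this path stands in for the
    "common features" mentioned in the abstract.
    """
    items = []

    def walk(node, prefix):
        cls = node.get("class")
        step = node.tag + ("." + cls if cls else "")
        path = prefix + "/" + step
        items.append((path, node))
        for child in node:
            walk(child, path)

    walk(root, "")
    return items

def extract(root, labels):
    """labels maps an attribute name to one user-labeled element."""
    items = signature_index(root)
    path_of = {id(node): path for path, node in items}
    name_by_path = {path_of[id(node)]: name for name, node in labels.items()}

    # Attribute sequence in document order for every element that shares
    # a path with a labeled element.
    sequence = [
        (name_by_path[path], (node.text or "").strip())
        for path, node in items
        if path in name_by_path
    ]

    # Greedy stand-in for pattern induction: a repeated attribute name
    # closes the current record and starts a new one.
    records, current = [], {}
    for name, value in sequence:
        if name in current:
            records.append(current)
            current = {}
        current[name] = value
    if current:
        records.append(current)
    return records

root = ET.fromstring(PAGE.strip())
first_item = root.find("./body/ul/li")
# The "user" labels only the two elements of the first list item.
labels = {"title": first_item[0], "price": first_item[1]}
records = extract(root, labels)
for record in records:
    print(record)
```

Labeling one title and one price is enough here because every list item shares the same path structure. Pages with optional or nested attributes would defeat the greedy grouping rule, which is the kind of case the attribute pattern described in the abstract is meant to handle.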

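Journal information
  • High Technology Letters (English edition)
  • Supervising body: Ministry of Science and Technology
  • Sponsor: Institute of Scientific and Technical Information of China
  • Editor-in-chief: Feng Jichun (冯纪春)
  • Address: P.O. Box 2143, 54 Sanlihe Road, Beijing
  • Postal code: 100045
  • Email: hitech@istic.ac.cn
  • Telephone: 010-68514060, 010-68598272
  • International standard serial number: ISSN 1006-6748
  • Domestic serial number: CN 11-3683/N
  • Postal distribution code: 80-394
  • Indexed in: Chemical Abstracts (online edition, USA); Zentralblatt MATH (Germany); Scopus (Netherlands abstract and citation database)
  • Citations: 54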