This paper surveys the state of the art in schema inference from XML (Extensible Markup Language) data. First, it presents formal models of the commonly used XML schema languages from the perspective of regular tree grammars. It then classifies, summarizes, and compares existing work on XML schema inference along several dimensions: inference method, target schema language, supported expressive power, and the class of regular expressions used for content models. In addition, it reviews progress on inferring the basic semantic integrity constraints that schema languages support. Finally, it identifies shortcomings of current research and outlines directions that merit further study. The aim is to give a detailed overview, comparison, and analysis of the mainstream methods and recent advances in XML schema inference, in the hope of benefiting subsequent research.