Annotating a corpus is the key to making raw corpus information machine-readable. We use XML together with XML Schema to annotate an oracle bone inscription corpus in a structured way, so that different types of data are represented in a uniform format that facilitates data exchange and sharing. We present a method that models the vocabulary used in the XML documents in order to constrain the elements and attributes that appear in them, the structural relationships among those elements and attributes, and their data types. Annotating oracle bone inscription information in XML according to the XML Schema defined in this way allows the structure and data types of the data to be described accurately.
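The idea of modeling a vocabulary to constrain elements, attributes, structure, and data types can be illustrated with a minimal Python sketch. This is not the schema from the paper: the element names `inscription` and `character`, the attributes `id` and `position`, and the sample record are all hypothetical, and a small dictionary stands in for an XML Schema because the Python standard library does not include an XML Schema validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary model (not taken from the paper): for each element,
# the child elements it may contain and the attributes it must carry, with types.
# A real XML Schema would state the same constraints declaratively.
VOCABULARY = {
    "inscription": {"children": ["character"], "attributes": {"id": str}},
    "character": {"children": [], "attributes": {"position": int}},
}

# Made-up annotated record, for illustration only.
SAMPLE = """
<inscription id="example-1">
  <character position="1">王</character>
  <character position="2">卜</character>
</inscription>
"""


def validate(element):
    """Return a list of constraint violations for an element and its descendants."""
    rule = VOCABULARY.get(element.tag)
    if rule is None:
        return [f"unknown element <{element.tag}>"]

    errors = []

    # Every declared attribute must be present and convertible to its type.
    for name, expected_type in rule["attributes"].items():
        value = element.get(name)
        if value is None:
            errors.append(f"<{element.tag}> is missing attribute '{name}'")
            continue
        try:
            expected_type(value)
        except ValueError:
            errors.append(
                f"attribute '{name}' of <{element.tag}> is not a valid "
                f"{expected_type.__name__}: {value!r}"
            )

    # Attributes outside the vocabulary are rejected.
    for name in element.attrib:
        if name not in rule["attributes"]:
            errors.append(f"<{element.tag}> has undeclared attribute '{name}'")

    # Only declared child elements may appear; valid children are checked recursively.
    for child in element:
        if child.tag not in rule["children"]:
            errors.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        else:
            errors.extend(validate(child))

    return errors


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    print(validate(root))
```

In practice the constraints would live in an XML Schema document and be enforced by a schema-aware validator; the dictionary above only mirrors the kind of rules such a schema expresses.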