At present, most original Dongba manuscripts are held by well-known institutions in more than ten countries, so academic research is scattered and communication among researchers is difficult. Building a sharing platform for ancient Dongba books would help rescue and pass on this classic culture. To address the digitization and data storage of ancient Dongba book resources, this paper first analyzes existing information extraction methods and data storage approaches. On that basis, it proposes a method for digitizing the printed book Annotated General Catalog of Ancient Books of Ethnic Minorities in China (Naxi Volume). The bibliographic entries for ancient Dongba books extracted from the printed catalog are represented as metadata, and the digitized content is managed in an XML database. Experimental results show that the proposed information extraction method correctly extracts content from the particular structure of the Dongba bibliographic entries and supports structured retrieval, which verifies the feasibility and correctness of the method. This research offers an important reference for the digitization of ancient books of ethnic minorities and for the management of semi-structured data.
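The storage and retrieval step (extracted entries represented as metadata, kept in XML, queried by field) can be illustrated with a minimal sketch. It assumes each catalog entry has already been extracted into a small dictionary. The element names (`catalog`, `record`, `title`, `category`, `holder`) and the helpers `build_catalog` and `find_by` are hypothetical and are not the paper's schema. Python's standard `xml.etree.ElementTree` stands in here for a real XML database.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fields for one extracted catalog entry; the actual
# schema used for the Naxi volume is not given in the text.
FIELDS = ("title", "category", "holder")


def build_catalog(entries):
    """Wrap each extracted entry (a dict) in a <record> element under <catalog>."""
    root = ET.Element("catalog")
    for entry in entries:
        record = ET.SubElement(root, "record", id=entry["id"])
        for field in FIELDS:
            ET.SubElement(record, field).text = entry[field]
    return root


def find_by(root, field, value):
    """Structured retrieval: ids of records whose <field> text equals value.

    Relies on ElementTree's XPath subset ([tag='text'] predicate); values
    containing a single quote are not handled in this sketch.
    """
    path = f"./record[{field}='{value}']"
    return [record.get("id") for record in root.findall(path)]


if __name__ == "__main__":
    # Placeholder entries, not real catalog data.
    entries = [
        {"id": "001", "title": "Manuscript A", "category": "Ritual X", "holder": "Library 1"},
        {"id": "002", "title": "Manuscript B", "category": "Ritual Y", "holder": "Library 2"},
        {"id": "003", "title": "Manuscript C", "category": "Ritual X", "holder": "Library 2"},
    ]
    catalog = build_catalog(entries)
    print(find_by(catalog, "category", "Ritual X"))  # ['001', '003']
    print(ET.tostring(catalog, encoding="unicode"))
```

Keeping each field as its own child element, rather than storing one free-text string per entry, is what makes field-level queries such as "all records of a given category" possible. A native XML database would offer full XPath or XQuery over the same structure.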