This paper proposes SMIPP, a semantics-based model of a multilingual information processing platform oriented to natural language processing. The model consists of an application/user interface layer, text input and text output layers, an information processing services layer, a corpus layer, a layer for the multilingual code system SemaCode, and a language Ontology layer. The platform represents the text of all languages uniformly in SemaCode, a self-describing multilingual encoding scheme, and uses the language Ontology to express the semantics of words and the relations between corresponding words in different languages. Corpus-based text information processing functions are then provided in the form of services. Together, these layers make up a new model for multilingual information processing.