This paper analyzes the advantages and shortcomings of several commonly used encoding schemes and proposes SemaCode, a semantics-based multilingual encoding scheme, together with its model. The SemaCode model has six layers: the exchange and transmission layer, the character code point layer, the word (phrase) code point layer, the property layer, the semantic layer, and the application interface layer. SemaCode is an extensible multilingual encoding scheme oriented toward information processing. At the character code point layer, it encodes text character by character, assigning each character (not each glyph) a unique code point that also embeds language information. At the word code point layer, it proposes encoding with the word as the basic unit and semantics as the organizing axis. At the property layer, it introduces a tag mechanism for describing code points, which makes the encoding both describable and extensible. At the semantic layer and the other layers, it proposes a semantic representation method based on code points and description protocols, which makes SemaCode a partially computable encoding scheme. Finally, the advantages of SemaCode are analyzed through a comparison with Unicode.
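To make the idea of embedding language information in a character code point concrete, the following is a minimal sketch. It is not the SemaCode layout: the abstract gives no bit widths or field order, so the 8-bit language field, the 24-bit character field, and the names `pack_code_point` and `unpack_code_point` are all assumptions chosen for illustration.

```python
from typing import Tuple

# Hypothetical field widths. The abstract does not specify SemaCode's
# actual layout; these values are assumptions for illustration only.
LANG_BITS = 8
CHAR_BITS = 24
CHAR_MASK = (1 << CHAR_BITS) - 1


def pack_code_point(lang_id: int, char_index: int) -> int:
    """Combine a language identifier and a per-language character index
    into a single integer code point (language field in the high bits)."""
    if not 0 <= lang_id < (1 << LANG_BITS):
        raise ValueError("lang_id out of range")
    if not 0 <= char_index <= CHAR_MASK:
        raise ValueError("char_index out of range")
    return (lang_id << CHAR_BITS) | char_index


def unpack_code_point(code_point: int) -> Tuple[int, int]:
    """Recover (lang_id, char_index) from a packed code point."""
    return code_point >> CHAR_BITS, code_point & CHAR_MASK
```

Under this assumed layout, the language of a character can be read directly from its code point with a shift, without consulting an external range table.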