To address the semantic loss and lack of semantics in traditional Chinese text feature extraction algorithms, this paper designs a Chinese text semantic feature extraction algorithm that incorporates a domain ontology. The algorithm uses a keyword identification and extraction algorithm based on a seed-expansion mechanism to resolve the semantic loss that arises when traditional algorithms rely on word segmentation tools to extract keywords. It uses a domain-ontology-based algorithm for semantic mapping and aggregation of text concept features to resolve the high dimensionality and lack of semantics that arise when traditional algorithms represent text with the vector space model. Experimental results show that the algorithm achieves the intended effect and significantly improves the depth and accuracy of text feature extraction.
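The two stages named in the abstract can be illustrated with a minimal sketch. The abstract does not give the concrete procedure, so everything below is an assumption made for illustration: the expansion rule (grow a seed by one adjacent character while the longer string still occurs at least `min_count` times), the function names `expand_seed` and `concept_features`, and the toy ontology dictionaries are all hypothetical and are not the paper's implementation.

```python
from collections import Counter


def expand_seed(text, seed, min_count=2, max_len=8):
    """Grow a seed string one character at a time (left or right).

    Assumed rule: an extension is kept only if it occurs at least
    `min_count` times in `text`. This stands in for the seed-expansion
    idea; the paper's actual criterion is not stated in the abstract.
    """
    term = seed
    grown = True
    while grown and len(term) < max_len:
        grown = False
        candidates = Counter()
        start = text.find(term)
        while start != -1:
            end = start + len(term)
            if end < len(text):
                candidates[term + text[end]] += 1      # extend right
            if start > 0:
                candidates[text[start - 1] + term] += 1  # extend left
            start = text.find(term, start + 1)
        if candidates:
            best, count = candidates.most_common(1)[0]
            if count >= min_count:
                term, grown = best, True
    return term


def concept_features(keyword_freqs, term_to_concept, parent, levels=1):
    """Map keywords to ontology concepts and aggregate their weights.

    `term_to_concept` and `parent` are hypothetical stand-ins for a
    domain ontology: a term-to-concept lookup and an is-a hierarchy.
    Keywords missing from the ontology are skipped.
    """
    weights = Counter()
    for keyword, freq in keyword_freqs.items():
        concept = term_to_concept.get(keyword)
        if concept is None:
            continue
        for _ in range(levels):
            # Climb the hierarchy; root concepts map to themselves.
            concept = parent.get(concept, concept)
        weights[concept] += freq
    return dict(weights)


if __name__ == "__main__":
    sample = "领域本体用于特征提取。领域本体描述概念。"
    print(expand_seed(sample, "本体"))

    toy_terms = {"领域本体": "DomainOntology", "本体": "Ontology"}
    toy_parent = {"DomainOntology": "Ontology"}
    print(concept_features({"领域本体": 2, "本体": 1, "分词": 3},
                           toy_terms, toy_parent))
```

In this sketch the expanded keyword is recovered as a whole multi-character term rather than as fragments from a segmenter, and several keywords collapse onto a single concept, which is how a concept-level representation can have fewer dimensions than a term-level vector space model.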