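To address the problems encountered when retrieving product knowledge documents, a retrieval method based on a hybrid semantic model is proposed. The method extends the traditional user query into a semantic set that combines user preference, context, and the user query, and represents both knowledge documents and user needs as ontology-based fuzzy concepts. For knowledge documents, the leaf nodes of the domain ontology are selected to construct the document concept vector, and each weight is computed from the concept's depth in the ontology graph, the amount of information it carries, and its frequency of occurrence in the document and in the corpus. The ontology is likewise used to express the knowledge context and the query semantics, and to build a user preference model. Corresponding similarity measures are described for the different components of the retrieval model: the semantic distance between concepts is used to compute the similarity between the user's current context and the document context, and the cosine method is used to compute the similarity of the query semantics and the user preference to the document. Finally, experiments verify that the method retrieves more effectively than the traditional vector space method.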
An approach based on a hybrid semantic model (HSM) is proposed to solve the problems that arise in the retrieval of product knowledge documentation. It expands the traditional user query into a semantic set composed of user preference, context, and query, and represents both the knowledge documents and the user interest as ontology-based fuzzy concepts. The leaves of the domain ontology are selected as the components of the document concept vector, with each weight determined by the depth of the concept in the ontology graph, the amount of information it carries, and its frequency of occurrence in the document and in the whole repository. The ontology is also used to express the context and the query, and to construct a user preference model. Different relevance computation methods are adopted for the different components of the retrieval model. The semantic similarity between the query or the user preference and a document is computed by the cosine method. The semantic similarity between the user's context and the document context is estimated from the concept distance in the concept hierarchy. Finally, experiments show the method to be more effective than the classic vector space method.
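The two similarity measures named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration: the mapping from concept distance to similarity (1 / (1 + d)), the linear combination of the three scores and its weights, the toy ontology, and all function names (`cosine_similarity`, `concept_distance`, `context_similarity`, `hybrid_score`). It should not be read as the paper's actual model.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two sparse concept vectors (dict: concept -> weight)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def path_to_root(concept, parent):
    """List of concepts from `concept` up to the root of the hierarchy."""
    path = [concept]
    while concept in parent:
        concept = parent[concept]
        path.append(concept)
    return path

def concept_distance(a, b, parent):
    """Number of edges on the path between a and b in a tree-shaped hierarchy."""
    steps_from_a = {c: i for i, c in enumerate(path_to_root(a, parent))}
    for j, c in enumerate(path_to_root(b, parent)):
        if c in steps_from_a:  # first common ancestor reached from b
            return steps_from_a[c] + j
    raise ValueError("concepts do not share a root")

def context_similarity(a, b, parent):
    """Assumed mapping from distance to similarity: 1 / (1 + distance)."""
    return 1.0 / (1.0 + concept_distance(a, b, parent))

def hybrid_score(query, preference, doc, user_ctx, doc_ctx, parent,
                 weights=(0.5, 0.3, 0.2)):
    """Assumed linear combination of the three similarities (weights are arbitrary)."""
    w_query, w_pref, w_ctx = weights
    return (w_query * cosine_similarity(query, doc)
            + w_pref * cosine_similarity(preference, doc)
            + w_ctx * context_similarity(user_ctx, doc_ctx, parent))

# Hypothetical toy ontology: child -> parent
PARENT = {
    "gear": "transmission",
    "shaft": "transmission",
    "transmission": "product",
    "housing": "product",
}
```

In this sketch a document would be ranked by calling `hybrid_score` with its concept vector and context concept, for example `hybrid_score({"gear": 1.0}, {"shaft": 1.0}, {"gear": 0.8, "shaft": 0.6}, "gear", "shaft", PARENT)`. How the paper actually combines the three similarities is not stated in the abstract.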