This work addresses the problem of finding content that interests a user among the vast number of Chinese web pages, and proposes a "two-stage filtering method" based on UCL (Uniform Content Locator). Information in the media space is represented as UCL-based semantic cases (Semantic Cases based on UCL, SCU). A Semantic Vector Space Model (SVSM) is used to analyze the semantic matrix of each web page and coarsely select the pages likely to interest the user; fine-grained semantic features are then used to interpret the content of those pages sentence by sentence and extract the information the user cares about. Changes in user interest are tracked dynamically from the user's reading behavior, an ontology-based model of user interest is built, and a measure of the degree of user interest is analyzed and defined. Experiments verify the effectiveness of this filtering method; in a comparison of test results against the Vector Space Model (VSM), it clearly outperforms VSM.
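To make the coarse-then-fine idea concrete, the following is a minimal sketch only. It does not implement SCU or SVSM, whose details are not given here; instead it assumes a plain term-frequency vector with cosine similarity (the VSM baseline) for the coarse stage and a simple keyword match per sentence for the fine stage. The function names, the `threshold` parameter, and the pre-tokenized toy pages are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_stage_filter(pages, interest_terms, threshold=0.2):
    """pages maps a page name to a list of tokenized sentences.

    Stage 1 (coarse): keep pages whose term vector is similar enough
    to the interest profile. Plain VSM is assumed here, not SVSM.
    Stage 2 (fine): within kept pages, extract the sentences that
    mention at least one interest term (a stand-in for semantic reading).
    """
    profile = Counter(interest_terms)
    selected = {}
    for name, sentences in pages.items():
        page_vector = Counter(tok for sent in sentences for tok in sent)
        if cosine(page_vector, profile) < threshold:
            continue  # rejected by the coarse filter
        hits = [sent for sent in sentences if profile.keys() & set(sent)]
        if hits:
            selected[name] = hits
    return selected


# Toy, already-tokenized pages (hypothetical data for illustration).
pages = {
    "sports": [["football", "match", "tonight"], ["weather", "sunny"]],
    "finance": [["stock", "market", "falls"], ["bank", "rates"]],
}
selected = two_stage_filter(pages, ["football", "match"])
print(selected)
```

In this sketch the coarse stage cheaply discards whole pages, so the more expensive sentence-level pass runs only on the survivors. A real system for Chinese text would also need word segmentation before either stage.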