Clustering and Visualization of Web Search Results
  • ISSN: 0469-5097
  • Journal: 《南京大学学报:自然科学版》 (Journal of Nanjing University, Natural Science Edition)
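  • Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
  • Author affiliation: [1] Key Laboratory of Embedded System and Service Computing (Ministry of Education) and Department of Computer Science and Technology, Tongji University, Shanghai 201804
  • Funding: National Natural Science Foundation of China (60475019, 60970061); Specialized Research Fund for the Doctoral Program (20060247039)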
Abstract (translated from the Chinese):

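Search engines have become the most commonly used tool for information retrieval on the Internet. Mainstream search engines return results ranked by relevance to the user's query, and because natural language exhibits synonymy (several words for one meaning) and polysemy (several meanings for one word), users find it hard to express their intent clearly and often spend a long time picking out the topics they care about from the result list. To address this, web page clustering is applied to the titles and snippets of the search results, and the clusters are then presented visually as trees and graphs, giving users a fast, complete, and intuitive view of the results and markedly improving the search experience. On this basis, a web page clustering prototype system, ECE (effective clustering engine), was designed. Experimental results show that the algorithm produces readable clusters with relatively high clustering accuracy.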

English abstract:

Search engines are now the most common tools for information retrieval on the Internet. However, they have several limitations, such as low search coverage and the dynamic nature of web pages, which is why users' search experience has seen no breakthrough in recent years. The leading search engines return a long list of records sorted by relevance to the query, and synonymy and polysemy make it difficult for users to express their intent, so they spend much time selecting the web pages they are interested in. This paper aims to enhance the search experience using data analysis technologies. By clustering and visualizing web search results and then grouping the clusters according to certain criteria, it lets users quickly locate the information they are interested in. Suffix-tree data structures are widely used in string processing and text compression. A suffix-tree-based clustering algorithm makes it easy to recognize the phrases shared among web pages and can therefore be used to cluster them. It improves clustering efficiency because pairwise document similarities need not be computed, assigns meaningful labels to the clusters to enhance readability, and improves end users' search experience through visualization. A prototype system named effective clustering engine (ECE) has been built on this approach. Experiments verify that the algorithm is quite efficient and that the clustering results are readable and accurate.
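
The abstract's central idea, grouping pages by the phrases they share instead of computing pairwise similarities, can be illustrated with a minimal Python sketch. This is not the ECE implementation: it assumes a simplified word-level suffix trie stored as a flat dictionary, a hypothetical `max_depth` limit on phrase length, and a simple score (number of snippets times phrase length) chosen only for illustration. All function names are hypothetical, and the merging of overlapping clusters that a full suffix-tree clustering system would perform is omitted.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def build_phrase_index(snippets, max_depth=4):
    """Index every word-level suffix prefix of up to max_depth words.

    Each key is the phrase spelled out on a path from the trie root, and
    each value is the set of snippet ids that contain that phrase.
    """
    nodes = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = tokenize(text)
        for start in range(len(words)):
            stop = min(start + max_depth, len(words))
            for end in range(start + 1, stop + 1):
                nodes[tuple(words[start:end])].add(doc_id)
    return nodes


def base_clusters(nodes, min_docs=2):
    """Keep phrases shared by at least min_docs snippets, labeled by the phrase.

    The score (snippet count times phrase length) is an illustrative
    assumption, not a formula taken from the paper.
    """
    clusters = []
    for phrase, docs in nodes.items():
        if len(docs) >= min_docs:
            score = len(docs) * len(phrase)
            clusters.append((score, " ".join(phrase), sorted(docs)))
    # Highest score first; ties broken alphabetically by label.
    clusters.sort(key=lambda c: (-c[0], c[1]))
    return clusters


# Made-up snippets for an ambiguous query, used only to exercise the sketch.
snippets = [
    "jaguar car dealer",
    "jaguar car review",
    "jaguar big cat habitat",
    "big cat rescue",
]
for score, label, docs in base_clusters(build_phrase_index(snippets)):
    print(score, label, docs)
```

Each cluster label is the shared phrase itself, which is what makes the groups readable, and no snippet is ever compared directly with another snippet.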

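Journal information
  • 《南京大学学报:自然科学版》 (Journal of Nanjing University, Natural Science Edition)
  • China Science and Technology Core Journal
  • Supervising body: Ministry of Education of the People's Republic of China
  • Sponsor: Nanjing University
  • Editor-in-chief: 龚昌德 (Gong Changde)
  • Address: Editorial Office (Natural Science Edition), Nanjing University, 22 Hankou Road, Nanjing
  • Postal code: 210093
  • Email: xbnse@netra.nju.edu.cn
  • Telephone: 025-83592704
  • International standard serial number: ISSN 0469-5097
  • Domestic unified serial number: CN 32-1169/N
  • Postal distribution code: 28-25
  • Awards:
  • Chinese Natural Science Core Journal; "Double-Effect" Journal of the China Periodical Matrix
  • Indexed in domestic and international databases:
  • Chemical Abstracts (USA, online edition), Mathematical Reviews (USA, online edition), Zentralblatt MATH (Germany), China Science and Technology Core Journals, Peking University Core Journals (2000, 2004, 2008, 2011, and 2014 editions)
  • Citation count: 9316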