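We start from search-engine query data extracted from the server logs of a government website over a given period. We propose a set of rules for selecting representative core query terms and perform co-occurrence analysis and visual cluster analysis for each core query term. For each term we build a similarity matrix from co-occurrence frequencies and use the non-metric MDS algorithm to derive a three-dimensional visual clustering map. Hierarchical clustering with Ward's method then confirms that the three-dimensional MDS clustering results are correct, effective and superior. We also developed a suite of analysis tools tailored to the characteristics of the logs. These tools let us mine, extract, select and process log data from websites of the same type that use different log structures, and statistical analysis tools then support visual cluster analysis and comparative study of the processed results. The experimental results show that the method combines the strengths of MDS analysis with those of vector-space clustering computations. It gives a clearer view of the clustering patterns, shapes and distances among objects, and it provides a theoretical method and an empirical basis for research on optimizing topic-map-based e-government platforms.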
From a log file covering a particular period on the server of an e-Government website, we extract the search-engine query keywords. We present a series of methods for selecting the core query terms from the log by analyzing the co-occurrence matrix of these queries. We also developed a series of application tools for extracting, selecting and processing the queries efficiently. Comparing our multidimensional visualization results with the hierarchical clustering results of Ward's method shows that our results are correct and effective. The method exploits the advantages of clustering computation in vector space and gives a clearer view of the form, shape and distance of the clusters. It also provides theoretical and experimental support for our research on optimizing e-Government websites with Topic Maps.
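The pipeline summarized above (co-occurrence counts, a similarity matrix, a three-dimensional MDS layout, and a Ward's-method cross-check) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The input format (each query session as a list of already-selected core terms), the Ochiai normalization of co-occurrence counts, and all function names are hypothetical. The abstract says only "based on co-occurrence frequency." To keep the sketch dependent on NumPy alone, it swaps in classical (Torgerson) metric MDS for the non-metric MDS the paper uses. Ward's method is applied naively to each term's similarity profile, which is also an assumption.

```python
import itertools

import numpy as np


def cooccurrence_matrix(sessions, terms):
    """Count, for each pair of core terms, the sessions in which both occur.

    The diagonal holds the number of sessions containing each term.
    Assumption: each session is a list of already-selected core terms.
    """
    index = {t: i for i, t in enumerate(terms)}
    c = np.zeros((len(terms), len(terms)))
    for session in sessions:
        present = sorted({index[t] for t in session if t in index})
        for i in present:
            c[i, i] += 1
        for i, j in itertools.combinations(present, 2):
            c[i, j] += 1
            c[j, i] += 1
    return c


def ochiai_similarity(c):
    """Normalize counts as s_ij = c_ij / sqrt(c_ii * c_jj).

    Assumption: the paper does not name its normalization; Ochiai is one common choice.
    Terms that never occur get similarity 0 instead of a division by zero.
    """
    diag = np.diag(c)
    denom = np.sqrt(np.outer(diag, diag))
    return np.divide(c, denom, out=np.zeros_like(c), where=denom > 0)


def classical_mds(d, k=3):
    """Classical (metric) MDS, used here as a stand-in for non-metric MDS."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                   # double-centered squared distances
    vals, vecs = np.linalg.eigh(b)                # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]              # indices of the k largest
    # Negative eigenvalues (non-Euclidean dissimilarities) are clipped to 0.
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))


def ward_labels(x, k):
    """Naive agglomerative clustering with Ward's criterion on the rows of x."""
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > k:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            na, nb = len(clusters[a]), len(clusters[b])
            diff = x[clusters[a]].mean(axis=0) - x[clusters[b]].mean(axis=0)
            # Increase in within-cluster sum of squares if a and b are merged.
            cost = na * nb / (na + nb) * float(diff @ diff)
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        clusters[a] += clusters[b]                # a < b, so deleting b is safe
        del clusters[b]
    labels = np.empty(len(x), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


if __name__ == "__main__":
    # Invented example data, not taken from the paper's logs.
    terms = ["tax", "refund", "invoice", "passport", "visa", "travel"]
    sessions = [["tax", "refund"], ["tax", "invoice"], ["refund", "invoice", "tax"],
                ["passport", "visa"], ["visa", "travel"], ["passport", "travel", "visa"]]
    s = ochiai_similarity(cooccurrence_matrix(sessions, terms))
    coords = classical_mds(1.0 - s, k=3)          # one 3-D point per term
    labels = ward_labels(s, k=2)                  # Ward's method cross-check
    print(coords.round(3))
    print(dict(zip(terms, labels.tolist())))
```

On the invented data, Ward's method places the three tax-related terms and the three travel-related terms in separate clusters. Comparing this grouping with the 3-D coordinates mirrors the abstract's validation step. For the non-metric variant the paper actually uses, scikit-learn's `sklearn.manifold.MDS` offers a non-metric mode. SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` is a faster replacement for the naive loop above.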