This paper presents our research on computer-aided extraction of popular words and phrases. Our resource collection is the text of web pages of all categories downloaded from SINA.COM between January 1 and June 25, 2005. From this collection we extract the valid words and phrases, analyze qualitatively and quantitatively the attributes that determine a word's popularity, and define the popularity characteristics of words. On this basis, we introduce a quantitative method for measuring how much attention a word or phrase receives. We then plot tendency curves of the determinant attributes against time and use them to drive an elimination mechanism and a scoring mechanism, which yield the candidate popular words and phrases. The results demonstrate that the specification of determinant attributes for popular words is reasonable, and they provide reference data for machine-aided determination of word characteristics.
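The abstract does not give the actual formulas, so the following is only a minimal sketch of the kind of pipeline it describes, under assumptions that are ours rather than the paper's. The hypothetical attention measure for a word on a given day is the fraction of that day's documents containing it. The elimination step drops words below hypothetical thresholds `min_docs` (total documents) and `min_days` (days on which the word appears). The score is a hypothetical "burst" ratio: the mean attention over the best `window`-day stretch divided by the smoothed mean attention over the remaining days. All function names and parameters here are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

Doc = List[str]  # a tokenized document


def daily_doc_freq(docs_by_day: List[List[Doc]]) -> Dict[str, List[int]]:
    """For each word, count how many documents contain it on each day."""
    n_days = len(docs_by_day)
    series: Dict[str, List[int]] = {}
    for day, docs in enumerate(docs_by_day):
        df: Counter = Counter()
        for tokens in docs:
            df.update(set(tokens))  # document frequency, not raw term frequency
        for word, c in df.items():
            series.setdefault(word, [0] * n_days)[day] = c
    return series


def attention_curve(counts: List[int], totals: List[int]) -> List[float]:
    """Hypothetical attention measure: share of the day's documents mentioning the word."""
    return [c / t if t else 0.0 for c, t in zip(counts, totals)]


def burst_score(curve: List[float], window: int, smoothing: float = 1e-3) -> float:
    """Hypothetical score: peak-window mean attention over smoothed baseline mean."""
    window = max(1, min(window, len(curve)))
    best = max(range(len(curve) - window + 1),
               key=lambda s: sum(curve[s:s + window]))
    peak = sum(curve[best:best + window]) / window
    rest = curve[:best] + curve[best + window:]
    baseline = sum(rest) / len(rest) if rest else 0.0
    return peak / (baseline + smoothing)


def candidate_popular_words(docs_by_day: List[List[Doc]], min_docs: int = 3,
                            min_days: int = 2, window: int = 2,
                            top_k: int = 10) -> List[Tuple[str, float]]:
    """Eliminate rare words, score the survivors, and return the top_k by score."""
    totals = [len(docs) for docs in docs_by_day]
    scored: List[Tuple[str, float]] = []
    for word, counts in daily_doc_freq(docs_by_day).items():
        # Elimination: too few documents overall, or seen on too few days.
        if sum(counts) < min_docs or sum(1 for c in counts if c) < min_days:
            continue
        scored.append((word, burst_score(attention_curve(counts, totals), window)))
    scored.sort(key=lambda ws: (-ws[1], ws[0]))  # highest score first, ties by word
    return scored[:top_k]
```

On a toy four-day collection (invented purely for illustration), a word that appears only in the last two days outranks a word that is common throughout, and a word seen in a single document is eliminated:

```python
days = [
    [["common", "weather"], ["common"]],
    [["common", "weather"], ["common", "sports"]],
    [["burst", "common"], ["burst"], ["burst", "weather"]],
    [["burst", "common"], ["burst", "common"]],
]
print([w for w, _ in candidate_popular_words(days)])
```

A ratio against the word's own baseline is used here so that frequent but stable words (function words, everyday vocabulary) are not mistaken for popular ones; the paper's own scoring mechanism may differ.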