In recent years, living standards in China have risen steadily and the Internet has developed rapidly, reaching into every area of daily life and greatly increasing the number of Internet users. This paper analyzes the behavior of network users and applies information retrieval methods to classify them, and then examines their behavioral characteristics. The CHI feature selection algorithm is used to extract features, and users are classified by consolidating the resulting feature words. The TF-IDF algorithm is then used to weight the features: its shortcomings are analyzed, appropriate weights are assigned to the relevant feature words, and the identities of these network users are recognized. Finally, the paper extends the approach by taking the network data of selected example users and comparing them by cosine similarity. This shows how many topics and interests the users share, and thereby greatly increases the likelihood of their becoming friends. The same method could also be applied to searching for similar articles using plain text.
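The three techniques named in the abstract can be illustrated with a minimal Python sketch. It uses the textbook forms of the CHI (chi-square) statistic, TF-IDF, and cosine similarity. The abstract does not give the paper's adjusted TF-IDF weighting, so the plain variant is assumed here, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, target):
    """Textbook CHI statistic of `term` for class `target` (2x2 contingency table)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1          # term present, document in class
        elif has_term:
            b += 1          # term present, document in another class
        elif in_class:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def tf_idf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vectors.append(
            {t: (k / len(tokens)) * math.log(n / df[t]) for t, k in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: one token list per user, with an assumed class label.
users = [
    ["football", "match", "goal"],
    ["football", "goal", "league"],
    ["recipe", "cooking", "soup"],
    ["cooking", "recipe", "football"],
]
classes = ["sports", "sports", "food", "food"]

vectors = tf_idf(users)
print(chi_square(users, classes, "goal", "sports"))
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

In this toy data, "goal" appears only in the sports users and so scores higher on CHI than "football", which also appears under food. The first two users share weighted terms and therefore have a higher cosine similarity than the first and third, who share none.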