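Accurately identifying feature users in Internet data resources can improve the security of those users. Identification requires accurately extracting the users' effective features, building the model with maximum conditional entropy over the samples to be identified, and then constraining and optimizing that model under the relevant conditions. Traditional methods identify users with a naive Bayes model, but they cannot accurately extract the effective features of feature users, nor can they apply constrained optimization under the relevant conditions, which reduces the effectiveness of identification. This paper proposes a maximum-entropy-based method for accurately identifying feature users in Internet data resources. The method analyzes the characteristics of the characters used in feature user names and extracts effective features, then builds a model based on the maximum entropy principle to identify feature users accurately. The model is trained and tested on a corpus of more than 400,000 Chinese personal names, and the identification accuracy obtained with different feature combinations is compared. Simulation results show that, compared with the traditional method based on a Bayes classifier, the proposed method achieves higher accuracy in identifying feature users in Internet data resources.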
In this paper, we propose a maximum-entropy-based method for accurately recognizing feature users in Internet data resources. First, we analyze the character usage of feature user names and extract effective features; we then build a model based on the maximum entropy principle to recognize feature users accurately. Finally, we train and test the model on a corpus of more than four hundred thousand Chinese names and compare the recognition accuracy obtained with different feature combinations. Simulation results show that the proposed method achieves higher recognition accuracy than the traditional method based on a Bayes classifier.
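The pipeline summarized above (extract character features from a name, fit a maximum entropy model, classify) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the feature templates in `extract_features`, the labels `name` and `other`, the toy samples, and the training settings are all hypothetical, and plain gradient ascent with an L2 penalty stands in for whatever estimation procedure the authors used. The model form is the standard conditional maximum entropy one, p(y | x) = exp(Σ w_i f_i(x, y)) / Z(x).

```python
import math
from collections import defaultdict


def extract_features(token):
    """Hypothetical character-level features of a candidate name string."""
    return [
        "first=" + token[0],       # leading character (often the surname)
        "last=" + token[-1],       # trailing character
        "len=" + str(len(token)),  # string length
    ]


def predict_proba(weights, labels, token):
    """Conditional maxent distribution: p(y | x) = exp(sum_i w_i f_i(x, y)) / Z(x)."""
    feats = extract_features(token)
    scores = [sum(weights[(f, y)] for f in feats) for y in labels]
    top = max(scores)  # subtract the maximum score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {y: e / z for y, e in zip(labels, exps)}


def train(samples, labels, epochs=200, lr=0.5, l2=0.01):
    """Maximize the L2-penalized conditional log-likelihood by gradient ascent.

    The gradient for each (feature, label) weight is the observed feature
    count minus the count expected under the current model.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for token, gold in samples:
            probs = predict_proba(weights, labels, token)
            for f in extract_features(token):
                for y in labels:
                    observed = 1.0 if y == gold else 0.0
                    grad[(f, y)] += observed - probs[y]
        for key, g in grad.items():
            weights[key] += lr * (g / len(samples) - l2 * weights[key])
    return weights


# Toy, hand-made samples for illustration only (not the paper's corpus).
LABELS = ["name", "other"]
SAMPLES = [
    ("王伟", "name"), ("李娜", "name"), ("张敏", "name"), ("王芳", "name"),
    ("电脑", "other"), ("网络", "other"), ("数据", "other"), ("资源", "other"),
]

weights = train(SAMPLES, LABELS)
probs = predict_proba(weights, LABELS, "王磊")  # unseen string, seen surname
print(max(probs, key=probs.get), probs)
```

Comparing feature combinations, as the abstract describes, would amount to changing the templates returned by `extract_features` and re-measuring accuracy on held-out names.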