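This paper proposes an automatic grouping method for Chinese customer addresses based on cluster analysis. The method accounts for noisy address data. It first checks the validity of each customer address against the postal code, province, and city information in standard Chinese postal code data. It then segments the valid addresses into words using a string-matching method with feedback learning. The segmentation results are represented with the vector space model and clustered by an improved hybrid K-means and particle swarm optimization (PSO) clustering method, which incorporates simulated annealing to keep the search from becoming trapped in local minima. Finally, the address groups are produced from the cluster centers determined by the best particle. Comparative experiments on real address data verify the effectiveness of the method.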
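The validation, segmentation, and vector-representation steps can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: `POSTAL_TABLE` is a hypothetical three-entry stand-in for the standard postal code data, the validity rule (a six-digit postcode whose mapped province or city appears in the address) is a simplified guess at the check, and plain forward maximum matching substitutes for the string-matching feedback learning method, whose details are not given here. `tf_vector` builds a raw term-frequency vector, the simplest vector space model weighting.

```python
import re
from collections import Counter

# Hypothetical postal table: first three postcode digits -> (province, city).
# A real system would load the full standard Chinese postal code dataset.
POSTAL_TABLE = {
    "100": ("北京市", "北京市"),
    "200": ("上海市", "上海市"),
    "310": ("浙江省", "杭州市"),
}

def is_valid_address(address: str, postcode: str) -> bool:
    """Assumed rule: the postcode has six digits, is known, and the address
    mentions the province or city that the postcode maps to."""
    if not re.fullmatch(r"\d{6}", postcode):
        return False
    entry = POSTAL_TABLE.get(postcode[:3])
    if entry is None:
        return False
    province, city = entry
    return province in address or city in address

def fmm_segment(text: str, vocab: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching (a stand-in segmenter): at each position take
    the longest dictionary word, falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

def tf_vector(tokens: list[str], vocabulary: list[str]) -> list[float]:
    """Vector space model with raw term-frequency weights."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocabulary]
```

With a dictionary containing 浙江省, 杭州市, 西湖区 and 文三路, `fmm_segment("浙江省杭州市西湖区文三路", vocab)` returns those four words in order, and the address passes `is_valid_address` with a postcode starting with 310.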
This paper proposes a method for automatically grouping Chinese customer addresses by cluster analysis. The method consists of two main algorithms. The first validates and segments Chinese customer addresses, using the province and city information in a standard Chinese postal code dataset together with a feedback method based on character-string matching. The second is an improved hybrid clustering algorithm combining K-means and particle swarm optimization, which integrates simulated annealing to improve the local exploration ability of the particles. The method can help solve address-related problems in applications such as logistics distribution optimization. Comparative experiments on several real address datasets demonstrate its effectiveness.
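The hybrid clustering step can likewise be sketched in Python. This is a generic K-means/PSO/simulated-annealing hybrid under stated assumptions, not the paper's exact algorithm: each particle encodes k centroids, fitness is the sum of squared errors (SSE), each moved particle is refined by one K-means (Lloyd) step, and simulated annealing enters through a Metropolis rule that sometimes lets a particle's personal best (`pbest`) accept a worse position. Where the paper applies annealing is not specified here, and the parameter values (`w`, `c1`, `c2`, `t0`, `cooling`) are illustrative defaults.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def sse(points, centroids):
    """Fitness to minimize: sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def kmeans_step(points, centroids):
    """One Lloyd iteration; an empty cluster keeps its previous centroid."""
    labels = assign(points, centroids)
    new = []
    for j, c in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        new.append([sum(col) / len(members) for col in zip(*members)]
                   if members else list(c))
    return new

def kpso_sa(points, k, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5,
            t0=1.0, cooling=0.95, seed=0):
    """Assumed K-means + PSO + SA hybrid; returns (centroids, labels, SSE)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle: k centroids seeded from k distinct data points, then refined.
    pos = [kmeans_step(points, [list(p) for p in rng.sample(points, k)])
           for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    fit = [sse(points, x) for x in pos]
    pbest = [[list(c) for c in x] for x in pos]
    pbest_fit = list(fit)
    g = min(range(n_particles), key=lambda i: fit[i])
    gbest, gbest_fit = [list(c) for c in pos[g]], fit[g]  # best ever seen
    temp = t0
    for _ in range(iters):
        for i in range(n_particles):
            # Standard PSO velocity and position update, per centroid coordinate.
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            pos[i] = kmeans_step(points, pos[i])  # K-means refinement
            fit[i] = sse(points, pos[i])
            delta = fit[i] - pbest_fit[i]
            # Metropolis rule: accept improvements, sometimes accept worse moves.
            if delta < 0 or rng.random() < math.exp(-delta / temp):
                pbest[i] = [list(c) for c in pos[i]]
                pbest_fit[i] = fit[i]
            if fit[i] < gbest_fit:
                gbest, gbest_fit = [list(c) for c in pos[i]], fit[i]
        temp *= cooling  # geometric cooling schedule
    return gbest, assign(points, gbest), gbest_fit
```

Keeping the global best separate from the annealed personal bests means the Metropolis moves can widen the search without losing the best grouping found so far. Because centroid order inside a particle is arbitrary, the velocity update may pull centroid j toward a different cluster than the global best's centroid j; the K-means refinement after each move partly compensates for this.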