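To address irregularities that are widespread in the address information of internet POIs (points of interest), such as incomplete address elements and inconsistent wording, this paper proposes a standardization method for web POI address information that takes positional relations into account. First, the POI information is segmented and its elements are extracted, then matched layer by layer against an address tree model. Next, based on four types of positional relations, corresponding sets are selected from a standard POI library as candidates for enriching and correcting the address elements of non-standard POIs. Finally, fast standardization of POI address information is achieved by backtracking from the address element of minimum granularity. Experiments show that the method achieves high accuracy and is particularly suitable for standardizing POI address information in internet data environments.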
Incomplete addresses and inconsistent literal expressions are widespread in the address information of points of interest (POIs) on the internet. To address this, a fast standardization method for network POI address information based on spatial constraints is proposed. Based on an extensible address expression model, the address information of a POI is first segmented and extracted, and the address elements are updated by matching them against the address tree layer by layer. Then, four types of positional relations are defined, and the corresponding sets are selected from a standard POI library as candidates for enriching and amending non-standard addresses. Finally, fast standardization of POI address information is achieved by backtracking address elements of minimum granularity. Experiments show that the method standardizes addresses with high accuracy, which supports building an address database.
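The layer-by-layer matching step can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tree `TREE`, the function `match_layers`, and the greedy longest-prefix rule are all assumptions made for illustration. The four positional relations and the backtracking step are not specified in the abstract, so they are left out.

```python
from typing import Dict, List, Tuple

# Hypothetical toy address tree: each key is an address element and
# each value holds the elements one administrative level below it.
AddressTree = Dict[str, "AddressTree"]

TREE: AddressTree = {
    "Beijing": {
        "Haidian District": {
            "Zhongguancun Street": {},
            "Xueyuan Road": {},
        },
        "Chaoyang District": {
            "Jianguo Road": {},
        },
    },
}


def match_layers(address: str, tree: AddressTree) -> Tuple[List[str], str]:
    """Match an address string against the tree one level at a time.

    At each level, the longest child name that prefixes the remaining
    text is consumed (an assumed rule, chosen for simplicity).
    Returns the matched elements and the unmatched remainder.
    """
    matched: List[str] = []
    rest = address.strip()
    node = tree
    while node:
        candidates = [name for name in node if rest.startswith(name)]
        if not candidates:
            break  # no element at this level matches; stop descending
        best = max(candidates, key=len)
        matched.append(best)
        # Drop the matched element and any separators that follow it.
        rest = rest[len(best):].lstrip(" ,")
        node = node[best]
    return matched, rest


if __name__ == "__main__":
    print(match_layers("Beijing, Haidian District, Xueyuan Road 37", TREE))
```

In this sketch, an address that omits a higher-level element (for example, one starting directly with a district) matches nothing at the top level. That is the kind of incompleteness the proposed method targets by drawing candidates from the standard POI library.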