The rapid growth of tourism information on the Internet makes it difficult for tourists to retrieve useful information from massive volumes of data. To help tourists efficiently obtain the geographic information they want from the many tourism websites on the Web, this paper builds on the open-source Nutch search engine framework and, taking into account the characteristics of the tourism domain and of geographic information, modifies the framework's indexing and search modules. A dictionary-based bidirectional maximum matching model is designed to segment geographic information text into words and is integrated into the word segmentation module of Nutch, yielding a vertical search engine for tourism geographic information. Finally, using GIS technology, a tourism geographic information vertical search system is designed and implemented, and the tourism geographic information search service is validated on it.
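As an illustration of dictionary-based bidirectional maximum matching, the following minimal Python sketch segments a place-name string by running a forward and a backward maximum match and then choosing between the two results. It is not the paper's implementation: the function names, the toy dictionary, and the tie-breaking rule (fewer tokens, then fewer single-character tokens, then the backward result) are assumptions made for illustration, and integration with Nutch's analysis module is not shown.

```python
def forward_max_match(text, dictionary, max_len):
    """Scan left to right, taking the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, dictionary, max_len):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


def count_single_chars(tokens):
    return sum(1 for token in tokens if len(token) == 1)


def bidirectional_max_match(text, dictionary):
    """Run both scans and pick one result using the assumed tie-breaking rule."""
    if not dictionary:
        return list(text)
    max_len = max(len(word) for word in dictionary)
    forward = forward_max_match(text, dictionary, max_len)
    backward = backward_max_match(text, dictionary, max_len)
    # Assumed heuristic: prefer fewer tokens, then fewer single-character tokens.
    if len(forward) != len(backward):
        return forward if len(forward) < len(backward) else backward
    if count_single_chars(forward) < count_single_chars(backward):
        return forward
    return backward  # on a full tie, the backward result is returned


# Hypothetical toy dictionary of tourism place names.
GEO_DICTIONARY = {"杭州", "西湖", "杭州西湖", "风景", "风景区", "西湖风景区"}

if __name__ == "__main__":
    print(bidirectional_max_match("杭州西湖风景区", GEO_DICTIONARY))
```

In this toy case the forward scan groups the text as 杭州西湖 / 风景区 and the backward scan as 杭州 / 西湖风景区; both have two tokens and no single characters, so the sketch returns the backward result. A production segmenter would load a much larger gazetteer and might break ties differently.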