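To address the problem of modeling and evaluating the productivity of distributed search engine systems, this paper models and classifies current distributed search engine systems and extends the cost model to cover energy consumption and network overhead. Five design schemes for building a search engine system are analyzed and evaluated theoretically in detail in terms of system cost, system scale, and query response time. The analysis finds that the half-WAN search engine system, composed of a WAN-based distributed crawling system and a multi-cluster indexing system, achieves relatively higher productivity than the other systems while also preserving quality of service for users.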
This study extends current productivity models for a typical Web search engine system, which consists of a Web crawling system and an indexing system. Five design schemes are characterized according to the extended model and compared in terms of power consumption, networking cost, system scale, and query efficiency. The half-WAN scheme, which consists of a WAN-based crawling system and a multi-cluster indexing system, is shown to offer relatively higher productivity than the other schemes while maintaining quality of service for users, making it a good choice for a large-scale, highly efficient Web search engine.