In a real-time vertical search engine, the popularity of searched objects shifts frequently and data crawling is triggered by user queries. To address these problems, a new object cache optimization strategy for real-time vertical search engines is proposed. A popular-object prediction model, built on the relationships between objects and their properties, predicts how the set of popular objects will change. Because both user queries and object changes follow Poisson processes, a method for maximizing data freshness is derived, which yields a theoretically optimal strategy for allocating and dynamically balancing crawl resources. Extensive comparative experiments show that, with only a small increase in overhead, the new strategy achieves clearly higher average freshness and precision of user query results than the traditional fixed-frequency cache strategy.
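To make the freshness argument concrete, the following is a minimal sketch and not the paper's own derivation. It uses the standard result that, when an object changes as a Poisson process with rate λ and its cached copy is refreshed at regular intervals f times per unit time, the time-averaged probability that the copy is up to date is (f/λ)(1 − e^(−λ/f)). The function names, the `(query_rate, change_rate)` tuple layout, the step size, and the greedy allocation are all illustrative assumptions; the sketch only contrasts a fixed-frequency policy with one that weights the refresh budget by query rate.

```python
import math


def expected_freshness(change_rate, refresh_rate):
    """Time-averaged probability that a cached copy is up to date.

    Assumes the source object changes as a Poisson process with rate
    `change_rate` and the cache is refreshed at regular intervals,
    `refresh_rate` times per unit time.
    """
    if refresh_rate <= 0:
        return 0.0  # never refreshed: treated as always stale
    if change_rate <= 0:
        return 1.0  # never changes: always fresh
    r = change_rate / refresh_rate
    return (1.0 - math.exp(-r)) / r


def weighted_freshness(objects, rates):
    """Average freshness seen by queries, weighting each object by its query rate.

    `objects` is a list of (query_rate, change_rate) pairs (hypothetical layout).
    """
    total_queries = sum(q for q, _ in objects)
    return sum(
        q * expected_freshness(c, f) for (q, c), f in zip(objects, rates)
    ) / total_queries


def greedy_allocate(objects, budget, step=0.1):
    """Split a total refresh budget among objects in small increments.

    Each increment goes to the object whose query-weighted freshness
    gains the most from it. This is an illustrative heuristic, not the
    allocation rule derived in the paper.
    """
    rates = [0.0] * len(objects)

    def gain(i):
        q, c = objects[i]
        return q * (
            expected_freshness(c, rates[i] + step)
            - expected_freshness(c, rates[i])
        )

    for _ in range(round(budget / step)):
        best = max(range(len(objects)), key=gain)
        rates[best] += step
    return rates


if __name__ == "__main__":
    # Made-up (query_rate, change_rate) values for three objects.
    objects = [(100.0, 1.0), (10.0, 1.0), (1.0, 5.0)]
    budget = 6.0
    uniform = [budget / len(objects)] * len(objects)
    tuned = greedy_allocate(objects, budget)
    print("fixed-frequency:", weighted_freshness(objects, uniform))
    print("query-weighted: ", weighted_freshness(objects, tuned))
```

Because expected freshness is concave in the refresh rate, each extra refresh of the same object gains less than the previous one, which is why shifting budget toward frequently queried objects can raise the average freshness that users see without raising the total crawl cost.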