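Existing Top-k algorithms built on possible-world modeling adapt poorly to uncertain Top-k queries over large data volumes and the key-value data model. This study first establishes an uncertain key-value data model. Then, building on the existing U-TopK query semantics, it proposes an optimized algorithm, EU-TopK (Early Terminated Uncertain Top-k Query). The algorithm first builds a possible-world tree rooted at the most probable Top-k tuples and applies two optimization strategies that reduce the tuple access depth, which improves on the time complexity of the original algorithm. In addition, EU-TopK is implemented with MapReduce so that it can support big data analysis. Finally, experiments verify the functionality of EU-TopK and evaluate its query time and scan depth.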
Most existing work on uncertain Top-k queries is based on the possible-world model and is poorly suited to big data and the key-value data model. This research first designs a new key-value data model for uncertain data. It then proposes the EU-TopK (Early Terminated Uncertain Top-k Query) algorithm, which optimizes the original U-TopK algorithm by finding suitable termination conditions and using an efficient data structure. Unlike U-TopK, EU-TopK takes the Top-k records with relatively high probability as its first choice for the root of a possible-world tree. Furthermore, the research designs a MapReduce-based EU-TopK that adapts well to big data analysis. Experiments verify the functionality of EU-TopK and evaluate its query time and scan depth.
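To make the U-TopK semantics referred to above concrete, the following is a minimal Python sketch of a brute-force baseline over uncertain key-value records. It is not the EU-TopK algorithm and uses none of its termination conditions or data structures; it only enumerates every possible world and returns the k-tuple vector that is most likely to be the Top-k. The record layout (`key -> (score, probability)`), the independence of records, the assumption of distinct scores, and the function name `u_topk_bruteforce` are all illustrative assumptions rather than details taken from the paper.

```python
from collections import defaultdict
from itertools import product


def u_topk_bruteforce(records, k):
    """Return (top-k key tuple, probability) under U-TopK semantics.

    records: dict mapping key -> (score, existence_probability).
    Assumptions (hypothetical, for illustration): records exist
    independently of one another and have distinct scores.
    Returns None when no possible world contains at least k records.
    """
    # Rank keys by score, highest first.
    keys = sorted(records, key=lambda key: records[key][0], reverse=True)

    # Accumulate, for each candidate top-k vector, the total probability
    # of the possible worlds in which it is the top-k answer.
    answer_prob = defaultdict(float)

    # Enumerate all 2^n possible worlds (exponential: small inputs only).
    for present in product([True, False], repeat=len(keys)):
        world_prob = 1.0
        world = []
        for key, exists in zip(keys, present):
            p = records[key][1]
            world_prob *= p if exists else 1.0 - p
            if exists:
                world.append(key)
        if len(world) >= k:
            answer_prob[tuple(world[:k])] += world_prob

    if not answer_prob:
        return None
    return max(answer_prob.items(), key=lambda item: item[1])


if __name__ == "__main__":
    data = {"a": (90, 0.3), "b": (80, 0.9), "c": (70, 0.8), "d": (60, 0.5)}
    print(u_topk_bruteforce(data, 2))
```

The exponential cost of this enumeration is the reason an approach that terminates early and limits how deep it scans the ranked tuples is attractive for large data sets.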