基于Chebyshev多项式的概要能有效估计数据库关系属性的频度分布。然而,从肘个Chebyshev系数选择最近似原始频度分布的N(N〈〈M)个系数,是NP难问题。依据贪心策略,提出了三种概要构造算法,精度最高的一个称为GreedyB。GreedyB先找出2N个绝对值最大的系数,再由贪心策略剔除多余的N个。在模拟数据序列和实际数据序列的实验数据表明,GreedyB尽管时间复杂度要高,但L1、L1、L∞等误差显著较小。
The synopses based Chebyshev polynomials can be efficiently used to estimate frequency distribution of attributes of a given database relation. However, it is a NP-hard problem to choose N coefficients most approximate to the origin frequency distribution from M (M 〉〉 N) Chebyshev polynomial coefficients. Three greedy algorithms were introduced, and the best one was named GreedyB. First, 2N largest absolute value coefficients were picked out, then, redundant N coefficients were eliminated by greedy strategy. Experimental results over synthetic and real data sequences clearly show that compared with the existing algorithms, despite GreedyB has higher time complexity, it achieves higher accuracy as measured by the L_1, L_2 and L_inf error metrics.