Outlier detection plays an important role in environment monitoring systems based on wireless sensor networks (WSNs): it helps monitor the health of the sensor network itself, and it allows emergent events in the external environment (such as forest fires and environmental pollution) to be detected promptly. By improving the top-k algorithm, this paper proposes a top-k(σ) outlier detection algorithm for WSNs. Unlike the top-k algorithm, the proposed algorithm constructs an appropriate data grid according to the distribution of the data collected by the sensor nodes, normalizes the multidimensional data and places them into the corresponding grid cells, and then introduces a distance threshold σ to reconstruct the PC list (populated-cells list). Besides sorting the number of data points in each cell and in its neighborhood, respectively, the algorithm computes the Euclidean distance R_D between data subsets and compares R_D with the threshold σ to determine how far a subset deviates from the set of normal values, which improves the accuracy of the detection results. MATLAB simulation experiments show that the choice of the distance threshold σ has a considerable effect on the performance of the algorithm. When σ ∈ [2.5, 3], the top-k(σ) algorithm maintains a high detection rate while reducing the false positive rate as far as possible. With σ = 3, for the five given data sets, the average detection rate of the top-k(σ) algorithm reaches 93.70%, which is 4.94% higher than that of the top-k algorithm, and its average false positive rate is 4.48% lower than that of the top-k algorithm.
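The grid-based procedure summarized above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation, and several details that the abstract does not specify are assumptions made here for illustration only: min-max normalization to [0, 1], a uniform grid with a hypothetical `cells_per_dim` parameter, taking the cell with the densest neighborhood as the "normal" set, and measuring R_D between cell centroids in units of one cell width. All function names and the sample data are likewise hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def normalize(points):
    """Min-max scale every dimension to [0, 1] (assumed normalization)."""
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    return [
        tuple((p[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
              for d in dims)
        for p in points
    ]


def build_pc_list(norm_points, cells_per_dim):
    """Map each populated cell (tuple of grid indices) to its point indices."""
    pc = defaultdict(list)
    for i, p in enumerate(norm_points):
        cell = tuple(min(int(x * cells_per_dim), cells_per_dim - 1) for x in p)
        pc[cell].append(i)
    return pc


def neighborhood_count(pc, cell):
    """Number of points in the cell and in all cells adjacent to it."""
    total = 0
    for offset in product((-1, 0, 1), repeat=len(cell)):
        neighbor = tuple(c + o for c, o in zip(cell, offset))
        total += len(pc.get(neighbor, ()))
    return total


def centroid(norm_points, indices):
    """Mean position of the selected points."""
    dims = range(len(norm_points[0]))
    return tuple(sum(norm_points[i][d] for i in indices) / len(indices)
                 for d in dims)


def top_k_sigma(points, k, sigma, cells_per_dim=10):
    """Return indices of up to k points flagged as outliers."""
    norm = normalize(points)
    pc = build_pc_list(norm, cells_per_dim)
    # Rank populated cells: sparsest neighborhood first, then sparsest cell.
    ranked = sorted(pc, key=lambda c: (neighborhood_count(pc, c), len(pc[c])))
    # Assumption: the cell with the densest neighborhood holds normal data.
    normal_center = centroid(norm, pc[ranked[-1]])
    cell_width = 1.0 / cells_per_dim
    outliers = []
    for cell in ranked:
        if len(outliers) >= k:
            break
        # Assumption: R_D is the centroid distance, expressed in cell widths.
        r_d = math.dist(centroid(norm, pc[cell]), normal_center) / cell_width
        if r_d > sigma:
            outliers.extend(pc[cell])
    return sorted(outliers[:k])


# Hypothetical (temperature, humidity) readings: a dense cluster (indices
# 0-24), two distant readings (25, 26), and one sparse but nearby reading (27).
points = [(20 + 0.1 * i, 50 + 0.1 * j) for i in range(5) for j in range(5)]
points += [(30.0, 50.2), (20.2, 60.0), (21.5, 50.2)]

print(top_k_sigma(points, k=3, sigma=3.0))  # [25, 26]
print(top_k_sigma(points, k=3, sigma=1.0))  # [25, 26, 27]
```

In this toy data, the reading at index 27 lies in a sparsely populated cell, so a ranking by cell population alone would report it among the top 3. With σ = 3 it is not reported because its cell is only about 1.3 cell widths from the normal set, whereas with σ = 1 it is flagged. This mirrors, in a simplified form, the role the abstract attributes to σ in reducing false positives.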