An outlier detection algorithm based on Markov chains (MRKFOD) is presented. The algorithm first treats the data set as a weighted undirected graph in which each data point is a node and each weighted edge represents the similarity between two nodes. The resulting adjacency matrix is then used as the transition probability matrix of a Markov chain. Next, the algorithm computes the principal eigenvector of this transition matrix. Finally, the component of the principal eigenvector corresponding to each node is taken as the outlier degree of that data point. Experimental results show that, compared with other high-dimensional outlier mining algorithms, MRKFOD markedly improves both efficiency and the number of dimensions that can be handled effectively.
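
The pipeline described in the abstract (similarity graph, transition matrix, principal eigenvector, outlier degree) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: a Gaussian kernel as the similarity measure (with a hypothetical bandwidth parameter `sigma`), row normalization to turn the adjacency matrix into a stochastic matrix, power iteration to approximate the principal left eigenvector, and the convention that a smaller eigenvector component means a more outlying point. All function names are illustrative.

```python
import math


def similarity_matrix(points, sigma=1.0):
    """Weighted adjacency matrix of the similarity graph.

    Assumption: Gaussian-kernel similarity; the abstract does not name the measure.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def transition_matrix(w):
    """Row-normalize the adjacency matrix so that each row sums to 1.

    Assumption: normalization scheme chosen for illustration.
    """
    n = len(w)
    p = []
    for row in w:
        total = sum(row)
        if total > 0.0:
            p.append([x / total for x in row])
        else:
            # Isolated node (all similarities underflowed): jump uniformly.
            p.append([1.0 / n] * n)
    return p


def principal_eigenvector(p, tol=1e-12, max_iter=1000):
    """Approximate the left eigenvector of p for eigenvalue 1 by power iteration."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]  # keep the vector a probability distribution
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def outlier_scores(points, sigma=1.0):
    """One score per point: its component of the principal eigenvector."""
    return principal_eigenvector(transition_matrix(similarity_matrix(points, sigma)))


def rank_outliers(points, sigma=1.0):
    """Indices ordered from most to least outlying.

    Assumption: a low eigenvector component marks an outlier.
    """
    scores = outlier_scores(points, sigma)
    return sorted(range(len(points)), key=lambda i: scores[i])
```

As a usage example, calling `rank_outliers` on four points clustered near the origin plus one point at `(5.0, 5.0)` places the isolated point first, because a random walk on the similarity graph almost never visits it. The dense matrices here need memory quadratic in the number of points, so a sparse nearest-neighbor graph would likely be needed for larger data sets.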