Dongli Research Big Data Discovery System (DRDS)

A Rare Category Detection Algorithm Based on the k-Nearest Neighbor Graph

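ISSN: 1000-9825
Journal: Journal of Software
Date: not given
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Affiliations: [1] School of Computer Science, Wuhan University, Wuhan, Hubei 430072; [2] Zhongnan Hospital of Wuhan University, Wuhan, Hubei 430072; [3] International School of Software, Wuhan University, Wuhan, Hubei 430072
Funding: National Natural Science Foundation of China (61502347, 61272275, 61202033, 61070013, U1135005); Fundamental Research Funds for the Central Universities (2042015kf0038); Wuhan University Talent Program / Start-up Research Fund for Introduced Talent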

Authors: Wang Song[1], Huang Hao[1], Yu Guo[2], Liang Nan[1], Wang Liwei[3], Sun Yueming[1]

Keywords: rare category detection, k-nearest neighbor graph, data distribution, variation coefficient, in-degree

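Chinese abstract:

Rare category detection aims to find at least one data example for every class in an unlabeled data set, especially the rare classes that contain only a few data examples, in order to prove that these classes exist. The technique has wide applications in real-world problems such as financial fraud detection and network intrusion detection. However, existing rare category detection algorithms often suffer from (1) relatively high time complexity, or (2) the need for prior knowledge about the original data set, such as the proportion of data examples in each class. This paper proposes KRED, a fast, prior-free rare category detection algorithm based on the k-nearest neighbor graph. KRED locates rare classes by exploiting the fact that rare-class examples are tightly packed in a small region, which makes them inconsistent with the surrounding data distribution. To this end, KRED transforms the given data set into a k-nearest neighbor graph and computes the variations in the in-degrees and edge lengths of the vertices in the graph. Finally, the data examples corresponding to the vertices with the largest variations are taken as candidate examples of rare classes. Experimental results show that KRED effectively improves the efficiency of discovering every class in a data set and markedly reduces the running time.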
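The abstract relies on three quantities without defining them. Under the usual conventions (an assumption: the abstract does not give KRED's own formulas), for a data set $\{x_1,\dots,x_n\}$ and a vertex whose $k$ outgoing edge lengths are $d_1,\dots,d_k$, they can be written as:

```latex
\begin{aligned}
N_k(x_i) &= \text{the } k \text{ points nearest to } x_i,
  \qquad E = \{\, (x_i, x_j) : x_j \in N_k(x_i) \,\},\\
\deg^{-}(x_j) &= \bigl|\{\, i : x_j \in N_k(x_i) \,\}\bigr|,\\
\mathrm{CV}(d_1,\dots,d_k) &= \frac{\sigma}{\mu},
  \qquad \mu = \frac{1}{k}\sum_{t=1}^{k} d_t,
  \qquad \sigma = \sqrt{\frac{1}{k}\sum_{t=1}^{k} (d_t - \mu)^2 }.
\end{aligned}
```

Intuitively, a vertex inside a compact rare cluster tends to have a few very short out-edges followed by much longer ones (a large $\mathrm{CV}$) and to receive in-edges from nearby majority points; the abstract does not say how KRED weighs these two effects.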

English abstract:

Rare category detection aims at finding at least one data example for each class in an unlabeled data set, especially the rare classes (a.k.a. rare categories) that have only a few data examples, to prove the existence of these classes. It has various applications in fields such as financial fraud detection and network intrusion detection. Nevertheless, existing approaches to this problem suffer from either high time complexity or the need for prior information about the data set (e.g., the proportion of data examples in each class). This paper proposes KRED, a prior-free and efficient algorithm for rare category detection. The algorithm exploits the changes in the local data distribution caused by the presence of the compact clusters of rare classes. To this end, it transforms a data set into a k-nearest neighbor graph and investigates the variations in both the edge lengths and the in-degrees of the nodes. Finally, the nodes with the maximal variations are selected as the candidate data examples of rare classes. Experimental results show that KRED effectively improves the efficiency of discovering new classes in data sets and notably reduces the execution time.
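The pipeline the abstract describes (build a k-NN graph, measure per-vertex variation in edge lengths and in-degrees, rank the vertices) can be sketched in plain Python. This is a toy illustration, not KRED: the scoring rule below (coefficient of variation of a vertex's outgoing edge lengths, multiplied by its in-degree divided by k) is a hypothetical stand-in, and the helper names `knn_graph`, `in_degrees`, `coeff_of_variation` and `rare_candidates` are invented for this example. The brute-force neighbor search is O(n²) and only suits small inputs.

```python
import math
from statistics import mean, pstdev


def knn_graph(points, k):
    """Directed k-NN graph: out[i] lists (distance, j) for the k nearest
    neighbours of point i (brute force, Euclidean, ties broken by index)."""
    out = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        out.append(dists[:k])
    return out


def in_degrees(out, n):
    """In-degree of each vertex: how many k-NN lists it appears in."""
    deg = [0] * n
    for nbrs in out:
        for _, j in nbrs:
            deg[j] += 1
    return deg


def coeff_of_variation(values):
    """Population standard deviation divided by the mean (0 if the mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else 0.0


def rare_candidates(points, k, top=1):
    """HYPOTHETICAL score, not KRED's published formula: CV of a vertex's
    outgoing edge lengths, weighted by its in-degree relative to k.
    Returns the indices of the `top` highest-scoring points and all scores."""
    out = knn_graph(points, k)
    deg = in_degrees(out, len(points))
    scores = [coeff_of_variation([d for d, _ in nbrs]) * deg[i] / k
              for i, nbrs in enumerate(out)]
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:top], scores


# Majority class: a 6x6 unit grid (indices 0..35).
# Rare class: three tightly packed points inside one grid cell (indices 36..38).
grid = [(float(x), float(y)) for x in range(6) for y in range(6)]
rare = [(2.5, 2.5), (2.55, 2.5), (2.5, 2.55)]
top, scores = rare_candidates(grid + rare, k=4, top=3)
print(sorted(top))  # [36, 37, 38]
```

The only parameter is k; no class proportions are supplied, which mirrors the prior-free setting the abstract describes. A practical implementation would replace the brute-force search with a spatial index.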

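Projects associated with this article

Theory and Applications of Cross-Media Collaborative Processing and Services

Journal papers: 88; conference papers: 28

Identifying Identity Attributes of Online Users Based on Linguistic Features

Journal papers: 2

Dynamic Modeling Methods and Key Technologies for Networked Software Based on Parallel Execution

Journal papers: 16; conference papers: 10

Rare Category Data Mining in Big Data Environments

Journal papers: 2

Lineage Tracing Methods for Uncertain Relational Data

Journal papers: 8; conference papers: 2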

Journal papers from the same projects

Adaptive iterative learning control of non-linear MIMO continuous systems with iteration-varying i

Particle swarm optimization with an aging leader and challengers

Multiple Populations for Multiple Objectives: A Coevolutionary Technique for Solving Multiobjective

Single Image Super-Resolution Using Combined Total Variation Regularization by Split Bregman Iterati

On Improving Aggregate Recommendation Diversity and Novelty in Folksonomy-based Systems

Evolutionary Strategy based on Mixture Gaussian Models

A Novel Shape Matching Method Based On Feature Points

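Iterative Learning Control Theory for Linear Discrete Systems with High Relative Degree and Iterative Initial Errors

An Improved Undersampling Method for SVM-Based Imbalanced Data Classification

Weighted attentional blocks for probabilistic object tracking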

Discriminant Graph Based Linear Embedding

Differential Evolution With Two-Level Parameter Adaptation

Image Magnification Based on Classified Training Set

A particle swarm optimization using local stochastic search and enhancing diversity for continuous o

Robust iterative learning control with rectifying action for nonlinear discrete time-delayed systems

Improved heuristic equivalent search algorithm based on Maximal Information Coefficient for Bayesian

Fault Localization Based on Bayesian Service Dependency Graphs

Online Learning Based on Stochastic Spectral Gradient

A Survey of PAC-Bayesian Learning Theory for Big Data

Localization and Recognition of Dynamic Digit Gestures for Smart TV Systems

Energy Consumption Prediction based on Time-Series Models for CPU-Intensive Activities in the Cloud

A MapReduce Reinforced Distributed Sequential Pattern Mining Algorithm

A novel pheromone-based evolutionary algorithm for solving degree-constrained minimum spanning tree p

Space-based initialization strategy for particle swarm optimization

MICHAC: Defect Prediction via Feature Selection based on Maximal Information Coefficient with Hierarch

A Two-Step Image Annotation Algorithm Based on Dropout Deep Networks

A set-based locally informed discrete particle swarm optimization

Virtual Power Meter Supported Power Consumption Prediction of Web Services

GreenOCR: An Energy-efficient Optimal Clustering Routing Protocol

Adaptive ILC algorithms of nonlinear continuous systems with non-parametric uncertainties for non-re

Queuing Theory based Efficiency Optimization of Business Process for Academic Community Cloud

Global Coupled Learning and Local Consistencies Ensuring for Sparse-based Tracking

A TV-L1 Based Nonrigid Image Registration by Coupling Parametric and Non-Parametric Transformation

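An Improved OWL-QN Method for Solving Sparse Logistic Regression Problems

Image Denoising Based on Higher-Order Singular Value Decomposition and Mean Square Deviation Iteration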

Discriminative subspace learning with sparse representation view-based model for robust visual track

Adaptive ILC for tracking non-repetitive reference trajectory of 2-D FMM under random boundary condit

Iterative learning control for linear discrete-time systems with high relative degree under initial s

A unified adaptive control framework of non-parameterized nonlinear continuous systems for repetitive

Discriminative Object Tracking via Sparse Representation and Online Dictionary Learning

Automatically Constructing Course Dependence Graph based on Association Semantic Link Model

A Novel Source-Location Anonymity Protocol in Surveillance Systems

Image automatic annotation via multi-view deep representation

Discriminative Reverse Sparse Tracking Via Weighted Multi-task Learning

Fast Service Process Fragment Indexing and Ranking

ARG-based segmentation of radioactive ray image in contraband check

A total variation based nonrigid image registration by combining parametric and non-parametric trans

Iterative Learning Control for Two Dimensional Discrete Systems with Fornasini–Marchesini Model

Learning a Coupled Linearized Method in Online Setting.

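Power Consumption Prediction of Web Services for Energy-efficient Service Selection

Hot Topic Discovery on Microblogs Based on a Heat Matrix

A Learning Method for Undirected Topic Models Based on Annealed Transition Sampling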

Social Sensing Enhanced Time Ruler for Real-Time Bus Service

WTrack: HMM-based Walk Pattern Recognition and Indoor Pedestrian Tracking Using Phone Inertial Sensor

Semantic Web Service Composition Based on Dynamic Description Logic

Cloud service: automatic construction and evolution of software process problem-solving resource spa

A Particle Swarm Optimization using Local Stochastic Search for Continuous Optimization

A NoSQL based Cached Storage Solution of GIS Web Service

Interconnected Resource Viewpoint System, Its Developing Method and Application

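Attribute-Level Lineage Representation and Probability Computation for Uncertain Relational Data

Biological Experiment Information Management System

Query Optimization for Virtual Attributes in Object Deputy Databases

A Rare Category Detection Algorithm Based on the k-Nearest Neighbor Graph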

Supporting Various Top-k Queries over Uncertain Datasets

A Group-Oriented Influence Maximization Method

Attribute Level Lineage in Uncertain Data with Dependencies

Spam Review Detection for Online Stores Based on Quantified Sentiment

LOTMAP: Learning to Maximize Top-N Recommendation with Mean Average Precision

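An Expert Recommendation Method Combining Weighted Dynamic Authority and Interest

Comprehensive Learning Particle Swarm Optimization with an Evaluation Mechanism

A Hidden Topic Variable Graphical Model Based on a Deep Learning Framework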

Structure Learning for Weighted Networks Based on Bayesian Nonparametric Models

Journal information

Journal of Software
Peking University Core Journal (2011 edition)

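Supervising organization: Chinese Academy of Sciences
Sponsors: Institute of Software, Chinese Academy of Sciences; China Computer Federation
Editor-in-chief: Zhao Chen
Address: Institute of Software, Chinese Academy of Sciences, P.O. Box 8718, Beijing
Postal code: 100190
Email: jos@iscas.ac.cn
Phone: 010-62562563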

International standard serial number: ISSN 1000-9825
Chinese domestic serial number: CN 11-2560/TP
Postal distribution code: 82-367

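Awards:
Selected as a "Double Hundred" journal of the China Journal Matrix in 2001; awarded First Prize for Outstanding Science and Technology Journals of the Chinese Academy of Sciences in 2000.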

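Indexed in:
Abstract Journal (Referativnyi Zhurnal, Russia), Mathematical Reviews online (USA), Index Copernicus (Poland), Zentralblatt MATH (Germany), Scopus (Netherlands), Engineering Index (USA), Cambridge Scientific Abstracts (USA), INSPEC (UK), JST database (Japan Science and Technology Agency), China Science and Technology Core Journals (China), Peking University Core Journals (China; 2000, 2004, 2008, 2011 and 2014 editions)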

Times cited: 54609