The Dendritic Cell Algorithm (DCA) can efficiently and effectively process large datasets in terms of data size. However, data size is not the only concern when handling complex datasets; high dimensionality is often a bigger problem. The complexity of the DCA arises at the data pre-processing stage, which makes dimensionality reduction especially important. Previously, data pre-processing for the DCA was performed manually, based on users' expert knowledge of a given problem domain, which is time-consuming and sometimes difficult to achieve. In this paper, we propose automating data pre-processing for the DCA using Principal Component Analysis (PCA), which extracts and selects relevant features and adapts the algorithm to the characteristics of the underlying data. Applying PCA to the DCA on the KDDCUP'99 data set demonstrates the feasibility of the approach and yields useful and accurate classification results.
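The dimensionality-reduction step described above can be illustrated with a minimal sketch. It is not the pipeline used in this paper: it applies generic PCA (via singular value decomposition in NumPy) to synthetic data rather than to KDDCUP'99, and it does not cover how the reduced features are fed to the DCA. The function name `pca_reduce`, the `variance_to_keep` threshold of 0.95, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np


def pca_reduce(X, variance_to_keep=0.95):
    """Project X (n_samples x n_features) onto its leading principal components.

    Keeps the smallest number of components whose cumulative explained
    variance ratio reaches `variance_to_keep` (an illustrative threshold).
    """
    X = np.asarray(X, dtype=float)

    # Standardize each feature; constant columns get a scale of 1
    # so that the division below is always defined.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0
    Z = (X - mean) / std

    # SVD of the standardized data: the rows of Vt are the principal directions,
    # and the squared singular values are proportional to the explained variance.
    _, singular_values, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = singular_values ** 2
    total = explained.sum()
    if total == 0.0:
        raise ValueError("all features are constant; nothing to reduce")
    ratio = explained / total

    # Smallest k whose cumulative ratio reaches the threshold, capped at the
    # number of available components to guard against rounding at 1.0.
    k = int(np.searchsorted(np.cumsum(ratio), variance_to_keep)) + 1
    k = min(k, len(ratio))

    components = Vt[:k]
    return Z @ components.T, components, ratio[:k]


# Synthetic stand-in for a high-dimensional dataset: 10 observed features
# generated from 2 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

reduced, components, ratio = pca_reduce(X)
print("reduced shape:", reduced.shape)
print("explained variance ratio per kept component:", ratio)
```

Selecting components by cumulative explained variance is one common way to remove the manual, expert-driven feature selection step; other selection criteria would fit the same structure.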