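In recent years, pattern recognition on proteomic mass spectrometry data has become a new approach to cancer diagnosis, and novel biomarkers discovered in this way have been successfully used for the early prediction of several major diseases. The approach faces two difficulties: how to extract features that clearly discriminate between classes, and how to handle the large number of features in spectral data effectively. This paper proposes a method based on multivariate graphical feature fusion for visual dimensionality reduction of high-dimensional proteomic mass spectrometry data. After the necessary preprocessing of the mass spectrometry data, a subset of the original features is selected and mapped into a multivariate graph representation domain. Multilayer hierarchical graphical feature selection and extraction then yield the final multivariate-graph template for cancer diagnosis. The method was validated on an internationally available public high-throughput ovarian cancer dataset and achieved good classification performance.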
Pattern recognition on protein mass spectra has recently emerged as a new method for cancer diagnosis. Proteomic mass spectra coupled with pattern classification techniques have been used to discover novel biomarkers, which have been applied successfully to the predictive diagnosis of several cancers. However, effective classification depends critically on extracting good features that capture the identity of each class. A second major problem is how to handle a large number of features effectively. In this paper, a method based on graphical multivariate feature fusion is proposed and used to provide a visual representation of high-dimensional data. The graphical processing method relies on a multilayered feature fusion structure that outputs a lower-dimensional representation. Feature fusion is implemented by combining feature selection with feature extraction. The proposed methodology was tested on public MS-based cancer datasets, and the results are promising.