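Ovarian cancer is a common gynecological tumor, and its mortality rate ranks first among all gynecological tumors. A current research focus is selecting tumor biomarkers for diagnosis that both discriminate disease patterns well and are biologically relevant. For ovarian cancer phospholipid metabolite data, this study proposes a biomarker selection method that combines supervised singular value decomposition (SVD) with information-gain-based random forest decision. First, supervised SVD is used to compute a weight for each marker, and candidate markers are coarsely selected according to these weights. Second, information-gain-based random forest decision is applied to select the characteristic biomarkers from the candidates. Finally, an SVM classifier test gives a classification rate above 90%. Compared with other commonly used methods, the proposed method shows certain advantages. One notable feature is that the selected biomarkers not only maintain a high classification rate but also have biological relevance, which demonstrates the feasibility and practicality of the method.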
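The abstract does not define how the supervised SVD weights are computed, so the following is only a minimal Python/NumPy sketch of the first step under an assumed definition. Markers are standardized, supervision enters by replacing samples with size-weighted class means (a between-class matrix), and each marker's weight is the magnitude of its loading on the leading right singular vector(s), scaled by the singular values. The function names `svd_marker_weights` and `select_candidates` and the parameters `n_components` and `n_keep` are hypothetical, not taken from the paper.

```python
import numpy as np


def svd_marker_weights(X, y, n_components=1):
    """Hypothetical 'supervised SVD' marker weights (an assumed definition).

    X: (n_samples, n_markers) matrix of metabolite levels.
    y: class label for each sample.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Standardize each marker so weights do not depend on measurement scale.
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - mu) / sd
    # Supervision (assumed): stack per-class mean profiles, weighted by
    # sqrt(class size) so that B.T @ B is the between-class scatter matrix.
    classes = np.unique(y)
    means = np.vstack([Z[y == c].mean(axis=0) for c in classes])
    counts = np.array([np.sum(y == c) for c in classes], dtype=float)
    B = np.sqrt(counts)[:, None] * means
    # Right singular vectors of B live in marker space.
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    k = min(n_components, len(s))
    # Weight = |loading| scaled by singular value, summed over kept components.
    return np.sum(s[:k, None] * np.abs(Vt[:k]), axis=0)


def select_candidates(weights, n_keep):
    """Indices of the n_keep largest weights, highest first (coarse screening)."""
    return [int(i) for i in np.argsort(weights)[::-1][:n_keep]]
```

With two classes the between-class matrix has rank one, so the leading right singular vector is proportional to the standardized difference in class means. `n_keep` controls how coarse the candidate set passed to the next step is.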
Ovarian cancer is a common gynecological tumor with the highest death rate among gynecological tumors. Selecting biomarkers that have not only a high capacity to classify disease patterns but also biological significance is a focus of current studies. Based on phospholipid metabolite data from ovarian cancer, a new method integrating supervised singular value decomposition and an information-gain-based random forest was proposed to select biomarkers. First, supervised singular value decomposition was used to calculate the weight of each biomarker, and a set of candidate biomarkers was selected according to the weight values. Second, random forest decision theory based on information gain was applied to select biomarkers from the candidates. An SVM-based classification test indicated that the classification rate exceeded 90%, superior or comparable to most existing methods. A distinctive advantage of the proposed method is that the identified biomarkers are shown to be biologically significant.
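The second step ranks candidate biomarkers by information gain. The abstract does not give the random forest details (number of trees, bootstrap scheme, how gains are aggregated), so this pure-Python sketch shows only the information-gain split criterion that entropy-based decision trees use, evaluated at the best single threshold for each marker. The names `best_gain` and `rank_by_gain` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting samples at value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    child = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - child


def best_gain(values, labels):
    """Best gain over midpoints between consecutive distinct marker values."""
    xs = sorted(set(values))
    best = 0.0
    for a, b in zip(xs, xs[1:]):
        best = max(best, information_gain(values, labels, (a + b) / 2))
    return best


def rank_by_gain(columns, labels):
    """columns: dict of marker name -> values. Returns (ranking, gains)."""
    gains = {name: best_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True), gains
```

In a full random forest the same criterion would be evaluated inside many trees grown on bootstrap samples with random marker subsets, and per-marker gains accumulated across trees. scikit-learn's `RandomForestClassifier(criterion="entropy")` and its `feature_importances_` attribute are one existing implementation of that idea.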