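Objective To build a model with the Boosting algorithm and analyze urinary metabonomics data from patients with ovarian cancer and patients without ovarian cancer (ovarian cyst and uterine fibroids), to extract biologically meaningful metabolites, and to provide clues for the early diagnosis and disease mechanism of ovarian cancer. Methods Decision trees were combined with the Boosting algorithm to analyze the metabonomics data of the patients' clinical samples, and the metabolites were screened stepwise to obtain those important for identifying ovarian cancer patients. Results The 10 top-ranked differential metabolites obtained from the Boosting model discriminated well between ovarian cancer patients and controls, with an area under the ROC curve of 0.944. Conclusion The Boosting model can be applied effectively to ovarian cancer metabonomics data, maintaining high classification accuracy while identifying the metabolites that contribute most to the classification.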
Objective A Boosting model was built to analyze urinary metabonomics data from patients with ovarian cancer and patients without ovarian cancer (ovarian cyst and uterine fibroids). Biologically meaningful metabolites were also extracted from the data, which could provide clues to early diagnosis. Methods Boosting and decision trees were combined to analyze the metabonomics data, and the important metabolites were selected according to their importance scores. Results The top ten metabolites were extracted and the area under the ROC curve was 0.944, a better classification result than that obtained with the original dataset. Conclusion Boosting can be applied effectively to the classification of ovarian cancer metabonomics data, and important features can be extracted at the same time.
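The workflow summarized above (boost decision trees, rank features by importance, keep the top-ranked ones, report the area under the ROC curve) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes AdaBoost with one-split decision stumps as the boosting variant, summed stump weights as the importance score, a simple hold-out split, and synthetic Gaussian data in place of the urine samples. All function names, the number of rounds, and the choice of 3 retained features are hypothetical.

```python
import math
import random

def predict_stump(row, j, t, sign):
    """One-split decision tree: vote `sign` if feature j exceeds t, else -sign."""
    return sign if row[j] > t else -sign

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error (labels are -1/+1)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if predict_stump(row, j, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def adaboost(X, y, rounds=20):
    """Discrete AdaBoost (assumed variant): returns a list of (alpha, j, t, sign)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, sign))
        # Up-weight misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * predict_stump(row, j, t, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, row):
    """Weighted vote of all stumps; larger means more likely class +1."""
    return sum(a * predict_stump(row, j, t, s) for a, j, t, s in model)

def importance(model, n_features):
    """Assumed importance score: total stump weight assigned to each feature."""
    imp = [0.0] * n_features
    for a, j, _, _ in model:
        imp[j] += a
    return imp

def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ordered pos/neg pairs."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic stand-in data: 60 samples, 8 features, only features 0 and 1 informative.
random.seed(0)
y_all = [1 if i % 2 == 0 else -1 for i in range(60)]
X_all = [[random.gauss(1.5 if (yi == 1 and j < 2) else 0.0, 1.0) for j in range(8)]
         for yi in y_all]
X_train, y_train = X_all[:40], y_all[:40]
X_test, y_test = X_all[40:], y_all[40:]

# Step 1: boost on all features and rank them by importance.
full_model = adaboost(X_train, y_train)
imp = importance(full_model, 8)
top = sorted(range(8), key=lambda j: imp[j], reverse=True)[:3]

# Step 2: refit on the top-ranked features only and evaluate on held-out samples.
def keep(X, cols):
    return [[row[j] for j in cols] for row in X]

top_model = adaboost(keep(X_train, top), y_train)
held_out_auc = auc([score(top_model, r) for r in keep(X_test, top)], y_test)
print("top-ranked features:", top)
print("held-out AUC with top features:", round(held_out_auc, 3))
```

With real metabonomics data, the same ranking step would return the indices of the candidate metabolites, and the AUC would be better estimated by cross-validation than by the single split assumed here.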