

Methods and Strategies for Genetic Risk Prediction Using Lung Cancer GWAS Data

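ISSN: 1002-3674
Journal: Chinese Journal of Health Statistics (《中国卫生统计》)
Date: 0
Classification: R617 [Medicine and Health: Clinical Medicine; Medicine and Health: Surgery]
Author affiliation: [1] Department of Biostatistics, School of Public Health, Nanjing Medical University, 211166
Funding: National Natural Science Foundation of China (81473070, 81373102)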

Keywords: lung cancer, genetic risk score, support vector machine, random forest, best predictive subset, single nucleotide polymorphism

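Chinese abstract:

Objective: To explore methods and strategies for genetic risk prediction based on lung cancer genome-wide association study (GWAS) data. Methods: The Nanjing and Beijing subsamples of the lung cancer GWAS data were used as the training set and the test set, respectively. Two strategies, the full predictor set and the best predictive subset, were applied, and the prediction accuracy of three prediction methods was compared under different linkage disequilibrium (LD) structures and initial screening significance levels (α). Results: Under a high-LD structure, the prediction accuracy of wGRS showed an upward trend as -log(α) increased. RF and SVM were less sensitive to LD structure than wGRS, but all three methods predicted more accurately under the low-LD structure (r² < 0.2) than under the high-LD structure. With wGRS, the best predictive subset performed slightly better than the full set; with SVM, the subset performed about the same as, though slightly worse than, the full set; with RF, the subset performed worse than the full set by a considerable margin. Conclusion: Pruning SNPs according to LD structure and choosing an appropriate initial screening level can improve the accuracy of genetic risk prediction; under these conditions, wGRS predicts better than SVM and RF.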

English abstract:

Objective: To investigate the performance of three genetic risk prediction methods, weighted genetic risk score (wGRS), support vector machine (SVM) and random forest (RF), applied to high-dimensional lung cancer data under two strategies. Methods: The Nanjing and Beijing samples of the GWAS data served as the training set and the testing set, respectively. Two strategies, the full predictive set (FS) and the best predictive subset (BS), were used, and the prediction accuracy of the three methods was compared across combinations of linkage disequilibrium (LD) structure and hypothesis testing level (α). Results: Under a high-LD structure, the prediction accuracy of wGRS rose with increasing -log(α). RF and SVM were less sensitive to LD structure than wGRS, but each method generally predicted more accurately under a low-LD structure (r² < 0.2) than under a high-LD structure. Moreover, BS performed slightly better than FS with wGRS, approximately equal to but slightly worse than FS with SVM, and worse than FS with RF. Conclusion: Prediction accuracy can be improved by LD pruning and by adopting a proper α level; under these conditions, wGRS performed better than SVM and RF.
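
The abstract does not state how the wGRS is computed. As a minimal illustration only, the sketch below assumes the commonly used definition: each individual's score is the sum of risk-allele counts weighted by the log odds ratio estimated on the training set, after keeping only SNPs whose training-set p-value falls below the screening level α. The function names (`screen_snps`, `weighted_grs`) and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def screen_snps(p_values, alpha):
    """Return indices of SNPs whose training-set p-value is below alpha."""
    return [i for i, p in enumerate(p_values) if p < alpha]


def weighted_grs(genotypes, odds_ratios):
    """Weighted genetic risk score for one individual.

    genotypes:   risk-allele counts (0, 1 or 2), one per SNP
    odds_ratios: per-allele odds ratios, assumed to come from the training set
    """
    if len(genotypes) != len(odds_ratios):
        raise ValueError("genotypes and odds_ratios must have the same length")
    # Each SNP contributes (allele count) x log(odds ratio).
    return sum(g * math.log(o) for g, o in zip(genotypes, odds_ratios))


# Hypothetical toy data for four SNPs (not from the study).
p_values = [1e-6, 0.03, 0.4, 2e-4]
odds_ratios = [1.30, 1.10, 1.02, 0.85]
person = [2, 1, 0, 1]

# Initial screening at alpha = 1e-3 keeps the first and last SNP.
keep = screen_snps(p_values, alpha=1e-3)
score = weighted_grs([person[i] for i in keep],
                     [odds_ratios[i] for i in keep])
print(keep, score)
```

Raising -log(α), that is, lowering α, shrinks the list returned by `screen_snps`, which is the screening knob the abstract varies. LD pruning would be an additional filtering step applied to the SNP list before scoring and is not shown here.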
