To address the problem of integrating data on genomic regions not covered by genome-wide association studies (GWAS), this paper proposes a semantic mapping technique based on self-training semi-supervised machine learning and applies it to this field. The results show that the method performs automatic semantic mapping for the genomic regions not covered by GWAS with a precision of 94.2% and a recall of 97.5%. It effectively reduces reliance on human experts and enables fast, effective integration of data on these regions.
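The abstract names self-training as the learning strategy but does not describe the underlying classifier or features. The following is a minimal sketch of a generic self-training loop for mapping source field names to target schema concepts. It assumes a character-trigram Jaccard similarity as a stand-in classifier; the function names, the threshold, and the example field names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def trigrams(text: str) -> Set[str]:
    """Character trigrams of a lowercased, space-padded field name."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two trigram sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict(name: str, labeled: Dict[str, str]) -> Tuple[str, float]:
    """Return the concept of the most similar labeled name and its score.

    `labeled` must be non-empty.
    """
    grams = trigrams(name)
    best = max(labeled, key=lambda known: jaccard(grams, trigrams(known)))
    return labeled[best], jaccard(grams, trigrams(best))


def self_train(
    seed: Dict[str, str],
    unlabeled: List[str],
    threshold: float = 0.4,   # assumed confidence cut-off, not from the paper
    max_rounds: int = 10,
) -> Tuple[Dict[str, str], List[str]]:
    """Grow the labeled set with confident predictions, round by round."""
    labeled = dict(seed)
    pending = list(unlabeled)
    for _ in range(max_rounds):
        # Score every pending name against the current labeled set.
        accepted = []
        for name in pending:
            concept, score = predict(name, labeled)
            if score >= threshold:
                accepted.append((name, concept))
        if not accepted:
            break  # nothing confident enough: stop early
        # Promote confident pseudo-labels only after the round is scored.
        for name, concept in accepted:
            labeled[name] = concept
        accepted_names = {name for name, _ in accepted}
        pending = [name for name in pending if name not in accepted_names]
    return labeled, pending


if __name__ == "__main__":
    seed = {"chromosome": "CHROM", "position": "POS"}
    mapped, leftover = self_train(seed, ["chromosome_id", "chrom_id", "zzz"])
    print(mapped)
    print(leftover)
```

In this toy run, `chrom_id` is too dissimilar from the seed names to be accepted in the first round, but it is accepted in the second round once `chromosome_id` has been pseudo-labeled. Names that never reach the threshold remain in the leftover list, which is where a human expert would still be needed.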