Performance evaluation of computational methods for splice-disrupting variants and improving the performance using the machine learning-based framework
A critical challenge in genetic diagnostics is the assessment of disease-associated genetic variants, specifically variants that fall outside canonical splice sites and act by altering alternative splicing. Several computational methods have been developed to prioritize variants by their effect on splicing; however, performance evaluation of these methods is hampered by the lack of large-scale benchmark datasets. In this study, we employed a splicing-region-specific strategy to evaluate the performance of prediction methods on eight independent datasets. Under most conditions, we found that dbscSNV-ADA performed better in the exonic region, S-CAP performed better in the core donor and acceptor regions, S-CAP and SpliceAI performed better in the extended acceptor region, and MMSplice performed better in identifying variants that caused exon skipping. However, the performance of the prediction methods varied widely across datasets and splicing regions, and no single method showed the best overall performance on all datasets. To address this, we developed a new method, machine learning-based classification of splice-site variants (MLCsplice), which predicts the effect of variants on splicing by building on individual methods. We demonstrated that MLCsplice achieved stable and superior prediction performance compared with any individual method. To facilitate the identification of the splicing effect of variants, we provide precomputed MLCsplice scores for all possible splice-site variants across human protein-coding genes (http://39.105.51.3:8090/MLCsplice/). We believe that the performance of the individual methods on the eight benchmark datasets will provide tentative guidance for selecting an appropriate method to prioritize candidate splice-disrupting variants, thereby increasing the genetic diagnostic yield.
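The idea of predicting splicing effects "based on individual methods" can be illustrated with a small stacking sketch: the scores of several individual predictors become the input features of a meta-classifier. The abstract does not name the learning algorithm, the input tools, or the training data used by MLCsplice, so everything below is an assumption made for illustration only: the placeholder tool names, the synthetic scores rescaled to [0, 1], and the choice of logistic regression trained by gradient descent. It is not the authors' implementation.

```python
import math
import random

# Hypothetical placeholder names for individual predictors (assumption).
TOOLS = ["tool_1", "tool_2", "tool_3"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_meta_classifier(rows, labels, epochs=500, lr=0.5):
    """Fit a logistic-regression meta-classifier by batch gradient descent.

    rows:   one list of per-tool scores per variant (assumed scaled to [0, 1])
    labels: 1 = splice-disrupting, 0 = neutral
    """
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(model, x):
    """Return the combined probability that a variant disrupts splicing."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def make_toy_data(n=200, seed=0):
    """Generate SYNTHETIC scores; these are not real benchmark data."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 0.7 if y else 0.3
        scores = [min(1.0, max(0.0, rng.gauss(center, 0.2))) for _ in TOOLS]
        rows.append(scores)
        labels.append(y)
    return rows, labels


if __name__ == "__main__":
    model = train_meta_classifier(*make_toy_data())
    print("combined score:", predict_proba(model, [0.8, 0.6, 0.9]))
```

A combined score of this kind is one way to smooth over the dataset- and region-dependent variation of any single tool, because the meta-classifier learns how much weight each input deserves. A real application would need labeled benchmark variants and held-out evaluation.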
Funding:
National Key R&D Program of China [2017YFC0909400]; National Natural Science Foundation of China [81630010, 91839302, 81700413, 81873506]; Shanghai Municipal Science and Technology Major Project [2017SHZDZX01]; Fundamental Research Funds for the Central Universities [2015ZDTD044]
Language:
Foreign language
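Chinese Academy of Sciences (CAS) journal ranking:
Edition for year of publication [2021]:
Major category | Tier 2, Biology
Subcategory | Tier 2, Biochemical Research Methods; Tier 2, Mathematical & Computational Biology
Latest edition [2025]:
Major category | Tier 2, Biology
Subcategory | Tier 1, Mathematical & Computational Biology; Tier 2, Biochemical Research Methods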
JCR quartile:
Edition for year of publication [2020]:
Q1 BIOCHEMICAL RESEARCH METHODS; Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY
Latest edition [2023]:
Q1 BIOCHEMICAL RESEARCH METHODS; Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY
First author affiliations: [1] Huazhong Univ Sci & Technol, Tongji Hosp, Tongji Med Coll, Dept Internal Med, Div Cardiol, Wuhan 430030, Peoples R China; [2] Hubei Key Lab Genet & Mol Mech Cardiol Disorders, Wuhan 430030, Peoples R China
Corresponding author affiliations: [1] Huazhong Univ Sci & Technol, Tongji Hosp, Tongji Med Coll, Dept Internal Med, Div Cardiol, Wuhan 430030, Peoples R China; [2] Hubei Key Lab Genet & Mol Mech Cardiol Disorders, Wuhan 430030, Peoples R China; [3] Genet Diagnost Ctr, Wuhan, Peoples R China; [4] Internal Med Dept, Wuhan, Peoples R China
Recommended citation (GB/T 7714):
Liu Hao, Dai Jiaqi, Li Ke, et al. Performance evaluation of computational methods for splice-disrupting variants and improving the performance using the machine learning-based framework[J]. BRIEFINGS IN BIOINFORMATICS, 2022, 23(5). doi:10.1093/bib/bbac334.
APA:
Liu, Hao, Dai, Jiaqi, Li, Ke, Sun, Yang, Wei, Haoran ... & Wang, Dao Wen. (2022). Performance evaluation of computational methods for splice-disrupting variants and improving the performance using the machine learning-based framework. BRIEFINGS IN BIOINFORMATICS, 23(5).
MLA:
Liu, Hao, et al. "Performance evaluation of computational methods for splice-disrupting variants and improving the performance using the machine learning-based framework." BRIEFINGS IN BIOINFORMATICS 23.5 (2022).