Oct 21, 2020 – 11:00 – 11:45 AM
The vast majority of human genetic disorders are associated with mutations that affect protein-protein interactions by altering wild-type binding affinity. It is therefore important to assess the effect of mutations on protein-protein binding free energy in order to assist the development of therapeutic solutions. Currently, the most popular approaches rely on structural information to make their predictions, which precludes their application to genome-scale investigations. Indeed, with the progress of genomic sequencing, researchers frequently need to assess the effects of mutations for which no structure is available. Here we report SAAMBE-SEQ, a Gradient Boosting Decision Tree (GBDT) machine learning method that is completely sequence-based and requires no structural information. SAAMBE-SEQ uses 80 features representing evolutionary information, sequence-based features, and the change of physical properties upon mutation at the mutation site. Its accuracy is found to be better than or comparable to that of the most advanced structure-based methods.
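To make the idea of a "change of physical properties upon mutation" feature concrete, here is a minimal Python sketch that computes one such quantity directly from sequence. It is an illustration under stated assumptions, not the SAAMBE-SEQ implementation: the abstract does not list the 80 features, so the choice of the Kyte-Doolittle hydropathy scale, the "A4V"-style mutation code, and the function names `parse_mutation` and `hydropathy_change` are all hypothetical.

```python
# Kyte-Doolittle hydropathy index (a standard published scale). Using it here
# is an illustrative assumption, not a documented SAAMBE-SEQ feature.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def parse_mutation(code):
    """Split a mutation code such as 'A123V' into (wild_type, position, mutant).

    The position is 1-based, as is conventional for protein sequences.
    """
    if len(code) < 3:
        raise ValueError(f"malformed mutation code: {code!r}")
    wild_type, mutant = code[0], code[-1]
    position = int(code[1:-1])  # raises ValueError if not an integer
    if wild_type not in KD_HYDROPATHY or mutant not in KD_HYDROPATHY:
        raise ValueError(f"unknown amino acid in mutation code: {code!r}")
    return wild_type, position, mutant


def hydropathy_change(sequence, code):
    """Return hydropathy(mutant) - hydropathy(wild type) at the mutation site.

    Only the sequence is needed; no structure is consulted.
    """
    wild_type, position, mutant = parse_mutation(code)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"sequence has {sequence[position - 1]!r} at position {position}, "
            f"not {wild_type!r}"
        )
    return KD_HYDROPATHY[mutant] - KD_HYDROPATHY[wild_type]


if __name__ == "__main__":
    # Hypothetical toy sequence; alanine at position 4 mutated to valine.
    print(hydropathy_change("MKTAYIAK", "A4V"))
```

In a sequence-based predictor of this kind, values like this one would be combined with evolutionary and other sequence-derived features into a fixed-length vector per mutation, which a gradient boosting model then maps to a predicted change in binding free energy.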