Bioinformatics

Machine Learning (ML) techniques, such as Markov models, support vector machines, neural networks and, in particular, graphical models and deep architectures, have been successfully used to analyze life science problems because of their ability to handle uncertainty and noise and to generalize to unseen data.

ML approaches are increasingly being applied to problems in computational biology and bioinformatics. Novel computational techniques to analyze high-throughput data in the form of sequences, gene and protein expression, pathways, and images are becoming vital both for understanding diseases and for future drug discovery, with a view to precision medicine. Precision medicine is a groundbreaking approach to disease prevention, diagnosis and treatment, based on individual differences in genes, environment, metabolomics, proteomics, and lifestyle. This does not necessarily mean tailoring a medical treatment to a single patient, but rather gaining the ability to classify patients into subpopulations according to their susceptibility to a particular disease or their response to a specific treatment.

The Bioinformatics research at SAILab is particularly devoted to applying artificial intelligence tools to the analysis of protein data, in order to predict the appearance of transient pockets on the protein surface and to discover conformational transitions occurring at protein interfaces, which provide critical information to guide the search for druggable active sites. Moreover, ML techniques are also applied to the analysis of image data coming, for instance, from cell cultures (for the automatic reporting of medical analyses) and from NMR images of brain and muscles (for monitoring neurodegenerative diseases and revealing the presence and type of lesions).

Research on COVID-19

The spike glycoprotein of SARS-CoV-2, the virus that causes COVID-19, is fundamental in the life cycle of the virus, allowing virions to attach to host cell receptors. We analyzed the structure of this protein, which is composed of three monomers, searching for concave moieties located in the monomer-monomer interface regions. The presence of some druggable pockets in these locations suggests that known drug molecules could prevent the formation of the quaternary structure of the spike protein. A virtual screening procedure allowed us to identify some of the compounds in DrugBank as interesting ligands for this purpose.