4-11 March 2020
University of Roma 3, Rome, Italy
Dr Ivan Coluzza
CIC BioMagune, San Sebastian (Donostia), Spain
Aim of the mission
This Short-Term Scientific Mission (STSM) aims to finalize the development of a novel coarse-graining strategy for extracting protein-small molecule interactions from the LIBRA ligand binding site database, developed in Prof. Polticelli’s group, by means of statistical mechanics methods.
Summary of the Results
We derived the contact statistics between binding-site residues and small molecules from the high-quality crystallographic data contained in the LIBRA database [1], an in-house collection of binding sites derived from the PDB. To derive a transferable scoring function capable of predicting the binding propensity of arbitrary ligands, we identified a set of minimal fragments onto which the entire ligand database could be mapped. The fragments are rigorously defined from a chemical viewpoint, following the BRICS protocol [2] as implemented in the cheminformatics toolkit RDKit [3]. We then calculated a connectivity network identifying all the amino acid-amino acid, amino acid-fragment and fragment-fragment contacts in the set of binding pockets, and measured the contact statistics as the frequencies of observing a species in the presence and/or absence of another species. At this step, Pietro Corsi applied the tool he developed during this STSM to compute the interaction matrices derived from the contact statistics.
However, although the results agreed with the chemical nature of the amino acids, the interaction signal between species was too weak, and we decided to apply a different rationale for the normalization of the frequency matrices. For this purpose, and to reduce computing time, I started building a novel database of connectivity networks. In this database, each pocket is represented by a graph in which each node is an element of the pocket (either a fragment or an amino acid) and each link is weighted by the shortest distance between the two elements it connects. I employed two representations: a fully atomistic one, and a coarse-grained one that considers only the C-alpha atoms of the residues and the geometric centres of the fragments.
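The graph representation described above can be illustrated with a minimal Python sketch. It is not the tool developed during the STSM: the input layout (a dictionary of pocket elements with named atoms), the function names and the toy coordinates are all hypothetical and chosen only for illustration. In the atomistic mode the link weight is the shortest interatomic distance between two elements; in the coarse-grained mode each residue is reduced to its C-alpha atom and each fragment to its geometric centre.

```python
from itertools import combinations
from math import dist


def min_distance(points_a, points_b):
    """Shortest Euclidean distance between any point of A and any point of B."""
    return min(dist(p, q) for p in points_a for q in points_b)


def centroid(points):
    """Geometric centre of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def build_pocket_graph(elements, coarse_grained=False):
    """Return {(name_a, name_b): distance} for every pair of pocket elements.

    `elements` maps a name to {"kind": "residue" | "fragment",
    "atoms": {atom_name: (x, y, z)}} (hypothetical layout).
    """
    points = {}
    for name, element in elements.items():
        atoms = element["atoms"]
        if not coarse_grained:
            points[name] = list(atoms.values())            # all atoms
        elif element["kind"] == "residue":
            points[name] = [atoms["CA"]]                   # C-alpha only
        else:
            points[name] = [centroid(list(atoms.values()))]  # fragment centre
    return {
        (a, b): min_distance(points[a], points[b])
        for a, b in combinations(points, 2)
    }


# Toy pocket with made-up coordinates (in angstroms).
pocket = {
    "ASP25": {"kind": "residue", "atoms": {"CA": (0.0, 0.0, 0.0), "OD1": (2.0, 0.0, 0.0)}},
    "frag1": {"kind": "fragment", "atoms": {"C1": (5.0, 0.0, 0.0), "N1": (7.0, 0.0, 0.0)}},
}
atomistic = build_pocket_graph(pocket)
coarse = build_pocket_graph(pocket, coarse_grained=True)
```

In this toy pocket the two representations give different weights for the same pair of elements, which is the reason for keeping both: the atomistic graph captures the closest approach, while the coarse-grained graph is cheaper to compute and store.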
The main results of this collaboration are the connectivity networks representing all the intra- and intermolecular interactions of the given pockets. Furthermore, novel interaction matrices built on the connectivity networks are being computed; once ready, they will be tested in a protein-ligand binding affinity prediction task.
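As an illustration of how an interaction matrix might be obtained from such connectivity networks, the sketch below counts species pairs whose link weight falls below a distance cutoff and converts the counts into log-odds scores. The report does not specify the normalization actually adopted, so the quasi-chemical-style reference state used here, the 4.5 angstrom cutoff, the edge-list input format and all names are assumptions made only for this example.

```python
from collections import Counter
from math import log


def contact_counts(graphs, cutoff=4.5):
    """Count species pairs in contact over all pocket graphs.

    Each graph is a list of (species_a, species_b, distance) edges
    (hypothetical format); `cutoff` is an assumed contact threshold.
    """
    counts = Counter()
    for edges in graphs:
        for species_a, species_b, distance in edges:
            if distance <= cutoff:
                counts[tuple(sorted((species_a, species_b)))] += 1
    return counts


def log_odds(counts):
    """Score each pair as -ln(observed / expected contact frequency).

    The expected frequency assumes random pairing of contact ends
    (an illustrative reference state, not the report's normalization).
    Negative scores mark pairs seen more often than expected.
    """
    total = sum(counts.values())
    ends = Counter()
    for (s, t), n in counts.items():
        ends[s] += n
        ends[t] += n
    n_ends = 2 * total
    scores = {}
    for (s, t), n in counts.items():
        expected = (ends[s] / n_ends) * (ends[t] / n_ends)
        if s != t:
            expected *= 2  # unordered pair of two different species
        scores[(s, t)] = -log((n / total) / expected)
    return scores


# Toy input: two pockets with made-up species labels and distances.
graphs = [
    [("ASP", "amine", 3.0), ("ASP", "LEU", 4.0), ("LEU", "amine", 8.0)],
    [("ASP", "amine", 3.5), ("LEU", "LEU", 4.2)],
]
counts = contact_counts(graphs)
scores = log_odds(counts)
```

With this sign convention, pairs that occur more often than the random-pairing expectation receive negative (favourable) scores, and pairs that occur less often receive positive ones.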
[1] D. Toti et al., Bioinformatics (2018), 34, 878-880
[2] J. Degen et al., ChemMedChem (2008), 3(10), 1503-1507
[3] RDKit: Open-source cheminformatics; http://www.rdkit.org