Computational Screening for Active Compounds Targeting Protein
Sequences: Methodology and Experimental Validation

Because the three-dimensional (3D) structures of most protein targets have not yet been determined, and many of them do not even have
a known ligand, a truly general method to predict ligand-protein interactions in the absence of three-dimensional information
would be of great potential value in drug discovery. Using the support vector machine (SVM) approach, we constructed a model for
predicting ligand-protein interaction based only on the primary sequence of proteins and the structural features of small molecules.
The model, trained by using 15 000 ligand-protein interactions between 626 proteins and over 10 000 active compounds, was
successfully used in discovering nine novel active compounds for four pharmacologically important targets (i.e., GPR40, SIRT1, p38,
and GSK-3β). To our knowledge, this is the first example of a successful sequence-based virtual screening campaign, demonstrating
that our approach has the potential to discover, with a single model, active ligands for any protein.
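
The general idea of scoring a protein-ligand pair from sequence and compound features alone can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the work itself: the protein descriptor (amino-acid composition), the ligand descriptor (a short hypothetical bit vector standing in for structural features), the linear kernel, the Pegasos-style training loop, and the toy sequences and labels, which are invented and do not correspond to real proteins or measured activities.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def protein_features(sequence):
    """Assumed protein descriptor: fraction of each standard residue."""
    seq = sequence.upper()
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def pair_features(sequence, ligand_bits):
    """Concatenate the protein descriptor with a ligand descriptor vector."""
    return protein_features(sequence) + [float(b) for b in ligand_bits]


def score(w, b, x):
    """Linear SVM decision value; positive means 'predicted to interact'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Stochastic subgradient descent on the regularized hinge loss.

    Labels must be +1 (interacting pair) or -1 (non-interacting pair).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    t = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            margin = y[i] * score(w, b, X[i])
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:                 # inside the margin: hinge update
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


# Toy data (invented): two made-up sequences, 4-bit hypothetical ligand vectors.
pairs = [
    ("MKTAYIAKQR", [1, 0, 1, 0], 1),
    ("MKTAYIAKQR", [0, 1, 0, 1], -1),
    ("GSHMLEDPVV", [0, 1, 0, 1], 1),
    ("GSHMLEDPVV", [1, 0, 1, 0], -1),
]
X = [pair_features(seq, bits) for seq, bits, _ in pairs]
y = [label for _, _, label in pairs]
w, b = train_linear_svm(X, y)

# Rank candidate ligands for a sequence by decision value (virtual screening).
candidates = [[1, 0, 1, 0], [0, 1, 0, 1]]
ranked = sorted(candidates,
                key=lambda bits: score(w, b, pair_features("MKTAYIAKQR", bits)),
                reverse=True)
```

Because a single weight vector covers the concatenated protein and ligand features, one trained model can in principle score compounds against a sequence it has not seen before. A linear model cannot capture protein-specific ligand preferences such as those in the toy labels above, so a practical model would likely need richer descriptors and a nonlinear kernel.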