Approaching the limits: scoring functions for affinity prediction
- Christoph Sotriffer
© Sotriffer; licensee BioMed Central Ltd. 2010
Published: 04 May 2010
One of the main tasks of computational methods in ligand design and lead identification is the elucidation and assessment of interaction modes between small-molecule ligands and protein target structures. This generally requires estimating the relative or absolute affinity of a protein-ligand complex from its three-dimensional coordinates. Although statistical thermodynamics would in principle provide the necessary equations to calculate free energies of binding from molecular properties, these equations are not readily amenable to computation, since appropriate ensembles of the solvated systems must be generated and thoroughly sampled, which normally requires prohibitively long computing times. For the purpose of drug design, and virtual screening in particular, simpler and faster methods are needed; these are commonly referred to as scoring functions.
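To make the target quantity concrete: an experimentally measured dissociation constant is related to the standard binding free energy by the textbook relation ΔG° = RT ln(K_d / c°). The following is a minimal sketch of that conversion and is not taken from the abstract; the function name, the default temperature of 298.15 K, and the standard concentration of 1 M are assumptions made for illustration.

```python
import math

# Molar gas constant in kJ/(mol*K).
R_KJ_PER_MOL_K = 8.314462618e-3


def binding_free_energy(kd_molar, temperature=298.15):
    """Return the standard binding free energy in kJ/mol.

    Uses dG = R * T * ln(Kd / c0) with an assumed standard
    concentration c0 = 1 M, so Kd must be given in mol/L.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be a positive concentration in mol/L")
    return R_KJ_PER_MOL_K * temperature * math.log(kd_molar)


if __name__ == "__main__":
    # A 1 nM binder compared with a 10 nM binder at 298.15 K.
    dg_1nm = binding_free_energy(1e-9)
    dg_10nm = binding_free_energy(1e-8)
    print(f"1 nM:  {dg_1nm:.2f} kJ/mol")
    print(f"10 nM: {dg_10nm:.2f} kJ/mol")
    print(f"difference: {dg_10nm - dg_1nm:.2f} kJ/mol")
```

Under these assumptions a tenfold change in K_d corresponds to only about 5.7 kJ/mol, which gives a sense of the accuracy a scoring function would need in order to rank closely related ligands.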
In general, three major classes of scoring functions can be distinguished: force-field based methods, knowledge-based potentials, and empirical scoring functions. For each of these classes, many different functions have been developed over the past years. Although large-scale and truly unbiased comparative assessments of their performance are relatively rare due to inherent difficulties in setting up appropriate test sets, the strengths and limitations of current scoring functions are fairly evident from the available data. In general, good results can be obtained in the identification or reproduction of experimentally observed binding modes. The ability to distinguish active ligands from decoys appears to be at least sufficient to make virtual screening a practically useful endeavour. However, the correlation with the experimental binding free energy and the ability to quantify the effects of small structural changes on ligand affinity are in many cases still disappointing.
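As a rough illustration of the empirical class and of how correlation with experiment is typically assessed, the sketch below scores a complex as a weighted sum of simple descriptors and computes a Pearson correlation coefficient between two value lists. It is not any published scoring function: the descriptor names, the weights, and the intercept are hypothetical placeholders, and in practice such weights would be fitted by regression against measured affinities.

```python
import math

# Hypothetical weights, chosen only for illustration.
# Real empirical functions fit such weights to experimental affinities.
WEIGHTS = {
    "hbonds": -1.2,              # per protein-ligand hydrogen bond
    "lipophilic_contact": -0.17, # per unit of lipophilic contact
    "rotatable_bonds": 0.3,      # penalty per frozen rotatable bond
}
INTERCEPT = -5.0  # hypothetical constant term


def empirical_score(descriptors, weights=WEIGHTS, intercept=INTERCEPT):
    """Return a weighted sum of descriptor values plus a constant."""
    return intercept + sum(weights[name] * descriptors[name] for name in weights)


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up descriptor values for a single hypothetical complex.
    complex_a = {"hbonds": 2, "lipophilic_contact": 10, "rotatable_bonds": 3}
    print(empirical_score(complex_a))
```

In an actual assessment, `pearson_r` would be applied to predicted scores and experimental binding free energies for a test set of complexes; no such data are included here.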
Based on the approximations used by current scoring functions, strategies for improvement can be defined. On the other hand, there are some fundamental problems, also with respect to the underlying experimental data, which suggest that the best functions presently available may already be close to the limit of what can be achieved with empirical approaches.
This article is published under license to BioMed Central Ltd.