4D Flexible Atom-Pairs: An efficient probabilistic conformational space comparison for ligand-based virtual screening
 Andreas Jahn^{1},
 Lars Rosenbaum^{1},
 Georg Hinselmann^{1} and
 Andreas Zell^{1}
DOI: 10.1186/1758-2946-3-23
© Jahn et al.; licensee Chemistry Central Ltd. 2011
Received: 29 April 2011
Accepted: 6 July 2011
Published: 6 July 2011
Abstract
Background
The performance of 3D-based virtual screening similarity functions depends on the conformations used for the compounds. As a result, 3D approaches often give less robust results than 2D approaches. Applying 3D methods to multiple-conformer data sets normally reduces this weakness, but it adds a significant computational overhead. We therefore developed a special encoding of the conformational space based on Gaussian mixture models, together with a similarity function that operates on these models. This model-based encoding allows an efficient comparison of the conformational space of compounds.
Results
We compared our 4D flexible atom-pair approach with over 15 state-of-the-art 2D- and 3D-based virtual screening similarity functions on the 40 data sets of the Directory of Useful Decoys. Our approach performs robustly in these comparisons. Even 3D-based approaches that operate on multiple conformers yield inferior results. The 4D flexible atom-pair method achieves an average AUC value of 0.78 on the filtered Directory of Useful Decoys data sets. The best 2D- and 3D-based approaches of this study yield AUC values of 0.74 and 0.72, respectively. Across 15 other state-of-the-art similarity functions and four different evaluation metrics, the 4D flexible atom-pair approach achieves an average rank of 1.25.
Conclusions
Our 4D method performs robustly on 40 pharmaceutically relevant targets. The conformational space encoding enables an efficient comparison of the conformational space. This circumvents the weakness of 3D-based approaches that operate on single conformations. The method performs over 100,000 similarity calculations per hour on a single desktop CPU core, so the 4D flexible atom-pair approach is feasible for real-world applications.
Background
Sorting and comparing molecules from chemical databases are two of the key tasks in cheminformatics [1]. Sorting such a database with respect to a given set of query molecules and a similarity function is known as virtual screening (VS). VS has two goals: to enrich, within a small fraction of the database, molecules whose properties (e.g., biological activity) are similar to those of the query molecules, and to discover new chemical entities. The enriched molecules must then be analyzed in biological assays, both to confirm the desired properties and to evaluate the success of the VS run. The success of a VS run therefore has two aspects. First, the enriched molecules should have properties similar to those of the query molecules. Second, the run should discover new chemical entities whose scaffolds differ from those of the query molecules, because these represent an information gain. VS focuses the search on a relevant subset of the database and can yield new structural information. For these reasons, VS experiments are a fundamental approach in the drug discovery pipeline [2, 3].
In the last two decades, a plethora of different similarity functions has been proposed [4, 5], and the development of new functions is still an open field of research. Similarity functions can be categorized by the dimension of the molecular representation they use. 1D similarity functions are based on molecular properties and counts such as the molecular weight or the number of hydrogen bond acceptors. 2D approaches use the adjacency matrix of the molecular graph and are therefore also called topological approaches. Well-known 2D similarity methods include MOLPRINT2D [6], substructure-based fingerprints such as BCI [7] and DAYLIGHT [8], and the MACCS [9] keys. These topological or structural fingerprints enrich active molecules well, but they often fail to discover new chemical entities [10]. 3D similarity functions are based on the shape [11–14] or on geometrical distance information [15–17] of molecules. Approaches that additionally use information on the conformational ensembles of molecules extend the 3D-based methods and can be regarded as 4D approaches [18, 19].
The lock-and-key principle of Hermann Emil Fischer suggests that the shape of a molecule plays an important role in its biological activity. However, the shape of a molecule is not unique. It is a function of internal parameters such as the torsion angles. Each rotatable bond is a degree of freedom and increases the number of possible shapes (conformations) of the molecule. The space of all possible conformations is the conformational space of the molecule. Given this added complexity, it is not surprising that several studies have reported more robust VS performance for 2D methods than for 3D approaches [20, 21]. 2D methods are also simpler and faster [22].
In a comprehensive study, Venkatraman et al. [21] investigated the performance of different 2D and 3D methods on a wide range of pharmaceutically relevant targets. Their results support the prevailing view that 2D-based approaches enrich active molecules better than 3D approaches. The study did not evaluate how well the 2D and 3D approaches discover new chemical entities, which is the knowledge-gain aspect of VS. One possible reason for the inferior performance of 3D methods is that their geometric information is based on a single conformation of each molecule [21]. One way to improve 3D methods is to apply them to several conformations of each molecule and use the mean or maximum similarity value. The drawback of this workaround is computation time. Every conformer of one molecule must be compared with every conformer of the other, so the number of comparisons grows quadratically with the number of conformations. To avoid this cost, the similarity calculation has to be performed on the complete conformational ensemble in a single step and in feasible time. The same limitations of 3D approaches also affect instance-based machine learning QSAR/QSPR models. To make such QSAR/QSPR models more robust, we developed a 4D-based approach that compares the conformational space of molecules in a single step and in feasible time [23]. The resulting models were robust and outperformed similar 3D and 2D approaches. The reasons for the inferior performance of 3D-based methods seem to be similar in both applications (VS and QSAR/QSPR). Our 4D-based approach may therefore also improve VS performance relative to 2D and 3D methods.
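The multi-conformer workaround described above can be sketched as follows. `sim` is a hypothetical placeholder for any similarity function that compares two single conformations, and the toy numeric "conformers" exist only for illustration. The sketch shows why the cost grows with the product of the two ensemble sizes.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

Conformer = TypeVar("Conformer")


def multi_conformer_similarity(
    confs_a: Sequence[Conformer],
    confs_b: Sequence[Conformer],
    sim: Callable[[Conformer, Conformer], float],
) -> tuple[float, float, int]:
    """Compare every conformer of molecule A with every conformer of molecule B.

    Returns (maximum similarity, mean similarity, number of calls to `sim`).
    """
    scores = [sim(a, b) for a, b in product(confs_a, confs_b)]
    return max(scores), sum(scores) / len(scores), len(scores)


def toy_similarity(a: float, b: float) -> float:
    """Hypothetical stand-in for a 3D similarity of two single conformations."""
    return 1.0 / (1.0 + abs(a - b))


best, mean, calls = multi_conformer_similarity([1.0, 2.0, 3.0], [2.0, 5.0, 7.0, 9.0], toy_similarity)
print(best, round(mean, 3), calls)  # 3 x 4 conformers -> 12 calls to the 3D similarity
```

With n conformers per molecule, each molecule pair costs n² single-conformation comparisons. For example, 50 conformers per molecule already require 2,500 comparisons for a single pair of molecules.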
The aim of this study is to evaluate our 4D approach as a VS similarity function on a variety of VS benchmark data sets from the literature. We also compare the results with state-of-the-art 2D and 3D approaches to assess the performance of our method. To reduce the influence of artificial enrichment, we used VS performance metrics that measure chemotype enrichment. Our approach performs robustly compared with state-of-the-art 2D and 3D approaches. Our conformational space comparison therefore reduces the weakness of 3D-based methods without the time-consuming pairwise comparison of individual conformations.
Methods
This section describes our 4D flexible atom-pair (4D FAP) similarity measure on the conformational space of molecules. To compare conformational spaces efficiently, our approach needs a special encoding of the conformational ensembles, which can be seen as a preprocessing step. First, we describe our conformational space encoding. Next, we present a modified Expectation Maximization (EM) algorithm that computes generative models of the behavior of the molecules in their conformational space. Finally, we explain the actual similarity calculation, which operates on the preprocessed molecules.
Conformational Space Encoding
A fast comparison requires transforming the complex information of the conformational space into a representation that fast similarity functions can use. We therefore decompose the information of the conformational space into small portions. Given a conformational sampling C_{M} of a molecule M with |M| heavy atoms, the encoding is based on how the distances between atom-pairs behave across the conformational space. The conformational space C_{M} of molecule M is thus decomposed into the distance behavior of its atom-pairs.
For the class of flexible atom-pairs defined by the heuristic above, our encoding measures the distance of each atom-pair in each conformation of the conformational sampling C_{M}. Each atom-pair therefore receives |C_{M}| distance values, where |C_{M}| is the number of sampled conformers of molecule M. We call the atom-pairs together with their distance information across the conformational space distance profiles.
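Computing distance profiles can be sketched in a few lines. The sketch assumes that the heavy-atom coordinates of all conformers are stored in a NumPy array of shape (number of conformers, number of heavy atoms, 3). The flexible atom-pairs are passed in explicitly, because the heuristic that selects them is not reproduced here.

```python
import numpy as np


def distance_profiles(coords, pairs):
    """Distance profiles of selected atom-pairs over a conformer ensemble.

    coords: array-like of shape (n_conformers, n_heavy_atoms, 3)
    pairs:  iterable of (i, j) heavy-atom index pairs (assumed to be the flexible pairs)
    Returns a dict mapping (i, j) to an array with one distance per conformer.
    """
    xyz = np.asarray(coords, dtype=float)
    if xyz.ndim != 3 or xyz.shape[2] != 3:
        raise ValueError("coords must have shape (n_conformers, n_atoms, 3)")
    # Euclidean distance between atoms i and j, evaluated in every conformer at once
    return {
        (i, j): np.linalg.norm(xyz[:, i, :] - xyz[:, j, :], axis=1)
        for i, j in pairs
    }


# Toy ensemble: two conformers of a three-atom chain (extended and bent)
ensemble = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]],
])
profiles = distance_profiles(ensemble, [(0, 2), (0, 1)])
print(profiles[(0, 2)])  # one distance value per conformer
```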
Gaussian Mixture Models and Parameter Estimation on Distance Profiles
The expectation (E) and maximization (M) steps are repeated until a predefined convergence criterion is reached. The EM algorithm optimizes the parameters of the GMM but only guarantees convergence to a local optimum. The EM algorithm therefore has to be run with several different initial parameter sets, so that the final model does not come from a local optimum with an inferior likelihood.
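The EM procedure with restarts can be illustrated on a one-dimensional distance profile. The following is a minimal sketch of standard EM for a 1D Gaussian mixture, not the authors' implementation. The random initialization, the small variance floor `min_var`, and the number of restarts are assumptions made for illustration.

```python
import numpy as np


def em_gmm_1d(x, k, n_iter=200, tol=1e-8, seed=None, min_var=1e-6):
    """Fit a k-component 1D Gaussian mixture to the samples x with plain EM.

    Returns (log_likelihood from the last E-step, weights, means, variances).
    """
    x = np.asarray(x, dtype=float)[:, None]            # shape (n, 1)
    rng = np.random.default_rng(seed)
    mu = rng.choice(x[:, 0], size=k, replace=False)    # random initial means
    var = np.full(k, x.var() + min_var)                # broad initial variances
    w = np.full(k, 1.0 / k)                            # uniform initial weights
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log of w_c * N(x_n | mu_c, var_c) and responsibilities r[n, c]
        log_p = np.log(w) - 0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        r = np.exp(log_p - log_norm)
        ll = float(log_norm.sum())                     # log-likelihood before this M-step
        # M-step: re-estimate weights, means and variances from the responsibilities
        nk = r.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (r * x).sum(axis=0) / nk
        var = (r * (x - mu) ** 2).sum(axis=0) / nk + min_var
        if ll - prev_ll < tol:                         # no further improvement
            break
        prev_ll = ll
    return ll, w, mu, var


def fit_with_restarts(x, k, n_restarts=10, seed=0):
    """Run EM from several random initializations and keep the best log-likelihood."""
    best = None
    for i in range(n_restarts):
        result = em_gmm_1d(x, k, seed=seed + i)
        if best is None or result[0] > best[0]:
            best = result
    return best


# Synthetic bimodal "distance profile" (two preferred distances of an atom-pair)
rng = np.random.default_rng(1)
sample = np.concatenate([rng.normal(3.0, 0.1, 300), rng.normal(7.0, 0.2, 300)])
ll, weights, means, variances = fit_with_restarts(sample, k=2)
print(np.round(np.sort(means), 2))
```

Keeping the restart with the highest log-likelihood is the simplest way to guard against a poor local optimum. More restarts lower that risk at a proportional cost in runtime.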
The EM algorithm estimates the parameters of a GMM for a predefined number of Gaussian components. A useful model needs a suitable number of components, so a model selection step that determines the optimal number is necessary. Too many Gaussian components increase the risk of overfitting. Several model selection criteria penalize additional components to reduce this risk, such as the Bayesian information criterion [25] or the Akaike information criterion [26]. This model selection step adds a significant runtime overhead, but it can be avoided if the number of components can be estimated beforehand. In our application, a GMM models the distance behavior of the corresponding atom-pair in the conformational space. This distance behavior can be seen as a function of the flexibility of the shortest path between the two atoms in the molecular graph.
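For reference, the two criteria have the following standard form for a one-dimensional GMM with K components fitted to a distance profile of n = |C_{M}| values. Here \hat{L}_K is the maximized likelihood. The K with the smallest criterion value is selected. This requires a full EM fit (including restarts) for every candidate K, and that repeated fitting is the runtime overhead mentioned above.

```latex
\mathrm{BIC}(K) = -2\ln \hat{L}_K + p_K \ln n, \qquad
\mathrm{AIC}(K) = -2\ln \hat{L}_K + 2\,p_K, \qquad
p_K = \underbrace{K}_{\text{means}} + \underbrace{K}_{\text{variances}} + \underbrace{K-1}_{\text{free weights}} = 3K - 1
```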
The number of flexible bonds in the shortest path, which is also used to classify the atom-pairs, can therefore serve as a heuristic for the number of Gaussian components of the GMM. In an earlier study, we showed that this heuristic performs comparably to model selection criteria [23]. The heuristic avoids the model selection step and reduces the runtime of the preprocessing. Figure 2 presents the GMM that models the distance behavior of the atom-pair in Figure 1. The EM algorithm presented above assumes that all samples of the data set are equally important for the final model. In our application, this means that each conformation has the same influence on the final model. From a thermodynamic point of view, this assumption holds only if all conformations of the ensemble have the same energy. To give low-energy conformations more influence on the final model, we developed an extension of the EM algorithm that integrates the importance of each sample into the optimization. In an earlier study, our modified EM algorithm produced better QSAR/QSPR models than equally weighted GMMs [27].
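The component-count heuristic can be sketched as a breadth-first search on the molecular graph. This is a minimal sketch under stated assumptions. The graph is an adjacency dictionary, and the set of flexible (rotatable) bonds is given as input, because the paper's exact bond classification is not reproduced here. In ring systems, where several shortest paths can exist, the search simply takes the first path it finds. Returning at least one component for a path without flexible bonds is also an assumption.

```python
from collections import deque


def shortest_path(adj: dict[int, list[int]], start: int, goal: int) -> list[int]:
    """Breadth-first search; returns the atom indices on one shortest path."""
    parent: dict[int, int | None] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbor in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    if goal not in parent:
        raise ValueError("atoms are not connected")
    path = [goal]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def n_gaussian_components(adj, rotatable: set[frozenset[int]], i: int, j: int) -> int:
    """Heuristic: number of flexible bonds on the shortest path between atoms i and j."""
    path = shortest_path(adj, i, j)
    n_flexible = sum(frozenset(bond) in rotatable for bond in zip(path, path[1:]))
    return max(1, n_flexible)  # assumption: never fewer than one component


# Hypothetical example: heavy-atom chain of n-hexane (atoms 0..5); the three inner
# C-C bonds are treated as rotatable, the terminal bonds are not.
hexane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
rotatable_bonds = {frozenset((1, 2)), frozenset((2, 3)), frozenset((3, 4))}
print(n_gaussian_components(hexane, rotatable_bonds, 0, 5))  # terminal-to-terminal pair
```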
Boltzmann Weighted Expectation Maximization Algorithm
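The text does not reproduce the authors' formulation of the weighted EM algorithm. The following is therefore only a sketch of one common way to give each sample an importance weight. Each conformer receives a normalized Boltzmann factor, and the M-step scales every responsibility by the weight of its sample. The sketch assumes energies in kcal/mol and a temperature of 298.15 K, and it uses a deterministic quantile-based initialization. With equal weights it reduces to the standard EM algorithm.

```python
import numpy as np

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K); assumes energies in kcal/mol


def boltzmann_weights(energies, temperature=298.15):
    """Normalized Boltzmann factors exp(-E_i / RT) for conformer energies E_i."""
    e = np.asarray(energies, dtype=float)
    # Shifting by the minimum energy avoids overflow and cancels in the normalization.
    factors = np.exp(-(e - e.min()) / (R_KCAL * temperature))
    return factors / factors.sum()


def weighted_em_gmm_1d(x, sample_weights, k, n_iter=500, tol=1e-10, min_var=1e-6):
    """EM for a 1D Gaussian mixture in which every sample carries an importance weight.

    Returns (weights, means, variances) of the k components.
    """
    x = np.asarray(x, dtype=float)[:, None]                 # shape (n, 1)
    s = np.asarray(sample_weights, dtype=float)[:, None]
    s = s / s.sum()                                          # sample weights sum to one
    mu = np.quantile(x[:, 0], (np.arange(k) + 0.5) / k)      # deterministic initial means
    var = np.full(k, x.var() + min_var)
    w = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities r[n, c]; the sample weights do not enter here
        log_p = np.log(w) - 0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        r = np.exp(log_p - log_norm)
        ll = float((s * log_norm).sum())                     # weighted log-likelihood
        # M-step: every responsibility is scaled by the importance of its sample
        sr = s * r
        nk = sr.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (sr * x).sum(axis=0) / nk
        var = (sr * (x - mu) ** 2).sum(axis=0) / nk + min_var
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, mu, var


# Toy profile: conformers near 3 A have low energy, conformers near 7 A are 2 kcal/mol higher
distances = [2.9, 3.0, 3.1, 6.9, 7.0, 7.1]
weights = boltzmann_weights([0.0, 0.0, 0.0, 2.0, 2.0, 2.0])
w, mu, var = weighted_em_gmm_1d(distances, weights, k=2)
print(np.round(w, 3), np.round(mu, 2))
```

In this toy case, the equally weighted fit would give both distance modes the same mixture weight. The Boltzmann-weighted fit concentrates most of the mixture weight on the low-energy mode instead.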
4D Flexible Atom-Pair Similarity Function
Once the molecules have been preprocessed (the distance distributions encoded as GMMs), the actual 4D similarity calculation can be carried out. The similarity function operates on the molecular graph (adjacency matrix) and on the GMMs of the flexible atom-pairs. The conformational ensembles of the molecules are therefore no longer needed.
The nodes in these prefix trees can be labeled by any atom typing scheme. We used a labeling function that consists of three elements. The first element is the element symbol of the atom. The second, a ring flag, indicates membership in a ring system. The third element is the number of neighboring heavy atoms minus the number of neighboring hydrogen atoms.
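The three-element node label can be illustrated with a small data class. The `Atom` fields and the way the three elements are concatenated into a string are assumptions made for illustration. The text only specifies which three pieces of information the label contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical minimal atom record holding the information the label needs."""
    element: str
    in_ring: bool
    n_heavy_neighbors: int
    n_hydrogens: int


def atom_label(atom: Atom) -> str:
    """Element symbol + ring flag + (heavy-atom neighbors minus hydrogen neighbors)."""
    ring_flag = "R" if atom.in_ring else "N"   # assumed encoding of the ring flag
    return f"{atom.element}{ring_flag}{atom.n_heavy_neighbors - atom.n_hydrogens}"


print(atom_label(Atom("C", True, 2, 1)))    # aromatic CH in a benzene ring
print(atom_label(Atom("C", False, 1, 3)))   # terminal methyl carbon
```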
Given the prefix trees of two molecules A and B, the 4D FAP computes a similarity matrix S in which each entry is the similarity between two atom-pair trees. Each entry S_{ij} is the sum of two separate similarity calculations, one for each of the two types of subtree. The 4D FAP thus applies a different similarity function to each type of subtree.
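The text does not state which similarity function the 4D FAP applies to the GMM-encoded distance distributions. As a purely illustrative example, and not necessarily the function used by the 4D FAP, two Gaussian mixtures can be compared by their normalized overlap integral, also known as the normalized probability product kernel. This has a closed form because the integral of the product of two Gaussians is again a Gaussian density. Each mixture is assumed to be given as a list of (weight, mean, variance) triples.

```python
import math
from typing import Sequence

GMM = Sequence[tuple[float, float, float]]  # (weight, mean, variance) per component


def _overlap(p: GMM, q: GMM) -> float:
    """Integral of p(x) * q(x) dx for two 1D Gaussian mixtures (closed form)."""
    total = 0.0
    for wa, ma, va in p:
        for wb, mb, vb in q:
            v = va + vb  # integral of N(x; ma, va) N(x; mb, vb) equals N(ma; mb, va + vb)
            total += wa * wb * math.exp(-0.5 * (ma - mb) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
    return total


def gmm_similarity(p: GMM, q: GMM) -> float:
    """Normalized overlap in (0, 1]; equals 1 for identical mixtures (Cauchy-Schwarz)."""
    return _overlap(p, q) / math.sqrt(_overlap(p, p) * _overlap(q, q))


rigid = [(1.0, 5.0, 0.04)]                        # one preferred distance
flexible = [(0.5, 5.0, 0.04), (0.5, 9.0, 0.04)]   # two well-separated distance modes
print(round(gmm_similarity(rigid, flexible), 4))
```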
From the similarity matrix S, which contains all pairwise similarities of the atom-pair trees, the 4D FAP computes a final similarity value. The original 4D FAP used in QSAR/QSPR studies [23, 27] sums the entries of S and normalizes the sum to a value in the range [0.0, 1.0]. Alternatively, the final similarity value can be computed by an optimal assignment. Fröhlich et al. introduced this approach into cheminformatics [30, 31], and it was applied as a VS similarity function in a previous study [28]. We denote the 4D FAP with optimal assignment as 4D FAP_{OA}.
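The difference between the two ways of aggregating S can be sketched with SciPy's `linear_sum_assignment`, which solves the rectangular assignment problem. The normalizations below are assumptions, because the text does not give the exact formulas. The sum is divided by the number of entries and the assignment score by the larger number of trees, which keeps both in [0, 1] if the entries of S are scaled to [0, 1].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def sum_similarity(S) -> float:
    """Sum of all entries of S, divided by the number of entries (assumed normalization)."""
    S = np.asarray(S, dtype=float)
    return float(S.sum() / S.size)


def optimal_assignment_similarity(S) -> float:
    """One-to-one matching of atom-pair trees that maximizes the total similarity."""
    S = np.asarray(S, dtype=float)
    rows, cols = linear_sum_assignment(S, maximize=True)
    return float(S[rows, cols].sum() / max(S.shape))  # assumed normalization


# Three atom-pair trees of molecule A versus two of molecule B
S = [[0.9, 0.1],
     [0.2, 0.8],
     [0.0, 0.3]]
print(round(sum_similarity(S), 4), round(optimal_assignment_similarity(S), 4))
```

The optimal assignment counts each tree at most once, so weak cross-similarities between unrelated trees do not dilute the score the way they do in the plain sum.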
Experimental
In this section, we first describe the VS benchmark data sets and their preparation. We then describe the conformational sampling protocol and briefly introduce the VS evaluation metrics. Finally, we give a brief overview of the VS methods from the literature against which we compare the results of the 4D FAP_{OA}.
Data sets
To evaluate the 4D FAP_{OA} on a wide range of pharmaceutically relevant targets, we used release 2 of the Directory of Useful Decoys (DUD) [32]. These data sets were introduced as a benchmark compilation for the evaluation of docking algorithms [33]. Ligand-based VS ranks compounds by their similarity to a query structure, which can introduce an analogue enrichment bias. This bias arises from the enrichment of molecules that are structurally similar to the query structure. Such analogues provide only limited new information, yet they still inflate the measured enrichment of the experiment.
Table 1: DUD data sets (numbers of actives and decoys in the filtered and original sets)
target  actives (filtered)  decoys (filtered)  actives (original)  decoys (original)
ACE  46  1796  49  1797 
AChE  99  3859  107  3891 
ADA  23  927  39  927 
ALR2  26  986  26  995 
AmpC  21  786  21  786 
AR  68  2848  79  2854 
CDK2  47  2070  72  2074 
COMT  11  468  11  468 
COX1  23  910  25  911 
COX2  212  12606  426  13289 
DHFR  190  8350  410  8366 
EGFr  365  10303  475  15996 
ER_{agonist}  63  2568  67  2570 
ER_{antagonist}  18  1058  39  1448 
FGFr1  71  3462  120  4550 
FXa  64  1633  146  5743 
GART  8  155  40  879 
GPB  52  2135  52  2947 
GR  32  2585  78  2947 
HIVPR  4  9  62  2038 
HIVRT  34  1494  43  1519 
HMGR  25  1423  35  3478 
HSP90  23  975  37  979 
InhA  57  2707  86  3266 
MR  13  636  15  634 
NA  49  1713  49  1874 
P38  137  6779  454  9140 
PARP  31  1350  35  1351 
PDE5  26  1697  88  1977 
PDGFrb  124  5603  170  5980 
PNP  25  1036  49  1036 
PPAR_{γ}  6  40  85  3117 
PR  22  920  27  1041 
RXR_{ α }  18  575  20  750 
SAHH  33  1346  33  1346 
SRC  98  5679  159  6319 
thrombin  23  1148  72  2456 
TK  22  891  22  891 
trypsin  9  718  49  1664 
VEGFr2  48  2712  88  2906 
Evaluating similarity functions on the DUD data sets is a retrospective evaluation. An exclusively retrospective evaluation carries a risk analogous to the "Kubinyi paradox" [37] of QSAR models: new methods, or improvements to existing ones, may gain retrospective performance at the expense of prospective performance. However, the DUD data sets contain over 100,000 molecules for 40 different targets. An evaluation on all 40 data sets therefore covers greater molecular diversity than the usually smaller and less diverse benchmark data sets of QSAR experiments. This reduces, but does not eliminate, the risk that optimizing VS similarity functions for retrospective performance will harm their prospective performance.
Conformational sampling
We generated the conformational ensembles of the molecules with the ConfGen tool of Schrödinger [38]. Recent studies showed that ConfGen computes reasonable conformers of molecules [39, 40]. The tool provides four standard parameter schemes that sample the conformational space at different resolutions. Useful GMMs in the preprocessing step require the conformational space to be sampled at high resolution. We therefore modified the 'comprehensive' parameter scheme of ConfGen to further increase the resolution, reducing the heavy-atom RMSD threshold for distinct conformers from 0.5 Å to 0.1 Å. This modification produces more conformers but does not increase the runtime of the conformational sampling. The energy values required for the Boltzmann-weighted GMMs were computed with the OPLS 2005 force field using standard parameters.
Both the conformational sampling algorithm and the force field have a major impact on the final results of the 4D FAP_{OA}. Different conformational sampling algorithms produce different sets of conformers, which in turn yield different atom-pair distance profiles. The force field assigns an energy value to each conformer and thereby determines the weight of each measured atom-pair distance. A different conformational sampling protocol will therefore yield different GMMs of the atom-pairs and different similarity values, which will probably change the results. However, this study does not evaluate the impact of different conformational sampling protocols on the 4D FAP_{OA}. It evaluates the 4D FAP_{OA} as a VS similarity function under the given protocol.
Evaluation metrics
The evaluation metrics listed above are only a small fraction of the possible metrics. Other popular metrics for evaluating early enrichment are the BEDROC [43] score and the enrichment factor. To enable future comparisons with the results of the 4D FAP_{OA}, we computed a result file for each target of the filtered DUD data sets that contains several additional VS metrics (e.g., BEDROC scores at nine predefined alpha values). The files also contain the complete ranking of the molecules, from which any other VS metric can be computed. The 40 result files are provided in Additional file 1 of this study.
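The enrichment factor mentioned above, together with the relative enrichment factor (REF) reported in the tables below, can be computed directly from a complete ranking. This is a sketch under stated assumptions. The input is a list of booleans (True = active) sorted from most to least similar, and the size of the top set is rounded to the nearest integer. The REF follows the definition we assume from Korff et al.: actives retrieved in the top set, divided by the maximum number of actives that could have been retrieved there, in percent.

```python
from typing import Sequence


def _top_hits(labels_ranked: Sequence[bool], fraction: float) -> tuple[int, int]:
    """Number of actives within the top `fraction` of the ranking, and the size of that set."""
    n_top = max(1, round(fraction * len(labels_ranked)))   # assumption: round to nearest
    return sum(labels_ranked[:n_top]), n_top


def enrichment_factor(labels_ranked: Sequence[bool], fraction: float) -> float:
    """EF = hit rate in the top fraction divided by the hit rate of the whole database."""
    hits, n_top = _top_hits(labels_ranked, fraction)
    base_rate = sum(labels_ranked) / len(labels_ranked)
    return (hits / n_top) / base_rate


def relative_enrichment_factor(labels_ranked: Sequence[bool], fraction: float) -> float:
    """REF in percent: hits relative to the best achievable number of hits in the top set."""
    hits, n_top = _top_hits(labels_ranked, fraction)
    return 100.0 * hits / min(n_top, sum(labels_ranked))


# 100 compounds, 10 actives: five ranked at the very top, five in the middle
ranking = [i < 5 or 50 <= i < 55 for i in range(100)]
print(enrichment_factor(ranking, 0.10), relative_enrichment_factor(ranking, 0.10))
```

Unlike the EF, whose maximum depends on the ratio of actives to decoys, the REF is bounded by 100%. This makes REF values easier to compare across data sets of different composition.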
Literature Similarity Functions
We used a wide range of 2D and 3D similarity approaches to assess the performance of the 4D FAP_{OA}. Because we compare our approach with 20 other approaches, we only mention the name of each method and the type of information it uses. For comprehensive descriptions, we refer to the original publications.
Several optimal assignment approaches were already evaluated on the filtered DUD data sets in an earlier publication [28]. The best approach of that study was a two-step hierarchical assignment (2SHA) that first operates on the substructure level and then on the atomic level. A second approach of that study optimally assigns atom-pair environment trees (OAAP) and is the 3D counterpart of the 4D FAP_{OA}. The optimal assignment kernel (OAK) [30, 31] and its flexibility extension OAK_{FLEX} [44] were also evaluated in that publication.
Cheeseright et al. [45] introduced FieldScreen as a multi-conformer-based VS tool. FieldScreen uses a database that contains several conformers of each molecule. It therefore operates on a conformational ensemble in a similar way to the 4D FAP_{OA}, which makes it an interesting reference approach. FieldScreen computes the similarity between two molecules from four different types of locally optimized molecular field points.
Venkatraman et al. conducted a comparison study in which a plethora of different 2D and 3D approaches were evaluated on both the original and the filtered DUD data sets [21]. We compared the performance of the 4D FAP_{OA} with the main results of that study. Venkatraman et al. used the 2D fingerprint methods OPENBABEL [46] (listed as BABEL in Table 3), DAYLIGHT [8], BCI [7], MACCS [9], and MOLPRINT2D [6]. As 3D-based approaches, they used ROCS [12] with two scoring schemes: ROCS_{S} (shape only) and ROCS_{SC} (shape and chemistry). The EON [47] approach compares electrostatic fields computed with the Poisson-Boltzmann equation and was also evaluated with two parameterizations. EON_{SE} is based on shape and electrostatics, whereas EON_{SCE} additionally uses chemical information.
SHAEP [14] superimposes the molecules using a maximum common subgraph approach. The method operates either on shape alone (SHAEP_{S}) or on shape and electrostatics (SHAEP_{SE}). Ultrafast Shape Recognition (USR) [17] uses four reference points per molecule. For each reference point, it computes the distribution of the distances of all atoms to that point and takes the first three moments of each distribution, giving 12 descriptor values per molecule. ESHAPE3D computes fingerprints from a heavy-atom distance matrix. ESHAPE3D_{HYD} instead uses only the hydrophobic heavy atoms.
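The USR descriptor is compact enough to sketch completely. The four reference points are the molecular centroid, the atom closest to the centroid, the atom farthest from the centroid, and the atom farthest from that atom. The similarity is the inverse of one plus the mean absolute descriptor difference. This sketch uses the mean, the standard deviation, and the cube root of the third central moment as the three moments. That is a common convention, but it may differ in detail from the original implementation. Coordinates are assumed to be an (n_atoms, 3) NumPy array.

```python
import numpy as np


def usr_descriptor(coords) -> np.ndarray:
    """12 USR values: three moments of the atomic distance distribution to four reference points."""
    xyz = np.asarray(coords, dtype=float)
    ctd = xyz.mean(axis=0)                                   # molecular centroid
    d_ctd = np.linalg.norm(xyz - ctd, axis=1)
    cst = xyz[np.argmin(d_ctd)]                              # atom closest to the centroid
    fct = xyz[np.argmax(d_ctd)]                              # atom farthest from the centroid
    ftf = xyz[np.argmax(np.linalg.norm(xyz - fct, axis=1))]  # atom farthest from fct
    moments = []
    for ref in (ctd, cst, fct, ftf):
        d = np.linalg.norm(xyz - ref, axis=1)
        mean = d.mean()
        # mean, spread and (signed) asymmetry of the distance distribution
        moments += [mean, d.std(), np.cbrt(((d - mean) ** 3).mean())]
    return np.array(moments)


def usr_similarity(a, b) -> float:
    """USR similarity: 1 / (1 + mean absolute difference of the 12 descriptor values)."""
    return float(1.0 / (1.0 + np.abs(usr_descriptor(a) - usr_descriptor(b)).mean()))


rng = np.random.default_rng(0)
molecule = rng.normal(size=(20, 3))                 # toy "molecule" with 20 atoms
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
moved = molecule @ rotation.T + np.array([5.0, -2.0, 1.0])
print(round(usr_similarity(molecule, moved), 6))     # invariant to rotation and translation
```

Because the descriptor depends only on interatomic distances, no superposition of the molecules is needed. This alignment-free design is what makes USR so fast.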
PARAFIT [13] computes a similarity value from spherical harmonic expansions of molecular surfaces. Another important class of similarity functions is the pharmacophore-based approaches. These approaches represent molecules abstractly by pharmacophore features. The features are divided into classes (e.g., hydrogen bond acceptor, aromatic, or hydrophobic) and represent important interaction points of molecules. The distances between these features play an important role and can be measured topologically or geometrically. Pharmacophore-based approaches can therefore also be divided into 2D- and 3D-based approaches. Korff et al. [41] compared different structure- and ligand-based VS approaches on the DUD data sets, including two pharmacophore-based methods. The topological pharmacophore point histogram (TopPPHist) computes a histogram of topological distances for each pair of pharmacophore classes and then converts these histograms into a descriptor vector. The TopPPHist is therefore a 2D-based pharmacophore approach. The Flexophore approach [1, 41] computes binned geometrical distance histograms for each pair of pharmacophore points from a representative set of conformers. The pharmacophore points and their distance histograms form complete graphs, so the final comparison of two molecules resembles a maximum common subgraph isomorphism search.
Results and Discussion
The results section has four subsections. The first compares the 4D FAP_{OA} with other optimal assignment-based approaches that were already evaluated on the DUD data sets [28]. The second uses the results of Venkatraman et al. [21] to compare the average performance of the 4D FAP_{OA} with 15 state-of-the-art 2D and 3D approaches. The third compares it with the pharmacophore-based approaches of Korff et al. [41]. The final subsection examines the performance difference between 3D approaches on multiple conformers and our 4D FAP_{OA} approach.
Comparison with other Optimal Assignment Methods
Comparing the 4D FAP_{OA} with other optimal assignment methods shows how much the type of information used affects the final performance. The OAAP is the 3D counterpart of the 4D FAP_{OA}, so comparing the two directly measures the performance gain of the 4D extension. As an early enrichment metric, we used the awROCE_{5%}, which also assesses chemotype enrichment. To reduce the bias introduced by a low number of chemotypes, we only used data sets with at least 15 different chemotypes. We used the AUC to evaluate the performance on the complete data sets.
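The two metrics can be sketched from a ranked list. The AUC is the fraction of (active, decoy) pairs in which the active is ranked higher, assuming no ties. For the arithmetic-weighted ROC enrichment, each active is weighted by one over the size of its chemotype, so every chemotype contributes a total weight of one. The weighted true-positive rate is then divided by the false-positive rate. The labels, the chemotype identifiers, and the convention of placing the cutoff at the decoy that reaches the target false-positive rate are assumptions of this sketch.

```python
from collections import Counter
from typing import Hashable, Sequence


def roc_auc(labels_ranked: Sequence[bool]) -> float:
    """Probability that a random active is ranked above a random decoy (no ties)."""
    n_act = sum(labels_ranked)
    n_dec = len(labels_ranked) - n_act
    decoys_seen = 0
    correctly_ordered = 0
    for is_active in labels_ranked:
        if is_active:
            correctly_ordered += n_dec - decoys_seen   # decoys ranked below this active
        else:
            decoys_seen += 1
    return correctly_ordered / (n_act * n_dec)


def aw_roc_enrichment(labels_ranked: Sequence[bool], chemotypes: Sequence[Hashable], fpr: float = 0.05) -> float:
    """Chemotype-weighted ROC enrichment at the given false-positive rate.

    chemotypes[i] is the chemotype id of compound i (ignored for decoys).
    """
    n_dec = sum(not a for a in labels_ranked)
    n_fp = max(1, round(fpr * n_dec))                # decoys allowed before the cutoff
    sizes = Counter(c for c, a in zip(chemotypes, labels_ranked) if a)
    found = 0.0
    decoys_seen = 0
    for is_active, chem in zip(labels_ranked, chemotypes):
        if is_active:
            found += 1.0 / sizes[chem]               # each chemotype sums to weight 1
        else:
            decoys_seen += 1
            if decoys_seen == n_fp:
                break
    weighted_tpr = found / len(sizes)
    return weighted_tpr / (n_fp / n_dec)


# Three analogues of chemotype A at the top, one active of chemotype B further down
labels = [True] * 3 + [False] * 10 + [True] + [False] * 90
chems = ["A"] * 3 + [None] * 10 + ["B"] + [None] * 90
print(round(roc_auc(labels), 3), round(aw_roc_enrichment(labels, chems), 3))
```

An unweighted ROC enrichment would reward the three top-ranked analogues with a value of 15 in this example. The chemotype weighting counts them as a single discovery and gives 10.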
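Table 2: Results of the optimal assignment methods on the 13 DUD data sets with at least 15 chemotypes
target  OAK awROCE_{5%}  OAK AUC  OAK_{FLEX} awROCE_{5%}  OAK_{FLEX} AUC  2SHA awROCE_{5%}  2SHA AUC  OAAP awROCE_{5%}  OAAP AUC  4D FAP_{OA} awROCE_{5%}  4D FAP_{OA} AUC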
ACE  12.1  0.78  12.1  0.76  11.6  0.82  8.0  0.58  12.2  0.88 
AChE  3.9  0.69  4.4  0.71  5.4  0.74  4.0  0.71  7.6  0.75 
CDK2  2.6  0.57  2.6  0.47  3.5  0.50  3.5  0.55  3.5  0.77 
COX2  9.0  0.88  8.8  0.89  9.7  0.87  12.2  0.93  11.9  0.89 
EGFr  11.6  0.76  11.3  0.75  12.1  0.74  7.3  0.51  18.0  0.99 
FXa  2.1  0.43  1.1  0.51  2.6  0.59  2.1  0.58  3.2  0.64 
HIVRT  3.3  0.53  3.3  0.48  3.5  0.60  5.1  0.65  2.3  0.58 
InhA  8.6  0.54  5.7  0.53  9.4  0.63  7.0  0.57  7.8  0.66 
P38  4.3  0.43  4.0  0.44  5.0  0.75  2.9  0.45  3.1  0.68 
PDE5  2.3  0.46  1.4  0.41  2.7  0.47  1.4  0.38  3.6  0.69 
PDGFrb  4.9  0.44  4.9  0.38  4.5  0.34  8.6  0.42  4.9  0.66 
SRC  3.7  0.67  4.5  0.64  6.4  0.72  1.0  0.45  2.7  0.51 
VEGFr2  1.3  0.28  1.3  0.30  4.5  0.47  2.6  0.39  3.2  0.67 
avg. rank  3.35  3.54  3.81  3.77  2.15  2.62  3.38  3.35  2.31  1.58 
Compared with all other optimal assignment approaches, the 4D FAP_{OA} performs best on 6 data sets with respect to the awROCE_{5%} and on 9 data sets with respect to the AUC. This gives the 4D FAP_{OA} the best average AUC rank of 1.58. For the awROCE_{5%}, the 2SHA achieves the best average rank of 2.15, followed by the 4D FAP_{OA} with 2.31. In a direct awROCE_{5%} comparison, the 4D FAP_{OA} outperforms the 2SHA on 7 data sets, whereas the reverse holds on only 5. In summary, the 4D FAP_{OA} performs robustly on the 13 data sets. On the complete data sets (AUC), it outperforms all other optimal assignment methods. Its ability to enrich different scaffolds early is comparable to that of the 2SHA approach.
Encoding the conformational space should be most beneficial when the query structure is highly flexible. We therefore discuss in more detail the results on the two data sets with the most flexible query compounds, ACE and EGFr.
To evaluate chemotype discovery on the complete data set, we plotted the fraction of discovered chemotypes as a function of the fraction of the ranked data set. A chemotype counts as discovered as soon as at least one of its compounds appears within the considered fraction of the ranking.
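The chemotype discovery curve can be computed in a single pass over the ranking. This is a minimal sketch. It assumes that the ranking is given as a list of booleans (True = active) and that a parallel list holds the chemotype identifier of each active, with `None` for decoys.

```python
from typing import Hashable, Optional, Sequence


def chemotype_discovery_curve(
    labels_ranked: Sequence[bool], chemotypes: Sequence[Optional[Hashable]]
) -> list[tuple[float, float]]:
    """(fraction of ranked data set, fraction of chemotypes discovered) after each position."""
    total = len({c for c, a in zip(chemotypes, labels_ranked) if a})
    n = len(labels_ranked)
    seen: set = set()
    curve = []
    for position, (is_active, chem) in enumerate(zip(labels_ranked, chemotypes), start=1):
        if is_active:
            seen.add(chem)   # a chemotype counts once its first member is ranked
        curve.append((position / n, len(seen) / total))
    return curve


curve = chemotype_discovery_curve([True, False, True, True, False], ["A", None, "A", "B", None])
print(curve)
```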
Computational performance is another important property of a VS similarity function. For a VS experiment on a real-world database, the similarity function must process a reasonable number of compounds in feasible time. All of the optimal assignment-based similarity functions presented here were developed in our department, so their computation times can be compared fairly. We averaged the computation time of each optimal assignment method over the 13 data sets used in Table 2 to obtain a representative estimate for drug-like compounds.
The 4D FAP_{OA} approach achieves an average throughput of 38.8 ± 27.56 similarity calculations per second. This throughput applies to preprocessed molecules, i.e., with the GMMs already computed. The OAK achieves 27.34 ± 3.40 calculations per second and its flexibility extension (OAK_{FLEX}) 41.03 ± 7.32. The OAAP is the fastest approach at 51.49 ± 18.07 calculations per second. The 2SHA is the slowest method, with a throughput of 14.04 ± 1.78 calculations per second. All calculations were performed on a 2 GHz Core2Duo CPU using one core and 1 GB of memory. At this throughput, the 4D FAP_{OA} can screen over 100,000 molecules within one hour on a single desktop CPU core. The similarity calculation can easily be parallelized to increase the throughput further, so the approach should be fast enough for real-world applications.
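The hourly figures follow directly from the reported mean rates. The short calculation below converts them to similarity calculations per hour on one core. At its mean rate, the 4D FAP_{OA} performs about 139,680 calculations per hour. The large standard deviation (±27.56) means that the actual throughput varies considerably between data sets.

```python
# Mean throughputs (similarity calculations per second) reported in the text
rates_per_second = {
    "4D FAP_OA": 38.8,
    "OAK": 27.34,
    "OAK_FLEX": 41.03,
    "OAAP": 51.49,
    "2SHA": 14.04,
}

SECONDS_PER_HOUR = 3600
per_hour = {name: rate * SECONDS_PER_HOUR for name, rate in rates_per_second.items()}

for name, n in sorted(per_hour.items(), key=lambda item: -item[1]):
    print(f"{name:<10} {n:>9,.0f} similarity calculations per hour")
```

Only the 2SHA stays below 100,000 calculations per hour at its mean rate.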
The preprocessing step (conformational sampling and GMM calculation) is an additional computational cost of our approach. However, it has to be carried out only once per molecule. Conformational sampling is also often needed anyway for other tasks in the drug discovery pipeline. Furthermore, our encoding is model-based, so storing it in a database requires less memory than storing multiple conformers per molecule.
Comparison with State-of-the-Art 2D and 3D Approaches
Table 3: Average relative enrichment factors (REF) and AUC values on the filtered DUD data sets
method  REF_{1%}  REF_{5%}  REF_{10%}  AUC  avg. rank
BABEL  44.4 ± 28.4  41.1 ± 25.4  49.6 ± 26.6  0.74  3.25 
DAYLIGHT  43.9 ± 28.7  41.8 ± 25.8  52.2 ± 26.7  0.74  2.75 
MACCS  30.5 ± 25.7  29.7 ± 22.8  39.6 ± 23.3  0.69  7.0 
BCI  46.7 ± 31.7  41.3 ± 28.5  49.1 ± 29.7  0.74  2.75 
MOLPRINT2D  34.5 ± 28.3  33.8 ± 26.9  40.9 ± 30.2  0.70  6.0 
PARAFIT_{S}  19.1 ± 20.3  24.4 ± 20.1  33.0 ± 22.4  0.67  12.5 
ROCS_{SC}  36.8 ± 29.7  35.2 ± 27.1  44.0 ± 28.7  0.72  5.0 
ROCS_{S}  27.3 ± 25.7  27.8 ± 22.4  35.2 ± 24.1  0.65  10.25 
EON_{SCE}  24.2 ± 26.5  24.8 ± 24.1  33.3 ± 24.1  0.68  10.375 
EON_{SE}  22.9 ± 25.4  24.7 ± 21.5  32.2 ± 22.8  0.68  11.625 
SHAEP_{SE}  29.0 ± 25.5  27.2 ± 22.1  35.3 ± 23.7  0.67  9.75 
SHAEP_{S}  28.1 ± 26.6  27.2 ± 22.1  35.5 ± 23.8  0.67  8.875 
USR  12.7 ± 15.6  16.2 ± 13.9  24.3 ± 17.6  0.61  15.0 
ESHAPE3D_{HYD}  24.0 ± 27.6  23.1 ± 20.8  27.8 ± 23.4  0.54  13.75 
ESHAPE3D  14.1 ± 16.8  13.0 ± 9.8  18.6 ± 12.7  0.42  15.75 
4D FAP_{OA}  46.0 ± 33.1  45.4 ± 30.1  53.5 ± 31.0  0.78  1.25 
The results of Table 3 confirm that the 2D approaches are more robust than the 3D methods. Only ROCS_{SC} achieves results comparable to those of the MACCS keys and MOLPRINT2D. The 4D FAP_{OA} draws useful information from the GMMs and therefore achieves the best relative enrichment factors at 5% and 10% as well as the best AUC. Only the BCI approach marginally outperforms it on the relative enrichment factor at 1%. Its best performance on three of the four metrics gives the 4D FAP_{OA} the best average rank of 1.25. The BCI and DAYLIGHT fingerprints share an average rank of 2.75 and are the best 2D-based approaches. ROCS_{SC} is the best 3D-based approach, with an average rank of 5.0, and therefore ranks higher than the 2D-based approaches MOLPRINT2D (6.0) and the MACCS keys (7.0). All other 3D-based methods are inferior to the 2D-based approaches. In summary, the 4D FAP_{OA} benefits from the conformational space information and achieves the best average performance of all methods.
Comparison with Pharmacophore-Based Approaches
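Table 4: Relative enrichment factors and chemotype discovery of the pharmacophore-based approaches and the 4D FAP_{OA} (Chem_{1%}: number of chemotypes discovered in the top 1% of the ranking)
target  TopPPHist REF_{1%}  TopPPHist Chem_{1%}  Flexophore REF_{1%}  Flexophore Chem_{1%}  4D FAP_{OA} REF_{1%}  4D FAP_{OA} Chem_{1%}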
ACE  65.01  8  75.84  8  75.84  5 
AChE  57.54  4  55.04  3  57.53  4 
ADA  0.0  0  0.0  0  41.41  1 
ALR2  9.79  1  9.79  1  19.59  1 
AmpC  74.35  1  86.74  1  99.13  1 
AR  44.4  2  27.32  1  68.19  1 
CDK2  23.52  4  23.52  3  41.94  4 
COMT  20.88  1  20.88  1  62.63  1 
COX1  21.39  1  64.17  4  64.1  4 
COX2  69.66  9  96.79  7  99.16  13 
DHFR  96.86  4  91.03  9  91.16  6 
EGFr  70.56  2  73.6  12  97.14  13 
ER_{agonist}  45.66  2  41.86  5  72.05  3 
ER_{antagonist}  47.07  2  26.9  2  47.07  1 
FGFr1  0.0  0  12.92  0  10.71  4 
FXa  5.1  1  23.78  4  8.49  2 
GART  10.99  0  10.99  0  10.88  0 
GPB  72.99  3  36.5  3  96.7  3 
GR  26.47  1  26.47  3  36.36  3 
HIVPR  4.78  0  4.78  0  0.0  0 
HIVRT  38.49  2  38.49  3  38.41  2 
HMGR  66.05  1  99.08  2  99.14  2 
HSP90  99.6  2  69.72  2  98.43  2 
InhA  86.52  5  95.47  6  95.47  6 
MR  0.0  0  0.0  0  77.04  1 
NA  31.2  1  20.8  1  57.2  1 
P38  0.0  0  22.27  1  20.85  1 
PARP  7.22  1  14.43  2  0.0  0 
PDE5  58.45  2  58.45  1  96.85  3 
PNP  28.14  2  93.81  4  55.3  3 
PPAR_{ γ }  87.28  0  96.63  0  87.45  0 
PR  0.0  0  9.45  1  37.45  1 
RXR_{ α }  12.99  1  77.92  1  51.95  1 
SAHH  58.01  1  36.26  1  79.77  1 
SRC  1.96  1  0.0  0  7.72  2 
TK  32.86  1  54.76  2  43.81  1 
trypsin  5.95  0  5.95  0  5.84  1 
mean  37.34  1.78  43.31  2.54  55.48  2.65 
avg. rank  2.35  2.30  2.04  1.93  1.61  1.80 
With respect to early enrichment, the TopPPHist and Flexophore approaches achieved average relative enrichment factors of 37.34 ± 31.38 and 43.31 ± 33.25, respectively. The 4D FAP_{OA} achieved an average relative enrichment factor of 55.48 ± 33.26, exceeding the Flexophore result by over 20%. However, owing to their abstract representation of molecules, one strength of pharmacophore-based approaches is their ability to discover new chemical entities. This abstraction from the query scaffold is reflected in the chemotype discovery results of Table 4. The Flexophore approach needs ≈ 20% fewer active compounds to discover a similar number of chemotypes (94) as the 4D FAP_{OA} (98). The 2D-based TopPPHist discovered only 66 chemotypes over all 40 data sets and thus showed inferior chemotype discovery compared with the 4D-based approaches (Flexophore, 4D FAP_{OA}).
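Both quantities of Table 4 are read off a ranked screening list. The Python sketch below assumes that REF_{x%} divides the number of actives in the top x% of the list by the best achievable number, min(selection size, number of actives), so that a perfect ranking scores 100, and that a chemotype counts as discovered when at least one of its actives is in the selection. These definitions, the rounding of the selection size and the function name are assumptions for illustration, not the evaluation code of this study.

```python
import math
from typing import List, Optional, Tuple


def ref_and_chemotypes(ranked: List[Tuple[bool, Optional[str]]],
                       fraction: float) -> Tuple[float, int]:
    """ranked: (is_active, chemotype_id) pairs sorted by decreasing similarity
    to the query; at least one active is assumed. Returns the relative enrichment
    factor in percent and the number of distinct chemotypes among the actives
    found in the top `fraction` of the list."""
    n_actives = sum(1 for active, _ in ranked if active)
    n_sel = max(1, math.ceil(fraction * len(ranked)))  # assumed rounding
    top = ranked[:n_sel]
    hits = sum(1 for active, _ in top if active)
    # normalize by the best achievable number of hits: a perfect ranking gives 100
    ref = 100.0 * hits / min(n_sel, n_actives)
    chemotypes = {chem for active, chem in top if active}
    return ref, len(chemotypes)


# Hypothetical screen: 200 compounds, 5 actives from three chemotypes.
ranked = ([(True, "c1"), (True, "c1"), (False, None), (True, "c2")]
          + [(False, None)] * 194 + [(True, "c3"), (True, "c3")])
print(ref_and_chemotypes(ranked, 0.01))  # (100.0, 1)
print(ref_and_chemotypes(ranked, 0.05))  # (60.0, 2)
```

The example shows why the two numbers can diverge: at 1% the method retrieves two actives of the same series, which is a perfect REF but only one chemotype.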
Comparison with Multiple Conformer Approaches
The results of the previous sections demonstrated the inferior performance of 3D-based approaches in comparison with 2D-based methods. A common technique to tackle this deficit of 3D approaches is to utilize multiple conformers and to take the average or the maximum of all pairwise similarity values. The number of necessary similarity computations scales with O(n^{2}), where n represents the number of conformers per molecule. Therefore, this technique implies a significant increase in computation time. However, averaging over multiple conformers raises the information content available to the 3D-based approaches to a level comparable to that of the 4D FAP_{OA}: the 4D FAP_{OA} uses a model-based description of the conformational space, whereas the 3D-based approaches operate on an explicit conformational ensemble. Consequently, comparing the 4D FAP_{OA} with 3D-based approaches on multiple conformers is a comparison based on an equal source of information.
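The cost argument can be made concrete. In the Python sketch below, `sim` stands for any single-conformer 3D similarity, and the two fusion rules mentioned above (maximum and average) are applied over all conformer pairs. The function name and the toy similarity are hypothetical. Comparing two molecules with 100 conformers each already requires 10,000 similarity evaluations, and with 1000 conformers each, one million.

```python
from itertools import product
from statistics import mean
from typing import Callable, Sequence, Tuple, TypeVar

Conf = TypeVar("Conf")  # any conformer representation


def multi_conformer_similarity(
    confs_a: Sequence[Conf],
    confs_b: Sequence[Conf],
    sim: Callable[[Conf, Conf], float],
) -> Tuple[float, float, int]:
    """Evaluate `sim` on every conformer pair of two molecules.

    Returns (maximum fusion, average fusion, number of evaluations);
    the number of evaluations is len(confs_a) * len(confs_b)."""
    values = [sim(a, b) for a, b in product(confs_a, confs_b)]
    return max(values), mean(values), len(values)


def toy_sim(a: float, b: float) -> float:
    """Hypothetical similarity between two 'conformers' represented as numbers."""
    return 1.0 / (1.0 + abs(a - b))


best, avg, n_evals = multi_conformer_similarity([0.0, 1.0], [1.0, 3.0], toy_sim)
# best = 1.0, n_evals = 4
```

A model-based encoding such as the GMMs of the 4D FAP_{OA} replaces this double loop over conformers by a single comparison of two precomputed models.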
Average AUC of ROCS_{SC} with multiple conformers and the 4D FAP_{OA}
target  ROCS_{SC} AUC_{1}  ROCS_{SC} AUC_{10}  ROCS_{SC} AUC_{100}  ROCS_{SC} AUC_{1000}  4D FAP_{OA} AUC
ACE  0.69  0.85  0.82  0.77  0.89 
AChE  0.76  0.75  0.77  0.78  0.72 
ADA  0.63  0.76  0.60  0.59  0.67 
ALR2  0.45  0.47  0.49  0.50  0.63 
AmpC  0.77  0.86  0.88  0.88  0.90 
AR  0.81  0.80  0.79  0.79  0.89 
CDK2  0.78  0.70  0.69  0.67  0.75 
COMT  0.32  0.27  0.33  0.34  0.97 
COX1  0.62  0.62  0.58  0.57  0.63 
COX2  0.95  0.94  0.95  0.95  0.93 
DHFR  0.68  0.45  0.91  0.89  0.99 
EGFr  0.81  0.82  0.95  0.95  0.98 
ER_{agonist}  0.92  0.93  0.94  0.94  0.85 
ER_{antagonist}  0.94  0.97  0.98  0.98  0.92 
FGFr1  0.53  0.61  0.51  0.45  0.63 
FXa  0.61  0.49  0.66  0.64  0.62 
GART  0.43  0.50  0.77  0.84  0.82 
GPB  0.84  0.93  0.94  0.94  0.97 
GR  0.81  0.81  0.77  0.76  0.92 
HIVPR  0.71  0.61  0.58  0.61  0.21 
HIVRT  0.71  0.71  0.72  0.71  0.56 
HMGR  0.76  0.90  0.94  0.93  0.97 
HSP90  0.69  0.71  0.66  0.64  0.86 
InhA  0.72  0.81  0.78  0.79  0.69 
MR  0.83  0.86  0.86  0.85  0.91 
NA  0.97  0.96  0.97  0.97  0.97 
P38  0.46  0.49  0.48  0.48  0.75 
PARP  0.63  0.59  0.58  0.58  0.77 
PDE5  0.68  0.58  0.59  0.56  0.78 
PDGFrb  0.41  0.39  0.30  0.28  0.60 
PNP  0.56  0.58  0.88  0.89  0.94 
PPAR_{γ}  0.87  0.68  0.74  0.91  0.96 
PR  0.74  0.73  0.68  0.69  0.94 
RXR_{ α }  0.88  0.98  0.97  0.95  0.98 
SAHH  0.96  0.96  0.98  0.98  0.98 
SRC  0.51  0.50  0.39  0.34  0.53 
thrombin  0.54  0.66  0.66  0.55  0.59 
TK  0.68  0.84  0.88  0.89  0.87 
trypsin  0.41  0.49  0.57  0.65  0.80 
VEGFr2  0.61  0.54  0.44  0.39  0.64 
avg. rank  3.55  3.26  2.96  3.2  2 
The average AUC of ROCS_{SC} increases from 0.692 (AUC_{1}) over 0.703 (AUC_{10}) to 0.725 (AUC_{100}). The results on 1000 conformers are marginally inferior (average AUC_{1000} of 0.722) to those on 100 conformers. Hence, ROCS_{SC} benefits only slightly from the additional information content of multiple conformers. The average AUC of the 4D FAP_{OA}, in contrast, is 0.80 and therefore superior to all four ROCS_{SC} setups. The average ranks of the approaches confirm these results. The 4D FAP_{OA} achieves the best AUC value on 27 out of 40 data sets, which demonstrates its robust performance on a wide range of pharmaceutically relevant targets. The best ROCS_{SC} setup (100 conformers) yields the best result on eight data sets. Note that the average AUC values in Tables 3 and 5 differ because different data sets were applied (filtered DUD in Table 3, unfiltered DUD in Table 5).
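The AUC summarizes the whole ranking of a screen in one number. As a reminder of what it measures, the Python sketch below computes the ROC AUC as the probability that a randomly chosen active is scored higher than a randomly chosen decoy, counting ties as one half. This is the standard Mann-Whitney formulation; it is not taken from the evaluation code used in this study.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """Probability that a random active outscores a random decoy (ties count 1/2).

    Equivalent to the Mann-Whitney U statistic divided by n_actives * n_decoys.
    The double loop is O(n_actives * n_decoys) and meant for illustration only."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Hypothetical similarity scores: 5 of the 6 active/decoy pairs are ordered correctly.
print(roc_auc([0.9, 0.8], [0.7, 0.85, 0.1]))
```

For full DUD-sized lists, a sort-based rank-sum computation gives the same value in O(n log n).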
awROCE and AUC results of FieldScreen and the 4D FAP_{OA}
data set  FieldScreen awROCE_{5%}  FieldScreen AUC  4D FAP_{OA} awROCE_{5%}  4D FAP_{OA} AUC
ACE  4.7  0.67  12.2  0.88 
AChE  7.3  0.76  7.6  0.75 
CDK2  0.8  0.47  3.5  0.77 
COX2  10.4  0.92  11.9  0.89 
EGFr  9.5  0.84  18.0  0.99 
FXa  5.4  0.74  3.2  0.64 
HIVRT  5.1  0.70  2.3  0.58 
InhA  6.5  0.71  7.8  0.66 
P38  0.5  0.33  3.1  0.68 
PDE5  4.8  0.66  3.6  0.69 
PDGFrb  3.8  0.29  4.9  0.66 
SRC  2.5  0.45  2.7  0.51 
VEGFr2  3.5  0.48  3.2  0.67 
mean  4.98  0.62  6.5  0.72 
avg. rank  1.69  1.62  1.31  1.38 
The 4D FAP_{OA} yields a superior early enrichment performance (awROCE_{5%}) on 9 out of 13 data sets. With respect to the performance on the complete data set (AUC), our approach outperforms FieldScreen on 8 out of 13 data sets. The 4D FAP_{OA} increases the mean early enrichment and the mean complete data set performance by ≈ 30% and ≈ 16%, respectively. FieldScreen outperforms the 4D FAP_{OA} most clearly on the FXa and HIVRT data sets. These data sets also consist of larger molecules, which increases the risk of topological errors and is probably one reason for the inferior 4D FAP_{OA} performance.
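The awROCE_{5%} rewards early retrieval of actives while down-weighting large series of close analogs. The Python sketch below assumes a cluster-weighted definition: walking down the ranked list until 5% of the decoys have been seen, each retrieved active contributes 1/(size of its chemotype cluster), the sum is divided by the number of clusters to give a weighted true positive rate, and that rate is divided by the false positive rate of 0.05. The exact weighting and rounding behind the table above may differ, and the toy ranking is hypothetical. The helper `relative_gain` reproduces the ≈ 30% and ≈ 16% improvements from the table means.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def aw_roc_enrichment(ranked: List[Tuple[bool, Optional[str]]], fpr: float = 0.05) -> float:
    """ranked: (is_active, cluster_id) pairs sorted by decreasing similarity.

    Walks down the list until `fpr` of all decoys have been seen and returns the
    cluster-weighted true positive rate at that point divided by `fpr`."""
    cluster_sizes = Counter(cluster for active, cluster in ranked if active)
    n_decoys = sum(1 for active, _ in ranked if not active)
    decoy_budget = max(1, math.ceil(fpr * n_decoys))  # assumed rounding
    seen_decoys = 0
    weighted_hits = 0.0
    for active, cluster in ranked:
        if active:
            # every cluster carries a total weight of 1, shared by its members
            weighted_hits += 1.0 / cluster_sizes[cluster]
        else:
            seen_decoys += 1
            if seen_decoys == decoy_budget:
                break
    weighted_tpr = weighted_hits / len(cluster_sizes)
    return weighted_tpr / fpr


def relative_gain(baseline: float, value: float) -> float:
    """Improvement of `value` over `baseline` in percent."""
    return 100.0 * (value - baseline) / baseline


# Hypothetical ranking: 100 decoys, cluster "x" with 3 actives, cluster "y" with 1.
toy = ([(True, "x"), (True, "x"), (False, None), (False, None), (True, "y")]
       + [(False, None)] * 98 + [(True, "x")])
print(aw_roc_enrichment(toy))  # about 16.67

# Table means: awROCE_5% 4.98 -> 6.5 and AUC 0.62 -> 0.72
print(round(relative_gain(4.98, 6.5), 1), round(relative_gain(0.62, 0.72), 1))  # 30.5 16.1
```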
To conclude, the performance of the best 3D-based approach of Table 3 (ROCS_{SC}) improved when it was applied to multiple conformer data sets. However, the performance gain was not large enough to reach the results of the 4D FAP_{OA}. The comparison with the FieldScreen approach yields similar results and underpins the robust performance of the 4D FAP_{OA}. The detailed evaluation reveals a weakness of our approach on data sets whose compounds have a large number of heavy atoms. This weakness is likely caused by the optimal assignment step and has already been reported as a weak point of optimal assignment approaches [28]. Nevertheless, the 4D FAP_{OA} represents a robust similarity measure for small and medium-sized drug-like compounds.
Conclusions
We presented a VS similarity function that operates on GMM-encoded conformational space information. Our approach compares the conformational spaces of molecules in a single step and therefore avoids time-consuming averaging techniques. The approach has already been applied in QSAR experiments, where it demonstrated a robust performance in comparison to similar 3D-based QSAR models [23, 28].
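Why a model-based encoding allows a one-step comparison can be illustrated with Gaussian mixtures directly: since the integral of N(x; m1, v1) N(x; m2, v2) over x equals N(m1; m2, v1 + v2), the overlap of two GMMs has a closed form, and two models are compared with K_p × K_q Gaussian evaluations regardless of how many conformers were used to fit them. The Python sketch below applies this to one-dimensional mixtures (for instance, hypothetical models of one inter-atomic distance over a conformer ensemble) and normalizes the overlap to (0, 1]. It is a generic illustration of the idea, not the 4D FAP_{OA} similarity function, which additionally relies on an optimal assignment step.

```python
import math
from typing import List, Tuple

# A one-dimensional Gaussian mixture as (weight, mean, variance) components.
GMM = List[Tuple[float, float, float]]


def gmm_overlap(p: GMM, q: GMM) -> float:
    """Closed-form integral of p(x) * q(x) over the real line."""
    total = 0.0
    for w1, m1, v1 in p:
        for w2, m2, v2 in q:
            var = v1 + v2
            total += (w1 * w2 * math.exp(-((m1 - m2) ** 2) / (2.0 * var))
                      / math.sqrt(2.0 * math.pi * var))
    return total


def gmm_similarity(p: GMM, q: GMM) -> float:
    """Overlap normalized to (0, 1]; by Cauchy-Schwarz it reaches 1 only when p equals q."""
    return gmm_overlap(p, q) / math.sqrt(gmm_overlap(p, p) * gmm_overlap(q, q))


# Hypothetical distance models (in Å): a bimodal pair distance and a unimodal one.
bimodal = [(0.5, 2.5, 0.01), (0.5, 5.0, 0.01)]
unimodal = [(1.0, 2.5, 0.01)]
print(gmm_similarity(bimodal, unimodal))  # about 0.71
```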
The aim of this study was to evaluate our approach as a VS similarity function. Therefore, we compared the results of the 4D FAP_{OA} with 20 other 2D- and 3D-based approaches. Additionally, we included two approaches (ROCS_{SC} and FieldScreen) that operate on multiple conformers in order to compare approaches that are based on a similar information content.
The results showed that our approach achieves superior results on a wide range of pharmaceutically relevant targets. Even the best 3D approach according to the results of Venkatraman et al. [21], applied to multiple conformers, is inferior to our approach.
The preprocessing required to encode the conformational space information by means of GMMs is an additional computational step. However, each compound has to be encoded only once, and the encoded models need less storage space than the conformational ensembles. The actual similarity function is fast enough to screen over 100,000 compounds within one hour on a standard desktop CPU with one core. Therefore, our approach should meet the requirements of real-world VS applications.
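The preprocessing step fits GMMs to conformational ensembles. The reference list includes expectation-maximization (EM) and Schwarz's Bayesian information criterion (BIC), so the minimal Python sketch below assumes that combination: it fits one-dimensional mixtures with EM and keeps the number of components with the lowest BIC. The input distances are hypothetical, and the quantile initialization, iteration count and variance floor are illustrative choices rather than the settings of the published tool.

```python
import math
import random
from typing import List, Tuple

GMM = List[Tuple[float, float, float]]  # (weight, mean, variance) per component


def _pdf(x: float, m: float, v: float) -> float:
    return math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)


def log_likelihood(xs: List[float], model: GMM) -> float:
    return sum(math.log(sum(w * _pdf(x, m, v) for w, m, v in model) or 1e-300) for x in xs)


def fit_gmm_em(xs: List[float], k: int, iters: int = 50, min_var: float = 1e-4) -> GMM:
    """Fit a k-component 1D Gaussian mixture by expectation-maximization."""
    n = len(xs)
    mu = sum(xs) / n
    var0 = max(sum((x - mu) ** 2 for x in xs) / n, min_var)
    srt = sorted(xs)
    # deterministic start: means at evenly spaced quantiles, overall variance
    model = [(1.0 / k, srt[int((j + 0.5) * n / k)], var0) for j in range(k)]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        resp = []
        for x in xs:
            dens = [w * _pdf(x, m, v) for w, m, v in model]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities
        model = []
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            mj = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            vj = sum(r[j] * (x - mj) ** 2 for r, x in zip(resp, xs)) / nj
            model.append((nj / n, mj, max(vj, min_var)))
    return model


def bic(xs: List[float], model: GMM) -> float:
    """Schwarz criterion; a 1D mixture with k components has 3k - 1 free parameters."""
    return -2.0 * log_likelihood(xs, model) + (3 * len(model) - 1) * math.log(len(xs))


def select_gmm(xs: List[float], max_k: int = 4) -> GMM:
    """Fit 1..max_k components and keep the model with the lowest BIC."""
    return min((fit_gmm_em(xs, k) for k in range(1, max_k + 1)), key=lambda g: bic(xs, g))


# Hypothetical input: one inter-atomic distance (in Å) sampled over 400 conformers.
rng = random.Random(42)
distances = [rng.gauss(2.5, 0.1) for _ in range(200)] + [rng.gauss(5.0, 0.1) for _ in range(200)]
model = select_gmm(distances)
```

The fitted model stores only 3k - 1 numbers per distribution instead of one value per conformer, which is the storage advantage described above.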
The complete source code of the preprocessing tool (computing GMMs from conformational ensembles) and of the 4D FAP_{OA} similarity function is publicly available on our department website http://www.cogsys.cs.uni-tuebingen.de/software/4DFAP.
List of abbreviations
ACE: angiotensin-converting enzyme
AChE: acetylcholinesterase
ADA: adenosine deaminase
ALR2: aldose reductase
AmpC: AmpC β-lactamase
AR: androgen receptor
CDK2: cyclin-dependent kinase 2
COMT: catechol O-methyltransferase
COX1: cyclooxygenase-1
COX2: cyclooxygenase-2
DHFR: dihydrofolate reductase
EGFr: epidermal growth factor receptor
ER: estrogen receptor
FGFr1: fibroblast growth factor receptor kinase
FXa: factor Xa
GART: glycinamide ribonucleotide transformylase
GPB: glycogen phosphorylase β
GR: glucocorticoid receptor
HIVPR: HIV protease
HIVRT: HIV reverse transcriptase
HMGR: hydroxymethylglutaryl-CoA reductase
HSP90: human heat shock protein 90
InhA: enoyl ACP reductase
MR: mineralocorticoid receptor
NA: neuraminidase
P38: P38 mitogen-activated protein kinase
PARP: poly(ADP-ribose) polymerase
PDE5: phosphodiesterase 5
PDGFrb: platelet-derived growth factor receptor kinase
PNP: purine nucleoside phosphorylase
PPAR_{γ}: peroxisome proliferator-activated receptor γ
PR: progesterone receptor
RXR_{α}: retinoid X receptor α
SAHH: S-adenosylhomocysteine hydrolase
SRC: tyrosine kinase SRC
TK: thymidine kinase
VEGFr2: vascular endothelial growth factor receptor
References
[1] von Korff M, Freyss J, Sander T: Flexophore, a New Versatile 3D Pharmacophore Descriptor That Considers Molecular Flexibility. J Chem Inf Model. 2008, 48 (4): 797-810. 10.1021/ci700359j.
[2] Bajorath J: Integration of virtual and high-throughput screening. Nat Rev Drug Discov. 2002, 1 (11): 882-894. 10.1038/nrd941.
[3] Varnek A, Tropsha A (Eds): Chemoinformatics Approaches to Virtual Screening. 2008, Cambridge: The Royal Society of Chemistry.
[4] Geppert H, Vogt M, Bajorath J: Current Trends in Ligand-Based Virtual Screening: Molecular Representations, Data Mining Methods, New Application Areas, and Performance Evaluation. J Chem Inf Model. 2010, 50 (2): 205-216. 10.1021/ci900419k.
[5] Bender A, Jenkins JL, Scheiber J, Sukuru SC, Glick M, Davies JW: How Similar Are Similarity Searching Methods? A Principal Component Analysis of Molecular Descriptor Space. J Chem Inf Model. 2009, 49: 108-119. 10.1021/ci800249s.
[6] Bender A, Mussa HY, Glen RC, Reiling S: Similarity Searching of Chemical Databases Using Atom Environment Descriptors (MOLPRINT 2D): Evaluation of Performance. J Chem Inf Comput Sci. 2004, 44 (5): 1708-1718.
[7] Barnard JM, Downs GM: Chemical Fragment Generation and Clustering Software. J Chem Inf Comput Sci. 1997, 37: 141-142.
[8] Daylight Chemical Information Systems Inc. [http://www.daylight.com]
[9] Symyx Software: MACCS structural keys. San Ramon, CA. 2005.
[10] Good AC, Hermsmeier MA, Hindle S: Measuring CAMD Technique Performance: A Virtual Screening Case Study in the Design of Validation Experiments. J Comput-Aided Mol Des. 2004, 18 (7): 529-536. 10.1007/s10822-004-4067-1.
[11] Venkatraman V, Chakravarthy P, Kihara D: Application of 3D Zernike descriptors to shape-based ligand similarity searching. J Cheminf. 2009, 1: 19. 10.1186/1758-2946-1-19.
[12] Grant JA, Gallardo MA, Pickup BT: A fast method of molecular shape comparison: A simple application of a Gaussian description of molecular shape. J Comput Chem. 1996, 17 (14): 1653-1666. 10.1002/(SICI)1096-987X(19961115)17:14<1653::AID-JCC7>3.0.CO;2-K.
[13] Mavridis L, Hudson BD, Ritchie DW: Toward High Throughput 3D Virtual Screening Using Spherical Harmonic Surface Representations. J Chem Inf Model. 2007, 47 (5): 1787-1796. 10.1021/ci7001507.
[14] Vainio MJ, Puranen SJ, Johnson MS: ShaEP: Molecular Overlay Based on Shape and Electrostatic Potential. J Chem Inf Model. 2009, 49 (2): 492-502. 10.1021/ci800315d.
[15] Clark DE, Jones G, Willett P, Kenny PW, Glen RC: Pharmacophoric pattern matching in files of three-dimensional chemical structures: Comparison of conformational-searching algorithms for flexible searching. J Chem Inf Comput Sci. 1994, 34: 197-206.
[16] Baumann K: Distance Profiles (DiP): A translationally and rotationally invariant 3D structure descriptor capturing steric properties of molecules. Quant Struct-Act Relat. 2002, 21: 507-519. 10.1002/1521-3838(200211)21:5<507::AID-QSAR507>3.0.CO;2-L.
[17] Ballester PJ, Finn PW, Richards WG: Ultrafast shape recognition: Evaluating a new ligand-based virtual screening technology. J Mol Graphics Modell. 2009, 27 (7): 836-845. 10.1016/j.jmgm.2009.01.001.
[18] Iyer M, Hopfinger AJ: Treating Chemical Diversity in QSAR Analysis: Modeling Diverse HIV-1 Integrase Inhibitors Using 4D Fingerprints. J Chem Inf Model. 2007, 47 (5): 1945-1960. 10.1021/ci700153g.
[19] Senese CL, Duca J, Pan D, Hopfinger AJ, Tseng YJ: 4D-Fingerprints, Universal QSAR and QSPR Descriptors. J Chem Inf Comput Sci. 2004, 44 (5): 1526-1539.
[20] Ebalunode JO, Zheng W: Unconventional 2D Shape Similarity Method Affords Comparable Enrichment as a 3D Shape Method in Virtual Screening Experiments. J Chem Inf Model. 2009, 49 (6): 1313-1320. 10.1021/ci900015b.
[21] Venkatraman V, Pérez-Nueno VI, Mavridis L, Ritchie DW: Comprehensive Comparison of Ligand-Based Virtual Screening Tools Against the DUD Data set Reveals Limitations of Current 3D Methods. J Chem Inf Model. 2010, 50 (12): 2079-2093. 10.1021/ci100263p.
[22] Dixon SL, Merz KM: One-Dimensional Molecular Representations and Similarity Calculations: Methodology and Validation. J Med Chem. 2001, 44 (23): 3795-3809. 10.1021/jm010137f.
[23] Jahn A, Hinselmann G, Fechner N, Henneges C, Zell A: Probabilistic Modeling of Conformational Space for 3D Machine Learning Approaches. Mol Inf. 2010, 29 (5): 441-455. 10.1002/minf.201000036.
[24] Dempster AP, Laird NM, Rubin DB: Maximum Likelihood from Incomplete Data via the EM Algorithm. J R Stat Soc Ser B Stat Methodol. 1977, 39: 1-38.
[25] Schwarz GE: Estimating the dimension of a model. Ann Stat. 1978, 6 (2): 461-464. 10.1214/aos/1176344136.
[26] Akaike H: A new look at the statistical model identification. IEEE Trans Autom Control. 1974, 19 (6): 716-723. 10.1109/TAC.1974.1100705.
[27] Jahn A, Hinselmann G, Rosenbaum L, Fechner N, Zell A: Boltzmann-Enhanced Flexible Atom-Pair Kernel with Dynamic Dimension Reduction. Mol Inf. 2011, 30 (4): 307-315. 10.1002/minf.201000120.
[28] Jahn A, Hinselmann G, Fechner N, Zell A: Optimal assignment methods for ligand-based virtual screening. J Cheminf. 2009, 1: 14. 10.1186/1758-2946-1-14.
[29] Hinselmann G, Fechner N, Jahn A, Eckert M, Zell A: Graph kernels for chemical compounds using topological and three-dimensional local atom pair environments. Neurocomputing. 2010, 74 (1-3): 219-229. 10.1016/j.neucom.2010.03.008.
[30] Fröhlich H, Wegner JK, Sieker F, Zell A: Optimal assignment kernels for attributed molecular graphs. ICML '05: Proceedings of the 22nd International Conference on Machine Learning. 2005, New York, NY, USA: ACM, 225-232.
[31] Fröhlich H, Wegner JK, Sieker F, Zell A: Kernel Functions for Attributed Molecular Graphs - A New Similarity-Based Approach to ADME Prediction in Classification and Regression. QSAR Comb Sci. 2006, 25 (4): 317-326. 10.1002/qsar.200510135.
[32] DUD: A Directory of Useful Decoys. [http://dud.docking.org]
[33] Huang N, Shoichet BK, Irwin JJ: Benchmarking Sets for Molecular Docking. J Med Chem. 2006, 49 (23): 6789-6801. 10.1021/jm0608356.
[34] Oprea TI, Davis AM, Teague SJ, Leeson PD: Is There a Difference between Leads and Drugs? A Historical Perspective. J Chem Inf Comput Sci. 2001, 41 (5): 1308-1315.
[35] Good AC, Oprea TI: Optimization of CAMD Techniques 3. Virtual Screening Enrichment Studies: a Help or Hindrance in Tool Selection? J Comput-Aided Mol Des. 2008, 22 (3-4): 169-178. 10.1007/s10822-007-9167-2.
[36] DUD Filtered: Lead-like filtered DUD. [http://dud.docking.org/jahn/]
[37] van Drie JH: Pharmacophore Discovery - Lessons Learned. Curr Pharm Des. 2003, 9 (20): 1649-1664. 10.2174/1381612033454568.
[38] Schrödinger, LLC: ConfGen. version 2.2, New York, NY. 2010.
[39] Watts KS, Dalal P, Murphy RB, Sherman W, Friesner RA, Shelley JC: ConfGen: A Conformational Search Method for Efficient Generation of Bioactive Conformers. J Chem Inf Model. 2010, 50 (4): 534-546. 10.1021/ci100015j.
[40] Chen IJ, Foloppe N: Drug-like Bioactive Structures and Conformational Coverage with the LigPrep/ConfGen Suite: Comparison to Programs MOE and Catalyst. J Chem Inf Model. 2010, 50 (5): 822-839. 10.1021/ci100026x.
[41] von Korff M, Freyss J, Sander T: Comparison of Ligand- and Structure-Based Virtual Screening on the DUD Data Set. J Chem Inf Model. 2009, 49: 209-231. 10.1021/ci800303k.
[42] Mackey MD, Melville JL: Better than Random? The Chemotype Enrichment Problem. J Chem Inf Model. 2009, 49 (5): 1154-1162. 10.1021/ci8003978.
[43] Truchon JF, Bayly CI: Evaluating Virtual Screening Methods: Good and Bad Metrics for the "Early Recognition" Problem. J Chem Inf Model. 2007, 47 (2): 488-508. 10.1021/ci600426e.
[44] Fechner N, Jahn A, Hinselmann G, Zell A: Atomic Local Neighborhood Flexibility Incorporation into a Structured Similarity Measure for QSAR. J Chem Inf Model. 2009, 49 (3): 549-560. 10.1021/ci800329r.
[45] Cheeseright TJ, Mackey MD, Melville JL, Vinter JG: FieldScreen: Virtual Screening Using Molecular Fields. Application to the DUD Data Set. J Chem Inf Model. 2008, 48 (11): 2108-2117. 10.1021/ci800110p.
[46] The Open Babel Package. [Version 2.2.3], [http://openbabel.org]
[47] Muchmore SW, Souers AJ, Akritopoulou-Zanze I: The Use of Three-Dimensional Shape and Electrostatic Similarity Searching in the Identification of a Melanin-Concentrating Hormone Receptor 1 Antagonist. Chem Biol Drug Des. 2006, 67 (2): 174-176. 10.1111/j.1747-0285.2006.00341.x.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.