Robust optimization of SVM hyperparameters in the classification of bioactive compounds
 Wojciech M Czarnecki^{1},
 Sabina Podlewska^{2, 3} and
 Andrzej J Bojarski^{2}
Received: 25 April 2015
Accepted: 6 July 2015
Published: 14 August 2015
Abstract
Background
The Support Vector Machine (SVM) has become one of the most popular machine learning tools used in virtual screening campaigns aimed at finding new drug candidates. Although it can be extremely effective in finding new potentially active compounds, its application requires the optimization of the hyperparameters with which the assessment is run, particularly the C and \(\gamma\) values. This requirement, in turn, creates the need for fast and effective optimization procedures that yield the best predictive power of the constructed model.
Results
In this study, we investigated Bayesian and random search optimization of Support Vector Machine hyperparameters for classifying bioactive compounds. The effectiveness of these strategies was compared with the most popular optimization procedures: grid search and heuristic choice. We demonstrated that Bayesian optimization not only provides better, more efficient classification but is also much faster: the number of iterations it required to reach optimal predictive performance was the lowest of all the optimization methods tested. Moreover, for the Bayesian approach, the choice of parameters in subsequent iterations is directed and justified; therefore, the results obtained with it improve steadily, and the range of hyperparameters tested provides the best overall performance of the Support Vector Machine. Additionally, we showed that random search optimization of hyperparameters leads to significantly better performance than grid search and heuristic-based approaches.
Conclusions
Keywords
Compounds classification; Virtual screening; Support Vector Machine; Parameters optimization; Bayesian optimization
Background
The application of computational methods at various stages of drug design and development has become a vital part of the process. As these methods become steadily more effective, attention shifts from optimizing their performance toward minimizing their computational requirements. The combination of effectiveness and speed is responsible for the recent popularity of machine learning (ML) methods in computer-aided drug design (CADD). Machine learning methods are mostly used for virtual screening (VS) tasks, in which they are expected to identify potentially active compounds in large databases of chemical structures. One of the most widely used ML methods in CADD is the Support Vector Machine (SVM). Although it can provide very high VS performance, its application requires optimizing the parameters used during training, which has been shown to be crucial for obtaining accurate predictions. To date, various approaches have been developed to make SVM faster and more effective. In cheminformatics applications, the most popular optimization strategies are grid search [1, 2] and heuristic choice [3, 4]. Depending on the problem, they can provide high classification accuracy; for example, Wang et al. obtained 86% accuracy in the classification of hERG potassium channel inhibitors with a heuristic choice of the SVM parameters [4], whereas Hammann et al. [1] evaluated cytochrome P450 activities with 66–83% accuracy using grid search optimization of the SVM parameters. The need for optimizing SVM parameters is undeniable, as classification efficiency can change dramatically with the parameter values. The high computational cost of a systematic search over a predefined set of parameter values motivates the development of new optimization algorithms.
In recent years, Bayesian optimization [5, 6] (including Gaussian processes [7]) and random search-based selection [8] have become more popular [9, 10]. As these approaches have not yet been explored in the field of cheminformatics, we analyze their impact on classification accuracy and, more importantly, on the speed and ease of optimizing SVM hyperparameters in the search for bioactive compounds.
Hyperparameters optimization
While such an approach guarantees finding the global optimum for \(k \rightarrow \infty\), it might be extremely computationally expensive, as we need to train k classifiers, each of which can take hours. Instead, we can try to solve the optimization problem directly by performing an adaptive process that, on the one hand, tries to maximize the objective function and, on the other hand, samples the space of possible \(\varvec{\lambda }\) values intelligently in order to minimize the number of classifier trainings. The main idea behind Bayesian optimization for such a problem is to use all of the information gathered in previous iterations when performing the next step. Grid search-based methods clearly do not do this, as they use no knowledge from the results of models trained with other \(\varvec{\lambda }\) values.
Unfortunately, f is an unknown function and we cannot compute its gradient, Hessian, or any other characteristics that could guide the optimization process. The only action we can perform is to obtain a value for f at a given point. However, doing so is very expensive (because it requires training a classifier); thus, we need a fast (with respect to evaluating the function), derivative-free optimization technique to solve this problem.
Details of the classification experiments performed
Targets: 5-HT\(_\text {2A}\), 5-HT\(_\text {2C}\), 5-HT\(_\text {6}\), 5-HT\(_\text {7}\), CDK2, M\(_\text {1}\), ERK2, AChE, A\(_\text {1}\), alpha2AR, beta1AR, beta3AR, CB1, DOR, D\(_\text {4}\), H\(_\text {1}\), H\(_\text {3}\), HIVi, IR, ABL, HLE
Fingerprints: EstateFP, ExtFP, KlekFP, MACCSFP, PubchemFP, SubFP
Optimization methods: Bayes, Random, Grid search, Small grid, SVMlight, libSVM
C and \(\gamma\) range: \(\log _{10}(C) \in [2, 5]\), \(\log _{10}(\gamma ) \in [10, 3]\)
No of iterations: 20, 30, 50, 75, 100, 150
Results and discussion
Classification effectiveness analysis
A comparison of the number of highest accuracies obtained with the Bayesian optimization and grid search
Comparison  Bayes  Grid search 

Global  96  34 
EstateFP  15  7 
ExtFP  16  6 
KlekFP  16  5 
MACCSFP  18  4 
PubchemFP  16  6 
SubFP  15  6 
5-HT\(_\text {2A}\)  5  1
5-HT\(_\text {2C}\)  5  1
5-HT\(_\text {6}\)  4  3
5-HT\(_\text {7}\)  3  3
CDK2  6  0 
M\(_\text {1}\)  6  1 
ERK2  5  1 
AChE  5  1 
A\(_\text {1}\)  5  1 
alpha2AR  5  1 
beta1AR  3  3 
beta3AR  3  4 
CB1  5  1 
DOR  4  2 
D\(_\text {4}\)  5  1 
H\(_\text {1}\)  6  0 
H\(_\text {3}\)  5  1 
HIVi  1  5 
IR  5  1 
ABL  6  0 
HLE  4  3 
Because grid search was the second-place method in the majority of the analyses (global, fingerprint-based and target-based), the numbers of highest accuracies obtained with Bayesian optimization and with grid search were compared directly (Table 2). The numbers of wins within a given fingerprint-based or target-based comparison do not always sum to the same total, because draws were also counted.
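The remark about draws can be made concrete with a small counting sketch. It assumes that a draw is credited to both methods, which is one reading consistent with rows of Table 2 that sum to more than six; the accuracy values and the tolerance `tol` are hypothetical and serve only as an illustration.

```python
from typing import Dict, Tuple


def count_wins(acc_a: Dict[str, float], acc_b: Dict[str, float],
               tol: float = 1e-9) -> Tuple[int, int]:
    """Count experiments won by each method; a draw is credited to both."""
    wins_a = wins_b = 0
    for key in acc_a:
        a, b = acc_a[key], acc_b[key]
        if a >= b - tol:      # a wins or draws
            wins_a += 1
        if b >= a - tol:      # b wins or draws
            wins_b += 1
    return wins_a, wins_b


# Hypothetical accuracies for one target across the six fingerprints.
bayes = {"EstateFP": 0.85, "ExtFP": 0.90, "KlekFP": 0.91,
         "MACCSFP": 0.88, "PubchemFP": 0.90, "SubFP": 0.86}
grid = {"EstateFP": 0.83, "ExtFP": 0.90, "KlekFP": 0.89,
        "MACCSFP": 0.89, "PubchemFP": 0.88, "SubFP": 0.86}

print(count_wins(bayes, grid))  # two draws, so the totals sum to 8, not 6
```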
Examination of optimization steps in time
AUC values for the curves illustrating changes in accuracy over time, and the final optimal accuracy values, obtained for 5-HT\(_\text {2A}\), ExtFP
Optimization method  AUC  Final accuracy

Bayes  0.892*  0.896* 
Random  0.885  0.887 
Grid search  0.802  0.881 
SVMlight  0.683  0.683 
libSVM  0.847  0.847 
The average AUC values: global, and obtained for a particular fingerprint and a particular target
Fingerprint/target  Bayes  Random  Grid search  SVMlight  libSVM

Global  0.883*  0.870  0.799  0.676  0.792
EstateFP  0.847*  0.829  0.774  0.690  0.763 
ExtFP  0.902*  0.891  0.806  0.669  0.874 
KlekFP  0.899*  0.889  0.812  0.669  0.730 
MACCSFP  0.890*  0.876  0.798  0.683  0.828 
PubchemFP  0.898*  0.885  0.816  0.669  0.808 
SubFP  0.864*  0.854  0.787  0.677  0.749 
5-HT\(_\text {2A}\)  0.860*  0.850  0.780  0.683  0.743
5-HT\(_\text {2C}\)  0.848*  0.821  0.702  0.568  0.717
5-HT\(_\text {6}\)  0.913*  0.910  0.886  0.814  0.862
5-HT\(_\text {7}\)  0.830*  0.816  0.748  0.675  0.714
CDK2  0.876*  0.875  0.796  0.664  0.768 
M\(_\text {1}\)  0.850*  0.843  0.778  0.557  0.748 
ERK2  0.958  0.961*  0.949  0.931  0.942 
AChE  0.884*  0.854  0.788  0.611  0.764 
A\(_\text {1}\)  0.843*  0.835  0.764  0.564  0.720 
alpha2AR  0.875*  0.874  0.773  0.563  0.725 
beta1AR  0.910*  0.864  0.798  0.710  0.828 
beta3AR  0.874*  0.823  0.826  0.545  0.722 
CB1  0.874*  0.854  0.782  0.622  0.793 
DOR  0.888*  0.880  0.734  0.599  0.814 
D\(_\text {4}\)  0.841*  0.837  0.759  0.698  0.745 
H\(_\text {1}\)  0.898*  0.880  0.638  0.548  0.801 
H\(_\text {3}\)  0.937*  0.926  0.906  0.897  0.905 
HIVi  0.939  0.945*  0.934  0.901  0.911 
IR  0.936*  0.936*  0.925  0.886  0.897 
ABL  0.850*  0.831  0.748  0.587  0.733 
HLE  0.867*  0.865  0.763  0.578  0.779 
The average final accuracy values: global, and obtained for a particular fingerprint and a particular target
Fingerprint/target  Bayes  Random  Grid search  SVMlight  libSVM

Global  0.889*  0.873  0.876  0.676  0.792 
EstateFP  0.852*  0.832  0.833  0.690  0.763 
ExtFP  0.907*  0.896  0.892  0.669  0.874 
KlekFP  0.907*  0.890  0.891  0.669  0.730 
MACCSFP  0.898*  0.878  0.880  0.683  0.828 
PubchemFP  0.901*  0.886  0.894  0.669  0.808 
SubFP  0.869*  0.856  0.864  0.677  0.749 
5-HT\(_\text {2A}\)  0.871*  0.848  0.860  0.683  0.743
5-HT\(_\text {2C}\)  0.855*  0.825  0.772  0.568  0.717
5-HT\(_\text {6}\)  0.916*  0.915  0.933  0.814  0.862
5-HT\(_\text {7}\)  0.833*  0.819  0.819  0.675  0.714
CDK2  0.885*  0.881  0.870  0.664  0.768 
M\(_\text{1}\)  0.858  0.846  0.897*  0.557  0.748 
ERK2  0.959  0.961*  0.961*  0.931  0.942 
AChE  0.889*  0.857  0.872  0.611  0.764 
A\(_\text {1}\)  0.856  0.838  0.882*  0.564  0.720 
alpha2AR  0.880*  0.873  0.872  0.563  0.725 
beta1AR  0.914*  0.870  0.864  0.710  0.828 
beta3AR  0.879  0.825  0.972*  0.545  0.722 
CB1  0.881*  0.857  0.868  0.622  0.793 
DOR  0.897*  0.884  0.872  0.599  0.814 
D\(_\text {4}\)  0.849*  0.838  0.837  0.698  0.745 
H\(_\text {1}\)  0.904*  0.879  0.691  0.548  0.801 
H\(_\text {3}\)  0.938*  0.926  0.919  0.897  0.905 
HIVi  0.938  0.946  0.967*  0.901  0.911 
IR  0.939  0.937  0.956*  0.886  0.897 
ABL  0.857*  0.836  0.840  0.587  0.733 
HLE  0.867*  0.871  0.864  0.578  0.779 
The analysis of the results obtained for the example target/fingerprint pair (5-HT\(_\text {2A}\), ExtFP; Table 3) shows that both the highest AUC and the highest final optimal accuracy were obtained with the Bayesian strategy for SVM optimization. A similar observation was made for the global and fingerprint-based analysis; Bayesian optimization provided the best average AUC and average optimal accuracy for all fingerprints, as well as the best global averages. Interestingly, although grid search was the second-place method for optimal accuracy, it was random search that outperformed it in terms of AUC, which can be explained by an analysis of the respective curves. Although the grid search method provided higher final accuracy values, these occurred relatively 'late' (after a series of iterations), whereas high accuracies were obtained almost immediately for random search (Figs. 4, 5). Similarly, the average AUC and optimal accuracy values calculated for various targets were highest for Bayesian optimization in the great majority of cases. HIVi and ERK2 were the only targets for which the averaged AUC obtained with the Bayesian optimization strategy was outperformed by other optimization methods. On the other hand, the group of targets for which the average optimal accuracy values were the highest for methods other than Bayesian optimization was a bit more extensive (i.e., M\(_\text {1}\), ERK2, A\(_\text {1}\), beta3AR, HIVi, IR). For most of these targets, the difference between the best average accuracy and that obtained with Bayesian optimization was approximately 3%, although for beta3AR, for example, it approached 10% (0.879 versus 0.972). Conversely, an improvement of several percentage points was also observed when the average AUC and optimal accuracy obtained with the Bayesian strategy were compared with the strategy that provided the 'second-best' value in the ranking.
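Why a method with a higher final accuracy can still have a lower AUC is easy to see on a toy example. The sketch below assumes that the curve is the best accuracy found so far at each iteration and that the area is computed with the trapezoidal rule on an iteration axis rescaled to [0, 1]; this is only one plausible definition (it does reproduce the fact that a single-evaluation heuristic has AUC equal to its accuracy), and the accuracy sequences are invented.

```python
from typing import List, Sequence


def running_best(accuracies: Sequence[float]) -> List[float]:
    """Best accuracy seen so far after each iteration."""
    best, curve = float("-inf"), []
    for acc in accuracies:
        best = max(best, acc)
        curve.append(best)
    return curve


def normalized_auc(curve: Sequence[float]) -> float:
    """Trapezoidal area under the curve with the x-axis rescaled to [0, 1]."""
    if len(curve) == 1:
        return float(curve[0])
    area = sum((curve[i] + curve[i + 1]) / 2.0 for i in range(len(curve) - 1))
    return area / (len(curve) - 1)


# Hypothetical runs: one finds a good model early, one only at the very end.
early = running_best([0.80, 0.88, 0.85, 0.87, 0.88])
late = running_best([0.60, 0.55, 0.60, 0.70, 0.89])

print(early[-1], normalized_auc(early))
print(late[-1], normalized_auc(late))
```

In this toy case the 'late' run ends with the higher final accuracy but the clearly lower area, which mirrors the grid search versus random search pattern described above.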
The number of iterations required to achieve optimal SVM performance was also analyzed in detail (Fig. 5; Additional file 3). The most striking observation was that all curves corresponding to the Bayesian optimization results were both shifted towards higher accuracy values and much 'shorter', meaning that significantly fewer iterations were necessary in total to reach optimal SVM performance. Two relevant points arise from a comparison of Bayesian optimization with the grid search method (which sometimes outperformed Bayesian optimization): obtaining optimal accuracy with grid search required many more calculations, and even when grid search yielded higher accuracy values than Bayesian optimization, the difference between the two was approximately 1–2%. This result indicates that even when Bayesian optimization 'lost', its results were still very good; taking the calculation speed into account, it can also be applied successfully in experiments for which it was not indicated as the best approach. A very interesting observation arising from Fig. 5 is that random search reached its optimal classification effectiveness (as measured by accuracy) in the fewest iterations, below 10 in the majority of cases. EstateFP, ExtFP, MACCSFP and PubchemFP showed a similar tendency in the comparison of Bayesian optimization and grid search: for an initial number of iterations (40), the accuracy values obtained with grid search were approximately 20% lower than those obtained with the Bayesian approach. However, as the number of iterations for grid search increased, the accuracy values also rose, and when the number of iterations reached approximately 100, the grid search results were similar to those obtained with Bayesian optimization.
On the other hand, for both KlekFP and SubFP, the initial observations were the same; for a lower number of iterations, Bayesian optimization led to significantly higher accuracy values than the grid search approach, and for a higher number of iterations (over 80 for KlekFP and over 115 for SubFP), grid search provided accuracy values at a similar level to the values obtained with the Bayesian strategy. However, increasing the number of iterations for Bayesian optimization from approximately 10 to 90 for KlekFP and 150 for SubFP did not lead to a significant increase in the accuracy (an almost vertical line corresponding to these numbers of iterations), which was already very high (over 0.85 for KlekFP and over 0.8 for SubFP). Further optimization led to further improvement in accuracy of approximately 2–3%.
The results were also analyzed with respect to how accuracy changed as additional steps were applied. A panel of example results is shown in Fig. 6 for the cannabinoid CB1/SubFP combination (the remaining targets are in Additional file 4). The black dots show the sets of parameters tested in a particular approach, and the black squares represent the set of parameters selected as optimal. This chart shows the advantage of Bayesian optimization in the way it works and in the sequence of parameters it selects. The set of tested parameters is fixed for grid search optimization, whereas for random search it is selected at random. The selection of parameters for Bayesian optimization, in contrast, is more directed, which also affects the effectiveness of the classification. For grid search, only a small fraction of the parameters tested provided satisfactory predictive power of the model (only approximately 35% of the predictions resulted in an accuracy exceeding 0.7). Surprisingly, a relatively high classification efficiency was obtained with the random search approach: 60% of the sets of parameters tested provided predictions with an accuracy over 0.7. However, investigation of the Bayesian optimization approach to parameter selection revealed that the choice of parameters tested was justified, and hence the results obtained with them were significantly better than those obtained with the other approaches: 75% of predictions had an accuracy over 0.7.

libSVM heuristic (when only one set of hyperparameters is needed),

random search (when we need a strong model quickly, using less than a few dozen iterations),

a Bayesian approach (when we want the strongest model and can wait a bit longer).
The number of active and inactive compounds in the dataset
Protein  Actives  Inactives 

5-HT\(_\text {2A}\)  1836  852
5-HT\(_\text {2C}\)  1211  927
5-HT\(_\text {6}\)  1491  342
5-HT\(_\text {7}\)  705  340
CDK2  741  1462 
M\(_\text {1}\)  760  939 
ERK2  72  958 
AChE  1147  1804 
A\(_\text {1}\)  1789  2286 
alpha2AR  364  283 
beta1AR  195  477 
beta3AR  111  133 
CB1  1964  1714 
DOR  2535  1992 
D\(_\text {4}\)  1034  449 
H\(_\text {1}\)  636  546 
H\(_\text {3}\)  2706  313 
HIVi  102  915 
IR  147  1139 
ABL  409  582 
HLE  820  610 
Experimental
Fingerprints used for compounds representation
Fingerprint  Abbreviation  Length  Short description 

EState fingerprint  EStateFP  79  Computes electrotopological state (Estate) index for each atom, describing its electronic state with consideration of the influence of other atoms in particular structure 
Extended fingerprint  ExtFP  1024  A hashed fingerprint with each atom in the given structure being a starting point of a string of a length not exceeding six atoms. A hash code is produced for every path of such type and in turn it constitutes the basis of a bit string representing the whole structure 
Klekota and Roth fingerprint  KlekFP  4860  Fingerprint analyzing the occurrence of particular chemical substructures in the given compound. Developed by Klekota and Roth 
MACCS fingerprint  MACCSFP  166  Fingerprint using the MACCS keys in its bits definition 
Pubchem fingerprint  PubchemFP  881  Substructure fingerprint with bits divided into several sections: hierarchic element counts, rings, simple atom pairs, simple atom nearest neighbours, detailed atom neighbourhoods, simple SMARTS patterns, complex SMARTS patterns
Substructure fingerprint  SubFP  308  Substructure fingerprint based on the SMARTS patterns developed by Christian Laggner

default SVM parameters used in the WEKA package (\(C = 1, \gamma = \tfrac{1}{d}\))—libSVM.

default SVM parameters from the SVMlight library (\(C = \frac{1}{\mathrm {\underset{i}{mean}} \Vert x_{i}\Vert ^2},\; \gamma = \tfrac{1}{d}\)).

grid search optimization of SVM parameters—\(\log _{10}(C) \in [2, 5]\), \(\log _{10}(\gamma ) \in [10, 3]\).

SVM parameters optimization in the truncated cross-validation mode ('small grid'-cv).

SVM parameters optimization in the random cross-validation mode; number of iterations: up to 150.

Bayesian optimization using BayesOpt [36]—number of iterations: up to 150.
The range of C and \(\gamma\) values tested was as follows: \(\log _{10}(C) \in [2, 5]\), \(\log _{10}(\gamma ) \in [10, 3]\) (the result of preliminary grid search experiments). The number of iterations for the random search, 'small grid'-cv and Bayesian optimization experiments was taken from the following set: 20, 30, 50, 75, 100, 150.
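The two heuristic defaults in the list above follow directly from the formulas given there: \(C = 1, \gamma = 1/d\) for libSVM, and \(C = 1/\mathrm{mean}_i \Vert x_i\Vert ^2, \gamma = 1/d\) for SVMlight, where d is the fingerprint length. A minimal sketch, assuming the data are given as a list of equal-length numeric rows with at least one non-zero entry overall; the function names and the tiny bit matrix are hypothetical.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def libsvm_defaults(X: Matrix) -> Tuple[float, float]:
    """Return (C, gamma) = (1, 1/d), with d the number of features."""
    d = len(X[0])
    return 1.0, 1.0 / d


def svmlight_defaults(X: Matrix) -> Tuple[float, float]:
    """Return (C, gamma) = (1 / mean_i ||x_i||^2, 1/d)."""
    d = len(X[0])
    mean_sq_norm = sum(sum(v * v for v in row) for row in X) / len(X)
    return 1.0 / mean_sq_norm, 1.0 / d


# Hypothetical 4-bit fingerprints for three compounds.
X = [[1, 0, 1, 1],
     [0, 0, 1, 0],
     [1, 1, 1, 1]]

print(libsvm_defaults(X))    # (1.0, 0.25)
print(svmlight_defaults(X))  # C = 1 / (8/3) = 0.375, gamma = 0.25
```

For binary fingerprints the squared norm of a row is simply its number of set bits, so the SVMlight value of C shrinks as fingerprints become denser.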
Conclusions
The paper presents the strengths of Bayesian optimization applied to fitting SVM hyperparameters in cheminformatics tasks. Because the importance and necessity of the SVM optimization procedure is undeniable, various approaches to this task have been developed so far. However, the most popular approaches to SVM optimization are not always very effective, in terms of both the predictive power of the models obtained and the computational requirements. This study demonstrated that Bayesian optimization not only provides better classification accuracy than the other optimization approaches tested but is also much faster and more directed: in the majority of cases, the number of iterations required to achieve optimal performance was the lowest of all the methods tested, and the set of parameters tested provided the best predictions on average. Interestingly, if good classification results are needed quickly (using a low number of iterations and without complex algorithms), the random search method (in which hyperparameters are randomly selected from a predefined range) leads to very good performance of the SVM for predicting the activity of compounds and can thus be used when the Bayesian optimization approach is not feasible.
1. If you have no resources for performing hyperparameters optimization, use \(C=1, \gamma = \frac{1}{d}\) (as defined in libSVM).
2. If you have limited resources (up to 20 learning procedures) or limited access to complex optimization software, use a random search for C and \(\gamma\) with distribution defined in the “Methods” section.
3. If you have resources for 20 or more training runs and access to Bayesian optimization software^{a}, use a Bayesian optimization of \(C, \gamma\).
In general, there is no scenario in which one should use a grid search approach (it is always preferable to use random search or a Bayesian method) or SVMlight heuristics (it is always better to use libSVM) in the tasks connected with the assessment of compounds bioactivity.
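The three recommendations can be condensed into a small decision helper. This is only an illustrative sketch: the function name is hypothetical, "no resources for optimization" is interpreted as a budget of at most one training run, and a budget of exactly 20 runs with Bayesian software available is assigned to Bayesian optimization, since the two stated ranges overlap at that value.

```python
def recommend_strategy(training_budget: int, has_bayes_software: bool) -> str:
    """Map a training budget (number of SVM fits) to a suggested strategy."""
    if training_budget <= 1:
        # Assumption: a single fit leaves no room for any search.
        return "libSVM defaults (C=1, gamma=1/d)"
    if training_budget < 20 or not has_bayes_software:
        return "random search"
    return "Bayesian optimization"


for budget, software in [(1, True), (10, True), (150, False), (20, True)]:
    print(budget, software, "->", recommend_strategy(budget, software))
```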
Methods
The above-mentioned issue can be viewed as a sequential decision making problem [37] in which, at time step i, a decision \(\alpha _i(\varvec{\lambda }_{1:i-1}, \bar{f}_{1:i-1})\) is made based on all previous points, where \(\bar{f}_i = f(x_i) + \varepsilon _i\). In other words, we have access to approximations of f values from previous steps. For simplicity, assume that \(\varepsilon _i = 0\) (f is deterministic); however, in general, all methods considered can be used in a stochastic scenario (for example, when randomized cross-validation is used as the underlying method for evaluating f).
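The sequential view can be written as a short generic loop in which a strategy receives all previously tested points and their (possibly noisy) values and proposes the next point. The sketch below is illustrative only: `toy_f` is a hypothetical stand-in for the expensive cross-validated accuracy, and `make_perturb_best` is a deliberately naive history-using strategy (jittering the best point so far), not the Bayesian method discussed in this paper.

```python
import random
from typing import Callable, List, Tuple

Params = Tuple[float, float]                      # (log10 C, log10 gamma)
Strategy = Callable[[List[Params], List[float]], Params]


def optimize(f: Callable[[Params], float], strategy: Strategy,
             n_iter: int, noise_sd: float = 0.0, seed: int = 0):
    """Run n_iter steps; each decision sees all earlier points and values."""
    rng = random.Random(seed)
    lambdas: List[Params] = []
    values: List[float] = []
    for _ in range(n_iter):
        lam = strategy(lambdas, values)           # alpha_i(lambda_{1:i-1}, fbar_{1:i-1})
        eps = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
        lambdas.append(lam)
        values.append(f(lam) + eps)               # noisy observation of f
    best = max(range(n_iter), key=values.__getitem__)
    return lambdas[best], values[best]


def toy_f(lam: Params) -> float:
    """Hypothetical smooth objective with its peak (value 1.0) at (1, -2)."""
    log_c, log_g = lam
    return 1.0 - ((log_c - 1.0) ** 2 + (log_g + 2.0) ** 2) / 100.0


def make_perturb_best(step: float = 0.5, seed: int = 1) -> Strategy:
    """Toy history-using strategy: jitter the best point seen so far."""
    rng = random.Random(seed)

    def strategy(lambdas: List[Params], values: List[float]) -> Params:
        if not lambdas:
            return (0.0, 0.0)                     # arbitrary starting point
        i = max(range(len(values)), key=values.__getitem__)
        c, g = lambdas[i]
        return (c + rng.uniform(-step, step), g + rng.uniform(-step, step))

    return strategy


best_lam, best_val = optimize(toy_f, make_perturb_best(), n_iter=30)
print(best_lam, best_val)
```

Random search and grid search fit the same interface by ignoring the history arguments, which is exactly the property that distinguishes them from the Bayesian approach.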
Approximation of generalization capabilities
Random optimization
First, let us define a random optimization technique as a strategy \(\alpha ^\mathcal {R}(\varvec{\lambda }_{1:i-1}, \bar{f}_{1:i-1}) = \alpha ^\mathcal {R}_i = \varvec{\lambda }_i \sim \varvec{P}(\mathcal {L})\), for some probability distribution over the hyperparameters \(\varvec{P}(\mathcal {L})\). In other words, in each iteration, we sample from \(\varvec{P}(\mathcal {L})\), ignoring all previous samples and their results. Finally, we return the maximum of the values obtained.
It is easily seen that a random search, under the assumption that \(\forall _{\varvec{\lambda }\in \mathcal {L}} \varvec{P}(\alpha ^\mathcal {R}_i = \varvec{\lambda }) > 0\), has the property described in (1). A random search will converge to the optimum [39], provided that every set of parameters can be generated when a new sample is drawn from our decision making process. In practice, it is only necessary that \(\varvec{P}(f(\alpha ^\mathcal {R}_i) = f(\hat{\varvec{\lambda }})) > 0\). Similarly, if one uses a grid search approach that discretizes \(\mathcal {L}\), then given enough iterations and the assumption that f is continuous, one will converge to the optimal solution. It is important to note that such convergence can be extremely slow.
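A minimal sketch of the random strategy defined above is shown below. It assumes that \(\varvec{P}(\mathcal {L})\) is log-uniform in both C and \(\gamma\) (the distribution actually used is not restated in this section), uses placeholder bounds rather than the ranges of the experiments, and replaces the expensive classifier training with a hypothetical `toy_accuracy` function.

```python
import math
import random
from typing import Callable, Tuple


def random_search(f: Callable[[float, float], float],
                  log_c_range: Tuple[float, float],
                  log_gamma_range: Tuple[float, float],
                  n_iter: int, seed: int = 0):
    """Sample (C, gamma) independently in each iteration and keep the best."""
    rng = random.Random(seed)
    best_lam, best_val = None, float("-inf")
    for _ in range(n_iter):
        # Log-uniform sampling (assumption): uniform exponent, then 10**exponent.
        c = 10.0 ** rng.uniform(*log_c_range)
        gamma = 10.0 ** rng.uniform(*log_gamma_range)
        val = f(c, gamma)                 # previous samples are never consulted
        if val > best_val:
            best_lam, best_val = (c, gamma), val
    return best_lam, best_val


def toy_accuracy(c: float, gamma: float) -> float:
    """Hypothetical stand-in for cross-validated SVM accuracy."""
    return 1.0 - ((math.log10(c) - 1.0) ** 2
                  + (math.log10(gamma) + 2.0) ** 2) / 100.0


# Placeholder bounds chosen for illustration only.
lam10, v10 = random_search(toy_accuracy, (-3.0, 3.0), (-3.0, 3.0), n_iter=10)
lam50, v50 = random_search(toy_accuracy, (-3.0, 3.0), (-3.0, 3.0), n_iter=50)
print(v10, v50)
```

Because both runs share a seed, the first ten samples of the longer run coincide with the shorter run, so the best value can only stay the same or improve as the budget grows.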
Grid search
We define the linear order of \(\varvec{\lambda }_{ij}\) by raveling the resulting matrix by column, which is the most common practice in ML libraries. It is worth noting that one could achieve better scores by altering this ordering to a random permutation; however, in practice, such an alteration is rarely performed.
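The ordering issue can be illustrated with a few lines of code. The sketch assumes that row index i runs over the C values and column index j over the \(\gamma\) values (the definition of \(\varvec{\lambda }_{ij}\) is not restated here), and the grid values themselves are placeholders.

```python
import random
from typing import List, Sequence, Tuple


def grid_in_column_order(c_values: Sequence, gamma_values: Sequence) -> List[Tuple]:
    """Entry (i, j) of the grid matrix is (c_values[i], gamma_values[j]);
    column-major raveling walks down each column j before moving to j + 1."""
    return [(c, g) for g in gamma_values for c in c_values]


def shuffled(order: Sequence[Tuple], seed: int = 0) -> List[Tuple]:
    """Same grid points, visited in a random permutation."""
    out = list(order)
    random.Random(seed).shuffle(out)
    return out


c_values = [10.0 ** e for e in (0, 1)]            # placeholder grid for C
gamma_values = [10.0 ** e for e in (-1, -2, -3)]  # placeholder grid for gamma

order = grid_in_column_order(c_values, gamma_values)
print(order)            # all C values for the first gamma, then the next gamma
print(shuffled(order))  # identical set of points, different visiting order
```

With the column-wise order, a truncated run explores only a few \(\gamma\) values, whereas a permuted order spreads the same budget across the whole grid.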
Bayesian optimization
If the exact form of f were known (for example, if f were convex and its derivative known), the optimization procedure would be much simpler. Unfortunately, f is a black-box function with a very complex structure that is expensive even to evaluate. However, some simplifying assumptions about f can make the problem solvable. Assume that f can be represented as a sample from a probability distribution over a family of functions \(f \sim \varvec{P}(f), f \in \mathcal {F}\).
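One common choice for such a distribution over functions is a Gaussian process, for which the belief about f at an untested point has a closed form. The sketch below is a minimal pure-Python illustration under stated assumptions: a squared-exponential kernel with unit prior variance, a fixed length scale, a constant prior mean equal to the average observed value, and a small noise term for numerical stability. It is not a description of the internals of BayesOpt, and the observed points and accuracies are invented.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (log10 C, log10 gamma)


def rbf(a: Point, b: Point, length_scale: float = 1.0) -> float:
    """Squared-exponential kernel with unit prior variance."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / length_scale ** 2)


def solve(A: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def gp_posterior(X: Sequence[Point], y: Sequence[float], x_new: Point,
                 noise: float = 1e-6, length_scale: float = 1.0):
    """Posterior mean and variance of f(x_new) given observations (X, y)."""
    y_mean = sum(y) / len(y)                       # constant prior mean (assumption)
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    k_star = [rbf(a, x_new, length_scale) for a in X]
    alpha = solve(K, [v - y_mean for v in y])      # K^{-1} (y - mean)
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    w = solve(K, k_star)                           # K^{-1} k_star
    var = rbf(x_new, x_new, length_scale) - sum(k * v for k, v in zip(k_star, w))
    return mean, max(var, 0.0)


# Hypothetical tested points and their accuracies.
X = [(0.0, 0.0), (1.0, -2.0), (3.0, -5.0)]
y = [0.70, 0.88, 0.75]

print(gp_posterior(X, y, (1.0, -2.0)))   # at a tested point: almost no uncertainty
print(gp_posterior(X, y, (50.0, 50.0)))  # far away: prior mean, prior variance
```

The posterior variance is what allows a Bayesian strategy to trade off testing points that look promising against points about which little is known yet.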
Endnotes
^{a}For example BayesOpt http://rmcantin.bitbucket.org/html/
Declarations
Authors' contributions
WCz and SP performed the experiments. All the authors analyzed and discussed the results and wrote the manuscript. All authors read and approved the final version of the manuscript.
Acknowledgements
The study was partially supported by a Grant OPUS 2014/13/B/ST6/01792 financed by the Polish National Science Centre (http://www.ncn.gov.pl) and by the Statutory Funds of the Institute of Pharmacology, Polish Academy of Sciences. SP and AJB participate in the European Cooperation in Science and Technology (COST) Action CM1207: GPCR-Ligand Interactions, Structures, and Transmembrane Signalling: a European Research Network (GLISTEN).
Compliance with ethical guidelines
Competing interests The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
1. Hammann F, Gutmann H, Baumann U, Helma C, Drewe J (2009) Classification of cytochrome P 450 activities using machine learning methods. Mol Pharm 33(1):796–801
2. Smusz S, Czarnecki WM, Warszycki D, Bojarski AJ (2015) Exploiting uncertainty measures in compounds activity prediction using support vector machines. Bioorganic Med Chem Lett 25(1):100–105
3. Lee JH, Lee S, Choi S (2010) In silico classification of adenosine receptor antagonists using Laplacian-modified naïve Bayesian, support vector machine, and recursive partitioning. J Mol Graph Model 28(8):883–890
4. Wang M, Yang XG, Xue Y (2008) Identifying hERG potassium channel inhibitors by machine learning methods. QSAR Comb Sci 27(8):1028–1035
5. Swersky K, Snoek J, Adams RP (2013) Multi-task bayesian optimization. In: Advances in neural information processing systems, vol 26. Lake Tahoe, pp 2004–2012
6. Snoek J, Rippel O, Swersky K, Kiros R, Satish N, Sundaram N et al (2015) Scalable bayesian optimization using deep neural networks. arXiv preprint arXiv:1502.05700
7. Snoek J, Larochelle H, Adams RP (2012) Practical bayesian optimization of machine learning algorithms. In: Advances in neural information processing systems, vol 25. Lake Tahoe, pp 2951–2959
8. Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(1):281–305
9. Eggensperger K, Feurer M, Hutter F, Bergstra J, Snoek J, Hoos H et al (2013) Towards an empirical foundation for assessing bayesian optimization of hyperparameters. In: NIPS Workshop on Bayesian Optimization in Theory and Practice. Lake Tahoe, pp 1–5
10. Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013) Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp 847–855
11. Sencanski M, Sukalovic V, Shakib K, Soskic V, Dosen-Micovic L, Kostic-Rajacic S (2014) Molecular modelling of 5-HT_{2A} receptor–arylpiperazine ligands interactions. Chem Biol Drug Design 83(4):462–471
12. Millan MJ (2005) Serotonin 5-HT2C receptors as a target for the treatment of depressive and anxious states: focus on novel therapeutic strategies. Therapie 60(5):441–460
13. Upton N, Chuang TT, Hunter AJ, Virley DJ (2008) 5-HT6 receptor antagonists as novel cognitive enhancing agents for Alzheimer’s disease. Neurotherapeutics 5(3):458–469
14. Roberts AJ, Hedlund PB (2012) The 5-HT(7) receptor in learning and memory. Hippocampus 22(4):762–771
15. Chen H, Van Duyne R, Zhang N, Kashanchi F, Zeng C (2009) A novel binding pocket of cyclin-dependent kinase 2. Proteins 74(1):122–132
16. Leach K, Simms J, Sexton PM, Christopoulos A (2012) Structure-function studies of muscarinic acetylcholine receptors. In: Fryer AD, Christopoulos A, Nathanson NM (eds) Handbook of experimental pharmacology, vol 208. Springer, UK, pp 29–48
17. Zhang F, Strand A, Robbins D, Cobb MH, Goldsmith EJ (1994) Atomic structure of the MAP kinase ERK2 at 2.3 A resolution. Nature 367(6465):704–711
18. Soreq H, Seidman S (2001) Acetylcholinesterase–new roles for an old actor. Nat Rev Neurosci 2(4):294–302
19. Hocher B (2010) Adenosine A1 receptor antagonists in clinical research and development. Kidney Int 78(5):438–445
20. Hein L (2001) The alpha 2-adrenergic receptors: molecular structure and in vivo function. Z Kardiol 90(9):607–612
21. Wallukat G (2002) The beta-adrenergic receptors. Herz 27(7):683–690
22. Pertwee RG (1997) Pharmacology of cannabinoid CB1 and CB2 receptors. Pharmacol Ther 4(2):129–180
23. Quock RM (1999) The delta-opioid receptor: molecular pharmacology, signal transduction, and the determination of drug efficacy. Pharmacol Rev 51(3):503–532
24. Rondou P, Haegeman G, Van Craenenbroeck K (2010) The dopamine D4 receptor: biochemical and signalling properties. Cell Mol Life Sci 67(12):1971–1986
25. Thurmond RL, Gelfand EW, Dunford PJ (2008) The role of histamine h1 and h4 receptors in allergic inflammation: the search for new antihistamines. Nat Rev Drug Discov 7(1):41–53
26. Passani MB, Lin JS, Hancock A, Crochet S, Blandina P (2004) The histamine H3 receptor as a novel therapeutic target for cognitive and sleep disorders. Trend Pharmacol Sci 25(12):618–625
27. Craigie R (2001) Hiv integrase, a brief overview from chemistry to therapeutics. J Biol Chem 276(26):23213–23216
28. Whitehead JP, Clark SF, Ursø B, James DE (2000) Signalling through the insulin receptor. Curr Opin Cell Biol 12(2):222–228
29. Lanier LM, Gertler FB (2000) From Abl to actin: Abl tyrosine kinase and associated proteins in growth cone motility. Curr Opin Neurobiol 10(1):80–87
30. Lee WL, Downey GP (2001) Leukocyte elastase: physiological functions and role in acute lung injury. Am J Respir Crit Care Med 164(5):896–904
31. Hall LH, Kier LB (1995) Electrotopological state indices for atom types: a novel combination of electronic, topological, and valence state information. J Chem Inform Model 35(6):1039–1045
32. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E (2003) The chemistry development kit (CDK): an open-source Java library for chemo- and bioinformatics. J Chem Inform Comp Sci 43(2):493–500
33. Klekota J, Roth FP (2008) Chemical substructures that enrich for biological activity. Bioinform Oxf Engl 24(21):2518–2525
34. Ewing T, Baber JC, Feher M (2006) Novel 2D fingerprints for ligand-based virtual screening. J Chem Inform Model 46(6):2423–2431
35. Yap CW (2011) PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem 32(7):1466–1474
36. Martinez-Cantin R (2014) Bayesopt: a bayesian optimization library for nonlinear optimization, experimental design and bandits. arXiv preprint arXiv:1405.7430
37. Mockus J (1994) Application of bayesian approach to numerical methods of global and stochastic optimization. J Glob Optim 4(4):347–365
38. Bishop CM (2006) Pattern recognition and machine learning. Springer, NJ
39. Auger A, Doerr B (2011) Theory of randomized search heuristics: foundations and recent developments, vol 1. World Scientific, Series on Theoretical Computer Science
40. Mockus J, Tiesis V, Zilinskas A (1978) The application of bayesian methods for seeking the extremum. Towards Glob Optim 2(117–129):2
41. Schonlau M, Welch WJ, Jones DR (1998) Global versus local search in constrained optimization of computer models. Lect Notes Monogr Ser 34:11–25