Learning antibacterial activity against S. aureus on the Chimiothèque Nationale dataset
© Marcou et al; licensee BioMed Central Ltd. 2010
Published: 04 May 2010
The "Chimiothèque Nationale" (CN) is a library of synthetic and natural products from various French public laboratories. Recently, part of this library was screened experimentally for antibacterial activity against S. aureus by Dr J.-M. Paris and collaborators at the Ecole des Mines (Paris, France). Here, the results of that screening have been used to build classification models using SMF descriptors and the ISIDA modeling tools (Naïve Bayes (NB) and SVM modules). The dataset consisted of a large and structurally diverse set of 4563 compounds, 62 of which have demonstrated antibacterial activity. In the NB calculations, several variations of the algorithm were used. In particular, the a priori distribution of SMF descriptors was modeled using a binomial law, Bernoulli distributions or first-order logic. SVM calculations were performed with an RBF kernel whose parameters (C, γ) were optimized to maximize the balanced accuracy.
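With only 62 actives among 4563 compounds, plain accuracy is a poor tuning target, which is presumably why balanced accuracy was chosen for the (C, γ) search. The sketch below illustrates this with the class sizes given above. It assumes the usual definition of balanced accuracy as the mean of sensitivity and specificity, which the abstract does not spell out; the function name and the "predict everything inactive" baseline are illustrative and are not part of the ISIDA tools.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active).

    Assumed definition; the abstract does not state the formula used.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


# Class sizes from the text: 62 actives among 4563 compounds.
n_total, n_active = 4563, 62
y_true = [1] * n_active + [0] * (n_total - n_active)

# Hypothetical trivial classifier that labels every compound inactive.
all_inactive = [0] * n_total

plain_accuracy = sum(t == p for t, p in zip(y_true, all_inactive)) / n_total
trivial_balanced = balanced_accuracy(y_true, all_inactive)
```

The trivial classifier reaches a plain accuracy above 0.98 while finding no active compound, whereas its balanced accuracy is 0.5, the same as random guessing.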
Both NB and SVM models were validated using external 5-fold cross-validation, repeated three times on the randomized data set. Additionally, fifteen Y-randomizations were performed in order to check for chance correlation. The predictive performance of the models was assessed by a combination of the Precision and Recall parameters: models with Precision > 0.1 and Recall > 0.6 correspond to over 10-fold enrichment and were therefore considered acceptable.
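The validation protocol can be sketched in a few helper functions. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the random seeds and the unstratified fold assignment are hypothetical, and no model is trained here. Only the fold count, the number of repeats and the Precision and Recall thresholds come from the text.

```python
import random


def precision_recall(y_true, y_pred):
    """Precision and Recall for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def is_acceptable(precision, recall):
    """Acceptance criterion quoted in the text."""
    return precision > 0.1 and recall > 0.6


def repeated_kfold(n_items, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs: k folds per repeat, reshuffled each repeat.

    Folds are not stratified here; that is a simplification of this sketch.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_items))
        rng.shuffle(order)
        for fold in range(k):
            test_idx = order[fold::k]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx


def y_randomize(labels, seed):
    """Return a shuffled copy of the labels, breaking any structure-activity link."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

A model would be trained on each `train_idx` and scored on the matching `test_idx`. Repeating the same procedure on labels from `y_randomize` gives the chance-correlation baseline: a model that still passes `is_acceptable` on shuffled labels would be suspect.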
A consensus model was applied to estimate the antibacterial activity of 122 new compounds for the CN. All predictions were made blindly.
This article is published under license to BioMed Central Ltd.