An investigation into pharmaceutically relevant mutagenicity data and the influence on Ames predictive potential
© McCarren et al; licensee Chemistry Central Ltd. 2011
Received: 2 September 2011
Accepted: 22 November 2011
Published: 22 November 2011
In drug discovery, a positive Ames test for bacterial mutation presents a significant hurdle to advancing a drug to clinical trials. In a previous paper, we discussed success in predicting the genotoxicity of reagent-sized aryl-amines (ArNH2), a structure frequently found in marketed drugs and in drug discovery, using quantum mechanics calculations of the energy required to generate the DNA-reactive nitrenium intermediate (ArNH:+). In this paper we approach the question of what molecular descriptors could improve these predictions and whether external data sets are appropriate for further training.
In trying to extend and improve this model beyond this quantum mechanical reaction energy, we faced considerable difficulty, which was surprising considering the long history and success of QSAR model development for this test. Other quantum mechanics descriptors were compared to this reaction energy including AM1 semi-empirical orbital energies, nitrenium formation with alternative leaving groups, nitrenium charge, and aryl-amine anion formation energy. Nitrenium formation energy, regardless of the starting species, was found to be the most useful single descriptor. External sets used in other QSAR investigations did not present the same difficulty using the same methods and descriptors. When considering all substructures rather than just aryl-amines, we also noted a significantly lower performance for the Novartis set. The performance gap between Novartis and external sets persists across different descriptors and learning methods. The profiles of the Novartis and external data are significantly different both in aryl-amines and considering all substructures. The Novartis and external data sets are easily separated in an unsupervised clustering using chemical fingerprints. The chemical differences are discussed and visualized using Kohonen Self-Organizing Maps trained on chemical fingerprints, mutagenic substructure prevalence, and molecular weight.
Despite extensive work in the area of predicting this particular toxicity, work in designing and publishing more relevant test sets for compounds relevant to drug discovery is still necessary. This work also shows that great care must be taken in using QSAR models to replace experimental evidence. When considering all substructures, a random forest model, which can inherently cover distinct neighborhoods, built on Novartis data and previously reported external data provided a suitable model.
In the field of drug discovery, a positive Ames test can halt development of a particular chemotype, and possibly work on an entire drug target, because genotoxicity of a potential therapeutic is a serious issue that must be avoided. Sufficiently nuanced rules do not exist to fix such a problem while maintaining the careful balance of potency and properties. Compounding this problem, impurities or metabolites that could be generated in parts-per-million quantities (10 μg/day) are just as serious from a regulatory standpoint, which could eliminate an essential core structure. Thus, prediction of whether a starting material, degradation product, or drug will be mutagenic in the Ames genotoxicity test is our primary goal. More specifically, our initial focus was on aryl-amines, which are commonly used reagent building blocks in many small molecule drug-discovery projects and appear as a substructure in at least 13% of currently marketed drugs. Aryl-amines also have a known mechanism for genotoxicity. In a previous article, we showed that an in silico assessment of aryl-amines using quantum mechanics reaction energy calculations can provide excellent detection of mutagenic aryl-amines. However, we were surprised that statistical models incorporating additional descriptors did not improve on the performance of the single nitrenium formation energy parameter, given the wealth of QSAR literature showing accuracy approaching or exceeding the known experimental error. Additionally, we found that the set of Novartis aryl-amines was surprisingly challenging to model compared to those in the literature.
Our ultimate goal is to provide medicinal chemists with usable models to improve the chances of avoiding a toxicity trap that is often visible only after low-throughput tests come back. The aryl-amines can be predicted reliably with the nitrenium formation energy calculation, but when we compared all-substructure external Ames results to our Novartis results, we found that the Novartis results were also much harder to predict. Other groups in pharmaceutical companies have noted difficulties in predicting mutagenicity in aryl-amines, and in internal all-substructure data sets using commercial software [4, 5].
Previous to this article, differences between data sets typically used in the literature for building mutagenicity predictive methods and the data at pharmaceutical companies have not been compared. This is key to the disconnect from literature studies and pharmaceutical studies. The high level of performance of statistical models in this arena with constructed test sets is misleading and does not reflect performance in pharmaceutically relevant sets. Here we show the relative difficulty in predicting the Ames test result in the Novartis aryl-amines and other substructures, in contrast to literature sets.
1.2 The Significance of the Ames Test
Many compounds in the environment released from industrial pollution and production are known to cause cancer. Regulatory agencies around the world, in cooperation with industry experts, have adopted stringent test methods to identify and regulate the use of chemical mutagens that might be released into the environment or administered to humans directly as pharmaceuticals. Carcinogenicity is usually determined by an array of in vivo and in vitro surrogate tests, which are specified by regulatory authorities before administration to man. The Ames bacterial test is a simple experiment to perform. It is a mandatory regulatory test that has been in use for almost 40 years, and it correlates with life-time rodent carcinogenicity studies that require 2 years to complete [8, 9].
At the molecular level, this test for mutagenicity [10, 11] detects a substance's ability to cause mutations in engineered strains of Salmonella typhimurium by observing return of function by point mutations in an altered His operon gene. The mutations in the His operon strains prevent histidine biosynthesis, so random mutations or mutations due to an external agent must occur for colony growth on histidine-deficient medium. Many compounds are converted to mutagenic compounds after metabolism, so the test is performed with and without pre-incubation of the compound with rat liver enzymes. The bacterial strains used in the test have been further engineered to have permeable cell membranes, a reasonably high spontaneous mutation rate, and diminished DNA repair capacity.
Although the result of the Ames test can be reported as a standardized quantity of the number of colonies formed, most recent studies and databases, including the Novartis internal test results, report categorical results: "Ames-positive" (Ames+) or "Ames-negative" (Ames-). Additionally, it has been shown that the qualitative carcinogenicity result is not improved by quantitative mutagenicity potency data. An increase in the number of colonies over control by at least a factor of 2, together with a clear dose dependence in the mini-Ames screening test, is classified as a positive result. Although high-throughput screening assays exist, they do not faithfully predict the result of the Ames test and at the same time require a significant investment [14, 15]. Consequently, the volume of data available for the Ames test is fairly limited. The turnaround time and cost of Ames testing make accurate in silico models quite useful.
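The categorical call described above can be sketched in a few lines of Python. This is only an illustration: the function name, the input layout, and in particular the proxy used for "clear dose dependence" (colony counts that never fall as the dose rises) are assumptions made here, not the laboratory's actual scoring protocol.

```python
def is_ames_positive(control_colonies, dose_response, fold_threshold=2.0):
    """Return True for an Ames+ call, False for Ames- (illustrative rule only).

    control_colonies: colony count of the solvent control plate.
    dose_response: list of (dose, colony_count) pairs for the test compound.
    """
    if control_colonies <= 0:
        raise ValueError("control colony count must be positive")
    # Order the colony counts by increasing dose.
    ordered = [count for _, count in sorted(dose_response)]
    # Criterion 1: at least a two-fold increase over the control.
    max_fold = max(ordered) / control_colonies
    # Criterion 2 (assumed proxy for "clear dose dependence"): counts never
    # decrease with dose and the top dose exceeds the lowest dose.
    non_decreasing = all(a <= b for a, b in zip(ordered, ordered[1:]))
    dose_dependent = non_decreasing and ordered[-1] > ordered[0]
    return max_fold >= fold_threshold and dose_dependent


# Hypothetical plate counts, for illustration only.
responder = [(10, 22), (50, 35), (250, 61)]
non_responder = [(10, 21), (50, 19), (250, 24)]
print(is_ames_positive(20, responder))
print(is_ames_positive(20, non_responder))
```

A real protocol would also have to handle cytotoxicity at high doses, where colony counts can drop again; the monotonic check above would wrongly reject such a curve.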
There are some limitations of the Ames test that present a challenge to building accurate in silico models. The exact sensitivity of the test for carcinogenicity is somewhat controversial, but a recent retrospective analysis by FDA and EPA researchers of carcinogenicity and surrogate test results showed that the Ames test is positive for 49% of carcinogenic compounds (275 Ames+/557 rodent carcinogens), while only 19% of the compounds that are not carcinogenic to rats are Ames+ (85 Ames+/431 rodent non-carcinogens). Reproducibility, both across laboratories and within one laboratory conducting the test, is another serious issue. Both literature and internal intra-laboratory assessments of the test, at least in a 2-strain screening version of the test, have found discrepancies on the order of 15-20%. Based on a retrospective analysis of 237 compounds at Novartis with multiple Ames screen test results, this is a realistic estimate; there were 49 (21%) with discrepant results. Among aryl-amines, 13 out of 57 compounds with multiple test results were discordant (23%). The test is sensitive and uses high concentrations of the test chemical, which can increase the effect of impurities including metals, degradation products, or reagents [18, 19]. The chemical can also be toxic to the bacterial system, most notably in the case of antibacterials or cytotoxic compounds, but must still be tested to the maximum possible concentration.
1.3 Substructure alert and QSAR methods
The cause of cancer through the action of chemicals has been studied extensively, and the process typically begins with the chemical, or one of its metabolites, interacting with DNA, which subsequently leads to mutations. The principle of mutagenicity through reaction of DNA with electrophiles has been especially useful in rationalizing and deriving "toxicophores," substructures that are strongly associated with mutagenicity [21, 22]. Some of these mechanisms have been studied carefully in vitro and in vivo, and DNA or protein adducts can be measured and observed experimentally [24, 25]. The first line of defense in avoiding carcinogenicity in drug design is the use of alerts for chemicals commonly associated with carcinogenicity, mostly derived from environmental testing [22, 26]. Kazius et al. provided an analysis of mutagenicity data correlating chemical substructures to mutagenicity [21, 27, 28]. Most of these toxicophores are associated with Michael acceptors, electrophiles, or enophiles including α,β-unsaturated carbonyl systems, aziridines and epoxides, aliphatic halides, azides, and acid halides. Others such as aryl-amines and nitroaromatics are known to be converted to more reactive species through oxidation, reduction, and conjugation metabolism reactions. The simplest prediction systems search a chemical for these substructures and use rules to correctly predict the Ames+ compounds for known mutagenic substructures. However, despite the inclusion of detoxifying rules, these methods misclassify many of the Ames- compounds as positive. For chemical sets containing many unknown classes of mutagens, structural alert systems like DEREK correctly predict only around 50% of the Ames+ compounds [5, 30, 31]. A similar result (55% sensitivity) was found for compounds tested at Novartis (all mini-Ames screening results, August 2009) using an internally modified DEREK rule set.
There is a long history of modeling mutagenicity on chemicals expected to be encountered from environmental and food exposure [9, 33–37]. Recent reviews on statistical models of mutagenicity [9, 33, 38, 39] and a recent collaborative head-to-head mutagenicity prediction challenge summarize the current state of the art for external sets. A summary of some recent models is included in the Supporting Information (Additional file 1); these models provided an accuracy on test set molecules ranging from 0.73 to 0.85 (approaching the known error in the experiment) using a number of statistical approaches. A few studies that could be described as using a hybrid approach, by identifying the most applicable out of a selection of models, have also been developed with extremely good performance [40, 41].
2.1 Data set preparation
Table 1. Aryl-amine mutagenicity data sets considered in this study. Columns: number of molecules, number of Ames+, number of Ames-. The table also covers all compound classes and the combined Sets C, D, E, and F.
For the aryl-amine sets, molecules with other substructures associated with mutagenicity, such as nitroaromatic, nitrile oxide, and N-nitroso substructures, were removed from the analysis. Set A was from internal Novartis Ames screening test results tested in one laboratory up to 2009. Set B is the aryl-amine subset from compilations published by Hansen et al. [38, 43] and Kazius et al. All Ames screening results at Novartis, excluding those with discrepant values, comprised Set C. The complete set of Hansen et al. was used as Set D. Set E represents a second pharmaceutically relevant set of marketed pharmaceuticals extracted from a recent review by Brambilla and Martelli. The complete Kazius set, Set F, was included in the analysis to give a combined collection of 9423 molecules. A basic summary of the sets is shown in Table 1.
2.2 Computational studies
For all PLS and random forest models, we used a set of 185 2D descriptors available in the MOE software program and a circular Morgan fingerprint [46, 47] generated with a radius of 3 bonds (ECFP6) and hashed to 1024 count variables using RDKit. Quantum mechanics reaction energies for nitrenium formation from the primary amine were calculated as described in a previous publication, considering conformation, tautomer, and spin state, using B3LYP hybrid density functional theory energies with a 6-31G* basis set for all C, F, H, O, S, N, and P atoms and the LANL2DZ basis set and ECP for Cl, Br, and I in Gaussian03. The nitrenium formation energy is equal to the energy of the lowest energy amine conformation subtracted from the energy of the lowest energy nitrenium ion plus hydride anion (ArNH2 → ArNH:+ + H-). The anion formation energy and radical formation energy were calculated similarly and also used the 6-31G* basis set, though improvement in the energy could be expected by adding a diffuse function for the anion. The AM1 HOMO and LUMO orbital energies were calculated using the MOPAC module implemented in MOE. Nitrenium ion charges were determined by using the lowest energy B3LYP/6-31G* nitrenium ion conformation and calculating the NBO population analysis [52, 53] using B3LYP and a 6-311G* basis set in Gaussian03.
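The reaction-energy bookkeeping described above (lowest-energy nitrenium ion plus hydride, minus lowest-energy amine) can be sketched as follows. The function name is hypothetical and the electronic energies in the example are made-up placeholders, not values from this study; the quantum mechanics calculations themselves are assumed to have been run elsewhere.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509  # standard unit conversion factor


def nitrenium_formation_energy(amine_conformers, nitrenium_conformers,
                               hydride_energy):
    """Energy of ArNH2 -> ArNH:+ + H- in kcal/mol.

    All inputs are electronic energies in hartree. Each conformer list holds
    one energy per optimized conformation (or tautomer/spin state) considered.
    """
    e_amine = min(amine_conformers)          # lowest-energy amine
    e_nitrenium = min(nitrenium_conformers)  # lowest-energy nitrenium ion
    delta_hartree = (e_nitrenium + hydride_energy) - e_amine
    return delta_hartree * HARTREE_TO_KCAL_PER_MOL


# Placeholder energies (hartree), chosen only to exercise the function.
amine = [-287.600, -287.598]
nitrenium = [-286.700, -286.705]
hydride = -0.460
print(round(nitrenium_formation_energy(amine, nitrenium, hydride), 2))
```

Taking the minimum over each conformer list mirrors the "lowest energy conformation" wording in the text; a lower result would correspond to an easier route to the reactive intermediate.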
The random forest classification models used in this article were constructed using the randomForest package for R, following the approach developed by Breiman [54, 56]. Each model consisted of 500 unpruned trees, with a random sample of sqrt(N) of the N available predictors considered at each split and a 0.632 bootstrap sample of the data used for each tree. The data left out of each tree's bootstrap sample were predicted using that tree, and these predictions were averaged to create the combined out-of-bag (OOB) predictions depicted in the receiver operator characteristic (ROC) plots.
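The bootstrap and out-of-bag bookkeeping behind these models can be illustrated without any machine learning library. In the sketch below, `predict_tree` is a hypothetical callback standing in for "fit a tree on the in-bag cases, then predict one left-out case"; it is not part of the randomForest package, and the example predictor count of 1209 simply adds the 185 descriptors and 1024 fingerprint variables mentioned earlier.

```python
import math
import random


def default_mtry(n_predictors):
    """Number of predictors sampled per split: floor(sqrt(N))."""
    return math.isqrt(n_predictors)


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


def oob_average(n_samples, n_trees, predict_tree, rng):
    """Average each case's predictions over the trees that never saw it."""
    totals = [0.0] * n_samples
    counts = [0] * n_samples
    for _ in range(n_trees):
        in_bag, out_of_bag = bootstrap_split(n_samples, rng)
        for i in out_of_bag:
            totals[i] += predict_tree(in_bag, i)  # hypothetical tree model
            counts[i] += 1
    # A case that was in every bootstrap sample has no OOB prediction.
    return [t / c if c else None for t, c in zip(totals, counts)]


rng = random.Random(0)
in_bag, out_of_bag = bootstrap_split(2000, rng)
unique_fraction = len(set(in_bag)) / 2000
print(default_mtry(1209), round(unique_fraction, 3))
```

Sampling N cases with replacement leaves roughly 1 - 1/e, or about 63.2%, of the distinct cases in each bag, which is where the "0.632 bootstrap" name comes from; the remaining cases supply the OOB estimate.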
The PLS classifications were done using a PLS regression implemented using the kernel algorithm available in the pls package in R. Variables showing little variance among cases were removed using the nearZeroVar function in the caret package, and all variables were centered by the mean and divided by the standard deviation using the preProcess function in the caret package. The response variable was 0 for Ames- or 1 for Ames+ in these models, and a cutoff on the predicted value from the regression was used to construct a classification model. All ROC plots and area-under-the-curve (AUC) metrics used the ROCR package in R. Averaging of model performances in the ROC plots was done with vertical averaging of performance at a given false-positive rate, and error bars give the standard deviation. A random sample of 70% of the data was used for training, and the process was repeated 100 times, representing in part how small batches of Ames results might perform. Variables with zero variance were removed prior to training, thus removing 906 variables for the Novartis set and 956 for Set B, and variables were mean-centered and variance-scaled at each training step.
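The evaluation loop described above (random 70% training samples, repeated 100 times, scored by ROC AUC with a standard deviation) can be sketched in plain Python for the simplest case of a single fixed descriptor, where nothing needs to be fitted on the training portion. The function names and the example data are hypothetical; the AUC is computed with the rank-based (Mann-Whitney) formulation rather than with ROCR.

```python
import random
import statistics


def roc_auc(labels, scores):
    """AUC as the probability that an Ames+ case outscores an Ames- case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one Ames+ and one Ames- case")
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_holdout_auc(labels, scores, train_fraction=0.7, repeats=100,
                         seed=0):
    """Mean and standard deviation of test-set AUC over random splits."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    n_train = int(round(train_fraction * len(indices)))
    aucs = []
    for _ in range(repeats):
        rng.shuffle(indices)
        test = indices[n_train:]  # the 30% not used for training
        test_labels = [labels[i] for i in test]
        if len(set(test_labels)) < 2:
            continue  # AUC is undefined when only one class is present
        aucs.append(roc_auc(test_labels, [scores[i] for i in test]))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Hypothetical data: 1 = Ames+, 0 = Ames-. Lower formation energy is taken
# to indicate higher risk, so the score is the negated energy.
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
energies = [250, 255, 258, 259, 261, 263, 266, 267, 270, 272, 275, 280]
mean_auc, sd_auc = repeated_holdout_auc(labels, [-e for e in energies])
print(round(mean_auc, 2), round(sd_auc, 2))
```

For a multi-variable model such as PLS, the fitting step would go inside the loop, using only the training indices, before scoring the held-out cases.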
The aryl-amine data sets were constructed as previously described. The all-substructure sets were combined using Pipeline Pilot after generating a canonical tautomer and ignoring chirality, because our 2D descriptors do not encode chirality. It is also worth noting that absolute chirality determination cannot be done for all compounds, and inevitable data entry errors can make this another source of error. A consensus Ames result was used in these all-substructure data, with the definition that any Ames+ result in any of the sources was an Ames+ result. Substructure counts were calculated using a Pipeline Pilot protocol with substructure queries that were able to closely reproduce the counts generated in the work of Kazius et al. for their data set (see Additional file 1, Figure S1 and Figure S2 for queries and comparison to this reference). Molecules with molecular weight greater than 700 g/mol, which lay more than 1.5 times the interquartile range (IQR) beyond the upper quartile, were excluded from analysis. The queries used are provided as Additional file 2. The TOPKAT Ames mutagenicity classification model in the Accelrys ADMET component collection in Pipeline Pilot was used for commercial model predictions. All public data (Sets B, D, E, and F) are provided as a merged sd file as Additional file 3.
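Two of the data-preparation rules above, the "any Ames+ wins" consensus and the interquartile-range check on molecular weight, are simple enough to sketch in Python. The structure keys are assumed to come from a cheminformatics toolkit (canonical tautomer, stereochemistry removed); the function names and sample records are hypothetical and do not reproduce the Pipeline Pilot protocol.

```python
import statistics


def consensus_ames(records):
    """Merge (structure_key, is_positive) records from several sources.

    A structure is Ames+ if any source reports it as Ames+.
    """
    merged = {}
    for key, is_positive in records:
        merged[key] = merged.get(key, False) or is_positive
    return merged


def upper_iqr_fence(values, k=1.5):
    """Upper outlier fence: third quartile plus k times the IQR."""
    # statistics.quantiles uses the 'exclusive' method by default; other
    # quartile conventions give slightly different fences.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)


# Hypothetical records: the same key reported by two sources.
records = [("key-A", False), ("key-A", True), ("key-B", False)]
print(consensus_ames(records))

# Hypothetical molecular weights in g/mol.
weights = [100, 200, 300, 400, 500, 600, 700]
print(upper_iqr_fence(weights))
```

The consensus rule is deliberately conservative: a single positive report overrides any number of negative ones, so merged sets can only gain Ames+ labels.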
The Self-Organizing Map for the combined all-substructure set was generated in Schrodinger Canvas version 1.4 with a 30 by 30 hexagonal cell output grid. The program uses Euclidean distance to measure similarity between compounds, and the internal Morgan-type circular fingerprints [47, 63] generated with radius 2 and functional atom types were used as descriptors (ECFP4). The TOPKAT mutagenicity prediction was centered and scaled to give results from 0 to 1, and the random forest model provided probabilities between 0 and 1 for the Ames+ class. The deviation was then the difference between the experimental result (1 for Ames+ or 0 for Ames-) and the model output. For the aryl-amine set, the 'kohonen' package in R was used instead, due to a problem discovered in Canvas with applying trained maps to new compounds. In this case, RDKit was used to generate circular Morgan fingerprints hashed to 1024 count variables as described for the statistical modeling.
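The deviation mapped onto the Self-Organizing Map can be written out explicitly. In this sketch, min-max scaling is an assumption about how a raw model output might be brought onto the 0 to 1 range, and the deviation is kept signed (experimental minus predicted); both function names are hypothetical.

```python
def min_max_scale(values):
    """Rescale raw model outputs linearly onto the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale constant values")
    return [(v - lo) / (hi - lo) for v in values]


def deviations(experimental_positive, model_outputs):
    """Experimental result (1 = Ames+, 0 = Ames-) minus the model output."""
    return [(1.0 if positive else 0.0) - output
            for positive, output in zip(experimental_positive, model_outputs)]


# Hypothetical raw scores for three compounds and their Ames results.
raw_scores = [2.0, 4.0, 6.0]
scaled = min_max_scale(raw_scores)
print(scaled)
print(deviations([True, False, True], scaled))
```

With this sign convention, a large positive deviation marks an Ames+ compound the model considered safe, and a large negative deviation marks an Ames- compound the model flagged.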
3. Results and Discussion
In the following results, the differences in the sets are examined in terms of their properties, presence of previously identified mutagenic substructures, and structural similarity and clustering visualized using Kohonen self-organized maps. The difference in predictivity of multiple statistical methods and descriptors between pharmaceutically relevant data and literature compilations is analyzed firstly for aryl-amines and then for sets containing all substructures. For aryl-amines, the quantum mechanically derived reaction energy for forming a known reactive intermediate was shown to be a more stable and accurate predictor than statistical models with more descriptors.
3.1 Comparison of molecules in external Ames data sets and pharmaceutically relevant sets
As can be seen in Table 1, the Novartis Set A of aryl-amines and Set C of all substructures tested have a low number of Ames+ compounds compared to their literature counterparts (Sets B and D). In aryl-amines (Set A), only 22% of the molecules are Ames+, and in the entire set of test results at Novartis (Set C), only 15% of the molecules are Ames+. This low percentage is quite similar to other recent reports on Ames results at other pharmaceutical companies, such as the recent report from Hillebrecht et al. from Roche, where 300/2335 = 13% of the internal compounds were Ames+. A paper by Leach et al. on aryl-amines from AstraZeneca had a somewhat higher percentage (109/312 = 35%) of Ames+ aryl-amines less than 250 g/mol. However, in the literature sets (Sets B and D), 71% of aryl-amines and 54% of the entire chemical space are Ames+. Perhaps surprisingly, the marketed pharmaceutical set, Set E, has a non-zero incidence of Ames+ test results, but it is fairly low, around 12%. An Ames+ test result is only part of a potential drug's profile, but the risk of carcinogenicity in later stage animal testing and added regulatory scrutiny present a significant hurdle to drug development in a competitive space.
For the aryl-amine sets, Sets A and B, the situation is similar: the Novartis set, Set A, has a higher average molecular weight, but there is an even distribution of weights from about 150-500 g/mol. In Set B, there are only 3 aryl-amines in the range of 400-600 g/mol. In Set A, there are 93, which is almost 30% of the set. The fact that there is such an even distribution in the Novartis set, including a large fraction of lower molecular weight compounds, may reflect the importance of this class and the response to the issue of genotoxicity. When an issue is identified, the typical medicinal chemistry approach is to synthesize dozens of molecules and test all of them. Building blocks that are components of larger molecules are often tested in case of trace genotoxic impurities and, under internal guidelines, are tested if used for a final clinical candidate. Also, drugs for different disease areas such as neuroscience may require smaller molecules.
Table: Prevalence of the Kazius toxicophores in Sets C and D. Columns: number of molecules (sequential filter), % of total set.
In Figure 3, we also show where commercial aryl-amines that have been calculated by our model lie in the map. A significant population exists near CF3-substituted anilines in the top right, which have historically been Ames- (2nd plot) and have higher nitrenium formation energies. The top left of the map contains mostly larger and more polar aryl-amines, which were purposely left out of the calculations because of the goal of identifying safer starting materials and the better performance of the predictor for lower molecular weight aryl-amines. The center-right area of the map is where a large proportion of the commercially available aryl-amines are located avoiding some of the larger polyaromatic and triphenyl systems. It is also an area that has cells that contain Ames+ and Ames- amines. The nitrenium formation energy predictor can clarify which compounds in this area are safer bets as discussed in the next section.
3.2 Predicting aryl-amine mutagenicity using quantum mechanically derived descriptors alone
Table 3. Correlation matrix of quantum mechanics parameters for Set A and the single-parameter ROC AUC performance. Parameters: (1) NitFormE (Eq. 1); (2) NitFormE (Eq. 2); (3) NitFormE (Eq. 3); (4) RadicalFormE (Eq. 4); (5) Anion Form. E. (Eq. 5); (6) AM1 HOMO E; (7) AM1 LUMO E; (8) HOMO-LUMO E; (9) Nitrenium Charge; (10) Experimental Ames (Ames+ = 1).
The best single parameters include the nitrenium formation energy reactions and the AM1 HOMO orbital energy. The Ames+ compounds have values tightly clustered in these two parameters, as shown in the beanplots in Figure 5, and show the expected relationship to the barrier of forming the nitrenium intermediate. Larger HOMO orbital energies of the amine and lower reaction energies for forming the reactive nitrenium ion would make it easier to form the intermediate. Ames+ amines tend to have a more negative charge on the nitrenium nitrogen, which has been presented previously, but the relationship is clearly not as strong. As suggested in a recent article, we looked at the anion formation energy (Equation 5), and though on its own it has little discrimination, as shown in Figure 5 and by its AUC in Table 3, it appears to provide a useful complement to the nitrenium formation energy. Higher sensitivity at equivalent false-positive ratios was possible in the 80-87% sensitivity region of the ROC curve (Additional file 1, Figure S3). A PLS model using all of the quantum mechanical descriptors showed a large loading value in the first component for nitrenium formation energies, and the anion formation energy had the largest loading value in the second component. The starting geometries for the anions can be generated using the same procedure used for generating the nitrenium ions from the B3LYP-optimized aryl-amines.
3.3 Predicting aryl-amine Ames test results using multivariate statistics
Multi-dimensional statistical models improving upon the performance of the nitrenium formation energy parameter alone were difficult to construct. A number of available approaches including k-Nearest-Neighbors (kNN), random forest, partial least squares (PLS), support vector machines (SVM), and PLS with discriminant analysis provided similar performance for Set A. A comparison of these methods and other approaches to modeling Ames toxicity when all mutagens are included has already been presented in other studies. We have chosen to focus on PLS and random forest analysis of the aryl-amine data for further discussion because of the interpretability of PLS, the ability to include a large number of correlated variables, and the straightforward assessment of the importance of variables.
Performance of prediction methods. Values for each method are listed in the column order: Set A, Training; Set A, Test; Set B, Training; Set B, Test.
PLS with NitFormE: 0.76 ± 0.02; 0.63 ± 0.08; 0.80 ± 0.02; 0.78 ± 0.05
PLS with NitFormE (second row): 0.88 ± 0.03; 0.56 ± 0.09; 0.83 ± 0.01; 0.80 ± 0.04
PLS without NitFormE: 0.76 ± 0.02; 0.62 ± 0.08; 0.80 ± 0.02; 0.79 ± 0.05
PLS without NitFormE (second row): 0.89 ± 0.02; 0.54 ± 0.09; 0.84 ± 0.01; 0.80 ± 0.04
Nitrenium Formation Energy Alone: 0.72 ± 0.04; 0.71 ± 0.09; 0.78 ± 0.02; 0.77 ± 0.05
Random Forest with NitFormE (one value per set): Set A 0.62 ± 0.01; Set B 0.855 ± 0.003
Random Forest without Nitrenium Formation Energy (one value per set): Set A 0.61 ± 0.01; Set B 0.851 ± 0.003
For both Set A and Set B, the multiple-variable PLS models offered an improved prediction over using nitrenium formation energy alone (dashed line) in the training set but not in the test set. The performance of the Set A PLS model on the test set was much worse on average than using this single parameter. The model in Set B was slightly better but unfortunately, most of the performance increase over the nitrenium formation energy (0.03 in AUC) was in a low-sensitivity region of the ROC curve (< 50% true positive rate). The prediction in this range was not considered to be useful for excluding Ames+ fragments. These results are frustrating but provoked thought about why the molecules commonly used in the literature are different and easier to model.
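Comparisons of this kind, sensitivity in a particular false-positive region across repeated splits, rely on averaging ROC curves vertically: at each chosen false-positive rate, the true-positive rates of the individual curves are averaged. The sketch below is a simplified version that treats each ROC curve as a step function rather than interpolating between points, and its function names and toy data are hypothetical.

```python
def roc_points(labels, scores):
    """ROC points (fpr, tpr), from the strictest to the loosest cutoff."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last case sharing this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def tpr_at(points, fpr):
    """Highest true-positive rate reachable at or below the given FPR."""
    return max(t for f, t in points if f <= fpr)


def vertical_average(curves, fpr_grid):
    """Mean true-positive rate across curves at each FPR on the grid."""
    return [sum(tpr_at(curve, f) for curve in curves) / len(curves)
            for f in fpr_grid]


# Two toy curves (1 = Ames+, 0 = Ames-), standing in for two random splits.
curve_a = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1])
curve_b = roc_points([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
print(vertical_average([curve_a, curve_b], [0.0, 0.5, 1.0]))
```

Reading the averaged curve at a fixed false-positive rate makes it possible to ask where a gain in AUC actually occurs, for example whether it lies above or below a 50% true-positive rate.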
Table: Selected variables important in statistical models of Set A and Set B. Columns: Set A PC1, Mean Decrease in Gini (Set A), Set B PC1, Set B PC2, Mean Decrease in Gini (Set B).
The first principal component included the nitrenium formation energy and other descriptors relating to electrostatics, hydrophobicity, and indirect properties such as the number of atoms. The variable chi0v_C (0χC) is a valence-modified carbon atom connectivity index, which depends on the number of carbon atoms in the structure and how many non-hydrogen atoms are connected to them. a_count and a_nH are simply the number of atoms and hydrogens, respectively. GCUT_SLOGP calculates log P based on atomic contributions and a modified graph distance, while BCUT_SMR calculates the molar refractivity based on atomic contributions and bond order [75–77]. Q_VSA_POS is the sum of atomic contributions to van der Waals surface area for atoms whose partial charge is positive, and density is the molecular weight divided by total van der Waals volume.
The most interpretable variables in the second component for Set B related to flexibility and included the number of rotatable bonds and number of rotatable single bonds, b_rotN and b_1rotN respectively, and the Kier flexibility parameter, abbreviated here as KierFlex. The number of oxygen atoms and a fingerprint bit associated with an aryl-amine substructure were also significant.
Performance of PLS and random forest models on the aryl-amine sets. Values for each method are listed in the column order: Set A, Training; Set A, Test; Set B, Training; Set B, Test.
PLS with NitFormE: 0.76 ± 0.02; 0.63 ± 0.08; 0.80 ± 0.02; 0.78 ± 0.05
PLS with 9 descriptors: 0.68 ± 0.03; 0.66 ± 0.08; 0.77 ± 0.02; 0.76 ± 0.04
Random Forest with NitFormE (one value per set): Set A 0.62 ± 0.01; Set B 0.855 ± 0.003
Random Forest with 9 descriptors (one value per set): Set A 0.682 ± 0.008; Set B 0.844 ± 0.004
Nitrenium Formation Energy: 0.72 ± 0.04; 0.71 ± 0.09; 0.78 ± 0.02; 0.77 ± 0.05
3.4 Cross-set performance: training with Set A and testing Set B and vice versa
Table: Performance for PLS and random forest models for the other set. Row groups: trained on 100% Set A (PLS 1-component, 9 descriptors); trained on 100% Set B (PLS 1-component, all descriptors; PLS 1-component, 9 descriptors).
3.5 Performance of a commercial model on aryl-amine data: TOPKAT
3.6 Modeling Ames test results for all substructures
Given the difficulty of addressing aryl-amines, we began to search for reasons the set would be more difficult and to ask whether the result would hold for more than just this subspace. Literature reports have provided excellent results for benchmark sets containing all mutagens and small collections of aryl-amines or nitroaromatics. Even better performance could be obtained using multiple models based on the applicability domain of a mutagen under consideration, such as Sushko et al. for multiple substructures and Leong et al. for just the aryl-amine substructure. Though surveys of the poor performance of pre-built commercial models on proprietary sets have been presented, reports on models of large proprietary sets and delineation of substructure seemed to be lacking. A classification model given a collection of distinct features strongly associated with mutagenicity would be expected to perform better than a model missing such clear-cut mutagenic features, such as the nitroaromatics mentioned previously.
Table: Performance of models on subsets of the compiled all-substructure set. Row labels: Local (remaining data); Not polyaromatic, ArNH2, ArNO2; Kazius et al.; Aryl-amines (not nitroaromatic or polyaromatic); Polyaromatic (not nitroaromatic).
In this article we have shown that there are significant differences in the physicochemical and biological properties of compounds used in drug discovery and those in compiled Ames test results from the literature. These differences include molecular weight, substructure distribution, and the percentage of mutagenic compounds in the data set. This is important to communicate, as much of the literature data is being used to test prediction methods as well as playing a role in current testing strategy debates. The compounds in the Novartis test results are mostly drug precursor molecules, while literature mutagenicity results are often for petrochemicals and pesticides of primary concern as environmental pollutants. The size and complexity of the molecules tested at Novartis were significantly larger on average than those of molecules included in external sets, as visualized by distributions in molecular weight. Chemical functional groups or substructures that have a high association with mutagenicity, determined from the literature data, are largely absent in the Novartis set, taking away a valuable discrimination feature. Additionally, the proportion of mutagens in external sets is higher and disturbingly close to 50%, as might occur from successive culling for balanced model development. As a result of these factors, many drug discovery molecules are outside the applicability domain of pre-built commercial models. The data is also more difficult due to the lack of strongly associated structural features and would lead to worse performance of these statistical models if it were included in the training set. Therefore these models cannot provide adequate performance to predict, let alone avoid, a positive Ames test. The Ames test, as well as other genotoxicity tests, continues to be a significant problem in drug discovery, and companies should work together to share data with the wider community of scientists and organizations.
The best-validated and best-performing prediction available for low molecular weight aryl-amines is still a quantum-mechanics reaction energy representing the formation of the nitrenium ion. Effective predictive models could be built for all-substructure sets using the random forest methodology and commonly available 2D descriptors and chemical fingerprints. Performance was still significantly lower for molecules from Novartis and marketed pharmaceuticals. Despite extensive work in the area of predicting this particular toxicity, work in designing more difficult test sets and more adaptable models is still necessary.
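The applicability-domain argument made in these conclusions can be illustrated with a small sketch. It is not the method used in this article or in any commercial model; it only assumes that each molecule is represented by a chemical fingerprint stored as a set of "on" bit indices, and it flags a query molecule as outside the domain when its nearest training neighbour falls below a Tanimoto similarity threshold. The function names, the toy fingerprints, and the threshold of 0.3 are all hypothetical choices made for illustration.

```python
# Minimal sketch of a nearest-neighbour applicability-domain check.
# Assumption: fingerprints are sets of "on" bit indices; the data and the
# 0.3 threshold are invented for illustration and are not from the article.

def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(query, training):
    """Highest Tanimoto similarity of the query to any training fingerprint."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def in_domain(query, training, threshold=0.3):
    """True if the query has at least one sufficiently similar training molecule."""
    return max_similarity(query, training) >= threshold


# Toy training set standing in for a literature-derived collection.
training_fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({2, 3, 5}),
    frozenset({1, 4, 6, 7}),
]

# One query that shares bits with the training set, one that shares none.
similar_query = frozenset({1, 2, 3, 9})
distant_query = frozenset({20, 21, 22, 23, 24})

for name, fp in [("similar", similar_query), ("distant", distant_query)]:
    print(name, round(max_similarity(fp, training_fps), 2), in_domain(fp, training_fps))
```

In this toy case the query with no bits in common with the training set is flagged as outside the domain, which mirrors the situation described for drug discovery molecules evaluated with models trained on literature data. In practice the threshold would need to be calibrated on held-out data.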
positive Ames test result
negative Ames test result
partial least squares
receiver operator characteristic
number of points
nitrenium formation energy
highest occupied molecular orbital
lowest unoccupied molecular orbital
We thank the NIBR IT Scientific Computing group, and especially Steve Litster and Michael Derby, for their devoted help and for the computational resources they provided. P.M. is a NIBR postdoctoral fellow and thanks the NIBR Education Office for funding.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.