Quantitative estimation of pesticide-likeness for agrochemical discovery
© Avram et al.; licensee Chemistry Central Ltd. 2014
Received: 17 April 2014
Accepted: 1 September 2014
Published: 12 September 2014
The design of chemical libraries, an early step in agrochemical discovery programs, is frequently addressed by means of qualitative physicochemical and/or topological rule-based methods. The aim of this study is to develop quantitative estimates of herbicide- (QEH), insecticide- (QEI), fungicide- (QEF), and, finally, pesticide-likeness (QEP).
In the assessment of these definitions, we relied on the concept of desirability functions.
We found a simple function, shared by the three classes of pesticides and parameterized separately for each of six easy-to-compute, independent and interpretable molecular properties: molecular weight, logP, number of hydrogen bond acceptors, number of hydrogen bond donors, number of rotatable bonds and number of aromatic rings. Subsequently, we describe the scoring of each pesticide class by the corresponding quantitative estimate. In a comparative study, we assessed the performance of the scoring functions using extensive datasets of patented pesticides.
The quantitative assessment established here can rank compounds whether or not they fail well-established pesticide-likeness rules, and offers an efficient way to prioritize (class-specific) pesticides. These findings are valuable for the efficient estimation of pesticide-likeness of vast chemical libraries in the field of agrochemical discovery.
Keywords: Herbicide, Insecticide, Fungicide, Pesticide, Agrochemicals, SAR databases
In recent years, the systematic identification of new lead compounds has gained increasing attention in both the pharmaceutical and agrochemical industries. Progress in combinatorial chemistry (the parallel synthesis of large numbers of compounds) and high-throughput screening (the parallel testing of large numbers of compounds for bioactivity) has facilitated the exploration of extensive chemical spaces for chemicals with desirable properties. To conduct a drug or agrochemical discovery program effectively, a screening library should contain compounds with reasonable properties that ease the passage to final products. Thus, in the early stages of such programs, in silico approaches are used to design chemical libraries. Oral bioavailability and membrane permeability have often been connected to simple molecular descriptors such as logP, molecular weight, or the counts of hydrogen bond acceptors and donors in a molecule. Hence, over the years, simple rule-based models were derived from the physicochemical and structural properties of available datasets. These qualitative approaches (also referred to as filters) retain or reject molecules depending on a set of strict threshold values for key molecular descriptors (often combined with the presence or absence of undesirable chemical groups). This provides a rapid way to select molecules with an increased likelihood of exhibiting the specific property for which the filter was designed.
In drug discovery, Lipinski's rule of five (Ro5) is considered the reference for defining the physicochemical and structural property profiles associated with optimal bioavailability of drug candidates. Upper limits for four basic molecular descriptors were established based upon a set of known drugs, i.e., molecular weight ≤500, octanol/water partition coefficient (hydrophobicity) ≤5, number of hydrogen bond donors ≤5 and number of hydrogen bond acceptors ≤10. Molecules that obey these rules should exhibit acceptable solubility and cell permeability properties and were defined as 'drug-like'. Although Ro5 is considered predictive for oral bioavailability, 16% of oral drugs violate at least one of the criteria and 6% fail two or more. Other simplified rule-based definitions of drug-likeness were established by Veber and Ghose.
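As an illustration of how such a hard filter operates, the following minimal sketch counts Ro5 violations from precomputed descriptor values. The thresholds are those listed above; the function names, the dictionary keys and the example descriptor values are assumptions introduced here for illustration only.

```python
# Rule-of-five upper limits as listed in the text above.
RO5_LIMITS = {
    "mw": 500.0,   # molecular weight
    "logp": 5.0,   # octanol/water partition coefficient
    "hbd": 5,      # number of hydrogen bond donors
    "hba": 10,     # number of hydrogen bond acceptors
}


def ro5_violations(descriptors):
    """Count how many Ro5 upper limits are exceeded by a molecule."""
    return sum(1 for name, limit in RO5_LIMITS.items()
               if descriptors[name] > limit)


def passes_ro5(descriptors, max_violations=0):
    """Qualitative outcome: pass/fail, given a tolerated number of violations."""
    return ro5_violations(descriptors) <= max_violations


# Hypothetical descriptor values, not taken from the study.
example = {"mw": 320.4, "logp": 5.6, "hbd": 1, "hba": 4}
print(ro5_violations(example))  # only logP exceeds its limit
print(passes_ro5(example))
```

A molecule that narrowly misses one threshold is rejected outright by such a filter, which is the hard-boundary behaviour that continuous scoring is meant to avoid.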
Table 1 Rule-based filters for drugs and pesticides
To overcome the hard boundaries established by traditional filters for drug-likeness, Bickerton et al. developed the so-called quantitative estimate of drug-likeness (QED), which combines the simplicity of rule-based methods with the ranking advantages of continuous models. The approach relies on a small number of relevant, accessible and quick-to-compute molecular descriptors that describe the distribution of a set of molecules. So-called desirability functions, i.e., functions that describe the distribution of the data, have been fitted for each descriptor. Hence, QED defines drug-like molecules on a continuous scale, ranging from zero (the least drug-like) to one (the most drug-like).
We consider that the field of agrochemical discovery would benefit from a similar treatment of pesticide-likeness. Thus, in this study, we aim to establish quantitative estimates of pesticide-likeness. Three main classes of pesticides are considered herein, i.e., herbicides, insecticides and fungicides, and, accordingly, we describe the quantitative estimate of herbicide-likeness (QEH), of insecticide-likeness (QEI) and of fungicide-likeness (QEF). We found a simple type of function that accurately describes six physicochemical properties over the three pesticide classes. Furthermore, we compare the performance of this quantitative approach with well-known rule-based methods defining pesticide-likeness, using a large library of compounds patented for agrochemical applications, and discuss the results. For practical reasons and for the purpose of this paper, we refer to the ensemble of scoring functions dedicated to pesticide-likeness as QEPest-SFs.
Results and discussion
The assessment of a common desirability function for pesticides
We applied the concept of desirability to provide a quantitative metric for assessing pesticide-class-likeness and, subsequently, pesticide-likeness. The desirability function approach was originally proposed by Harrington and later refined by Derringer and Suich. The approach consists of employing one or several functions to characterize the properties of several dependent variables, normalizing the resulting terms (scaling them between zero and one) and combining them using the geometric mean. Since we deal with molecular data sets, we followed the procedure of Bickerton et al., who derived a series of desirability functions, each for a different molecular descriptor.
We refer to the resulting scoring functions as quantitative estimates of herbicide-likeness (QEH), insecticide-likeness (QEI) and fungicide-likeness (QEF), according to the pesticide class. These functions reflect the likelihood that a molecule exhibits desirable characteristics as a pesticide. Thereby, we obtained an intuitive quantitative indicator of how closely a molecule matches the physicochemical profile of pesticides.
In order to model specific properties of large data sets, predictive models often use many descriptors, which limits the applicability domain of the model. The more descriptors are used, the greater the likelihood that a candidate molecule will fall outside the limits of one or more of these descriptors. In our approach, we limit the number of descriptors to six basic, independent physicochemical properties correlated with pesticide bioavailability, solubility and stability. These descriptors are also included in the formulation of QED to define drug-likeness. Moreover, with a slight variation (the count of aromatic rings, arR, replaced by the count of aromatic bonds), the same properties are used in Hao's approach to identify pesticides (see Table 1).
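The combination step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the bell-shaped function and its (center, width) parameters are hypothetical placeholders and are not the fitted function or parameters of this study; only the scaling to the range zero to one and the geometric mean follow the text.

```python
import math


def gaussian_desirability(x, center, width):
    """Illustrative desirability in (0, 1]; equals 1 when x == center.

    Assumption: a Gaussian shape is used here only as a stand-in.
    """
    return math.exp(-((x - center) / width) ** 2)


def geometric_mean(values):
    """Geometric mean computed via logarithms; 0.0 if any term is 0."""
    if any(v <= 0.0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (center, width) pairs for the six descriptors.
PARAMS = {
    "MW": (300.0, 100.0),
    "LogP": (3.0, 2.0),
    "HBA": (3.0, 2.0),
    "HBD": (1.0, 1.5),
    "RB": (4.0, 3.0),
    "arR": (1.0, 1.0),
}


def likeness_score(descriptors, params=PARAMS):
    """Combine the six per-descriptor desirabilities into one score."""
    terms = [gaussian_desirability(descriptors[name], center, width)
             for name, (center, width) in params.items()]
    return geometric_mean(terms)


# Hypothetical molecule, for illustration only.
molecule = {"MW": 350.0, "LogP": 4.1, "HBA": 4, "HBD": 0, "RB": 6, "arR": 2}
print(round(likeness_score(molecule), 3))
```

Because the terms are multiplied, a single very undesirable property pulls the overall score down strongly, whereas an arithmetic mean would let the other five properties compensate.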
Pesticide class scorings
The three main classes of pesticides are: herbicides (against weeds), insecticides (against harmful insect pests), and fungicides (against harmful diseases). In this section, we describe how the pesticide class-specific desirability functions established above relate to each other.
A recent analysis by Hao et al. of the distributions of herbicides, insecticides and fungicides over six molecular descriptors, i.e., MW, ClogP, HBA, HBD, RB and the number of aromatic bonds, indicated ClogP, HBD and the number of aromatic bonds to be important constitutive properties for distinguishing between the three classes of pesticides. Furthermore, the same study describes the RB distributions of herbicides and fungicides as similar, with lower values compared to insecticides. We note that, for the most part, our desirability functions agree with these previous findings; slight variations in the distributions may be explained by the different datasets employed.
AgroSAR patent database
The GVKBio agrochemical patents collection (AgroSAR) comprises ~59 k (58915) unique structures and ~413 k (413103) SAR end-points measured in ~110 k (109733) assays. Of these data, 38.7% were published in the 1970s, 29.6% in the 1980s and 28.67% from the 1990s up to 2005. AgroSAR gathers herbicides, insecticides, fungicides, acaricides, nematocides, bactericides, algaecides, plant growth regulators, biocides, microbiocides and rodenticides in a relational database that is manually curated and annotated, and easy to query and subset. This database comprises large amounts of unexplored patent data, which can help to improve the discovery of agrochemicals. To our knowledge, this is the only SAR patent database built specifically from patent specifications filed in the agro sector.
Pesticide sets extracted from AgroSAR
Statistics of the pesticides extracted from AgroSAR
The field of drug discovery is closely related to that of agrochemical discovery. Given the similarities between agrochemical and pharmaceutical research, the development of new medicines may benefit from agrochemical research, and vice versa. Similar to drugs, modern-day pesticides are optimized for low mammalian toxicity and act via a single target at nanomolar concentrations. Herbicides and fungicides were reported to generally meet Lipinski's Ro5 criteria for drug-like compounds. This observation is strongly confirmed by the AgroSAR pesticide database: 97.29% of the herbicides and 91.55% of the fungicides pass Ro5 (with zero violations). In the case of insecticides, 73.56% of the molecules were recognized as drug-like (Table 2). We encountered similar results for the marketed pesticide set (see Additional file 1: Table S5). As described above, insecticides exhibit a slightly different profile compared to herbicides and fungicides, mainly consistent with increased hydrophobicity. Future explorations of these datasets can contribute significantly to improving pesticide discovery and development programs.
Scoring AgroSAR pesticide database
In this section, we report and discuss the capabilities of the proposed scoring functions to quantitatively define pesticide-likeness. In addition to the quantitative estimates of class-specific pesticide-likeness, we explored two data fusion rules to provide quantitative estimates of overall pesticide-likeness. Hence, we define QEPmax and QEPavg as the maximum and the average, respectively, of the QEH, QEI and QEF values. The two fusion rules use the QEH, QEI and QEF outcomes in different manners: the 'max-value' rule reflects only the highest pesticide-class score, whilst the 'average-value' rule takes into account the contribution of all pesticide classes by averaging the scores. Thus, in this section we evaluate AgroSAR pesticides by means of QEH, QEI, QEF, QEPmax and QEPavg.
In Figure 6a, we show the cumulative frequency counts of herbicides, insecticides, fungicides and pesticides plotted against the scores assigned by the corresponding quantitative estimate function, i.e., QEH for herbicides, QEI for insecticides, QEF for fungicides, and QEPmax and QEPavg for pesticides. The highest scores can be observed in the case of QEH scoring herbicides. According to the pesticide class, half of the molecules received QEH scores ≥0.72 (herbicides), QEI scores ≥0.57 (insecticides), QEF scores ≥0.6 (fungicides), and QEPmax ≥0.7 and QEPavg ≥0.6 (pesticides). These results, further supported by the cutoff values corresponding to 25% and 75% of the datasets (see Additional file 1: Table S6), confirm the ability of the scoring functions to assign high scores to the equivalent pesticide class.
In Figure 6c, we show the distribution of herbicides, insecticides and pesticides against the corresponding scoring function values, i.e., QEH, QEI, QEPmax and QEPavg. In order to see how these scores relate to well-known rule-based models we plotted, correspondingly, the frequency counts of molecules passing Tice's filters for herbicides and insecticides, and Hao's filter for pesticides. One can observe a consistent trend between higher scores and increased percentages of compounds passing rule-based filters (Figure 6c).
Simple rule-based methods that define pesticide-likeness are applied in the early stages of pesticide discovery programs. Due to their simplicity, these methods serve to trim large chemical libraries to smaller sets, which are then supplied to more computationally expensive approaches. In this sense, a challenging exercise for QEPest-SFs is to recognize pesticides within a larger set of decoys. Consequently, for each pesticide class we assembled a decoy set ten times larger, consisting of randomly chosen representatives from PubChem Compounds (http://pubchem.ncbi.nlm.nih.gov/; 46.75 million molecules downloaded on December 10, 2013). Using the same six molecular properties, we computed QEH, QEI, QEF, QEPmax and QEPavg for the decoy sets as well (the decoys assembled for the pesticide classes were merged for the evaluation of QEPmax and QEPavg).
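The two fusion rules defined above amount to the following minimal sketch; the function names and the example scores are hypothetical and serve only to illustrate the definitions.

```python
def qep_max(qeh, qei, qef):
    """'Max-value' rule: keep only the highest class-specific score."""
    return max(qeh, qei, qef)


def qep_avg(qeh, qei, qef):
    """'Average-value' rule: arithmetic mean of the three class scores."""
    return (qeh + qei + qef) / 3.0


# Hypothetical class-specific scores for a single molecule.
qeh, qei, qef = 0.82, 0.40, 0.55
print(qep_max(qeh, qei, qef))
print(round(qep_avg(qeh, qei, qef), 2))
```

A molecule that resembles only one pesticide class keeps a high QEPmax but is penalized by QEPavg, so the choice between the two depends on whether class-specific or general pesticide-likeness is of interest.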
In Figure 6b, we show the ROC (receiver operating characteristic curve; see Performance measure section in Methods) plots describing the capacity of QEH, QEI, QEF, QEPmax and QEPavg to recognize the corresponding pesticide sets. A slightly increased early enrichment can be seen in the case of QEI retrieving insecticides; in contrast, QEH retrieved herbicides later in the ranked list. The discriminative performance was numerically assessed by AUC (area under the ROC; see Performance measure section in Methods) values, as reported in Additional file 1: Table S7. With the exception of QEH (AUC > 0.7), we encountered relatively poor separation capabilities. However, these functions are not meant to be as accurate as virtual screening tools, but rather to serve as indicators of compounds showing desirable pesticide-like physicochemical properties. Moreover, the decoys employed here have not been experimentally shown to be inactive as pesticides. Thus, these results must be seen in the light of the purpose and utility of the scoring functions as described above.
QEPest-SFs can rank compounds whether or not they fail pesticide-likeness rules. Consequently, different cutoffs for the scoring functions provide various levels of sensitivity and specificity. One might be tempted to find optimal cutoff values for these scoring functions. The results of such an approach are reported in Additional file 1: Table S8 and Figure S3. However, as underlined by Bickerton et al. in the case of QED, the use of any threshold is discouraged, as it results in qualitative outcomes similar to those of rule-based approaches. A practical application of the proposed scoring functions would be to rank compounds by their scores and select the required number of top-ranking compounds.
In this study, we have demonstrated that QEPest-SFs are able to rank compounds according to their herbicide-, insecticide-, fungicide- or pesticide-likeness. These scoring functions are based upon six simple molecular descriptors and a single type of function, parameterized accordingly to provide desirability scores. These quantitative assessments provide increased flexibility compared to traditional rule-based methods. For example, large chemical libraries can be reduced to desirable sizes, profiling pesticide-like molecules at various levels. In the usual pipeline of drug and agrochemical discovery programs, the resulting sets are supplied to more accurate virtual screening methods to increase cost-effectiveness in further experimental steps. For this purpose, we provide a simple Java-based program ("QEPest.jar") to compute QEH, QEI and QEF (see Additional file 2).
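The threshold-free selection suggested above can be sketched as a simple ranking step. The compound identifiers, scores and function name below are hypothetical and are used for illustration only.

```python
def top_ranked(scores, n):
    """Return the identifiers of the n highest-scoring compounds, best first.

    scores: dict mapping a compound identifier to its score (e.g. QEH).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [compound_id for compound_id, _ in ranked[:n]]


# Hypothetical library with precomputed scores.
library = {"cpd-1": 0.31, "cpd-2": 0.77, "cpd-3": 0.58, "cpd-4": 0.12}
print(top_ranked(library, 2))
```

The number of selected compounds is then dictated by the capacity of the downstream step rather than by a fixed score cutoff.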
Marketed pesticide set
A set of 1685 pesticides (585 herbicides, 495 insecticides and 278 fungicides) was assembled from The Pesticide Manual and the Compendium of Pesticide Common Names. For standardization (structure canonicalization and transformation; see Additional file 1: Table S9) the molecules were supplied to ChemAxon's Standardizer module (JChem 6.0.0, 2013, ChemAxon, http://www.chemaxon.com). The marketed pesticide set was used to derive quantitative estimate scoring functions for herbicide-likeness (QEH), insecticide-likeness (QEI), fungicide-likeness (QEF) and overall pesticide-likeness (QEP).
Molecular descriptors were computed with ChemAxon's structure database management software Instant JChem (JChem 6.0.0, 2013, ChemAxon, http://www.chemaxon.com). Six descriptors, i.e., molecular weight (MW), molecular hydrophobicity (log of the octanol–water partition coefficient; LogP), number of hydrogen bond acceptors (HBA), number of hydrogen bond donors (HBD), rotatable bonds (RB) and aromatic rings (arR), were used to derive desirability functions for QEPest-SFs. Other hydrophobicity estimation metrics such as MLogP and ClogP were computed with Dragon (for Windows, Software for Molecular Descriptor Calculations, version 5.5, 2007 Talete srl, http://www.talete.mi.it) and BioByte (ClogP for Windows, version 1.0.0, 1995, BioByte Corp., http://www.biobyte.com/), respectively, and were used accordingly, as required by rule-based methods (Table 1).
Distribution of data
For the assessment of the desirability functions we computed the frequency counts for each class of pesticides according to the type of descriptor values: for continuous values (MW and LogP) the optimum bin size was computed with the Web Application for Bin-width Optimization - Ver. 2.0 (http://22.214.171.124/~hideaki/res/histogram.html, accessed on Sep 21 2013), and for discrete values (HBA, HBD, RB, arR) we used a bin size of one (R 2.14.2).
The frequency counts and bins computed for each molecular descriptor served as input for curve fitting, performed by means of the ZunZun.com Online Curve Fitting and Surface Fitting Web Site (http://zunzun.com/, accessed on Aug 6, 2013). Depending on the data to be modeled, up to 573 nonlinear and 23 linear equations were fitted.
The discriminative power of QEPest-SFs was assessed graphically and numerically by means of the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). The ROC plot describes the true positive rate (TPR = sensitivity) versus the false positive rate (FPR = 1 - specificity) along the ranked list. The AUC indicates the ability of a scoring method (or of prediction models in general) to discriminate between two classes of elements, e.g., actives and inactives, and is defined as the area under the ROC curve. Values range from 0 to 1 (perfect separation), with 0.5 suggesting a random spread of the representatives of the two classes.
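The binning step can be sketched as follows. This is a hedged illustration, not the web application's code: it assumes that the bin-width optimization follows the histogram method of Shimazaki and Shinomoto cited in the reference list, which selects the bin width minimizing the cost (2k - v)/Δ², where k and v are the mean and the (biased) variance of the bin counts and Δ is the bin width. All function names and the example values are hypothetical.

```python
from collections import Counter


def bin_counts(values, n_bins):
    """Frequency counts over n_bins equal-width bins spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts, width


def optimal_bin_width(values, max_bins=50):
    """Scan candidate bin numbers and keep the one with the lowest cost."""
    best = None
    for n_bins in range(2, max_bins + 1):
        counts, width = bin_counts(values, n_bins)
        mean = sum(counts) / n_bins
        var = sum((k - mean) ** 2 for k in counts) / n_bins  # biased variance
        cost = (2.0 * mean - var) / width ** 2
        if best is None or cost < best[0]:
            best = (cost, width, n_bins)
    return best[1], best[2]


def unit_bin_counts(values):
    """Frequency counts for integer descriptors using a bin size of one."""
    counts = Counter(values)
    return {v: counts.get(v, 0) for v in range(min(values), max(values) + 1)}


# Hypothetical LogP values and HBD counts, for illustration only.
logp = [1.0, 1.2, 1.9, 2.1, 2.2, 2.8, 3.0, 3.1, 3.3, 4.7, 5.0, 5.2]
width, n_bins = optimal_bin_width(logp, max_bins=10)
print(n_bins, round(width, 3))
print(unit_bin_counts([0, 1, 1, 3]))
```

The resulting bin centers and counts would then serve as the x and y inputs for the curve fitting described next.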
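The two measures can be illustrated with a short sketch. It assumes that higher scores indicate more pesticide-like compounds; the function names and the example scores are hypothetical. The AUC is computed here as the fraction of (pesticide, decoy) pairs in which the pesticide receives the higher score, with ties counted as one half, which is equivalent to the area under the ROC curve.

```python
def roc_points(pos_scores, neg_scores):
    """(FPR, TPR) pairs obtained by lowering the score threshold step by step."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of the two classes."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores: pesticides (positives) and decoys (negatives).
pesticides = [0.9, 0.4]
decoys = [0.6, 0.2]
print(auc(pesticides, decoys))
print(roc_points(pesticides, decoys))
```

The pairwise form is quadratic in the number of compounds, so for sets as large as those used here a rank-based implementation would be preferable; the sketch only makes the definition explicit.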
SM initiated and supervised the project. SA carried out the calculations, implemented and tested the scoring functions, developed the Java-based program and prepared the manuscript. SFT and AB contributed to data preparation for model development and validation and drafted the manuscript. SRC and AKM provided the AgroSAR patent database and corresponding annotations. All authors read and approved the final manuscript.
QEH: Quantitative estimate of herbicide-likeness
QEI: Quantitative estimate of insecticide-likeness
QEF: Quantitative estimate of fungicide-likeness
QEP: Quantitative estimate of pesticide-likeness
This project was financially supported by Project 1.1 of the Institute of Chemistry Timisoara of the Romanian Academy. The authors are indebted to ChemAxon Ltd for access to JChem software and to Alan Wood (http://www.alanwood.net/pesticides/index.html) for maintaining the Compendium of Pesticide Common Names.
- Oprea TI, Davis AM, Teague SJ, Leeson PD: Is there a difference between leads and drugs? A historical perspective. J Chem Inf Comput Sci. 2001, 41: 1308-1315. 10.1021/ci010366a.
- Hann MM, Oprea TI: Pursuing the leadlikeness concept in pharmaceutical research. Curr Opin Chem Biol. 2004, 8: 255-263. 10.1016/j.cbpa.2004.04.003.
- Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev. 1997, 46: 3-25. 10.1016/S0169-409X(00)00129-0.
- Ursu O, Oprea TI: Model-free drug-likeness from fragments. J Chem Inf Model. 2010, 50: 1387-1394. 10.1021/ci100202p.
- Oprea TI: Property distribution of drug-related chemical databases. J Comput Aided Mol Des. 2000, 14: 251-264. 10.1023/A:1008130001697.
- Ertl P, Rohde B, Selzer P: Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J Med Chem. 2000, 43: 3714-3717. 10.1021/jm000942e.
- Cumming JG, Davis AM, Muresan S, Haeberlein M, Chen H: Chemical predictive modelling to improve compound quality. Nat Rev Drug Discov. 2013, 12: 948-962. 10.1038/nrd4128.
- Bickerton GR, Paolini GV, Besnard J, Muresan S, Hopkins AL: Quantifying the chemical beauty of drugs. Nat Chem. 2012, 4: 90-98. 10.1038/nchem.1243.
- Veber DF, Johnson SR, Cheng H-Y, Smith BR, Ward KW, Kopple KD: Molecular properties that influence the oral bioavailability of drug candidates. J Med Chem. 2002, 45: 2615-2623. 10.1021/jm020017n.
- Ghose AK, Viswanadhan VN, Wendoloski JJ: A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1: a qualitative and quantitative characterization of known drug databases. J Comb Chem. 1999, 1: 55-68. 10.1021/cc9800071.
- Tice CM: Selecting the right compounds for screening: does Lipinski's Rule of 5 for pharmaceuticals apply to agrochemicals?. Pest Manag Sci. 2001, 57: 3-16. 10.1002/1526-4998(200101)57:1<3::AID-PS269>3.0.CO;2-6.
- Clarke ED, Delaney JS: Physical and molecular properties of agrochemicals: an analysis of screen inputs, hits, leads, and products. Chim Int J Chem. 2003, 57: 731-734. 10.2533/000942903777678641.
- Clarke ED: Beyond physical properties-application of Abraham descriptors and LFER analysis in agrochemical research. Bioorg Med Chem. 2009, 17: 4153-4159. 10.1016/j.bmc.2009.02.061.
- Hao G, Dong Q, Yang G: A comparative study on the constitutive properties of marketed pesticides. Mol Inform. 2011, 30: 614-622. 10.1002/minf.201100020.
- Moriguchi I, Hirono S, Liu Q, Nakagome I, Matsushita Y: Simple method of calculating octanol/water partition coefficient. Chem Pharm Bull. 1992, 40: 127-130. 10.1248/cpb.40.127.
- Leo AJ: Calculating log Poct from structures. Chem Rev. 1993, 93: 1281-1306. 10.1021/cr00020a001.
- Harrington ECJ: The desirability function. Ind Qual Control. 1965, 21: 494-498.
- Derringer G, Suich R: Simultaneous optimization of several response variables. J Qual Technol. 1980, 12: 214-219.
- Clark RD, Waldman M: Lions and tigers and bears, oh my! Three barriers to progress in computer-aided molecular design. J Comput Aided Mol Des. 2012, 26: 29-34. 10.1007/s10822-011-9504-3.
- Ritchie TJ, Macdonald SJF: The impact of aromatic ring count on compound developability-are too many aromatic rings a liability in drug design?. Drug Discov Today. 2009, 14: 1011-1020. 10.1016/j.drudis.2009.07.014.
- Akamatsu M: Importance of physicochemical properties for the design of new pesticides. J Agric Food Chem. 2011, 59: 2909-2917. 10.1021/jf102525e.
- Delaney J, Clarke E, Hughes D, Rice M: Modern agrochemical research: a missed opportunity for drug discovery?. Drug Discov Today. 2006, 11: 839-845. 10.1016/j.drudis.2006.07.002.
- Jeschke P: The unique role of halogen substituents in the design of modern agrochemicals. Pest Manag Sci. 2010, 66: 10-27. 10.1002/ps.1829.
- Fawcett T: An introduction to ROC analysis. Pattern Recognit Lett. 2006, 27: 861-874. 10.1016/j.patrec.2005.10.010.
- Hanley A, Mcneil J: The meaning and use of the area under a Receiver Characteristic (ROC) curve. Radiology. 1982, 143: 29-36. 10.1148/radiology.143.1.7063747.
- Tomlin CDS: The Pesticide Manual. 2000, The British Crop Protection Council, Farnham, UK
- Wood A: Compendium of pesticide common names. 1995–2014, [http://www.alanwood.net/pesticides/index.html]
- Shimazaki H, Shinomoto S: A method for selecting the bin size of a time histogram. Neural Comput. 2007, 19: 1503-1527. 10.1162/neco.2007.19.6.1503.
- R: A Language and Environment for Statistical Computing. 2012, R Foundation for Statistical Computing, Vienna, Austria
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.