Druggable chemical space and enumerative combinatorics
© Yu; licensee Chemistry Central Ltd. 2013
Received: 7 January 2013
Accepted: 7 April 2013
Published: 18 April 2013
There is a growing body of literature describing the properties of marketed drugs, the concept of drug-likeness and the vastness of chemical space. In that context, enumerative combinatorics with simple atomic components may be useful in the conception and design of structurally novel compounds for expanding and enhancing high-throughput screening (HTS) libraries.
A random combination of mono- and diatomic carbon, hydrogen, nitrogen, and oxygen containing components in the absence of molecular weight constraints but with the ability to form rings affords virtual compounds that fall in bulk physicochemical space typically associated with drugs, but whose ring assemblies fall in new or under-represented areas of chemical shape space. When compared against compounds in ChEMBL_14, MDDR, DrugBank and the Dictionary of Natural Products, the percentage of virtual compounds with a Tanimoto index of 1.0 (ECFP_4) was found to be as high as 0.21%. Depending on therapeutic target, this value may be in the range of what might be expected from an experimental HTS campaign in terms of a true hit rate.
Virtual compounds derived through enumerative combinatorics of simple atomic components have drug-like properties with ring assemblies that fall in new or under-represented areas of shape space. Structures derived in this manner could provide the starting point or inspiration for the design of structurally novel scaffolds in an unbiased fashion.
Keywords: High-throughput screening; HTS; Combinatorics; Drug discovery; Drug space; Enumeration; Virtual compound libraries
Contemporary small molecule drug discovery often relies on high-throughput screening (HTS) of either structurally diverse or mechanistically focused compound library sets to identify hits that have the potential for multiparameter optimization against biological targets of interest. Critical to the success of this approach is the availability of compounds in biologically relevant, druggable chemical space [1, 2]. However, the concept of what constitutes a druggable chemical lead is evolving as more synthetically challenging molecules prepared through either semi-synthesis (e.g., Yondelis®, Ixempra®) or total synthesis (e.g., Halaven®, Aplidin®) pass through the clinical development pipeline to become marketed drugs. In recent years, for example, diverted total synthesis, diversity-oriented synthesis (DOS) and biology-oriented synthesis (BIOS) approaches have provided compounds for biological evaluation that possess increasing levels of structural and stereochemical complexity. Non-traditional lead-like molecules now include, for example, macrocyclic derivatives, which have been described in the literature as an underexploited structural class for drug discovery. Finally, since natural products span regions of chemical space not represented by bioactive medicinal chemistry compounds, their scaffolds may serve as the inspiration for the design of structurally novel combinatorial libraries.
The goal of these efforts is to move into new or underexplored areas of chemical space with the expectation of finding activity against difficult-to-target protein systems with structurally novel compounds. Since the pioneering work of Lipinski and co-workers, there has been an increased focus on bulk physicochemical properties of both leads and drug candidates as well as a better appreciation for drug space in the context of the much larger chemical space. Studies exploring the vastness of chemical space, for example, have led to the development of large virtual compound libraries through in silico combinatorial multiplexing [13, 14]. While exhaustive libraries of this type have their place in virtual screening, the development of smaller dynamic virtual libraries focused on a particular mechanistic class may also contribute toward the identification of viable chemical leads.
Drug discovery, on the other hand, involves more than just interrogating proteins to identify small molecules that bind and invoke a functional response in vitro. It is typically a process of which hit identification marks just the beginning. The subsequent hit-to-lead evaluation and the success of lead optimization to identify a clinical candidate are critically dependent on the quality of the initial screening hit. In the extreme, if one limits the process to compound libraries with known activity and only screens against biological targets of known function using known methodology, then there is little need for innovation. Furthermore, past success may not necessarily predict future success [15, 16]. Since complex interactions among protein systems may need to be modulated in the context of a disease state, new chemotypes may be required to enhance the arsenal of known scaffolds to explore new biological target engagement opportunities, and therefore drug discovery opportunities, in areas of unmet medical need.
While there are many complementary strategies to expand and enhance an existing screening library, one among several that we considered was the role of randomness in molecular assembly at the atomic level. We therefore decided to examine the importance of stochastic processes on molecular design, in particular, ring assemblies for identifying structurally novel biogenic-like scaffolds. Used in tandem with predictive in silico biological models as part of an iterative loop (i.e., enumerate, evaluate in silico, adjust enumeration conditions, repeat until convergence criteria satisfied), new structures could potentially be “evolved” in silico and used as the inspiration for structurally novel scaffold design. In this manner, virtual compounds could be generated in an unbiased fashion and filtered by in silico mechanistic models (e.g., kinases, GPCRs, PPIs, etc.) to create dynamic, structurally novel, mechanism-based screening libraries.
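The iterative loop described above (enumerate, evaluate in silico, adjust enumeration conditions, repeat until convergence) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the fragment labels, the toy scoring function standing in for a predictive biological model, the target value and the weight-adjustment rule are hypothetical and are not the enumerator or models used in this study.

```python
import random

# Hypothetical fragment labels; stand-ins for simple atomic components.
FRAGMENTS = ["C", "N", "O", "C=O", "C=C"]
HETERO = {"N", "O", "C=O"}

def enumerate_batch(weights, size, rng, length=6):
    """Enumerate: draw random fragment tuples (placeholder for a real enumerator)."""
    return [tuple(rng.choices(FRAGMENTS, weights=weights, k=length))
            for _ in range(size)]

def score(candidate):
    """Evaluate: toy 'model' returning the fraction of heteroatom fragments."""
    return sum(f in HETERO for f in candidate) / len(candidate)

def evolve(target=0.5, tol=0.02, max_rounds=50, seed=0):
    """Repeat enumerate/evaluate/adjust until the batch mean is within tol of target."""
    rng = random.Random(seed)
    weights = [1.0] * len(FRAGMENTS)
    mean_score = 0.0
    round_no = 0
    for round_no in range(1, max_rounds + 1):
        batch = enumerate_batch(weights, 500, rng)
        mean_score = sum(score(c) for c in batch) / len(batch)
        if abs(mean_score - target) <= tol:  # convergence criterion satisfied
            break
        # Adjust: nudge the heteroatom fragment weights toward the target.
        factor = 1.1 if mean_score < target else 0.9
        weights = [w * factor if f in HETERO else w
                   for w, f in zip(weights, FRAGMENTS)]
    return round_no, weights, mean_score
```

Calling `evolve()` returns the number of rounds used, the final sampling weights and the final batch mean. In a realistic setting the scoring function would be replaced by a mechanistic in silico model and the adjustment step by whatever enumeration conditions are being tuned.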
To explore similarities that may exist in terms of molecular scaffolds, each of the three compound sets was reduced to a collection of ring assemblies using Pipeline Pilot, where original alpha atom attachments were maintained. In this manner, 95,522 different assemblies were identified from the enumerated set. Of these, 255 (0.27%) were found to be common with ring assemblies derived from either the KDS or the DNP sets.
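The overlap count above amounts to a set intersection between the enumerated ring assemblies and the union of the reference assemblies. The sketch below is illustrative only: it assumes each ring assembly has already been reduced to some canonical string identifier (the toy strings are hypothetical), and it does not reproduce the Pipeline Pilot protocol.

```python
def shared_assemblies(enumerated, *reference_sets):
    """Return (count, percent) of enumerated assemblies found in any reference set."""
    enumerated = set(enumerated)
    reference = set().union(*reference_sets)
    shared = enumerated & reference
    percent = 100.0 * len(shared) / len(enumerated)
    return len(shared), percent

# Toy identifiers (hypothetical canonical strings, not data from the study).
enumerated = {"C1CCCCC1", "C1CCNCC1", "C1CC2CCC1O2", "C1COC2CCNC2C1"}
kds = {"C1CCNCC1", "c1ccccc1"}
dnp = {"C1CCCCC1", "C1CCC2CCCCC2C1"}

count, percent = shared_assemblies(enumerated, kds, dnp)

# Arithmetic check of the reported figure: 255 of 95,522 assemblies.
reported_percent = round(100.0 * 255 / 95522, 2)
```

With the toy data, two of the four enumerated assemblies are shared, and the arithmetic check reproduces the 0.27% quoted in the text.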
SAVol = 3D surface area volume
SASA = 3D solvent accessible surface area
As expected, compounds in the enumerated virtual set exhibit a high degree of saturation with a mean ratio of sp3 hybridized carbons to total number of carbon atoms equal to 0.71 ± 0.22 (mean ± SD). In contrast, compounds in the KDS set exhibit a mean ratio of 0.34 ± 0.21. A similar difference was noted with their associated ring assemblies (original alpha atoms allowed), thereby suggesting that the enumerated scaffolds may possess an enhanced level of three-dimensional character relative to the KDS compounds. Since ring assemblies in the DNP exhibited a mean ratio of 0.61 ± 0.30, the enumerated compounds more closely resemble natural products in terms of their overall percentage of sp3 hybridized carbon atoms, which is to be expected given that all but two of the fragments available for enumeration are saturated.
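The saturation metric discussed above is the ratio of sp3 hybridized carbons to total carbons per compound, summarized as mean ± SD over a set. A minimal sketch follows, assuming the per-compound carbon counts are already available as pairs; the use of the sample standard deviation is also an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def fraction_sp3(n_sp3_carbons, n_carbons):
    """Ratio of sp3 hybridized carbons to total carbons for one compound."""
    if n_carbons == 0:
        raise ValueError("compound has no carbon atoms")
    return n_sp3_carbons / n_carbons

def summarize(counts):
    """Mean and sample SD of the sp3 fraction over (n_sp3, n_carbons) pairs."""
    fractions = [fraction_sp3(s, c) for s, c in counts]
    return mean(fractions), stdev(fractions)

# Toy counts (hypothetical, not data from the study).
toy_counts = [(5, 10), (7, 10), (9, 10)]
toy_mean, toy_sd = summarize(toy_counts)
```

For the toy counts the per-compound fractions are 0.5, 0.7 and 0.9, so the summary is 0.70 ± 0.20.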
To increase the amount of unsaturation in the enumerated structures, an olefin was added to the fragment set. A total of 250,000 virtual compounds was generated and analyzed as described earlier. Both the bulk physicochemical properties of the enumerated molecules and the shape distribution of the corresponding ring assemblies were found to be very similar to those obtained without the olefin fragment (see Additional file 1). Thus, adding unsaturation in the form of non-aromatic double bonds did not alter either the calculated bulk physicochemical properties of the virtual compounds or the shape of the corresponding ring assemblies.
Table: Fragments used in the enumeration process (surviving column header: Allow heteroatom attachment; surviving entries: A*C = O, *C = CP)
Appropriate physicochemical properties are important parameters for compounds to become drugs; they affect the route of administration, pharmacokinetics, pharmacodynamics, formulation and chemical stability of the drug substance. Thus, it is critical during lead optimization that these properties be either retained or optimized so that the compound reaches its biological target with sufficient exposure to elicit a functional response in vivo with an acceptable side effect profile. Starting with a lead compound that already falls in accepted physicochemical drug space will likely streamline the path to both in vivo proof of concept and eventually clinical candidate selection. For example, optimizing for CNS penetration involves multiple parameters that include physicochemical properties.
The work by Lipinski and co-workers reinforced the importance of physicochemical property space with regard to oral drugs, which served to highlight that intrinsic potency is just one of many parameters that must be considered during lead optimization. Yet, potency is an important factor that drives the choice of analogues to be progressed based on indices such as ligand efficiency. As Paul Ehrlich proposed in the early 20th century, compounds must interact with a biological target to elicit a biological response. This interaction can be explained by conventional chemical interactions and, in a deterministic sense, reduced to a set of atomic features consisting of electrostatics, hydrogen bonds, van der Waals, hydrophobic and pi stacking interactions as well as entropic considerations. Since biological targets (e.g., protein receptors, enzymes and DNA) are three-dimensional structures, the interactions require appropriate complementary placement of the pharmacophoric groups in three-dimensional space. Thus, biologically active molecules arise as a consequence of appropriate pharmacophoric feature pairing with their biological target(s). As a result, adequate representation in shape space in addition to property space will be important for a screening library to successfully provide viable chemical leads for optimization and in vivo proof of concept.
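For readers unfamiliar with the ligand efficiency index mentioned above, it is commonly computed as the binding free energy per non-hydrogen atom. The sketch below assumes the free energy is estimated from a dissociation constant via ΔG = RT ln Kd; the temperature of 298.15 K and the example values are assumptions chosen for illustration, not figures from this study.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def ligand_efficiency(kd_molar, heavy_atoms, temperature_k=298.15):
    """Ligand efficiency in kcal/mol per heavy atom: -RT ln(Kd) / N_heavy."""
    if kd_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("Kd and heavy atom count must be positive")
    delta_g = R_KCAL * temperature_k * math.log(kd_molar)  # negative for Kd < 1 M
    return -delta_g / heavy_atoms

# Hypothetical example: a 10 nM binder with 30 heavy atoms.
example_le = ligand_efficiency(1e-8, 30)
```

Under these assumptions the hypothetical 10 nM, 30 heavy atom compound has a ligand efficiency of roughly 0.36 kcal/mol per heavy atom.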
In this regard, the majority of natural product scaffolds have been shown to be absent from commercially available compounds, consistent with the shape analyses depicted in Figures 7 and 8. The directed expansion and enhancement of existing HTS libraries for drug discovery purposes could therefore be accomplished not only through the addition of compounds biased toward naturally occurring biogenic scaffolds (e.g., DOS or BIOS), but also through the addition of biogenic-like scaffolds derived from enumerative combinatorics of simple atomic components. When compared with compounds from the KDS and DNP sets, the enumerated compounds exhibited a virtual “hit” rate as high as 0.21% (i.e., Tanimoto index = 1.0). Depending on therapeutic target, this value may be similar to what might be expected from an experimental HTS campaign in terms of a true hit rate.
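The virtual “hit” rate above is the percentage of enumerated compounds whose fingerprint has a Tanimoto index of 1.0 against at least one reference compound. A minimal sketch of that calculation follows, assuming each fingerprint is represented as a set of integer feature identifiers (as a stand-in for ECFP_4 features); the toy fingerprints are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here, not one taken from the source.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto index of two feature sets: |A & B| / |A | B|."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / len(union)

def virtual_hit_rate(virtual_fps, reference_fps, threshold=1.0):
    """Percent of virtual compounds with any reference at or above the threshold."""
    hits = sum(
        any(tanimoto(v, r) >= threshold for r in reference_fps)
        for v in virtual_fps
    )
    return 100.0 * hits / len(virtual_fps)

# Toy fingerprints (hypothetical feature identifiers, not data from the study).
virtual = [{1, 2, 3}, {1, 2}, {4, 5}, {7}]
reference = [{1, 2, 3}, {4, 5, 6}]
toy_rate = virtual_hit_rate(virtual, reference)
```

In the toy data only the first virtual fingerprint exactly matches a reference fingerprint, giving a rate of 25%. Because a Tanimoto index of 1.0 on set fingerprints means identical feature sets, a large-scale version could replace the pairwise loop with a lookup of frozen sets in a hash table.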
The question then arises as to whether or not the collection of molecular shapes associated with known small molecule ligands will be sufficient to support future drug discovery efforts that will likely include modulating therapeutic targets for which there are currently no known small molecule effectors. Since biologically active synthetic compounds and natural products work with known systems, it logically follows that screening libraries could be enhanced and expanded based on the concept of privileged structures. If, however, a subset of future therapeutic targets requires ligands that possess molecular shapes not represented by known drugs, then enhancing screening libraries with compounds derived through enumerative combinatorics may be appropriate. In this sense, the results of this study do not contradict the concepts of DOS and BIOS, but rather complement them in terms of potential future unknown therapeutic target opportunities. Since the enumerator does not utilize chemical reaction information in the construction of virtual compounds, the resulting structures may point the way to new scaffolds in unexplored areas of molecular shape space. Thus, this approach may be complementary to those that employ known chemical reactions and common molecular building blocks to generate virtual compound libraries.
Since the molecular mass distribution of drugs is essentially identical to that of all chemicals stored in the Beilstein database, it is tempting to speculate that randomness may indeed play a role in drug design, i.e., that drugs arise from a random sampling of existing substances defined by the rules of organic chemistry as proposed by Fialkowski and co-workers. Alternatively, the overlapping molecular weight distribution of random molecules and drug substances could be viewed as additional evidence that physicochemical space is the consequence of a stochastic process that is independent of synthetic pathways.
During drug discovery, compounds that are found to exhibit desirable biological activity are expanded upon during the hit-to-lead and lead optimization phases, while those that do not are either discarded or remain as singletons. Since the resulting analogues are usually retained in the screening library, over time the collection can become enriched in chemotypes associated with past programs, which may limit the probability of success with therapeutic targets outside historical mechanistic classes. In such cases, enumerative combinatorics in combination with in silico biological models may represent one approach to enhance and expand existing screening libraries in an unbiased fashion.
This study suggests that bulk physicochemical property drug space could have arisen from enumerative combinatorics independent of synthetic pathway. Since natural products are produced by organisms through biosynthetic pathways and non-naturally occurring drug substances are generally produced through chemical reactions performed in the laboratory using different techniques and starting materials, it might be expected that the two classes of compounds differ from each other and can, in fact, be distinguished on the basis of chemical fingerprints (e.g., HOSE codes). The enumerated molecules, however, were constructed in a stochastic manner following neither paradigm and yet are predicted to exhibit bulk physicochemical properties consistent with known biologically active agents, whether they be naturally occurring or not. Since these structures fall in new or under-represented areas of shape space, virtual compounds derived through enumerative combinatorics of simple atomic components could provide the starting point or inspiration for the design of structurally novel scaffolds, in an unbiased fashion, that blur the line between synthetic substances and natural products.
- Hann MM, Keserü GM: Finding the sweet spot: the role of nature and nurture in medicinal chemistry. Nat Rev Drug Discov. 2012, 11: 355-365. 10.1038/nrd3701.
- Keserü GM, Makara GM: The influence of lead discovery strategies on the properties of drug candidates. Nat Rev Drug Discov. 2009, 8: 203-212. 10.1038/nrd2796.
- Njardarson JT, Gaul C, Shan D, Huang X-Y, Danishefsky SJ: Discovery of potent cell migration inhibitors through total synthesis: Lessons from structure-activity studies of (+)-migrastatin. J Am Chem Soc. 2004, 126: 1038-1040. 10.1021/ja039714a.
- Tan DS: Diversity-oriented synthesis: exploring the intersections between chemistry and biology. Nat Chem Biol. 2005, 1: 74-84. 10.1038/nchembio0705-74.
- Wetzel S, Bon RS, Kumar K, Waldmann H: Biology-oriented synthesis. Angew Chem Int Ed. 2011, 50: 10800-10826. 10.1002/anie.201007004.
- Driggers EM, Hale SP, Lee J, Terrett NK: The exploration of macrocycles for drug discovery – an underexploited structural class. Nat Rev Drug Discov. 2008, 7: 608-624. 10.1038/nrd2590.
- Rosén J, Gottfries J, Muresan S, Backlund A, Oprea TI: Novel chemical space exploration via natural products. J Med Chem. 2009, 52: 1953-1962. 10.1021/jm801514w.
- Grabowski J, Baringhaus K-H, Schneider G: Scaffold diversity of natural products: Inspiration for combinatorial library design. Nat Prod Rep. 2008, 25: 892-904. 10.1039/b715668p.
- Dandapani S, Marcaurelle LA: Accessing new chemical space for ‘undruggable’ targets. Nat Chem Biol. 2010, 6: 861-863. 10.1038/nchembio.479.
- Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev. 1997, 23: 3-25. 10.1016/S0169-409X(96)00423-1.
- Lipinski C, Hopkins A: Navigating chemical space for biology and medicine. Nature. 2004, 432: 855-861. 10.1038/nature03193.
- Kirkpatrick P, Ellis C: Chemical space. Nature. 2004, 432: 823-823. 10.1038/432823a.
- Reymond J-L, Ruddigkeit L, Blum L, van Deursen R: The enumeration of chemical space. WIREs Comput Mol Sci. 2012, 2: 717-733. 10.1002/wcms.1104.
- Blum LC, Reymond J-L: 970 Million druglike small molecules for virtual screening in the chemical universe database GDB-13. J Am Chem Soc. 2009, 131: 8732-8733. 10.1021/ja902302h.
- Drewry DH, Macarron R: Enhancements of screening collections to address areas of unmet medical need: an industry perspective. Curr Opin Chem Biol. 2010, 14: 289-298. 10.1016/j.cbpa.2010.03.024.
- Faller B, Ottaviani G, Ertl P, Berellini G, Collis A: Evolution of the physicochemical properties of marketed drugs: can history foretell the future?. Drug Discov Today. 2011, 16: 976-984. 10.1016/j.drudis.2011.07.003.
- ChEMBL. https://www.ebi.ac.uk/chembldb/index.php accessed December 7, 2012
- Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, Djoumbou Y, Eisner R, Guo AC, Wishart DS: DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res. 2011, 39 (Database issue): D1035-D1041. PMID: 21059682.
- Yu M: Natural product-like virtual libraries: Recursive atom-based enumeration. J Chem Inf Model. 2011, 51: 541-557. 10.1021/ci1002087.
- Veber DF, Johnson SR, Cheng H-Y, Smith BR, Ward KW, Kopple KD: Molecular properties that influence the oral bioavailability of drug candidates. J Med Chem. 2002, 45: 2615-2623. 10.1021/jm020017n.
- López-Vallejo F, Nefzi A, Bender A, Owen JR, Nabney IT, Houghten RA, Medina-Franco JL: Increased diversity of libraries from libraries: chemoinformatic analysis of bis-diazacyclic libraries. Chem Biol Drug Des. 2007, 70: 393-412. 10.1111/j.1747-0285.2007.00579.x.
- Rogers D, Hahn M: Extended-connectivity fingerprints. J Chem Inf Model. 2010, 50: 742-754. 10.1021/ci100050t.
- Pipeline Pilot version 8.5. 2008, San Diego, CA: Accelrys
- Sauer WHB, Schwarz MK: Molecular shape diversity of combinatorial libraries: a prerequisite for broad bioactivity. J Chem Inf Comput Sci. 2003, 43: 987-1003. 10.1021/ci025599w.
- Janoo VC: Quantification of shape, angularity, and surface texture of base course materials. Special Report 98–1. 1998, US Army Corps of Engineers®
- GDB Databases. http://reymond.dcb.unibe.ch/gdb/home.html#gdb accessed March 19, 2013
- Ertl P, Schuffenhauer A: Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J Cheminf. 2009, 1: 8. 10.1186/1758-2946-1-8.
- Wager TT, Hou X, Verhoest PR, Villalobos A: Moving beyond rules: the development of a central nervous system multiparameter optimization (CNS MPO) approach to enable alignment of druglike properties. ACS Chem Neurosci. 2010, 1: 435-449. 10.1021/cn100008c.
- Hopkins AL, Groom CR, Alex A: Ligand efficiency: A useful metric for lead selection. Drug Discov Today. 2004, 9: 430-431. 10.1016/S1359-6446(04)03069-7.
- Hert J, Irwin JJ, Laggner C, Keiser MJ, Shoichet BK: Quantifying biogenic bias in screening libraries. Nat Chem Biol. 2009, 5: 479-483. 10.1038/nchembio.180.
- Welsch ME, Snyder AA, Stockwell BR: Privileged Scaffolds for Library Design and Drug Discovery. Curr Opin Chem Biol. 2010, 14: 347-361. 10.1016/j.cbpa.2010.02.018.
- Kaiser M, Wetzel S, Kumar K, Waldmann H: Biology-inspired synthesis of compound libraries. Cell Mol Life Sci. 2008, 65: 1186-1201. 10.1007/s00018-007-7492-1.
- Hartenfeller M, Eberle M, Meier P, Nieto-Oberhuber C, Altmann K-H, Schneider G, Jacoby E, Renner S: Probing the bioactivity-relevant chemical space of robust reactions and common molecular building blocks. J Chem Inf Model. 2012, 52: 1167-1178. 10.1021/ci200618n.
- Fialkowski M, Bishop KJM, Chubukov VA, Campbell CJ, Grzybowski BA: Architecture and evolution of organic chemistry. Angew Chem Int Ed. 2005, 44: 7263-7269. 10.1002/anie.200502272.
- Ertl P, Roggo S, Schuffenhauer A: Natural product-likeness score and its application for prioritization of compound libraries. J Chem Inf Model. 2008, 48: 68-74. 10.1021/ci700286x.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.