Molpher: a software framework for systematic chemical space exploration
© Hoksza et al.; licensee Chemistry Central Ltd. 2014
Received: 7 January 2014
Accepted: 17 March 2014
Published: 21 March 2014
Chemical space is virtual space occupied by all chemically meaningful organic compounds. It is an important concept in contemporary chemoinformatics research, and its systematic exploration is vital to the discovery of either novel drugs or new tools for chemical biology.
In this paper, we describe Molpher, an open-source framework for the systematic exploration of chemical space. Through a process we term ‘molecular morphing’, Molpher produces a path of structurally-related compounds. This path is generated by the iterative application of so-called ‘morphing operators’ that represent simple structural changes, such as the addition or removal of an atom or a bond. Molpher incorporates an optimized parallel exploration algorithm, compound logging and a two-dimensional visualization of the exploration process. Its feature set can be easily extended by implementing additional morphing operators, chemical fingerprints, similarity measures and visualization methods. Molpher not only offers an intuitive graphical user interface, but also can be run in batch mode. This enables users to easily incorporate molecular morphing into their existing drug discovery pipelines.
Molpher is an open-source software framework for the design of virtual chemical libraries focused on a particular mechanistic class of compounds. These libraries, represented by a morphing path and its surroundings, provide valuable starting data for future in silico and in vitro experiments. Molpher is highly extensible and can be easily incorporated into any existing computational drug design pipeline.
Chemical space is populated by all chemically meaningful and stable organic compounds [1–3]. It is an important concept in contemporary chemoinformatics research [4, 5], and its exploration leads to the discovery of either novel drugs or new tools for chemical biology [6, 7]. It is agreed that chemical space is huge, but no accurate approximation of its size exists. Even if only drug-like molecules are taken into account, size estimates vary between 10²³ and 10¹⁰⁰ compounds. However, smaller numbers have also been reported. For example, based on the growth in the number of organic compounds in chemical databases, Drew et al. deduced the size of chemical space to be 3.4 × 10⁹. By assigning all possible combinations of atomic species to the same three-dimensional geometry, Ogata et al. estimated the size of chemical space to be between 10⁸ and 10¹⁹. Also, by analyzing known organic substituents, the size of accessible chemical space was assessed as between 10²⁰ and 10²⁴.
Such estimates have been put into context by Reymond et al., who produced all molecules that can exist up to a certain number of heavy atoms in their Chemical Universe Databases: GDB-11 [13, 14] (2.64 × 10⁷ molecules with up to 11 heavy atoms); GDB-13 (9.7 × 10⁸ molecules with up to 13 heavy atoms); and GDB-17 (1.7 × 10¹¹ compounds with up to 17 heavy atoms). The GDB-17 database was then used to approximate the number of possible drug-like molecules as 10³³.
While virtual chemical space is very large, only a small fraction of it has been reported in actual chemical databases so far. For example, PubChem contains data for 49.1 million chemical compounds and Chemical Abstracts consists of over 84.3 million organic and inorganic substances (numbers as of 12 March 2014). Thus, the navigation of chemical space is a very important area of chemoinformatics research [19, 20]. Because chemical space is usually defined using various sets of descriptors, a major problem is the lack of invariance of chemical space [22, 23]. Depending on the descriptors and distance measures used, different chemical spaces show different compound distributions. Unfortunately, no generally applicable representation of invariant chemical space has yet been reported.
Approaches to chemical space navigation can be categorized by the way in which molecular structure and properties are encoded. The two main methods used are descriptor vectors and graphs. In descriptor-based chemical space, molecules are treated as multidimensional vectors consisting of molecular descriptors. To analyze such multidimensional data, dimensionality reduction mapping techniques are used, mainly Principal Component Analysis (PCA) and/or Multidimensional Scaling (MDS). For example, the chemical global positioning system ChemGPS utilizes PCA to create a ‘navigation map’ in drug-like space. Another PCA-based system suitable for the comparison of chemical libraries is Delimited Reference Chemical Subspace (DRCS) [31, 32].
In graph-based chemical space, compounds are typically simplified into their molecular scaffolds. In the scaffold tree algorithm, large molecular data sets are organized into a unique tree hierarchy by iterative removal of rings from more complex scaffolds. The scaffold tree has been successfully applied to the analysis of chemical data [35, 36] and to the identification of new bioactive regions in chemical space. Other scaffold-capturing organization schemes include the molecular equivalence number classification system, the related chemotype approach and hierarchical scaffold clustering (HierS).
The computer-assisted de novo design of bioactive molecules is another important way of navigating chemical space. De novo design requires the assembly of candidate compounds, which are then used to search for novel structures. Because ‘combinatorial explosion’ makes it impossible to consider all theoretically conceivable compounds, this search space must first be reduced by incorporating as much chemical knowledge as possible. Two major approaches exist for assembling candidate compounds: atom-based and fragment-based. While atom-based techniques require molecules to be built atom by atom, in fragment-based approaches molecules are formed from predefined molecular building blocks. The key element in fragment-based assembly is an adaptive scheme for virtual molecular evolution. However, despite their advantages, fragment-based molecular structure evolution programs do not follow a structural continuum. Because of this, less coarse-grained approaches have been proposed. The median molecules approach [47, 48] generates structures similar to two different starting compounds by a graph-based genetic algorithm. This algorithm uses multiobjective optimization that applies a Pareto ranking scheme to the evolution of candidate solutions. Molecule Evaluator uses crossover and mutation operators to evolve a set of molecules, the quality of which is assessed by the user. Bishop et al. developed an approach that uses chemical reactions as structural mutations. van Deursen et al. represented chemical space as a graph with molecules as nodes and structural mutations as edges. Their SPACESHIP program generates a structural continuum between two molecules by the iterative application of mutation and selection cycles. Yu devised a molecular enumerator that produces a diverse set of natural-product-like or drug-like molecules by attaching randomly selected fragments to the molecular core.
The Algorithm for Chemical Space Exploration with Stochastic Search (ACSESS) combines molecular evolution and maximum diversity methods to create libraries representative of various chemical spaces.
For the systematic discovery of chemical space, we proposed a method of ‘molecular morphing’. Our method is inspired by the morphing effect used in animation films, in which one image morphs into another through seamless transition. Similarly, a start molecule is converted into a target molecule by the application of morphing operators that correspond to simple structural changes, such as the addition or removal of an atom or a bond. If the start and target molecules belong to the same mechanistic class of compounds (i.e. they are active at the same receptor), the molecules encountered along the morphing path and within its surroundings represent a focused virtual library. Such a library provides valuable starting data for subsequent in silico experiments that aim to identify more potentially active leads. The predicted leads can be further optimized and their biological activity subsequently assessed in in vivo experiments.
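The idea of repeatedly applying simple operators until a target is reached can be illustrated with a deliberately simplified sketch. It is not Molpher's algorithm (which is described in Additional file 1): molecules are modelled here as hypothetical sets of labelled features rather than chemical graphs, the only operators are "add one feature" and "remove one feature", and the distance function, the `keep` parameter and all names are assumptions made for illustration.

```python
def single_step_morphs(molecule, features):
    """Apply every toy operator once: add or remove a single feature."""
    morphs = set()
    for feature in features:
        if feature in molecule:
            morphs.add(molecule - {feature})   # "remove" operator
        else:
            morphs.add(molecule | {feature})   # "add" operator
    return morphs


def distance(a, b):
    """Toy structural distance: number of features found in only one molecule."""
    return len(a ^ b)


def find_morphing_path(start, target, features, keep=5, max_iterations=50):
    """Grow morphs iteration by iteration, keeping the `keep` closest to the target."""
    if start == target:
        return [start]
    frontier = [[start]]   # each entry is a path ending in a surviving morph
    seen = {start}
    for _ in range(max_iterations):
        candidates = []
        for path in frontier:
            for morph in single_step_morphs(path[-1], features):
                if morph in seen:
                    continue
                seen.add(morph)
                if morph == target:
                    return path + [morph]
                candidates.append(path + [morph])
        if not candidates:
            return None
        # Selection step: only the morphs closest to the target survive.
        candidates.sort(key=lambda p: distance(p[-1], target))
        frontier = candidates[:keep]
    return None


# Hypothetical feature labels; they do not encode real chemistry.
start = frozenset({"C1-C2", "C2-O3"})
target = frozenset({"C1-C2", "C2-N3", "N3-C4"})
features = sorted(start | target | {"C4-C5", "C1=O6"})

path = find_morphing_path(start, target, features)
for step in path:
    print(sorted(step))
```

Every pair of neighbouring entries in the returned path differs by exactly one operator application, which is the toy analogue of the structural continuum between the start and target molecules.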
Molpher is a freely-available client–server application that includes the following features: a user-friendly graphical interface; a wide range of molecular representations and similarity measures; the interactive modification of the algorithm’s parameters; the visualization and inspection of explored space; and the export of generated structures. In addition, Molpher is designed to be used as a software framework that can easily incorporate new molecular representations, similarity measures and visualization techniques.
Molpher is written in C++, is open source and can be freely downloaded (GNU General Public License v3). For standard tasks, it uses Boost C++ Libraries. Chemical functionality is provided by the open source chemoinformatics toolkit RDKit, which offers a reasonable level of thread-safety, a native C++ application programming interface, and a vast number of fingerprints and similarity coefficients. Molpher leverages the computational power of modern CPUs by dividing chemical space exploration between individual CPU cores. This parallelization is implemented by employing the Intel® Threading Building Blocks library (TBB).
In this section, we briefly describe Molpher’s molecular morphing algorithm. A more detailed description of the algorithm and its parameters is given in Additional file 1.
Molpher is a client–server application in which data-intensive tasks, such as chemical space exploration, are delegated to the server. The client graphical user interface, developed using the Qt library, provides the only means of changing the server settings. From the client, the user can create and manage jobs, change their settings and display their results. The client–server architecture also enables the exploration process to be divided among multiple clients, any of which can be disconnected from a running job and, if necessary, later reconnected. Both client and server can reside on the same machine or be used as separate components designed to communicate over a network. The server is a command line application that listens for client connections on a specific port. Whenever new results are available, the server broadcasts them to all connected clients. There is no ‘master client’ instance with exclusive rights to control the server; all clients are equal. Any client can create jobs (exploration tasks) on the server and adjust the properties of the currently running jobs. Jobs can be password protected to prevent other clients from modifying them. In addition, the server can be run in batch mode, in which it behaves as a non-interactive program with jobs passed as command line arguments. After performing the specified jobs and storing their results, the server terminates. This functionality allows Molpher users to easily incorporate molecular morphing into their drug discovery pipelines.
Graphical user interface and its capabilities
In the following paragraphs, we briefly describe the graphical user interface (GUI) of the Molpher client. For a more detailed description, the reader is referred to the Molpher User manual.
The molecular morphing algorithm is controlled by parameters accessed by clicking the Choose algorithm parameters button (see Additional file 1). The two most important parameters influencing the exploration process are Fingerprint representation and Similarity coefficient. Morgan fingerprints, which belong to the family of circular fingerprints, and the Tanimoto coefficient are set as defaults because, in our experience, they offer reasonable performance in the vast majority of cases. However, many other fingerprints and similarity coefficients are implemented (see Additional file 2). Also, it is possible to limit a morph’s molecular weight (default is 500 Da) so that the algorithm does not produce overly complex molecules. From the same dialog, a prediction of the morph’s synthetic feasibility can be switched off. When the parameters have been set up, the client’s configuration can be saved in an SNP (snapshot) file and restored as required by using the Save and Load buttons, respectively, in the Create job dialog box.
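As a minimal illustration of the default similarity measure, the Tanimoto coefficient of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below represents a fingerprint as a set of "on" bit positions; the bit values, the function names and the treatment of two empty fingerprints are assumptions for illustration and do not reproduce Molpher's RDKit-based implementation.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(a & b) / union


def within_weight_limit(molecular_weight, limit=500.0):
    """Mimics the molecular weight cap (500 Da by default) described in the text."""
    return molecular_weight <= limit


# Hypothetical on-bit positions for two fingerprints.
fp_start = {1, 2, 3}
fp_morph = {2, 3, 4}

similarity = tanimoto(fp_start, fp_morph)   # 2 shared bits / 4 bits in total
print(similarity, 1.0 - similarity)         # similarity and the matching distance
print(within_weight_limit(320.4), within_weight_limit(612.7))
```

The value 1 minus the coefficient is commonly used as the corresponding distance, which is the form a path search needs when it ranks morphs by closeness to the target.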
Typically, Molpher explores relatively direct paths between the start/target molecule pair, but the user may request the process to explore more remote areas of chemical space. This is done by defining additional ‘decoy molecules’ via the Create job dialog box. The presence of these decoys modifies the calculation of the morph’s distance to the target molecule. In this case, the program averages two distances: the distance of the morph to the target molecule, and the distance of the morph to the decoy molecule. In these calculations, only the distance to the closest decoy is considered.
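The decoy rule described above reduces to a small calculation, sketched here under the assumption that all distances have already been computed with the chosen fingerprint and similarity coefficient. The function name and the fallback for a job without decoys are illustrative choices, not taken from the Molpher source code.

```python
def decoy_adjusted_distance(distance_to_target, distances_to_decoys):
    """Average the distance to the target with the distance to the closest decoy."""
    if not distances_to_decoys:
        # No decoys defined: the plain distance to the target is used.
        return distance_to_target
    closest_decoy = min(distances_to_decoys)
    return (distance_to_target + closest_decoy) / 2.0


# A morph that is 0.5 from the target, with decoys at 0.75 and 0.25.
print(decoy_adjusted_distance(0.5, [0.75, 0.25]))   # (0.5 + 0.25) / 2
print(decoy_adjusted_distance(0.5, []))             # unchanged without decoys
```

Because only the closest decoy contributes, a morph near any one decoy scores well even if it is far from all the others, which is what pulls the exploration towards the regions marked by decoys.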
The user can also specify molecules referred to as bookmarks in the Create job dialog box. Bookmarks can be shared between individual jobs during a client session. Bookmarked molecules can be made available for reuse in later sessions by being stored in a user-defined group and exported into an SDF file. Bookmarks are most useful when a job is having difficulty in identifying the path. In such a case, it can be helpful to set up a differently configured job using the same start and target molecules. The user can then bookmark promising candidates in the original job and load them as decoys into the new job, thereby facilitating faster algorithm convergence.
A morphing process can be inspected in the Visualization pane (see Figure 3). The visualization of chemical space depends on both the molecular representation and on the visualization technique used to reduce a multidimensional space to two or three dimensions [27, 62]. Currently, two visualization techniques are implemented in Molpher: Principal Component Analysis (PCA), a linear dimension reduction method, and Kamada-Kawai, a graph-layout-based method. The visualization method is selected in the Create job dialog box and may be changed later, even for a running job. In each new iteration, the visualization is recalculated using all morphs present. Because the visualization method has such an impact on the user’s experience, Molpher’s modular architecture enables users to implement additional visualization techniques.
PCA transforms correlated variables into uncorrelated ones. The uncorrelated variables, termed principal components, are constructed as linear combinations of the original variables. The dimension of the original data can be reduced by retaining only a small number of principal components that describe the predefined amount of variability. In Molpher, PCA is used to reduce the original chemical space to two dimensions.
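The reduction to two dimensions can be sketched in a few lines of dependency-free Python. This is an illustrative implementation that finds the first two principal components by power iteration with Gram-Schmidt orthogonalization; it is not Molpher's code, the descriptor values are made up, and a production implementation would normally use a linear algebra library, since power iteration converges slowly when the leading eigenvalues are close.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def remove_projections(v, basis):
    """Make v orthogonal to every unit vector in basis (Gram-Schmidt step)."""
    for b in basis:
        scale = dot(v, b)
        v = [x - scale * y for x, y in zip(v, b)]
    return v


def pca_2d(rows, iterations=200):
    """Project rows (equal-length numeric lists) onto their first two principal components."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix of the centered descriptors.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    components = []
    for c in range(2):
        # Deterministic start vector, different for each component.
        start = [float((i + 1) ** (c + 1)) for i in range(dim)]
        v = normalize(remove_projections(start, components))
        for _ in range(iterations):
            w = remove_projections([dot(row, v) for row in cov], components)
            if math.sqrt(dot(w, w)) < 1e-12:
                break  # no variance left outside the components already found
            v = normalize(w)
        components.append(v)
    return [[dot(r, comp) for comp in components] for r in centered]


# Made-up descriptor vectors that all lie on one line in 3D space.
descriptors = [[0, 0, 0], [1, 2, 2], [2, 4, 4], [3, 6, 6]]
coordinates = pca_2d(descriptors)
for x, y in coordinates:
    print(round(x, 6), round(y, 6))
```

Because the example points are collinear, all of their variability is captured by the first principal component and the second coordinate is zero up to rounding error.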
Morphing space can also be considered as a graph, in which each pair of nodes represents two molecules separated by a single morphing operator, which is assigned to the connecting edge. Several graph-based layout algorithms exist for the visualization of graphs in an aesthetically pleasing way. In Molpher, we implemented the force-directed-based Kamada-Kawai (KK) method. Using this method, nodes are positioned in 2D space so that the number of edge crossings is minimized, and both nodes and edges are distributed uniformly. In KK, every pair of nodes is assigned a value d_ij that corresponds to the shortest path between these nodes. However, in Molpher d_ij corresponds to the structural similarity between morphs. In addition, each node pair is also characterized by its Euclidean distance in 2D space. Each layout is characterized by its energy E, which is derived from the difference between the d_ij and Euclidean distances of all node pairs. The KK algorithm iteratively generates a layout with the lowest value of E.
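For orientation, the energy in the standard Kamada-Kawai formulation sums, over all node pairs, a spring term 0.5 · k_ij · (|p_i − p_j| − l_ij)², with ideal length l_ij = L · d_ij and stiffness k_ij = K / d_ij². The sketch below evaluates that energy for a given layout. The pairwise values d_ij are supplied by the caller, the constants L and K default to 1, and the iterative minimization itself is omitted; none of this is taken from the Molpher source code.

```python
import math


def kk_energy(positions, d, length_scale=1.0, spring_constant=1.0):
    """Kamada-Kawai energy of a 2D layout for pairwise target values d[i][j] > 0."""
    energy = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = positions[i], positions[j]
            euclidean = math.hypot(xi - xj, yi - yj)
            ideal_length = length_scale * d[i][j]
            stiffness = spring_constant / d[i][j] ** 2
            energy += 0.5 * stiffness * (euclidean - ideal_length) ** 2
    return energy


# Three nodes whose target values form a 3-4-5 right triangle.
d = [[0, 3, 4],
     [3, 0, 5],
     [4, 5, 0]]

perfect_layout = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
shifted_layout = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]

print(kk_energy(perfect_layout, d))   # every Euclidean distance matches its d_ij
print(kk_energy(shifted_layout, d))   # moving one node raises the energy
```

A layout in which every Euclidean distance equals its target value has zero energy, and any displacement from it increases E, which is the quantity the iterative procedure drives down.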
When the job is created, it is immediately run, receiving a unique ID number. The job can be checked in the Job queue (see Figure 3), in which it will appear in one of four possible statuses: Running, Live, Sleeping and Finished. Live status means that the job is waiting in a queue and is scheduled to run as soon as free resources appear. Both Running and Live jobs can be put into Sleeping mode by selecting Set parameters from the Action column of a job’s entry in the Job queue. In addition, the Action column (see Figure 3) enables the user to make on-the-fly modifications to the algorithm’s parameters, as well as to select morphing operators, fingerprints, similarity coefficients and visualization methods.
The Live tab is invoked by clicking the Live button in the Job queue for the relevant job. Visualization in this tab is refreshed after each iteration. For each job, only one Live tab can be opened at a time. The Live tab is indicated in the tab caption by the job’s ID (e.g. ‘1’).
The Detached tab is invoked by clicking the Visualization-Detach button in the Job queue for the relevant job. This displays a graphical snapshot of the job at the time at which the tab is opened. There is no limit to how many of these tabs can be detached from a particular job. The Detached tab is indicated in the tab caption by backslash followed by the job’s ID (e.g. ‘\1’).
The Adhoc tab is invoked by double-clicking a job iteration in the Iteration pane in the Job queue (see Figure 3). This tab can display multiple snapshots of iterations from different jobs at any one time. The Adhoc tab is indicated in the tab caption by the job’s ID and iteration number separated by a colon (e.g. ‘1 : 51’ means the 51st iteration of job 1).
Holding the left mouse button and dragging enables the user to select only the required area of chemical space. Morphs can be added or removed from the selection by clicking the left mouse button while holding the Ctrl key. More specific selections, such as selecting all new candidates or all inner tree leaves, can be made via the Select button from the Iteration pane. Among other things, selected morphs can be exported into an SDF file.
Clicking the middle mouse button on any node highlights the path from the start molecule to the selected morph (see Figure 5). The user can then inspect all morphs lying on this path. The user can move the whole view by holding the right mouse button and zoom it by turning the mouse wheel.
Clicking the right mouse button on any node invokes a context menu, from which the active morph can be bookmarked. In addition, molecules lying on the path between the active morph and the start molecule can be selected, and the whole path easily exported into an SDF file. The active morph can also be copied to the clipboard as a SMILES string or as a summary formula. When the Marvin suite from ChemAxon is installed, the active morph can be opened externally in Marvin Sketch, Marvin Space or Marvin View. The last two items in the context menu enable the user to perform either an exact match search or a similarity search in the PubChem, ZINC or ChEMBL databases.
The final item in the Action menu, Pubchem, enables the PubChem database to be searched for an exact match or for neighbourhood generation. Neighbourhood molecules are depicted as yellow circles. If an exact match is found in PubChem for any of the selected molecules, that molecule changes its shape to a rhombus.
Results and discussion
Molpher is designed to propose new candidates for biological testing by the controlled exploration of chemical space. A focused chemical library is represented by a morphing path and its surroundings. To assess the ability of Molpher to find such a path, we selected three sets of start/target molecule pairs from the PubChem database (all structures can be found in Additional file 3). These sets differed only in terms of their similarities, which were evaluated using the PubChem fingerprint structural key and Tanimoto similarity coefficient. Each set consisted of 20 start/target pairs. Molecules in the D1 set shared 70-80% similarity; molecules in the D2 set 50-60% similarity; and molecules in the D3 set 30-40% similarity. To test Molpher’s speed, we used a machine with 4 Intel® Xeon® E5450 3 GHz processors running Windows Server 2008 R2. We restricted the experiments to a single CPU thread and limited the exploration process to 1000 iterations. To perform additional computations and further fine-tune Molpher’s parameters, we also used the Czech National Grid Infrastructure, MetaCentrum. To accommodate the non-deterministic character of molecular morphing, we ran each start/target exploration five times using the default Morgan fingerprint and Tanimoto distance settings. The molecular weight of morphs was limited to 500 Daltons. Each exploration was run five times with the synthetic feasibility filter turned on and five times with it turned off.
Median number of iterations needed to generate the path
We found that the runtime of the algorithm is influenced by the following factors: the similarity of the start/target molecule pair; the settings of parameters (primarily similarity coefficient and molecular fingerprint); and the hardware on which the calculation is run. Using the default parameters (Tanimoto coefficient and circular ECFP-like Morgan fingerprint), the path was generated in 9.5 minutes on average (averaged over all start/target pairs from all datasets) using a single processor core. However, when Molpher was run on multiple cores, every path was identified within 5 minutes. Such speeds make Molpher highly suitable for data-intensive tasks.
Indeed, Molpher is undergoing continuous development, with several other new features planned for future releases. These include an enhanced graphical user interface, an improved Bayesian synthetic feasibility filter, new visualization methods, the inclusion of various drug-like, lead-like and unwanted substructure filters, and the possibility of generating morphs containing only user-defined substructures. To facilitate predictive compound design in a better way than is currently possible via directed structural modifications, we also plan to incorporate biological activities and ADME/Tox properties into the morphing process. Furthermore, the algorithm will be modified to implement the multiobjective optimization approach, which will enable the morphing process to be driven by several properties (high activity, low toxicity, etc.) simultaneously.
We have described a molecular morphing tool, Molpher, which, to the best of our knowledge, is the first freely available implementation of the concept also known as ‘chemical space travel’. Molecular morphing is a computational strategy for the systematic exploration of chemical space. Given a start/target molecule pair, the algorithm iteratively produces a path covering a structural continuum between them. This is done by the iterative application of simple structural changes, such as adding or removing an atom or bond. Molpher can be used via a fully-fledged desktop application or run in batch mode. Molpher’s modular architecture guarantees easy extensibility, thereby enabling the simple adoption of new features in the future. Our results show that Molpher is capable of rapidly finding a path in chemical space, even for relatively distant molecules. The molecules forming the resulting path are restricted to the subspace laid out by the start/target molecule pair. Restricting the explored chemical space to those compounds amenable to chemical synthesis does not lead to a significant increase in computational demands.
In our opinion, the compounds encountered on and around a morphing path could provide valuable starting points for future in silico or in vitro experiments aimed at assessing the biological activities of such compounds. We believe that Molpher is a useful software component that could be easily incorporated into any existing computational drug design pipeline.
Availability and requirements
Project name: Molpher.
Project home page: http://siret.cz/molpher/.
Operating system(s): MS Windows (client and server), Linux (server).
Programming language: C++.
Other requirements: MS Windows client and server are provided both as pre-compiled binaries, as well as C++ source code. Linux server is provided as C++ source code only. Precompiled binaries have no other requirements. To compile Molpher, several dependencies must be satisfied. These are listed in readme files in the source code distribution.
License: GNU General Public License Version 3, 29 June 2007.
Any restrictions to use by non-academics: None.
This work has been supported by the Technology Agency of the Czech Republic, grant TA02010212, by the Czech Science Foundation grants P202/11/0968 and 14-29032P, and by the Grant Agency of Charles University [project Nr. 154613]. The access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme "Projects of Large Infrastructure for Research, Development, and Innovations" (LM2010005) is highly appreciated.
- Bohacek RS, McMartin C, Guida WC: The art and practice of structure-based drug design: a molecular modeling perspective. Med Res Rev. 1996, 16 (1): 3-50. 10.1002/(SICI)1098-1128(199601)16:1<3::AID-MED1>3.0.CO;2-6.
- Dobson CM: Chemical space and biology. Nature. 2004, 432 (7019): 824-828. 10.1038/nature03192.
- Reymond JL, Ruddigkeit L, Blum L, van Deursen R: The enumeration of chemical space. Wires Comput Mol Sci. 2012, 2 (5): 717-733. 10.1002/wcms.1104.
- Medina-Franco JL, Martinez-Mayorga K, Meurice N: Balancing novelty with confined chemical space in modern drug discovery. Expert Opin Drug Discov. 2014, 9 (2): 151-165. 10.1517/17460441.2014.872624.
- Nisius B, Bajorath J: Mapping of pharmacological space. Expert Opin Drug Discov. 2011, 6 (1): 1-7. 10.1517/17460441.2011.533654.
- Stockwell BR: Exploring biology with small organic molecules. Nature. 2004, 432 (7019): 846-854. 10.1038/nature03196.
- Schreiber SL: Small molecules: the missing link in the central dogma. Nat Chem Biol. 2005, 1 (2): 64-66. 10.1038/nchembio0705-64.
- Polishchuk PG, Madzhidov TI, Varnek A: Estimation of the size of drug-like chemical space based on GDB-17 data. J Comput Aided Mol Des. 2013, 27 (8): 675-679. 10.1007/s10822-013-9672-4.
- Ertl P: Cheminformatics analysis of organic substituents: identification of the most common substituents, calculation of substituent properties, and automatic identification of drug-like bioisosteric groups. J Chem Inform Comput Sci. 2003, 43 (2): 374-380. 10.1021/ci0255782.
- Walters WP, Stahl MT, Murcko MA: Virtual screening - an overview. Drug Discov Today. 1998, 3 (4): 160-178. 10.1016/S1359-6446(97)01163-X.
- Drew KL, Baiman H, Khwaounjoo P, Yu B, Reynisson J: Size estimation of chemical space: how big is it?. J Pharm Pharmacol. 2012, 64 (4): 490-495. 10.1111/j.2042-7158.2011.01424.x.
- Ogata K, Isomura T, Yamashita H, Kubodera H: A quantitative approach to the estimation of chemical space from a given geometry by the combination of atomic species. Qsar Comb Sci. 2007, 26 (5): 596-607. 10.1002/qsar.200630037.
- Fink T, Bruggesser H, Reymond JL: Virtual exploration of the small-molecule chemical universe below 160 Daltons. Angew Chem. 2005, 44 (10): 1504-1508. 10.1002/anie.200462457.
- Fink T, Reymond JL: Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physicochemical properties, compound classes, and drug discovery. J Chem Inform Model. 2007, 47 (2): 342-353. 10.1021/ci600423u.
- Blum LC, Reymond JL: 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J Am Chem Soc. 2009, 131 (25): 8732-8733. 10.1021/ja902302h.
- Ruddigkeit L, van Deursen R, Blum LC, Reymond JL: Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J Chem Inform Model. 2012, 52 (11): 2864-2875. 10.1021/ci300415d.
- PubChem Compound Database. http://www.ncbi.nlm.nih.gov/pccompound?term=all[filt]%26cmd=search
- Chemical Abstracts Service. http://www.cas.org/
- Singh N, Guha R, Giulianotti MA, Pinilla C, Houghten RA, Medina-Franco JL: Chemoinformatic analysis of combinatorial libraries, drugs, natural products, and molecular libraries small molecule repository. J Chem Inform Model. 2009, 49 (4): 1010-1024. 10.1021/ci800426u.View ArticleGoogle Scholar
- Medina-Franco JL, Martinez-Mayorga K, Bender A, Marin RM, Giulianotti MA, Pinilla C, Houghten RA: Characterization of activity landscapes using 2D and 3D similarity methods: consensus activity cliffs. J Chem Inform Model. 2009, 49 (2): 477-491. 10.1021/ci800379q.View ArticleGoogle Scholar
- Todeschini R, Consonni V: Handbook of Molecular Descriptors, vol. 11. 2002, Weinheim, Germany: Wiley-VCHGoogle Scholar
- Shanmugasundaram V, Maggiora GM, Lajiness MS: Hit-directed nearest-neighbor searching. J Med Chem. 2005, 48 (1): 240-248. 10.1021/jm0493515.View ArticleGoogle Scholar
- Sheridan RP, Kearsley SK: Why do we need so many chemical similarity search methods?. Drug Discov Today. 2002, 7 (17): 903-911. 10.1016/S1359-6446(02)02411-X.View ArticleGoogle Scholar
- Willett P: Similarity-based virtual screening using 2D fingerprints. Drug Discov Today. 2006, 11 (23–24): 1046-1053.View ArticleGoogle Scholar
- Geppert H, Vogt M, Bajorath J: Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation. J Chem Inform Model. 2010, 50 (2): 205-216. 10.1021/ci900419k.View ArticleGoogle Scholar
- Varnek A, Baskin II: Chemoinformatics as a theoretical chemistry discipline. Mol Inform. 2011, 30 (1): 20-32. 10.1002/minf.201000100.View ArticleGoogle Scholar
- Ivanenkov YA, Savchuk NP, Ekins S, Balakin KV: Computational mapping tools for drug discovery. Drug Discov Today. 2009, 14 (15–16): 767-775.View ArticleGoogle Scholar
- Jolliffe IT: Principal Component Analysis. 2010, Heidleberg, Germany: SpringerGoogle Scholar
- Schiffman SS, Lance Reynolds M, Young FW: Introduction to Multidimensional Scaling: Theory, Methods, and Applications. 1981, Bingley, United Kingdom: Emerald Group Publishing Limited
- Oprea TI, Gottfries J: Chemography: the art of navigating in chemical space. J Combin Chem. 2001, 3 (2): 157-166. 10.1021/cc0000388.
- Le Guilloux V, Colliandre L, Bourg S, Guenegou G, Dubois-Chevalier J, Morin-Allory L: Visual characterization and diversity quantification of chemical libraries: 1. creation of delimited reference chemical subspaces. J Chem Inform Model. 2011, 51 (8): 1762-1774. 10.1021/ci200051r.
- Colliandre L, Le Guilloux V, Bourg S, Morin-Allory L: Visual characterization and diversity quantification of chemical libraries: 2. Analysis and selection of size-independent, subspace-specific diversity indices. J Chem Inform Model. 2012, 52 (2): 327-342. 10.1021/ci200535y.
- Bemis GW, Murcko MA: The properties of known drugs. 1. Molecular frameworks. J Med Chem. 1996, 39 (15): 2887-2893. 10.1021/jm9602928.
- Schuffenhauer A, Ertl P, Roggo S, Wetzel S, Koch MA, Waldmann H: The scaffold tree–visualization of the scaffold universe by hierarchical scaffold classification. J Chem Inform Model. 2007, 47 (1): 47-58. 10.1021/ci600338x.
- Koch MA, Schuffenhauer A, Scheck M, Wetzel S, Casaulta M, Odermatt A, Ertl P, Waldmann H: Charting biologically relevant chemical space: a structural classification of natural products (SCONP). Proc Natl Acad Sci U S A. 2005, 102 (48): 17272-17277. 10.1073/pnas.0503647102.
- Renner S, van Otterlo WA, Dominguez Seoane M, Mocklinghoff S, Hofmann B, Wetzel S, Schuffenhauer A, Ertl P, Oprea TI, Steinhilber D, Brunsveld L, Rauh D, Waldmann H: Bioactivity-guided mapping and navigation of chemical space. Nat Chem Biol. 2009, 5 (8): 585-592. 10.1038/nchembio.188.
- Wetzel S, Klein K, Renner S, Rauh D, Oprea TI, Mutzel P, Waldmann H: Interactive exploration of chemical space with Scaffold Hunter. Nat Chem Biol. 2009, 5 (8): 581-583. 10.1038/nchembio.187.
- Xu YJ, Johnson M: Using molecular equivalence numbers to visually explore structural features that distinguish chemical libraries. J Chem Inform Comput Sci. 2002, 42 (4): 912-926. 10.1021/ci025535l.
- Medina-Franco JL, Petit J, Maggiora GM: Hierarchical strategy for identifying active chemotype classes in compound databases. Chem Biol Drug Des. 2006, 67 (6): 395-408. 10.1111/j.1747-0285.2006.00397.x.
- Wilkens SJ, Janes J, Su AI: HierS: hierarchical scaffold clustering using topological chemical graphs. J Med Chem. 2005, 48 (9): 3182-3193. 10.1021/jm049032d.
- Schneider G, Fechner U: Computer-based de novo design of drug-like molecules. Nat Rev Drug Discov. 2005, 4 (8): 649-663. 10.1038/nrd1799.
- Kutchukian PS, Lou D, Shakhnovich EI: FOG: Fragment Optimized Growth algorithm for the de novo generation of molecules occupying druglike chemical space. J Chem Inform Model. 2009, 49 (7): 1630-1642. 10.1021/ci9000458.
- Miranker A, Karplus M: An automated method for dynamic ligand design. Proteins. 1995, 23 (4): 472-490. 10.1002/prot.340230403.
- Loving K, Alberts I, Sherman W: Computational approaches for fragment-based and de novo design. Curr Top Med Chem. 2010, 10 (1): 14-32. 10.2174/156802610790232305.
- Schneider G, Hartenfeller M, Reutlinger M, Tanrikulu Y, Proschak E, Schneider P: Voyages to the (un)known: adaptive design of bioactive compounds. Trends Biotechnol. 2009, 27 (1): 18-26. 10.1016/j.tibtech.2008.09.005.
- van Deursen R, Reymond JL: Chemical space travel. ChemMedChem. 2007, 2 (5): 636-640. 10.1002/cmdc.200700021.
- Brown N, McKay B, Gilardoni F, Gasteiger J: A graph-based genetic algorithm and its application to the multiobjective evolution of median molecules. J Chem Inform Comput Sci. 2004, 44 (3): 1079-1087. 10.1021/ci034290p.
- Brown N, McKay B, Gasteiger J: The de novo design of median molecules within a property range of interest. J Comput Aided Mol Des. 2004, 18 (12): 761-771. 10.1007/s10822-004-6986-2.
- Lameijer EW, Kok JN, Back T, Ijzerman AP: The molecule evoluator. An interactive evolutionary algorithm for the design of drug-like molecules. J Chem Inform Model. 2006, 46 (2): 545-552. 10.1021/ci050369d.
- Bishop KJ, Klajn R, Grzybowski BA: The core and most useful molecules in organic chemistry. Angew Chem. 2006, 45 (32): 5348-5354. 10.1002/anie.200600881.
- Yu MJ: Natural product-like virtual libraries: recursive atom-based enumeration. J Chem Inform Model. 2011, 51 (3): 541-557. 10.1021/ci1002087.
- Yu MJ: Druggable chemical space and enumerative combinatorics. J Cheminform. 2013, 5 (1): 19. 10.1186/1758-2946-5-19.
- Virshup AM, Contreras-Garcia J, Wipf P, Yang W, Beratan DN: Stochastic voyages into uncharted chemical space produce a representative library of all possible drug-like compounds. J Am Chem Soc. 2013, 135 (19): 7296-7303. 10.1021/ja401184g.
- Hoksza D, Svozil D: IEEE 11th International Conference on Bioinformatics and Bioengineering. IEEE 11th International Conference on Bioinformatics and Bioengineering (BIBE). 2011, Taichung, Taiwan: IEEE, 201-208.
- Schäling B: The Boost C++ Libraries. 2011, Laguna Hills, CA, U.S.A: XML Press
- RDKit: Cheminformatics and Machine Learning Software. http://www.rdkit.org/
- Reinders J: Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism. 2007, Sebastopol, CA, U.S.A: O'Reilly Media
- Ertl P, Schuffenhauer A: Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J Cheminform. 2009, 1 (1): 8. 10.1186/1758-2946-1-8.
- Qt. http://qt.digia.com/
- Molpher User Manual. https://www.assembla.com/spaces/molpher/wiki/User_Manual
- Rogers D, Hahn M: Extended-connectivity fingerprints. J Chem Inform Model. 2010, 50 (5): 742-754. 10.1021/ci100050t.
- Medina-Franco JL, Martinez-Mayorga K, Giulianotti MA, Houghten RA, Pinilla C: Visualization of the chemical space in drug discovery. Curr Comput-Aid Drug. 2008, 4 (4): 322-333. 10.2174/157340908786786010.
- Ma S, Dai Y: Principal component analysis based methods in bioinformatics studies. Brief Bioinform. 2011, 12 (6): 714-722. 10.1093/bib/bbq090.
- Dibattista G, Eades P, Tamassia R, Tollis IG: Algorithms for Drawing Graphs - an Annotated-Bibliography. Comp Geom-Theor Appl. 1994, 4 (5): 235-282. 10.1016/0925-7721(94)00014-X.
- Kamada T, Kawai S: An algorithm for drawing general undirected graphs. Inform Process Lett. 1989, 31 (1): 7-15. 10.1016/0020-0190(89)90102-6.
- GGA Software Services - Indigo Toolkit. http://www.ggasoftware.com/opensource/indigo
- ChemAxon Marvin. http://www.chemaxon.com/products/marvin/
- Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH: PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 2009, 37 (Web Server issue): W623-633.
- Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG: ZINC: a free tool to discover chemistry for biology. J Chem Inform Model. 2012, 52 (7): 1757-1768. 10.1021/ci3001277.
- Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP: ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40 (Database issue): D1100-1107.
- Daylight Theory: SMILES. http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
- Daylight Theory: SMARTS. http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
- PubChem Substructure Fingerprint. ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.pdf
- O'Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR: Open Babel: an open chemical toolbox. J Cheminform. 2011, 3: 33. 10.1186/1758-2946-3-33.
- Baell JB, Holloway GA: New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J Med Chem. 2010, 53 (7): 2719-2740. 10.1021/jm901137j.
- Nicolaou CA, Brown N, Pattichis CS: Molecular optimization using computational multi-objective methods. Curr Opin Drug Discov Dev. 2007, 10 (3): 316-324.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.