Web-based 3D-visualization of the DrugBank chemical space
© Awale and Reymond. 2016
Received: 18 February 2016
Accepted: 27 April 2016
Published: 4 May 2016
Similarly to the periodic table for elements, chemical space offers an organizing principle for representing the diversity of organic molecules, usually in the form of multi-dimensional property spaces that are subjected to dimensionality reduction methods to obtain 3D-spaces or 2D-maps suitable for visual inspection. Unfortunately, tools to look at chemical space on the internet are currently very limited.
Herein we present webDrugCS, a web application freely available at www.gdb.unibe.ch to visualize DrugBank (www.drugbank.ca, containing over 6000 investigational and approved drugs) in five different property spaces. WebDrugCS displays 3D-clouds of color-coded grid points representing molecules, whose structural formula is displayed on mouse over with an option to link to the corresponding molecule page at the DrugBank website. The 3D-clouds are obtained by principal component analysis of high dimensional property spaces describing constitution and topology (42D molecular quantum numbers MQN), structural features (34D SMILES fingerprint SMIfp), molecular shape (20D atom pair fingerprint APfp), pharmacophores (55D atom category extended atom pair fingerprint Xfp) and substructures (1024D binary substructure fingerprint Sfp). User defined molecules can be uploaded as SMILES lists and displayed together with DrugBank. In contrast to 2D-maps where many compounds fold onto each other, these 3D-spaces have a comparable resolution to their parent high-dimensional chemical space.
Keywords: DrugBank, Chemical space, Visualization, Fingerprints, Molecular shape, Pharmacophores
One of the defining features of organic chemistry is the extremely large diversity of possible molecules. The concept of chemical space, whereby molecules are annotated with a set of quantitative molecular properties and placed in a high-dimensional property space with each dimension corresponding to a different property, offers a practical approach to represent the structural diversity of large molecule collections [1–28]. Such high-dimensional spaces cannot be visualized directly but can be subjected to various dimensionality reduction methods to obtain 3D-spaces or 2D-maps suitable for visual inspection [29–32].
To make chemical space easier to inspect, we recently reported an interactive Java Applet representing databases of molecules as color-coded maps produced by projection of high-dimensional property spaces, defined by various molecular fingerprints, into two dimensions [32–37]. In these so-called Mapplets the computer screen shows a color-coded 2D-image where each pixel contains one or several molecules projected at that point. The average molecule contained in each pixel is displayed on a side-window on mouse over, with an option to open the complete list of molecules in the pixel in a secondary window, and subsequently to link selected molecules to the database entry, or to perform similarity searches in the parent high-dimensional property space. These Mapplets unfortunately suffer from the typical folding effects encountered when projecting high-dimensional property spaces into 2D [2, 6, 9, 28, 30, 32], which results in (a) many pixels containing molecules piled-up on top of each other, and (b) a poor correlation between distances on the 2D-map and distances in the original high-dimensional property space. In addition the Java Applets must be downloaded and run separately and are not platform independent.
Fingerprints used in this study
MQN: 42D scalar fingerprint, counts 42 molecular quantum numbers (MQN) covering atom types, bond types, polar groups and topologies
SMIfp: 34D scalar fingerprint, counts 34 characters appearing in the SMILES notation of molecules
APfp: 20D scalar fingerprint, each dimension counts the number of atom pairs at one particular topological distance between 1 and 20 bonds, normalized to the heavy atom count (HAC)
Xfp: 55D scalar fingerprint, category extended version of APfp counting the number of category atom pairs at one particular topological distance between 0 and 10 bonds, normalized to the number of category atoms, for categories: hydrophobic atoms, H-bond donor atoms, H-bond acceptor atoms, sp2 hybridized atoms, and HBA/HBD cross-pairs
Sfp: 1024D binary fingerprint, perceives the presence of substructures
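The 20D atom pair fingerprint described above can be illustrated with a minimal sketch. It assumes the molecule is already available as a heavy-atom adjacency list (the dictionary layout and the function name `atom_pair_fingerprint` are hypothetical, not taken from the published JChem plugins), finds topological distances by breadth-first search, and divides each count by the heavy atom count.

```python
from collections import deque


def atom_pair_fingerprint(adjacency, max_distance=20):
    """Count atom pairs at each topological distance 1..max_distance,
    normalized to the heavy atom count (HAC).

    `adjacency` maps each heavy-atom index to the indices it is bonded to
    (a hypothetical input format chosen for this sketch).
    """
    hac = len(adjacency)
    counts = [0] * max_distance
    if hac == 0:
        return [0.0] * max_distance
    for start in adjacency:
        # Breadth-first search yields shortest path lengths in bonds.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in dist:
                    dist[neighbor] = dist[atom] + 1
                    queue.append(neighbor)
        for other, d in dist.items():
            # Count each unordered pair once (start < other).
            if start < other and 1 <= d <= max_distance:
                counts[d - 1] += 1
    return [c / hac for c in counts]


# Heavy-atom graph of n-butane: C0-C1-C2-C3
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
fp = atom_pair_fingerprint(butane)
print(fp[:4])  # [0.75, 0.5, 0.25, 0.0]
```

For n-butane there are three pairs one bond apart, two pairs two bonds apart and one pair three bonds apart, each divided by HAC = 4.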
Results and discussion
PCA of multidimensional property spaces
One of the remarkable aspects of the 3D-spaces concerns the resolution of compounds into individual 3D-grid positions after assigning molecules to a 3D-grid point in a 300 × 300 × 300 box covering the range of (PC1, PC2, PC3) values. In the original multidimensional property spaces an excellent resolution is obtained for DrugBank in the sense that almost all DrugBank molecules are encoded by a unique fingerprint bit value combination. This resolution is largely preserved upon PCA and assignment to the 3D-grid, as can be judged by the fact that the percentage of molecules appearing in singly occupied 3D-grid points is comparable to the percentage of molecules having a unique fingerprint bit-value combination. The 3D-space is clearly superior in this respect to the 2D-map, where compounds are assigned to 2D-pixels in a 300 × 300 square covering the range of (PC1, PC2) values. In this case significant folding occurs and only 40–60 % of the compounds appear in singly occupied 2D-pixels (Fig. 1c).
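The resolution measure discussed above, the percentage of molecules sitting alone in their grid point, can be sketched as follows. The function name and the toy grid coordinates are hypothetical and serve only to show how dropping the third coordinate can increase folding.

```python
from collections import Counter


def percent_singly_occupied(cells):
    """Percentage of molecules whose grid cell holds no other molecule."""
    if not cells:
        return 0.0
    occupancy = Counter(cells)
    singles = sum(1 for cell in cells if occupancy[cell] == 1)
    return 100.0 * singles / len(cells)


# Toy example: four molecules on a 3D-grid, then projected to 2D
# by discarding the third coordinate.
cells_3d = [(10, 20, 5), (10, 20, 7), (3, 4, 9), (3, 4, 9)]
cells_2d = [(x, y) for x, y, _ in cells_3d]

print(percent_singly_occupied(cells_3d))  # 50.0
print(percent_singly_occupied(cells_2d))  # 0.0
```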
The graphical user interface (GUI) of the interactive visualization window is exemplified here with the MQN 3D-space. The GUI consists of a main panel, a molecule view panel, and a control panel. The main panel occupies the entire screen area and displays the 3D-space (Fig. 3b). Each point in the 3D-space is represented as a sphere, whose size depends on its distance to the camera. The view angle is rotated by dragging the mouse with the left button pressed, and the mouse wheel controls the zoom in/out function.
The view panel is positioned at upper left and shows the structural formula and DrugBank ID of the molecule at the current mouse-over 3D-grid point. Upon selecting a grid point by double click, one can then link to the molecule page at the DrugBank webpage by clicking on the DrugBank ID displayed below the structural formula (Fig. 3c), or access a similarity browser to search for nearest neighbours in the original high-dimensional fingerprint space via the control panel (Fig. 3d/e).
The control panel at top right lists options to change the 3D-space view. Lines 1–3: select a color code according to a descriptor, or a single color code for DrugBank and the uploaded molecule list. Line 4: display the reference 3D-axes. Line 5: hide the DrugBank grid points, leaving only the molecules uploaded by the user as visible points. Line 6: change the 3D-grid point sphere size. Line 7: set the currently selected 3D-grid point as reference pivot point for the 3D-space (after selecting a grid point by double click). Line 8: reset the view to the default entry view. Line 9: link to the fingerprint similarity browser, which opens as an additional tab. This browser allows one to perform nearest neighbour searches in DrugBank in any of the five original high-dimensional fingerprint spaces. The browser is built in the same manner as our recently reported ChEMBL similarity browser. Line 10: help function listing the different options.
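A nearest neighbour search in a fingerprint space, as offered by the similarity browser, can be sketched as below. The text does not state which distance measure the browser uses, so the city-block (Manhattan) distance here is an assumption, and the function names, molecule identifiers and fingerprint values are purely hypothetical.

```python
import heapq


def city_block(a, b):
    """City-block distance between two scalar fingerprints (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbours(query, database, k=3):
    """Return the k (distance, molecule id) pairs closest to the query."""
    scored = ((city_block(query, fp), mol_id) for mol_id, fp in database.items())
    return heapq.nsmallest(k, scored)


# Hypothetical 4D fingerprints keyed by made-up identifiers.
database = {
    "mol_a": [1, 0, 2, 5],
    "mol_b": [1, 1, 2, 5],
    "mol_c": [9, 9, 9, 9],
}
print(nearest_neighbours([1, 0, 2, 4], database, k=2))
# [(1, 'mol_a'), (2, 'mol_b')]
```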
webDrugCS represents the first online application for visualizing DrugBank in five different 3D property spaces on computers, tablets or phones. In contrast to the other database exploration tools, webDrugCS can be used for curiosity driven exploration independently of specific queries, and is particularly suitable to rapidly gain an overview of the structures of drug molecules. While the present web-based application is currently limited to displaying a few thousand points, the method might be applicable to displaying larger databases of millions of molecules if significant coding progress can be made.
Databases
The DrugBank database was downloaded in SDF format from http://www.drugbank.ca/. Molecules were processed by checking for valency errors, removing counter ions and adjusting their ionization state to pH 7.4, using an in-house Java program utilizing the Java Chemistry library (JChem) from ChemAxon, Pvt. Ltd., as a starting point. Duplicates and molecules larger than 50 heavy atoms were removed from the database.
Fingerprints
Calculation of the MQN, SMIfp, APfp and Xfp fingerprints is discussed in detail in the respective publications from our group. Fingerprints were calculated as described previously using plugins provided in the JChem chemistry library.
Principal component analysis
The PCA for each database was performed using an in-house Java program utilizing some of the available mathematical functions from JSci (A science API for Java: http://jsci.sourceforge.net/). The Java source code is based on the tutorial of Lindsay I. Smith (http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf).
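The general PCA recipe (subtract the mean, build the covariance matrix, take its leading eigenvectors, project the data) can be sketched in a few lines. This is not the in-house Java program: instead of a library eigensolver such as the one in JSci, the sketch swaps in power iteration with deflation so that it runs with the Python standard library only, and all function names and the toy data are hypothetical.

```python
import math


def dominant_eigenpair(matrix, iterations=500):
    """Power iteration: approximate the largest eigenvalue and its vector."""
    d = len(matrix)
    vec = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm < 1e-12:  # no variance left in the deflated matrix
            return vec, 0.0
        vec = [v / norm for v in nxt]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(d))
                for i in range(d))
    return vec, value


def pca_scores(rows, n_components=3, iterations=500):
    """Project rows onto the first n_components principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(min(n_components, d)):
        vec, val = dominant_eigenpair(cov, iterations)
        components.append(vec)
        # Deflation: remove the variance explained by this component.
        cov = [[cov[i][j] - val * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


# Toy 4D data whose variance lies along the first three axes.
rows = [
    [-3.0, 0.0, 0.0, 5.0], [3.0, 0.0, 0.0, 5.0],
    [0.0, -2.0, 0.0, 5.0], [0.0, 2.0, 0.0, 5.0],
    [0.0, 0.0, -1.0, 5.0], [0.0, 0.0, 1.0, 5.0],
]
scores = pca_scores(rows)
```

Power iteration converges slowly when two eigenvalues are close, so a production implementation would normally rely on a proper symmetric eigensolver; the sketch only aims to make the individual steps visible.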
3D-space and color coding
The PC-1, PC-2 and PC-3 values were calculated for each molecule in the database. The largest (PCmax) and smallest (PCmin) PC values appearing in the PC-1 or PC-2 or PC-3 values were used to define the value range ΔPC = PCmax − PCmin and set the binning scale as ΔPC/300. The PC-1, PC-2 and PC-3 values were binned onto 300 × 300 × 300 3D-grids using the same absolute bin size on the PC-1, PC-2 and PC-3 axes. Each molecule was assigned to a point on this 3D-grid. The Hue–Saturation–Lightness (HSL) color space was used for color coding, setting the hue value according to the average value of the selected molecular property across all molecules residing at that grid point, and the saturation according to the standard deviation of that value across all molecules within ±5 grid points in each direction. As a result the color change blue–cyan–green–yellow–red–magenta shows an increasing average value of the property in a grid point, and saturation to grey indicates a strong gradient of the value in the vicinity.
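The binning and hue assignment described above can be sketched as follows. The shared bin size ΔPC/300 follows the text; the clamping of the maximum value into the last bin, the hue range (240° for the lowest value running down through 0° to 300° for the highest, which reproduces the stated blue to magenta order), the fixed lightness of 0.5, and all function names are assumptions of this sketch. The mapping from standard deviation to saturation is not specified in the text, so saturation is simply passed in.

```python
import colorsys

GRID = 300


def bin_scores(scores, grid=GRID):
    """Assign each (PC1, PC2, PC3) triple to a grid cell, one bin size for all axes."""
    flat = [v for triple in scores for v in triple]
    pc_min, pc_max = min(flat), max(flat)
    bin_size = (pc_max - pc_min) / grid  # delta PC / 300
    if bin_size == 0:
        return [(0, 0, 0) for _ in scores]
    return [
        # Clamp so that the largest value falls into the last bin (assumption).
        tuple(min(int((v - pc_min) / bin_size), grid - 1) for v in triple)
        for triple in scores
    ]


def property_to_rgb(value, vmin, vmax, saturation=1.0):
    """Map a property value to RGB along blue-cyan-green-yellow-red-magenta."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    hue_degrees = (240.0 - 300.0 * t) % 360.0  # assumed hue range
    # colorsys expects hue, lightness, saturation in the range 0..1.
    return colorsys.hls_to_rgb(hue_degrees / 360.0, 0.5, saturation)


cells = bin_scores([(0.0, 0.0, 0.0), (300.0, 150.5, 30.25)])
print(cells)  # [(0, 0, 0), (299, 150, 30)]
low = property_to_rgb(0.0, 0.0, 10.0)    # blue end of the scale
high = property_to_rgb(10.0, 0.0, 10.0)  # magenta end of the scale
```

Passing a saturation below 1.0 moves the color toward grey, which is how a strong local gradient would be signalled once a rule for deriving saturation from the standard deviation has been chosen.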
MA designed and realized webDrugCS and wrote the paper. J-LR co-designed and supervised the project and wrote the paper. Both authors read and approved the final manuscript.
This work was supported financially by the University of Berne, the Swiss National Science Foundation and the NCCR TransCure.
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Pearlman RS, Smith KM (1998) Novel software tools for chemical diversity. Persp Drug Discov Des 9–11:339–353
- Oprea TI, Gottfries J (2001) Chemography: the art of navigating in chemical space. J Comb Chem 3:157–166
- Takahashi Y, Konji M, Fujishima S (2003) MolSpace: a computer desktop tool for visualization of massive molecular data. J Mol Graph Model 21:333–339
- Haggarty SJ, Clemons PF, Wong JC, Wong JF, Schreiber SL (2004) Mapping chemical space using molecular descriptors and chemical genetics: deacetylase inhibitors. Comb Chem High Throughput Screen 7:669–676
- Eckert H, Bajorath J (2007) Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches. Drug Discov Today 12:225–233
- Medina-Franco JL, Maggiora GM, Giulianotti MA, Pinilla C, Houghten RA (2007) A Similarity-based data-fusion approach to the visual characterization and comparison of compound databases. Chem Biol Drug Des 70:393–412
- Medina-Franco JL, Martinez-Mayorga K, Giulianotti MA, Houghten RA, Pinilla C (2008) Visualization of the chemical space in drug discovery. Curr Comput-Aided Drug Des 4:322–333
- Medina-Franco JL, Martinez-Mayorga K, Bender A, Marin RM, Giulianotti MA, Pinilla C, Houghten RA (2009) Characterization of activity landscapes using 2D and 3D similarity methods: consensus activity cliffs. J Chem Inf Model 49:477–491
- Rosen J, Gottfries J, Muresan S, Backlund A, Oprea TI (2009) Novel chemical space exploration via natural products. J Med Chem 52:1953–1962
- Singh N, Guha R, Giulianotti MA, Pinilla C, Houghten RA, Medina-Franco JL (2009) Chemoinformatic analysis of combinatorial libraries, drugs, natural products, and molecular libraries small molecule repository. J Chem Inf Model 49:1010–1024
- Ivanenkov YA, Savchuk NP, Ekins S, Balakin KV (2009) Computational mapping tools for drug discovery. Drug Discov Today 14:767–775
- Akella LB, DeCaprio D (2010) Cheminformatics approaches to analyze diversity in compound screening libraries. Curr Opin Chem Biol 14:325–330
- Geppert H, Vogt M, Bajorath J (2010) Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation. J Chem Inf Model 50:205–216
- Reymond JL, Van Deursen R, Blum LC, Ruddigkeit L (2010) Chemical space as a source for new drugs. MedChemComm 1:30–38
- Le Guilloux V, Colliandre L, Bourg S, Guénegou G, Dubois-Chevalier J, Morin-Allory L (2011) Visual characterization and diversity quantification of chemical libraries: 1. Creation of delimited reference chemical subspaces. J Chem Inf Model 51:1762–1774
- Reutlinger M, Guba W, Martin RE, Alanine AI, Hoffmann T, Klenner A, Hiss JA, Schneider P, Schneider G (2011) Neighborhood-preserving visualization of adaptive structure-activity landscapes: application to drug discovery. Angew Chem Int Ed Engl 50:11633–11636
- Owen JR, Nabney IT, Medina-Franco JL, López-Vallejo F (2011) Visualization of molecular fingerprints. J Chem Inf Model 51:1552–1563
- Medina-Franco JL, Yongye AB, Pérez-Villanueva J, Houghten RA, Martínez-Mayorga K (2011) Multitarget structure–activity relationships characterized by activity–difference maps and consensus similarity measure. J Chem Inf Model 51:2427–2439
- Maggiora GM, Shanmugasundaram V (2011) Molecular similarity measures. Methods Mol Biol (Clifton NJ) 672:39–100
- Yoo J, Medina-Franco J (2011) Chemoinformatic approaches for inhibitors of DNA methyltransferases: comprehensive characterization of screening libraries. Comput Mol Biosci 1:7–16
- Gutlein M, Karwath A, Kramer S: CheS-Mapper—chemical space mapping and visualization in 3D. J Cheminform. 2012; 4:Article 7. http://www.jcheminf.com/content/4/1/7. Accessed 14 July 2015
- Ertl P, Rohde B: The molecule cloud—compact visualization of large collections of molecules. J Cheminform. 2012; 4:Article 12. http://www.jcheminf.com/content/14/11/12. Accessed 16 Dec 2012
- Lachance H, Wetzel S, Kumar K, Waldmann H (2012) Charting, navigating, and populating natural product chemical space for drug discovery. J Med Chem 55:5989–6001
- Medina-Franco JL, Aguayo-Ortiz R (2013) Progress in the visualization and mining of chemical and target spaces. Mol Inf 32:942–953
- Hoksza D, Skoda P, Vorsilak M, Svozil D: Molpher: a software framework for systematic chemical space exploration. J Cheminform. 2014; 6:Article 7. http://www.jcheminf.com/content/6/1/7. Accessed 14 July 2015
- Miyao T, Reker D, Schneider P, Funatsu K, Schneider G (2015) Chemography of natural product space. Planta Med. doi:10.1055/s-0034-1396322
- Rodrigues T, Hauser N, Reker D, Reutlinger M, Wunderlin T, Hamon J, Koch G, Schneider G (2015) Multidimensional de novo design reveals 5-HT2B receptor-selective ligands. Angew Chem Int Ed Engl 54:1551–1555
- Sander T, Freyss J, von Korff M, Rufener C (2015) Datawarrior: an open-source program for chemistry aware data visualization and analysis. J Chem Inf Model 55:460–473
- Digles D, Ecker GF (2011) Self-organizing maps for in silico screening and data visualization. Mol Inf 30:838–846
- Gaspar HA, Baskin II, Marcou G, Horvath D, Varnek A (2014) Chemical data visualization and analysis with incremental generative topographic mapping: big data challenge. J Chem Inf Model 55:84–94
- Deng Z-L, Du C-X, Li X, Hu B, Kuang Z-K, Wang R, Feng S-Y, Zhang H-Y, Kong D-X (2013) Exploring the biologically relevant chemical space for drug discovery. J Chem Inf Model 53:2820–2828
- Awale M, Reymond JL (2015) Similarity mapplet: interactive visualization of the directory of useful decoys and ChEMBL in high dimensional chemical spaces. J Chem Inf Model 55:1509–1516
- Awale M, Reymond JL (2012) Cluster analysis of the DrugBank chemical space using molecular quantum numbers. Bioorg Med Chem 20:5372–5378
- Awale M, van Deursen R, Reymond JL (2013) MQN-Mapplet: visualization of chemical space with interactive maps of DrugBank, ChEMBL, PubChem, GDB-11, and GDB-13. J Chem Inf Model 53:509–518
- Schwartz J, Awale M, Reymond JL (2013) SMIfp (SMILES fingerprint) chemical space for virtual screening and visualization of large databases of organic molecules. J Chem Inf Model 53:1979–1989
- Ruddigkeit L, Awale M, Reymond JL (2014) Expanding the fragrance chemical space for virtual screening. J Cheminform 6:27–39
- Reymond JL (2015) The chemical space project. Acc Chem Res 48:722–730
- Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V et al (2011) DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Res 39:D1035–D1041
- Jin X, Awale M, Zasso M, Kostro D, Patiny L, Reymond JL (2015) PDB-Explorer: a web-based interactive map of the protein data bank in shape space. BMC Bioinformatics 16:339
- Gutlein M, Karwath A, Kramer S (2012) CheS-Mapper—chemical space mapping and visualization in 3D. J Cheminf 4:7
- Wetzel S, Klein K, Renner S, Rauh D, Oprea TI, Mutzel P, Waldmann H (2009) Interactive exploration of chemical space with Scaffold Hunter. Nat Chem Biol 5:581–583
- Ertl P, Rohde B (2012) The molecule cloud—compact visualization of large collections of molecules. J Cheminf 4:12
- Hoksza D, Skoda P, Vorsilak M, Svozil D (2014) Molpher: a software framework for systematic chemical space exploration. J Cheminf 6:7
- Hilbig M, Rarey M (2015) MONA 2: a light cheminformatics platform for interactive compound library processing. J Chem Inf Model 55:2071–2078
- Lewis R, Guha R, Korcsmaros T, Bender A (2015) Synergy Maps: exploring compound combinations using network-based visualization. J Cheminf 7:36
- Korb O, Kuhn B, Hert J, Taylor N, Cole J, Groom C, Stahl M (2016) Interactive and versatile navigation of structural databases. J Med Chem. doi:10.1021/acs.jmedchem.5b01756
- Lewell XQ, Jones AC, Bruce CL, Harper G, Jones MM, McLay IM, Bradshaw J (2003) Drug rings database with web interface. A tool for identifying alternative chemical rings in lead discovery programs. J Med Chem 46:3257–3274
- Goede A, Dunkel M, Mester N, Frommel C, Preissner R (2005) SuperDrug: a conformational drug database. Bioinformatics 21:1751–1753
- Nickel J, Gohlke B-O, Erehman J, Banerjee P, Rong WW, Goede A, Dunkel M, Preissner R (2014) SuperPred: update on drug classification and target prediction. Nucleic Acids Res 42:W26–W31
- Cobanoglu MC, Oltvai ZN, Taylor DL, Bahar I (2015) BalestraWeb: efficient online evaluation of drug–target interactions. Bioinformatics 31:131–133
- Nguyen KT, Blum LC, van Deursen R, Reymond J-L (2009) Classification of organic molecules by molecular quantum numbers. ChemMedChem 4:1803–1805
- van Deursen R, Blum LC, Reymond JL (2010) A searchable map of PubChem. J Chem Inf Model 50:1924–1934
- Awale M, Reymond JL (2014) Atom pair 2D-fingerprints perceive 3D-molecular shape and pharmacophores for very fast virtual screening of ZINC and GDB-17. J Chem Inf Model 54:1892–1897
- Hagadone TR (1992) Molecular substructure similarity searching: efficient retrieval in two-dimensional structure databases. J Chem Inf Comput Sci 32:515–521