The polypharmacology browser: a web-based multi-fingerprint target prediction tool using ChEMBL bioactivity data
© The Author(s) 2017
Received: 29 August 2016
Accepted: 10 February 2017
Published: 21 February 2017
Several web-based tools have been reported recently which predict the possible targets of a small molecule by similarity to compounds of known bioactivity using molecular fingerprints (fps), however predictions in each case rely on similarities computed from only one or two fps. Considering that structural similarity and therefore the predicted targets strongly depend on the method used for comparison, it would be highly desirable to predict targets using a broader set of fps simultaneously.
Herein, we present the polypharmacology browser (PPB), a web-based platform which predicts possible targets for small molecules by searching for nearest neighbors using ten different fps describing composition, substructures, molecular shape and pharmacophores. PPB searches through 4613 groups of at least 10 same target annotated bioactive molecules from ChEMBL and returns a list of predicted targets ranked by consensus voting scheme and p value. A validation study across 670 drugs with up to 20 targets showed that combining the predictions from all 10 fps gives the best results, with on average 50% of the known targets of a drug being correctly predicted with a hit rate of 25%. Furthermore, when profiling a new inhibitor of the calcium channel TRPV6 against 24 targets taken from a safety screen panel, we observed inhibition in 5 out of 5 targets predicted by PPB and in 7 out of 18 targets not predicted by PPB. The rate of correct (5/12) and incorrect (0/12) predictions for this compound by PPB was comparable to that of other web-based prediction tools.
Publicly accessible web-based target prediction tools
10 different fingerprints
Multilevel neighbourhoods of atoms (MNA) descriptors
WDI and ACD
CHEMBL 16, WOMBAT, MDDR and StarLite
Receptor-based pharmacophore models
TargetBank, DrugBank, BindingDB, PDTD
ECfp6, ECfp4 and Openbabel FP2
ChEMBL 11 and PubChem bioassay
Openbabel FP2, MACSS, SHAFT and USR
ChEMBL 14, BindingDB, DrugBank, KEGG and PDB
Circular fingerprint FCFP
CATS and MOE physiochemical descriptors
Openbabel FP2 and Electroshape descriptors
ChEMBL, SuperTarget and BindingDB
ChEMBL 14, BindingDB, DrugBank, PharmGKB, PubChem bioassay, WOMBAT, IUPHAR, CTD and STITCH
Molecular fingerprints used for target prediction with PPB
21-D atom-pair fingerprint, perceives molecular shape
55-D atom category extended atom-pair fingerprint, perceives pharmacophores
42-D Molecular Quantum Numbers, scalar fingerprint counting atoms, bonds, polarity and ring features, perceives constitution, topology and molecular shape
34-D scalar fingerprint counting occurrence of characters in SMILES, perceives rings, aromaticity, and polarity
1024-D binary daylight type substructure fingerprint, perceives detailed substructures
1024-D binary circular extended connectivity fingerprint, perceives detailed substructures and pharmacophores
Fusion fingerprint, Xfp + SMIfp + Sfp
Fusion fingerprint, Xfp + MQN + SMIfp
Fusion fingerprint, Xfp + SMIfp + Sfp + ECfp4
Fusion fingerprint, Xfp + MQN + SMIfp + Sfp + ECfp4
For each of the 871,673 selected ChEMBL compounds we computed each of the six fingerprints described in Table 2. In short, APfp, Xfp, MQN and SMIfp were calculated by using an in house developed Java program utilizing various calculator plugins from JChem chemistry library, in particular TopologyAnalyserPlugin to determine shortest topological path for atom-pair, HBDAplugins to determine hydrogen bond donor and acceptor atoms, and MajorMicrospeciesPlugin to adjust the ionization state of molecules. The detailed procedure for the generation of these fingerprints can be found in the respective publication [38, 42, 43]. For Sfp, a daylight type 1024-bit hash fingerprint was computed using the ChemicalFingerprint class of the JChem library. The 1024-bit extended connectivity fingerprint (ECfp4) was calculated with bond diameter of 4 using ECFP class of the JChem library. The source codes for computation of fingerprints are freely available for download at www.gdb.unibe.ch.
Fused fingerprints (Data fusion)
p value calculation
Each target prediction for a given query molecule is based on the city-block distance between the query molecule and the closest member of a group of compounds associated with this target. A p value can be computed for each prediction as the degree of randomness of the observed city block distance  and therefore the probability that the corresponding query–target association occurs at random. To compute the p value, we generated a random distance distribution for each of the 4613 targets in each of the ten fingerprint spaces by computing distances between the ChEMBL compounds associated with the target and randomly selected molecules from the ZINC database , taken as representative molecules of screening compounds for which target predictions might be carried out (Fig. 1h). For each target up to 1 M random pairs were considered. We then fitted each of the 46,130 distance distributions using a negative binomial distribution function, and generated the corresponding cumulative density functions giving the p value as a function of the city block distance (Fig. 1i). The choice of a negative binomial distribution function was based on the discrete nature of the city-block distance. As the p value calculation is specific for each target protein and fingerprint space it can only be used in this context, and should not be used to compare molecules from different targets or fingerprint spaces. The curve fitting was carried out using the R statistical package version 3.2.5 using the “fitdistrplus” library with default maximum likelihood method for parameters estimation.
The validation set containing 670 drugs and their targets annotation was created as follows: (a) compounds labelled as approved drug or drug in clinical trial were extracted from ChEMBL database, (b) for each drug, target list was constructed by comparing SMILES string of drug to SMILES of bioactive compounds of targets used in the PPB. When the SMILES string was matched, corrosponding PPB target was added to known target list for the drug and (c) retained the drugs with ≤20 targets in the list.
The PPB web-interface
Given a query molecule, PPB computes each of the 10 fps for this compound, sorts the 871,673 ChEMBL compounds by city-block distance to the query using each fp independently, and collects a predefined number of compound associated targets in each case. The p value for each target is calculated from the nearest neighbor found for that target. Finally, PPB merges the different target lists.
Clicking on the number of selected compounds per target (furthest right column) opens a new tab displaying the structures of these compounds labelled with fingerprints, city-block distance and ChEMBL compound id (Fig. 2b). The selection of a row in the table displays the full name of the target in the “Target name” field, which is an active link to the parent ChEMBL database to obtain further information on this target. The result (targets and molecules) can be stored as text file using the “save” button provided in each window.
Results and discussion
In terms of the fraction of correctly predicted known targets (Fig. 4a) the fusion fingerprints Ffp1, Ffp3 and Ffp4 showed the highest average value, followed by the binary fingerprints Sfp and ECfp4, the pharmacophore fingerprint Xfp, and finally MQN, SMIfp and APfp. To evaluate if the performance of these fingerprints significantly differ from one another or not, the student t tests (at confidence interval of 0.95) were performed for all the possible pairs of fingerprints (Additional file 1: Tables S1 and S2). Although most of the individual fingerprint pairs show significant differences in performance, no significant differences were found between performance of Sfp, ECfp4, Ffp1 and Ffp2. Overall the performance trend followed the complexity of each fingerprint and highlights that detailed structural encoding tends to increase prediction performance.
These single fp approaches were outperformed by using the combination of all 10 results lists (combined method), however at the expense of a relatively low hit rate resulting from checking a larger number of targets for each drug as compared to individual fps (Fig. 4b/c). The combined method also showed the highest success rate for finding at least one known target among the 5 top predicted targets for each fp (Fig. 4d). For all methods except SMIfp the predicted target list with p value ≤0.01 showed a significantly greater chance of success compared to the target lists with p value >0.01. The importance of a low p value for target prediction was particularly striking with the fused fingerprints Ffp1–4 and the combined method, for which more than 75% of all correct predictions originated from predicted targets with p value ≤0.01.
Although several of the fps were not statistically different in terms of performance, the pairwise overlap between predicted targets by each of the 10 different fps showed that on average less than 45% of targets were common between any two fps (Fig. 4f). Furthermore each fp retrieved a significant number of unique targets which are not found by any other fp, further highlighting the utility of each fp. Interestingly, APfp and Xfp, which perceive the shape and pharmacophore patterns in molecules, returned the highest percentages of unique targets (52 and 31% respectively). For fused fingerprints Ffp1–4 the percentages of unique targets were relatively low (3–10%) due to the considerable overlap among themselves.
A similar analysis performed by categorizing targets according to their p values (Fig. 4g/h) showed that the pairwise overlap between high confidence (low p value) targets of different fingerprints were significantly higher (on average 47% overlap) as compared to low confidence targets (high p value, on average 24% overlap). This can be explained by the fact that at high p values the structural similarity between a query and its nearest neighbour compound associated with the target becomes less obvious and difficult to capture (Fig. 4e).
Prediction of off-targets of a new TRPV6 inhibitor
In-vitro profiling showed that CIS22a bound significantly (>50% inhibition at 10 µM) to 12 targets of the 24 selected targets (Fig. 5b). Five of these targets (ADR1A, DRD1, DRD2, DRD4 and 5HTR1A) were proposed by multiple fps in the PPB. Only two of the Ffps (Ffp1 and Ffp3) and the combined method were able to predict all of the five targets, illustrating the usefulness of similarity fusion methods and combined data analysis. The analysis of bioactive compounds which linked these five targets to CIS22a showed that the linking compounds were shared by different fps and closely related targets (1–8 in Fig. 5d). Interestingly, the linking compounds for three targets (DRD2, DRD4 and HTR1A) suggested by Xfp were not shared by any other fps, probably because of their relatively low substructure similarity to the query compound. On the other hand Xfp predicted an activity on ADRA2A, which was not confirmed experimentally (only 31% inhibition at 10 µM).
For comparison we successfully ran target predictions for CIS22a using six of the fourteen target prediction web-based tools listed in Table 1. Results comparable to PPB were obtained with SwissTarget, SuperPred, TargetHunter, ChemMapper and ChEMBLPred. On the other hand, SEA only returned a single, correct target, and PharmMapper did not predict any of the tested targets.
The PPB web tool features a unique, intuitive and exhaustive search platform for target prediction. The comparative view of target list from various fingerprint spaces provides a simple yet efficient way for selection of targets by consensus voting. PPB provides a valuable support to drug discovery projects.
MA designed and realized the project and wrote the paper. J-LR co-designed and supervised the project and wrote the paper. Both authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Availability of data and materials
The website, source code and all the data used for development and validation of PPB is freely available for download at the PPB home page.
This work was supported financially by the University of Berne, the Swiss National Science Foundation and the NCCR TransCure.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Overington JP, Al-Lazikani B, Hopkins AL (2006) How many drug targets are there? Nat Rev Drug Discov 5(12):993–996View ArticleGoogle Scholar
- Anighoro A, Bajorath J, Rastelli G (2014) Polypharmacology: challenges and opportunities in drug discovery. J Med Chem 57(19):7874–7887View ArticleGoogle Scholar
- Lavecchia A, Cerchia C (2016) In silico methods to address polypharmacology: current status, applications and future perspectives. Drug Discov Today 21(2):288–298View ArticleGoogle Scholar
- Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J (2006) DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 34(suppl 1):D668–D672View ArticleGoogle Scholar
- Olah M, Rad R, Ostopovici L, Bora A, Hadaruga N, Hadaruga D, Moldovan R, Fulias A, Mractc M, Oprea TI (2008) WOMBAT and WOMBAT-PK: bioactivity databases for lead and drug discovery, chemical biology: from small molecules to systems biology and drug design. Wiley-VCH Verlag GmbH, Weinheim, pp 760–786Google Scholar
- Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res 37(Web Server issue):W623–W633View ArticleGoogle Scholar
- Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B et al (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40(D1):D1100–D1107View ArticleGoogle Scholar
- Gilson MK, Liu T, Baitaluk M, Nicola G, Hwang L, Chong J (2015) BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res 44(D1):D1045–D1053View ArticleGoogle Scholar
- Rose PW, Prlić A, Bi C, Bluhm WF, Christie CH, Dutta S, Green RK, Goodsell DS, Westbrook JD, Woo J et al (2015) The RCSB Protein Data Bank: views of structural biology for basic and applied research and education. Nucleic Acids Res 43(D1):D345–D356View ArticleGoogle Scholar
- Lagunin A, Stepanchikova A, Filimonov D, Poroikov V (2000) PASS: prediction of activity spectra for biologically active substances. Bioinformatics 16(8):747–748View ArticleGoogle Scholar
- Jenkins JL, Bender A, Davies JW (2006) In silico target fishing: predicting biological targets from chemical structure. Drug Discov Today Technol 3(4):413–421View ArticleGoogle Scholar
- Li H, Gao Z, Kang L, Zhang H, Yang K, Yu K, Luo X, Zhu W, Chen K, Shen J et al (2006) TarFisDock: a web server for identifying drug targets with docking approach. Nucleic Acids Res 34(suppl 2):W219–W224View ArticleGoogle Scholar
- Nidhi, Glick M, Davies JW, Jenkins JL (2006) Prediction of biological targets for compounds using multiple-category Bayesian models trained on chemogenomics databases. J Chem Inf Model 46(3):1124–1133View ArticleGoogle Scholar
- Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK (2007) Relating protein pharmacology by ligand chemistry. Nat Biotechnol 25(2):197–206View ArticleGoogle Scholar
- Nigsch F, Bender A, Jenkins JL, Mitchell JBO (2008) Ligand–target prediction using winnow and naive Bayesian algorithms and the implications of overall performance statistics. J Chem Inf Model 48(12):2313–2325View ArticleGoogle Scholar
- Wale N, Karypis G (2009) Target fishing for chemical compounds using target–ligand activity data and ranking based methods. J Chem Inf Model 49(10):2190–2201View ArticleGoogle Scholar
- Liu X, Ouyang S, Yu B, Liu Y, Huang K, Gong J, Zheng S, Li Z, Li H, Jiang H (2010) PharmMapper server: a web server for potential drug target identification using pharmacophore mapping approach. Nucleic Acids Res 38(suppl 2):W609–W614View ArticleGoogle Scholar
- Luo H, Chen J, Shi L, Mikailov M, Zhu H, Wang K, He L, Yang L (2011) DRAR-CPI: a server for identifying drug repositioning potential and adverse drug reactions via the chemical–protein interactome. Nucleic Acids Res 39(Suppl 2):W492–W498View ArticleGoogle Scholar
- Koutsoukas A, Simms B, Kirchmair J, Bond PJ, Whitmore AV, Zimmer S, Young MP, Jenkins JL, Glick M, Glen RC et al (2011) From in silico target prediction to multi-target drug design: current databases, methods and applications. J Proteom 74(12):2554–2574View ArticleGoogle Scholar
- AbdulHameed MDM, Chaudhury S, Singh N, Sun H, Wallqvist A, Tawa GJ (2012) Exploring polypharmacology using a ROCS-based target fishing approach. J Chem Inf Model 52(2):492–505View ArticleGoogle Scholar
- Mavridis L, Mitchell JB (2013) Predicting the protein targets for athletic performance-enhancing substances. J Cheminform 5(1):1–13View ArticleGoogle Scholar
- Wang L, Ma C, Wipf P, Liu H, Su W, Xie X-Q (2013) TargetHunter: an in silico target identification tool for predicting therapeutic potential of small organic molecules based on chemogenomic database. AAPS J 15(2):395–406View ArticleGoogle Scholar
- Gong J, Cai C, Liu X, Ku X, Jiang H, Gao D, Li H (2013) ChemMapper: a versatile web server for exploring pharmacology and chemical structure association based on molecular 3D similarity method. Bioinformatics 29(14):1827–1829View ArticleGoogle Scholar
- Liu X, Vogt I, Haque T, Campillos M (2013) HitPick: a web server for hit identification and target prediction of chemical screenings. Bioinformatics 29(15):1910–1912View ArticleGoogle Scholar
- Peragovics Á, Simon Z, Tombor L, Jelinek B, Hári P, Czobor P, Málnási-Csizmadia A (2013) Virtual affinity fingerprints for target fishing: a new application of drug profile matching. J Chem Inf Model 53(1):103–113View ArticleGoogle Scholar
- Reker D, Rodrigues T, Schneider P, Schneider G (2014) Identifying the macromolecular targets of de novo-designed chemical entities through self-organizing map consensus. Proc Natl Acad Sci 111(11):4067–4072View ArticleGoogle Scholar
- Clark AM, Sarker M, Ekins S (2014) New target prediction and visualization tools incorporating open source molecular fingerprints for TB Mobile 2.0. J Cheminform 6(1):1–17View ArticleGoogle Scholar
- Gfeller D, Grosdidier A, Wirth M, Daina A, Michielin O, Zoete V (2014) SwissTargetPrediction: a web server for target prediction of bioactive small molecules. Nucleic Acids Res 42(W1):W32–W38View ArticleGoogle Scholar
- Nickel J, Gohlke B-O, Erehman J, Banerjee P, Rong WW, Goede A, Dunkel M, Preissner R (2014) SuperPred: update on drug classification and target prediction. Nucleic Acids Res 42(W1):W26–W31View ArticleGoogle Scholar
- Cereto-Massagué A, Ojeda MJ, Valls C, Mulero M, Pujadas G, Garcia-Vallve S (2015) Tools for in silico target fishing. Methods 71:98–103View ArticleGoogle Scholar
- Lusci A, Fooshee D, Browning M, Swamidass J, Baldi P (2015) Accurate and efficient target prediction using a potency-sensitive influence-relevance voter. J Cheminform 7(1):1–13View ArticleGoogle Scholar
- Afzal AM, Mussa HY, Turner RE, Bender A, Glen RC (2015) A multi-label approach to target prediction taking ligand promiscuity into account. J Cheminform 7(1):1–14View ArticleGoogle Scholar
- Liu X, Gao Y, Peng J, Xu Y, Wang Y, Zhou N, Xing J, Luo X, Jiang H, Zheng M (2015) TarPred: a web application for predicting therapeutic and side effect targets of chemical compounds. Bioinformatics 31(12):2049–2051View ArticleGoogle Scholar
- Cao R, Wang Y (2016) Predicting molecular targets for small-molecule drugs with a ligand-based interaction fingerprint approach. ChemMedChem 11(12):1352–1361View ArticleGoogle Scholar
- Wang Z, Liang L, Yin Z, Lin J (2016) Improving chemical similarity ensemble approach in target prediction. J Cheminform 8(1):1–10View ArticleGoogle Scholar
- Wang X, Pan C, Gong J, Liu X, Li H (2016) Enhancing the enrichment of pharmacophore-based target prediction for the polypharmacological profiles of drugs. J Chem Inf Model 56(6):1175–1183View ArticleGoogle Scholar
- Kringelum J, Kjaerulff SK, Brunak S, Lund O, Oprea TI, Taboureau O. ChemProt-3.0: a global chemical biology diseases mapping. Database 2016; 2016.Google Scholar
- Nguyen KT, Blum LC, van Deursen R, Reymond J-L (2009) Classification of organic molecules by molecular quantum numbers. ChemMedChem 4(11):1803–1805View ArticleGoogle Scholar
- Blum LC, Reymond J-L (2009) 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J Am Chem Soc 131(25):8732–8733View ArticleGoogle Scholar
- Blum LC, van Deursen R, Bertrand S, Mayer M, Bürgi JJ, Bertrand D, Reymond J-L (2011) Discovery of α7-nicotinic receptor ligands by virtual screening of the chemical universe database GDB-13. J Chem Inf Model 51(12):3105–3112View ArticleGoogle Scholar
- Ruddigkeit L, Blum LC, Reymond J-L (2013) Visualization and virtual screening of the chemical universe database GDB-17. J Chem Inf Model 53(1):56–65View ArticleGoogle Scholar
- Schwartz J, Awale M, Reymond J-L (2013) SMIfp (SMILES fingerprint) chemical space for virtual screening and visualization of large databases of organic molecules. J Chem Inf Model 53(8):1979–1989View ArticleGoogle Scholar
- Awale M, Reymond J-L (2014) Atom pair 2D-fingerprints perceive 3D-molecular shape and pharmacophores for very fast virtual screening of ZINC and GDB-17. J Chem Inf Model 54(7):1892–1907View ArticleGoogle Scholar
- Awale M, Reymond J-L (2014) A multi-fingerprint browser for the ZINC database. Nucleic Acids Res 42:W234–W239View ArticleGoogle Scholar
- Reymond JL (2015) The chemical space project. Acc Chem Res 48(3):722–730View ArticleGoogle Scholar
- Simonin C, Awale M, Brand M, van Deursen R, Schwartz J, Fine M, Kovacs G, Häfliger P, Gyimesi G, Sithampari A et al (2015) Optimization of TRPV6 calcium channel inhibitors using a 3D ligand-based virtual screening method. Angew Chem Int Ed 54(49):14748–14752View ArticleGoogle Scholar
- Willett P (2013) Fusing similarity rankings in ligand-based virtual screening. Comput Struct Biotechnol J 5(6):1–6View ArticleGoogle Scholar
- Huang N, Shoichet BK, Irwin JJ (2006) Benchmarking sets for molecular docking. J Med Chem 49(23):6789–6801View ArticleGoogle Scholar
- Hagadone TR (1992) Molecular substructure similarity searching: efficient retrieval in two-dimensional structure databases. J Chem Inf Comput Sci 32(5):515–521View ArticleGoogle Scholar
- Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50(5):742–754View ArticleGoogle Scholar
- Baldi P, Nasr R (2010) When is chemical similarity significant? The statistical distribution of chemical similarity scores and its extreme values. J Chem Inf Model 50(7):1205–1222View ArticleGoogle Scholar
- Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG (2012) ZINC: a free tool to discover chemistry for biology. J Chem Inf Model 52(7):1757–1768View ArticleGoogle Scholar