DASPfind: new efficient method to predict drug–target interactions
© Ba-alawi et al. 2016
Received: 26 November 2015
Accepted: 8 March 2016
Published: 16 March 2016
Identification of novel drug–target interactions (DTIs) is important for drug discovery. Experimental determination of such DTIs is costly and time-consuming, which necessitates the development of efficient computational methods for the accurate prediction of potential DTIs. To date, many computational methods have been proposed for this purpose, but they suffer from a high rate of false positive predictions.
Here, we developed a novel computational DTI prediction method, DASPfind. DASPfind uses simple paths of particular lengths inferred from a graph that describes DTIs, similarities between drugs, and similarities between the protein targets of drugs. We show that on average, over the four gold standard DTI datasets, DASPfind significantly outperforms other existing methods when the single top-ranked predictions are considered, resulting in 46.17 % of these predictions being correct, and it achieves 49.22 % correct single top-ranked predictions when the set of all DTIs for a single drug is tested. Furthermore, we demonstrate that our method is best suited for predicting DTIs in cases of drugs with no known targets or with few known targets. We also show the practical use of DASPfind by generating novel predictions for the Ion Channel dataset and validating them manually.
Despite large research and development expenditures, only 27 new molecular entities were approved by the Food and Drug Administration (FDA) in 2013, illustrating the continued decline in drug discovery. The approach to drug discovery based on in silico methods is thus becoming more attractive. Much effort is put into developing methods for the prediction of drug–target interactions (DTIs) that mitigate the expensive and time-consuming experimental identification of lead compounds and their interactors. Moreover, such methods allow for the identification of potentially new therapeutic applications for existing drugs (drug repositioning), which may reduce research cost and time because extensive clinical history and toxicology information already exists for these drugs. Furthermore, prediction of DTIs reveals drugs acting on multiple targets, i.e. those that exhibit polypharmacology, which may aid in understanding side effects caused by the use of drugs. For example, one such in silico DTI prediction method uses the crystal structure of the target binding site to yield a good prediction of druggability and to identify the less-druggable targets before the deployment of any substantial funding and effort for experiments. The study further successfully tested two of the generated predictions experimentally, using high-throughput screening of a diverse collection of compounds, thereby demonstrating the utility of the approach when dealing with difficult targets. Other studies, such as [7, 8], also successfully demonstrated the use of similar docking methods in identifying DTIs and in drug repositioning. The drawback of these docking methods is that they require high-resolution X-ray crystal (3D) structures of proteins, which are not known for membrane-bound proteins, a class that accounts for more than 40 % of current drug targets [9, 10].
An alternative ligand-based approach has therefore been developed that uses machine learning methods to predict the binding of a candidate ligand from the known ligands of a target protein [11, 12]. One such ligand-based method, known as the similarity ensemble approach (SEA), predicts DTIs using drug two-dimensional (2D) structural similarity. The study experimentally confirmed 23 new DTIs, five of which were potent. However, the performance of this ligand-based prediction method decreases as the number of known ligands of a particular target protein decreases. To further minimize the drawbacks of the above-mentioned methods, recently developed techniques have been based on supervised classification [4, 14–16] and graph interaction models. In one such study, a supervised inference method based on a bipartite graph is used to predict unknown DTIs. The study demonstrated the use of bipartite local models for generating two independent predictions: (a) prediction of target proteins of a given drug, and (b) prediction of drugs targeting a given protein. The two predictions are then combined to give a definitive prediction for each interaction. These predictions included known DTIs involving human enzymes, ion channels, G-protein-coupled receptors (GPCRs) and nuclear receptors, and also suggested potential novel DTIs. On the other hand, another study used a network-based inference (NBI) method that implements a simplified version of a previously proposed algorithm. The results clearly show good performance, but one limitation is that the knowledge pertaining to drug–drug similarities and target–target similarities has not been utilized, since only information from known drug–protein interactions has been used. Another limitation of NBI is its inability to give predictions for new drugs without known targets.
The first limitation has been addressed by adding information from drug similarity and target similarity to the function used by NBI, which produced improved results. However, the resulting method (DT-Hybrid) still does not resolve the second limitation related to target predictions for new drugs without known targets. Another method, denoted as HGBI, also added a drug–similarity graph and a protein–similarity graph to the interaction graph used by NBI. This method mitigated both limitations of NBI. However, it used a restricted way of traversing the resultant network, so only partial information from the graph topology has been utilized and only partial benefits were achieved. Another approach that deals with both limitations is NRWRH, which uses a network-based random walk with restart applied to a heterogeneous network.
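To make the random-walk-with-restart idea concrete, the following is a minimal generic sketch of the technique, not the NRWRH implementation. The function name, the column normalisation, and the restart probability of 0.7 are illustrative assumptions; how NRWRH builds its transition matrix over the heterogeneous network is not described here.

```python
import numpy as np

def random_walk_with_restart(W, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that jumps back to
    the seed distribution with probability `restart` at every step.

    W    : square non-negative weight matrix (hypothetical network weights)
    seed : non-negative vector marking the start node(s)
    """
    W = np.asarray(W, dtype=float)
    # Column-normalise so that each non-empty column sums to 1.
    col_sums = W.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep an all-zero column
    P = W / col_sums
    p0 = np.asarray(seed, dtype=float)
    p0 = p0 / p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (P @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy chain 0 - 1 - 2 with the walker seeded at node 0.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
p = random_walk_with_restart(W, seed=[1, 0, 0])
print(p)
```

In a DTI setting, the seed would typically mark a drug (and possibly its similar neighbours), and the steady-state values at the target nodes would be read off as prediction scores.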
In this study, we propose a novel method (DASPfind) that relies on the graph interaction model. Our method uses a heterogeneous graph consisting of three sub-graphs connected to each other. These sub-graphs represent drug–drug similarity, protein–protein similarity, and known drug–protein interactions. Our algorithm for predicting new drug–protein interactions is based on all simple paths of particular lengths on such a graph model. The main idea of our method is to utilize the similarity information within the sub-networks and combine it with information from the topology of the heterogeneous graph. In the results, we predict DTIs (targets here are proteins from several groups), and show that our method is capable of correctly predicting an individual DTI in 27–53 % of cases (depending on the dataset and the target protein group) using only the single top-ranked prediction for a drug, achieving on average the correct prediction in 46.17 % of cases across the four gold standard datasets. Moreover, on the same datasets, the single top-ranked DTI predictions by DASPfind are correct on average in 49.22 % of cases when predicting any of the known DTIs for a single drug, assuming there are no known DTIs for the drug. This last scenario corresponds to the case of predicting DTIs for a new drug without known targets. These results significantly outperform those produced by the other state-of-the-art methods. The notable advantage of DASPfind is demonstrated when considering the single top-ranked predictions and when a drug has no known targets or very few known targets. We verified the utility of our method by providing a list of new predictions (not present in our datasets), several of which have been experimentally confirmed in other studies.
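The three connected sub-graphs can be pictured as one block adjacency matrix. The sketch below is only an illustration under stated assumptions: the function name, the choice to zero the similarity diagonals, and the toy values are hypothetical and are not taken from the DASPfind implementation.

```python
import numpy as np

def build_heterogeneous_adjacency(drug_sim, target_sim, interactions):
    """Combine the three sub-graphs into one symmetric adjacency matrix.

    drug_sim     : (n_drugs, n_drugs) drug-drug similarity
    target_sim   : (n_targets, n_targets) protein-protein similarity
    interactions : (n_drugs, n_targets) binary matrix of known DTIs
    Node order in the result: all drugs first, then all targets.
    """
    interactions = np.asarray(interactions, dtype=float)
    n_drugs, n_targets = interactions.shape
    Sd = np.array(drug_sim, dtype=float)      # copies, so inputs stay untouched
    St = np.array(target_sim, dtype=float)
    assert Sd.shape == (n_drugs, n_drugs)
    assert St.shape == (n_targets, n_targets)
    # Assumption: drop self-similarity so that no node links to itself.
    np.fill_diagonal(Sd, 0.0)
    np.fill_diagonal(St, 0.0)
    return np.block([[Sd, interactions],
                     [interactions.T, St]])

# Toy example with 2 drugs and 3 targets (all values are made up).
Sd = [[1.0, 0.5],
      [0.5, 1.0]]
St = [[1.0, 0.2, 0.3],
      [0.2, 1.0, 0.4],
      [0.3, 0.4, 1.0]]
Y = [[1, 0, 1],
     [0, 1, 0]]
A = build_heterogeneous_adjacency(Sd, St, Y)
print(A.shape)
```

Keeping drugs and targets in one matrix means that a path can move freely between similarity edges and interaction edges, which is what combining similarity information with graph topology requires.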
Results and discussion
To measure the performance of our method, we compared it with three methods: NRWRH, DT-Hybrid, and HGBI. HGBI and DT-Hybrid are, to the best of our knowledge, the most recent works that have been shown to outperform other state-of-the-art methods such as NBI and BLM. HGBI was demonstrated to perform better than NBI and BLM in terms of AUC and the ‘top 1’% of the predictions based on leave-one-out cross-validation (LOOCV). For our comparison, we also adopted the LOOCV scheme. The following procedure is repeated for each known DTI. For a drug, a single known DTI is removed from the graph and treated as a testing link. We make predictions of DTIs for that drug and all targets, and rank the generated prediction scores in descending order. For each specific score threshold, if the score of the testing link is above the threshold, it is considered a true positive, and if the score of an unknown interaction is above the threshold, it is considered a false positive. By varying the threshold, we calculate the true positive rate (TPR) and false positive rate (FPR) and hence generate the ROC curve. We then use the area under the curve (AUC) to show the overall performance of the method. More practically, we also counted the cases where the correct predictions were among the ‘top 5’, ‘top 2’ or represented the single top-ranked prediction (‘top 1’). ‘Top 5’ means that the link under test is found among the ‘top 5’ predictions for that specific drug, and so we report how many known interactions were found among these ‘top 5’. The same applies to the ‘top 2’ and ‘top 1’ predictions. Overall, the single top-ranked (‘top 1’) predictions are important for the utility of the method, as the aim is to find reliable predictions that can significantly reduce the set of required validation experiments.
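The ‘top k’ part of the LOOCV procedure described above can be sketched as follows. This is an illustrative sketch only: `loocv_top_k`, `neighbour_score`, the tie handling, and the toy data are assumptions, and `neighbour_score` is a deliberately simple stand-in scorer, not DASPfind or any of the compared methods. The ROC/AUC computation is not covered.

```python
import numpy as np

def loocv_top_k(interactions, score_fn, ks=(1, 2, 5)):
    """Percentage of known DTIs recovered among the top-k predictions
    for their drug when each known DTI is hidden in turn."""
    interactions = np.asarray(interactions)
    hits = {k: 0 for k in ks}
    known = np.argwhere(interactions == 1)
    for d, t in known:
        train = interactions.copy()
        train[d, t] = 0                       # hide the testing link
        scores = score_fn(train)[d]           # scores of drug d vs all targets
        # Candidates: targets not linked to drug d in the training graph
        # (the hidden testing link plus all unknown interactions).
        candidates = np.flatnonzero(train[d] == 0)
        # Assumption: ties are resolved in favour of the testing link.
        rank = 1 + int(np.sum(scores[candidates] > scores[t]))
        for k in ks:
            hits[k] += rank <= k
    return {k: 100.0 * hits[k] / len(known) for k in ks}

def neighbour_score(train, drug_sim):
    """Hypothetical stand-in scorer: similarity-weighted vote of other drugs."""
    S = np.array(drug_sim, dtype=float)
    np.fill_diagonal(S, 0.0)
    return S @ train

# Toy data: 3 drugs, 3 targets (values are made up for illustration).
drug_sim = [[1.0, 0.9, 0.1],
            [0.9, 1.0, 0.2],
            [0.1, 0.2, 1.0]]
Y = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])
print(loocv_top_k(Y, lambda train: neighbour_score(train, drug_sim)))
```

Any prediction method can be plugged in through `score_fn`, as long as it returns a drug-by-target score matrix computed from the training interactions only.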
Table: Comparison between methods over six different datasets based on LOOCV for each known DTI. Columns: ‘Top 1’ (%), ‘Top 2’ (%), ‘Top 5’ (%).
Table: Comparison between methods over six different datasets based on LOOCV for each drug, assuming no DTIs are known for each drug. This is equivalent to estimating the capacity to predict DTIs for new drugs without known targets. Column: ‘Top 1’ (%).
To complement our performance comparison of the different methods, we also used the same criterion as used for NRWRH, where the predicted DTI is considered correct if it is the top-ranked prediction after the removal of all predicted known DTIs. We demonstrate that in this case too, our method outperforms NRWRH when the targets for a drug are not known, or when there are only a few known targets of the drug (Additional file 1: Table S2, Figure S1).
In our study, we utilized information representing chemical similarity and protein similarity in addition to the information on drug–protein interactions. In the future, we plan to add different types of information, such as drug side effects and information derived from integrating multiple biological databases. The complete lists of ‘top 1’ ranked predictions across all six datasets we used are provided in additional files (Additional file 1: Tables S3–S8).
Table: Fifteen novel ‘top 1’ predictions over the whole ion channel dataset. Column: Type of evidence.
[KEGG: D00538, PMID: 20025128]
[KEGG: D00552, PMID: 22874086]
[DrugBank: DB00393, PMID: 17705883]
[GeneCards: CACNA1C, PMID: 16306443]
[KEGG: D00733, Drug2Gene: 103927525]
[KEGG: hsa6328, PMID: 22185904]
[SuperTarget: hsa776, PubChem: CID39186]
[KEGG: D03830, PMID: 17949410]
[DrugBank: DB00661], unpublished data (http://edoc.ub.uni-muenchen.de/5321/)
Our study introduces a method (DASPfind) that infers drug–protein interactions from a heterogeneous graph that incorporates information about similarities between drugs and similarities between targets. DASPfind relies on finding all simple paths of a specific length between any drug–protein pair, utilizing the drug similarity and protein similarity information and the topology of the graph more efficiently than other current methods. We show that our method is significantly more accurate than the other state-of-the-art approaches when the single top-ranked DTI predictions are considered, as well as for new drugs without known DTIs or for drugs with only a few known DTIs. These properties make DASPfind important and relevant for practical use. We show that our method is able to reliably predict novel DTIs with very high confidence, and validation of these DTIs demonstrated DASPfind’s utility.
Summary of the datasets used in this study
Our method differs from other methods in the specific way it uses the similarities between drugs and between proteins along with the topology of the heterogeneous graph. For example, our method can utilize the following path over the network: drug1 (start) → protein1 → drug2 → protein2 (end), which maps to the following example path in the bottom-right side of Fig. 1: D00316 → hsa5915 → D00094 → hsa6096. Such a path gives the additional information that drugs interacting with the same target have some degree of similarity between them. Also, in most studies that utilize the network structure, all paths over the network contribute equally to the score, whereas we apply a decay function so that longer paths have a lower total score.
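The idea of scoring a drug–protein pair from all simple paths, with longer paths contributing less, can be sketched as follows. This is a minimal illustration, not the DASPfind scoring formula: the decay form (`decay ** number_of_edges`), the edge weights, the maximum path length of three edges, and all function names are assumptions made for the example.

```python
def make_graph(edges):
    """Undirected weighted graph stored as a dict of dicts."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def simple_paths(graph, start, end, max_edges):
    """Yield every simple path (no repeated node) from start to end
    that uses at most max_edges edges."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            yield path
            continue
        if len(path) - 1 == max_edges:
            continue                      # path is already at maximum length
        for nxt in graph[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))

def path_score(graph, path, decay=0.5):
    """Product of edge weights, damped by a hypothetical decay per edge."""
    weight = 1.0
    for u, v in zip(path, path[1:]):
        weight *= graph[u][v]
    return weight * decay ** (len(path) - 1)

def dti_score(graph, drug, target, max_edges=3, decay=0.5):
    """Sum of the decayed scores of all simple paths from drug to target."""
    return sum(path_score(graph, p, decay)
               for p in simple_paths(graph, drug, target, max_edges))

# Toy network: interaction edges have weight 1.0; the two similarity
# weights (0.6 and 0.4) are made-up values.
graph = make_graph([
    ("drug1", "protein1", 1.0),
    ("drug2", "protein1", 1.0),
    ("drug2", "protein2", 1.0),
    ("drug1", "drug2", 0.6),
    ("protein1", "protein2", 0.4),
])
for p in simple_paths(graph, "drug1", "protein2", max_edges=3):
    print(" -> ".join(p), round(path_score(graph, p), 3))
print("score:", round(dti_score(graph, "drug1", "protein2"), 3))
```

In this toy network the path drug1 → protein1 → drug2 → protein2 from the text is one of the enumerated paths, alongside the shorter paths that pass through a similarity edge.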
WB and VBB conceptualized the problem. WB was responsible for solution development and implementation. OS helped define the task, solutions and evaluation of the results. ME was responsible for validating the new predictions. PK and VBB supervised the study. All authors wrote the manuscript. All authors read and approved the final manuscript.
Research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST). The computational analysis for this study was performed on the Dragon and SnapDragon compute clusters of the Computational Bioscience Research Center at KAUST. The authors thank Valentin Rodionov (KAUST) for valuable discussion. We also thank Wenhui Wang (Case Western Reserve University) for help with HGBI and providing us with the dataset used in their study.
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Paul SM, Mytelka DS, Dunwiddie CT, Persinger CC, Munos BH, Lindborg SR, Schacht AL (2010) How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat Rev Drug Discov 9(3):203–214
- CDER (2014) CDER’s 2013 novel new drugs. U.S. Food and Drug Administration. http://www.fda.gov/Drugs/DevelopmentApprovalProcess/DrugInnovation/ucm381263.htm
- Glick M, Jacoby E (2011) The role of computational methods in the identification of bioactive compounds. Curr Opin Chem Biol 15(4):540–546
- Perlman L, Gottlieb A, Atias N, Ruppin E, Sharan R (2011) Combining drug and gene similarity measures for drug–target elucidation. J Comput Biol 18(2):133–145
- Hopkins AL (2009) Drug discovery: predicting promiscuity. Nature 462(7270):167–168
- Cheng AC, Coleman RG, Smyth KT, Cao Q, Soulard P, Caffrey DR, Salzberg AC, Huang ES (2007) Structure-based maximal affinity model predicts small-molecule druggability. Nat Biotechnol 25(1):71–75
- Xie L, Evangelidis T, Xie L, Bourne PE (2011) Drug discovery using chemical systems biology: weak inhibition of multiple kinases may contribute to the anti-cancer effect of nelfinavir. PLoS Comput Biol 7(4):e1002037
- Yang L, Wang K, Chen J, Jegga AG, Luo H, Shi L, Wan C, Guo X, Qin S, He G et al (2011) Exploring off-targets and off-systems for adverse drug reactions via chemical-protein interactome—clozapine-induced agranulocytosis as a case study. PLoS Comput Biol 7(3):e1002016
- Overington JP, Al-Lazikani B, Hopkins AL (2006) How many drug targets are there? Nat Rev Drug Discov 5(12):993–996
- Carpenter EP, Beis K, Cameron AD, Iwata S (2008) Overcoming the challenges of membrane protein crystallography. Curr Opin Struct Biol 18(5):581–586
- Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK (2007) Relating protein pharmacology by ligand chemistry. Nat Biotechnol 25(2):197–206
- Gonzalez-Diaz H, Prado-Prado F, Garcia-Mera X, Alonso N, Abeijon P, Caamano O, Yanez M, Munteanu CR, Pazos A, Dea-Ayuela MA et al (2011) MIND-BEST: web server for drugs and target discovery; design, synthesis, and assay of MAO-B inhibitors and theoretical-experimental study of G3PDH protein from Trichomonas gallinae. J Proteome Res 10(4):1698–1718
- Keiser MJ, Setola V, Irwin JJ, Laggner C, Abbas AI, Hufeisen SJ, Jensen NH, Kuijer MB, Matos RC, Tran TB et al (2009) Predicting new molecular targets for known drugs. Nature 462(7270):175–181
- van Laarhoven T, Nabuurs SB, Marchiori E (2011) Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics 27(21):3036–3043
- Mei JP, Kwoh CK, Yang P, Li XL, Zheng J (2013) Drug–target interaction prediction by learning from local information and neighbors. Bioinformatics 29(2):238–245
- Soufan O, Ba-alawi W, Afeef M, Essack M, Rodionov V, Kalnis P, Bajic VB (2015) Mining chemical activity status from high-throughput screening assays. PLoS One 10(12):e0144426
- Bleakley K, Yamanishi Y (2009) Supervised prediction of drug–target interactions using bipartite local models. Bioinformatics 25(18):2397–2403
- Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, Zhou W, Huang J, Tang Y (2012) Prediction of drug–target interactions and drug repositioning via network-based inference. PLoS Comput Biol 8(5):e1002503
- Zhou T, Ren J, Medo M, Zhang YC (2007) Bipartite network projection and personal recommendation. Phys Rev E Stat Nonlinear Soft Matter Phys 76(4 Pt 2):046115
- Alaimo S, Pulvirenti A, Giugno R, Ferro A (2013) Drug–target interaction prediction through domain-tuned network-based inference. Bioinformatics 29(16):2004–2008
- Wang W, Yang S, Li J (2013) Drug target predictions based on heterogeneous graph inference. Pac Symp Biocomput 18:53–64
- Chen X, Liu MX, Yan GY (2012) Drug–target interaction prediction by random walk on the heterogeneous network. Mol BioSyst 8(7):1970–1978
- Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M (2008) Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics 24(13):i232–i240
- Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V et al (2011) DrugBank 3.0: a comprehensive resource for ‘omics’ research on drugs. Nucleic Acids Res 39(Database issue):D1035–D1041
- Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M (2008) DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 36(Database issue):D901–D906
- Campillos M, Kuhn M, Gavin AC, Jensen LJ, Bork P (2008) Drug target identification using side-effect similarity. Science 321(5886):263–266
- Chen B, Ding Y, Wild DJ (2012) Assessing drug target association using semantic linked data. PLoS Comput Biol 8(7):e1002574
- Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28(1):27–30
- Gunther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, Ahmed J, Urdiales EG, Gewiess A, Jensen LJ et al (2008) SuperTarget and Matador: resources for exploring drug–target relationships. Nucleic Acids Res 36(Database issue):D919–D922
- Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res 37(Web server issue):W623–W633
- Rebhan M, Chalifa-Caspi V, Prilusky J, Lancet D (1997) GeneCards: integrating information about genes, proteins and diseases. Trends Genet 13(4):163
- Beyder A, Strege PR, Bernard C, Farrugia G (2012) Membrane permeable local anesthetics modulate Na(V)1.5 mechanosensitivity. Channels 6(4):308–316
- Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG (2012) ZINC: a free tool to discover chemistry for biology. J Chem Inf Model 52(7):1757–1768
- Roider HG, Pavlova N, Kirov I, Slavov S, Slavov T, Uzunov Z, Weiss B (2014) Drug2Gene: an exhaustive resource to explore effectively the drug–target relation network. BMC Bioinform 15:68
- Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M (2006) From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 34(Database issue):D354–D357
- Schomburg I, Chang A, Ebeling C, Gremse M, Heldt C, Huhn G, Schomburg D (2004) BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res 32(Database issue):D431–D433
- Hattori M, Okuno Y, Goto S, Kanehisa M (2003) Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc 125(39):11853–11865
- Smith TF, Waterman M (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197