In silico target fishing: addressing a “Big Data” problem by ligand-based similarity rankings with data fusion
© Liu et al.; licensee Chemistry Central Ltd. 2014
Received: 12 March 2014
Accepted: 10 June 2014
Published: 18 June 2014
Ligand-based in silico target fishing can be used to identify the potential interacting targets of bioactive ligands, which is useful for understanding the polypharmacology and safety profile of existing drugs. The underlying principle of the approach is that known bioactive ligands can be used as references to predict the targets of a new compound.
We tested a pipeline enabling large-scale target fishing and drug repositioning, based on simple fingerprint similarity rankings with data fusion. A large library containing 533 drug-relevant targets with 179,807 active ligands was compiled, where each target was defined by its ligand set. For a given query molecule, its target profile is generated by similarity searching against the ligand sets assigned to each target; the individual searches, which use multiple reference structures, are then fused into a single ranking list representing the potential target interaction profile of the query compound. The proposed approach was validated by 10-fold cross-validation and two external tests using data from DrugBank and the Therapeutic Target Database (TTD). The use of the approach was further demonstrated with examples concerning drug repositioning and the prediction of drug side effects. The promising results suggest that the proposed method is useful not only for finding new uses for promiscuous drugs, but also for predicting some important toxic liabilities.
With the rapidly increasing volume and diversity of data on drug-related targets and their ligands, the simple ligand-based target fishing approach could play an important role in assisting future drug design and discovery.
For many decades, drug discovery and development have been directed by the idea of ‘one drug–one target–one disease’. The paradigm is shifting, since many drugs elicit their therapeutic activities by modulating multiple targets, as indicated by polypharmacology [1–3]. However, multi-target interactions are either unknown or insufficiently understood in most cases, which has inspired many efforts to predict and characterize drug–target associations.
The use of in silico tools to predict the targets of small molecules has drawn increasing attention in recent years. These predicted drug targets can be divided into two types: I) unexploited novel drug targets that can be used alone or with other drugs in combination chemotherapy; II) existing drug targets that provide new uses and indications for existing drugs. One of the most prominent examples of drug repositioning is Sildenafil, which was initially developed for hypertension and angina and then repositioned for the treatment of male erectile dysfunction. Other notable drug repositioning examples include Memantine, Buprenorphine, Requip [8, 9], and Colesevelam. Numerous computational strategies for target fishing have been published. These studies help researchers deepen their understanding of the bioactive space of new chemical entities, which provides an efficient route to designing ligands with favorable pharmacological and safety profiles. Generally, available target fishing approaches fall into the following two major categories:
1. Target-based Methods
Target-based methods use information about the target proteins and include molecular docking, similarity comparison of protein sequences or binding pockets, and so on. For example, INVDOCK and TarFisDock screen a query small molecule against a panel of predefined target protein structures, whereby putative targets are sorted by docking score. This approach has been demonstrated to be useful in target identification, and some of the predicted results have been verified by bioassay and crystallographic studies [14, 15]. Although significant improvements have been made in this area, there are still practical limitations for target structure-based approaches, such as unavailable crystal structures (especially for most trans-membrane proteins), a high false positive rate, the choice of an appropriate scoring function, and high demands on computational resources. To circumvent these issues, several target-based methods relying on the analysis of existing drug-target interaction data have been developed. For instance, Luo et al. developed a web server, DRAR-CPI, to identify drug repositioning potential and adverse drug reactions by mining the chemical–protein interactome. Milletti et al. and Wang et al. predicted polypharmacology by comparing the structural similarity of binding sites. Recently, Jacob et al. and Wang et al. constructed chemogenomics approaches for qualitatively predicting ligand-protein interactions that require only the primary sequence of proteins and the structural features of small molecules. These approaches transform the target fishing problem into a machine learning problem in the ligand–target space. Though potentially useful, they are sensitive to how a given target protein or ligand-protein pair is represented by descriptor vectors, and they have a limited application domain defined by the range of their training set.
2. Ligand-based Methods
Ligand-based methods reduce the problem to similarity searching and use only ligand information to predict targets. Compared with structure-based approaches, ligand-based approaches do not rely on complete knowledge of ligand-target interaction mechanisms and require relatively low computational cost. Based on how a given ligand is represented, these methods can be divided into 2D fingerprint, molecular shape, pharmacophore, and bioactivity spectrum-based approaches, among others.
Chemically similar drugs often bind biologically relevant protein targets. To uncover the pharmacological relationships among proteins, Keiser et al. developed a statistics-based chemoinformatics approach called the similarity ensemble approach (SEA), in which each target is represented solely by the structures of its set of known ligands. SEA has been applied to quantitatively identify pharmacological links between targets through the similarity of the ligands that bind to them, expressed as expectation values (E-values). It was further applied successfully to a large-scale test for drug repurposing. Furthermore, three-dimensional (3D) molecular shape descriptors have turned out to be especially successful in describing and comparing molecular profiles. AbdulHameed et al. developed a novel approach that compares shape similarity using the program ROCS. In their approach, target profiles were generated for a given query molecule by computing the maximal 3D-shape and chemistry-based similarity to the collection of drugs assigned to each protein target. Pharmacophores, like molecular docking, can also be used in reverse for in silico drug target identification. Recently, Liu et al. reported a free web interface, PharmMapper, that uses pharmacophores to predict protein targets for small molecules. This approach automatically performs reverse mapping against the deposited pharmacophore models and outputs the top ranked hits. With the rapid growth in bioactivity data on small molecules and their targets, it is possible to employ this information to infer targets for drugs or bioactive compounds. Cheng et al. developed an approach named bioactivity profile similarity search (BASS) for associating targets with small molecules by comparing bioactivity profiles derived from the NCI-60 cell lines.
A notable strategy for similarity searching is data fusion (DF), which utilizes multiple reference structures to search against a database. A DF process combines the information provided by multiple independent sensors in order to make judgments on an event, and was first proposed by Peter Willett and his coworkers. Afterwards, Whittle et al. and Hert et al. used 2D fingerprint similarity ranking with DF for virtual screening, and demonstrated its effectiveness over conventional similarity searching in scaffold-hopping searches for structurally diverse sets of active molecules. Owing to its high search quality and low computational cost, this approach is especially well suited to the exponential growth in biological data.
Although many advances have been made over the last decades, drug target prediction is still a very challenging task, as reflected by the low success rate of clinical target validation. The reasons are manifold, yet the greatest difficulty might be the sheer number of protein targets and known active small molecules. For example, the current version of the ChEMBL database (version 17) contains 12,077,491 bioassay data points for 9,356 targets and 1,324,941 compounds. Such a data collection is so large and complex that it becomes difficult to process using traditional molecular modeling and target-ligand interaction applications. In this regard, we may consider target fishing as a ‘big data’ problem. As defined by Douglas Laney, big data problems are characterized by three aspects of data growth: increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). For target fishing, vast amounts of data in various measurements (Ki, Kd, IC50, inhibition rates and so on) are being generated daily from different sources, with the fast development of many high-throughput bioassay systems. Given data with these features, we first need to make a practical trade-off between the amount of data employed and the complexity of the models. Though still under debate, it has been widely recognized that using more data is more beneficial, because it provides contextual richness and does not rely on unproven assumptions and weak correlations. From this perspective, we may argue that more emphasis should be placed on the data set used for target fishing than on developing more sophisticated algorithms. In this study, we try to address the target fishing problem from the ‘big data’ perspective. A large reference ligand library is first established, with each ligand set representing a single target.
Here, the DF strategy is adopted to calculate the highest K similarity scores (or their average value) between the query and the ligand sets in the reference library, using Tanimoto coefficients (Tc) of ECFP4 fingerprints. The value of K can be 1, 3, or 5 (denoted as Max, 3NN, and 5NN, respectively), and the average fusion similarity over a whole ligand set is a centroid score, which is described in the Methods section. The target profile of a query chemical is then given by the ranked fusion scores. The performance of this scheme is tested on two test sets, and a further validation is made by identifying new (off-) targets and hERG-related toxicity. The aim of the study is to benchmark the target fishing capability of a simple ligand-based similarity searching approach while employing as much of the available data as possible.
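As a minimal illustration of the similarity measure named above, the Tanimoto coefficient between two binary fingerprints is the number of shared on-bits divided by the number of on-bits in either fingerprint. The sketch below assumes each fingerprint is given as a set of on-bit indices; the function name `tanimoto`, the toy bit sets, and the convention of returning 0.0 for two empty fingerprints are illustrative assumptions, and fingerprint generation itself (ECFP4 in this study) is outside the sketch.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    # |A ∩ B| / (|A| + |B| - |A ∩ B|)
    return shared / (len(a) + len(b) - shared)


# Toy fingerprints (hypothetical bit indices, not real ECFP4 output).
query_bits = {1, 5, 9, 12}
reference_bits = {1, 5, 7, 12, 20}
print(tanimoto(query_bits, reference_bits))  # 3 shared bits out of 6 distinct bits
```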
The SEA approach represents a notable recent advance in identifying protein targets. Here, a locally implemented SEA approach was run in parallel with our approach for accuracy assessment, with the E-value used to rank potential targets. We perform this comparison because both SEA and our approach use an active ligand set to represent a target and use 2D fingerprint-based similarity to obtain the score of a target (the SEA approach can be considered a data fusion scheme in which the score of a target is normalized by the size of its ligand set). It should be pointed out that SEA requires the product of the ligand set sizes to be at least 100 to guarantee a statistically reliable result. This means that the current SEA is not appropriate for fishing targets without sufficient reference ligands. Nevertheless, its results can serve as a control to see how existing approaches perform on the current data set.
Results and discussion
Given a ligand that has m experimentally verified targets and a target fishing scheme that yields n predicted targets for the ligand (i.e., the top ranked n targets), we used the following evaluation metrics to measure the performance of the scheme: Precision (PRn), Recall (REn), F-measure (Fn), and the uninterpolated precision (PR’). PR’ is given by the averaged precision values PRi from the ranking places 1 to m. Here, m for a query ligand is the number of its interacting targets. The detailed definitions of these terms are provided in the Methods section.
1. Ten-fold cross-validation
The result (PR’) of 10-fold cross-validation on the reference set
2. The performance of 3NN with increasing size of reference Set
In general, 3NN achieves a high PR’ in the internal validation, which is partially attributed to the close analogues that exist in both the test and reference sets. To assess the performance of the approach further on practical cases, two external tests are performed and analyzed in the following section.
3. Predicting targets for approved drugs from DrugBank and TTD
Many drugs from a wide range of therapeutic areas have more than one interacting target, and the multiple on-target and off-target bindings are essential for their efficacy and side effects. For example, drugs treating central nervous system disorders have as many as 64 reported interacting targets in our validation set. We compared 3NN and SEA for target identification for these multi-target drugs. For each test compound, we considered the top 20 predicted targets, in terms of the metrics PRn, REn, Fn and PR’. The averaged results for the 711 drugs present in DrugBank and the 476 drugs in TTD are reported.
From Figure 6(B), we may also notice that 3NN and SEA show a similar tendency on TTD, where the Fn curves decline rapidly when n > 3. This suggests that the therapeutic targets can be well identified within the top three predictions, and that considering targets ranked outside the top three would result in a significant number of false predictions. However, if one aims to predict non-therapeutic targets as well, the prediction rank list should be extended. As shown in Figure 6(A), the decline of 3NN is still slow when n > 6. Another point worth noting is that for both 3NN and SEA the maximum Fn value obtained on TTD is higher than that on DrugBank. This observation suggests that therapeutic targets can be predicted more reliably. One possible reason is that therapeutic targets usually form specific, high-affinity interactions with their corresponding drugs. In contrast, non-therapeutic targets, e.g. CYP450s, may exhibit enormous promiscuity and interact with a huge range of structurally unrelated ligands. Such weak and non-specific interactions may lead to inferior performance in predicting drugs that interact with these targets. More details about this test are provided in Additional file 1: Table S1 and Table S2.
Alternative target fishing methods include 3D similarity searching methods as well as those based on machine learning. The 3D similarity searching methods rely on the generation of active conformations for both references and queries, which are difficult to obtain for some flexible compounds and involve high computational cost. For the machine learning methods, both known active and known inactive molecules should be present to form a training set. However, true inactive data are hardly available in most public databases, which significantly restricts the use of these methods in target fishing. In comparison, 2D similarity searching methods require only positive data, and the chemical fingerprints are fast to compute, making this an efficient approach for large-scale target fishing.
4. Identification of New (Off-) target-drug interactions
From the previous analysis, we may notice that the 3NN DF scheme based on a large reference set is suitable for ligands with multiple targets. It is therefore interesting to investigate its performance in identifying new targets and off-targets from experimentally verified drug-target associations. To this end, we tested 3NN using Keiser’s data that were previously used to verify the predictions of SEA. The first test set includes within-boundary predictions for 10 GPCR drugs and cross-boundary predictions for 4 non-GPCR drugs, and the second set includes 32 drugs with 39 off-target associations.
Target ranking results of the 3NN scheme on the novel drug-target association set of Keiser et al.
New (off-) targets
New aminergic GPCR targets
α adrenergic blocker
α adrenergic blocker
β adrenergic agonist
Adrenergic blocker; antihypertensive; antimigraine
5-HT reuptake inhibitor; antidepressant
β adrenergic blocker
Antiemetic; peristaltic stimulant
α adrenergic blocker
5-HT reuptake inhibitor; antidepressant
β adrenergic blocker
New cross-class targets
αe adrenergic receptor (GPCR)
HIV-1 reverse transcriptase (enzyme)
H4 receptor (GPCR)
NMDAR (ion channel)
μ-opioid receptor (GPCR)
NMDAR (ion channel)
D4 receptor (GPCR)
κ-opioid receptor (GPCR)
Alcohol Deterrent Antiamyloidogenic Agent Antipsychotic Treatment of Cocaine Dependency
Prolactin secretion inhibitor
ACE Inhibitor Antihypertensive Cardiotonic
Leukotriene A4 Hydrolase Inhibitor
Adrenergic (β) Blocker
Adrenergic (β) Agonist
MAO A Inhibitor
Antihistaminic Inflammatory Bowel Disease
Farnesyl Protein Transferase Inhibitor
Adrenoceptor (renoceptor Vas
α- adrenergic Blocker
Antiulcerative Enzyme inhibitor Enzyme inhibitor (Histidine decarboxylase)
Xanthine Oxidase Inhibitor
Antisecretory (gastric acid) Antiulcerative
α adrenergic Blocker Antihypertensive
Cinitapride hydrogen tartrate
Antiulcerative Stimulant, Peristaltic
Antiparkinsonian Dopamine Autoreceptor Agonist Prolactin Secretion Inhibitor
Adrenergic Agents Adrenergic Uptake Inhibitors Central Nervous System Stimulants Dopamine Agents Dopamine Uptake Inhibitors Sympathomimetics
Antiparkinsonian, Dopamine Agonist
Squalene Epoxidase Inhibitor
α adrenergic Blocker Antihypertensive
5. hERG toxicity prediction
Target ranking results of the 3NN scheme on nine drugs with hERG toxicity
3NN ranking (TT)
3NN ranking (hERG)
5-hydroxytryptamine 2A/3/4 receptor
5-hydroxytryptamine 2A/2C/6 receptor
α adrenergic receptor
Histamine H1 receptor
Potassium voltage-gated channel subfamily H member 2
DNA topoisomerase 4 subunit A
DNA gyrase subunit A
DNA topoisomerase 2α
D2 dopamine receptor
α2 adrenergic receptor
μ-type opioid receptor
Neuronal acetylcholine receptor
1. Reference set preparation
DrugBank provides a list of FDA-approved drug targets, from which the protein sequences of all targets of small-molecule drugs were downloaded. Sequences of protein targets deposited in BindingDB were used to create a local BLAST database with NCBI BLAST.
The sequences downloaded from DrugBank were used to perform a similarity search against the local BLAST database, in order to find drug-target-related proteins in BindingDB and to retrieve their interacting ligands. Using an E-value threshold of 1E-50, we obtained a target mapping between BindingDB sequences and drug target sequences from DrugBank. A BindingDB protein target exhibiting high homology with any of the drug targets was considered a potential drug target.
The ligands were further filtered to eliminate those with weak binding affinity to a specific protein. The threshold for an “active” ligand was set as IC50, Ki, Kd or EC50 < 10 μM, or ΔG < −28.53 kJ/mol.
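The concentration cutoff and the free-energy cutoff describe the same boundary if one assumes the standard relation ΔG = RT ln Kd. The short calculation below is a worked check under that assumption; the temperature of 298.15 K is our assumption, since it is not stated in the text. It shows that Kd = 10 μM corresponds to roughly −28.5 kJ/mol, which matches the magnitude of the 28.53 kJ/mol cutoff with a negative sign, so more negative values indicate tighter binding.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)
TEMPERATURE_K = 298.15           # assumed temperature; not given in the text


def delta_g_from_kd(kd_molar):
    """Binding free energy in kJ/mol from a dissociation constant in mol/L."""
    return R_KJ_PER_MOL_K * TEMPERATURE_K * math.log(kd_molar)


# 10 uM expressed in mol/L
dg_at_cutoff = delta_g_from_kd(10e-6)
print(round(dg_at_cutoff, 1))
```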
The protein targets retrieved above were redundant (i.e. identical proteins appeared under different names), and some of them were highly homologous to each other (e.g. mutants or proteins from different source organisms). To address this issue, we combined the proteins showing high sequence similarity by another round of BLAST searches, with a more stringent E-value threshold of 1E-120. All the active ligands of a “combined” protein target were pooled together. The resulting database contained 725 targets in all.
To ensure that every target has a sufficient number of ligand representatives, we removed those targets with 10 or fewer active ligands. In the end, our curated database covers 533 targets with 179,807 active ligands in total. Approved drugs are used as an independent test set for additional validation.
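The two filtering steps described above (an activity cutoff per ligand, then a minimum ligand count per target) can be sketched as follows. This is only an illustration under stated assumptions, not the in-house scripts used in the study: the record layout `(target_id, ligand_id, affinity_nM)`, the function name `build_reference_library`, and the toy data are all hypothetical, and only concentration-type measurements are handled.

```python
from collections import defaultdict

ACTIVITY_CUTOFF_NM = 10_000  # 10 uM expressed in nM
MIN_LIGANDS = 10             # targets with this many or fewer actives are removed


def build_reference_library(records):
    """Group active ligands by target and drop sparsely populated targets.

    records: iterable of (target_id, ligand_id, affinity_nM) tuples
    (a hypothetical layout chosen for this sketch).
    """
    ligand_sets = defaultdict(set)
    for target_id, ligand_id, affinity_nm in records:
        if affinity_nm < ACTIVITY_CUTOFF_NM:
            # A set keeps each ligand unique within its target.
            ligand_sets[target_id].add(ligand_id)
    return {t: ligs for t, ligs in ligand_sets.items() if len(ligs) > MIN_LIGANDS}


# Toy data: "T1" has 11 actives plus one weak binder; "T2" has only 2 actives.
records = [("T1", f"L{i}", 50.0) for i in range(11)]
records += [("T1", "L_weak", 25_000.0), ("T2", "L0", 5.0), ("T2", "L1", 8.0)]
library = build_reference_library(records)
print({target: len(ligands) for target, ligands in library.items()})
```

In this toy run only "T1" survives, with its weak binder excluded from the ligand set.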
The resulting chemical reference library is organized by drug-relevant targets (DRTs), each of which is represented by a set of corresponding active ligands. In our reference library, the ligand set of each target contains only unique ligands. All the data preparation procedures are performed with in-house Python scripts. The reference library is designed so that it can be updated by adding new target-ligand interaction data.
2. Validation sets preparation
The following two datasets were used to test the target prediction performance of the different approaches: approved drugs from the Therapeutic Target Database (TTD), and approved drugs from DrugBank 3.0. These datasets contain drugs or drug-like compounds and their protein target sequences. For each set, the small molecules already present in the reference library were first removed, and the sequences were mapped onto DRTs by similarity searching against the local BLAST database mentioned above.
With the rapid advancement of high-throughput screening technology, the sheer amount of bioassay data is so large and increasing so fast that many traditional frameworks encounter difficulties in launching a large campaign of target fishing. More efficient approaches in the context of ‘big data’ are needed for this challenging task. In this study, we exploited a simple scheme using 2D fingerprint similarity ranking with a DF strategy to predict drug-relevant targets, based on a reference library containing 533 targets with 179,807 active ligands. This scheme exhibits good performance in predicting both therapeutic and non-therapeutic targets for the approved drugs from DrugBank and TTD. It can also reproduce 62 out of 65 new drug-target associations identified by SEA, and it successfully predicts both on-target and off-target interactions for 9 drugs withdrawn due to hERG toxicity. Encouraged by these results, we expect that the proposed scheme will enable large-scale target fishing, which is useful both for systematically identifying new uses of old drugs and for exploring the molecular basis of their adverse events.
1. Similarity fusion for target fishing
KNN score (KSj) is the average similarity of K most similar ligands of the target j to the query;
Max score (MSj) is the special case of KNN in which K equals 1, so that only the ligand of target j most similar to the query is considered;
Centroid score (CSj) is the average similarity of Nj ligands of the target j to the query.
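The three fusion scores defined above can be sketched in a few lines, assuming the similarities between the query and every ligand of a target have already been computed. The function names, the toy similarity values, and the choice to average over all available ligands when a target has fewer than K are assumptions of this sketch rather than details taken from the study; every ligand set is assumed to be non-empty.

```python
def knn_score(similarities, k):
    """KS_j: mean of the k highest query-ligand similarities for one target."""
    top = sorted(similarities, reverse=True)[:k]
    return sum(top) / len(top)  # assumes at least one ligand per target


def max_score(similarities):
    """MS_j: the KNN score with K = 1."""
    return knn_score(similarities, 1)


def centroid_score(similarities):
    """CS_j: mean similarity over all N_j ligands of the target."""
    return sum(similarities) / len(similarities)


def rank_targets(sims_by_target, k=3):
    """Return target names sorted by decreasing KNN fusion score."""
    scores = {t: knn_score(s, k) for t, s in sims_by_target.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical similarities of one query to the ligands of two targets.
sims_by_target = {
    "target_A": [0.9, 0.2, 0.1, 0.1],  # one close analogue, otherwise dissimilar
    "target_B": [0.6, 0.6, 0.6, 0.5],  # several moderately similar ligands
}
print(rank_targets(sims_by_target, k=1))  # Max fusion
print(rank_targets(sims_by_target, k=3))  # 3NN fusion
```

In this toy case Max fusion ranks target_A first because of its single close analogue, whereas 3NN ranks target_B first, which illustrates how the choice of K changes the sensitivity to isolated high similarities.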
2. Evaluation metrics
In this study, PRn (eq.1) is the fraction of positive predictions that are “true” (experimentally verified targets), i.e. TPn/n, where TPn is the number of true positive predictions among the top ranked n targets; REn (eq.2) is the fraction of the m “true” targets that are recognized (predicted as positive), i.e. TPn/m. Both PRn and REn therefore measure a model's ability to identify true targets. Fn (eq.3) is the harmonic mean of PRn and REn, and a higher Fn score means better overall performance in discriminating true targets.
PR’ was introduced by Amini et al. For every correctly predicted target that appears at the i-th position among the top m ranked targets, where m is the number of true targets of the ligand, the precision value at that position, PRi, was calculated. PR’ is given by the averaged precision values PRi from the ranking places 1 to m (eq.4). According to this definition, relevant targets that do not appear among the top m ranked targets receive a precision score of 0. Finally, the values of PRn, REn, Fn and PR’ averaged over all compounds of the validation datasets were reported.
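A minimal sketch of these four metrics for a single query is given below. It follows one reading of the verbal definitions above: PRn = TPn/n, REn = TPn/m, Fn as their harmonic mean, and PR’ as the sum of the precision values at the hit positions within the top m ranks, divided by m. The function names and the toy ranking are hypothetical, and n and m are assumed to be at least 1.

```python
def precision_recall_f(ranked, true_targets, n):
    """Return (PR_n, RE_n, F_n) for the top-n predictions of one query."""
    true_set = set(true_targets)
    tp = sum(1 for target in ranked[:n] if target in true_set)
    pr = tp / n
    re = tp / len(true_set)
    f = 2 * pr * re / (pr + re) if tp else 0.0  # harmonic mean; 0 if no hits
    return pr, re, f


def uninterpolated_precision(ranked, true_targets):
    """PR': precision at each hit within the top m ranks, summed and divided by m."""
    true_set = set(true_targets)
    m = len(true_set)
    hits, total = 0, 0.0
    for i, target in enumerate(ranked[:m], start=1):
        if target in true_set:
            hits += 1
            total += hits / i  # PR_i at a correctly predicted position
    return total / m  # true targets outside the top m contribute 0


# Hypothetical ranking with three true targets: A, B and C.
ranked = ["A", "X", "B", "Y", "C"]
true_targets = ["A", "B", "C"]
pr3, re3, f3 = precision_recall_f(ranked, true_targets, n=3)
pr_prime = uninterpolated_precision(ranked, true_targets)
print(pr3, re3, f3, pr_prime)
```

Here two of the three true targets fall within the top three ranks, so PR3, RE3 and F3 all equal 2/3, while PR’ equals (1 + 2/3)/3 because the third true target is ranked below position m.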
This work was supported by the Hi-TECH Research and Development Program of China (Grant 2012AA020302), the National Science and Technology Major Project “Key New Drug Creation and Manufacturing Program” (Grants 2013ZX09507001 and 2014ZX09507002), and the National Natural Science Foundation of China (Grants 81230076 and 21210003).
- Boran ADW, Iyengar R: Systems approaches to polypharmacology and drug discovery. Curr Opin Drug Discovery Dev. 2010, 13: 297-309.
- Knight ZA, Lin H, Shokat KM: Targeting the cancer kinome through polypharmacology. Nat Rev Cancer. 2010, 10: 130-137. 10.1038/nrc2787.
- Paolini GV, Shapland RHB, van Hoorn WP, Mason JS, Hopkins AL: Global mapping of pharmacological space. Nat Biotechnol. 2006, 24: 805-815. 10.1038/nbt1228.
- Ashburn TT, Thor KB: Drug repositioning: Identifying and developing new uses for existing drugs. Nat Rev Drug Discovery. 2004, 3: 673-683. 10.1038/nrd1468.
- Terrett NK, Bell AS, Brown D, Ellis P: Sildenafil (VIAGRA(TM)), a potent and selective inhibitor of type 5 cGMP phosphodiesterase with utility for the treatment of male erectile dysfunction. Bioorg Med Chem Lett. 1996, 6: 1819-1824. 10.1016/0960-894X(96)00323-X.
- Reisberg B, Doody R, Stoffler A, Schmitt F, Ferris S, Mobius HJ, Grp MS: Memantine in moderate-to-severe Alzheimer’s disease. N Engl J Med. 2003, 348: 1333-1341. 10.1056/NEJMoa013128.
- Bodkin JA, Zornberg GL, Lukas SE, Cole JO: Buprenorphine Treatment of Refractory Depression. J Clin Psychopharmacol. 1995, 15: 49-57. 10.1097/00004714-199502000-00008.
- Eden RJ, Costall B, Domeney AM, Gerrard PA, Harvey CA, Kelly ME, Naylor RJ, Owen DAA, Wright A: Preclinical Pharmacology of Ropinirole (Sk-and-F-101468-a) a Novel Dopamine-D2 Agonist. Pharmacol Biochem Be. 1991, 38: 147-154. 10.1016/0091-3057(91)90603-Y.
- Tompson DJ, Vearer D: Steady-state pharmacokinetic properties of a 24-hour prolonged-release formulation of ropinirole: Results of two randomized studies in patients with Parkinson’s disease. Clin Ther. 2007, 29: 2654-2666. 10.1016/j.clinthera.2007.12.010.
- Davidson MH, Dillon MA, Gordon B, Jones P, Samuels J, Weiss S, Isaacsohn J, Toth P, Burke SK: Colesevelam hydrochloride (Cholestagel) - A new, potent bile acid sequestrant associated with a low incidence of gastrointestinal side effects. Arch Intern Med. 1999, 159: 1893-1900. 10.1001/archinte.159.16.1893.
- Chen X, Ung CY, Chen YZ: Can an in silico drug-target search method be used to probe potential mechanisms of medicinal plant ingredients?. Nat Prod Rep. 2003, 20: 432-444. 10.1039/b303745b.
- Gao ZT, Li HL, Zhang HL, Liu XF, Kang L, Luo XM, Zhu WL, Chen KX, Wang XC, Jiang HL: PDTD: a web-accessible protein database for drug target identification. BMC Bioinf. 2008, 9: 104. 10.1186/1471-2105-9-104.
- Rognan D: Structure-Based Approaches to Target Fishing and Ligand Profiling. Mol Inf. 2010, 29: 176-187. 10.1002/minf.200900081.
- Jeong CH, Bode AM, Pugliese A, Cho YY, Kim HG, Shim JH, Jeon YJ, Li H, Jiang H, Dong Z: [6]-Gingerol suppresses colon cancer growth by targeting leukotriene A4 hydrolase. Cancer Res. 2009, 69: 5584-5591. 10.1158/0008-5472.CAN-09-0491.
- Cai J, Han C, Hu T, Zhang J, Wu D, Wang F, Liu Y, Ding J, Chen K, Yue J: Peptide deformylase is a potential target for anti-Helicobacter pylori drugs: reverse docking, enzymatic assay, and X-ray crystallography validation. Protein Sci. 2006, 15: 2071-2081. 10.1110/ps.062238406.
- Koutsoukas A, Simms B, Kirchmair J, Bond PJ, Whitmore AV, Zimmer S, Young MP, Jenkins JL, Glick M, Glen RC, Bender A: From in silico target prediction to multi-target drug design: Current databases, methods and applications. J Proteomics. 2011, 74: 2554-2574. 10.1016/j.jprot.2011.05.011.
- Luo H, Chen J, Shi L, Mikailov M, Zhu H, Wang K, He L, Yang L: DRAR-CPI: a server for identifying drug repositioning potential and adverse drug reactions via the chemical-protein interactome. Nucleic Acids Res. 2011, 39: W492-W498. 10.1093/nar/gkr299.
- Milletti F, Vulpetti A: Predicting Polypharmacology by Binding Site Similarity: From Kinases to the Protein Universe. J Chem Inf Model. 2010, 50: 1418-1431. 10.1021/ci1001263.
- Wang J, Li ZX, Qiu CX, Wang D, Cui QH: The relationship between rational drug design and drug side effects. Briefings Bioinf. 2012, 13: 377-382. 10.1093/bib/bbr061.
- Jacob L, Vert JP: Protein-ligand interaction prediction: an improved chemogenomics approach. Bioinformatics. 2008, 24: 2149-2156. 10.1093/bioinformatics/btn409.
- Wang F, Liu DX, Wang HY, Luo C, Zheng MY, Liu H, Zhu WL, Luo XM, Zhang J, Jiang HL: Computational Screening for Active Compounds Targeting Protein Sequences: Methodology and Experimental Validation. J Chem Inf Model. 2011, 51: 2821-2828. 10.1021/ci200264h.
- Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK: Relating protein pharmacology by ligand chemistry. Nat Biotechnol. 2007, 25: 197-206. 10.1038/nbt1284.
- Keiser MJ, Setola V, Irwin JJ, Laggner C, Abbas AI, Hufeisen SJ, Jensen NH, Kuijer MB, Matos RC, Tran TB, Whaley R, Glennon RA, Hert J, Thomas KLH, Edwards DD, Shoichet BK, Roth BL: Predicting new molecular targets for known drugs. Nature. 2009, 462: 175-181. 10.1038/nature08506.
- Rush TS, Grant JA, Mosyak L, Nicholls A: A shape-based 3-D scaffold hopping method and its application to a bacterial protein-protein interaction. J Med Chem. 2005, 48: 1489-1495. 10.1021/jm040163o.
- AbdulHameed MDM, Chaudhury S, Singh N, Sun H, Wallqvist A, Tawa GJ: Exploring Polypharmacology Using a ROCS-Based Target Fishing Approach. J Chem Inf Model. 2012, 52: 492-505. 10.1021/ci2003544.
- Liu XF, Ouyang SS, Yu BA, Liu YB, Huang K, Gong JY, Zheng SY, Li ZH, Li HL, Jiang HL: PharmMapper server: a web server for potential drug target identification using pharmacophore mapping approach. Nucleic Acids Res. 2010, 38: W609-W614. 10.1093/nar/gkq300.
- Cheng TJ, Li QL, Wang YL, Bryant SH: Identifying Compound-Target Associations by Combining Bioactivity Profile Similarity Search and Public Databases Mining. J Chem Inf Model. 2011, 51: 2440-2448. 10.1021/ci200192v.
- Ginn CMR, Willett P, Bradshaw J: Combination of molecular similarity measures using data fusion. Perspect Drug Discovery Des. 2000, 20: 1-16. 10.1023/A:1008752200506.
- Whittle M, Gillet VJ, Willett P, Alex A, Loesel J: Enhancing the effectiveness of virtual screening by fusing nearest neighbor lists: A comparison of similarity coefficients. J Chem Inf Comput Sci. 2004, 44: 1840-1848. 10.1021/ci049867x.
- Hert J, Willett P, Wilton DJ: Comparison of fingerprint-based methods for virtual screening using multiple bioactive reference structures. J Chem Inf Comput Sci. 2004, 44: 1177-1185. 10.1021/ci034231b.
- Hert J, Willett P, Wilton DJ, Acklin P, Azzaoui K, Jacoby E, Schuffenhauer A: New methods for ligand-based virtual screening: Use of data fusion and machine learning to enhance the effectiveness of similarity searching. J Chem Inf Model. 2006, 46: 462-470. 10.1021/ci050348j.
- Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, Kruger FA, Light Y, Mak L, McGlinchey S, Nowotka M, Papadatos G, Santos R, Overington JP: The ChEMBL bioactivity database: an update. Nucleic Acids Res. 2014, 42: D1083-D1090. 10.1093/nar/gkt1031.
- Beyer MA, Laney D: The Importance of ‘Big Data’: A Definition. [http://www.gartner.com/doc/2057415/importance-big-data-definition]
- Keiser MJ, Hert J: Off-target networks derived from ligand set similarity. Chemogenomics. Volume 575. Edited by: Jacoby E. 2009, New York: Humana Press, 195-205.
- Powers DM: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation. J of Mach Lear Tech. 2011, 2: 37-63.
- Amini M-R, Truong T-V, Goutte C: A boosting algorithm for learning bipartite ranking functions with partially labeled data. Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2008, USA: ACM press, 99-106.
- Roden DM: Drug therapy: Drug-induced prolongation of the QT interval. N Engl J Med. 2004, 350: 1013-1022. 10.1056/NEJMra032426.
- Sanguinetti MC, Tristani-Firouzi M: hERG potassium channels and cardiac arrhythmia. Nature. 2006, 440: 463-469. 10.1038/nature04710.
- Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP: ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40: D1100-D1107. 10.1093/nar/gkr777.
- Liu TQ, Lin YM, Wen X, Jorissen RN, Gilson MK: BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 2007, 35: D198-D201. 10.1093/nar/gkl999.
- Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, Djoumbou Y, Eisner R, Guo AC, Wishart DS: DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Res. 2011, 39: D1035-D1041. 10.1093/nar/gkq1126.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Zhu F, Shi Z, Qin C, Tao L, Liu X, Xu F, Zhang L, Song Y, Liu XH, Zhang JX, Han BC, Zhang P, Chen YZ: Therapeutic target database update 2012: a resource for facilitating target-oriented drug discovery. Nucleic Acids Res. 2012, 40: D1128-D1136. 10.1093/nar/gkr797.
- Rogers D, Hahn M: Extended-Connectivity Fingerprints. J Chem Inf Model. 2010, 50: 742-754. 10.1021/ci100050t.
- Pipeline Pilot. San Diego: Accelrys Software Inc, CA 92121.
- Flower DR: On the properties of bit string-based measures of chemical similarity. J Chem Inf Comput Sci. 1998, 38: 379-386. 10.1021/ci970437z.
- Willett P, Barnard JM, Downs GM: Chemical similarity searching. J Chem Inf Comput Sci. 1998, 38: 983-996. 10.1021/ci9800211.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.