Analysis of in vitro bioactivity data extracted from drug discovery literature and patents: Ranking 1654 human protein targets by assayed compounds and molecular scaffolds
© Southan et al; licensee Chemistry Central Ltd. 2011
Received: 14 March 2011
Accepted: 13 May 2011
Published: 13 May 2011
Since the classic Hopkins and Groom druggable genome review in 2002, there have been a number of publications updating both the hypothetical and successful human drug target statistics. However, listings of research targets that define the area between these two extremes are sparse because of the challenges of collating published information at the necessary scale. We have addressed this by interrogating databases, populated by expert curation, of bioactivity data extracted from patents and journal papers over the last 30 years.
From a subset of just over 27,000 documents we have extracted a set of compound-to-target relationships for biochemical in vitro binding-type assay data for 1,736 human proteins and 1,654 gene identifiers. These are linked to 1,671,951 compound records derived from 823,179 unique chemical structures. The distribution showed a compounds-per-target average of 964 with a maximum of 42,869 (Factor Xa). The list includes non-targets, failed targets and cross-screening targets. The top-278 most actively pursued targets cover 90% of the compounds. We further investigated target ranking by determining the number of molecular frameworks and scaffolds. These were compared to the compound counts as alternative measures of chemical diversity on a per-target basis.
The compounds-per-protein listing generated in this work (provided as a supplementary file) represents the major proportion of the human drug target landscape defined by published data. We supplemented the simple ranking by the number of compounds assayed with additional rankings by molecular topology. These showed significant differences and provide complementary assessments of chemical tractability.
An important factor in assessing the global progress in drug research is the number of targets for which therapeutic small-molecule modulators have been, are being, or could be, generated. This question was addressed in the landmark publication in 2002 that introduced the "druggable genome" concept.
This total of approximately 3,000 human proteins was arrived at by homologous family extrapolation from the targets of approved drugs at that time. The count of successful targets was updated in 2006 and stood then at 324, of which the subset of human proteins was 207. Despite many publications covering this topic, explicit listings of target identifiers, extrinsic to the data sets from which they were derived, are rare, with the partial exception of a poster that included 185 human targets of approved oral drugs.
Notwithstanding, there are now public databases from which it is possible to browse and extract targets with explicit links to bioactive compounds. DrugBank is one such resource. It has a total of 6,827 drug entries including 1,431 FDA-approved small molecule drugs and 5,212 research compounds linked to 4,477 non-redundant protein sequences. These include primary targets, cross-screening targets, metabolising enzymes and associations inferred from compound name with protein name co-occurrences automatically extracted from the literature. The Therapeutic Targets Database (TTD) contains conceptually similar information to DrugBank but organised into a different data structure. It provides sequence subsets of its total of 1,675 targets divided into 348 approved, 260 clinical trial and 1,067 research targets. The BindingDB resource also includes approved and research targets with a focus on measured small-molecule binding affinities and ligands. It currently includes 5,526 protein targets and 271,419 compounds. The largest public resource of this type is the ChEMBL database with 8,091 targets and 658,075 compounds extracted from medicinal chemistry journal papers (N.B. a subset of ChEMBL data is now incorporated into BindingDB). Three of the databases above, DrugBank, TTD and ChEMBL, have recently been included in a comparative study of compounds and targets.
Databases and Processing
The company GVKBIO has developed a suite of databases over the last 9 years that are now unified under a single query interface, termed GVKBIO Online Structure Activity Relationships (GOSTAR) [9, 10]. The results we present are from two of the six GOSTAR components, the Medicinal Chemistry (MCD) and Target (TGD) Databases. Their combined utility for mining drug research data has already been described [11–14]. In addition, the comparison of compound and target content of these with other bioactivity databases has been reported in publications that included the expansion of coverage between 2006 and 2008 [15, 16].
At GVKBIO the relationships between the five entities of document, assay description, assay result, compound structure and protein target (D-A-R-C-P) are manually abstracted by a team of expert curators and transferred to document-centric relational databases. These contain data predominantly from the research phases of drug discovery but, because this extends back over 30 years, much of the primary data for approved drugs is included. The difference between them is that MCD extracts data from 120 journals selected for their high content of D-A-R-C-P relationships on a per-journal basis. TGD extracts the same relationships from patents covering the "big ten" target classes (kinases, GPCRs, proteases, nuclear hormone receptors, ion-channels, transporters, lipases, phosphatases, oxidoreductases and transferases). The process involves a triage to select a representative of the patent family for extraction. The addition of compounds to the database is limited to exemplified structures linked to quantitative or qualitative assay data. While all structures with quantitative results are extracted, where the activity data is ranged or only qualitative, the number of compounds extracted is capped at 200 or 100 examples, respectively.
Details of these databases are described elsewhere but briefly, structures and related metadata for the GOSTAR database records are stored in an Oracle database. The compound counts are defined by a unique structure identifier based on the Standard InChIKey. Protein information was added using NCBI Entrez Gene as primary source for protein (gene) names and identifiers (EGID). Where documents specified distinct alternative splice forms in assays, the common name used by the authors for that splice form was included with the EGID.
Target classes were assigned according to an internal schema. GVKBIO internally developed tools were also used to generate frameworks, scaffolds, and graph skeletons. The data was mined by running SQL queries against MCD and TGD subsets of the GOSTAR database. Additional filters were species, targets having an Entrez Gene name and assay type. Tables and graphs were generated in Excel.
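The filtering and counting described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than Oracle, and the table name, column names, target labels and identifiers are all hypothetical placeholders chosen for illustration; they are not the GOSTAR schema or real Entrez Gene IDs.

```python
import sqlite3

# Hypothetical, simplified schema: one row per assay result.
# Table and column names are illustrative only, not GOSTAR's.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE assay_result (
           doc_id TEXT, inchikey TEXT, entrez_gene_id INTEGER,
           gene_symbol TEXT, species TEXT, assay_type TEXT)"""
)
rows = [
    # (document, compound key, placeholder gene id, placeholder symbol, species, assay type)
    ("doc1", "KEY-A", 101, "TARGET_A", "Homo sapiens", "B"),
    ("doc1", "KEY-B", 101, "TARGET_A", "Homo sapiens", "B"),
    ("doc2", "KEY-A", 101, "TARGET_A", "Homo sapiens", "B"),  # same compound, second document
    ("doc2", "KEY-A", 102, "TARGET_B", "Homo sapiens", "B"),
    ("doc3", "KEY-C", 102, "TARGET_B", "Rattus norvegicus", "B"),  # excluded: not human
    ("doc4", "KEY-D", None, None, "Homo sapiens", "B"),            # excluded: no gene identifier
    ("doc5", "KEY-E", 101, "TARGET_A", "Homo sapiens", "F"),       # excluded: not a binding-type assay
]
conn.executemany("INSERT INTO assay_result VALUES (?, ?, ?, ?, ?, ?)", rows)

# Apply the three filters (species, gene identifier present, assay type),
# then count unique compounds and documents per target.
query = """
    SELECT gene_symbol,
           COUNT(DISTINCT inchikey) AS n_compounds,
           COUNT(DISTINCT doc_id)   AS n_documents
    FROM assay_result
    WHERE species = 'Homo sapiens'
      AND entrez_gene_id IS NOT NULL
      AND assay_type = 'B'
    GROUP BY entrez_gene_id, gene_symbol
    ORDER BY n_compounds DESC, gene_symbol
"""
ranking = conn.execute(query).fetchall()
for symbol, n_compounds, n_documents in ranking:
    print(symbol, n_compounds, n_documents)
```

Counting with DISTINCT on the structure key is what keeps a compound reported in several documents from inflating the per-target compound total.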
Results and Discussion
Content statistics and stringency triages for the combination of MCD and TGD.
Unique compound structures
Unique compound structures from patents
Unique compound structures from journals
Total quantitative assay results
Quantitative assay results from papers
Quantitative assay results from patents
Type-B assay results
Target names (all species) with type-B assay results
Protein identifiers (all species) with type-B assay results
Human proteins with type-B assay results
Human gene identifiers with type-B assay results
Unique compounds linked to human protein identifiers with type-B assay results
The following aspects can be expanded. The average redundancy (records-per-unique structure) is 1.5 because some compounds, particularly reference reagents and established drugs, have assay data included in many documents. The predominant assay type is termed "type-B" or binding assay because it encompasses the enzyme inhibition and receptor binding assays most commonly reported for compounds tested against molecular targets in vitro and, implicitly, with binding specificity. The last three rows show the stringency used to define the final target listing. The target names in row 12 encompass both defined and undefined molecular targets (e.g. protein complexes or unresolved subfamilies) that are linked to compounds via a type-B assay result. These are further restricted in row 13 to only those molecular targets mapping to a protein identifier (e.g. an Entrez Gene ID or a Swiss-Prot accession). We added a final restriction to human sequences (row 14). We made this simplification choice for two reasons. The first was to exclude the many proteins used as cross-screening targets from mouse, rat and other model organisms. The second reason is that resolving anti-infective molecular target protein IDs also comes up against the problem of orthologous redundancy due to the multiplicity of viral, as well as bacterial sub-types, strains and species.
Counting Distinct Human Protein Targets
Protein Identifier Content for Additional file 1
Distinct protein names
Entrez Gene ID (EGIDs)
Symbol matching HGNC
Splice form names
EGIDs with Splice forms
To maximise the curatorial specificity of mapping compounds to protein sequences, a number of splice form designations are included where these names have been used in assay descriptions (mainly from journal papers in MCD). These cases produced 135 entries for 48 Entrez Gene IDs (EGIDs). While, in general, only small numbers of compounds are linked to these non-canonical protein sequences (i.e. alternative splice forms of the UniProt or RefSeq sequences corresponding to the EGID), these are important to capture for pharmacological differences. The human EGID total in Additional file 1 is thus 1,654.
Ranking of top-50 targets by numbers of compounds and documents.
Binned distribution of compounds-per-target.
Targets above bin
1051 (90% total)
Inspection of our results indicated, not unexpectedly, a correlation between the number of compounds and number of documents. However, this was a very broad distribution because the extraction averages (given in Table 1) of 14 compounds-per-paper and 44 compounds-per-patent varied by at least one order of magnitude for the former and two orders of magnitude for the latter. In the following section target proteins will thus be referred to by their rank on the basis of compounds. Those within the top-50 are listed in Table 2 while any below these in the ranking are listed in Additional file 1. The triage we have used is stringent in that it maps 28% of the compounds and 22% of the documents in Table 1. Consequently, it represents target-to-compound-to-assay mappings indicative of activity modulation of a single defined human protein target. Complex targets that cannot be resolved to a single EGID (e.g. the 20S proteasome) are not included.
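A coverage statement such as "the top-278 targets cover 90% of the compounds" is a cumulative sum over the ranked list. The sketch below shows that calculation on toy counts; the function name and the numbers are illustrative assumptions and do not reproduce the actual distribution.

```python
def targets_covering(counts, fraction=0.9):
    """Smallest number of top-ranked targets whose summed counts reach
    `fraction` of the total. `counts` maps target -> compound count."""
    ranked = sorted(counts.values(), reverse=True)
    threshold = fraction * sum(ranked)
    running = 0
    for k, n in enumerate(ranked, start=1):
        running += n
        if running >= threshold:
            return k
    return len(ranked)


# Toy compounds-per-target counts (hypothetical values).
toy_counts = {"T1": 500, "T2": 300, "T3": 150, "T4": 30, "T5": 15, "T6": 5}
n_top = targets_covering(toy_counts, fraction=0.9)
print(n_top)
```

Because a compound assayed against several targets is counted once per target, the total here is a sum of compound-to-target links rather than of unique structures.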
Content of bona fide Drug Targets
Detailed elaboration of what constitutes a drug target is outside the scope of this work but this has been reviewed. We, as do most descriptions for sources of this type, use the term "target" broadly to encompass any compound-to-protein mapping in our large dataset. We consider the target figures and divisions given by TTD to be a good approximation (they include a proportion of authenticated one-to-many mappings) to a set of bona fide primary targets (i.e. where the interaction in vitro is mechanistically causative for the therapeutic effect in vivo). It should be noted that, without inspection of the individual documents or "prior knowledge", it is difficult to discriminate within database records per se between a bona fide drug target, a protein assay included for the purpose of discerning compound selectivity, off-target effects or modulating multiple targets with the same compound (i.e. polypharmacology). This classification problem is encountered for any large-scale collation of compound-to-protein mappings. It cannot be discerned clearly enough to be specified in the TCD database records because, while journal authors will typically explain the context and objectives of multiple assays, patent applicants often do not.
Nevertheless, it is clear from Table 2 that many of the top-50 proteins are not (yet) successful targets of approved drugs. A formal test was applied by determining the gene symbol intersect between Table 2 and the 185 targets of approved oral drugs from 2006. Despite there being some new targets for post-2006 approved drugs the result was only 23 in common, indicating that a high compound ranking per se is not necessarily a predictor of successful approval. The targets-in-common across the entire list were 160. Inspection of the 25 targets not matched indicated that, in most cases, the primary literature either included assay data from non-human proteins (e.g. mouse or rat) or that the cell-based receptor pharmacology assays were not classed as "type B". One interesting exception is what could be classified as an orphan target, tyrosine-3-hydroxylase, TH [Swiss-Prot P07101]. While a drug, α-methyl tyrosine (CID 441350), was approved for it many decades ago to treat pheochromocytoma, this is now rarely used because of side effects. Consequently, this protein identifier has not been linked to new research compounds within this set of extracted journal papers and patents.
Cross-screening and Para-targets
The difficulty of discriminating primary targets from cross-screening activities is illustrated at the top of Table 2 for factor X, F10 [Swiss-Prot P00742] and thrombin, F2 [Swiss-Prot P00734]. They are not only the individual primary targets for the development of therapeutic protease inhibitors but also, because they are related as paralogues with high sequence similarity and similar biochemical functions, they are typically chosen as cross-screening targets for each other. They can thus be termed "para-targets". We confirmed the extent of cross-screening by determining that there were 13,504 compounds-in-common and 357 documents-in-common (i.e. containing both thrombin and factor X inhibition data). This has the effect of pushing each of them higher in the compounds-per-target ranking. The second ranked para-target pair in Table 2 is the cannabinoid receptors 1, CNR1 [Swiss-Prot P21554] and 2, CNR2 [Swiss-Prot P34972] ranked at positions 2 and 10 respectively. These have 11,818 compounds-in-common and 342 documents-in-common. However, there is a difference for this pair in that antagonists have been predominantly pursued for CNR1 but agonists for CNR2. In addition, CNR1 provides an example of screening data derived from a specific splice variant with a unique pharmacological profile, designated as cannabinoid receptor 1B [Swiss-Prot P21554-3] ranked at 1216. Other para-target pairs illustrate different aspects. For the beta amyloid cleaving enzymes BACE1 [Swiss-Prot P56817] and BACE2 [Swiss-Prot Q9Y5Z0], clearly the former, ranked at 52, is the primary target but there is also some cross-screening for BACE2 ranked at 338. Another series of paralogues in the table, in order of compound ranking, are Cathepsin S, CTSS [Swiss-Prot P25774], Cathepsin K, CTSK [Swiss-Prot P43235], Cathepsin L, CTSL1 [Swiss-Prot P07711] and Cathepsin G, CTSG [Swiss-Prot P08311].
These are proteases (cathepsins S, K and L are cysteine proteases, whereas cathepsin G is a serine protease) being explored for different diseases but are, as one might expect, extensively cross-screened for selectivity.
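The compounds-in-common figures quoted for para-target pairs are set intersections over per-target compound identifiers. A minimal sketch follows; the compound keys and set contents are toy values (only the gene symbols come from the text), and the function name is hypothetical.

```python
from itertools import combinations


def shared_compounds(compounds_by_target):
    """Return (target_a, target_b, n_shared) for every pair of targets that
    share at least one compound, largest overlap first."""
    pairs = []
    for a, b in combinations(sorted(compounds_by_target), 2):
        n_shared = len(compounds_by_target[a] & compounds_by_target[b])
        if n_shared:
            pairs.append((a, b, n_shared))
    return sorted(pairs, key=lambda p: -p[2])


# Toy data: each target maps to the set of unique structure keys assayed against it.
toy = {
    "F10": {"c1", "c2", "c3", "c4"},
    "F2": {"c2", "c3", "c4", "c5"},
    "CNR1": {"c6", "c7"},
    "CNR2": {"c7", "c8"},
}
overlaps = shared_compounds(toy)
print(overlaps)
```

The same intersection applied to document identifiers instead of compound keys would give the documents-in-common counts.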
The first anti-target (i.e. cross-screening for potential liabilities in development) in the list, ranked at 83, is the hERG Kv11.1 potassium channel, KCNH2 [Swiss-Prot Q12809]. This is unsurprising considering the importance of checking for hERG inhibition. Another anti-target is the drug efflux pump, ATP-binding cassette, sub-family B (MDR/TAP) member 1, ABCB1 [Swiss-Prot P08183] ranked at 313. However, a recent analysis suggests that, while an anti-target for anticancer agents, it can also be classified as a drug target for non-sedating antihistamines.
The first non-target (i.e. without an established therapeutic context) is Trypsin, PRSS1 [Swiss-Prot Q3SY19], ranked at 114, because of its use as a mechanistic exemplar for cross-screening serine protease inhibitors. A second non-target, ranked at 261, is Albumin, ALB [Swiss-Prot P02768]. This is due to the routine testing of development compounds in albumin binding assays. Strictly speaking, this protein has no activity modulation but the compounds are nonetheless "mapped" in the binding sense. Slightly below this, at rank 295, is the amyloid beta A4 precursor protein, APP [Swiss-Prot P05067]. As an intact protein it is a non-target but inspection of the documents reveals two distinct strategies for compound testing. The first is the use of assays that measure down-regulation of APP production in cell lines. While this is clearly a therapeutic option to reduce amyloid peptide deposits, there is no data to suggest that the active compounds are actually binding APP. The second document set specifies beta amyloid aggregation antagonists (i.e. the peptide could be considered the target). Optimisation of these compounds would have the same therapeutic objective but would show different SAR. While some type of mechanistic splitting terminology for this target classification problem could be considered, it is important to note that the use of the APP identifier has at least facilitated data capture.
Target names can be recognised in the list where compounds in Phase III trials have been publicly declared either to have safety concerns or to lack efficacy. An example of the former, the cannabinoid receptor 1, CNR1 [Swiss-Prot P21554] is ranked second but the clinical trial results for rimonabant (CID 104850) precluded approval because of an increased risk of depression and suicide. During the initial drafting of this manuscript we selected the cholesterol ester transfer protein, CETP [Swiss-Prot P11597], ranked at 263 as a failed target example because the progression of torcetrapib (CID 159325) was halted. However, within months, there was a more successful phase III outcome for anacetrapib (CID 11556427) targeting the same protein. Thus, the extent to which late-stage failures constitute de-validation remains an open question, given not only that some of those targets can still "make it" but also that efficacy in a pharmacogenetically stratified cohort or repurposing for an alternative indication might still be achievable.
Nevertheless, the ability to flag likely de-validation in the listing we have produced would be valuable. However, the capture of historical data has the limitation that targets can achieve a high ranking if many compounds have been generated during validation and proof-of-concept studies even where these eventually fail. In addition, negative data produced during the research phase is less likely to be published. Our data can be analysed on a per-year basis, so the observation of a sustained decline in compounds (i.e. fewer publications on that target) can suggest that validation has stalled (data not shown).
Tractability Assessment by Molecular Frameworks Analysis
The upper part of our compounds-per-target distribution (Table 2 and Additional file 1) provides a de facto chemical tractability ranking. The term is used here as a measure of the probability that a useful level of potency for chemical modulation of the therapeutically relevant biochemical activity of a protein can be readily achieved in vitro. While this is likely to be related to the HTS primary hit-rate, it must be remembered that a high proportion of the compounds in MCD and TGD have gone through some hit-to-lead optimisation. We thus choose to differentiate, on a target basis, between chemical tractability and druggability. We consider the latter to be the likelihood of developing compounds with appropriate in vivo bioavailability, efficacy and safety profiles. These two characteristics are usually related because high chemical tractability facilitates the generation of more compound series in vitro which, in turn, provide more optimisation options in vivo. The main caveat with ranking targets just by compound numbers (as in Table 3) is that, in order to be useful, a tractability metric needs to factor-in the chemical diversity of the compound set. For example, targets mapped to large numbers of highly similar analogues might actually be less tractable than those with smaller absolute compound numbers but covering a broader range of chemotypes.
We have consequently exploited the compound listing to produce a detailed assessment of chemical diversity by comparing molecular frameworks and scaffolds on a per-target basis. These are well-developed concepts in medicinal chemistry and there are a number of ways in which chemical structures can be abstracted. An approach, initially described by Bemis and Murcko, considers such frameworks as a collection of ring systems connected by linkers, after removing side chains. A more detailed hierarchy was used by Xu and Johnson to define Molecular Equivalence Indices (MEQIs) as tools for molecular similarity measures. These approaches have been used for classifying and visualising compound collections [33, 34], scaffold-hopping, comparing small sets of bioactive molecules and large vendor libraries, target selectivity and to differentiate between drugs, clinical candidates and bioactive molecules.
Molecular Framework 1 (MF1): This is generated from the normalised molecular structure by removing all terminal side chains. Exocyclic double bonds (atoms connected to ring systems through multiple bonds) and double bonds directly attached to the linker are kept.
Molecular Framework 2 (MF2): This is derived from MF1 by removing exocyclic double bonds and double bonds directly attached to the linker.
Carbon Scaffold (CS): This is derived from MF2 by ignoring all atom types other than Carbon.
Atom Type Scaffold (ATS): Also derived from MF2 but ignoring bond types.
Graph Scaffold (GS): Also derived from MF2 but ignoring bond types or atom types.
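The five abstraction levels defined above can be sketched on a toy molecular graph. This is not the GVKBIO in-house tooling, which is not described in detail; it is a simplified pure-Python illustration that assumes a molecule is given as a dictionary of atom elements and a dictionary of bond orders with implicit hydrogens, and that side chains can be found by repeatedly stripping terminal atoms. All function names are hypothetical.

```python
from collections import defaultdict


def ring_and_linker_atoms(atoms, bonds):
    """Repeatedly strip terminal atoms; what survives is rings plus linkers."""
    keep = set(atoms)
    while True:
        degree = defaultdict(int)
        for a, b in bonds:
            if a in keep and b in keep:
                degree[a] += 1
                degree[b] += 1
        terminal = {x for x in keep if degree[x] <= 1}
        if not terminal:
            return keep
        keep -= terminal


def subgraph(atoms, bonds, keep):
    """Restrict the atom and bond dictionaries to the atoms in `keep`."""
    return ({i: atoms[i] for i in keep},
            {(a, b): o for (a, b), o in bonds.items() if a in keep and b in keep})


def frameworks(atoms, bonds):
    core = ring_and_linker_atoms(atoms, bonds)
    # MF1 additionally keeps atoms attached to the core by a multiple bond.
    exo = set()
    for (a, b), order in bonds.items():
        if order >= 2:
            if a in core and b not in core:
                exo.add(b)
            if b in core and a not in core:
                exo.add(a)
    mf1 = subgraph(atoms, bonds, core | exo)
    mf2_atoms, mf2_bonds = subgraph(atoms, bonds, core)
    cs = ({i: "C" for i in mf2_atoms}, dict(mf2_bonds))             # atom types ignored
    ats = (dict(mf2_atoms), {k: 1 for k in mf2_bonds})              # bond types ignored
    gs = ({i: "*" for i in mf2_atoms}, {k: 1 for k in mf2_bonds})   # both ignored
    return {"MF1": mf1, "MF2": (mf2_atoms, mf2_bonds), "CS": cs, "ATS": ats, "GS": gs}


# Toy N-ethyl lactam with one ring double bond (hydrogens implicit).
atoms = {0: "C", 1: "N", 2: "C", 3: "C", 4: "C", 5: "C", 6: "O", 7: "C", 8: "C"}
bonds = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 5): 2, (0, 5): 1,  # ring
         (0, 6): 2,                                                         # exocyclic C=O
         (1, 7): 1, (7, 8): 1}                                              # ethyl side chain
fw = frameworks(atoms, bonds)
for name, (fw_atoms, fw_bonds) in fw.items():
    print(name, sorted(fw_atoms.items()), sorted(fw_bonds.items()))
```

In this example the ethyl side chain is removed at every level, the carbonyl oxygen survives only in MF1, and the ring nitrogen and the ring double bond are progressively generalised in CS, ATS and GS. Counting distinct frameworks across a compound set would additionally need a canonical representation of each abstracted graph, which in practice is usually delegated to a cheminformatics toolkit.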
Top-20 target rankings by compound count and molecular frameworks.
We can see that the metalloprotease MMP1 drops from its original compound ranking at 11 down to 19 when ranked by MF2. The cathepsin CTSS moves in the opposite direction from 28 in the original ranking up to 7 by MF2. In the GS ranking we see the elastase ELNA rising from 49 to 20 but the kinase MAPK14 dropping from 4 to 14. Thus, for an individual target the tractability depends significantly on the molecular framework level used for ranking.
More compounds with fewer MF2 scaffolds indicate lower tractability (e.g. an MF2: compound ratio of 0.13 for ESR2 from a total of 6,695 compounds). A larger ratio indicates higher tractability (e.g. 0.36 for HDAC1 from a total of 6,124 compounds). We suggest this complements the ranking by compounds alone and, in this case, clearly differentiates the relative rankings of 67 for ESR2 and 73 for HDAC1.
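The MF2:compound ratio is the number of distinct MF2 frameworks divided by the number of distinct compounds for a target. Back-calculating from the rounded ratios quoted above gives roughly 870 frameworks for ESR2 and roughly 2,200 for HDAC1; these are approximations, not reported values. The sketch below computes the ratio from toy records, where the target labels, compound identifiers and scaffold keys are hypothetical.

```python
def scaffold_ratio(records):
    """records: iterable of (compound_id, scaffold_key) pairs for one target.
    Returns unique scaffolds divided by unique compounds."""
    records = list(records)
    compounds = {c for c, _ in records}
    scaffolds = {s for _, s in records}
    return len(scaffolds) / len(compounds) if compounds else 0.0


# Toy data: two targets with the same compound count but different diversity.
toy = {
    "TARGET_X": [("c1", "s1"), ("c2", "s1"), ("c3", "s1"), ("c4", "s2")],
    "TARGET_Y": [("c5", "s3"), ("c6", "s4"), ("c7", "s5"), ("c8", "s5")],
}
ratios = {target: scaffold_ratio(recs) for target, recs in toy.items()}
by_ratio = sorted(ratios, key=ratios.get, reverse=True)
print(ratios, by_ratio)
```

Both toy targets would tie in a compounds-only ranking, while the ratio separates them, which is the complementary information the text describes.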
The molecular scaffold results can be conceived as collapsing the ensemble of structures mapped to a target in progressive stages of abstraction. Thus, moving from MF2 to GS we see a reduction as more compounds collapse into the latter. The target trends in Figure 3 are different for MF2 and GS. In addition, the spiked shape of the abstractions shows these can be highly target-specific. As an example of utility, the visualisation of the chemotype landscape for targets with very large compound sets (e.g. over 10,000) is much easier when the GS ring-type abstractions can be displayed and browsed.
The utility of using public data for examining tractability before embarking on a drug discovery project directed against targets, and the correlation with ligand-based experimental assessments, has recently been pointed out.
We have triaged a commercial database to provide human target protein identifiers ranked by the numbers of compounds linked to them via direct biochemical assay data and the numbers of documents from which these associations were extracted. As far as we are aware, this is the largest published listing of this type and presents a detailed assessment of the major part of the human molecular target landscape that has been, or is, under active investigation. The unique scale of this listing is exemplified by comparing the equivalent compound-to-target count for F10 in ChEMBL of 5,871 against 42,869 in this work. This is because the process includes the extraction of compounds and data not only from journal articles but also from patents.
Nevertheless, there are limitations (beyond our triage choices) that preclude this being a complete capture of the available data. The first is that in the PubChem BioAssay database, while the direct assay methods may have been published as documents, the compound structures, protein identifiers and result sets are only instantiated in silico [42, 43]. The second limitation is the necessity to cap the number of examples extracted from a patent. The third is that patent data extraction is currently limited to the "big ten" target classes and English language applications (but efforts are underway at GVKBIO to expand this). The fourth is journal selection as opposed to all journals. Whilst these pragmatic constraints may bias the extractions, we propose that, in SAR terms, they are selective for higher quality data.
Our complete set of results includes many proteins that would not be considered bona fide drug target candidates, not only for the reasons already pointed out in the review of the list, but also by being in the tail of the compound distribution. However, the inclusion of even the singletons (one compound from one publication) is useful, not only because they have been authenticated by expert extraction but also because both the target and the compound may have a wider set of relationships using different species and/or assay type restrictions. Imposing any cut-off for "target likelihood" is clearly arbitrary but taking a lower limit of 20 compounds-per-target still covers just over 1000 proteins. This brings it into congruence with the data-supported target count of 836 human proteins for which moderately potent small-molecule chemical starting points had previously been reported.
Our breakdown of the compound sets into molecular scaffolds provides a useful measure of target-specific chemical tractability. Nevertheless, we can point out factors that may be skewing the ranking upwards. The first is the cross-screening effect already mentioned where many compounds mapped to a target are not being optimised for that target. A second effect is that resources assigned to target projects are determined by factors such as market potential, competitive positioning and unmet clinical need. This skews the distribution away from an objectively neutral ranking of tractability per se towards those targets the research community is collectively "working hardest" on. This intense focus also produces patent thickets (in the sense that many of the synthetically feasible chemotypes and analogues that can bind to a particular active site have already been claimed) that will also drive the expansion of chemical diversity for popular targets.
Readers are encouraged to explore their own additional analyses for Additional file 1. These could include generating intersects and differences with, for example, disease associated protein lists or other target protein lists extracted from public databases. In addition, the proteins could be further divided by sub-family and/or the existence of representative 3D structures in PDB. Further large-scale studies of the target landscape analogous to those reported here will be important as drug discovery continues to expand towards new therapeutic areas, new targets, broader cross-screening activities, repurposing and polypharmacology.
Protein designations first used in the text are given as their common name followed by the HGNC approved human gene symbol as used in the result tables. These are followed by the Swiss-Prot ID. Drug names are accompanied by their PubChem compound identifiers (CIDs).
We would like to thank Niklas Blomberg for his encouragement and perceptive reviewing of the manuscript.
- Hopkins AL, Groom CR: The druggable genome. Nat Rev Drug Discov. 2002, 1 (9): 727-730. 10.1038/nrd892.
- Overington JP, Al-Lazikani B, Hopkins AL: How many drug targets are there?. Nat Rev Drug Discov. 2006, 5 (12): 993-996. 10.1038/nrd2199.
- Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M: DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Research. 2008, 36 (suppl 1): D901-906.
- Chen X, Ji ZL, Chen YZ: TTD: Therapeutic Target Database. Nucleic Acids Research. 2002, 30 (1): 412-415. 10.1093/nar/30.1.412.
- Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK: BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Research. 2007, 35 (suppl 1): D198-201.
- ChEMBL. (accessed Sep 10, 2010), [http://www.ebi.ac.uk/chembldb/index.php]
- Muresan S, Sitzmann M, Southan C: Mapping Between Databases of Compounds and Protein Targets. Biocomputing and Drug Discovery. Edited by: Larson RS. 2011.
- GVK BIO. (accessed Sep 10, 2010), [http://www.gvkbio.com]
- About GOSTAR. (accessed Sep 10, 2010), [http://gostardb.com/gostar/doc/HyperlinkDownloadPDF.pdf]
- Jagarlapudi SARP, Kishan KVR: Database Systems for Knowledge-Based Discovery. Chemogenomics: Methods and Applications (Methods in Molecular Biology, vol 575). Edited by: Jacoby E. 2009, New York: Humana Press, 159-172.
- Leeson PD, Springthorpe B: The influence of drug-like concepts on decision-making in medicinal chemistry. Nat Rev Drug Discov. 2007, 6 (11): 881-890. 10.1038/nrd2445.
- Tyrchan C, Blomberg N, Engkvist O, Kogej T, Muresan S: Physicochemical property profiles of marketed drugs, clinical candidates and bioactive compounds. Bioorg Med Chem Lett. 2009, 19 (24): 6943-6947. 10.1016/j.bmcl.2009.10.068.
- Lovering F, Bikker J, Humblet C: Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success. J Med Chem. 2009, 52 (21): 6505-6950. 10.1021/jm9008136.
- Scheiber J, Chen B, Milik M, Sukuru SC, Bender A, Mikhailov D, Whitebread S, Hamon J, Azzaoui K, Urban L, Glick M, Davies JW, Jenkins JL: Gaining Insight into Off-Target Mediated Effects of Drug Candidates with a Comprehensive Systems Chemical Biology Analysis. J Chem Inf Model. 2009, 49 (2): 308-317. 10.1021/ci800344p.
- Southan C, Varkonyi P, Muresan S: Complementarity Between Public and Commercial Databases: New Opportunities in Medicinal Chemistry Informatics. Curr Topics Med Chem. 2007, 7 (15): 1502-1508. 10.2174/156802607782194761.
- Southan C, Varkonyi P, Muresan S: Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds. J Cheminfo. 2009, 1: 10. 10.1186/1758-2946-1-10.
- Devidas S: Curation of inhibitor-target data: process and impact on pathway analysis. Protein Networks and Pathway Analysis (Methods in Molecular Biology vol 563, part 1). Edited by: Nikolsky Y, Bryant J. 2009, New York: Humana Press, 51-62.
- History of InChI. (accessed Sep 10, 2010), [http://www.inchi-trust.org/index.php?q=node/2]
- Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Research. 2011, 39 (suppl 1): D52-D57.
- Harland L, Gaulton A: Drug target central. Expert Opinion on Drug Discovery. 2009, 4 (8): 857-872. 10.1517/17460440903049290.
- Hopkins AL: Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol. 2008, 4 (11): 682-690. 10.1038/nchembio.118.
- Mackie K: Cannabinoid receptors as therapeutic targets. Annu Rev Pharmacol Toxicol. 2006, 46 (1): 101-122. 10.1146/annurev.pharmtox.46.120604.141254.
- Ryberg E, Vu HK, Larsson N, Groblewski T, Hjorth S, Elebring T, Sjögren S, Greasley PJ: Identification and characterisation of a novel splice variant of the human CB1 receptor. FEBS Letters. 2005, 579 (1): 259-264. 10.1016/j.febslet.2004.11.085.
- Yasuda Y, Kaleta J, Brömme D: The role of cathepsins in osteoporosis and arthritis: Rationale for the design of new therapeutics. Adv Drug Deliv Rev. 2005, 57 (7): 973-993. 10.1016/j.addr.2004.12.013.
- Bowlby MR, Peri R, Zhang H, Dunlop J: hERG (KCNH2 or Kv11.1) K Channels: Screening for Cardiac Arrhythmia Risk. Curr Drug Metab. 2008, 9: 965-970. 10.2174/138920008786485083.
- Fabio B, Emanuele C, Gabriele C, Tudor IO: Transporter-Mediated Efflux Influences CNS Side Effects: ABCB1, from Antitarget to Target. Molecular Informatics. 2010, 29 (1-2): 16-26. 10.1002/minf.200900075.
- Christensen R, Kristensen PK, Bartels EM, Bliddal H, Astrup A: Efficacy and safety of the weight-loss drug rimonabant: a meta-analysis of randomised trials. The Lancet. 2007, 370 (9600): 1706-1713. 10.1016/S0140-6736(07)61721-8.
- Joy T, Hegele RA: The end of the road for CETP inhibitors after torcetrapib?. Current Opinion in Cardiology. 2008, 24 (4): 364-371.
- Cannon CP, Shah S, Dansky HM, Davidson M, Brinton EA, Gotto AM, Stepanavage M, Liu SX, Gibbons P, Ashraf TB, Zafarino J, Mitchel Y, Barter P, Determining the Efficacy and Tolerability Investigators: Safety of Anacetrapib in Patients with or at High Risk for Coronary Heart Disease. New England Journal of Medicine. 2010, 363 (25): 2406-2415. 10.1056/NEJMoa1009744.
- Keller TH, Pichota A, Yin Z: A practical view of 'druggability'. Curr Opin Chem Biol. 2006, 10 (4): 357-361. 10.1016/j.cbpa.2006.06.014.View ArticleGoogle Scholar
- Bemis GW, Murcko MA: The Properties of Known Drugs. 1. Molecular Frameworks. Journal of Medicinal Chemistry. 1996, 39 (15): 2887-2893. 10.1021/jm9602928.View ArticleGoogle Scholar
- Xu YJ, Johnson M: Using Molecular Equivalence Numbers To Visually Explore Structural Features that Distinguish Chemical Libraries. Journal of Chemical Information and Computer Sciences. 2002, 42 (4): 912-926.Google Scholar
- Schuffenhauer A, Ertl P, Roggo S, Wetzel S, Koch MA, Waldmann H: The Scaffold Tree - Visualization of the Scaffold Universe by Hierarchical Scaffold Classification. J Chem Inf Model. 2007, 47 (1): 47-58. 10.1021/ci600338x.View ArticleGoogle Scholar
- Wetzel S, Klein K, Renner S, Rauh D, Oprea TI, Mutzel P, Waldmann H: Interactive exploration of chemical space with Scaffold Hunter. Nat Chem Biol. 2009, 5 (8): 581-583.View ArticleGoogle Scholar
- Jenkins JL, Glick M, Davies JW: A 3D similarity method for scaffold hopping from the known drugs or natural ligands to new chemotypes. Journal of Medicinal Chemistry. 2004, 47 (25): 6144-6159. 10.1021/jm049654z.View ArticleGoogle Scholar
- Ye H, Jürgen B: Scaffold Distributions in Bioactive Molecules, Clinical Trials Compounds, and Drugs. ChemMedChem. 2010, 5 (2): 187-190. 10.1002/cmdc.200900419.View ArticleGoogle Scholar
- Monge A, Arrault A, Marot C, Morin-Allory L: Managing, profiling and analyzing a library of 2.6 million compounds gathered from 32 chemical providers. Molecular Diversity. 2006, 10 (3): 389-403. 10.1007/s11030-006-9033-5.View ArticleGoogle Scholar
- Yang Y, Chen H, Nilsson I, Muresan S, Engkvist O: Investigation of the Relationship between Topology and Selectivity for Druglike Molecules. Journal of Medicinal Chemistry. 2010, 53 (21): 7709-7714. 10.1021/jm1008456.View ArticleGoogle Scholar
- Chen H, Yang Y, Engkvist O: Molecular Topology Analysis of the Differences between Drugs, Clinical Candidate Compounds, and Bioactive Molecules. Journal of Chemical Information and Modeling. 2010, 50 (12): 2141-2150. 10.1021/ci1002558.View ArticleGoogle Scholar
- Edfeldt FNB, Breeze AL, Folmer RHA: Fragment screening to predict druggability (ligandability) and lead discovery success. Drug Discovery Today. 2011, 16 (7-8): 284-7. 10.1016/j.drudis.2011.02.002.View ArticleGoogle Scholar
- Campbell SJ, Gaulton A, Marshall J, Bichko D, Martin S, Brouwer C, Harland L: Visualizing the drug target landscape. Drug Discovery Today. 2010, 15 (1-2): 3-15. 10.1016/j.drudis.2009.09.011.View ArticleGoogle Scholar
- Wang Y, Bolton E, Dracheva S, Karapetyan K, Shoemaker BA, Suzek TO, Wang J, Xiao J, Zhang J, Bryant SH: An overview of the PubChem BioAssay resource. Nucleic Acids Research. 2010, 38 (suppl 1): D255-D266.View ArticleGoogle Scholar
- Li Q, Cheng T, Wang Y, Bryant S: PubChem as a public resource for drug discovery. Drug Discovery Today. 2010, 15 (23-24): 1052-7. 10.1016/j.drudis.2010.10.003.View ArticleGoogle Scholar
- Paolini GV, Shapland RHB, van Hoorn WP, Mason JS, Hopkins AL: Global mapping of pharmacological space. Nat Biotechnol. 2006, 24 (7): 805-815. 10.1038/nbt1228.View ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.