Are phylogenetic trees suitable for chemogenomics analyses of bioactivity data sets: the importance of shared active compounds and choosing a suitable data embedding method, as exemplified on Kinases
© Paricharak et al.; licensee Chemistry Central Ltd. 2013
Received: 13 August 2013
Accepted: 26 November 2013
Published: 13 December 2013
‘Phylogenetic trees’ are commonly used to analyze chemogenomics datasets and to relate protein targets to each other on the basis of the (shared) bioactivities of their ligands. However, the suitability of this representation has not yet been properly assessed in this area. We aimed to address this shortcoming in the current work, using a kinase dataset as an example, given the importance of kinases in many diseases and the availability of large-scale datasets for analysis. We analyzed a dataset comprising 157 compounds, each tested at concentrations of 1 μM and 10 μM against a panel of 225 human protein kinases in full-matrix experiments, with the aim of explaining kinase promiscuity and selectivity towards inhibitors. Compounds were described by chemical features, which were in turn used to represent kinases (i.e. each kinase was characterized by the features of its active compounds and by those of its inactive compounds).
Using this representation, we derived a bioactivity-based classification of the kinome that partially resembles previous sequence-based classifications; in particular, kinases from the TK, CDK, CLK and AGC branches cluster together. However, we also showed that, averaged over 6 kinase inhibitors, in approximately 57% of cases the inhibitors are active against kinases that are located far apart in the sequence-based classification (at a relative distance of 0.6 – 0.8 on a scale from 0 to 1), but close to each other in our bioactivity-based tree (distance 0 – 0.4). Despite this improvement over the sequence-based classification, the bioactivity-based classification itself also required further attention: for approximately 80% of all analyzed kinases, neighbors in the bioactivity-based classification also showed high SAR similarity (i.e. a high fraction of shared active compounds and, therefore, interaction with similar inhibitors). For the remaining ~20% of kinases, however, no clear relationship between bioactivity profile similarity and shared active compounds could be established, in agreement with previously reported atypical SAR (e.g. for LCK, FGFR1, AKT2, DAPK1, TGFR1, MK12 and AKT1).
In this work we were hence able to show that (1) it is difficult to establish neighborhood relationships for targets (here kinases) with few shared activities, and (2) phylogenetic tree representations make implicit assumptions (i.e. that neighboring kinases exhibit similar interaction profiles with inhibitors) that are not always suitable for analyses of bioactivity space. While both points have been alluded to implicitly before, to the best of the authors' knowledge this is the first study that explores both comprehensively. Excluding kinases with few shared activities improved the situation greatly (the percentage of kinases for which no neighborhood relationship could be established dropped from 20% to only 4%). We conclude that all of the above findings need to be taken into account when performing chemogenomics analyses, also for other target classes.
Keywords: Kinase inhibitor; Selectivity; Phylogenetics; Chemogenomics; Polypharmacology
Deregulation of kinases as a consequence of mutations, or of over- or underexpression, can cause abnormal changes in cellular signaling and, as a consequence, affect the progression of diseases such as cancer, diabetes and inflammation. In cancer, the kinases involved are often overactivated, generally leading to excessive cell proliferation (and a decreased response to apoptotic signals). Kinases are therefore common targets of anti-cancer drugs, and cancer treatment by kinase inhibition has been one of the most active areas across the pharmaceutical industry in the last decade [1, 7]. An example of a highly successful drug in this area is the kinase inhibitor Gleevec, which was first developed as a selective Bcr-Abl inhibitor for the treatment of chronic myelogenous leukemia (CML) [8–10]. Later, its secondary activity against KIT was also exploited for treating gastrointestinal stromal tumors. Other anti-cancer kinase inhibitors approved in the last decade include Iressa (indicated in case of EGFR upregulation) and Tykerb (indicated in case of ERBB2 overexpression).
Although kinases are involved in various signaling processes, their ATP binding sites are highly conserved and hence structurally very similar [14, 15]. Despite the success of type I kinase inhibitors (i.e. inhibitors that target the ATP binding site) on the market, virtually no kinase inhibitor is truly selective (although this promiscuity may very well be tolerated in the clinic). Whilst the promiscuity of kinase inhibitors is hence not necessarily a problem, and may even be beneficial in some cases (such as the repurposing of Gleevec described above), it is generally important to understand the inhibition profile of kinase inhibitors early in the drug discovery process in order to assess efficacy and off-target effects and to anticipate possible safety problems [17–20].
In an attempt to understand the inhibition profiles of kinase inhibitors, and of drug candidates in general, various chemogenomics methods have been employed in recent years to analyze compound activity against series of targets [21–29]. Many of these studies have indicated that sequence similarity between kinases does not always correlate with kinase inhibitor interaction (i.e. kinases with dissimilar sequences can also bind the same compound). One such example is a study by Karaman et al., in which the bioactivity profiles of 38 kinase inhibitors tested against 317 kinases were analyzed. The authors found that compounds originally described as tyrosine kinase inhibitors indeed bound tyrosine kinases more frequently than serine/threonine kinases; however, many of the serine/threonine kinase inhibitors were also found to interact more frequently with tyrosine kinases than with serine/threonine kinases. Fabian et al. showed that BIRB-796 bound both the serine/threonine kinase p38 and the tyrosine kinase ABL(T315I) rather tightly (at around 40 nM), despite the two kinases sharing only 23% sequence identity. Similarly, the tyrosine kinase inhibitor dasatinib also interacts with serine/threonine kinases, albeit 2.9-fold less frequently at a concentration of 3 μM than with tyrosine kinases (i.e. dasatinib bound 2.9 times as many tyrosine kinases as serine/threonine kinases). However, surprising cases of relative selectivity also exist: while imatinib inhibits LCK, it is selective over the closely related kinase SRC, as shown in the analysis by Fabian et al.
While the above methods did not consider the spatial structure of the ligand binding pocket, structure-based studies have also been performed on kinases, for example by Kuhn et al. and others. The approach by Kuhn et al., which compares 3D binding site descriptors across kinases using Cavbase, showed that kinase binding site properties can be used to predict kinase interactions with inhibitors, such as the cross-reactivity of Gleevec. The model separated serine/threonine from tyrosine kinases, and clustering at the subfamily level was achieved: 12 of the 16 subfamily clusters formed included at most one member from another kinase class. Moreover, the sequence-based similarity of kinases was compared to their Cavbase similarity: in many cases, kinase pairs exhibited a sequence identity below 50% while possessing a Cavbase R1 similarity score of 22 or above (i.e. a high predicted SAR similarity, where SAR similarity specifically refers to similarity in terms of the compounds that the target proteins bind). In the area of predictive modeling, Martin et al. developed Bayesian QSAR models for 92 kinases that were diverse in sequence, covering most of kinase sequence space. Activities of compounds on previously untested kinases could subsequently be predicted as a weighted average of the predictions for the same compounds from neighboring models, allowing compound promiscuity within the kinome to be assessed. Whilst this approach has generally proven useful for predicting bioactivity profiles (an R² value of 0.48 was obtained on validation data from 18 assays), its underlying assumption, that kinases which are similar in protein sequence have similar interaction profiles with inhibitors, was not thoroughly verified in that work (which is one of the foci of the current work).
As an extension of the work mentioned above, and complementary to sequence-based analyses of kinases, Bamborough et al. analyzed kinase bioactivity data on the basis of inhibitor affinity fingerprints and used this approach to rationalize the cross-reactivity of compounds. The kinome tree was reclassified using affinity fingerprints, and the relationship between kinase domain sequence identity and kinase SAR similarity was analyzed. The main finding was that there was no linear relationship between kinase sequence similarity and SAR similarity. However, two distinct groups of kinase-pair relationships were observed: pairs of kinases with below 40–50% sequence identity in their kinase domains exhibited significantly lower SAR similarity than kinase pairs with more than 40–50% sequence identity. A similar analysis was performed on another kinase panel by Davis et al., in which a selectivity score was computed for each kinase by dividing the number of compounds bound with Kd < 3 μM by the total number of compounds screened. The results primarily illustrated kinase promiscuity: 60% of the kinases interacted with 10–40% of the compounds, and most compounds interacted with kinases from multiple groups, in line with the analysis by Bamborough et al.
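The per-kinase selectivity score used by Davis et al. is a simple ratio. The following is a minimal sketch, assuming that Kd values are available in nM for every compound screened against one kinase, with `None` marking compounds without measurable binding; the function name and the toy values are hypothetical.

```python
def selectivity_score(kd_values_nm, threshold_nm=3000.0):
    """Fraction of screened compounds that bind one kinase with Kd below the threshold.

    kd_values_nm: Kd values (nM) for every compound screened against the kinase;
                  None marks a compound without measurable binding (assumption).
    threshold_nm: 3000 nM corresponds to the Kd < 3 uM cut-off in the text.
    """
    if not kd_values_nm:
        raise ValueError("no compounds screened")
    bound = sum(1 for kd in kd_values_nm if kd is not None and kd < threshold_nm)
    return bound / len(kd_values_nm)


# Hypothetical kinase screened against 5 compounds, 2 of which bind below 3 uM.
print(selectivity_score([15.0, 2500.0, 8000.0, None, 40000.0]))  # 0.4
```

A higher score means a more promiscuous kinase, which is the sense in which 60% of kinases interacting with 10–40% of compounds is read above.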
We now outline how the current study extends these previous approaches. Both of the preceding analyses used binary affinity fingerprints, i.e. inhibitors were classified as either ‘active’ or ‘inactive’. Here we extend that approach by also analyzing the chemical features of the inhibitors (i.e. by describing each inhibitor as a collection of chemical features), which considerably enhances the statistical power of the models, since many more features than entire compounds can be matched between kinases. Kinase-pair distances were calculated from the presence and absence of these chemical features in active and inactive inhibitors, thereby adding chemical information to the dataset that allows inhibitor cross-reactivity to be compared more effectively (actual percentage inhibition values were not used for this purpose, because they would not allow the chemical feature information to be incorporated). We analyzed a dataset of 157 kinase inhibitors, selected on the basis of structural diversity, cell permeability, reversibility and potency, and assayed at concentrations of 1 μM and 10 μM against a panel of 225 human protein kinases (this dataset has recently been made publicly available via ChEMBL). The classification of the kinome was revised on the basis of bioactivity data and chemical feature enrichments, with the aim of rationalizing (and predicting) the cross-reactivity of compounds within the kinome. We show that this classification defines kinase neighbors more accurately in terms of bioactivity similarity in response to inhibitors, and is therefore more valuable for predicting kinase inhibitor promiscuity.
In particular, we analyze the influence of data density on chemogenomics analyses (which we found to be very important, to the extent that part of the data effectively needs to be removed), and revisit the assumptions that phylogenetic trees make when representing similarities between proteins according to ligand similarity (the assumption that close neighbors exhibit similar compound interactions turns out to be invalid in some cases).
Results and discussion
We first aimed to understand the nature of our dataset by analyzing its physicochemical property diversity and scaffold diversity. The chemical diversity of the kinase inhibitor library analyzed here, compared with 11,577 protein kinase inhibitors retrieved from ChEMBL with IC50 values below 10 μM, is shown in Additional file 1: Figure S1, together with visualizations of diverse structures. PC1 (principal component 1) and PC2 (principal component 2) together capture 46% of the variance in the dataset and are related to molecular size (PC1) and to charge and lipophilicity (PC2). The Calbiochem library used in the current study covers the left-hand side of the PCA space (representing smaller compounds) rather well, whereas the right-hand side (representing larger compounds) is less well covered. The frequencies of the 10 most prevalent scaffolds among the inhibitors are shown in Additional file 2: Figure S2. Given that over 110 scaffolds were present in a dataset of only 157 inhibitors, we consider this dataset to be highly diverse, which was also one of its original design principles.
Bioactivity-based classification of kinases compared to earlier classifications
In addition, we examined in more detail an earlier evaluation of the interaction maps of 37 known kinase inhibitors against a panel of 317 kinases. For six of these inhibitors (Gefitinib, SU-14813, BIRB-796, Staurosporine, Dasatinib and Sunitinib), the sequence-based kinase distance was plotted against the bioactivity distance for pairs of kinases (shown in Additional file 5: Figure S5). In approximately 57% of cases, kinases inhibited by the same compound are quite distant according to the sequence-based classification (distance 0.6 – 0.8), but rather close according to the bioactivity-based classification employed in the current work (distance 0 – 0.4). Furthermore, the gatekeeper residue of a kinase is known to be important in determining its selectivity towards inhibitors. We therefore also examined whether kinase pairs sharing the same gatekeeper residue were close in bioactivity distance. The results are shown in Additional file 6: Figure S6: kinase pairs with the same gatekeeper residue occur much more often in the lower pairwise bioactivity distance ranges than in the higher ranges. Averaged over the first 5 bins (distance range 0 to 0.50), 23% of kinase pairs share the same gatekeeper residue, whereas only 13% do for the last 5 bins (distance range > 0.50). These results suggest that the bioactivity-based classification may be globally more useful than the previous sequence-based classification for predicting kinase inhibitor cross-reactivity.
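The gatekeeper comparison can be sketched as a binning exercise. This is a minimal illustration, not the authors' code: it assumes pairwise bioactivity distances scaled to [0, 1], ten equal-width bins, and that "averaged over the first 5 bins" means the mean of the per-bin fractions. All names, residues and distances below are hypothetical.

```python
from itertools import combinations


def same_gatekeeper_fraction_by_bin(distance, gatekeeper, n_bins=10):
    """Per-bin fraction of kinase pairs that share a gatekeeper residue.

    distance:   dict frozenset({a, b}) -> bioactivity distance in [0, 1] (assumption)
    gatekeeper: dict kinase -> gatekeeper residue (one-letter code)
    Returns a list with one fraction per bin (None for empty bins).
    """
    same = [0] * n_bins
    total = [0] * n_bins
    for a, b in combinations(sorted(gatekeeper), 2):
        d = distance[frozenset((a, b))]
        i = min(int(d * n_bins), n_bins - 1)  # a distance of exactly 1.0 goes to the last bin
        total[i] += 1
        same[i] += gatekeeper[a] == gatekeeper[b]
    return [s / t if t else None for s, t in zip(same, total)]


def mean_of_bins(fractions):
    """Mean of the non-empty bin fractions."""
    values = [f for f in fractions if f is not None]
    return sum(values) / len(values) if values else None


gatekeeper = {"A": "T", "B": "T", "C": "M", "D": "M"}  # hypothetical residues
distance = {frozenset(p): v for p, v in [
    (("A", "B"), 0.15), (("C", "D"), 0.25), (("A", "C"), 0.75),
    (("A", "D"), 0.85), (("B", "C"), 0.95), (("B", "D"), 0.65)]}
fractions = same_gatekeeper_fraction_by_bin(distance, gatekeeper)
print(mean_of_bins(fractions[:5]), mean_of_bins(fractions[5:]))  # 1.0 0.0
```

Pooling all pairs below and above 0.5 instead of averaging bin fractions would be an equally plausible reading; the two differ when bins contain very different numbers of pairs.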
Relationship between SAR similarity and bioactivity distance
Next, the integrity of the kinase tree was examined in order to put the above observations into a statistically meaningful context. In this part of the analysis, we attempted to determine to what extent the tree was useful for predicting the promiscuity of kinase inhibitors, i.e. whether kinases that share a similar bioactivity profile, and hence are close in bioactivity space, are also represented as close neighbors in the tree (and vice versa). We therefore used the number of shared active compounds between each pair of kinases as a measure of SAR similarity and compared this number to the distance based on the bioactivity profiles (shown in Additional file 7: Figure S7). For each kinase except NEK7, which was not inhibited by any compound, this pairwise comparison was carried out against all 224 kinases in the dataset.
However, a number of factors deserve attention in this type of analysis: the number of kinases exhibiting a negative relationship between the SAC (shared active compounds) score and bioactivity distance changes drastically depending on the normalization method used. When the number of shared active compounds was normalized by the total number of compounds active against the common kinase in the pairwise comparison (i.e. the kinase that stays fixed across the pairwise comparisons: when kinase 1 is compared to itself and all other 224 kinases, kinase 1 is referred to as the ‘common’ kinase), the expected negative relationship between SAC score and bioactivity distance (distant kinases having a relatively low percentage of shared active compounds, and neighboring kinases a relatively high percentage) was observed for only 25% of all kinases (55 out of 224). When the number of shared active compounds was normalized by the total number of compounds active against both the common kinase and the variable kinase (i.e. the other kinase in the comparison), the expected negative relationship was observed for 46% of all kinases (103 out of 224). When normalized by the total number of compounds active against the variable kinase, however, the expected relationship was observed for up to 81% of all kinases (181 out of 224). This can likely be explained as follows: when normalizing by the total number of compounds active against the common kinase, all data points in the series are divided by the same value, so variable kinases with a high total number of active compounds are more likely to have higher SAC scores (more chances of sharing active compounds) than kinases with a lower total number of active compounds.
This bias leads to incorrect comparisons. For example, when the variable kinase has a low total number of active compounds, of which a high percentage is shared with the common kinase, normalizing by the total number of compounds active against the common kinase underestimates SAR similarity because of the low count of shared active compounds (despite the high percentage). Therefore, the percentage obtained by normalizing by the total number of compounds active against the variable kinase was used in subsequent analyses, also because it was consistent with the SAR trend for the highest number of kinases (81%) included in the analysis. Examples of series showing the expected negative relationship between SAC score and bioactivity distance, and of series not showing this relationship (kinase outliers), are shown in Additional file 8: Figure S8.
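The three normalizations compared above can be written down directly. In this sketch each kinase is represented by a set of active compound entries. The text does not spell out the denominator for the "both kinases" option, so it is assumed here to be the union of the two active sets; function and kinase names are hypothetical.

```python
def shared_active_percentage(actives, common, variable, mode="variable"):
    """Percentage of shared active compounds between two kinases.

    actives: dict kinase -> set of active compound entries
    mode:    "common"   -> divide by the actives of the common (fixed) kinase
             "variable" -> divide by the actives of the variable kinase
             "both"     -> divide by compounds active against either kinase
                           (union of the two sets; an assumption)
    Returns None when the denominator is zero (e.g. a kinase with no actives).
    """
    a, b = actives[common], actives[variable]
    shared = len(a & b)
    denominators = {"common": len(a), "variable": len(b), "both": len(a | b)}
    denom = denominators[mode]
    return 100.0 * shared / denom if denom else None


# Toy case: the variable kinase has few actives, all of them shared with the common kinase.
actives = {"K1": {"c1", "c2", "c3", "c4"}, "K2": {"c1", "c2"}}
for mode in ("common", "both", "variable"):
    print(mode, shared_active_percentage(actives, "K1", "K2", mode))
# prints: common 50.0 / both 50.0 / variable 100.0
```

The toy case reproduces the bias described above: only the variable-kinase normalization reports that every active compound of K2 is shared with K1.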
Alternative method of assessing kinase bioactivity distance as described by Bamborough et al.
Comparison to 3D methods
An earlier study by Kuhn et al. described a method for describing and comparing 3D protein binding pockets (Cavbase), which has been used to predict kinase inhibitor interaction profiles. In that study, the sequence-based similarity of kinases was compared to their Cavbase (i.e. three-dimensional binding site) similarity: in many cases, kinase pairs exhibited a sequence identity below 50% while possessing a Cavbase R1 similarity score of 22 or above (i.e. a high similarity in binding site properties and, hence, a high predicted SAR similarity). Among the kinase outliers detected in our analysis, Kuhn et al. also found that LCK, FGFR1, AKT2, DAPK1 and TGFR1 have unexpected binding site similarities (and hence unexpected predicted SAR similarities) with kinases that are distant in sequence, in accordance with our analysis. In addition, MK12 (an outlier from group 2) also showed low Cavbase-predicted SAR similarity to closely related kinases. Similarly, Vieth et al. showed that AKT1 (outlier from group 2) and LCK (outlier from group 1) have unexpected SAR similarity with one or more other kinases (which may represent either high SAR similarity despite low sequence similarity, or low SAR similarity despite high sequence similarity). Our findings show that whilst the majority of kinases (approximately 80% of the kinases in the dataset) exhibit SAR consistent with their neighbors, a subset of kinases does not. Accurately extrapolating compound activities to these ‘atypical’ kinases, as done in the study by Martin et al., therefore poses an even larger challenge than is generally the case in structure-activity modeling.
Limitations of phylogenetic clustering of the kinome
Hence, based on the data used in this study, the kinome tree may not be an entirely accurate representation of the available information when analyzing and representing chemogenomics relationships between receptors. These problems have two roots: cases with too little data, and kinases whose SAR is inconsistent with that of their neighbors. Some kinases show SAR that is similar to that of other kinases, but not to that of kinases nearby in the tree, and therefore cannot be assigned a proper position in a phylogenetic tree. Apart from the problem mentioned earlier, namely that outliers in bioactivity space can be caused by kinases with an insufficient number of shared active compounds (so that care needs to be taken with respect to data density), the assumption that kinase SAR can be projected into a metric space is in our view the second widely used, but not entirely correct, way of representing chemogenomic relationships between targets and their similarities in SAR space. This assumption is made by phylogenetic kinome trees and should be reconsidered when conducting chemogenomics analyses.
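One way to make the tree assumption concrete, offered here as an illustration rather than as part of the authors' analysis, is the four-point condition: a metric can be drawn exactly as an additive tree if and only if, for every quadruple of leaves, the two largest of the three pairwise sums d(i,j)+d(k,l), d(i,k)+d(j,l) and d(i,l)+d(j,k) are equal. The hypothetical helper below reports the largest violation over all quadruples; for bioactivity distances, a large value would indicate that any tree drawn from them must distort some kinase-kinase distances.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a dict-of-dicts distance matrix from {(a, b): distance} entries."""
    d = {}
    for (a, b), dist in pairs.items():
        d.setdefault(a, {})[b] = dist
        d.setdefault(b, {})[a] = dist
    return d


def four_point_violation(d):
    """Largest violation of the four-point condition (0 for a perfect tree metric)."""
    worst = 0.0
    for i, j, k, l in combinations(sorted(d), 4):
        s = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        worst = max(worst, s[2] - s[1])  # the two largest sums should coincide
    return worst


# Distances read off the tree ((A:1, B:2):1, (C:1, D:3)); hypothetical values.
tree_like = symmetric({("A", "B"): 3, ("A", "C"): 3, ("A", "D"): 5,
                       ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 4})
print(four_point_violation(tree_like))  # 0.0
```

Changing the A–B distance from 3 to 6 keeps the matrix a valid metric but yields a violation of 1, i.e. no tree can reproduce it exactly.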
Visualization of kinases using multi-dimensional scaling (MDS)
Analyses for subset excluding kinases with few shared activities
In our revised analysis, kinases showed much better agreement with the expected relationship in SAC score–distance plots: according to the analysis based on fingerprint enrichment profiles, only 7 kinases (VEGFR3, ACK1, LYN, CSK21, CSK22, IGF1R and WNK2) were classified as outliers (see the Methods section for details on how relationships in SAC score–distance plots were assessed), mainly in the tyrosine kinase branch (see Figure 1). Previously, 43 kinases had been classified as outliers, of which only 8 were omitted due to a lack of shared activities with other kinases in the panel. Hence, based on the data analyzed in this study, our revision of the kinome phylogenetic tree shows that omitting kinases with 16 or fewer shared activities with other kinases in the panel leads to a more reliable ligand-based kinome tree, which is more consistent with the observed SAR than previous efforts.
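The exclusion step can be sketched as a simple filter. The text does not define "shared activities" precisely; this sketch assumes it means the number of a kinase's active entries that are also active against at least one other kinase in the panel. The threshold of 16 follows the text, while the function names and toy data are hypothetical.

```python
def shared_activity_count(actives, kinase):
    """Active entries of `kinase` that are also active against some other kinase.

    actives: dict kinase -> set of active (compound, concentration) entries.
    The definition of 'shared activity' used here is an assumption.
    """
    others = set().union(*(s for k, s in actives.items() if k != kinase))
    return len(actives[kinase] & others)


def drop_sparse_kinases(actives, max_excluded=16):
    """Keep only kinases with more than `max_excluded` shared activities."""
    return {k: s for k, s in actives.items()
            if shared_activity_count(actives, k) > max_excluded}


toy = {"A": {1, 2, 3}, "B": {2, 3}, "C": {9}}  # hypothetical active sets
print(sorted(drop_sparse_kinases(toy, max_excluded=1)))  # ['A', 'B']
```

With the default `max_excluded=16`, kinases with 16 or fewer shared activities are removed, matching the criterion stated above.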
Hence, while removing kinases with too few data points improved the reliability of the resulting ligand-based kinome tree, there are still structure-activity relationships that are intrinsically difficult to transfer between kinases (bioactivity similarity is, after all, a local concept rather than a global one), albeit on a much smaller scale than before. The main purpose of this work is to draw particular attention to this fact, which we apply here to the chemogenomics analysis of kinase inhibitors but which is also transferable to other target families. In addition, while it is possible that different assay types (e.g. a binding assay instead of a functional assay) may influence the conclusions drawn here, we believe this to be unlikely, because the dataset consisted only of antagonists and no agonists (for agonists, binding and functional assays may not always correlate well).
Understanding kinase inhibitor promiscuity remains a great challenge in drug discovery. In this work, we introduced a revised classification of 225 kinases based on a complete bioactivity matrix. While kinases from the same group generally tend to fall into the same cluster, we also observed inconsistencies in the SAR-based kinome trees generated: 80% of all kinases exhibit the expected negative relationship between SAR similarity and bioactivity distance, whilst approximately 20% do not. Two groups of kinase outliers were observed. The first group resulted from the analysis based on fingerprint enrichment profiles and showed SAR similarity inconsistent with that of neighboring kinases. The second group resulted from the analysis based on the Tanimoto comparison between the bioactivity fingerprints of kinases, and arose because these kinases have too few shared activities (data points) to be included reliably in the analysis. Excluding kinases with a low number of shared activities across the kinase panel (16 or fewer) resulted in more robust data with less noise (i.e. no kinases with too few data points) and therefore improved on our earlier analysis: only 7 out of 188 kinases (4%) were classified as outliers. Interestingly, these outliers were grouped into 2 clusters in an MDS plot based on bioactivity. Further investigation of their SAR–distance relationships showed that each cluster had a different relationship between SAR similarity and distance, explaining their separation into 2 groups in the MDS plot. Our findings show that while the phylogenetic tree based on bioactivity data provides a good overview of kinases in terms of SAR similarity, it does not explain kinase SAR in all cases (~4% of kinases do not exhibit global SAR similarity to other kinases with which they share local SAR similarity).
Some kinases still need to be repositioned relative to both the sequence-based kinome tree and previous bioactivity-based kinome classifications, as tree-like structures do not always faithfully reflect the distances between kinases in SAR space. Hence, based on the data analyzed here, we are able to show that (1) it is difficult to establish neighborhood relationships for kinases with few shared activities, and (2) phylogenetic tree representations make implicit assumptions about kinase similarity (i.e. that neighboring kinases exhibit similar interaction profiles with inhibitors) that are not always suitable for chemogenomics analyses of bioactivity space. These findings are conceptually transferable to other target families.
The dataset consists of 157 inhibitors (Calbiochem InhibitorSelect™ Protein Kinase inhibitor library, Cat# 539743-1EA) assayed at concentrations of 1 μM and 10 μM against a panel of 225 kinases (publicly available via ChEMBL). Bioactivity values are given as percentage inhibition relative to native kinase activity. Compounds that inhibited kinase activity by 50% or more at the concentration screened were considered active. Moreover, the same compound tested at different concentrations was treated as two distinct entries when generating bioactivity enrichment profiles and assessing shared activities (as elaborated below). Because all inhibitors were assayed at 2 concentrations, we were able to differentiate between, for example, kinases linked by one strong inhibitor and kinases linked by two weak inhibitors. In the former case, the features present in the inhibitor are counted twice (as the strong inhibitor is active at both concentrations), whereas in the latter case the features are present in the active set only once per inhibitor (as the weak inhibitors are active only at the higher concentration), which leads to a different bioactivity profile. Using 2 concentrations, we hence emphasize the importance of features present in strong inhibitors.
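The activity threshold and the two-concentration bookkeeping can be illustrated with a short sketch. Compound names, feature identifiers and inhibition values below are invented; in the actual analysis the feature sets would come from ECFP_4 fingerprints.

```python
from collections import Counter

ACTIVE_THRESHOLD = 50.0  # percent inhibition at the tested concentration


def active_entries(inhibition):
    """Active (compound, concentration) entries for one kinase.

    inhibition: dict (compound, concentration_uM) -> percent inhibition.
    Each concentration is its own entry, so a compound can be active twice.
    """
    return {entry for entry, pct in inhibition.items() if pct >= ACTIVE_THRESHOLD}


def active_feature_counts(entries, features):
    """Count chemical features over active entries (features: compound -> set of IDs)."""
    counts = Counter()
    for compound, _concentration in entries:
        counts.update(features[compound])
    return counts


features = {"S": {"f1"}, "W1": {"f1"}, "W2": {"f2"}}  # hypothetical feature sets
strong = {("S", 1): 80.0, ("S", 10): 95.0}  # one strong inhibitor, active at both concentrations
weak = {("W1", 1): 20.0, ("W1", 10): 60.0, ("W2", 1): 10.0, ("W2", 10): 55.0}
print(active_feature_counts(active_entries(strong), features))  # Counter({'f1': 2})
print(active_feature_counts(active_entries(weak), features))
```

The strong inhibitor contributes its feature twice, whereas each weak inhibitor contributes its features once, which is how the two-concentration design up-weights features of potent compounds.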
The dataset analyzed was generated at Merck Millipore using the KinaseProfiler service. Kinases were diluted in buffers whose composition depended on the kinase assay, consisting of one or more of the following chemicals: MOPS, EDTA, Brij-35, glycerol, NaCl, β-mercaptoethanol, BSA, HEPES, Triton X-100, DTT, Triton Surfactant, TRIS, EGTA, Tween 20, Na-β-glycerophosphate and Na3VO4. In the kinase assays, the kinase was incubated in a buffered solution and the reaction was initiated by adding an MgATP mix. After an incubation period at room temperature (the duration depending on the kinase), reactions were terminated by adding a 3% phosphoric acid solution (other solutions were used for some kinases). For most kinase assays, 10 μL of the reaction mix was spotted onto a P30 filtermat and washed three times for 5 minutes in phosphoric acid (concentration depending on the kinase) and once in methanol, prior to drying and scintillation counting. Further details on the buffer concentrations and on the procedure for each kinase assay can be found in the KinaseProfiler Service Assay Protocols.
Chemical diversity assessment of inhibitors
MOE version 2011.10 was used to wash the structures and to assign partial charges (using the Gasteiger PEOE method) to both the protein kinase inhibitors from ChEMBL (IC50 < 10 μM) and the inhibitors from the Calbiochem InhibitorSelect™ Protein Kinase inhibitor library (Cat# 539743-1EA). Subsequently, principal components of 186 2D molecular descriptors were calculated for all inhibitors.
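MOE's internal PCA settings are not stated here, so the following is only a generic stand-in: a pure-Python principal component analysis of a descriptor table (rows are inhibitors, columns are descriptors), assuming standardized descriptors and extracting the leading components by power iteration with deflation. It reports the fraction of variance explained by each component, the quantity behind the 46% quoted for PC1 and PC2. The toy table is hypothetical.

```python
import math
import random


def standardize(X):
    """Center each descriptor column and scale it to unit variance (assumption)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)) or 1.0
           for j in range(p)]  # constant columns keep a scale of 1
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def covariance(Z):
    n, p = len(Z), len(Z[0])
    return [[sum(r[a] * r[b] for r in Z) / (n - 1) for b in range(p)] for a in range(p)]


def leading_components(C, k=2, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric covariance matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    p = len(C)
    M = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        norm0 = math.sqrt(sum(x * x for x in v))
        v = [x / norm0 for x in v]
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * M[i][j] * v[j] for i in range(p) for j in range(p))
        pairs.append((lam, v))
        # Deflate: remove the found component before searching for the next one.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


def explained_variance_ratio(C, pairs):
    total = sum(C[i][i] for i in range(len(C)))
    return [lam / total for lam, _ in pairs]


X = [[1, 2, 0], [2, 4, 1], [3, 6, 0], [4, 8, 1]]  # toy descriptor table
C = covariance(standardize(X))
pairs = leading_components(C, k=2)
print(explained_variance_ratio(C, pairs))
```

For 186 descriptors and thousands of compounds, a linear algebra library would be the practical choice; the loop-based version only makes each step explicit.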
Generation of bioactivity-based fingerprint enrichment profiles
This resulted in a bioactivity-based fingerprint enrichment profile for each kinase (kinase vector), referred to as the ‘fingerprint enrichment profile’ in the main text. This representation of kinases is somewhat similar to the FragSim similarity measure used by Sutherland et al., in that both measures assess protein similarity by the structures of the proteins' inhibitors, but it differs in two important respects. Firstly, FragSim uses larger fragments of 4 to 17 heavy atoms to describe the inhibitors, whereas our fingerprint enrichment profile uses smaller ECFP_4 features. Secondly, FragSim does not take into account the presence of its fragments in the inactive set of compounds, and therefore does not distinguish between features present only in the active set of inhibitors and features present in both the active and the inactive sets. Our fingerprint enrichment profile does take this into account.
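The exact enrichment formula is not given in this section, so the sketch below is purely hypothetical: it scores each feature by how much more often it occurs in a kinase's active entries than in its inactive ones. It only illustrates the property highlighted above, namely that a feature found in both the active and the inactive set is down-weighted relative to one found only among actives, which FragSim does not do.

```python
def enrichment_profile(active, inactive, features, vocabulary):
    """HYPOTHETICAL per-feature enrichment score for one kinase (not the authors' formula).

    active / inactive: lists of (compound, concentration) entries
    features:   dict compound -> set of feature IDs (e.g. ECFP_4 identifiers)
    vocabulary: ordered list of feature IDs defining the vector positions
    Score = fraction of active entries containing the feature
            minus fraction of inactive entries containing it.
    """
    def fraction(entries, f):
        if not entries:
            return 0.0
        return sum(f in features[c] for c, _ in entries) / len(entries)

    return [fraction(active, f) - fraction(inactive, f) for f in vocabulary]


features = {"a": {"x", "y"}, "b": {"y"}}  # "y" occurs in both an active and an inactive compound
print(enrichment_profile([("a", 10)], [("b", 10)], features, ["x", "y"]))  # [1.0, 0.0]
```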
Generation of distance matrices and kinase inhibitor response-distance relationships
Each kinase was compared pairwise against all other kinases using both of the above measures. The number of shared active compounds was normalized by the total number of active compounds of either the ‘common’ kinase (i.e. the kinase that stays fixed in the pairwise comparisons: when kinase 1 is compared to itself and all other 224 kinases, kinase 1 is referred to as the ‘common’ kinase), the ‘variable’ kinase (i.e. the other kinase in the comparison), or both kinases. The normalized values were converted to percentages and plotted against the distance, resulting in a trend series for every kinase. To better visualize the collection of data points, each series was mean-centered with respect to each axis: the average distance was set to 0.5 and the average percentage to 50%; the mean-centered percentage is referred to as the ‘SAC score’.
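The mean centering described above is a per-axis shift without any rescaling. A minimal sketch, assuming each kinase's series is given as parallel lists of distances and shared-active percentages; the function name and values are hypothetical.

```python
def mean_center_series(distances, percentages, target_distance=0.5, target_percentage=50.0):
    """Shift one kinase's series so that its mean distance is 0.5 and mean percentage 50%.

    Only a translation is applied; the spread of each axis is unchanged.
    The shifted percentages correspond to the 'SAC score'.
    """
    if len(distances) != len(percentages) or not distances:
        raise ValueError("series must be non-empty and of equal length")
    shift_d = target_distance - sum(distances) / len(distances)
    shift_p = target_percentage - sum(percentages) / len(percentages)
    return ([d + shift_d for d in distances], [p + shift_p for p in percentages])


centered_distances, sac_scores = mean_center_series([0.2, 0.4], [10.0, 30.0])
print(sac_scores)  # [40.0, 60.0]
```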
Assessment of sequence-based similarity distance-bioactivity distance plots
The sequence-based kinase distance matrix was calculated with T-Rex from the tree file obtained from the human kinome project. Kinase pairs targeted by each inhibitor were extracted automatically from the supplementary material provided by Karaman et al. and looked up in the sequence-based distance matrix.
Kinase gatekeeper analysis
The kinase gatekeeper residues were determined by performing a multiple sequence alignment of the kinases in MEGA version 5 with default parameters (Protein Weight Matrix: Gonnet, Gap Open penalty: 10, Gap Extension penalty: 0.20 and Gap Distances: 5). Subsequently, the bioactivity distance between kinase pairs was compared with their gatekeeper residues.
Generation of phylogenetic trees
PHYLIP was used to create tree files from the distance matrix using the neighbor-joining method (no outgroup root was specified), and Archaeopteryx and iTOL were used for visualization [56–58]. In addition, the Merck Millipore DART tool was used to visualize the sequence-based kinome tree as defined by Manning et al. Kinases were colored according to the kinase groups defined by the sequence-based tree.
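PHYLIP's implementation is not reproduced here; the following is an independent, textbook-style sketch of the Saitou–Nei neighbor-joining algorithm on a dict-of-dicts distance matrix, returning the edges of an unrooted tree. Internal node names are invented and ties are broken by input order, so results may differ from PHYLIP's on tied inputs; it is intended for illustration only.

```python
def neighbor_joining(d, labels):
    """Unrooted neighbor-joining tree (Saitou-Nei), illustrative sketch.

    d: dict-of-dicts with symmetric pairwise distances between all labels
    Returns a list of (node, child, branch_length) edges; internal nodes are
    named node0, node1, ... (hypothetical naming).
    """
    D = {a: dict(d[a]) for a in labels}  # working copy that grows with new nodes
    nodes = list(labels)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * D[a][b] - r[a] - r[b]  # Q-criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{counter}"
        counter += 1
        la = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        edges += [(u, a, la), (u, b, D[a][b] - la)]
        D[u] = {}
        for c in nodes:
            if c not in (a, b):
                D[u][c] = D[c][u] = 0.5 * (D[a][c] + D[b][c] - D[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, D[a][b]))
    return edges


# Distances generated by the tree ((A:1, B:2):1, (C:1, D:3)); hypothetical values.
pairs = {("A", "B"): 3, ("A", "C"): 3, ("A", "D"): 5,
         ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 4}
d = {x: {} for x in "ABCD"}
for (x, y), v in pairs.items():
    d[x][y] = d[y][x] = v
for edge in neighbor_joining(d, ["A", "B", "C", "D"]):
    print(edge)
```

On this additive input the sketch recovers the generating branch lengths (A:1, B:2, C:1, D:3, internal edge 1), which is the property that makes neighbor-joining attractive when distances are tree-like and misleading when they are not.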
Assessment of relationship between SAC score and bioactivity distance
In order to assess the relationship between SAC score and distance in the 224 SAC score-distance series, a second-degree polynomial function was fitted through the data points of each series using R. Series whose fitted curve had a negative slope at both distance = 0.40 and distance = 0.67 (the highest distance being approximately 1.9), and an R² value greater than 0.2, were considered to exhibit neighborhood behavior.
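The neighborhood-behavior criterion can be expressed as a least-squares fit of y = c0 + c1·x + c2·x² followed by a check of the derivative c1 + 2·c2·x at the two distances and of R². The source does not say which R function was used; the pure-Python sketch below solves the normal equations directly and is an assumption-laden stand-in, with hypothetical example series.

```python
def fit_quadratic(xs, ys):
    """Least-squares coefficients (c0, c1, c2) of y = c0 + c1*x + c2*x**2."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, 3):
            factor = A[row][col] / A[col][col]
            for c in range(col, 4):
                A[row][c] -= factor * A[col][c]
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        coef[i] = (A[i][3] - sum(A[i][j] * coef[j] for j in range(i + 1, 3))) / A[i][i]
    return coef


def r_squared(xs, ys, coef):
    """Coefficient of determination of the fitted quadratic."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (coef[0] + coef[1] * x + coef[2] * x ** 2)) ** 2
                 for x, y in zip(xs, ys))
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def shows_neighborhood_behavior(xs, ys, check_at=(0.40, 0.67), min_r2=0.2):
    """Negative fitted slope at every check point and R^2 above the threshold."""
    coef = fit_quadratic(xs, ys)
    slopes = [coef[1] + 2 * coef[2] * x for x in check_at]
    return all(s < 0 for s in slopes) and r_squared(xs, ys, coef) > min_r2


# Hypothetical series: SAC score falling vs rising with distance.
xs = [0.1 * i for i in range(1, 20)]  # distances 0.1 .. 1.9
decreasing = [80 - 40 * x - 10 * x ** 2 for x in xs]
increasing = [20 + 30 * x for x in xs]
```

Usage: `shows_neighborhood_behavior(xs, decreasing)` is true for the falling series and false for the rising one, since the latter has a positive slope at both check points.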
Generation of multidimensional-scaling (MDS) plots
The Hamming distance between kinases was calculated from their binary bioactivity fingerprints (compounds inhibiting kinase activity by 50% or more were considered active, and compounds inhibiting kinase activity by less than 50% were considered inactive), and an MDS plot was generated using Orange Canvas. Signed relative stress was minimized (stopping conditions: a minimum stress change of 0.00005 and a maximum of 5000 steps), and kinases were colored either according to their group in the sequence-based classification (for Figure 11) or by class (i.e. outlier group 1, outlier group 2 or non-outlier; for Figure 9 and Additional file 8: Figure S8).
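The sketch below illustrates the fingerprinting and Hamming-distance step, followed by an MDS embedding. Orange Canvas minimizes signed relative stress; here plain SMACOF (raw stress minimized by the Guttman transform) is swapped in as a simpler stand-in, started from points on a unit circle (an assumption) and using the same stopping values (stress change below 0.00005 or 5000 steps). Kinase names and inhibition values are hypothetical.

```python
import math


def binary_fingerprint(inhibition_percent, threshold=50.0):
    """Active (1) if a compound inhibits the kinase by >= threshold %, else inactive (0)."""
    return [1 if value >= threshold else 0 for value in inhibition_percent]


def hamming(fp_a, fp_b):
    """Number of compounds whose active/inactive label differs between two kinases."""
    return sum(a != b for a, b in zip(fp_a, fp_b))


def raw_stress(X, D):
    """Sum over pairs i < j of (embedded distance - target distance)^2."""
    n = len(X)
    return sum((math.dist(X[i], X[j]) - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def smacof_2d(D, max_steps=5000, min_change=0.00005):
    """Unweighted SMACOF in 2D; a stand-in for Orange's stress minimization."""
    n = len(D)
    # Deterministic start: points evenly spaced on a unit circle (assumption).
    X = [[math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
         for i in range(n)]
    old = new = raw_stress(X, D)
    for _ in range(max_steps):
        # Guttman transform: X <- B(X) X / n
        B = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    e = math.dist(X[i], X[j])
                    B[i][j] = -D[i][j] / e if e > 0 else 0.0
            B[i][i] = -sum(B[i][j] for j in range(n) if j != i)
        X = [[sum(B[i][k] * X[k][c] for k in range(n)) / n for c in range(2)]
             for i in range(n)]
        new = raw_stress(X, D)
        if old - new < min_change:
            break
        old = new
    return X, new


# Hypothetical % inhibition of four compounds against three kinases.
profiles = {"KIN_A": [95, 80, 10, 5], "KIN_B": [70, 20, 65, 30], "KIN_C": [15, 50, 90, 0]}
names = list(profiles)
fps = [binary_fingerprint(profiles[k]) for k in names]
D = [[hamming(a, b) for b in fps] for a in fps]
coords, final_stress = smacof_2d(D)
```

Note that a value of exactly 50% counts as active, following the "50% or more" rule stated above.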
Abbreviations
Abelson murine leukemia viral oncogene homologue 1
Tyrosine kinase non-receptor protein 2
V-akt murine thymoma viral oncogene homologue 1
V-akt murine thymoma viral oncogene homologue 2
Breakpoint cluster region protein
Cell division protein kinase 6
CDC-like kinase 1
Chronic myelogenous leukemia
Casein kinase II subunit alpha
Casein kinase II subunit alpha’
Death-associated protein kinase 1
Extended connectivity fingerprints
Fibroblast growth factor receptor 1
Insulin-like growth factor 1 receptor
Mast/stem cell growth factor receptor
Protein kinase C alpha type
Protein kinase C beta type
Protein kinase C iota type
Lymphocyte cell-specific protein-tyrosine kinase
V-yes-1 Yamaguchi sarcoma viral related oncogene homolog
Mitogen-activated protein kinase p38 beta
Mitogen-activated protein kinase p38 gamma
Mitogen-activated protein kinase p38 delta
Mitogen-activated protein kinase p38 alpha
Never in mitosis A-related kinase 6
Never in mitosis A-related kinase 7
Quantitative structure-activity relationship
Simplified molecular-input line-entry system
Proto-oncogene tyrosine-protein kinase Src
Transforming growth factor-beta receptor type I
Vascular endothelial growth factor receptor 3
Serologically defined colon cancer antigen 43
SP thanks the Dutch Ministry of Education, Culture and Science (Huygens Programme), and AB thanks Unilever for funding. In addition, the authors would like to thank Oscar Méndez-Lucio for help with data preparation.
- Liao JJ-L: Molecular recognition of protein kinase binding pockets for design of potent and selective kinase inhibitors. J Med Chem. 2007, 50: 409-424. 10.1021/jm0608107.
- Hanks SK, Hunter T: The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J. 1995, 9: 576-596.
- Fabian MA, Biggs WH, Treiber DK, Atteridge CE, Azimioara MD, Benedetti MG, Carter TA, Ciceri P, Edeen PT, Floyd M, Ford JM, Galvin M, Gerlach JL, Grotzfeld RM, Herrgard S, Insko DE, Insko MA, Lai AG, Lélias J-M, Mehta SA, Milanov ZV, Velasco AM, Wodicka LM, Patel HK, Zarrinkar PP, Lockhart DJ: A small molecule-kinase interaction map for clinical kinase inhibitors. Nat Biotechnol. 2005, 23: 329-336. 10.1038/nbt1068.
- Zhang Z: Protein tyrosine phosphatases: structure and function, substrate specificity, and inhibitor development. Annu Rev Pharmacol Toxicol. 2002, 42: 209-234. 10.1146/annurev.pharmtox.42.083001.144616.
- Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S: The protein kinase complement of the human genome. Science. 2002, 298: 1912-1934. 10.1126/science.1075762.
- Hanks SK: Genomic analysis of the eukaryotic protein kinase superfamily: a perspective. Genome Biol. 2003, 4: 111. 10.1186/gb-2003-4-5-111.
- Weinmann H, Metternich R: Drug discovery process for kinase inhibitors. Chembiochem. 2005, 6: 455-459. 10.1002/cbic.200500034.
- Johnson JR, Bross P, Cohen M, Rothmann M, Chen G, Zajicek A, Gobburu J, Rahman A, Staten A, Pazdur R: Approval summary: imatinib mesylate capsules for treatment of adult patients with newly diagnosed philadelphia chromosome-positive chronic myelogenous leukemia in chronic phase. Clin Cancer Res. 2003, 9: 1972-1979.
- Garcia-Manero G, Faderl S, O’Brien S, Cortes J, Talpaz M, Kantarjian HM: Chronic myelogenous leukemia: a review and update of therapeutic strategies. Cancer. 2003, 98: 437-457. 10.1002/cncr.11520.
- Koskela H, Koskenvesa P, Mustjoki S, Porkka K: Current therapy of chronic myeloid leukemia. Duodecim. 2012, 128: 579-588.
- Arce-Lara C, Shah MH, Jimenez RE, Patel VR, Benson DM, Clinton SK, Monk JP: Gastrointestinal stromal tumors involving the prostate: presentation, course, and therapeutic approach. Urology. 2007, 69: 1209.e5-1209.e7.
- Bongers M, Coupé VMH, Jansma E, Smit E, Uyl-de-Groot CA: Cost effectiveness of treatment with new agents in advanced non-small-cell lung cancer: a systematic review. Pharmacoeconomics. 2012, 30: 17-34. 10.2165/11595000-000000000-00000.
- Rexer B, Arteaga C: Intrinsic and acquired resistance to HER2-targeted therapies in HER2 gene-amplified breast cancer: mechanisms and clinical implications. Crit Rev Oncog. 2012, 17: 1-16. 10.1615/CritRevOncog.v17.i1.20.
- Metz JT, Johnson EF, Soni NB, Merta PJ, Kifle L, Hajduk PJ: Navigating the kinome. Nat Chem Biol. 2011, 7: 200-202. 10.1038/nchembio.530.
- Paolini GV, Shapland RHB, van Hoorn WP, Mason JS, Hopkins AL: Global mapping of pharmacological space. Nat Biotechnol. 2006, 24: 805-815. 10.1038/nbt1228.
- Krystal GW: Imatinib mesylate (STI571) for myeloid malignancies other than CML. Leuk Res. 2004, 28 (Suppl 1): S53-S59.
- Donner A: Stem cells: Imatinib gets beta. Nat Chem Biol. 2012, 8: 502.
- Nagano T, Kotani Y, Kobayashi K, Hatakeyama Y, Hori S, Kasai D, Funada Y, Nishimura H, Kondoh T, Nishimura Y: Long-term outcome after multidisciplinary approach for leptomeningeal carcinomatosis in a non-small cell lung cancer patient with poor performance status. Intern Med. 2011, 50: 3019-3022. 10.2169/internalmedicine.50.5903.
- Tsang RY, Finn RS: Beyond trastuzumab: novel therapeutic strategies in HER2-positive metastatic breast cancer. Br J Cancer. 2012, 106: 6-13. 10.1038/bjc.2011.516.
- Knight ZA, Lin H, Shokat KM: Targeting the cancer kinome through polypharmacology. Nat Rev Cancer. 2010, 10: 130-137. 10.1038/nrc2787.
- Bamborough P, Drewry D, Harper G, Smith GK, Schneider K: Assessment of chemical coverage of kinome space and its implications for kinase drug discovery. J Med Chem. 2008, 51: 7898-7914. 10.1021/jm8011036.
- Vieth M, Higgs RE, Robertson DH, Shapiro M, Gragg EA, Hemmerle H: Kinomics-structural biology and chemogenomics of kinase inhibitors and targets. Biochim Biophys Acta. 2004, 1697: 243-257. 10.1016/j.bbapap.2003.11.028.
- Vieth M, Sutherland JJ, Robertson DH, Campbell RM: Kinomics: characterizing the therapeutically validated kinase space. Drug Discov Today. 2005, 10: 839-846. 10.1016/S1359-6446(05)03477-X.
- Van der Horst E, Peironcely JE, Van Westen GJP, Van den Hoven OO, Galloway WRJD, Spring DR, Wegner JK, Van Vlijmen HWT, IJzerman AP, Overington JP, Bender A: Chemogenomics approaches for receptor deorphanization and extensions of the chemogenomics concept to phenotypic space. Curr Top Med Chem. 2011, 11: 1964-1977. 10.2174/156802611796391230.
- Van der Horst E, Peironcely JE, IJzerman AP, Beukers MW, Lane JR, Van Vlijmen HWT, Emmerich MTM, Okuno Y, Bender A: A novel chemogenomics analysis of G protein-coupled receptors (GPCRs) and their ligands: a potential strategy for receptor de-orphanization. BMC Bioinformatics. 2010, 11: 316. 10.1186/1471-2105-11-316.
- Bender A, Young DW, Jenkins JL, Serrano M, Mikhailov D, Clemens PA, Davies JW: Chemogenomic data analysis: prediction of small-molecule targets and the advent of biological fingerprint. Comb Chem High Throughput Screen. 2007, 10: 719-731. 10.2174/138620707782507313.
- Lounkine E, Kutchukian P, Petrone P, Davies JW, Glick M: Chemotography for multi-target SAR analysis in the context of biological pathways. Bioorg Med Chem. 2012, 20: 5416-5427. 10.1016/j.bmc.2012.02.034.
- Cheng F, Zhou Y, Li J, Li W, Liu G, Tang Y: Prediction of chemical-protein interactions: multitarget-QSAR versus computational chemogenomic methods. Mol Biosyst. 2012, 8: 2373-2384. 10.1039/c2mb25110h.
- Sos ML, Rode HB, Heynck S, Martin P, Fischer F, Klüter S, Pawar VG, Reuter C, Heuckmann JM, Weiss J, Ruddigkeit L, Rabiller M, Koker M, Simard JR, Getlik M, Yuza Y, Chen TH, Greulich H, Thomas RK, Rauh D: Chemogenomic profiling provides insights into the limited activity of irreversible EGFR inhibitors in tumor cells expressing the T790M EGFR resistance mutation. Cancer Res. 2010, 70: 868-874. 10.1158/0008-5472.CAN-09-3106.
- Karaman MW, Herrgard S, Treiber DK, Gallant P, Atteridge CE, Campbell BT, Chan KW, Ciceri P, Davis MI, Edeen PT, Faraoni R, Floyd M, Hunt JP, Lockhart DJ, Milanov ZV, Morrison MJ, Pallares G, Patel HK, Pritchard S, Wodicka LM, Zarrinkar PP: A quantitative analysis of kinase inhibitor selectivity. Nat Biotechnol. 2008, 26: 127-132. 10.1038/nbt1358.
- Lombardo LJ, Lee FY, Chen P, Norris D, Barrish JC, Behnia K, Castaneda S, Cornelius LAM, Das J, Doweyko AM, Fairchild C, Hunt JT, Inigo I, Johnston K, Kamath A, Kan D, Klei H, Marathe P, Pang S, Peterson R, Pitt S, Schieven GL, Schmidt RJ, Tokarski J, Wen M-L, Wityak J, Borzilleri RM: Discovery of N-(2-chloro-6-methylphenyl)-2-(6-(4-(2-hydroxyethyl)piperazin-1-yl)-2-methylpyrimidin-4-ylamino)thiazole-5-carboxamide (BMS-354825), a dual Src/Abl kinase inhibitor with potent antitumor activity in preclinical assays. J Med Chem. 2004, 47: 6658-6661. 10.1021/jm049486a.
- Kuhn D, Weskamp N, Hüllermeier E, Klebe G: Functional classification of protein kinase binding sites using Cavbase. ChemMedChem. 2007, 2: 1432-1447. 10.1002/cmdc.200700075.
- Milletti F, Vulpetti A: Predicting polypharmacology by binding site similarity: from kinases to the protein universe. J Chem Inf Model. 2010, 50: 1418-1431. 10.1021/ci1001263.
- Martin E, Mukherjee P: Kinase-kernel models: accurate in silico screening of 4 million compounds across the entire human kinome. J Chem Inf Model. 2012, 52: 156-170. 10.1021/ci200314j.
- Davis MI, Hunt JP, Herrgard S, Ciceri P, Wodicka LM, Pallares G, Hocker M, Treiber DK, Zarrinkar PP: Comprehensive analysis of kinase inhibitor selectivity. Nat Biotechnol. 2011, 29: 1046-1051. 10.1038/nbt.1990.
- Inhibitor Libraries and Pathway Panels: http://www.emdmillipore.com/life-science-research/inhibitor-libraries-and-pathway-panels/c_bKSb.s1O9nQAAAEitTV5hDHD?/p_uuid (accessed Oct 16, 2012)
- Gao Y, Davies SP, Augustin M, Woodward A, Patel UA, Kovelman R, Harvey KJ: A broad activity screen in support of a chemogenomic map for kinase signalling research and drug discovery. Biochem J. 2013, 451: 313-328. 10.1042/BJ20121418.
- Anastassiadis T, Deacon SW, Devarajan K, Ma H, Peterson JR: Comprehensive assay of kinase catalytic activity reveals features of kinase inhibitor selectivity. Nat Biotechnol. 2011, 29: 1039-1045. 10.1038/nbt.2017.
- Bender A, Bojanic D, Davies JW, Crisman TJ, Mikhailov D, Scheiber J, Jenkins JL, Deng Z, Hill W, Popov M, Jacoby E, Glick M: Which aspects of HTS are empirically correlated with downstream success?. Curr Opin Drug Discov Devel. 2008, 11: 327-337.
- Zuccotto F, Ardini E, Casale E, Angiolini M: Through the “gatekeeper door”: exploiting the active kinase conformation. J Med Chem. 2010, 53: 2681-2694. 10.1021/jm901443h.
- Kalliokoski T, Vulpetti A: Large-scale evaluation of CavBase for analyzing the polypharmacology of Kinase inhibitors. Mol Inf. 2011, 30: 923-925. 10.1002/minf.201100112.
- Levitzki A: Tyrosine kinase inhibitors: views of selectivity, sensitivity, and clinical performance. Annu Rev Pharmacol Toxicol. 2013, 53: 161-185. 10.1146/annurev-pharmtox-011112-140341.
- Wawer M, Peltason L, Weskamp N, Teckentrup A, Bajorath J: Structure-activity relationship anatomy by network-like similarity graphs and local structure-activity relationship indices. J Med Chem. 2008, 51: 6075-6084. 10.1021/jm800867g.
- Norman RA, Toader D, Ferguson AD: Structural approaches to obtain kinase selectivity. Trends Pharmacol Sci. 2012, 33: 273-278. 10.1016/j.tips.2012.03.005.
- Merck Millipore KinaseProfiler Service Assay Protocols: http://www.millipore.com/techpublications/tech1/pf3036 (accessed Oct 27, 2012)
- Chemical Computing Group Inc: Molecular Operating Environment (MOE). http://www.chemcomp.com/software.htm (accessed Oct 27, 2012)
- Rogers D, Hahn M: Extended-connectivity fingerprints. J Chem Inf Model. 2010, 50: 742-754. 10.1021/ci100050t.
- Bender A, Jenkins JL, Scheiber J, Sukuru SCK, Glick M, Davies JW: How similar are similarity searching methods? A principal component analysis of molecular descriptor space. J Chem Inf Model. 2009, 49: 108-119. 10.1021/ci800249s.
- Bender A, Mussa HY, Glen RC: Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance. J Chem Inf Model. 2004, 44: 1708-1718. 10.1021/ci0498719.
- Hinselmann G, Rosenbaum L, Jahn A, Fechner N, Zell A: jCompoundMapper: an open source Java library and command-line tool for chemical fingerprints. J Cheminform. 2011, 3: 3. 10.1186/1758-2946-3-3.
- Sutherland JJ, Higgs RE, Watson I, Vieth M: Chemical fragments as foundations for understanding target space and activity prediction. J Med Chem. 2008, 51: 2689-2700. 10.1021/jm701399f.
- Boc A, Diallo AB, Makarenkov V: T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks. Nucleic Acids Res. 2012, 40: W573-W579. 10.1093/nar/gks485.
- The Human Kinome: http://kinase.com/human/kinome/ (accessed Oct 27, 2012)
- Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011, 28: 2731-2739. 10.1093/molbev/msr121.
- Felsenstein J: PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics. 1989, 5: 164-166.
- Letunic I, Bork P: Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics. 2007, 23: 127-128. 10.1093/bioinformatics/btl529.
- Zmasek CM: Archaeopteryx: Visualization, Analysis, and Editing of Phylogenetic Trees. http://www.phylosoft.org/archaeopteryx/ (accessed 25 July, 2012)
- Letunic I, Bork P: Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 2011, 39: W475-W478. 10.1093/nar/gkr201.
- Merck Millipore DART: http://www.millipore.com/life_sciences/flx4/dart (accessed Oct 27, 2012)
- Dessau RB, Pipper CB: “R”–project for statistical computing. Ugeskr Laeger. 2008, 170: 328-330.
- Orange Canvas: http://orange.biolab.si/ (accessed Oct 27, 2012)
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.