The use of 2D fingerprint methods to support the assessment of structural similarity in orphan drug legislation
© Franco et al.; licensee Chemistry Central Ltd. 2014
Received: 28 October 2013
Accepted: 2 January 2014
Published: 1 February 2014
In the European Union, medicines are authorised for some rare disease only if they are judged to be dissimilar to authorised orphan drugs for that disease. This paper describes the use of 2D fingerprints to show the extent of the relationship between computed levels of structural similarity for pairs of molecules and expert judgments of the similarities of those pairs. The resulting relationship can be used to provide input to the assessment of new active compounds for which orphan drug authorisation is being sought.
143 experts provided judgments of the similarity or dissimilarity of 100 pairs of drug-like molecules from the DrugBank 3.0 database. The similarities of these pairs were also computed using BCI, Daylight, ECFC4, ECFP4, MDL and Unity 2D fingerprints. Logistic regression analyses demonstrated a strong relationship between the human and computed similarity assessments, with the resulting regression models having significant predictive power in experiments using data from submissions of orphan drug medicines to the European Medicines Agency. The BCI fingerprints performed best overall on the DrugBank dataset while the BCI, Daylight, ECFP4 and Unity fingerprints performed comparably on the European Medicines Agency dataset.
Measures of structural similarity based on 2D fingerprints can provide a useful source of information for the assessment of orphan drug status by regulatory authorities.
The discovery, testing and registration of a novel drug is both time-consuming and extremely expensive, with a review by Morgan et al. quoting costs in the range $161 million to $1.8 billion for the development of a novel therapeutic agent [1]. Such huge costs are acceptable to a pharmaceutical company if, and only if, there is a reasonable expectation that they can be recouped and a profit achieved when the drug is made available to large numbers of patients suffering from the target disease. There are, however, many diseases where there is a clear need for treatment but where there are insufficient patients world-wide to support the costs of modern drug research. These medical conditions are normally referred to as rare diseases and there is much current interest in the development of orphan drugs for the treatment of such diseases [2–4].
There is no single definition of a rare disease, since account may need to be taken not only of the number of patients affected by it but also its severity and the availability of existing, adequate treatments. Different regulatory authorities have hence adopted rather different definitions [5–7]. In the European Union (EU), which is the context for this paper, the evaluation of orphan drugs is coordinated by the European Medicines Agency (hereafter the EMA). According to article 3 (1) of Regulation (EC) No 141/2000 of the European Parliament and of the Council of 16 December 1999 on orphan medicinal products, a medicine must meet a number of criteria if it is to qualify as an orphan drug: “it must be intended for the treatment, prevention or diagnosis of a disease that is life-threatening or chronically debilitating; the prevalence of the condition in the EU must not be more than 5 in 10,000 or it must be unlikely that marketing of the medicine would generate sufficient returns to justify the investment needed for its development; and no satisfactory method of diagnosis, prevention or treatment of the condition concerned can be authorised, or, if such a method exists, the medicine must be of significant benefit to those affected by the condition”.
The EU provides a range of incentives to encourage the development of orphan drugs, the most important of which is a high level of market exclusivity: once a medicine has been awarded an orphan drug authorisation by the European Commission, no similar medicinal product can be brought to the European market for a period of ten years. The criteria and incentives were detailed formally, in the regulation noted above, but without any explicit specification of the nature or the extent of the similarity required to define a “similar medicinal product”. This lack was addressed, in part at least, in a subsequent regulation - Commission Regulation (EC) No 847/2000 of 27 April 2000 laying down the provisions for implementation of the criteria for designation of a medicinal product as an orphan medicinal product and definitions of the concepts ‘similar medicinal product’ and ‘clinical superiority’ - which defined a similar active substance as “an identical active substance, or an active substance with the same principal molecular structural features (but not necessarily all of the same molecular structural features) and which acts via the same mechanism”.
When a company applies to register a new medicine for an indication that has already been granted for an orphan medicine it is the responsibility of the EMA’s Committee for Medicinal Products for Human Use (CHMP) to decide if the new drug is indeed similar to an existing orphan drug, with an application being successful only when the CHMP decides that this is not the case. To date, the evaluations carried out by the CHMP have been based largely on human judgments of similarity. In this paper, we discuss the use of computed measures of structural similarity based on 2D fingerprints to provide an additional source of information that could be used when the CHMP considers the relationships that may exist between existing and proposed new medicines for rare diseases.
Results and discussion
Human similarity judgements
We ascribe these levels of disagreement to the inherently subjective nature of similarity [8, 9], with an individual’s perception that two objects are similar depending on a range of factors (such as their state of mind, gender, age, personality and previous scientific experience inter alia). That being so, it is hardly surprising that different experts responded in different ways to the molecule-pairs that were presented to them, a finding that is consistent with previous experimental studies that have demonstrated that different individuals do not perceive chemical structure information in the same way. For example, Lajiness et al. report a study of medicinal chemists at Pharmacia, who were asked to review lists of compounds in order to assess their potential as leads in a drug discovery programme [10]; not only were there marked inconsistencies between the chemists, but even the same chemist might give different assessments on different occasions. Hack et al. reported an analogous study that sought to enhance the diversity of the Johnson & Johnson corporate structure database; they found that whilst there were considerable differences between individual chemists a fair level of consistency could be achieved using a wisdom-of-crowds approach [11]. This technique was also used by Oprea et al. to reconcile the often disparate views of pharmaceutical experts as to the effectiveness of chemical probes resulting from the NIH Molecular Libraries and Imaging Initiative [12]. Boda et al. [13] and Bonnet [14] studied groups of medicinal chemists’ assessments of molecular synthetic feasibility, and again observed some degree of inconsistency in the judgments that were made. Finally, Kutchukian et al. have reported a large-scale study of medicinal chemists at Novartis, who were asked to select chemical fragments for lead-generation projects. There was not only a marked level of inconsistency in the selection, but also a comparable level of inconsistency in the reasons for their selections [9].
Analogous variations in the ways that individuals react to objects have been widely observed: for example, when indexing terms are assigned to documents [15], when links are created in hypertext systems [16], when search strategies are chosen for accessing text databases [17], and when scientists create mental maps of active research areas [18].
We note here one characteristic of the training data that could have affected the results, which is the way that the molecules were presented to the experts for assessment. In some cases, the molecules in a pair were displayed in such a way that the structural similarities were obvious to the human eye with the common features clearly aligned, as exemplified by the molecules in the first row of Figure 1. In other cases, the similarities may have been less obvious when the common features were not aligned, as exemplified by the molecules in the third row of Figure 1. Such variant alignments could result in the experts perceiving the molecules comprising a pair to be less similar than might have been expected from one or more of the computed, fingerprint-based similarities. The alignments presented to the experts (such as the examples above and the molecule-pairs in Additional file 1: Table S1) were those available in the DrugBank database. No attempt was made to modify the alignments in cases where it was felt that improvements were possible (such as the example above), since this is the situation faced by the members of the CHMP when they consider applications for authorisation; indeed, they have the additional problem that the molecule-pairs that they inspect (i.e., a molecule that has been submitted for consideration and the existing orphan-drug for that disease) may well have been drawn using different drawing packages.
Logistic regression, receiver operating characteristic curves and performance statistics
Once the human judgments had been determined and the similar and non-similar molecule-pairs identified, it was possible to develop logistic regression models that assessed how fingerprint-based similarities were correlated with the probability of being considered similar by the majority of the experts. For each fitted model, ROC curves and other performance statistics were computed in order to assess the predictive performance of each fingerprint. The analysis is exemplified by the logistic regression model for the ECFP4 fingerprint, with the same processes being applied to each of the fingerprints.
Logistic regression to predict the similarity, or otherwise, of training-set molecule-pairs using different types of fingerprint
The corresponding results for all six fingerprints are listed in Table 1, with the fingerprints listed in alphabetical order. As with the ECFP4 fingerprint, goodness-of-fit assessment was performed for all fingerprints and the assumptions of each fitted model were assessed with the Hosmer-Lemeshow test. The Nagelkerke R² values were high (>0.8) for all fingerprints. In the case of the Unity data, the model listed in Table 1 is that obtained after the elimination of one outlier molecule-pair (number 99 in Additional file 1: Table S1), where over 70% of the experts judged the pair to be similar despite a Unity similarity of just 0.217. When this molecule-pair was included in the model, the assumptions of the logistic regression did not hold; removing this observation did not, however, significantly change the estimates of β0 and β1.
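The Nagelkerke R² statistic quoted above can be computed from the log-likelihoods of the fitted model and of an intercept-only (null) model. The following is a minimal sketch using the standard definition of the statistic; the function names are illustrative and are not taken from the software used in this study.

```python
import math


def null_log_likelihood(n_similar: int, n_pairs: int) -> float:
    """Log-likelihood of an intercept-only model that predicts the
    overall fraction of similar pairs for every molecule-pair."""
    if not 0 < n_similar < n_pairs:
        raise ValueError("need at least one pair in each class")
    p = n_similar / n_pairs
    return n_similar * math.log(p) + (n_pairs - n_similar) * math.log(1.0 - p)


def nagelkerke_r2(ll_null: float, ll_model: float, n_pairs: int) -> float:
    """Nagelkerke R²: the Cox-Snell R² rescaled by its maximum attainable
    value so that a perfect fit gives 1 and the null model gives 0."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n_pairs)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n_pairs)
    return cox_snell / max_cox_snell


# Null model for a training-set of 49 similar and 51 non-similar pairs.
ll0 = null_log_likelihood(49, 100)
```

A fitted model whose log-likelihood is close to zero gives a value close to unity, whereas a model that is no better than the null model gives a value of zero.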
Optimal levels of performance using ROC curves
Use of an external test-set
Characteristics of the 163 training-set and 51 test-set molecules (properties compared: number of carbons, number of heteroatoms, number of rings, number of aromatic rings and number of stereocentres)
Numbers of test-set molecule-pairs predicted correctly using t_LR and t_ROC
A simple consensus approach was then used to see if further improvements could be made. A molecule-pair was classified as similar or non-similar by each of the fingerprints individually, and then the final classification was similar if three or more of the individual classifications were similar. This consensus result forms the bottom row of Table 4.
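The consensus rule described above can be sketched as follows. The function and variable names are illustrative assumptions; only the rule itself (similar if three or more of the six fingerprints say so) comes from the text.

```python
from typing import Mapping

# The six fingerprint types considered in this study.
FINGERPRINTS = ("BCI", "Daylight", "ECFC4", "ECFP4", "MDL", "Unity")


def consensus_similar(calls: Mapping[str, bool], min_votes: int = 3) -> bool:
    """Return True if at least `min_votes` of the individual fingerprint
    classifications are 'similar' (True)."""
    missing = [fp for fp in FINGERPRINTS if fp not in calls]
    if missing:
        raise KeyError(f"no classification for: {missing}")
    return sum(calls[fp] for fp in FINGERPRINTS) >= min_votes
```

Note that, with six fingerprints, a three-to-three split is classified as similar under this rule.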
In this paper, we have described how fingerprint-based measures of similarity can be used to assess the structural novelty of molecules that are being submitted for consideration as new medicines for rare diseases. Such measures are well established, dating back to at least the mid-1970s [19], for applications such as property prediction, cluster analysis and virtual screening. A characteristic of most of these studies is that they have focused on the identification of molecules that are similar to each other; in similarity-based virtual screening, for example, the aim is to identify those previously untested database structures that are most similar to a bioactive reference structure [20, 21], whilst removing the large numbers of low-similarity database structures from further consideration. In the current application, conversely, dissimilarity is of at least as much importance as is similarity; indeed, it is arguably of greater importance for a company applying for orphan drug authorisation. The other chemoinformatics application where dissimilarity is important is molecular diversity analysis; however, the identification of sets of mutually dissimilar molecules is very different from the need to determine whether two molecules are considered to be significantly different, as is required for the current application.
The results obtained here demonstrate clearly that simple, 2D fingerprint representations provide measures of structural similarity that mimic closely the judgments of experts, using both training-set molecule-pairs extracted from DrugBank and test-set molecule-pairs typical of the work of the CHMP. This is so despite the fact that the two sets of molecules are rather different in character (as demonstrated by the figures in Table 3). The BCI fingerprints performed best overall on the training-set while the BCI, Daylight, ECFP4 and Unity fingerprints showed comparable, high levels of predictive performance on the test-set. The BCI fingerprints would hence seem to be an appropriate choice for future studies in this area. They encode six different classes of chemical substructure: augmented atoms, atom/bond sequences, atom pairs, and three types of ring feature. The atom- and bond-types can be generalized if required and an algorithmic procedure is used to select the required number of substructures (1052 in the present case) whilst ensuring that they satisfy user-specified criteria relating to minimum, maximum and co-occurrence frequencies.
There are, of course, similarity measures other than those studied to date that could be used for the study of orphan drug similarity, e.g., a measure that takes account of 3D structural information or that uses a similarity coefficient other than the Tanimoto coefficient. Other possible areas of study include the use of multiple similarity measures in the logistic regression model, accounting for individual judgements instead of using the majority decision, or the use of more sophisticated data fusion methods [22] than the simple consensus approach considered thus far.
In conclusion, we must emphasise that we are not suggesting that a computational procedure could be used as an alternative to, let alone a replacement for, the current processes used to evaluate applications for orphan drug authorisations. However, the approach described here could form a useful, quantitative input to those evaluations by providing a tool to assess molecular structural similarity by interested parties. Assume that a new molecule M is being submitted for orphan drug authorisation, and that there is already an existing drug D for this indication. The similarity between M and D is computed, e.g. using one of the fingerprint-types that performed well in the experiments above, and then the corresponding regression equation used to give the probability that the two molecules would be considered a similar molecule-pair, based upon experts’ previous similarity assessments. This probability would then be one of the multiple factors that are considered when deciding upon the similarity or otherwise between M and D.
In the evaluation of similarity in the context of orphan drug evaluation, the CHMP needs to decide whether or not an active compound for which a new medicine, or an extension of an existing marketing authorisation (change of indication or line extension), is being sought is similar to an existing orphan drug that has already been authorised. The decision is made on the basis of the votes of a panel of experts drawn from each of the member states of the EU. Focussing just on the similarity criterion, the experts are required to make a binary decision: is the new molecule similar to, or different from, the existing orphan drug(s) for the rare disease of interest? The experimental set-up that we have created seeks to mimic this situation, with a panel of experts being asked to make judgements on the similarity or otherwise of a carefully chosen training-set of molecule-pairs, and then a comparison being made between these human decisions and the outputs from computer-based similarity calculations. In this section, we describe the following components of our experimental procedure: the training-set of molecule-pairs on which the similarity assessments were made; the panel of experts who made these assessments; measuring the effectiveness of the automated assessments; and, finally, a second, independent test-set of molecule-pairs that was used to assess the predictive performance of the models developed from the training-set.
The training-set database contains 100 pairs of bioactive molecules selected from DrugBank 3.0 (at http://www.drugbank.ca/), a bioinformatics and chemoinformatics resource that contains a wealth of detailed information on over six thousand drug molecules and their associated biological targets [24]. The file was filtered to identify 1068 molecules that contained at least one carbon atom and that had not more than ten hydrogen bond acceptors, not more than five hydrogen bond donors, a molecular mass not greater than 500 Daltons, and an octanol-water partition coefficient not greater than 5. The similarity between each distinct pair of these drug-like molecules was computed using ECFP4 fingerprints [25] and the Tanimoto coefficient, and then 100 pairs of molecules were chosen so as to cover as wide and as equal a spread of Tanimoto values as possible, with the observed similarity values ranging from 0.116 to 1.000. The molecule-pairs contained a total of 163 distinct molecules, these representing 42 different pharmacological classes including antibiotics, beta-adrenergic antagonists, benzodiazepines and anti-hypertensives inter alia.
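The filtering and selection steps can be illustrated with a short sketch. The drug-likeness thresholds are those stated above; the `Molecule` record, the assumption that the properties have already been computed, and the equal-width binning used to spread the selected pairs over the similarity range are all illustrative assumptions, since the exact selection procedure is not specified here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Molecule:
    """Hypothetical record holding precomputed properties of one molecule."""
    name: str
    n_carbons: int
    h_bond_acceptors: int
    h_bond_donors: int
    mol_mass: float  # Daltons
    log_p: float     # octanol-water partition coefficient


def is_drug_like(m: Molecule) -> bool:
    """Apply the filter described in the text."""
    return (m.n_carbons >= 1
            and m.h_bond_acceptors <= 10
            and m.h_bond_donors <= 5
            and m.mol_mass <= 500.0
            and m.log_p <= 5.0)


ScoredPair = Tuple[Tuple[str, str], float]


def pick_spread(scored_pairs: Iterable[ScoredPair],
                n_bins: int = 10, per_bin: int = 10) -> List[ScoredPair]:
    """Keep at most `per_bin` pairs from each equal-width similarity bin
    (an assumed way of obtaining an even spread of Tanimoto values)."""
    bins: List[List[ScoredPair]] = [[] for _ in range(n_bins)]
    for pair, sim in scored_pairs:
        idx = min(int(sim * n_bins), n_bins - 1)  # similarity 1.0 goes in the top bin
        if len(bins[idx]) < per_bin:
            bins[idx].append((pair, sim))
    return [item for b in bins for item in b]
```

With ten bins and ten pairs per bin this yields at most 100 pairs; bins that contain few candidate pairs would be under-filled, so a real selection would need some further adjustment.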
With the permission of the EMA, one of us (PF) gave presentations to several EMA committees and working parties responsible for the evaluation and the quality of medicines. The attendees at these meetings who had a background in quality and experience in assessing orphan drugs were invited to participate in the project by providing similarity judgments on the 100 pairs of DrugBank molecules. Similar invitations were sent to appropriate individuals on an EMA email list of European experts with a background in the quality of medicines, and to contact points in the regulatory authorities in the USA, Japan and Taiwan (the Food and Drug Administration, the Pharmaceutical and Medical Devices Agency, and the Food and Drug Administration of Taiwan, respectively). Participants were sent the 100 pairs of 2D structure diagrams and for each molecule-pair asked to state whether (Yes) or not (No) the two molecules should be regarded as being structurally similar. A total of 143 completed responses (128 from within the EU) was obtained and these were then used to compute the fractions of Yes and No responses for each of the pairs of molecules. The structure diagrams and SMILES descriptions for the 100 molecule-pairs and the percentages of Yes and No responses for each such pair are listed in Additional file 1: Table S1.
The CHMP reaches its decisions by majority voting, and it was hence decided that a molecule-pair in the sample should be considered similar, and referred to as a similar molecule-pair, if more than 50% of the responses for it were Yes. If this was not the case then the two molecules were judged to be a non-similar molecule-pair. This resulted in 49 similar molecule-pairs and 51 non-similar molecule-pairs; once each of the molecule-pairs had been categorised in this way, the expert judgments were used to assess the categorisation ability of similarity measures based on 2D fingerprints.
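The majority-vote categorisation amounts to the following rule, shown here as a minimal sketch with an illustrative function name.

```python
def classify_by_majority(yes_votes: int, total_votes: int) -> str:
    """Label a molecule-pair 'similar' only if strictly more than 50%
    of the responses were Yes; otherwise label it 'non-similar'."""
    if total_votes <= 0 or not 0 <= yes_votes <= total_votes:
        raise ValueError("invalid vote counts")
    # Integer comparison avoids any floating-point rounding at exactly 50%.
    return "similar" if 2 * yes_votes > total_votes else "non-similar"
```

With 143 respondents a tie cannot occur; with an even number of respondents, an exact 50:50 split is categorised as non-similar under this rule.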
Measurement of effectiveness
Many different types of structural representation can be used to compute inter-molecular structural similarities. The similarities here were computed with 2D fingerprints, which are widely used for this purpose since they are both simple to compute and effective in operation [26, 27]. The following types of fingerprint were generated to represent the molecules in each of the pairs: BCI (1052), Daylight (2048), ECFC4 (1024), ECFP4 (1024), MDL (166) and Unity (988), where the number in brackets is the number of elements in the fingerprint. Brief descriptions of all these types of fingerprint are provided by Gardiner et al. [28]. The Tanimoto coefficient was used to compute the similarity between the fingerprints for each of the molecules comprising a molecule-pair, using each type of fingerprint in turn. In addition, 23 computed molecular properties (such as molecular weight, logP, pKa, molar refractivity, PSA, numbers of rotatable bonds and stereocentres etc.) were computed using Pipeline Pilot to provide an additional type of structure representation. However, the results obtained using this representation were uniformly inferior to those obtained using the various 2D fingerprints, and the results and discussion hence consider only the fingerprint-based similarity data.
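For binary fingerprints, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below represents each fingerprint as the set of its on-bit positions; this representation, and the value returned for two empty fingerprints, are assumptions made for illustration. Count-based fingerprints such as ECFC4 require the generalised form of the coefficient, which is not shown.

```python
from typing import AbstractSet


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto coefficient c / (a + b - c), where a and b are the numbers
    of bits set in each fingerprint and c is the number set in both."""
    common = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - common
    if union == 0:
        return 0.0  # assumed convention when neither fingerprint has a bit set
    return common / union


# Two hypothetical fingerprints sharing two of four distinct on-bits.
similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```

The coefficient ranges from zero (no bits in common) to unity (identical fingerprints).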
For each fingerprint, the relationship between the computed similarity, s, of a molecule-pair and the probability, p, that the pair is a similar molecule-pair was modelled by a logistic regression equation of the form

ln(p / (1 − p)) = β0 + β1·s

that describes a linear relationship between the similarity and the logarithm of the odds that the molecules comprise a similar molecule-pair. The performance of the model can be assessed by observing the differences between the sets of observed and predicted values: this was done here using Nagelkerke’s R² statistic, which takes values between zero and unity (denoting a very poor fit and a perfect fit, respectively). Also, Hosmer-Lemeshow goodness-of-fit tests were used to assess the assumptions of the model (i.e., linearity at the log scale and additivity). Once the logistic regression equation for a fingerprint had been generated, it was used to compute the threshold similarity, t_LR, such that the two molecules comprising a pair are predicted to be similar (‘Yes’) if their computed similarity is ≥ t_LR (corresponding to a probability greater than or equal to 0.5 of being similar according to the logistic regression model) or predicted to be not similar (‘No’) if it is < t_LR (corresponding to a probability lower than 0.5 of being similar according to the logistic regression model).
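Because a probability of 0.5 corresponds to log-odds of zero, the threshold similarity follows directly from the fitted coefficients as t_LR = −β0/β1 (assuming β1 > 0). The sketch below illustrates this; the coefficient values are hypothetical and are not those fitted in this study.

```python
import math


def prob_similar(s: float, beta0: float, beta1: float) -> float:
    """Probability that a pair with computed similarity s is a similar
    molecule-pair under the logistic model ln(p/(1-p)) = beta0 + beta1*s."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * s)))


def threshold_lr(beta0: float, beta1: float) -> float:
    """Similarity at which the predicted probability equals 0.5."""
    if beta1 <= 0:
        raise ValueError("expected a positive slope (similarity raises the odds)")
    return -beta0 / beta1


def predict(s: float, beta0: float, beta1: float) -> str:
    """'Yes' if s >= t_LR, otherwise 'No'."""
    return "Yes" if s >= threshold_lr(beta0, beta1) else "No"


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
BETA0, BETA1 = -6.0, 12.0
t_lr = threshold_lr(BETA0, BETA1)  # -(-6.0) / 12.0 = 0.5
```

The same functions cover the use suggested in the conclusions: given the computed similarity between a submitted molecule and an existing orphan drug, `prob_similar` returns the model's estimate of the probability that experts would regard the pair as similar.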
These performance statistics were computed as the value of t was systematically varied, so as to determine t_ROC, i.e., the threshold similarity that resulted in the best overall predictive performance. The best level of performance was taken to be that threshold similarity which resulted in the maximum values for the precision, the accuracy, the F index, the Youden index and the Matthews coefficient whilst maintaining acceptable values of the sensitivity and specificity.
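The threshold scan can be sketched as follows, using the standard definitions of the statistics named above. As a simplifying assumption, the sketch selects the threshold that maximises a single statistic (the Youden index by default) rather than weighing all of them together as was done in the study; the function names and the toy data are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion(sims: Sequence[float], labels: Sequence[bool],
              t: float) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN) when pairs with similarity >= t are predicted similar."""
    tp = fp = tn = fn = 0
    for s, is_similar in zip(sims, labels):
        if s >= t:
            if is_similar:
                tp += 1
            else:
                fp += 1
        elif is_similar:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def performance(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification statistics; undefined ratios are set to 0."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    sens = ratio(tp, tp + fn)
    spec = ratio(tn, tn + fp)
    prec = ratio(tp, tp + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "precision": prec,
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "f_index": ratio(2 * prec * sens, prec + sens),
        "youden": sens + spec - 1.0,
        "matthews": ratio(tp * tn - fp * fn,
                          math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


def best_threshold(sims: Sequence[float], labels: Sequence[bool],
                   key: str = "youden") -> float:
    """Scan the observed similarities as candidate thresholds and return the
    one that maximises the chosen statistic (lowest threshold wins ties)."""
    candidates = sorted(set(sims))
    return max(candidates, key=lambda t: performance(*confusion(sims, labels, t))[key])


# Toy data: six hypothetical pairs, the three most similar judged similar.
toy_sims = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
toy_labels = [False, False, False, True, True, True]
t_roc = best_threshold(toy_sims, toy_labels)
```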
The predictive power of the models derived from the similarity data for the 100 DrugBank molecule-pairs could have been assessed by means of cross validation experiments using that training-set, as is often done in SAR and QSAR studies. It is generally considered better, however, to use a distinct test-set that has not been involved in the training [30, 31], and this was accomplished here using data kindly provided by the CHMP that typifies their regular work-load. Specifically, the test-set contained 100 molecule-pairs in which one molecule was an existing orphan drug for some specific rare disease and the other was a molecule that had been submitted to the CHMP for consideration for orphan drug status for that disease.
It should be noted that the test-set differs from the training-set in two principal ways. First, of the 100 molecule-pairs provided by the CHMP, 89 had been judged to be non-similar pairs and only 11 to be similar pairs, whereas the training-set contained near-equal numbers of the two types of molecule-pair. This is not unexpected given that companies are unlikely to submit for consideration molecules that are obviously closely related to existing orphan drugs for some disease. Second, the natures of the molecules involved are different. It is not possible to provide structural data analogous to that presented in Additional file 1: Table S1 given the highly confidential nature of the application process. However, some broad characteristics of the test-set are as follows. There were 51 distinct molecules in the test-set, since many of the molecule-pairs resulted from the comparison of an existing orphan drug with several different molecules that had sought authorisation for the same rare disease. Just five pharmacological classes were represented: there were two immunosuppressants, two respiratory system compounds, three antimicrobials, ten pulmonary arterial hypertension compounds, and no fewer than 34 antineoplastic compounds (reflecting the fact that much current orphan drug research focuses on therapies for rare types of cancer). The compounds were notably larger than those in the training-set, as demonstrated in Table 3, with 27.5% of them not being Lipinski-compliant.
We thank the following: Dr. Jean-Louis Robert, Dr. Hilde Boone, Dr. Yoshikazu Hayash and Dr. Lin-Chau Chang for their support and help in networking with the international experts; all of the experts who participated in the similarity survey; and Prof. Martin Posh and Dr. Luis Pinheiro for statistical advice.
1. Morgan S, Grootendorst P, Lexchin J, Cunningham C, Greyson D: The cost of drug development: a systematic review. Health Policy. 2011, 100: 4-17. 10.1016/j.healthpol.2010.12.002.
2. Meekings KN, Williams CSM, Arrowsmith JE: Orphan drug development: an economically viable strategy for biopharma R&D. Drug Discov Today. 2012, 17: 660-664. 10.1016/j.drudis.2012.02.005.
3. Melnikova I: Rare diseases and orphan drugs. Nat Rev Drug Discov. 2012, 11: 267-268. 10.1038/nrd3654.
4. Tambuyzer E: Rare diseases, orphan drugs and their regulation: questions and misconceptions. Nat Rev Drug Discov. 2010, 9: 921-928. 10.1038/nrd3275.
5. Westermark K, Holm BB, Söderholm M, Llinares-Garcia J, Rivière F, Aarum S, Butlen-Ducuing F, Tsigkos S, Wilk-Kachlicka A, N’Diamoi C, Borvendég J, Lyons D, Sepodes B, Bloechl-Daum B, Lhoir A, Todorova M, Kkolos I, Kubáčková K, Bosch-Traberg H, Tillmann V, Saano V, Héron E, Elbers R, Siouti M, Eggenhofer J, Salmon P, Clementi M, Krieviņš D, Matulevičiene A, Metz H, et al: European regulation on orphan medicinal products: 10 years of experience and future perspectives. Nat Rev Drug Discov. 2011, 10: 341-349.
6. Lev D, Thorat C, Phillips I, Thomas M, Imoisili MA: The routes to orphan drug designation - our recent experience at the FDA. Drug Discov Today. 2012, 17: 97-99. 10.1016/j.drudis.2011.12.014.
7. Franco P: Orphan drugs: the regulatory environment. Drug Discov Today. 2013, 18: 163-172. 10.1016/j.drudis.2012.08.009.
8. Simmons S, Estes Z: Individual differences in the perception of similarity and difference. Cognition. 2008, 108: 781-795. 10.1016/j.cognition.2008.07.003.
9. Kutchukian PS, Vasilyeva NY, Xu J, Lindvall MK, Dillon MP, Glick M, Coley JD, Brooijmans N: Inside the mind of a medicinal chemist: the role of human bias in compound prioritization during drug discovery. PLOS One. 2012, 7: e48476. 10.1371/journal.pone.0048476.
10. Lajiness M, Maggiora G, Shanmugasundaram V: Assessment of the consistency of medicinal chemists in reviewing sets of compounds. J Med Chem. 2004, 47: 4891-4896. 10.1021/jm049740z.
11. Hack MD, Rassokhin DN, Buyck C, Seierstad M, Skalkin A, ten Holte P, Jones TK, Mirzadegan T, Agrafiotis DK: Library enhancement through the wisdom of crowds. J Chem Inf Model. 2011, 51: 3275-3286. 10.1021/ci200446y.
12. Oprea TI, Bologa CG, Boyer S, Curpan RF, Glen RC, Hopkins AL, Lipinski CA, Marshall GR, Martin YC, Ostopovici-Halip L, et al: A crowdsourcing evaluation of the NIH chemical probes. Nature Chemical Biology. 2009, 5: 441-447. 10.1038/nchembio0709-441.
13. Boda K, Seidel T, Gasteiger J: Structure and reaction based evaluation of synthetic accessibility. J Comput Aided Mol Des. 2007, 21: 311-325. 10.1007/s10822-006-9099-2.
14. Bonnet P: Is chemical synthetic accessibility computationally predictable for drug and lead-like molecules? A comparative assessment between medicinal and computational chemists. Eur J Med Chem. 2012, 54: 679-689.
15. Markey K: Inter-indexer consistency tests. Libr Inf Sci Res. 1984, 6: 155-177.
16. Ellis D, Furner-Hines J, Willett P: On the creation of hypertext links in full-text documents: measurement of inter-linker consistency. J Doc. 1994, 50: 67-98. 10.1108/eb026925.
17. Iivonen M: Consistency in the selection of search concepts and search terms. Inf Process Manage. 1995, 31: 173-190. 10.1016/0306-4573(95)80034-Q.
18. Tijssen RJW: A scientometric cognitive study of neural network research: expert mental maps versus bibliometric maps. Scientometrics. 1993, 28: 111-136. 10.1007/BF02016288.
19. Adamson GW, Bush JA: A comparison of the performance of some similarity and dissimilarity measures in the automatic classification of chemical structures. J Chem Inf Comput Sci. 1975, 15: 55-58. 10.1021/ci60001a016.
20. Ripphausen P, Nisius B, Bajorath J: State-of-the-art in ligand-based virtual screening. Drug Discov Today. 2011, 16: 372-376. 10.1016/j.drudis.2011.02.011.
21. Todeschini R, Consonni V, Xiang H, Holliday JD, Buscema M, Willett P: Similarity coefficients for binary chemoinformatics data: overview and extended comparison using simulated and real datasets. J Chem Inf Model. 2012, 52: 2884-2901. 10.1021/ci300261r.
22. Willett P: Combination of similarity rankings using data fusion. J Chem Inf Model. 2013, 53: 1-10. 10.1021/ci300547g.
23. Manley PW, Stiefl N, Cowan-Jacob SW, Kaufman S, Mestan J, Wartmann M, Wiesmann M, Woodman R, Gallagher N: Structural resemblances and comparisons of the relative pharmacological properties of imatinib and nilotinib. Bioorg Med Chem. 2010, 18: 6977-6986. 10.1016/j.bmc.2010.08.026.
24. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, et al: DrugBank 3.0: a comprehensive resource for ‘omics’ research on drugs. Nucleic Acids Res. 2011, 39: D1035-D1041. 10.1093/nar/gkq1126.
25. Rogers D, Hahn M: Extended-connectivity fingerprints. J Chem Inf Model. 2010, 50: 742-754. 10.1021/ci100050t.
26. Willett P: Similarity methods in chemoinformatics. Ann Rev Inf Sci Technol. 2009, 43: 3-71.
27. Stumpfe D, Bajorath J: Similarity searching. Wiley Interdiscip Rev: Comput Mol Sci. 2011, 1: 260-282. 10.1002/wcms.23.
28. Gardiner EJ, Gillet VJ, Haranczyk M, Hert J, Holliday JD, Malim N, Patel Y, Willett P: Turbo similarity searching: effect of fingerprint and dataset on virtual-screening performance. Stat Anal Data Min. 2009, 2: 103-114. 10.1002/sam.10037.
29. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW: Assessing the performance of prediction models. A framework for traditional and novel measures. Epidemiology. 2010, 21: 121-138.
30. Golbraikh A, Tropsha A: Beware of q2!. J Mol Graph Model. 2002, 20: 269-276. 10.1016/S1093-3263(01)00123-1.
31. Gramatica P: Principles of QSAR models validation: internal and external. QSAR & Combinatorial Science. 2007, 26: 694-701. 10.1002/qsar.200610151.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.