Stereoselective virtual screening of the ZINC database using atom pair 3D-fingerprints
© Awale et al.; licensee Springer. 2015
Received: 20 November 2014
Accepted: 19 December 2014
Published: 10 February 2015
Tools to explore large compound databases in search of analogs of query molecules provide strategically important support in drug discovery by helping to identify available analogs of any given reference or hit compound through ligand based virtual screening (LBVS). We recently showed that large databases can be formatted for very fast searching with various 2D-fingerprints using the city-block distance as similarity measure, in particular a 2D-atom pair fingerprint (APfp) and the related category extended atom pair fingerprint (Xfp), which efficiently encode molecular shape and pharmacophores but do not perceive stereochemistry. Here we investigated related 3D-atom pair fingerprints to enable rapid stereoselective searches in the ZINC database (23.2 million 3D structures).
Molecular fingerprints counting atom pairs at increasing through-space distance intervals were designed using either all atoms (16-bit 3DAPfp) or different atom categories (80-bit 3DXfp). These 3D-fingerprints retrieved molecular shape and pharmacophore analogs (defined by OpenEye ROCS scoring functions) of 110,000 compounds from the Cambridge Structural Database with equal or better accuracy than the 2D-fingerprints APfp and Xfp, and showed comparable performance in recovering actives from decoys in the DUD database. LBVS by 3DXfp or 3DAPfp similarity was stereoselective and gave very different analogs when starting from different diastereomers of the same chiral drug. Results were also different from LBVS with the parent 2D-fingerprints Xfp or APfp. 3D- and 2D-fingerprints also gave very different results in LBVS of folded molecules where through-space distances between atom pairs are much shorter than topological distances.
3DAPfp and 3DXfp are suitable for stereoselective searches for shape and pharmacophore analogs of query molecules in large databases. Web-browsers for searching ZINC by 3DAPfp and 3DXfp similarity are accessible at www.gdb.unibe.ch and should provide useful assistance to drug discovery projects.
Keywords: Virtual screening; Chemical space; Databases; Fingerprints; Atom pairs; Molecular shape; Pharmacophores; Stereoselectivity
Tools to explore large compound databases in search of analogs of query molecules provide strategically important support for drug discovery and development projects by helping to identify available analogs of any given reference or hit compound through ligand based virtual screening (LBVS) [1-3]. While public compound databases such as ChEMBL or ZINC offer similarity searching on their websites, options are limited to a single type of 2D-substructure similarity comparison, and performance is limited in terms of speed and number of analogs retrieved. Recently we reported a series of interactive database browsers, accessible at www.gdb.unibe.ch, allowing molecular fingerprint based LBVS within seconds in very large databases of millions of compounds such as ZINC (13.2 M commercially available drug-like molecules), PubChem (53.2 M structures collected from public sources) [7,8], or the much larger Chemical Universe Databases GDB-11 (26.4 M), GDB-13 (977 M) and GDB-17 (166.4 G), which enumerate all possible organic molecules with up to 11, 13 and 17 atoms following simple rules of chemical stability and synthetic feasibility [9-13]. Fast LBVS was made possible by using the sum of fingerprint bit values as a hash function and the city-block distance as the fingerprint similarity measure, an approach applicable to scalar fingerprints such as MQN (Molecular Quantum Numbers) and SMIfp (SMILES fingerprint), and to binary fingerprints such as the Daylight-type substructure fingerprint Sfp and the extended connectivity fingerprint ECFP4.
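The bit-sum hashing mentioned above rests on a simple bound: for integer fingerprints a and b, |sum(a) - sum(b)| <= sum(|a_i - b_i|), so a stored fingerprint can lie within city-block radius r of the query only if its bit-value sum lies within r of the query's. The following is a minimal Python sketch of that idea, assuming in-memory integer fingerprints; the `BitSumIndex` class and its methods are hypothetical illustrations, not the code behind the gdb.unibe.ch browsers.

```python
from collections import defaultdict


def city_block(a, b):
    """City-block (Manhattan, L1) distance between two equal-length integer fingerprints."""
    return sum(abs(x - y) for x, y in zip(a, b))


class BitSumIndex:
    """Hypothetical in-memory index that buckets fingerprints by the sum of their bit values."""

    def __init__(self):
        self.buckets = defaultdict(list)  # bit-value sum -> [(compound id, fingerprint), ...]

    def add(self, cid, fp):
        self.buckets[sum(fp)].append((cid, fp))

    def search(self, query, radius):
        """Return sorted (distance, id) pairs for all fingerprints within `radius` of `query`.

        |sum(a) - sum(b)| <= city_block(a, b), so only buckets whose key lies within
        `radius` of the query's bit-value sum can hold hits; all others are skipped.
        """
        s = sum(query)
        hits = []
        for key in range(s - radius, s + radius + 1):
            for cid, fp in self.buckets.get(key, []):
                d = city_block(query, fp)
                if d <= radius:
                    hits.append((d, cid))
        return sorted(hits)


idx = BitSumIndex()
idx.add("A", [3, 0, 2])
idx.add("B", [3, 1, 2])
idx.add("C", [9, 9, 9])            # bit sum 27: its bucket is never opened for the query below
print(idx.search([3, 0, 2], 2))    # [(0, 'A'), (1, 'B')]
```

A k-nearest-neighbor query could be layered on top by widening the radius until enough hits are found; this is one possible design, not necessarily the one used by the published browsers.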
Due to the importance of 3D-molecular shape and pharmacophores in determining the bioactivity [17-25] and clinical success of small molecule drugs, we recently expanded our city-block distance based search algorithm to the topological atom pair fingerprints APfp (20-bit atom pair fingerprint, all heavy atoms without categories) and Xfp (55-bit category extended atom pair fingerprint). These fingerprints count the number of atom pairs at increasing topological distance, measured in bonds along the shortest path, following a concept originally reported by Carhart et al. We showed that these fingerprints encode 3D-features of molecules in various enrichment studies for 3D-shape, 3D-pharmacophore, and bioactive analogs.
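To make the topological distance concrete, here is a minimal sketch of an APfp-like count: a breadth-first search gives the number of bonds along the shortest path between every pair of heavy atoms, and pairs are tallied at 1 to 20 bonds. The adjacency-list input and the x100/HAC scaling with rounding (borrowed from the R3DAPfp recipe in the Experimental section) are assumptions for illustration; `apfp_sketch` is a hypothetical name, not the authors' implementation.

```python
from collections import deque


def topological_distances(adjacency, start):
    """Bonds along the shortest path from `start` to every reachable atom (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nb in adjacency[atom]:
            if nb not in dist:
                dist[nb] = dist[atom] + 1
                queue.append(nb)
    return dist


def apfp_sketch(adjacency, max_bonds=20):
    """APfp-like counts of heavy-atom pairs at 1..max_bonds bonds, scaled by 100/HAC and rounded."""
    hac = len(adjacency)
    counts = [0] * max_bonds
    for a in range(hac):
        for b, d in topological_distances(adjacency, a).items():
            if a < b and 1 <= d <= max_bonds:     # count each unordered pair once
                counts[d - 1] += 1
    return [round(100 * c / hac) for c in counts]


butane = [[1], [0, 2], [1, 3], [2]]       # heavy-atom graph of n-butane
print(apfp_sketch(butane)[:4])            # [75, 50, 25, 0]
```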
APfp and Xfp were computed from the 2D-structure only. Considering that the 3D-structure of molecules is now available in several large databases such as the Cambridge Structural Database (CSD, experimental X-ray crystal structure) or the collated catalogs of all commercial compounds (ZINC, predicted 3D-structures), it should also be possible to compute a related 3D-atom pair fingerprint considering through-space rather than topological distances between atoms and subsequently organize large databases for fast LBVS. Such 3D-fingerprints should represent the actual 3D-shape more closely than 2D-fingerprints, and enable stereoselective LBVS by distinguishing between different conformers and stereoisomers of the same molecule, which is not possible with 2D-fingerprints.
Fingerprints used in this study
Fingerprint | Description
3DAPfp | 16-bit scalar 3D-fp; each bit is the sum of atom pair Gaussian function values sampled at one of 16 different through-space distances between 1 and 20 Å, normalized to $\mathrm{HAC}^{1.5}$
3DXfp | 80-bit scalar 3D-fp; equivalent to 3DAPfp extended to 5 categories: Hyb, HBA, HBD, sp2, and cross-pair HBA-HBD
R3DAPfp | 40-bit scalar 3D-fp; each bit counts the number of atom pairs within the corresponding 0.5 Å through-space distance interval between 0 and 20 Å, normalized to HAC (R = regular binning)
R3DXfp | 200-bit scalar 3D-fp; category extended version of R3DAPfp
APfp | 20-bit scalar 2D-fp; each bit counts the number of atom pairs at one particular topological distance between 1 and 20 bonds, normalized to HAC
Xfp | 55-bit scalar 2D-fp; category extended version of APfp
PMIfp | 3-bit scalar 3D-fp; principal moments of inertia scaled to molecular weight
USR | 12-bit scalar 3D-fp; Euclidean distance distributions calculated with respect to four chosen reference points, each represented by three statistical moments: average, standard deviation and skewness
USRCAT | 60-bit scalar 3D-fp; version of USR extended with categories: all atoms, Hyb, HBA, HBD, aromatic atoms
MQN | 42-bit scalar 2D-fp; 42 Molecular Quantum Numbers (MQN) counting atom types, bond types, polar groups and topologies
Sfp | 1024-bit binary 2D-fp; perceives the presence of substructures
In a first study, 3D-shape and pharmacophore analogs of 110,000 molecules from the Cambridge Structural Database (CSD) were defined using the Rapid Overlay of Chemical Structures (ROCS) similarity functions ROCS Shape Tanimoto (shape only), ROCS Color Tanimoto (pharmacophore only), and ROCS Comboscore (combined shape and pharmacophore) [18,38,39]. Fingerprint based LBVS for these analogs showed that the very compact, 16-bit shape-only fingerprint 3DAPfp performed best among all fingerprints for recovering ROCS Shape and Comboscore analogs. 3DAPfp performed better than its 2D parent fingerprint APfp, in particular with molecules presenting a folded conformation in their crystal structure. On the other hand, 3DXfp performed best for recovering pharmacophore (ROCS Color) analogs from CSD. In a second study, recovering actives from decoys in the Directory of Useful Decoys (DUD), a widely used benchmark for virtual screening methods [40-44], 3DXfp again performed better than 3DAPfp but only comparably to its parent 2D-fingerprint Xfp, an effect which might be related to the predominantly rod-like or flat, 2-dimensional shapes of the molecules in DUD and ZINC.
Remarkably, the 3D-fingerprints were stereoselective: differences between conformers and stereoisomers of the same molecule were significant compared to those between different molecules of similar size. A third study was therefore performed in which the 3D-fingerprints were used for LBVS starting from different diastereomers of chiral drugs. Both 3DXfp and 3DAPfp gave very different nearest neighbors for different diastereomers, which also differed from the nearest neighbors obtained with the parent 2D-fingerprints Xfp or APfp, highlighting the impact of stereochemistry on LBVS. 3D-fingerprints also returned different nearest neighbors than 2D-fingerprints when searching for analogs of folded molecules identified as bound ligands in the Protein Data Bank. 3DAPfp and 3DXfp were used to design web-browsers for the 23.2 million 3D-structures in the ZINC database, which are freely available at www.gdb.unibe.ch. Stereoselective LBVS of 3D-structures in ZINC should provide useful assistance for drug discovery projects.
Results and discussion
Fingerprint design and optimization
The performance of the 3D-atom pair fingerprints 3DAPfp, R3DAPfp, 3DXfp and R3DXfp was evaluated in the analog enrichment studies discussed below. In the course of these studies, parameter variations were examined to challenge the design of 3DAPfp and 3DXfp; they confirmed that the selected width of the atom pair Gaussian (18% of the atom pair distance) and the multiplication factor between successive sampling distances (1.18) were optimal. For the regular binning fingerprints R3DAPfp and R3DXfp, optimal overall results were obtained with a 0.5 Å bin width, although broader but fewer bins gave slightly better results for recovering 3D-shape and pharmacophore analogs, and narrower but more numerous bins gave slightly better results in the DUD enrichment studies (Additional file 1: Figures S1-S3).
LBVS in the Cambridge structural database
Atom pair fingerprints performed significantly better than USR, USRCAT and PMIfp in these comparisons, probably reflecting the more detailed encoding of molecular shape through atom pair counts compared to the more global shape parameters encoded in USR, USRCAT and PMIfp. The very compact 16-bit shape fingerprint 3DAPfp stood out by its high LBVS performance for ROCS Shape analogs, which was higher than that of R3DAPfp and the parent 2-dimensional APfp, suggesting that the Gaussian/exponential binning principle used for 3DAPfp contributes to a better perception of molecular shape (Figure 2A). The atom category extended fingerprint 3DXfp showed higher performance than 3DAPfp for recovering ROCS Color Tanimoto analogs, in line with the fact that ROCS Color primarily encodes pharmacophores. However, in this case results with 3DXfp were comparable to those with R3DXfp and the parent 2D-fingerprint Xfp, independent of the position in the shape triangle (Figure 2B). Recovery of ROCS Comboscore analogs was most efficient using 3DAPfp, suggesting that this ROCS scoring function, which combines shape and pharmacophores, is dominated by molecular shape (Figure 2C).
DUD enrichment studies
The various 3D atom pair fingerprints readily retrieved scaffold-hopping analogs, i.e. compounds with high shape and pharmacophore similarity and similar bioactivity but low substructure similarity as measured with the substructure fingerprint Sfp. Examples of scaffold-hopping analogs among DUD actives retrieved by 3DXfp are shown in Additional file 1: Figure S8. Similar scaffold-hopping capabilities were reported previously for MQN, APfp and Xfp, and generally occur with fingerprints that do not take detailed substructures into account.
It should be noted that most molecules in DUD and ZINC are rod-like or at best 2-dimensional, with only very few 3D-shaped molecules (Figure 4E/F). The very low shape diversity in these databases might partly explain the similar LBVS performance of 3D and 2D methods on DUD, which has also been noted in previous literature reports [18,33,41,42,46-49].
A distinctive feature of 3D-scoring functions and fingerprints is their ability to distinguish between different stereoisomers and conformers of the same molecule. Indeed, the 3D-fingerprints investigated here distinguished between various stereoisomers and conformers of the model cases 4,5-dihydroxy-octa-2,6-diyne (2 enantiomers and one meso form, 9 conformers), glucopyranose (32 possible stereoisomeric hexopyranoses, 154 conformers) and arachidonic acid ((5Z,8Z,11Z,14Z)-5,8,11,14-eicosatetraenoic acid, 16 possible E/Z double bond isomers, 640 conformers). However, they lacked chiral sense information and did not differentiate between mirror-image conformers, a distinction offered by ROCS scoring functions computed from overlays of chiral 3D-structures (Additional file 1: Figure S9).
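The lack of chiral sense follows directly from how these fingerprints are built: they depend only on interatomic distances, and a reflection preserves every distance. The short Python sketch below, using made-up coordinates rather than structures from the study, checks this for a toy chiral point set and its mirror image, and contrasts it with two conformers of a four-atom chain whose distance lists do differ. The helper names are hypothetical.

```python
import itertools
import math


def pair_distances(coords):
    """Sorted list of all through-space atom pair distances (the raw input of a 3D atom pair fingerprint)."""
    return sorted(math.dist(p, q) for p, q in itertools.combinations(coords, 2))


def mirror(coords):
    """Reflect every atom through the yz-plane (x -> -x), giving the mirror-image arrangement."""
    return [(-x, y, z) for x, y, z in coords]


# Toy chiral center: four substituents at four different distances (made-up coordinates).
chiral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.1, 0.0), (0.0, 0.0, 1.2), (-0.7, -0.7, -0.7)]

# Two made-up conformers of a four-atom chain: extended (anti-like) and folded back (syn-like).
anti = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (3.5, 1.4, 0.0)]
syn = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (0.5, 1.4, 0.0)]

# Mirror images give identical distance lists, so no distance-only fingerprint can separate them.
print(pair_distances(chiral) == pair_distances(mirror(chiral)))   # True
# Different conformers give different distance lists, so they can be distinguished.
print(pair_distances(anti) == pair_distances(syn))                 # False
```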
LBVS with folded molecules
3D-fingerprints should behave differently from 2D-fingerprints in LBVS with folded molecules, where through-space distances determining molecular shape are much shorter than topological distances (e.g. compounds 1–4, Figure 3). To illustrate this point, 10 ligands bound to their target protein in a folded conformation were identified by searching the Protein Data Bank for small molecules with a very low correlation coefficient between the through-space distances of atom pairs in the 3D-structure of the bound conformer and the corresponding topological distances in the parent 2D-structure (Additional file 1: Figure S10). In all 10 cases, similarly folded conformations were generated by the OpenEye Omega 3D-builder (with which the 3D-structures in ZINC were computed), indicating that folding was intrinsic to the molecules and not induced by protein binding.
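The folding criterion described above (a low correlation between through-space and topological atom pair distances) can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions: Pearson's r is assumed as the correlation coefficient, the function names are hypothetical, and the six-atom chain and its two geometries are made up; the threshold actually used for the PDB search is not given in the text.

```python
import itertools
import math
from collections import deque


def topological_distances(adjacency, start):
    """Number of bonds on the shortest path from `start` to every atom (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nb in adjacency[atom]:
            if nb not in dist:
                dist[nb] = dist[atom] + 1
                queue.append(nb)
    return dist


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def folding_correlation(adjacency, coords):
    """Correlate through-space and topological distances over all atom pairs.

    Values near 1 indicate an extended conformer; low values indicate folding.
    """
    topo = [topological_distances(adjacency, a) for a in range(len(coords))]
    t, s = [], []
    for a, b in itertools.combinations(range(len(coords)), 2):
        t.append(topo[a][b])
        s.append(math.dist(coords[a], coords[b]))
    return pearson(t, s)


chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]              # six-atom linear chain
extended = [(1.5 * i, 0.0, 0.0) for i in range(6)]              # straight line, made-up geometry
hairpin = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0),
           (3.0, 1.5, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 0.0)]   # folded back on itself

print(round(folding_correlation(chain, extended), 3))   # 1.0
print(round(folding_correlation(chain, hairpin), 3))    # much lower for the folded geometry
```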
3DXfp and 3DAPfp browsers
Search times for retrieving 1000 nearest neighbors with the browsers are approximately 16 ± 10 s for 3DAPfp and 43 ± 17 s for 3DXfp, depending on molecule size and the availability of closely related analogs in ZINC; data transfer times via the internet connection must be added to these. The search results are limited to a maximum of 1000 molecules to avoid stalling the web browser. The results are displayed as a molecule matrix indicating for each molecule the city-block distance to the query and the ZINC ID number (Figure 7C). For each result molecule, a link is available to view its entry in the parent ZINC database. The interactive browsers provide a straightforward method to rapidly interrogate ZINC for 3D-shape and 3D-pharmacophore analogs of any molecule of interest.
Extending the work of Sheridan et al., geometric atom pair fingerprints counting atom pairs at increasing through-space distances, either for all heavy atoms or extended with atom categories, were designed considering either fuzzy atom pairs binned into increasing distance intervals (3DAPfp and 3DXfp) or direct binning of the exact atom pair distance in 0.5 Å intervals (R3DAPfp and R3DXfp). These 3D fingerprints were compared in LBVS performance with other 3D-fingerprints (PMIfp, USR and USRCAT), the corresponding topological atom pair fingerprints APfp and Xfp, and MQN and Sfp as reference 2D-fingerprints. LBVS performance was assessed in enrichment studies for ROCS shape and pharmacophore analogs in CSD, and in the recovery of actives from decoys in DUD and from ZINC. The data showed that 3DAPfp was the best fingerprint for representing 3D-shape as measured by the ROCS Shape Tanimoto and Comboscore scoring functions, in particular surpassing its parent 2D-fingerprint APfp. On the other hand, 3DXfp surpassed 3DAPfp for LBVS of ROCS pharmacophore analogs and DUD actives, although its performance was comparable to that of its parent 2D-fingerprint Xfp.
LBVS with 3DXfp and 3DAPfp was stereoselective, leading to very different nearest neighbors from diastereomeric drugs as query molecules. LBVS results with 3DXfp and 3DAPfp were themselves different from nearest neighbors retrieved using the 2D-fingerprints Xfp and APfp. 3D- and 2D-fingerprints also retrieved substantially different molecules as nearest neighbors of folded molecules, for which through-space distances between atom pairs are much shorter than topological distances. An interactive browser was assembled for searching the 23.2 million 3D-structures in the ZINC database by 3DAPfp and 3DXfp similarity, which is accessible at www.gdb.unibe.ch. Such a web-browser for stereoselective LBVS of ZINC should provide useful assistance to drug discovery projects.
ZINC (https://docking.org/) and DUD (http://dud.docking.org/) databases were downloaded in SDF format from the respective database websites. The 3D-structures in ZINC are lowest energy conformers (one conformer per molecule) calculated with Omega. The Cambridge Structural Database (CSD) was copied from a CD licensed to Dr. Jürg Hauser, University of Bern. All calculations were performed on the 3D structural information available in the downloaded SDF files. Counter ions were removed and the ionization states of molecules were adjusted to pH 7.4 using an in-house Java program built on the Java Chemistry library (JChem) from ChemAxon, Ltd. For CSD, compounds with up to 50 heavy atoms (~110 k) were considered in the present study. If a compound was present as a complex of several fragments, only one of the largest fragments was retained.
3D atom pair fingerprints
Computation of 3DAPfp, 3DXfp and all the other fingerprints was carried out using an in-house Java program utilizing various plugins of the Java Chemistry library (JChem) from ChemAxon, Ltd.
The 40-bit R3DAPfp was constructed as follows: for each atom pair AB in the molecule, an increment of 1 was added to the bit of the 0.5 Å interval containing the atom pair distance $d_{AB}$ between 0 and 20 Å. The summed bit values were divided by HAC (heavy atom count), multiplied by 100, and rounded to the integer value. Rounding reduces the data size for storage and has no significant influence on LBVS results. For the 200-bit R3DXfp, atoms were assigned to one or more of the following four categories: hydrophobic (Hyb), hydrogen bond donor (HBD), hydrogen bond acceptor (HBA) and planar (sp2), and R3DAPfp was computed within each of the four same-category pairs (Hyb-Hyb, HBA-HBA, HBD-HBD, sp2-sp2) and for the HBA-HBD cross-pairs, normalized to HBA.
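The R3DAPfp construction just described maps directly onto a short loop. Below is a minimal Python sketch assuming heavy-atom coordinates in Å are already available as tuples; the function name, the half-open bin convention and the handling of pairs beyond 20 Å are assumptions, and the category-extended R3DXfp would repeat the same loop over the atom subsets listed above.

```python
import itertools
import math


def r3dapfp_sketch(coords, bin_width=0.5, max_dist=20.0):
    """Regular-binning 3D atom pair fingerprint following the R3DAPfp recipe described above.

    Assumptions: bins are half-open [k*w, (k+1)*w) and pairs at or beyond `max_dist` are ignored.
    """
    n_bins = round(max_dist / bin_width)            # 40 bins for 0.5 Å up to 20 Å
    counts = [0] * n_bins
    for p, q in itertools.combinations(coords, 2):   # every unordered heavy-atom pair once
        d = math.dist(p, q)
        if d < max_dist:
            counts[int(d // bin_width)] += 1
    hac = len(coords)                                # heavy atom count
    return [round(100 * c / hac) for c in counts]


# Three atoms with pair distances 1.2, 1.7 and about 2.08 Å (made-up coordinates).
fp = r3dapfp_sketch([(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.7, 0.0)])
print(len(fp), fp[:6])   # 40 [0, 0, 33, 33, 33, 0]
```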
The 16-bit 3DAPfp was constructed as follows: for each atom pair AB in the molecule, a Gaussian function centered at the atom pair distance $d_{AB}$ with a width of $0.18 \times d_{AB}$ was generated, and the function was sampled at 1.45, 1.71, 2.02, 2.38, 2.81, 3.32, 3.91, 4.62, 5.45, 6.43, 7.59, 8.96, 10.57, 12.47, 14.71 and 17.36 Å (16 bit values at $d_{n+1} = 1.18 \times d_n$). For each of the 16 bits, values were summed across all atom pairs, and the sum was divided by $\mathrm{HAC}^{1.5}$, multiplied by 100, and rounded to the integer value. For the 80-bit 3DXfp, 3DAPfp was computed analogously within each of the atom type categories (see R3DXfp above).
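The Gaussian sampling scheme can likewise be sketched in Python. Two points are assumptions because the text does not pin them down: the Gaussian is taken to be unnormalized (peak value 1) with its standard deviation equal to the stated width of 0.18 × d_AB, and each atom pair is enumerated once. The sampling grid is generated from the stated rule (start at 1.45 Å, multiply by 1.18) and reproduces the 16 listed distances after rounding to two decimals. Function names are hypothetical.

```python
import itertools
import math

# 16 sampling distances: 1.45 Å multiplied by successive powers of 1.18 (about 1.45 ... 17.36 Å).
SAMPLE_POINTS = [1.45 * 1.18 ** k for k in range(16)]


def gaussian(x, center, width):
    """Assumed form: unnormalized Gaussian (peak value 1) with standard deviation `width`."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def dapfp_sketch(coords, rel_width=0.18):
    """3DAPfp-like fingerprint: fuzzy atom pairs sampled at geometrically spaced distances.

    Assumes no two atoms coincide (a zero distance would give a zero-width Gaussian).
    """
    sums = [0.0] * len(SAMPLE_POINTS)
    for p, q in itertools.combinations(coords, 2):
        d_ab = math.dist(p, q)
        width = rel_width * d_ab                    # the Gaussian widens with the pair distance
        for i, x in enumerate(SAMPLE_POINTS):
            sums[i] += gaussian(x, d_ab, width)
    hac = len(coords)
    return [round(100 * s / hac ** 1.5) for s in sums]


print([round(x, 2) for x in SAMPLE_POINTS])              # the 16 distances listed above
fp = dapfp_sketch([(0.0, 0.0, 0.0), (3.32, 0.0, 0.0)])   # a single atom pair at 3.32 Å
print(fp.index(max(fp)))                                 # 5, the 3.32 Å sampling point
```

Because the width grows with distance, short pairs produce sharp peaks and long pairs broad ones, so the geometric spacing of the sampling points keeps roughly the same number of samples under each peak.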
MQN and Sfp
MQN was calculated using the previously reported Java source code (freely available at www.gdb.unibe.ch) [7,12]. For the substructure fingerprint Sfp, a Daylight-type 1024-bit hashed fingerprint with a path length of 7 was computed using the JChem library.
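For readers unfamiliar with hashed path fingerprints, a much-simplified sketch of the idea follows: enumerate linear atom paths of up to 7 bonds, turn each into a direction-independent string, and hash it into one of 1024 bits. This is not JChem's algorithm; bond orders, aromaticity and the multiple bits per path set by real Daylight-style implementations are ignored, SHA-1 is an arbitrary deterministic hash chosen for illustration, and all names are hypothetical.

```python
import hashlib


def linear_paths(adjacency, max_bonds=7):
    """Yield every simple (non-repeating) path of 0..max_bonds bonds as a tuple of atom indices."""
    def extend(path):
        yield path
        if len(path) - 1 < max_bonds:
            for nb in adjacency[path[-1]]:
                if nb not in path:
                    yield from extend(path + (nb,))
    for start in range(len(adjacency)):
        yield from extend((start,))


def path_fingerprint(labels, adjacency, n_bits=1024, max_bonds=7):
    """Set of on-bits for a simplified Daylight-style hashed path fingerprint (one bit per path)."""
    bits = set()
    for path in linear_paths(adjacency, max_bonds):
        forward = "-".join(labels[i] for i in path)
        backward = "-".join(labels[i] for i in reversed(path))
        key = min(forward, backward)                # same key whichever end the path is read from
        digest = hashlib.sha1(key.encode()).digest()
        bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits


propane = (["C", "C", "C"], [[1], [0, 2], [1]])
butanol = (["C", "C", "C", "C", "O"], [[1], [0, 2], [1, 3], [2, 4], [3]])
# Every path of the substructure also occurs in the larger molecule, so its bits are a subset.
print(path_fingerprint(*propane) <= path_fingerprint(*butanol))   # True
```

The subset property shown in the usage line is what makes such fingerprints useful as substructure screens.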
PMIfp and triangular shape plot
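The quantities behind this heading can be illustrated under stated assumptions: the principal moments of inertia I1 ≤ I2 ≤ I3 are the eigenvalues of the inertia tensor about the center of mass, and the triangular shape plot commonly uses the normalized ratios NPR1 = I1/I3 and NPR2 = I2/I3 (Sauer and Schwarz), which place rod, disc and sphere at (0, 1), (0.5, 0.5) and (1, 1). The Python sketch below assumes unit masses by default and does not reproduce the scaling to molecular weight used for PMIfp; a closed-form eigenvalue routine for symmetric 3 × 3 matrices keeps it free of external dependencies. All names are hypothetical.

```python
import math


def inertia_tensor(coords, masses=None):
    """3x3 inertia tensor about the center of mass (unit masses unless given)."""
    masses = masses or [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * p[k] for m, p in zip(masses, coords)) / total for k in range(3)]
    t = [[0.0] * 3 for _ in range(3)]
    for m, p in zip(masses, coords):
        x, y, z = (p[k] - center[k] for k in range(3))
        t[0][0] += m * (y * y + z * z)
        t[1][1] += m * (x * x + z * z)
        t[2][2] += m * (x * x + y * y)
        t[0][1] -= m * x * y
        t[0][2] -= m * x * z
        t[1][2] -= m * y * z
    t[1][0], t[2][0], t[2][1] = t[0][1], t[0][2], t[1][2]
    return t


def symmetric_eigenvalues(a):
    """Ascending eigenvalues of a real symmetric 3x3 matrix (closed-form trigonometric method)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:                                   # already diagonal
        return sorted(a[i][i] for i in range(3))
    q = (a[0][0] + a[1][1] + a[2][2]) / 3
    p = math.sqrt((sum((a[i][i] - q) ** 2 for i in range(3)) + 2 * p1) / 6)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det_b / 2))) / 3
    largest = q + 2 * p * math.cos(phi)
    smallest = q + 2 * p * math.cos(phi + 2 * math.pi / 3)
    return [smallest, 3 * q - largest - smallest, largest]


def npr(coords, masses=None):
    """Normalized PMI ratios (I1/I3, I2/I3) used as coordinates in the shape triangle."""
    i1, i2, i3 = symmetric_eigenvalues(inertia_tensor(coords, masses))
    return i1 / i3, i2 / i3


rod = [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
disc = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
sphere = disc + [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
for name, shape in (("rod", rod), ("disc", disc), ("sphere", sphere)):
    print(name, npr(shape))   # rod (0.0, 1.0), disc (0.5, 0.5), sphere (1.0, 1.0)
```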
USR and USRCAT
Source code for the USR (Ultra-fast Shape Recognition) fingerprint calculation was obtained from the Chemistry Development Kit (CDK, http://sourceforge.net/projects/cdk/files/cdk/1.4.19/) and used to compute the 12-dimensional USR shape fingerprint (4 reference points × 3 moments) for each molecule.
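To show what the 12 USR numbers are, here is a minimal Python sketch of the descriptor as described by Ballester and Richards: distances from every atom to four reference points (the molecular centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one), each distribution summarized by three moments, with similarity scored as 1/(1 + mean absolute difference). Exact conventions for the second and third moments vary between implementations; this sketch assumes the population standard deviation and standardized skewness, is not the CDK code used in the study, and uses hypothetical function names.

```python
import math


def _moments(values):
    """Mean, population standard deviation and standardized skewness of a distance list."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    third = sum((v - mean) ** 3 for v in values) / n
    skew = third / sd ** 3 if sd > 0 else 0.0
    return [mean, sd, skew]


def usr_descriptor(coords):
    """12 USR numbers: 3 moments of the atom-distance distribution to each of 4 reference points."""
    n = len(coords)
    ctd = tuple(sum(p[k] for p in coords) / n for k in range(3))   # molecular centroid
    d_ctd = [math.dist(ctd, p) for p in coords]
    cst = coords[d_ctd.index(min(d_ctd))]                           # atom closest to ctd
    fct = coords[d_ctd.index(max(d_ctd))]                           # atom farthest from ctd
    d_fct = [math.dist(fct, p) for p in coords]
    ftf = coords[d_fct.index(max(d_fct))]                           # atom farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor += _moments([math.dist(ref, p) for p in coords])
    return descriptor


def usr_similarity(a, b):
    """USR score 1 / (1 + mean absolute difference); equals 1.0 for identical descriptors."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.3, 1.2, 0.0), (3.6, 1.1, 0.9), (-0.8, 0.7, -1.0)]
rotated = [(-y, x, z) for x, y, z in coords]      # same shape, turned 90 degrees about z
print(usr_similarity(usr_descriptor(coords), usr_descriptor(rotated)))  # 1.0 up to rounding
```

Because every input is a distance, the descriptor needs no alignment of the molecules, which is what makes USR fast; the same property means it cannot tell mirror images apart.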
USRCAT was computed using the Python source code obtained from the https://bitbucket.org/aschreyer/usrcat/ website. Five atom categories, namely (a) all atoms, (b) hydrophobic, (c) aromatic atoms, (d) HBA and (e) HBD, were used in USRCAT. As for USR, moments were generated for each of the five categories, resulting in the 60-bit (12 × 5) USRCAT fingerprint.
This work was supported financially by the University of Berne, the Swiss National Science Foundation and the NCCR TransCure. We thank OpenEye Scientific Software Pvt. Ltd. for providing free academic licenses for Flipper/Omega/Rocs and ChemAxon Pvt. Ltd. for providing free academic and web licenses for their products.
- Bleicher KH, Bohm HJ, Muller K, Alanine AI. Hit and lead generation: beyond high-throughput screening. Nat Rev Drug Discovery. 2003;2:369–78.
- Renner S, Popov M, Schuffenhauer A, Roth HJ, Breitenstein W, Marzinzik A, et al. Recent trends and observations in the design of high-quality screening collections. Future Med Chem. 2011;3:751–66.
- Hann MM. Molecular obesity, potency and other addictions in drug discovery. MedChemComm. 2011;2:349–55.
- Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40:D1100–7.
- Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG. ZINC: a free tool to discover chemistry for biology. J Chem Inf Model. 2012;52:1757–68.
- Willett P. Similarity-based virtual screening using 2D fingerprints. Drug Discov Today. 2006;11:1046–53.
- Nguyen KT, Blum LC, van Deursen R, Reymond J-L. Classification of organic molecules by molecular quantum numbers. ChemMedChem. 2009;4:1803–5.
- van Deursen R, Blum LC, Reymond JL. A searchable map of PubChem. J Chem Inf Model. 2010;50:1924–34.
- Awale M, Reymond JL. A multi-fingerprint browser for the ZINC database. Nucleic Acids Res. 2014. doi:10.1093/nar/gku1379.
- Blum LC, van Deursen R, Reymond JL. Visualisation and subsets of the chemical universe database GDB-13 for virtual screening. J Comput-Aided Mol Des. 2011;25:637–47.
- Ruddigkeit L, Blum LC, Reymond JL. Visualization and virtual screening of the chemical universe database GDB-17. J Chem Inf Model. 2013;53:56–65.
- Schwartz J, Awale M, Reymond JL. SMIfp (SMILES fingerprint) chemical space for virtual screening and visualization of large databases of organic molecules. J Chem Inf Model. 2013;53:1979–89.
- Reymond JL, Blum LC, Van Deursen R. Exploring the chemical space of known and unknown organic small molecules at www.gdb.unibe.ch. Chimia. 2011;65:863–7.
- Khalifa AA, Haranczyk M, Holliday J. Comparison of nonbinary similarity coefficients for similarity searching, clustering and compound selection. J Chem Inf Model. 2009;49:1193–201.
- Hagadone TR. Molecular substructure similarity searching: efficient retrieval in two-dimensional structure databases. J Chem Inf Comput Sci. 1992;32:515–21.
- Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf Model. 2010;50:742–54.
- Sauer WH, Schwarz MK. Molecular shape diversity of combinatorial libraries: a prerequisite for broad bioactivity. J Chem Inf Comput Sci. 2003;43:987–1003.
- Rush TS, Grant JA, Mosyak L, Nicholls A. A shape-based 3-D scaffold hopping method and its application to a bacterial protein-protein interaction. J Med Chem. 2005;48:1489–95.
- Venhorst J, Núñez S, Terpstra JW, Kruse CG. Assessment of scaffold hopping efficiency by use of molecular interaction fingerprints. J Med Chem. 2008;51:3222–9.
- Kirchmair J, Distinto S, Markt P, Schuster D, Spitzer GM, Liedl KR, et al. How to optimize shape-based virtual screening: choosing the right query and including chemical information. J Chem Inf Model. 2009;49:678–92.
- Nicholls A, McGaughey GB, Sheridan RP, Good AC, Warren G, Mathieu M, et al. Molecular shape and medicinal chemistry: a perspective. J Med Chem. 2010;53:3862–86.
- Ebalunode JO, Zheng W. Molecular shape technologies in drug discovery: methods and applications. Curr Top Med Chem. 2010;10:669–79.
- Perez-Nueno VI, Ritchie DW. Using consensus-shape clustering to identify promiscuous ligands and protein targets and to choose the right query for shape-based virtual screening. J Chem Inf Model. 2011;51:1233–48.
- Kim S, Bolton EE, Bryant SH. PubChem3D: conformer ensemble accuracy. J Cheminform. 2013;5:1–17.
- Wirth M, Volkamer A, Zoete V, Rippmann F, Michielin O, Rarey M, et al. Protein pocket and ligand shape comparison and its application in virtual screening. J Comput-Aided Mol Des. 2013;27:511–24.
- Lovering F, Bikker J, Humblet C. Escape from flatland: increasing saturation as an approach to improving clinical success. J Med Chem. 2009;52:6752–6.
- Carhart RE, Smith DH, Venkataraghavan R. Atom pairs as molecular features in structure-activity studies: definition and applications. J Chem Inf Comput Sci. 1985;25:64–73.
- Awale M, Reymond JL. Atom pair 2D-fingerprints perceive 3D-molecular shape and pharmacophores for very fast virtual screening of ZINC and GDB-17. J Chem Inf Model. 2014;54:1892–7.
- Sheridan RP, Miller MD, Underwood DJ, Kearsley SK. Chemical similarity using geometric atom pair descriptors. J Chem Inf Comput Sci. 1996;36:128–36.
- Ballester PJ, Richards WG. Ultrafast shape recognition to search compound databases for similar molecular shapes. J Comput Chem. 2007;28:1711–23.
- Schreyer AM, Blundell T. USRCAT: real-time ultrafast shape recognition with pharmacophoric constraints. J Cheminform. 2012;4:27–39.
- Mavridis L, Hudson BD, Ritchie DW. Toward high throughput 3D virtual screening using spherical harmonic surface representations. J Chem Inf Model. 2007;47:1787–96.
- Brown RD, Martin YC. The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding. J Chem Inf Comput Sci. 1997;37:1–9.
- Randic M. Novel shape descriptors for molecular graphs. J Chem Inf Comput Sci. 2001;41:607–13.
- Haigh JA, Pickup BT, Grant JA, Nicholls A. Small molecule shape-fingerprints. J Chem Inf Model. 2005;45:673–84.
- Zhang Q, Muegge I. Scaffold hopping through virtual screening using 2D and 3D similarity descriptors: ranking, voting, and consensus scoring. J Med Chem. 2006;49:1536–48.
- Firth NC, Brown N, Blagg J. Plane of best fit: a novel method to characterize the three-dimensionality of molecules. J Chem Inf Model. 2012;52:2516–25.
- Hawkins PC, Skillman AG, Nicholls A. Comparison of shape-matching and docking as virtual screening tools. J Med Chem. 2007;50:74–82.
- ROCS version 3.0.0. OpenEye Scientific Software, Santa Fe, NM. http://www.eyesopen.com.
- Huang N, Shoichet BK, Irwin JJ. Benchmarking sets for molecular docking. J Med Chem. 2006;49:6789–801.
- Ebalunode JO, Zheng W. Unconventional 2D shape similarity method affords comparable enrichment as a 3D shape method in virtual screening experiments. J Chem Inf Model. 2009;49:1313–20.
- Hu G, Kuang G, Xiao W, Li W, Liu G, Tang Y. Performance evaluation of 2D fingerprint and 3D shape similarity methods in virtual screening. J Chem Inf Model. 2012;52:1103–13.
- Kalaszi A, Szisz D, Imre G, Polgar T. Screen3D: a novel fully flexible high-throughput shape-similarity search method. J Chem Inf Model. 2014;54:1036–49.
- Koutsoukas A, Paricharak S, Galloway WR, Spring DR, Ijzerman AP, Glen RC, et al. How diverse are diversity assessment methods? A comparative analysis and benchmarking of molecular descriptor space. J Chem Inf Model. 2014;54:230–42.
- Schneider G, Neidhart W, Giller T, Schmid G. “Scaffold-Hopping” by topological pharmacophore search: a contribution to virtual screening. Angew Chem Int Ed Engl. 1999;38:2894–6.
- Matter H. Selecting optimally diverse compounds from structure databases: a validation study of two-dimensional and three-dimensional molecular descriptors. J Med Chem. 1997;40:1219–29.
- Bajorath J. Integration of virtual and high-throughput screening. Nat Rev Drug Discov. 2002;1:882–94.
- McGaughey GB, Sheridan RP, Bayly CI, Culberson JC, Kreatsoulas C, Lindsley S, et al. Comparison of topological, shape, and docking methods in virtual screening. J Chem Inf Model. 2007;47:1504–19.
- Hawkins PCD, Nicholls A. Conformer generation with OMEGA: learning from the data set and the analysis of failures. J Chem Inf Model. 2012;52:2919–36.
- OMEGA version 2.3.2. OpenEye Scientific Software, Santa Fe, NM. http://www.eyesopen.com.
- Sadowski J, Gasteiger J. From atoms and bonds to 3-dimensional atomic coordinates - automatic model builders. Chem Rev. 1993;93:2567–81.
- Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Delivery Reviews. 1997;23:3–25.
- Teague SJ, Davis AM, Leeson PD, Oprea T. The design of leadlike combinatorial libraries. Angew Chem Int Ed Engl. 1999;38:3743–8.
- Congreve M, Carr R, Murray C, Jhoti H. A rule of three for fragment-based lead discovery? Drug Discov Today. 2003;8:876–7.
- Hopkins AL, Keseru GM, Leeson PD, Rees DC, Reynolds CH. The role of ligand efficiency metrics in drug discovery. Nat Rev Drug Discovery. 2014;13:105–21.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.