Efficient ring perception for the Chemistry Development Kit
- John W May^{1} and
- Christoph Steinbeck^{1}
https://doi.org/10.1186/1758-2946-6-3
© May and Steinbeck; licensee Chemistry Central Ltd. 2014
Received: 17 December 2013
Accepted: 27 January 2014
Published: 30 January 2014
Abstract
Background
The Chemistry Development Kit (CDK) is an open source Java library for manipulating and processing chemical information. A key aspect in handling chemical structures is the determination of the chemical rings. The rings of a structure are used in areas including descriptors, stereochemistry, similarity, screening and atom typing. The CDK includes multiple algorithms for determining the rings of a structure on demand. Non-unique descriptions of rings were often used due to the slower performance of the unique alternatives.
Results
Efficient algorithms for handling chemical ring perception have been implemented and optimised in the CDK. The algorithms provide much faster computation of new and existing types of rings. Several optimisation and implementation considerations are discussed which improve real-world usage. The performance is measured on several publicly available data sets and in several cases the new implementations were found to be more than an order of magnitude faster.
Conclusions
Algorithmic improvements allow handling of much larger datasets in reasonable time. Faster computation allows more appropriate rings to be utilised in procedures such as aromaticity perception. Several areas that require ring perception have also seen a noticeable improvement. The time taken to compute the unique rings is now comparable to that of the non-unique alternatives, allowing correct usage throughout the toolkit. All source code is open source and freely available.
Keywords
Background
The Chemistry Development Kit (CDK) [1, 2] is an open source Java library for manipulating chemical information. A key aspect of manipulating and querying chemical information is the ability to define and reason about attributes of chemical structures. Describing the rings in a structure is fundamental and a prerequisite of other attributes.
There is often a disconnect between how chemical rings are numbered and what is useful for computation. Conflicting definitions of rings contribute towards discrepancies between chemistry toolkits such as assigning aromaticity. The CDK does not provide a single strict definition of what rings are present in a structure. The ring information is considered auxiliary with different algorithms utilised for a specific use-case. Some considerations of the differences will be touched upon but a thorough review is provided by [3, 4] and [5].
There are several key properties we wish to know: is an atom or bond in a ring, what size is the ring and what are the other atoms and bonds in the ring? This information can be stored as an attribute of each atom or bond, as a collection of rings on the structure or computed on demand. With the provision of multiple algorithms it is undesirable to store all the information but invariant properties including membership and smallest ring size could be stored as an attribute of an atom or bond.
The ring properties can be used in many procedures throughout the library. In similarity searching and screening the creation of chemical fingerprints [6] may include ring size or membership to reduce the number of false positives. When matching atoms and bonds between structures the ring properties can be used in early elimination of infeasible matches or to disfavour ring opening and closing. Ring properties are also utilised in structure patterns (SMARTS [7]) where ring membership, size and number of rings can be queried.
It is essential that different structure resonance forms are treated as equivalent; one approach is to treat bonds in aromatic ring systems as delocalised. Conversely a delocalised structure may have been provided without specified bond orders. The ring properties can be used to localise and delocalise the bonds between aromatic and Kekulé representations.
Geometric isomers (double-bond stereochemistry) should not be encoded when the bond is involved in a rigid ring. Rigidity is approximated by only allowing stereoconfigurations in rings with more than seven atoms. Groups of interdependent stereocenters can be identified by recursively checking the rings in a structure [8].
Improving the core ring perception algorithms can influence many areas and it is important that efficient algorithms are used.
Graph theory preliminaries
Although more comprehensive and accurate methods exist, chemical structures can be represented and efficiently modelled as graphs [9]. The algorithms used for ring perception are not specific to chemical structures and require several formal definitions. The basic concepts for these are briefly introduced here. A graph is composed of a set of vertices V and a set of edges E. Each vertex or edge may be labelled with a value. Two vertices are adjacent if an edge exists which contains the two vertices. The vertices of an edge are known as the endpoints and each endpoint is said to be incident to the edge. The degree of a vertex is the number of incident edges. If the endpoints are unordered, an edge is said to be undirected. Simple graphs have no edges connecting the same vertex (loops) and no edges which share the same endpoints (multiedges). We model a chemical structure as a simple undirected labelled graph where the atoms and bonds are labels on the vertices and edges. Although the edges have a numeric value (bond order) they are not treated as weighted.
A walk is a sequence of vertices and edges connecting two vertices. If the start and end of the walk are the same, the walk is closed. Otherwise the walk is open. A walk is simple if it contains no repeated edges and elementary if there are no repeated vertices. A simple walk that is also open is referred to as a path. Two vertices are connected if there is a path between them. A graph is connected if each vertex can be reached from every other vertex. A connected component (ConnComp(G)) in an undirected graph is a maximal subgraph in which every pair of vertices is connected.
A cycle is a closed walk. Graphs containing a cycle are said to be cyclic, or acyclic if no cycle is present. A connected acyclic simple graph is referred to as a tree. A ring in a chemical structure is best described as an elementary cycle. The cycle has no repeating vertices or edges and each vertex has a degree of 2 (in the cycle). This definition includes envelope rings of structures like naphthalene and azulene. As we are primarily concerned with chemical structures, herein we use the term cycle to refer to an elementary cycle.
A cycle basis is a set of cycles which can be used to generate all other cycles (cycle space) of the graph. Representing a cycle as a set of edges, a new cycle can be generated using the symmetric difference (XOR, ⊕-summing) of the edge sets of two cycles whose edge sets intersect. A minimum cycle basis is a cycle basis of minimum weight; in an unweighted graph the weight is simply the number of edges. When there is more than one basis with the same weight the choice between them is arbitrary as either can be used to generate the cycle space.
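The ⊕-sum described above can be illustrated with a minimal sketch that stores each cycle as a set of edge indices. The class name `CycleXor`, the helper `edges` and the edge numbering of naphthalene (two six-membered rings sharing edge 5) are assumptions made for illustration and are not part of the CDK.

```java
import java.util.BitSet;

final class CycleXor {

    /** Builds an edge set from edge indices (hypothetical helper). */
    static BitSet edges(int... indices) {
        BitSet set = new BitSet();
        for (int i : indices) set.set(i);
        return set;
    }

    /** Symmetric difference of two cycles given as edge-index sets. */
    static BitSet xor(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone(); // leave the inputs untouched
        result.xor(b);
        return result;
    }

    public static void main(String[] args) {
        // assumed numbering: 11 edges in naphthalene, edge 5 is the fusion bond
        BitSet ringA = edges(0, 1, 2, 3, 4, 5);
        BitSet ringB = edges(5, 6, 7, 8, 9, 10);
        BitSet envelope = xor(ringA, ringB);
        // the shared edge cancels, leaving the ten-membered envelope ring
        System.out.println(envelope + " has " + envelope.cardinality() + " edges");
    }
}
```

Because the two six-membered rings share exactly one edge, the result contains the ten remaining edges, which is the envelope ring of naphthalene.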
Cycle membership
The first step in cycle processing for a chemical structure is to efficiently determine which vertices and edges of the graph belong to a cycle. In PubChem-Compound [10] (Aug 2013) 97.3% of structures (47,745,887) contained a cycle. Although the proportion of structures containing a cycle is high only 59.3% of the heavy atoms and 57.3% of bonds were cyclic. Eliminating these acyclic vertices and edges from further processing reduces the size of the computation.
The SpanningTree was introduced in the CDK to eliminate acyclic vertices and edges, reducing the runtime of existing algorithms [11]. A graph H is a subgraph of a graph G if the vertices V and edges E of H are a subset of G. A subgraph H is said to be a spanning subgraph of G if every vertex of G is present in H. The edges in chemical structures are unweighted and so the minimum spanning tree is a tree with the smallest number of edges. Given an input structure a spanning tree is created which contains a subset of the edges that span the vertices but contains no cycles. The SpanningTree class uses a greedy algorithm [12] to sequentially build up this tree. Cyclic vertices and edges are determined by finding a path in the tree between the two endpoints of an edge which was not included. Any edge that is not in the spanning tree is cyclic and any path in the tree which connects the two endpoints contains vertices and edges that are also cyclic. The number of paths to find depends on the number of edges not included in the spanning tree. Structures containing a large number of rings will have more edges removed and more paths to find. Discovery of a path in the tree is implemented as a depth-first search and the entire tree may be traversed for each removed edge.
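The spanning tree procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the CDK SpanningTree class: the class name `SpanningTreeSketch`, the union-find helper and the edge-list input are hypothetical, and only vertex membership is reported.

```java
import java.util.ArrayList;
import java.util.List;

final class SpanningTreeSketch {

    /** Union-find root lookup with path halving. */
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /** Flags every vertex that lies on a cycle of the graph (n vertices, edge list). */
    static boolean[] cyclicVertices(int n, int[][] edges) {
        int[] parent = new int[n];
        List<List<Integer>> tree = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            tree.add(new ArrayList<>());
        }
        List<int[]> excluded = new ArrayList<>();
        for (int[] e : edges) {
            int ru = find(parent, e[0]), rv = find(parent, e[1]);
            if (ru == rv) {
                excluded.add(e);        // would close a cycle, so left out of the tree
            } else {
                parent[ru] = rv;        // greedily grow the spanning tree
                tree.get(e[0]).add(e[1]);
                tree.get(e[1]).add(e[0]);
            }
        }
        boolean[] cyclic = new boolean[n];
        // one tree traversal per excluded edge, as described in the text
        for (int[] e : excluded) markPath(tree, e[0], e[1], -1, cyclic);
        return cyclic;
    }

    /** Depth-first search in the tree from v to target, marking the connecting path. */
    static boolean markPath(List<List<Integer>> tree, int v, int target, int prev, boolean[] cyclic) {
        if (v == target) {
            cyclic[v] = true;
            return true;
        }
        for (int w : tree.get(v)) {
            if (w != prev && markPath(tree, w, target, v, cyclic)) {
                cyclic[v] = true;
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // a three-membered ring (0, 1, 2) with one substituent (3)
        boolean[] cyclic = cyclicVertices(4, new int[][]{{0, 1}, {1, 2}, {2, 0}, {0, 3}});
        System.out.println(java.util.Arrays.toString(cyclic));
    }
}
```

The cost of the final loop grows with the number of excluded edges, which is why structures with many rings are the slow case for this approach.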
Cycle sets
In addition to determining if a vertex or edge is cyclic, one would also like to know the sizes of the cycles and their walks. The number of elementary cycles in a graph can be exponential, and smaller subsets have subsequently been defined and used in various aspects of chemical information processing.
Smallest set of smallest rings/minimum cycle basis
Chemical structure sets used to measure performance
In general the CDK library has been relying less on the MCB as it has little use beyond counting the number of rings and generating the cycle space. Both of these tasks can be achieved more efficiently with other procedures. The implementations provided in the CDK are primarily for reference and for their use in computing other uniquely defined cycle sets.
Essential and relevant cycles
The essential and relevant cycles are uniquely defined sets of cycles. The MCB is non-unique when there are multiple minimum cycle bases, as an arbitrary choice of a single basis can generate the cycle space. The essential cycles are the intersection of these minimum cycle bases whilst the relevant cycles are the union. When a graph has a single unique MCB it is equal to both the essential and relevant cycles. As a subset of the MCB the essential cycles may not form a basis, in which case they cannot be used to generate the cycle space. Like the MCB the essential cycles are always polynomial in number. Counter-intuitively, structures such as barrelene (Figure 1) contain no essential cycles. The relevant cycles do generate the cycle space but may be exponential in number.
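The intersection and union definitions can be made concrete with a small sketch. It takes the minimum cycle bases as already enumerated sets of cycle labels, which is only practical for a toy example; practical implementations do not enumerate every basis. The class name `UniqueCycleSets` and the labels A, B and C for the three six-membered rings of barrelene are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class UniqueCycleSets {

    /** Essential cycles: those present in every minimum cycle basis. */
    static Set<String> essential(List<Set<String>> bases) {
        Set<String> result = new TreeSet<>(bases.get(0));
        for (Set<String> basis : bases) result.retainAll(basis);
        return result;
    }

    /** Relevant cycles: those present in at least one minimum cycle basis. */
    static Set<String> relevant(List<Set<String>> bases) {
        Set<String> result = new TreeSet<>();
        for (Set<String> basis : bases) result.addAll(basis);
        return result;
    }

    /** Barrelene has three equivalent six-membered rings; any two form an MCB. */
    static List<Set<String>> barreleneBases() {
        List<Set<String>> bases = new ArrayList<>();
        bases.add(new TreeSet<>(Arrays.asList("A", "B")));
        bases.add(new TreeSet<>(Arrays.asList("A", "C")));
        bases.add(new TreeSet<>(Arrays.asList("B", "C")));
        return bases;
    }

    public static void main(String[] args) {
        List<Set<String>> bases = barreleneBases();
        System.out.println("essential = " + essential(bases));
        System.out.println("relevant  = " + relevant(bases));
    }
}
```

No ring appears in all three bases, so the essential set is empty, while the relevant set contains all three rings.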
The uniqueness of these cycle sets makes them desirable for describing chemical entities. The essential cycles have been utilised in the CDK for similarity searching techniques including generation of fingerprints and for the structure query patterns. Unfortunately the computation of the unique essential and relevant cycles (using the SSSRFinder) takes much longer than the non-unique MCB. The increased computation runtime has generally meant the MCB has been favoured.
All elementary cycles
Implementations
Graph representations
The choice of data structure depends on the properties being modelled and on which algorithms will be used. Chemical structures are generally small (|V|<100) and each vertex is only adjacent to a few other vertices (sparse). Although more costly in memory and for modifications, the attributes of chemical structures make the adjacency (or incidence) list representation preferable.
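As an illustration of the adjacency list representation, the sketch below converts an edge list into a jagged `int[][]` array in which row v holds the neighbours of vertex v. The class and method names are hypothetical and the input is a plain edge list rather than a CDK structure.

```java
import java.util.Arrays;

final class AdjacencyListSketch {

    /** Converts an edge list over vertices 0..n-1 into an adjacency list. */
    static int[][] fromEdges(int n, int[][] edges) {
        int[] degree = new int[n];
        for (int[] e : edges) {          // first pass: count neighbours
            degree[e[0]]++;
            degree[e[1]]++;
        }
        int[][] adjacent = new int[n][];
        for (int v = 0; v < n; v++) adjacent[v] = new int[degree[v]];
        int[] filled = new int[n];
        for (int[] e : edges) {          // second pass: store both directions
            adjacent[e[0]][filled[e[0]]++] = e[1];
            adjacent[e[1]][filled[e[1]]++] = e[0];
        }
        return adjacent;
    }

    public static void main(String[] args) {
        // a three-vertex chain 0-1-2
        int[][] adjacent = fromEdges(3, new int[][]{{0, 1}, {1, 2}});
        System.out.println(Arrays.deepToString(adjacent));
    }
}
```

Sizing each row exactly from the vertex degree keeps memory proportional to |V| + 2|E|, which suits small sparse graphs.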
Average (n = 15) time taken to convert CDK structure representations to adjacency and incidence list data structures
Chemical structure | n structures | Adjacency list t (ms) | Adjacency list sdev | Incidence list t (ms) | Incidence list sdev |
---|---|---|---|---|---|
chebi_108 | 26,790 | 167 | 14 | 238 | 15 |
nci_aug00 | 250,172 | 998 | 49 | 1,347 | 53 |
zinc_frag | 504,074 | 1,466 | 13 | 2,029 | 29 |
chembl_17 | 1,318,180 | 8,308 | 33 | 11,977 | 246 |
zinc_leads | 5,135,179 | 22,537 | 582 | 33,567 | 2,368 |
Results and discussion
Here we describe the optimisations and measure the performance on several chemical datasets (Table 1). All measurements were performed on a 2.66 GHz Intel Core i7 processor using Java version 1.7.0_21. The unprocessed benchmark results are provided as Additional files 1, 2 and 3.
Cycle membership
The existing algorithm used in SpanningTree was for graphs with weighted edges [12]. In an unweighted graph any spanning tree is the minimum spanning tree. A spanning tree in an undirected, unweighted graph can be constructed with a depth- or breadth-first search [23]. Although efficient in construction, the spanning tree still requires additional operations to determine the cyclic vertices and edges. A more efficient approach is to compute the biconnected (2-connected) components of the graph. The biconnected components can be found using a single depth-first search [24]. A vertex is biconnected if removing it from the graph does not increase the number of components. A biconnected component is a maximal connected subgraph where every vertex is biconnected. In addition to detecting the cyclic vertices and edges the procedure also partitions the graph into separate components which correspond to the separate ring systems in the chemical structure. If the number of edges is equal to the number of vertices, |E|=|V|, then the circuit rank is 1 and the component is an elementary cycle. Such components correspond to the isolated and spiro ring systems in a chemical structure whilst the other biconnected components are the fused and bridged ring systems. The simple elementary cycles need no further processing and can be excluded from the more computationally intensive algorithms.
The biconnected components were already used internally for other cycle computations (SSSRFinder). A new RingSearch utility was written with an algorithm optimised for small graphs using binary sets. The implementation provides logical testing of cycle membership for vertices and edges as well as partitioning the components and creating fragments of the input structure.
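The single depth-first search can be sketched as below. This is a simplified relative of the biconnected component search [24] and not the RingSearch implementation: it tracks the shallowest depth each subtree can reach (the low-link value) to flag cyclic vertices, but does not partition the graph into ring systems or use binary sets. The class name `RingMembershipSketch` and the adjacency list input are assumptions.

```java
import java.util.Arrays;

final class RingMembershipSketch {

    /** Flags vertices that lie on a cycle; input is an adjacency list of a simple graph. */
    static boolean[] cyclicVertices(int[][] adjacent) {
        int n = adjacent.length;
        int[] depth = new int[n];            // 0 means unvisited
        boolean[] cyclic = new boolean[n];
        for (int v = 0; v < n; v++) {
            if (depth[v] == 0) visit(adjacent, v, -1, 1, depth, cyclic);
        }
        return cyclic;
    }

    /** Returns the smallest depth reachable from the subtree rooted at v. */
    static int visit(int[][] adjacent, int v, int parent, int d, int[] depth, boolean[] cyclic) {
        depth[v] = d;
        int low = d;
        for (int w : adjacent[v]) {
            if (w == parent) continue;       // simple graph: skip the edge we arrived by
            if (depth[w] == 0) {
                int childLow = visit(adjacent, w, v, d + 1, depth, cyclic);
                if (childLow <= d) {         // subtree reaches v or above: edge v-w is in a cycle
                    cyclic[v] = true;
                    cyclic[w] = true;
                }
                low = Math.min(low, childLow);
            } else if (depth[w] < d) {       // back edge to an ancestor closes a cycle
                cyclic[v] = true;
                cyclic[w] = true;
                low = Math.min(low, depth[w]);
            }
        }
        return low;
    }

    public static void main(String[] args) {
        // a three-membered ring (0, 1, 2) with one substituent (3)
        int[][] adjacent = {{1, 2, 3}, {0, 2}, {1, 0}, {0}};
        System.out.println(Arrays.toString(cyclicVertices(adjacent)));
    }
}
```

Each vertex and edge is visited once, in contrast to the repeated tree traversals needed by the spanning tree approach.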
Average and median (n = 15) time taken to determine ring membership
Chemical structure | SpanningTree mean t (ms) | SpanningTree median t (ms) | SpanningTree sdev | RingSearch mean t (ms) | RingSearch median t (ms) | RingSearch sdev |
---|---|---|---|---|---|---|
chebi_108 | 2,969 | 2,826 | 246 | 236 | 212 | 90 |
nci_aug00 | 8,440 | 8,451 | 59 | 1,396 | 1,372 | 88 |
zinc_frag | 5,338 | 5,353 | 77 | 1,833 | 1,818 | 60 |
chembl_17 | 122,357 | 122,493 | 616 | 10,325 | 10,303 | 98 |
zinc_leads | 114,710 | 115,067 | 837 | 27,496 | 27,502 | 215 |
Average (n = 50) time taken to determine ring membership in several complex structures from ChEBI and FULLERENE [25]
Chemical structure | SpanningTree t (ms) | SpanningTree sdev | RingSearch t (ms) | RingSearch sdev |
---|---|---|---|---|
cubane (CHEBI:33014) | 1.18 | 1.24 | 0.21 | 0.66 |
dodecaboride (CHEBI:51706) | 1.11 | 0.80 | 0.05 | 0.01 |
octacontaboron (CHEBI:50252) | 100.18 | 52.87 | 0.44 | 0.18 |
C_{60} fullerene (CHEBI:33128) | 11.15 | 2.92 | 0.30 | 0.03 |
C_{70} fullerene (CHEBI:33195) | 10.44 | 4.25 | 0.88 | 0.23 |
C_{320} fullerene (FULLERENE) | 423.06 | 166.30 | 0.61 | 0.32 |
C_{720} fullerene (FULLERENE) | 3100.71 | 1171.62 | 1.65 | 0.66 |
Minimum cycle basis
Average and median (n = 15) time taken to compute the minimum cycle basis (MCB) using the existing and improved implementations
Chemical structure | MCB (old) mean t (ms) | MCB (old) median t (ms) | MCB (old) sdev | MCB (new) mean t (ms) | MCB (new) median t (ms) | MCB (new) sdev |
---|---|---|---|---|---|---|
chebi_108 | 4,200 | 4,020 | 695 | 396 | 353 | 160 |
nci_aug00 | 34,762 | 34,330 | 1,685 | 2,193 | 2,125 | 211 |
zinc_frag | 47,752 | 47,844 | 1,982 | 2,376 | 2,330 | 153 |
chembl_17 | 245,620 | 245,592 | 990 | 15,341 | 15,257 | 311 |
zinc_leads | - | - | - | - | - | - |
The cycle basis is formed by incrementally adding candidate cycles of increasing size. In this case the candidates are the initial set of cycles [26]. A candidate is added to the basis if it is linearly independent from the current members of the basis [27]. This check for linear independence is expensive^{b} and can be avoided under some conditions. With the union of all edges in the basis (E_{B}), a new cycle is linearly independent if any edges of the candidate (E_{cand}) are not present in the basis. That is, when |E_{B}∩E_{cand}|<|E_{cand}|, the cycle must be independent. Additionally we know the basis is complete when the number of cycles is equal to the circuit rank. As the biconnected components are processed separately, the circuit rank of each component is |E|−|V|+1.
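The shortcut and the completeness test can be sketched as follows, with cycles stored as sets of edge indices. The class name `GreedyBasisSketch` and its methods are hypothetical and this is not the CDK implementation; row reduction over GF(2) is only run when the candidate contributes no new edge.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class GreedyBasisSketch {
    private final List<BitSet> basis = new ArrayList<>();
    private final BitSet edgesInBasis = new BitSet();   // E_B, union of all basis edges
    private final int circuitRank;

    /** For one biconnected component the circuit rank is |E| - |V| + 1. */
    GreedyBasisSketch(int nVertices, int nEdges) {
        this.circuitRank = nEdges - nVertices + 1;
    }

    boolean isComplete() {
        return basis.size() == circuitRank;
    }

    /** Adds the candidate (an edge-index set) if it is independent of the basis. */
    boolean addIfIndependent(BitSet candidate) {
        BitSet novel = (BitSet) candidate.clone();
        novel.andNot(edgesInBasis);
        // shortcut: an edge outside E_B means |E_B ∩ E_cand| < |E_cand|
        if (novel.isEmpty() && !independentByElimination(candidate)) return false;
        basis.add((BitSet) candidate.clone());
        edgesInBasis.or(candidate);
        return true;
    }

    /** Row reduction over GF(2); a candidate that reduces to zero is dependent. */
    private boolean independentByElimination(BitSet candidate) {
        List<BitSet> echelon = new ArrayList<>();
        List<BitSet> rows = new ArrayList<>(basis);
        rows.add(candidate);
        for (BitSet cycle : rows) {
            BitSet row = (BitSet) cycle.clone();
            for (BitSet e : echelon) {
                if (row.get(e.nextSetBit(0))) row.xor(e);   // clear the pivot of e
            }
            if (row.isEmpty()) return false;
            echelon.add(row);
        }
        return true;
    }

    public static void main(String[] args) {
        // naphthalene as one component: 10 vertices, 11 edges, edge 5 shared
        GreedyBasisSketch basis = new GreedyBasisSketch(10, 11);
        BitSet ringA = new BitSet();
        ringA.set(0, 6);                     // edges 0..5
        BitSet ringB = new BitSet();
        ringB.set(5, 11);                    // edges 5..10
        BitSet envelope = (BitSet) ringA.clone();
        envelope.xor(ringB);
        System.out.println(basis.addIfIndependent(ringA));
        System.out.println(basis.addIfIndependent(ringB));
        System.out.println(basis.isComplete());
        System.out.println(basis.addIfIndependent(envelope));
    }
}
```

In the naphthalene example both six-membered rings are accepted through the shortcut alone, and the basis is already complete before the envelope ring would need to be tested.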
Essential and relevant cycles
Average and median (n = 15) time taken to compute the relevant cycles using the existing and improved implementations
Chemical structure | Relevant cycles (old) mean t (ms) | Relevant cycles (old) median t (ms) | Relevant cycles (old) sdev | Relevant cycles (new) mean t (ms) | Relevant cycles (new) median t (ms) | Relevant cycles (new) sdev |
---|---|---|---|---|---|---|
chebi_108 | 17,013 | 16,897 | 576 | 445 | 388 | 171 |
nci_aug00 | 149,210 | 148,231 | 12,190 | 2,250 | 2,187 | 195 |
zinc_frag | 183,587 | 184,720 | 18,219 | 2,610 | 2,519 | 237 |
chembl_17 | 972,493 | 972,605 | 1,197 | 16,003 | 15,991 | 201 |
zinc_leads | - | - | - | - | - | - |
Average and median (n = 15) time taken to compute the essential cycles using the existing and improved implementations
Chemical structure | Essential cycles (old) mean t (ms) | Essential cycles (old) median t (ms) | Essential cycles (old) sdev | Essential cycles (new) mean t (ms) | Essential cycles (new) median t (ms) | Essential cycles (new) sdev |
---|---|---|---|---|---|---|
chebi_108 | 16,561 | 16,395 | 615 | 572 | 424 | 451 |
nci_aug00 | 128,963 | 128,663 | 2,325 | 2,536 | 2,459 | 362 |
zinc_frag | 217,312 | 217,016 | 940 | 3,662 | 3,574 | 336 |
chembl_17 | 954,954 | 952,437 | 20,698 | 17,235 | 17,171 | 293 |
zinc_leads | - | - | - | - | - | - |
The number of cycles in each set
Chemical structure | n structures | MCB | Essential | Relevant | All |
---|---|---|---|---|---|
chebi_108 | 26,790 | 56,572 | 55,687 | 57,401 | ∼126,713 |
nci_aug00 | 250,172 | 599,876 | 591,144 | 606,045 | ∼1,007,643 |
zinc_frag | 504,074 | 880,296 | 875,801 | 882,393 | ∼1,022,498 |
chembl_17 | 1,318,180 | 4,505,285 | 4,455,907 | 4,563,027 | ∼6,599,942 |
zinc_leads | 5,135,179 | - | - | - | ∼14,816,752 |
All elementary cycles
The data structures of AllRingsFinder were optimised and the timeout replaced with a threshold specific to the algorithm [20]. The improvements to the data structures involved representing the path-graph as an incidence-list and using binary sets to test intersection. The algorithm progresses by iteratively reducing (removing) vertices; the order of removal can be predetermined or dynamic. Using a predetermined order the edges need only be indexed by the next endpoint (i.e. directed). This reduces the number of modifications to the path-graph. Edges are only removed when a vertex is being reduced and all edges can be removed at once from this vertex. As each vertex is reduced the degree of the adjacent vertices may increase. Limiting the maximum degree the algorithm is allowed to reach provides a better threshold to determine feasibility.
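A much simplified sketch of the vertex reduction with a degree threshold is given below. It follows the general idea of the path-graph approach [20] but is not the AllRingsFinder implementation: paths are plain vertex arrays, intersection is tested with a hash set rather than binary sets, the removal order is fixed as 0..n-1, and the names `AllCyclesSketch`, `allCycles` and `maxDegree` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AllCyclesSketch {

    /** Returns all elementary cycles as closed vertex walks, or null if the threshold is exceeded. */
    static List<int[]> allCycles(int n, int[][] edges, int maxDegree) {
        List<int[]> paths = new ArrayList<>();
        for (int[] e : edges) paths.add(new int[]{e[0], e[1]});
        List<int[]> cycles = new ArrayList<>();
        for (int x = 0; x < n; x++) {                 // predetermined removal order
            List<int[]> incident = new ArrayList<>();
            List<int[]> rest = new ArrayList<>();
            for (int[] p : paths) {
                if (p[0] == x || p[p.length - 1] == x) incident.add(p);
                else rest.add(p);
            }
            if (incident.size() > maxDegree) return null;   // considered infeasible
            paths = rest;                             // all path-edges at x removed at once
            for (int i = 0; i < incident.size(); i++) {
                for (int j = i + 1; j < incident.size(); j++) {
                    int[] joined = join(incident.get(i), incident.get(j), x);
                    if (joined == null) continue;
                    if (joined[0] == joined[joined.length - 1]) cycles.add(joined);
                    else paths.add(joined);
                }
            }
        }
        return cycles;
    }

    /** Joins two paths at x, or returns null if they share any vertex besides x and a closing endpoint. */
    static int[] join(int[] p, int[] q, int x) {
        if (p[0] == x) p = reverse(p);                // orient p to end at x
        if (q[0] != x) q = reverse(q);                // orient q to start at x
        Set<Integer> seen = new HashSet<>();
        for (int v : p) seen.add(v);
        for (int i = 1; i < q.length; i++) {
            boolean closes = i == q.length - 1 && q[i] == p[0];
            if (!closes && !seen.add(q[i])) return null;
        }
        int[] joined = new int[p.length + q.length - 1];
        System.arraycopy(p, 0, joined, 0, p.length);
        System.arraycopy(q, 1, joined, p.length, q.length - 1);
        return joined;
    }

    static int[] reverse(int[] a) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[a.length - 1 - i];
        return r;
    }

    public static void main(String[] args) {
        // complete graph on four vertices: four triangles and three squares
        int[][] k4 = {{0, 1}, {0, 2}, {0, 3}, {1, 2}, {1, 3}, {2, 3}};
        System.out.println(allCycles(4, k4, 10).size() + " cycles");
    }
}
```

Returning null as soon as a vertex exceeds the threshold lets a caller fail fast and fall back to a smaller cycle set, rather than waiting for a timeout.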
Ring systems (PubChem-Compound) that were feasibly handled by the improved AllRingsFinder at different thresholds
Percentile | Threshold (degree) | Feasible ring systems | Infeasible ring systems |
---|---|---|---|
99.95 | 72 | 17,834,013 | 8,835 |
99.96 | 84 | 17,835,876 | 6,972 |
99.97 | 126 | 17,837,692 | 5,156 |
99.98 | 216 | 17,839,293 | 3,555 |
99.99 | 684 (default) | 17,841,065 | 1,783 |
99.991 | 882 | 17,841,342 | 1,506 |
99.992 | 1,062 | 17,841,429 | 1,419 |
99.993 | 1,440 | 17,841,602 | 1,246 |
99.994 | 3,072 | 17,841,789 | 1,059 |
99.9946 | 5,000 (max tested) | 17,841,861 | 987 |
Average (n = 15) number of structures considered infeasible by the old and new implementations
Chemical structure | n structures | n fail (old) | n fail (new) |
---|---|---|---|
chebi_108 | 26,790 | 108-117 | 41 |
nci_aug00 | 250,172 | 306-311 | 37 |
zinc_frag | 504,074 | 0 | 0 |
chembl_17 | 1,318,180 | 2,528-2,547 | 232 |
zinc_leads | 5,135,179 | 0 | 0 |
Average (n = 15) number of all cycles found in each dataset
Chemical structure | n structures | Old cycles | Old sdev | New cycles | New sdev |
---|---|---|---|---|---|
chebi_108 | 26,790 | 98,597 | 199 | 126,713 | 0 |
nci_aug00 | 250,172 | 936,625 | 409 | 1,007,643 | 0 |
zinc_frag | 504,074 | 1,022,498 | 0 | 1,022,498 | 0 |
chembl_17 | 1,318,180 | 6,176,585 | 378 | 6,599,942 | 0 |
zinc_leads | 5,135,179 | 14,816,752 | 0 | 14,816,752 | 0 |
Average and median (n = 15) time taken to find all rings using existing and improved implementations of AllRingsFinder
Chemical structure | Old mean t (ms) | Old median t (ms) | Old sdev | New mean t (ms) | New median t (ms) | New sdev |
---|---|---|---|---|---|---|
chebi_108 | 16,293 | 16,179 | 540 | 599 | 574 | 105 |
nci_aug00 | 80,417 | 80,339 | 319 | 3,478 | 3,449 | 143 |
zinc_frag | 41,854 | 41,747 | 458 | 3,786 | 3,778 | 60 |
chembl_17 | 568,984 | 568,809 | 829 | 25,200 | 25,181 | 99 |
zinc_leads | 661,028 | 661,368 | 1,133 | 54,490 | 54,471 | 101 |
Additional cycle sets
The shortest cycle through each vertex and edge is also provided as a unique but potentially exponential cycle set. The edge-short cycles have also been termed the Largest Set of Smallest Rings (LSSR) and are utilised within Open Babel [28]. Computation of the sets does not check if the cycles form a basis. Omitting this check could improve performance but no noticeable change was observed in measurements. The implementations are provided for compatibility.
A TripletCycles utility was also implemented to improve generation of CACTVS [29] Substructure Keys (PubChemFingerprint). These cycles are the shortest through a vertex triple {u,v,w} and allow generation of cycles for envelope rings such as naphthalene or azulene whilst avoiding larger fused rings. The implementation allows a unique or non-unique set to be generated.
Conclusion
The improved performance in cycle perception means it is now feasible to analyse much larger chemical data sets. This is particularly true of the unique short cycle sets (essential and relevant) which saw more than an order of magnitude improvement. Runtime performance is no longer a reason to favour the non-unique MCB. Any procedures incorrectly relying on the MCB to be unique can be easily adapted to use the new algorithms. The efficient implementation of the relevant cycles could also be adapted to compute a recent descriptor known as Unique Ring Families [30].
Improvements were seen throughout the toolkit with cycle perception being required for core functionality. The new algorithm for cycle membership has been used to improve performance of atom typing and the set of all cycles utilised in aromaticity perception. To avoid a performance hit from the old implementation the aromaticity of non-shortest cycles was only perceived for small fused ring systems. The new aromaticity perception has no restrictions and attempts to perceive aromaticity on all cycles. If computation is not feasible the aromaticity perception falls back to a smaller, more feasible cycle set. Alternatively the smaller set of cycles could be tested first with the larger set only utilised if potentially aromatic atoms were remaining. Using the optimised representations the set of all cycles is generally faster to compute than the smaller sets and it is preferable to try all and fail fast.
A large portion of the time is spent in converting the CDK objects to optimised representations. Despite this, the runtime performance is much slower without the conversion. Further gains could be made by optimising the native data structures and removing the need for conversion. The changes required would be large but could be introduced in future releases.
Availability and requirements
Project Name: The Chemistry Development Kit
Project Home Page: http://sourceforge.net/projects/cdk/ (version CDK (development)) or http://github.com/cdk/cdk (version 1.5.4 onwards)
Operating System: Platform Independent
Programming Language: Java
Requirements: Java 1.6+
License: Lesser General Public License 2.1
Endnotes
^{a} alternatively known as cyclomatic number, nullity (μ), frère jacque number, first Betti’s number or bond closures [5].
^{b} linear independence is checked with row reduction of a matrix (Gaussian elimination).
Declarations
Acknowledgements
This work was supported by Biotechnology and Biological Sciences Research Council CASE studentship [BB/I532153/1]. The authors would like to thank the following persons for their valuable discussions (names are alphabetically ordered): Felicity Allen, Stephan Beisken, Gordon James, Luis de Figueiredo, Pablo Moreno and Kalai Vannai. Additionally we would like to specifically thank Egon Willighagen for his thorough review of CDK additions.
Authors’ Affiliations
References
- Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E: The Chemistry Development Kit (CDK): an open-source java library for chemo- and bioinformatics. J Chem Inf Comput Sci. 2003, 43 (2): 493-500. 10.1021/ci025584y.
- Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen E: Recent developments of the Chemistry Development Kit (CDK) - an open-source java library for chemo- and bioinformatics. CPD. 2006, 12: 2111-2120. 10.2174/138161206777585274.
- Berger F, Flamm C, Gleiss PM, Leydold J, Stadler PF: Counterexamples in chemical ring perception. J Chem Inf Comput Sci. 2004, 44 (2): 323-331. 10.1021/ci030405d.
- Downs G, Gillet V, Holliday J, Lynch M: Review of ring perception algorithms for chemical graphs. J Chem Inf Comput Sci. 1989, 29 (3): 172-187. 10.1021/ci00063a007.
- Downs G: Ring Perception, Handbook of Cheminformatics, vol 1. 2003, Weinheim: Wiley-VCH.
- Willett P: Similarity-based virtual screening using 2D fingerprints. Drug Discovery Today. 2006, 11 (23): 1046-1053.
- Daylight: SMARTS - A Language for describing molecular patterns. http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
- Razinger M, Balasubramanian K, Perdih M, Munk M: Stereoisomer generation in computer-enhanced structure elucidation. J Chem Inf Comput Sci. 1993, 33 (6): 812-825. 10.1021/ci00016a003.
- Balaban AT: Applications of graph theory in chemistry. J Chem Inf Comput Sci. 1985, 25: 334-343. 10.1021/ci00047a033.
- Bolton E, Wang Y, Thiessen P, Bryant S: PubChem: Integrated platform of small molecules and biological activities. Chapter 12 in Annual Reports in Computational Chemistry, Volume 4. 2008, Washington: American Chemical Society.
- Nikolova-Jeliazkova N: Slow fingerprints?. CDK News. 2005, 2 (2): 34-40.
- Kruskal JB: On the shortest spanning subtree of a graph and the traveling salesman problem. Proc Am Math Soc. 1956, 7 (1): 48-50. 10.1090/S0002-9939-1956-0078686-7.
- Figueras J: Ring perception using breadth-first search. J Chem Inf Comput Sci. 1996, 36: 986-991. 10.1021/ci960013p.
- RDKit: Cheminformatics and machine learning software 2014. http://www.rdkit.org
- Hastings J, Matos PD, Dekker A, Ennis M, Harsha B, Kale N, Muthukrishnan V, Owen G, Turner S, Williams M, Steinbeck C: The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res. 2013, 41 (D1): 456-463. 10.1093/nar/gks1146.
- National Cancer Institute (NCI) Database Download. http://cactus.nci.nih.gov/download/nci/
- Irwin J, Sterling T, Mysinger M, Bolstad E, Coleman R: ZINC: A free tool to discover chemistry for biology. J Chem Inf Model. 2012, 52 (7): 1757-1768. 10.1021/ci3001277.
- Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, Mcglinchey S, Michalovich D, Al-Lazikani B, Overington JP: ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40 (D1): 1100-1107. 10.1093/nar/gkr777.
- Weininger D: SMILES, A chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci. 1988, 28 (1): 31-36. 10.1021/ci00057a005.
- Hanser T, Jauffret P, Kaufmann G: A new algorithm for exhaustive ring perception in a molecular graph. J Chem Inf Comput Sci. 1996, 36 (6): 1146-1152. 10.1021/ci960322f.
- Rahman S, Bashton M, Holliday GL, Schrader R, Thornton JM: Small Molecule Subgraph Detector (SMSD) toolkit. J Cheminformatics. 2009, 1 (1): 12. 10.1186/1758-2946-1-12.
- Sedgewick R, Wayne K: Graphs, Algorithms, 4th edn. 2011, Boston: Addison Wesley.
- Skiena SS: Minimum spanning tree, Graphs, The Algorithm Design Manual. 1997, New York: Springer.
- Hopcroft J, Tarjan R: Efficient algorithms for graph manipulation. Commun ACM. 1973, 16 (6): 372-378. 10.1145/362248.362272.
- Schwerdtfeger P, Wirz L, Avery J: Program FULLERENE: a software package for constructing and analyzing structures of regular fullerenes. J Comput Chem. 2013, 34 (17): 1508-1526. 10.1002/jcc.23278.
- Vismara P: Union of all the minimum cycle bases of a graph. Electr J Comb. 1997, 4: 73-879.
- Horton JD: A polynomial-time algorithm to find the shortest cycle basis of a graph. SIAM J Comput. 1987, 16 (2): 358-366. 10.1137/0216026.
- O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR: Open Babel: an open chemical toolbox. J Cheminformatics. 2011, 3 (1): 33.
- Ihlenfeldt W, Takahashi Y, Abe H, Sasaki S: Computation and management of chemical properties in CACTVS: an extensible networked approach toward modularity and compatibility. J Chem Inf Comput Sci. 1994, 34 (1): 109-116. 10.1021/ci00017a013.
- Kolodzik A, Urbaczek S, Rarey M: Unique ring families: a chemically meaningful description of molecular ring topologies. J Chem Inf Model. 2012, 52 (8): 2013-2021. 10.1021/ci200629w.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.