- Preliminary communication
- Open Access
Statistical filtering for NMR based structure generation
© Junker; licensee Chemistry Central Ltd. 2011
Received: 19 April 2011
Accepted: 11 August 2011
Published: 11 August 2011
The constitutional assignment of natural products by NMR spectroscopy is usually based on 2D NMR experiments such as COSY, HSQC, and HMBC. The difficulty of a structure elucidation problem depends more on the type of molecule investigated than on its size. Saturated compounds can usually be assigned unambiguously by hand using only COSY and 13C-HMBC data, whereas condensed heterocycles are problematic because they lack protons that could show interatomic connectivities. Several computer programs have been developed to aid in the structural assignment process, one of them being COCON. For unsaturated and substituted molecules, structure generators frequently produce a very large number of possible solutions. This article presents a "statistical filter" that reduces the number of results. The filter works by generating 3D conformations using smi23d, a simple MD approach. All molecules for which the generation of constitutional restraints failed were eliminated from the result set. Some structural elements removed by the statistical filter were analyzed and checked against Beilstein. The automatic removal of molecules for which no MD parameter set could be created was incorporated into WEBCOCON. The effect of this filter depends on the NMR data set used, but in no case was the correct constitution removed from the result set.
Nuclear magnetic resonance (NMR) is the most common tool for the structure elucidation of new compounds. The 2D NMR experiments typically used, such as COSY, HSQC, and 13C-HMBC, deliver correlation information between atoms that can be translated into connectivity information. Of these, correlations from COSY and HSQC experiments can be transcribed directly into connectivities between atoms. The 13C-HMBC correlations need more attention because of their ambiguity and complexity. Hence the difficulty of a structure elucidation problem depends more on the type of molecule investigated than on its size. Saturated compounds can usually be assigned unambiguously using mainly COSY and some 13C-HMBC data, whereas condensed heterocycles are problematic because they lack protons that could show interatomic connectivities. This ambiguity has driven the development of different software packages to aid in the interpretation of the 13C-HMBC correlation data [1–19], as well as the development of additional correlation experiments [20, 21].
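The translation of correlations into connectivity information can be illustrated with a minimal Python sketch. It is not COCON's implementation; the function names, the atom labels, and the simplification that every COSY peak is a vicinal (3J) coupling are assumptions made for illustration only. HSQC assigns each proton to its carbon, a COSY peak between two protons then implies a bond between their carbons, and a 13C-HMBC peak only restricts the two carbons to be one or two bonds apart (2J or 3J), which is where the ambiguity arises.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Bond = FrozenSet[str]


def bonds_from_cosy(hsqc: Dict[str, str],
                    cosy: List[Tuple[str, str]]) -> Set[Bond]:
    """Derive C-C bonds from COSY peaks, assuming each peak is a 3J coupling.

    hsqc maps a proton label to the carbon it is attached to (1J).
    """
    bonds: Set[Bond] = set()
    for h_a, h_b in cosy:
        c_a, c_b = hsqc.get(h_a), hsqc.get(h_b)
        # Skip protons without an HSQC partner and geminal proton pairs.
        if c_a is None or c_b is None or c_a == c_b:
            continue
        bonds.add(frozenset((c_a, c_b)))
    return bonds


def hmbc_restraints(hsqc: Dict[str, str],
                    hmbc: List[Tuple[str, str]]
                    ) -> List[Tuple[str, str, Tuple[int, int]]]:
    """Turn HMBC peaks (proton, carbon) into ambiguous distance restraints.

    A 2J or 3J H-C correlation places the correlated carbon one or two
    bonds away from the carbon that bears the proton.
    """
    restraints = []
    for proton, carbon in hmbc:
        origin = hsqc.get(proton)
        if origin is None or origin == carbon:
            continue
        restraints.append((origin, carbon, (1, 2)))
    return restraints


if __name__ == "__main__":
    hsqc = {"H1": "C1", "H2": "C2", "H3": "C3"}  # hypothetical labels
    print(bonds_from_cosy(hsqc, [("H1", "H2"), ("H2", "H3")]))
    print(hmbc_restraints(hsqc, [("H1", "C4")]))
```

The COSY-derived bonds are fixed, while each HMBC restraint leaves two possible bond distances open; a structure generator has to enumerate every constitution compatible with all of them.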
When the observed connectivity information is used as input for the structure generation program COCON [3, 22–24], the program creates all compatible constitutional assignments. For unsaturated molecules, COCON usually generates a very large number of possible solutions. Since the solutions then have to be checked manually for chemical feasibility and sense, different efforts have been made to reduce the number of solutions. Among others, ranking of the constitutional assignments by chemical shift deviation and/or substructural elements has been tested in combination with COCON runs [25, 26]. Unfortunately, the described software could not be made available for the online version of COCON (WEBCOCON at http://cocon.nmr.de), since it uses data protected by intellectual property rights. A different way of handling the result set had to be chosen, and the statistical filter was implemented.
The idea behind the filter is to compare the suggested constitutions against existing molecules, such as those contained in the PubChem database (PubChem can be found at http://pubchem.ncbi.nlm.nih.gov/). For each COCON-suggested constitution, all first-sphere elements of the constitution are checked for corresponding elements in PubChem. This comparison is done indirectly, by generating molecular dynamics parameters in smi23d. The software smi23d (available under the Apache 2.0 license and downloadable from http://www.chembiogrid.org/cheminfo/smi23d/) has been used to generate 3D coordinates for almost 13M compounds contained in PubChem and succeeded in generating coordinates for 99.6% of the molecules in the database (the corresponding 3D coordinates can be found at http://www.chembiogrid.org/cheminfo/p3d/; the observed error is about 0.4%, roughly 53,000 false negatives for 13M compounds). The filtering application uses smi23d to generate 3D coordinates for all constitutional assignments generated by COCON and eliminates those for which smi23d fails because parameters are lacking. Since smi23d has been used successfully on so many well-known compounds, a failure means that the structural element for which parameters were missing has hardly ever been observed and therefore might not exist in natural products. Due to the nature of the filter, no ranking of the remaining constitutions is carried out, and further methods might be necessary to improve the results.
All calculations were run on the publicly available WEBCOCON server, using the input files provided there as examples. Calculation times varied from several minutes to two hours for 1 and 2. For 3, the longest running time was 3 days, for the generation of the 523,668 constitutional assignments using COSY and 13C-HMBC correlations with open atom types.
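The filtering step described above amounts to a pass/fail test per candidate, which the following Python sketch illustrates under stated assumptions. It does not call smi23d: the 3D generation is passed in as a callable, and `toy_generate_3d` with its `UNPARAMETERIZED` fragment set is a hypothetical stand-in used only to make the example runnable. How smi23d is actually invoked and how it reports missing parameters is not shown here.

```python
from typing import Callable, Iterable, List, Tuple


def statistical_filter(candidates: Iterable[str],
                       generate_3d: Callable[[str], bool]
                       ) -> Tuple[List[str], List[str]]:
    """Split candidate constitutions (as SMILES) into kept and removed.

    generate_3d returns True when 3D coordinates could be produced and
    False (or raises) when generation fails, e.g. for lack of parameters.
    The kept constitutions are not ranked; input order is preserved.
    """
    kept: List[str] = []
    removed: List[str] = []
    for smiles in candidates:
        try:
            ok = generate_3d(smiles)
        except Exception:
            ok = False  # treat any generator failure as "no parameters"
        (kept if ok else removed).append(smiles)
    return kept, removed


# Hypothetical stand-in for a real 3D generator: it "fails" whenever the
# SMILES string contains a fragment from this made-up set.
UNPARAMETERIZED = {"C1=C=C1"}


def toy_generate_3d(smiles: str) -> bool:
    return not any(fragment in smiles for fragment in UNPARAMETERIZED)


if __name__ == "__main__":
    kept, removed = statistical_filter(
        ["CCO", "C1=C=C1", "c1ccccc1"], toy_generate_3d)
    print("kept:", kept)
    print("removed:", removed)
```

Passing the generator as a parameter keeps the filter logic independent of the tool that produces the coordinates, so a wrapper around an actual 3D generation program could be substituted without changing the loop.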
A webpage allowing direct access to the results of the structure generator runs presented here has been set up on the WEBCOCON server http://cocon.nmr.de/StatisticalFilter/ (The results are also mirrored at http://science.jotjot.net/StatisticalFilter/).
Ascomycin, a well-known ethyl derivative of tacrolimus, serves as an example of a large natural product, featuring 43 carbon atoms. Using experimental NMR correlation data (COSY and 13C-HMBC correlations) together with fixed atom types, COCON generates only one solution, independent of the statistical filter. The filter also had no effect on the number of constitutional assignments generated when no atom types were defined, in which case a total of 100 different constitutions were proposed.
Table: Number of constitutional assignments suggested for 1 and 2 (columns: open atom types, fixed atom types).
Table: Number of constitutional assignments suggested for 3 depending on the type of correlation information used (columns: open atom types, fixed atom types).
The WEBCOCON server is freely accessible via http://cocon.nmr.de.
The author wishes to thank Rainer Haessner and the Technische Universität München for providing the hardware for the WEBCOCON server.
1. Elyashberg M, Williams A, Martin G: Computer-assisted structure verification and elucidation tools in NMR-based structure elucidation. Prog Nucl Mag Res Sp. 2008, 53 (1-2): 1-104. 10.1016/j.pnmrs.2007.04.003.
2. Peng C, Bodenhausen G, Qiu S, Fong H, Farnsworth N, Yuan S, Zheng C: Computer-assisted structure elucidation: Application of CISOC-SES to the resonance assignment and structure generation of betulinic acid. Magn Reson Chem. 1998, 36 (4): 267-278. 10.1002/(SICI)1097-458X(199804)36:4<267::AID-OMR256>3.0.CO;2-6.
3. Lindel T, Junker J, Kock M: COCON: From NMR correlation data to molecular constitutions. J Mol Model. 1997, 3: 364-368. 10.1007/s008940050052.
4. Stefani R, Nascimento P, Costa F: Computer-aided structure elucidation of organic compounds: Recent advances. Quim Nova. 2007, 30 (5): 1347-1356. 10.1590/S0100-40422007000500048.
5. Elyashberg M, Blinov K, Molodtsov S, Williams A, Martin G: Fuzzy structure generation: A new efficient tool for computer-aided structure elucidation (CASE). J Chem Inf Model. 2007, 47 (3): 1053-1066. 10.1021/ci600528g.
6. Smurnyy Y, Elyashberg M, Blinov K, Lefebvre B, Martin G, Williams A: Computer-aided determination of relative stereochemistry and 3D models of complex organic molecules from 2D NMR spectra. Tetrahedron. 2005, 61 (42): 9980-9989. 10.1016/j.tet.2005.08.022.
7. Sharman G, Jones I, Parnell M, Willis M, Mahon M, Carlson D, Williams A, Elyashberg M, Blinov K, Molodtsov S: Automated structure elucidation of two unexpected products in a reaction of an alpha,beta-unsaturated pyruvate. Magn Reson Chem. 2004, 42 (7): 567-572. 10.1002/mrc.1396.
8. Steinbeck C: Recent developments in automated structure elucidation of natural products. Nat Prod Rep. 2004, 21 (4): 512-518. 10.1039/b400678j.
9. Schulz K, Korytko A, Munk M: Applications of a HOUDINI-based structure elucidation system. J Chem Inf Comp Sci. 2003, 43 (5): 1447-1456.
10. Steinbeck C: SENECA: A platform-independent, distributed, and parallel system for computer-assisted structure elucidation in organic chemistry. J Chem Inf Comp Sci. 2001, 41 (6): 1500-1507.
11. Steinbeck C: Recent advancements in the development of SENECA, a computer program for Computer Assisted Structure Elucidation based on a stochastic algorithm. Abstr Pap Am Chem S. 1999, 218: U360-U360.
12. Strokov I, Lebedev K: Computer aided method for chemical structure elucidation using spectral databases and C-13 NMR correlation tables. J Chem Inf Comp Sci. 1999, 39 (4): 659-665.
13. Madison M, Schulz K, Korytko A, Munk M: SESAMI: An integrated desktop structure elucidation tool. Internet J Chem. 1998, 1 (34): CP1-U22.
14. Steinbeck C: LUCY - A program for structure elucidation from NMR correlation experiments. Angew Chem Int Edit. 1996, 35 (17): 1984-1986. 10.1002/anie.199619841.
15. Bangov I, Laude I, Cabrolbass D: Combinatorial Problems in the Treatment of fuzzy C-13 NMR Spectral Information in the Process of Computer-Aided Structure Elucidation - Estimation of the Carbon-Atom Hybridization and Alpha-Environment States. Anal Chim Acta. 1994, 298: 33-52. 10.1016/0003-2670(94)90041-8.
16. Funatsu K: Computer-Assisted Structure Elucidation for Organic-Compound. J Syn Org Chem Jpn. 1993, 51 (6): 516-528. 10.5059/yukigoseikyokaishi.51.516.
17. Lebedev K, Nekhoroshev S, Kirshansky S, Derendjaev B: Computer Method of Fragmentary Formula Prediction of an unknown by its Mass and NMR-Spectra. Sibirskii Khim Zh+. 1992, 72-79. 3
18. Guzowskaswider B, Hippe Z: Structure Elucidation of organic-compounds aided by the Computer-Program System Scannet. J Mol Struct. 1992, 275: 225-234.
19. Nuzillard J, Massiot G: Computer-Aided Spectral Assignment in NMR Spectroscopy. Anal Chim Acta. 1991, 242: 37-41.
20. Reif B, Kock M, Kerssebaum R, Kang H, Fenical W, Griesinger C: ADEQUATE, a new set of experiments to determine the constitution of small molecules at natural abundance. J Magn Reson Ser A. 1996, 118 (2): 282-285. 10.1006/jmra.1996.0038.
21. Kock M, Junker J, Lindel T: Impact of the H-1,N-15-HMBC experiment on the constitutional analysis of alkaloids. Org Lett. 1999, 1: 2041-2044. 10.1021/ol991009c.
22. Lindel T, Junker J, Kock M: 2D-NMR-guided constitutional analysis of organic compounds employing the computer program COCON. Eur J Org Chem. 1999, 573-577.
23. Kock M, Junker J, Maier W, Will M, Lindel T: A COCON analysis of proton-poor heterocycles-Application of carbon chemical shift predictions for the evaluation of structural proposals. Eur J Org Chem. 1999, 579-586.
24. Junker J, Maier W, Lindel T, Kock M: Computer-assisted constitutional assignment of large molecules: COCON analysis of ascomycin. Org Lett. 1999, 1: 737-740. 10.1021/ol990725b.
25. Meiler J, Kock M: Novel methods of automated structure elucidation based on C-13 NMR spectroscopy. Magn Reson Chem. 2004, 42 (12): 1042-1045. 10.1002/mrc.1424.
26. Meiler J, Sanli E, Junker J, Meusinger R, Lindel T, Will M, Maier W, Kock M: Validation of structural proposals by substructure analysis and C-13 NMR chemical shift prediction. J Chem Inf Comp Sci. 2002, 42: 241-248.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.