Modeling of molecular atomization energies using machine learning
© Rupp et al; licensee BioMed Central Ltd. 2012
Published: 1 May 2012
Atomization energies are an important measure of chemical stability. Machine learning is used to model the atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only [1]. Our scheme maps the problem of solving the molecular time-independent Schrödinger equation onto a non-linear statistical regression problem. Kernel ridge regression [2] models are trained on, and compared to, reference atomization energies computed using density functional theory (PBE0 [3] approximation to the Kohn-Sham level of theory [4, 5]). We use a diagonalized matrix representation of molecules based on the inter-nuclear Coulomb repulsion operator in conjunction with a Gaussian kernel. Validation on a set of over 7000 small organic molecules from the GDB database [6] yields a mean absolute error of ~10 kcal/mol, while reducing computational effort by several orders of magnitude. Applicability is demonstrated for the prediction of binding energy curves using augmentation samples based on physical limits.
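As an illustration of the pipeline described above, the following minimal sketch builds a diagonalized Coulomb-type representation and fits a kernel ridge regression model with a Gaussian kernel. The abstract does not give the formulas, so several details are assumptions: the matrix entries follow the commonly used Coulomb matrix form (diagonal 0.5 Z^2.4, off-diagonal Z_i Z_j / |R_i - R_j|, positions in atomic units), the eigenvalues are sorted by decreasing absolute value and zero-padded to a fixed length, and the function names, `sigma`, and `lam` are hypothetical choices for this example rather than the authors' code.

```python
import numpy as np

def coulomb_matrix(charges, positions):
    """Coulomb-type matrix from nuclear charges and positions (assumed form)."""
    Z = np.asarray(charges, dtype=float)
    R = np.asarray(positions, dtype=float)
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * Z[i] ** 2.4          # assumed diagonal term
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return M

def eigen_descriptor(charges, positions, size):
    """Eigenvalues sorted by decreasing magnitude, zero-padded to `size`."""
    eig = np.linalg.eigvalsh(coulomb_matrix(charges, positions))
    eig = eig[np.argsort(-np.abs(eig))]
    out = np.zeros(size)
    out[:len(eig)] = eig
    return out

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krr_fit(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression coefficients."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), np.asarray(y, dtype=float))

def krr_predict(X_train, alpha, X_new, sigma):
    """Predict targets for new descriptors from the fitted coefficients."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Usage: descriptor of a two-atom hydrogen-like geometry, padded to length 3.
descriptor = eigen_descriptor([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]], size=3)
```

In practice, the descriptors of all training molecules would be stacked row-wise into `X`, the reference atomization energies placed in `y`, and `sigma` and `lam` selected by cross-validation.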
- Rupp M, Tkatchenko A, Müller KR, von Lilienfeld OA: Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Physical Review Letters. 2012, 108 (5).
- Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2009, New York: Springer.
- Perdew JP, Ernzerhof M, Burke K: Rationale for mixing exact exchange with density functional approximations. J Chem Phys. 1996, 105 (22): 9982-9985. 10.1063/1.472933.
- Hohenberg P, Kohn W: Inhomogeneous Electron Gas. Phys Rev. 1964, 136 (3B): B864-B871. 10.1103/PhysRev.136.B864.
- Kohn W, Sham LJ: Self-Consistent Equations Including Exchange and Correlation Effects. Phys Rev. 1965, 140 (4A): A1133-A1138. 10.1103/PhysRev.140.A1133.
- Blum LC, Reymond JL: 970 Million Druglike Small Molecules for Virtual Screening in the Chemical Universe Database GDB-13. J Am Chem Soc. 2009, 131 (25): 8732-8733. 10.1021/ja902302h.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.