Technical implications of new IUPAC elements in cheminformatics
© The Author(s) 2017
Received: 5 January 2017
Accepted: 2 February 2017
Published: 13 February 2017
The symbols for the new IUPAC elements named in November 2016 can introduce subtle ambiguities within cheminformatics software. The ambiguities are described and demonstrated by highlighting inconsistencies between software when handling existing element symbols.
On the 28th November 2016 the International Union of Pure and Applied Chemistry (IUPAC) approved the names and symbols of four new elements: 113 Nihonium (Nh), 115 Moscovium (Mc), 117 Tennessine (Ts), 118 Oganesson (Og). Cheminformatics libraries typically use a centralised dictionary of elements to store and look up symbols in the periodic table. Naïvely adding the new element symbols to this table can introduce unexpected behaviour.
While it is true that a human may decipher the intended meaning, it is more difficult for software, especially when the compound is no longer associated with its original context. The existence of compounds like PubChem’s p-tolylactinium (CID 20712520) instead of the intended p-acetyl structure demonstrates this. The error here was propagated from the original substance submissions: a deprecated ChemSpider entry, and patent sketches extracted by SCRIPDB.
Software for sketching chemical diagrams often allows the input of contracted abbreviations. ChemDraw 15 interprets all Ac labels as Acetyl and it is impossible to add an Actinium atom to a sketch even via the periodic table selection menu or from file or line-notation input. In MarvinSketch 126.96.36.199 entering Ac using the periodic table menu or keyboard short-cuts results in Actinium whilst using the “Label Editor” produces Acetyl. BIOVIA Draw 2017 makes a clear distinction when adding abbreviated atoms and both interpretations can be input. ChemDoodle 7.0.2 always interprets Ac as Actinium when setting the atom label but does allow OAc for Acetoxy. With all of these, there is often little visual indication or feedback as to whether users have entered the input they intended.
To remove the ambiguity between Tosyl and Tennessine the alternative abbreviation Tos can be used. A brief analysis of sketches taken from United States patent applications published in 2015 shows that Ts is used in atom labels 2290 times and Tos 113 times.
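One way to surface this kind of clash, rather than silently picking one reading, is to return every interpretation of a label and let the caller decide. The following is a minimal sketch only: the two lookup tables and the function names are hypothetical, contain just the labels discussed above, and do not reflect how any of the named sketchers work internally.

```python
# Hypothetical lookup tables limited to the labels discussed in the text.
# A real toolkit would hold the full periodic table and a longer abbreviation list.
ELEMENTS = {"Ac": "Actinium", "Ts": "Tennessine", "Cl": "Chlorine"}
ABBREVIATIONS = {"Ac": "Acetyl", "Ts": "Tosyl", "Tos": "Tosyl", "OAc": "Acetoxy"}


def interpretations(label):
    """Return every reading of an atom label, element reading first."""
    readings = []
    if label in ELEMENTS:
        readings.append(("element", ELEMENTS[label]))
    if label in ABBREVIATIONS:
        readings.append(("abbreviation", ABBREVIATIONS[label]))
    return readings


def is_ambiguous(label):
    """A label is ambiguous when it has more than one possible reading."""
    return len(interpretations(label)) > 1


if __name__ == "__main__":
    for label in ("Ac", "Ts", "Tos", "OAc"):
        print(label, interpretations(label), is_ambiguous(label))
```

Under these assumed tables, Ac and Ts are flagged as ambiguous whereas Tos and OAc are not, which is the reason the longer abbreviation is the safer choice.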
A more subtle problem may arise with the symbol Nh in software that allows case insensitive atom labels. It is reasonable to accept CL as equal to Cl for chlorine (e.g. PDB HETATMs) but NH (secondary amine) may now unexpectedly be picked up as Nihonium from the internal dictionary.
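The effect can be reproduced with a small sketch of a case-insensitive symbol lookup. The symbol sets are deliberately tiny and the `lookup` function is a hypothetical stand-in for a toolkit's internal dictionary, not the code of any particular library.

```python
# Illustrative subsets of the element dictionary (assumption: a real table is complete).
SYMBOLS_BEFORE_2016 = {"H", "C", "N", "O", "Cl"}
SYMBOLS_NEW_2016 = {"Nh", "Mc", "Ts", "Og"}


def lookup(label, symbols, case_sensitive=False):
    """Return the matching element symbol, or None if the label is not an element."""
    if label in symbols:
        return label
    if not case_sensitive:
        # Normalise "CL" -> "Cl", "NH" -> "Nh" before trying again.
        normalised = label[:1].upper() + label[1:].lower()
        if normalised in symbols:
            return normalised
    return None


if __name__ == "__main__":
    old = SYMBOLS_BEFORE_2016
    new = SYMBOLS_BEFORE_2016 | SYMBOLS_NEW_2016
    print(lookup("CL", old))   # chlorine, as intended
    print(lookup("NH", old))   # not an element before the new symbols were added
    print(lookup("NH", new))   # now matched as Nihonium
    print(lookup("NH", new, case_sensitive=True))
```

With the old table the label NH falls through to whatever group handling follows; once Nh is added, the same label is claimed by the element dictionary unless the lookup is made case sensitive or group labels are checked first.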
Support for the SMARTS query language is available in many closed and open-source cheminformatics toolkits. A potential area for ambiguity is again found with Nihonium and the interpretation of other transfermium symbols. Transfermium symbols were officially named after the initial release of the Daylight SMARTS toolkit, and in subsequent implementations some are interpreted differently between toolkits, either as an element or as a conjunction (AND) expression. For example, at the time of writing both the CDK and Open Babel interpret [Bh] as [B&h] whilst RDKit interprets it as [#107].
Ambiguous SMARTS for transfermium element symbols officially named since 1997
Symbol | Alternative interpretation | Description
[No] | [N&o] | Aliphatic nitrogen and aromatic oxygen (logically impossible)
[Db] | [D&b] or [bD] | Aromatic boron with one explicit bond (possible on fragment matching)
[Bh] | [B&h] or [hB] | Aliphatic boron with at least one implicit hydrogen
[Hs] | [H&s] or [sH] | Aromatic sulfur with one explicit hydrogen
[Ds] | [D&s] or [sD] | Aromatic sulfur with one explicit bond (possible on fragment matching)
[Cn] | [C&n] | Aliphatic carbon and aromatic nitrogen (logically impossible)
[Nh] | [N&h] or [hN] | Nitrogen with at least one implicit hydrogen
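The two competing readings of each symbol can be enumerated mechanically. The sketch below is an illustration under stated assumptions, not a SMARTS parser: the `readings` function and both dictionaries are hypothetical, the primitive list covers only the single-letter primitives needed for the symbols discussed here, and the output says nothing about which reading a given toolkit actually chooses.

```python
# Atomic numbers of the transfermium symbols that collide with SMARTS primitives.
TRANSFERMIUM = {"No": 102, "Db": 105, "Bh": 107, "Hs": 108,
                "Ds": 110, "Cn": 112, "Nh": 113}

# Single-letter SMARTS primitives relevant to those symbols (illustrative subset).
PRIMITIVES = {
    "B": "aliphatic boron", "C": "aliphatic carbon", "N": "aliphatic nitrogen",
    "b": "aromatic boron", "n": "aromatic nitrogen",
    "o": "aromatic oxygen", "s": "aromatic sulfur",
    "D": "explicit bond count", "H": "hydrogen count",
    "h": "implicit hydrogen count",
}


def readings(bracket):
    """List the possible readings of a two-letter bracket atom such as '[Bh]'."""
    inner = bracket.strip("[]")
    found = []
    if inner in TRANSFERMIUM:
        # Element reading, written unambiguously by atomic number.
        found.append("[#%d]" % TRANSFERMIUM[inner])
    if len(inner) == 2 and all(ch in PRIMITIVES for ch in inner):
        # Conjunction reading: two primitives joined by an implicit AND.
        found.append("[%s&%s]" % (inner[0], inner[1]))
    return found


if __name__ == "__main__":
    for symbol in ("[Bh]", "[Nh]", "[Cn]", "[Cl]"):
        print(symbol, readings(symbol))
```

Writing the element as [#107] or [#113], or the conjunction with an explicit &, avoids depending on how a particular toolkit resolves the two-letter form.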
A pragmatic approach to handling the new elements, or perhaps all high atomic number elements with a very short half-life, could be to simply ignore them. Whilst these elements are unlikely to have a practical application, ignoring them is unsatisfactory, and we hope this commentary highlights that care should be taken when supporting the new symbols in cheminformatics software.
JWM wrote the manuscript and RAS recognized the ambiguities with SMARTS expressions. Both authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH (2016) PubChem substance and compound databases. Nucleic Acids Res 44(D1):1202–1213. doi:10.1093/nar/gkv951
- National Center for Biotechnology Information. PubChem compound database; CID=20712520. https://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=20712520. Accessed 3 Jan 2017
- Pence HE, Williams A (2010) ChemSpider: an online chemical information resource. J Chem Educ 87(11):1123–1124. doi:10.1021/ed100697w
- Heifets A, Jurisica I (2012) SCRIPDB: a portal for easy access to syntheses, chemicals and reactions in patents. Nucleic Acids Res 40(D1):428–433. doi:10.1093/nar/gkr919
- Daylight Chemical Information Systems Inc. http://www.daylight.com. Accessed 3 Jan 2017
- Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E (2003) The Chemistry Development Kit (CDK): an open-source java library for chemo- and bioinformatics. J Chem Inf Comput Sci 43(2):493–500. doi:10.1021/ci025584y
- O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open Babel: an open chemical toolbox. J Cheminform 3(1):33. doi:10.1186/1758-2946-3-33
- RDKit: Open-source cheminformatics. http://www.rdkit.org. Accessed 3 Jan 2017