mol2chemfig, a tool for rendering chemical structures from molfile or SMILES format to LaTeX code
© Brefo-Mensah and Palmer; licensee Chemistry Central Ltd. 2012
Received: 21 August 2012
Accepted: 25 September 2012
Published: 2 October 2012
Displaying chemical structures in LaTeX documents currently requires either hand-coding of the structures using one of several LaTeX packages, or the inclusion of finished graphics files produced with an external drawing program. There is currently no software tool available to render the large number of structures available in molfile or SMILES format to LaTeX source code. Here we present mol2chemfig, a Python program that provides this capability. Its output is written in the syntax defined by the chemfig TeX package, which allows for the flexible and concise description of chemical structures and reaction mechanisms. The program is freely available both through a web interface and for local installation on the user’s computer. The code and accompanying documentation can be found at http://chimpsky.uwaterloo.ca/mol2chemfig.
Keywords: LaTeX, Chemfig, Molfile, SMILES, Molecular structures, Code generation
The molfile [5] and the SMILES [6] data formats are widely used to represent molecular structures with or without atomic coordinates, respectively. The entries in the PubChem database [7] are available in both formats. Other chemical data formats can be converted to molfile or SMILES using converters such as openbabel [8], and most interactive chemical drawing programs can export these formats as well. We therefore chose molfile and SMILES as input formats for mol2chemfig.
mol2chemfig is written in Python version 2 [9]. It was tested only on Python 2.7 but uses no particular features of that version, and should therefore run on any recent Python 2.x installation. In addition to various modules from the standard library, it uses the indigo cheminformatics library and its accompanying Python API [10, 11], which it relies on for parsing of molfile and SMILES input, addition or removal of hydrogen atoms, and the calculation of missing coordinates.
The program, which is used from the command line, and its required libraries can be installed on the user’s computer. Alternatively, a server installation of the program can be accessed through a web interface. As a third option, a command line-driven thin web client is available, which accepts input in the same way as the locally installed program but then hands it off to the server installation. The web interface is also implemented in Python. The thin client is implemented in Lua. Since TeXLive contains a Lua interpreter, it can run the thin client without any other software being installed. MiKTeX should do the same, but the authors have not confirmed this. Because the work is done on the server, the thin client also transparently accesses the most up-to-date version of mol2chemfig.
Using indigo, the molfile or SMILES input is read into the data structures defined by that library. If coordinates are missing (SMILES input) or the user has explicitly requested calculation of new ones, indigo is used to compute them.
mol2chemfig code modules
Accepts and validates user input from the command line or through the web; invokes indigo to parse input and supply missing coordinates; hands over to molecule
Generates tree representation of the molecule, applies options, renders molecule to chemfig code
Supplies translations and auxiliary code for rendering the molecule tree to chemfig code
Supply auxiliary classes for molecule
Supplies auxiliary classes and global settings
Define and process options
From the data structures populated by indigo, a tree representation of the molecule is built.
The tree is traversed and annotated in order to satisfy the user-selected options for molecule rotation, bond scaling and so forth.
The tree is rendered to chemfig code, which is returned.
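The three steps above can be illustrated with a minimal sketch. It is not taken from the mol2chemfig sources: the function names `build_tree` and `render`, the plain bond-list input, and the schematic output lines are all assumptions made for illustration, and the output is a placeholder layout rather than valid chemfig code. The sketch shows only the general idea of turning a molecular graph into a tree by depth-first traversal, with bonds that lead back to an already placed atom recorded separately as ring closures.

```python
from collections import defaultdict


def build_tree(bonds, root):
    """Depth-first spanning tree over an undirected list of (atom, atom) bonds.

    Returns (children, closures). children maps an atom to the atoms reached
    from it in the tree; closures lists bonds that point back to an atom that
    is already part of the tree, i.e. bonds that close a ring.
    """
    neighbors = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    children = defaultdict(list)
    closures = []
    visited = {root}
    used = set()  # bonds that have already been consumed

    def visit(atom):
        for nxt in sorted(neighbors[atom]):
            bond = frozenset((atom, nxt))
            if bond in used:
                continue
            used.add(bond)
            if nxt in visited:
                closures.append((atom, nxt))  # ring-closing bond
            else:
                visited.add(nxt)
                children[atom].append(nxt)
                visit(nxt)

    visit(root)
    return dict(children), closures


def render(children, closures, root):
    """Schematic output: one line per bond, with a comment naming the atom reached."""
    closing = defaultdict(list)
    for src, dst in closures:
        closing[src].append(dst)
    lines = []

    def walk(atom, depth):
        indent = "  " * depth
        for dst in closing.get(atom, []):
            lines.append(f"{indent}-  % -> {dst}")
        for child in children.get(atom, []):
            lines.append(f"{indent}-  % {child}")
            walk(child, depth + 1)

    walk(root, 0)
    return lines


# Hypothetical input: a three-membered ring (atoms 1-3) with a substituent (atom 4).
tree, ring_closures = build_tree([(1, 2), (2, 3), (3, 1), (3, 4)], root=1)
output_lines = render(tree, ring_closures, root=1)
```

In a real converter, the annotation step would sit between `build_tree` and `render` and would attach angles, bond types and atom labels to each tree node before rendering.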
The chemfig code generated by mol2chemfig uses several custom macros. These macros must be loaded by LaTeX documents in order to execute the generated code; they are contained within a separate small LaTeX package (mol2chemfig.sty) that also takes care of loading the chemfig package. The chemfig package, in turn, requires and loads the TikZ package (Figure 1).
As of this writing, both TikZ and chemfig are available in the two major TeX distributions (MiKTeX and TeXLive). The custom LaTeX code for mol2chemfig is included in this program’s download.
Results and discussion
The use of the program and its features will here be illustrated with a few short examples; some more examples are contained in the documentation available through the program’s website, as well as in Additional file 1 to this paper. While some basic elements of chemfig’s syntax will be briefly introduced, the syntax will not be covered systematically. The chemfig package’s accompanying documentation is clearly written and thorough; reference [12] gives a brief but useful introduction.
Basics of operation
The program is invoked from the command line. It takes exactly one argument, which by default is the name of a file containing a single molecule in molfile format. Output is written to stdout; output redirection will typically be used to write to a file instead. A miscellany of options is available to modify input and output. Invoking mol2chemfig -h or simply mol2chemfig will display the full list of options and their descriptions.
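For readers who want to drive the program from a script, the invocation described above can be sketched as follows. This is an illustration under stated assumptions, not part of mol2chemfig: the helper names `build_command` and `convert` are hypothetical, the executable is assumed to be on the search path under the name `mol2chemfig`, and options are assumed to precede the file name. The only flags used are `-w` (wrap the output in a chemfig macro) and `-u` (recalculate coordinates), both of which are described elsewhere in this paper.

```python
import subprocess


def build_command(molfile_path, wrap=False, recalculate=False,
                  executable="mol2chemfig"):
    """Assemble the argument list for one mol2chemfig call.

    Assumes options are given before the single file-name argument.
    """
    command = [executable]
    if wrap:
        command.append("-w")  # --wrap-chemfig
    if recalculate:
        command.append("-u")  # --recalculate-coordinates
    command.append(molfile_path)
    return command


def convert(molfile_path, output_path, **options):
    """Run mol2chemfig and write its stdout to output_path.

    Requires a local mol2chemfig installation; raises CalledProcessError
    if the program exits with a non-zero status.
    """
    result = subprocess.run(build_command(molfile_path, **options),
                            capture_output=True, text=True, check=True)
    with open(output_path, "w") as handle:
        handle.write(result.stdout)


# Hypothetical file name, used only to show the resulting argument list.
example_command = build_command("molecule.mol", wrap=True)
```

On the command line, the same effect is usually achieved with plain output redirection; the wrapper is only convenient when many structures are converted in a batch.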
Hand-written versus mol2chemfig-generated chemfig code
Outside of the ring, bond angles cannot be inferred and are specified explicitly between square brackets. A preceding single colon denotes an absolute angle; an angle that is relative to the preceding bond can be specified with two colons, as in [::45]. Branches are again created by parentheses, as in line 5 of Figure 2A; this line also illustrates chemfig’s convention for specifying stereo bonds that point upwards. Since chemfig ignores whitespace, Figure 2A could also have been written as: \chemfig{NH_2-[:270]-[:210](<[:150]HO)-[:270]*6(=-(-HO)=(-OH)-=-)}; this style might appeal to enthusiasts of the brainfuck language [13].
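The two angle conventions can be made concrete with a short sketch that derives bond specifications from 2D coordinates. The helper names `absolute_angle` and `bond_specs` and the sample coordinates are assumptions for illustration; the code does not reproduce how mol2chemfig itself computes or formats angles. Angles are rounded to whole degrees, absolute angles are reported in the range 0 to 359, and relative angles in the range -180 to 179.

```python
import math


def absolute_angle(start, end):
    """Angle of the bond from start to end in whole degrees, 0 <= angle < 360."""
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    return round(math.degrees(math.atan2(dy, dx))) % 360


def bond_specs(points, relative=False):
    """Single-bond specifications for a chain of atoms given as (x, y) points.

    With relative=False every bond gets an absolute angle, written [:angle].
    With relative=True every bond after the first is expressed relative to
    the preceding bond, written [::angle].
    """
    specs = []
    previous = None
    for start, end in zip(points, points[1:]):
        angle = absolute_angle(start, end)
        if relative and previous is not None:
            delta = (angle - previous + 180) % 360 - 180
            specs.append(f"-[::{delta}]")
        else:
            specs.append(f"-[:{angle}]")
        previous = angle
    return specs


# Hypothetical chain that turns left twice by 90 degrees.
chain = [(0, 0), (1, 0), (1, 1), (0, 1)]
absolute = bond_specs(chain)
relative = bond_specs(chain, relative=True)
```

A bond pointing straight down, from (0, 0) to (0, -1), comes out as `-[:270]`, which matches the absolute angles used in the one-line example above.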
While elegant and effective for hand-coded molecules, chemfig’s syntax for rings is somewhat orthogonal to the tree syntax used with other parts of the molecule and thus is not implemented in mol2chemfig; therefore, the generated code in Figure 2B treats the ring much like the remainder of the molecule. By default, mol2chemfig uses one line for each bond and appends an end-of-line comment with the number of the atom that is reached by this bond; this number is the same as in the input if the latter is given in molfile format. If the number is prefixed with ->, as in line 16 in Figure 2B, this indicates that the bond closes a ring and points back to an atom that appeared in the output earlier.
A convenient method to include the code generated by mol2chemfig in a LaTeX document is to load it from an external file with \input. Note, however, that \input cannot be used inside a \chemfig macro; therefore, the \chemfig macro must be part of the external file. The -w or --wrap-chemfig option used in Figure 2B assists with this by enclosing the generated code in a \chemfig macro.
Charges and radicals
The molfile format can represent radicals and charges, and these are supported by mol2chemfig. Charges and radical electrons (as well as implicit hydrogens) are placed so as to minimize interference with bonds attached to the atom in question. Figure 3 shows the structure of FMNH as an example.
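The text above states only the goal of the placement, not the method. One simple heuristic that serves this goal is sketched below as an assumption: place the label in the middle of the widest angular gap between the bonds of the atom. The function name `freest_angle` is hypothetical, and mol2chemfig's actual placement logic may differ.

```python
def freest_angle(bond_angles):
    """Direction in degrees that bisects the widest gap between the given bonds.

    bond_angles holds the absolute angles (in degrees) of all bonds attached
    to one atom. With no bonds, 0 is returned as an arbitrary default.
    """
    if not bond_angles:
        return 0.0
    angles = sorted(a % 360 for a in bond_angles)
    best_gap, best_start = -1.0, 0.0
    for i, start in enumerate(angles):
        end = angles[(i + 1) % len(angles)]
        # A zero difference means the gap spans the full circle (single bond).
        gap = (end - start) % 360 or 360
        if gap > best_gap:
            best_gap, best_start = gap, start
    return (best_start + best_gap / 2) % 360


# Hypothetical atoms: one with a single bond, one with two bonds.
single_bond = freest_angle([90])
two_bonds = freest_angle([0, 120])
```

For an atom with a single bond pointing up, the label lands directly opposite the bond; for two bonds at 0 and 120 degrees it lands at 240 degrees, in the middle of the 240-degree gap.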
Coordinate calculation and transformations
In the rendered structure, the bond angles seem just a little off; this is confirmed by looking at the generated code, which shows angles that are close to, but not exactly, multiples of 30 degrees. Instead of fixing all those angles manually, we can ask mol2chemfig to recalculate them for us with the -u or --recalculate-coordinates option; this is shown in Figure 4B. This example also illustrates the -p or --flip option to horizontally flip the molecule; other options allow vertical flipping and rotation. Finally, the -o or --aromatic-circles option renders aromatic rings with circles instead of discrete bonds.
Note that, in the recalculated structure, the orientations of some substituents are changed. These decisions are made by indigo, from which mol2chemfig adopts the coordinate calculation wholesale.
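The geometric transformations mentioned in this section are simple operations on atomic coordinates, as the following sketch shows. The function names are hypothetical and the code is not taken from mol2chemfig. The coordinate recalculation itself is performed by indigo and is not reproduced here; the helper `deviation_from_grid` merely measures how far an angle lies from the nearest multiple of 30 degrees, which is the symptom described above.

```python
import math


def flip_horizontal(coords):
    """Mirror the molecule left to right by negating every x coordinate."""
    return [(-x, y) for x, y in coords]


def flip_vertical(coords):
    """Mirror the molecule top to bottom by negating every y coordinate."""
    return [(x, -y) for x, y in coords]


def rotate(coords, degrees):
    """Rotate all atoms counterclockwise about the origin."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in coords]


def deviation_from_grid(angle, step=30):
    """Signed difference between angle and the nearest multiple of step."""
    return angle - round(angle / step) * step


# Hypothetical coordinates of two atoms.
atoms = [(1.0, 0.0), (2.0, 1.0)]
flipped = flip_horizontal(atoms)
turned = rotate(atoms, 90)
```

Because flipping changes the handedness of the drawing, a real implementation also has to swap the direction of stereo bonds; that step is omitted from this sketch.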
Working with sub-molecules
The submol mechanism operates essentially through string substitution; therefore, subsequent sub-molecules are simply connected across the last and first atoms of their respective main chains. In order to place those connecting bonds correctly, we thus need to take control of the entry and exit atoms for the sub-molecules. To find the correct ones, we can let mol2chemfig print the atom numbers, as illustrated in Figure 6B. Setting atoms 6 and 11 as entry and exit atoms, respectively, then produces the structure shown in Figure 6C.
Note that, in the sub-molecule definition generated for Figure 6C, the primary amino group was manually changed to a secondary one. Generally speaking, while basic usage of mol2chemfig does not require familiarity with chemfig’s syntax, the ability to manually touch up the generated code will notably increase the usefulness of this program. The chemfig package offers a plethora of settings for bond lengths, colors and patterns as well as font sizes and shapes that allow the user to tweak the appearance of the rendered structures. It also provides facilities to depict reaction mechanisms and schemes; structures generated with mol2chemfig can be manually modified and incorporated into such schemes.
The mol2chemfig program introduced here allows the conversion of molecules specified in molfile or SMILES format to the TeX-compatible format defined by the chemfig package. The generated code can be included in documents as is, or can be edited and integrated into larger chemfig graphics. We hope the program will be useful for authors who wish to illustrate the structures of organic molecules and reactions in LaTeX documents.
Availability and requirements
Project name: mol2chemfig
Project home page: http://chimpsky.uwaterloo.ca/mol2chemfig/
Operating system(s): Linux, Windows, Mac
Programming language: Python 2.7
Other requirements: For the full version: Python 2.7, the indigo toolkit and its prerequisite libraries; for the thin client: a Lua interpreter. The LuaTeX binary that is available through TeXLive or MiKTeX satisfies this requirement. (The manual installation of indigo is described at https://github.com/ggasoftware/indigo/blob/master/README.txt; binary packages are available for several Linux distributions.)
Any restrictions to use by non-academics: None. The code is freely available under the LaTeX public license.
The locally installable full version and the thin web client are packaged and available for download from the project’s website. The server setup that is used by both the web interface and the web client is not routinely available, but the required code and setup instructions will be shared upon request.
EB-M: code implementation and testing, preparation of manuscript; MP: code implementation, preparation of manuscript. Both authors have read and approved the manuscript.
We thank the author of the chemfig package, Christian Tellechea, for helpful discussion and for the contribution of some auxiliary TeX code.
1. Fujita S: The XyMTeX System for Drawing Chemical Structures. 2010, [http://xymtex.com/fujitas3/xymtex/indexe.html]
2. Hagen J, Otten AF: PPCHTEX, a macropackage for typesetting chemical structure formulas. 2001, [www.pragma-ade.com/general/manuals/mp-ch-en.pdf]
3. Tellechea C: chemfig: Draw molecules with easy syntax. 2012, [http://www.ctan.org/pkg/chemfig/]
4. Tantau T, Feuersaenger C: PGF and TikZ - Graphic systems for TeX. 2011, [http://sourceforge.net/projects/pgf/]
5. Dalby A, Nourse JG, Hounshell WD, Gushurst AKI, Grier DL, Leland BA, Laufer J: Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited. J Chem Inform Comp Sci. 1992, 32: 244-255. 10.1021/ci00007a012. [http://pubs.acs.org/doi/abs/10.1021/ci00007a012]
6. Weininger D: SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Model. 1988, 28: 31-36. 10.1021/ci00057a005.
7. The PubChem database. 2012, [http://pubchem.ncbi.nlm.nih.gov/]
8. O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR: Open Babel: An open chemical toolbox. J Cheminform. 2011, 3: 33. 10.1186/1758-2946-3-33. [http://view.ncbi.nlm.nih.gov/pubmed/21982300]
9. The Python programming language. 2012, [http://www.python.org/]
10. Pavlov D, Rybalkin M, Karulin B, Kozhevnikov M, Savelyev A, Churinov A: Indigo: universal cheminformatics API. J Cheminform. 2011, 3 (Suppl 1): P4. 10.1186/1758-2946-3-S1-P4. [http://dx.doi.org/10.1186/1758-2946-3-S1-P4]
11. The indigo cheminformatics toolkit. 2012, [http://ggasoftware.com/opensource/indigo/]
12. Wright J: Exploring ChemFig: Basics. 2012, [http://www.texdev.net/2012/08/25/exploring-chemfig-basics/]
13. Paczkowski A: 99 bottles of beer. One program in 1500 variations: Language brainfuck. 2005, [http://www.99-bottles-of-beer.net/language-brainfuck-101.html]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.