Rubabel: wrapping open Babel with Ruby
© Smith et al.; licensee Chemistry Central Ltd. 2013
Received: 6 December 2012
Accepted: 2 April 2013
Published: 24 July 2013
Skip to main content
© Smith et al.; licensee Chemistry Central Ltd. 2013
Received: 6 December 2012
Accepted: 2 April 2013
Published: 24 July 2013
The number and diversity of wrappers for chemoinformatic toolkits suggests the diverse needs of the chemoinformatic community. While existing chemoinformatics libraries provide a broad range of utilities, many chemoinformaticians find compiled language libraries intimidating, time-consuming, arcane, and verbose. Although high-level language wrappers have been implemented, more can be done to leverage the intuitiveness of object-orientation, the paradigms of high-level languages, and the extensibility of languages such as Ruby. We introduce Rubabel, an intuitive, object-oriented suite of functionality that substantially increases the accessibily of the tools in the Open Babel chemoinformatics library.
Rubabel requires fewer lines of code than any other actively developed wrapper, providing better object organization and navigation, and more intuitive object behavior than extant solutions. Moreover, Rubabel provides a convenient interface to the many extensions currently available in Ruby, greatly streamlining otherwise onerous tasks such as creating web applications that serve up Rubabel functionality.
Rubabel is powerful, intuitive, concise, freely available, cross-platform, and easy to install. We expect it to be a platform of choice for new users, Ruby users, and some users of current solutions.
Despite the fact that chemoinformatics tools have been developed since the late 1990s , the field has yet to rally in support of a single library. The intricacies of the libraries combined with the low-level programming prowess required for these languages present a considerable barrier to adoption by less programming-oriented practitioners. What’s more, the competing libraries don’t share complete coverage of implemented tasks, meaning that the practitioner, who may be struggling with the language barrier, has to shoulder the additional burden of being well versed in the differences between the libraries, including different APIs, different IO interfaces and different data type standards.
The Cinfony project  is an attempt to offer high level access to three major existing chemoinformatics libraries from Python , a high-level scripting language . Cinfony’s use of Python greatly reduces the number of lines of code required for a broad range of chemoinformatics tasks. Though it allows the user to access the functionality of the component libraries from one Python script, Cinfony does not automatically manage underlying data types nor the choice of which library to use for which function. This allows users more control over how Cinfony works. However, as the authors acknowledge, it requires users to have an intimate knowledge of the component libraries in order to manage what data types, conventions, and operations can be performed by each of the three libraries it wraps. Despite the success of Cinfony, there is still a need for simplified, high level access to common chemoinformatics tasks.
Since most common tasks are available in any single chemoinformatics library, wrappers for single tool kits are widely used. Because these wrappers interface into a single library, they have the potential for simpler interfaces and easier extension.
Pybel [5, 6], a Python toolkit inspired by the Daylight project [7, 8], wraps the chemoinformatics library Open Babel . Pybel has an active user base and is an actively developed high-level solution to the accessibility problem of the available chemoinformatics libraries. Still, Pybel’s implementation in Python may not be the most intuitive interface for new users, who may not be strong programmers, or for Rubyists, who will miss multi-lined lambdas, simple extension (i.e, open-classes), and Rubygems, Ruby’s streamlined add-on installation tool .
In addition to Pybel, other attempts have been made to make open source chemoinformatics libraries more accessible. Indigo Python, a Python wrapper bundled with the Indigo open source chemoinformatics library , is a substantial improvement over the C++ library it wraps in terms of reduction of lines of code (LOC) needed to implement common tasks. RDKit  is a C++ library that has a Python wrapper and provides substantial reduction of LOC over direct access to the underlying C++ library. Most other available toolkits are either proprietary (such as OpenEye  and CACTVS ) or have not yet been documented and developed to maturity.
Ruby has penetrated the applied sciences where the need for a concise but powerful language meets appreciation for an easy learning curve [15, 16]. A minimal learning curve, concise coding, and powerful language paradigm have made Ruby an attractive option for coding bioinformatics tools, such as BioRuby .
For those who are not comfortable enough with programming to use the current tools, for those who prefer the Ruby way, and for those who want to do more with fewer lines of code, we present Rubabel. Rubabel offers a convenient, intuitive molecule-centric interface and facile intra-molecular navigation with minimal lines of code per task. It is an easily installed, actively developed project with a substantial amount of implemented functionality and an arbitrarily accessible extension mechanism for customization.
Open Babel’s acceptance rests at least partly on its SWIG bindings  which allow it to be accessed from languages other than C++. The bindings provide handles for accessing the internals of Open Babel.
For those who are less confident in C++ programming or aren’t familiar enough with the code base to know the command line composition for their desired task, Open Babel’s Ruby SWIG bindings provide an alternative solution. Although the bindings technically allow access to Open Babel from Ruby, it quickly becomes evident that the user is not convincingly spared from C++. An intimate understanding of Open Babel’s implementation architecture is required for many if not most tasks, and in some cases an almost line-for-line translation from C++ to Ruby is necessary. For example, Listing Ruby SWIG bindings shows how to instantiate a molecule from a SMILES string with the Ruby bindings.
Rubyists will notice that this code seems strikingly more like C++ than Ruby. Moreover, note that despite the uncharacteristic simplicity of this example, the user still needs to understand explicit details of the Open Babel architecture including the OBMol and OBConversion objects and modification methods for OBConversion. For more complex but typical examples, such as highlighting a substructure within a molecule in an image, the LOC required are comparable to C++. Adding Ruby-style objects and idioms to the SWIG bindings is the obvious next step toward improving upon the Ruby SWIG.
Rubabel is much more than a wrapper that ports Open Babel functionality to Ruby. Rubabel organizes the Open Babel objects into a more intuitive structure and extends the available behavior in a manner consistent with Ruby idioms, which is beneficial to experienced Rubyists and non-programmers alike, who both will find the interface intuitive and straightforward.
Wraps Open Babel’s OBmol object. Adds the ability to intelligently manipulate molecules as strings, transfer to and from lists of atoms and bonds, add and modify atoms, explicit and general molecular matching, iterate over bonds or atoms, copy molecules, png representation of the molecule, and fingerprinting.
Wraps Open Babel’s OBatom object. Adds accessibility conveniences such as the ability to seamlessly create or access an atom as an atomic number, the ability to intrinsically iterate through bonds and pass blocks to iterating loops, and the ability to iterate through and optionally execute a block of code for each atom bonded to the current one.
Wraps Open Babel’s OBBond object. Adds an accessor for a list of attached atoms, a seamless enumerator for attached atoms, the ability to execute a block of code for each attached atom, and the ability to easily check if a given atom is connected with this bond.
Wraps Open Babel’s smarts pattern object.
Ruby is a language designed to be easy to use, intuitive, and fun. We designed Rubabel to embody as many of these admittedly subjective qualities as possible by designing Rubabel object behavior in ways consistent with established Ruby idioms for object behavior.
Rubabel’s object-oriented paradigm defines behaviors for the objects with which they interact. For example, Open Babel’s Tanimoto coefficient logic will always apply to molecules, so in Rubabel that functionality is built into a method on the Molecule object. Similarly, in Open Babel, write methods for drawing molecules in image files are located in the code base as discrete functions. However, object-oriented methodology dictates that the objects themselves—not external modules—should define how they are printed. In Rubabel, write methods are defined for Molecule. Another example of linking behavior to the objects modified can be found in Rubabel iterators. In Open Babel, each object has its own iterator type as a separate object. In Ruby, iterators are implicit and connected to the object iterated over. There is no need to look up behavior because Rubabel’s iterators work exactly like iterators over native Ruby objects.
By following the object-oriented paradigm, users can instantly know what behavior is defined on any object by simply typing <object name>.methods in an interactive Ruby console. Non-object-oriented code requires digging through documentation or, if there isn’t any, sourcecode. Both options are unappealing due to the time commitment, while the latter option is inaccessible to non-programmers.
Listing Object orientation is illustrative of how object-orientation facilitates more intuitive code. Through Rubabel’s explicit Bond object, one can access a bond (line 3), upgrade its order (line 3), and downgrade its order (line 4). One would expect that the plus and minus operators to define the syntax to increase or decrease a bond’s order. With Rubabel, they are.
Object-orientation reduces lines of code. When considering an SD file, it seems reasonable to think about each entry in the file as a Molecule object. With Rubabel, you can do exactly that by iterating through each Molecule object in a file. Listing Object orientation shows how to open an SD file and print out each molecule whose weight is in the range (300,400) in just one line of code.
To assist in convenience and minimize syntax lookup time, the Ruby idiom for strings is engineered for frequent exposure in Rubabel. For example, the Molecule object can be implicitly treated as a string, allowing splitting and matching operations that are concise and intuitive. For example, the user can collect all atoms with two bonds (excluding implicit hydrogen atoms) and append a carbon to the first atom in the filtered collection (see Listing String idiom).
Additionally, Rubabel implements both split and append methods for the Bond object that mirror the same behavior defined in Ruby strings. Listing String idiom shows an example of splitting bonds. Lines 2-4 create a molecule, find each single bond that links a carbon atom to an oxygen atom, then splits those bonds. Line 5 appends a carbon and an oxygen atom to mol using atomic numbers with the append function. Line 6 does the same using the element name.
Because molecules are treated as lists of atoms, you can quickly and easily access and modify specific atoms in a molecule. Listing String idiom demonstrates adding an ethyl group atom-by-atom to the first carbon atom by indexing into the Molecule (line 3).
No other toolkits have equivalent functions to the string idiom in Rubabel.
Rubabel is designed to simplify common IO tasks to provide the shortest path to chemoinformatics functionality. Rubabel allows creation of Molecule objects from every format Open Babel accepts, including SMILES strings (see Listing Access methods). Note that Rubabel requires only one line where the SWIG code requires 4 (compare with Listing Ruby SWIG bindings).
Efficiency in accessing objects is very important to reducing LOC and increasing intuition. Listing Access methods gives some examples of object traversal in Rubabel, highlighting the amount of processing that can be done with very few lines of code in Rubabel. With very few lines of code and intuitive method names (select, find, reject), the user is able to conduct significant operations on newly created molecules. Lines 2-3 create a molecule then find the atom(s) that contain a double bond. Line 4 finds all the single- and double-bonded oxygen atoms in the molecule. Line 5 first finds all oxygen atoms, then removes from that list those that are bound to a carbon atom, yielding the peroxy oxygen.
Rubabel offers multiple novel methods that assist in building and modifying molecules and bonds. Several, including the bond order increment/decrement operator, split, and match functions were already highlighted. Additionally, the Molecule object defines adding and removing atoms, as well as a mass method that calculates the mass of the molecule taking into account the charge state.
Blocks are dynamic sections of code with open scope, sometimes several lines long, that allow injection of specific behavior into otherwise generic methods. This allows greater code reuse, concise code, and places custom logic next to the object it modifies instead of in a separate function or an external library. Consider Listing Blocks. By using find parameters in a block, Rubabel obtains a specific molecule in an SDF file in a more concise manner than RDKit, a Python chemoinformatics toolkit, which requires more control structure and logic (see Listing Blocks).
Additionally, blocks make for easier synonymous code—code that is different syntactically but equivalent functionally. This increases the likelihood that a non-expert user can ascertain the syntax of desired operations with minimal reference to documentation while allowing more experienced users the freedom to use coding styles they are familiar and comfortable with.
In addition to the two examples given here, Listings Object orientation and Access methods use blocks as well (lines 2 and 3-5, respectively). They are powerful tools not available in languages like C++ and Python.
We have provided explicit Molecule methods for common tasks such as Tanimoto coefficient calculation, substructure highlighting, and graph diameter measurement. In the likely event that users need custom extended behavior in Rubabel, they can take advantage of what are known as Ruby open classes. Objects in Ruby are more accessible to behavior modification than in some other languages. Writing custom behavior into Rubabel is analogous to using a plugin. Although Open Babel has a plugin mechanism which allows external code to be integrated into the toolkit, usage is not trivial. In contrast, Rubabel can be modified and accessed with ease using Ruby’s open classes. A class is open when it allows any external code to add or modify functionality in the local scope. For example, the authors are currently developing a plugin for Rubabel that defines fragmentation behavior for lipid molecules. They require the molecule behavior defined by Rubabel and also need to add descriptions of how lipids fragment in order to accomplish their task. With Rubabel, this is as simple as adding a few lines in a new Ruby file, as in Listing Custom behavior.
By using Rubabel, custom behavior can be defined and shared amongst lab groups and colleagues rapidly and easily.
Ruby extensions accessible to Rubabel
Possible application with Rubabel
Sinatra , a web application framework
Quick and easy webapp GUI for Open Babel, allowing multi-platform point and click chemoinformatics
Sciruby , a scientific library
Plotting, statistical tools, access to R programming language for Rubabel results
Rubyvis graphical library 
Open ended graphical software to make clean representations of numerical data
IRB, the interactive Ruby shell
Quick access to Rubabel and Open Babel from a terminal
Rspec, an automated code testing library 
Automated unit tests for software built with Rubabel (No Python equivalent due to Ruby’s block ability)
Ruby debugger 
Step into executed code with a live IRB session to ferret out bugs
Rubygems, a distribution tool 
Easily distribute and integrate applications written with Rubabel with a one-line install
As an example of the capabilities of these extensions, consider Sinatra. Using Sinatra, it is possible to give practitioners online access to Rubabel in very few lines of code. Applications can easily be developed to serve up both the native functionality of Rubabel as well as custom functionality developed as needed. To demonstrate the brevity of code required, consider the task of adding explicit hydrogen atoms to a molecule and printing the SVG image of the new molecule. Assuming that the user has a standard install of Ruby, which includes Rubygems and the prerequisites for Open Babel, the entire environment for Sinatra and Rubabel can be installed in one line (see Listing Building a Rubabel web app in Sinatra).
The functionality for the web app requires only five lines of code (see Listing Building a Rubabel web app in Sinatra). We place these in the file mol_h.rb.
The simplicity of this example readily extends to virtually all facets of Rubabel.
Though space will not permit an exhaustive consideration of all Ruby extensions that can be used in conjunction with Rubabel, the interactive Ruby shell (IRB) is of special import. As with languages like Python, Ruby’s interactive shell allows users a ready sandbox to run quick experiments, test syntax, or debug their scripts. IRB can be installed (provided the user has Rubygems) by typing gem install irb. Simply enter the IRB environment (irb at the terminal) and type require ‘rubabel’ and all of Rubabel’s functionality is accessible in an interactive terminal. This is particularly useful given the number of tasks that Rubabel can accomplish in just one line. Rubabel in IRB provides an interactive sandbox to experiment in realtime with instant feedback—a refreshing alternative to stringing together guess-and-check command line arguments. As mentioned before, this is also a fantastic and fast way to look up (via <object name>.methods) or check syntax.
To provide a quantitative analysis of Rubabel compared to existing tools, we compare the required number of lines of code (LOC) from the Chemistry Toolkit Rosetta Wiki tasks . The CTRwiki provides code snippets for 18 common chemoinformatics tasks for more than 17 toolkits in various programming languages. Since there are several toolkits with only one or two solutions we consider only open source solutions with at least 5 of the CTR tasks documented. We use the CTR tasks implemented as a quantitative measure of accessibility to functionality and not necessarily as an absolute measure of what is or is not implemented in a toolkit. There are many toolkits out there, with varying off-the-shelf capabilities and documentation. Since these toolkits are wrappers with access to the underlying libraries, it is possible, given enough time and code, to do anything in them that the underlying library does. However, the point of this comparison is to show off-the-shelf accessibility, scope, and verbosity. From a user perspective, having a listing of a core set of basic behaviors is useful not only from a comparative standpoint, but because of the appeal of off-the-shelf code to accomplish a standard task. That the CTR tasks are acceptable and demonstrative is suggested by the number of platforms that have adopted them, including Open Babel, which has provided a tutorial showing how to do each of them. Though a lack of a CTR entry for a given method does not indicate that the task is not possible, it means that the recipe for the task is not available in this convenient location and is thus harder to obtain than the tasks that are posted there. The CTR tasks also provide a convenient and established set of functionality to demonstrate a baseline set of features provided in Rubabel.
Rubabel has some features which users may find useful that are not available in Pybel. These include an explicit Bond object and the associated functionality, simpler atom interrogation, enumerable atoms and bonds, intuitively named wrapped output and input options (obviating the need to dig through Open Babel documentation to enumerate them), more Molecule object modifications (e.g. adding hydrogens at a specific pH), and simpler output (Rubabel infers the output format from the filename). Additionally, there are several extensions written in Ruby that do not yet have equivalents in Python (see Table 2).
Rubabel is open source software released under the liberal MIT license. The license and source code, as well as instructions on how to install, are found at https://github.com/princelab/rubabel. The project is available as a Ruby gem , which makes it easy to install. For those who already have Ruby, Rubygems, and Open Babel’s prerequisites installed, Rubabel and all requirements (including Open Babel) can be installed with one line: gem install rubabel. Rubabel can also be downloaded and built from source. The instructions for this are available on the github site mentioned above.
Chemists are not necessarily computer scientists. The more concise, clear, and accessible a toolkit is, the less time they spend learning syntax and the more time they spend solving chemistry problems. Ruby is designed to be intuitive, concise, and powerful. Rubabel wraps Open Babel in a way that is true to these qualities. Rubabel provides more intuitive object organization than Open Babel and provides extra functionality designed to streamline code writing by limiting both the time necessary to look up function syntax and the number of lines of code required. Rubabel also provides access to the many open source extensions available for Ruby. Rubabel’s concise and intuitive design makes common chemoinformatics tasks readily accessible from scripts, interactive shells, or custom applications in few lines of code and with less time spent learning APIs. Intentionally intuitive design, concise code idioms, and simplified common tasks make Rubabel appealing to Rubyists, non-programmers, and a segment of the users of other platforms.
Project name: Rubabel
Project home page: https://github.com/princelab/rubabel
Operating System(s): Platform independent
Programming language: Ruby
Other requirements: Open Babel’s Install Requirements, Rubygems
Any restrictions to use by non-academics: None
RS acknowledges the NSF (DGE-0750759) for financial support.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.