Representation and searching of biomolecules
© Joeseph L et al; licensee BioMed Central Ltd. 2010
Published: 04 May 2010
Biomolecules present challenges to chemical information systems designed for small molecules. Their sizes, up to tens of thousands of atoms, overwhelm representation, storage, and search solutions built on an explicit chemical representation of the structures. However, biomolecules are largely made up of many repeats of a limited number of building-block molecules, a fact that has been used to provide a compressed representation for biomolecules using templates for the building blocks.
We have adopted a modified template-based representation for biomolecules. Our primary interest is in the chemically modified portions of biomolecules, for which we choose to use explicit chemistry. These areas of explicit chemistry are then embedded in the template-compressed, unmodified portions of the full biomolecule.
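As a rough illustration of this hybrid idea, the following minimal sketch models a chain in which unmodified residues refer to a template and a modified residue is written out atom by atom. The class names, residue codes, and atom labels are assumptions made for the example and are not taken from the authors' file format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Template:
    code: str          # hypothetical short residue code, e.g. "GLY"
    heavy_atoms: int   # heavy atoms in the residue as it appears in a chain


@dataclass
class Residue:
    template: Optional[str] = None              # set for unmodified residues
    explicit_atoms: Optional[List[str]] = None  # set for modified residues

    def is_explicit(self) -> bool:
        return self.explicit_atoms is not None


def atom_counts(chain: List[Residue],
                templates: Dict[str, Template]) -> Tuple[int, int]:
    """Return (atoms in the fully expanded chain, atoms stored explicitly)."""
    expanded = 0
    explicit = 0
    for res in chain:
        if res.is_explicit():
            expanded += len(res.explicit_atoms)
            explicit += len(res.explicit_atoms)
        else:
            expanded += templates[res.template].heavy_atoms
    return expanded, explicit


TEMPLATES = {t.code: t for t in (Template("GLY", 4),
                                 Template("ALA", 5),
                                 Template("SER", 6))}

# A phosphoserine-like modified residue, written out atom by atom
# (illustrative atom labels only).
MODIFIED = Residue(explicit_atoms=["N", "CA", "C", "O", "CB",
                                   "OG", "P", "O1P", "O2P", "O3P"])

CHAIN = [Residue("GLY"), Residue("ALA"), MODIFIED,
         Residue("SER"), Residue("GLY")]

expanded, explicit = atom_counts(CHAIN, TEMPLATES)
print(f"{explicit} of {expanded} heavy atoms are stored explicitly")
```

In this toy chain only the modified residue carries explicit atoms; the other four residues are stored as template references.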
The regions containing explicit chemistry are indexed and can therefore be structure searched with good performance. A limited number of residues surrounding each explicit chemistry region are included in the index so that the context of the region can also be searched. By using explicit chemistry to represent modified regions, we can search across classes of modifications for common features. For example, a single substructure search query will find green fluorescent protein and its histidine, phenylalanine, and tryptophan analogs.
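The selection of residues for the index can be sketched as follows. This is only one plausible reading of "a limited number of residues surrounding explicit chemistry regions": the function name, the `context` parameter, and the choice to merge overlapping or touching windows are assumptions, not the authors' stated method.

```python
from typing import List, Tuple


def indexed_regions(is_explicit: List[bool],
                    context: int) -> List[Tuple[int, int]]:
    """Return merged half-open (start, end) residue ranges to index.

    Each explicit residue contributes itself plus `context` residues on
    either side; windows that overlap or touch are merged into one range.
    """
    regions: List[Tuple[int, int]] = []
    for i, flag in enumerate(is_explicit):
        if not flag:
            continue
        start = max(0, i - context)
        end = min(len(is_explicit), i + context + 1)
        if regions and start <= regions[-1][1]:
            # Extend the previous range instead of starting a new one.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


# Twelve residues, with explicit chemistry at positions 3, 8 and 9.
flags = [False, False, False, True, False, False,
         False, False, True, True, False, False]
print(indexed_regions(flags, context=1))
```

Residues outside the returned ranges would stay in their template-compressed form and would not be part of the structure index.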
Templates are stored with the structure, providing a self-contained file format. The use of NEMA keys allows templates from different structures to be compared, and allows storage of structures containing a canonical list of templates. The residues have defined attachment points, which allow automated traversal of a protein backbone and location of non-backbone bonds to residues.
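To illustrate how defined attachment points make such traversal mechanical, here is a minimal sketch. The attachment point names ("N", "C", "SG"), the `Bond` record, and both helper functions are hypothetical; the abstract does not specify how attachment points are named or stored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Bond:
    res_a: int     # residue id on one side of the bond
    point_a: str   # attachment point used on residue a
    res_b: int     # residue id on the other side
    point_b: str   # attachment point used on residue b


def backbone_order(bonds: List[Bond], start: int) -> List[int]:
    """Follow C -> N attachment bonds from `start`; return residue ids in order."""
    next_res = {b.res_a: b.res_b for b in bonds
                if b.point_a == "C" and b.point_b == "N"}
    order = [start]
    seen = {start}
    while order[-1] in next_res:
        nxt = next_res[order[-1]]
        if nxt in seen:   # cyclic backbone: stop instead of looping forever
            break
        order.append(nxt)
        seen.add(nxt)
    return order


def non_backbone_bonds(bonds: List[Bond]) -> List[Bond]:
    """Return bonds that do not join a C attachment point to an N attachment point."""
    return [b for b in bonds if (b.point_a, b.point_b) != ("C", "N")]


# Four residues joined head to tail, plus one side-chain link (1-SG to 4-SG).
BONDS = [
    Bond(1, "C", 2, "N"),
    Bond(2, "C", 3, "N"),
    Bond(3, "C", 4, "N"),
    Bond(1, "SG", 4, "SG"),
]

print(backbone_order(BONDS, start=1))
print(non_backbone_bonds(BONDS))
```

Because every inter-residue bond names the attachment points it uses, the backbone walk and the search for side-chain links need no atom-level inspection of the residues themselves.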
We will present example structures and structural queries that highlight the capabilities of our representation.
This article is published under license to BioMed Central Ltd.