Looking over the rim: algorithms for cheminformatics from computer scientists
© Meinl et al; licensee Chemistry Central Ltd. 2014
Published: 11 March 2014
In recent years, a number of methods have been developed in the data mining and machine learning field that have received little attention in the cheminformatics world, even though they offer interesting properties for these applications, in some cases even compared to similar algorithms published primarily in the cheminformatics literature. In this talk we highlight three of these algorithms and approaches. The first is MoSS [1], a frequent subgraph miner that can not only find common substructures in a set of molecules but can also compute the maximum common substructure (MCSS) very quickly, and it offers several extensions specifically suited to molecules. The second approach addresses the problem of finding diverse subsets of molecules [2]. Interestingly, not only can finding a diverse subset be challenging, but even the definition of diversity is less straightforward than it seems at first glance. The third approach follows similar lines but tries to find similar molecules by looking at their properties from so-called parallel universes [3]. Each universe contains a set of related properties, and partial predictive models are built separately in each universe. Through interactive model construction, e.g. using so-called Neighborgrams, the models from one universe can aid the construction of models in other universes.
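To illustrate why the definition of diversity matters, here is a minimal sketch that compares two textbook measures from the dispersion-problem literature: the sum of all pairwise distances (max-sum) and the smallest pairwise distance (max-min). This is not the authors' maximum-score diversity selection method; the six descriptor values, the absolute-difference distance, and all function names are hypothetical and chosen only so the two measures visibly disagree. In practice one would plug in a molecular distance such as a Tanimoto distance on fingerprints.

```python
from itertools import combinations


def abs_diff(a, b):
    """Hypothetical distance: absolute difference of 1-D descriptor values."""
    return abs(a - b)


def pairwise(subset, dist):
    """All pairwise distances within a subset."""
    return [dist(a, b) for a, b in combinations(subset, 2)]


def sum_diversity(subset, dist):
    """Max-sum diversity: total of all pairwise distances."""
    return sum(pairwise(subset, dist))


def min_diversity(subset, dist):
    """Max-min diversity: the smallest pairwise distance in the subset."""
    return min(pairwise(subset, dist))


def best_subset(items, k, score):
    """Exhaustive search over all k-subsets; only feasible for tiny inputs."""
    return max(combinations(items, k), key=score)


# Hypothetical 1-D descriptor values for six molecules (illustrative only).
values = [0.0, 1.0, 4.0, 6.0, 9.0, 10.0]

by_sum = best_subset(values, 4, lambda s: sum_diversity(s, abs_diff))
# Ties in the max-min criterion are broken by the max-sum criterion.
by_min = best_subset(
    values, 4, lambda s: (min_diversity(s, abs_diff), sum_diversity(s, abs_diff))
)

print(by_sum, sum_diversity(by_sum, abs_diff), min_diversity(by_sum, abs_diff))
# (0.0, 1.0, 9.0, 10.0) 38.0 1.0
print(by_min, sum_diversity(by_min, abs_diff), min_diversity(by_min, abs_diff))
# (0.0, 4.0, 6.0, 10.0) 32.0 2.0
```

The max-sum criterion picks two tight pairs at the extremes, while the max-min criterion prefers an evenly spread subset. The brute-force search is for illustration only: both selection problems are NP-hard in general, which is why heuristic search is used on realistic compound libraries.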
- Borgelt C: Canonical Forms for Frequent Graph Mining. Proc. 30th Annual Conf. of the German Classification Society (GfKl 2006, Berlin, Germany). 2006, Springer-Verlag, Heidelberg, Germany, 337-349.
- Meinl T, Ostermann C, Berthold MR: Maximum-Score Diversity Selection for Early Drug Discovery. J Chem Inf Model. 2011, 51: 237-247. doi:10.1021/ci100426r.
- Wiswedel B, Berthold MR: Supervised Learning in Parallel Universes using Neighborgrams. Proceedings of the 10th International Symposium on Intelligent Data Analysis, Series Lecture Notes in Computer Science (LNCS). 2011, Springer-Verlag, Heidelberg, Germany, 7014: 388-400.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.