Semantics vs. statistics in chemical markup
© Batchelor; licensee BioMed Central Ltd. 2012
Published: 1 May 2012
Since the late 1990s, natural language processing (NLP) has seen a massive shift from high-precision, low-recall systems based on small sets of hand-written rules to methods based on the statistical analysis of large corpora. The field of chemoinformatics is likewise dominated by statistical and machine-learning approaches. In recent years, however, pharmaceutical companies have engaged increasingly with Semantic Web technologies, which are largely built around the sorts of hand-written systems that NLP has moved away from this century. We discuss where our current text analysis and Semantic Web efforts at the Royal Society of Chemistry are headed and how we are making use of the "unreasonable effectiveness of data".
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.