Systematic extraction of structure-activity relationship information from biological screening data
© Mathias et al; licensee BioMed Central Ltd. 2010
Published: 04 May 2010
The analysis of high-throughput screening data poses significant challenges to medicinal and computational chemists. The number of compounds assayed in a single screen is prohibitively large for manual data analysis, and no generally applicable computational method has yet been developed that consistently solves the problem of how best to select hits for further chemical exploration.
Focusing on the question of how structure-activity relationship (SAR) information can be used to support this decision, we present methods for the descriptive analysis of screening data. Network representations visualize the distribution of 2D similarity relationships and potency in a data set and give an overview of global and local features of an activity landscape. Although screening data sets are dominated by many weakly active hits, different local SAR environments can be identified among the hits, which helps to focus on regions in chemical space that might show favorable SAR behavior in further exploration.
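The network representation can be illustrated with a minimal sketch. It assumes that each compound is described by a set of 2D fingerprint features and that an edge joins two compounds whose Tanimoto similarity reaches a chosen threshold; the function names, the toy fingerprints, and the threshold value are illustrative assumptions, not the settings used in this work.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_network(fingerprints, threshold):
    """Return adjacency sets; an edge joins compounds with similarity >= threshold."""
    graph = {name: set() for name in fingerprints}
    for u, v in combinations(fingerprints, 2):
        if tanimoto(fingerprints[u], fingerprints[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Hypothetical toy data: feature sets stand in for real 2D fingerprints.
fingerprints = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {1, 2, 3, 4, 5},
    "D": {7, 8, 9},
}
network = build_network(fingerprints, threshold=0.5)  # threshold is an assumption
```

In this toy example compounds A, B, and C form one connected cluster, while D remains a singleton. Annotating the nodes with potency values would then expose the local SAR environments described above.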
A more detailed analysis of the data is achieved by systematically mining the network for SAR pathways, i.e. sequences of pairwise similar compounds that connect two molecules along a gradual increase in potency. The SAR pathways are calculated exhaustively for all possible compound pairs in a data set to identify those with the most significant SAR information content. Often, high-scoring pathways lead to activity cliffs, i.e. pairs of similar compounds with significant differences in potency, and scaffold transitions can be observed along the pathways.
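The pathway definition can be sketched as an exhaustive depth-first search over a similarity network. The sketch assumes that a pathway only follows edges between similar compounds and that potency must strictly increase at every step; the graph, the potency values, and the function name are hypothetical, and the scoring used to rank pathways is not reproduced here.

```python
def sar_pathways(graph, potency, start):
    """Enumerate all paths from `start` along which potency strictly increases."""
    paths = []

    def extend(path):
        last = path[-1]
        for nxt in sorted(graph[last]):
            # Strictly increasing potency also guarantees no compound is revisited.
            if potency[nxt] > potency[last]:
                new_path = path + [nxt]
                paths.append(new_path)
                extend(new_path)

    extend([start])
    return paths


# Hypothetical similarity network (adjacency sets) and potency values (e.g. pIC50).
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
potency = {"A": 5.0, "B": 5.5, "C": 6.2, "D": 8.0}

# Exhaustive enumeration: pathways starting from every compound in the data set.
all_paths = {start: sar_pathways(graph, potency, start) for start in graph}
```

Here the pathways from A are A-B, A-B-C, A-B-C-D, A-C, and A-C-D. The final step from C to D joins two similar compounds with a large potency difference, which is the kind of activity cliff that high-scoring pathways often reach.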
Furthermore, a tree structure organizes alternative pathways that begin at the same compound but lead to different molecules and chemotypes. Similarly, SAR trees can be generated from all pathways that lead to an activity cliff in order to characterize the surrounding SAR microenvironment.
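One simple way to organize alternative pathways into a tree is a prefix tree in which pathways that share their first compounds share a branch. The sketch below assumes pathways are given as lists of compound identifiers; the nested-dictionary layout and the function name are illustrative choices rather than the data structure used in this work.

```python
def build_sar_tree(paths):
    """Merge pathways that share a common prefix into a nested-dict tree."""
    tree = {}
    for path in paths:
        node = tree
        for compound in path:
            # Reuse the existing branch if present, otherwise open a new one.
            node = node.setdefault(compound, {})
    return tree


# Hypothetical pathways that all begin at compound "A".
paths = [
    ["A", "B"],
    ["A", "B", "C"],
    ["A", "C"],
    ["A", "C", "D"],
]
tree = build_sar_tree(paths)
```

For a tree centered on an activity cliff, the same function could be applied after reversing each pathway so that the cliff compound becomes the shared root.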