The past, present and future of Scientific discourse
© Rzepa; licensee Chemistry Central Ltd. 2011
Received: 5 April 2011
Accepted: 14 October 2011
Published: 14 October 2011
Skip to main content
© Rzepa; licensee Chemistry Central Ltd. 2011
Received: 5 April 2011
Accepted: 14 October 2011
Published: 14 October 2011
The science journal is 346 years old in 2011, having evolved continuously but largely incrementally over that period. Its reinvention for an online presence has largely preserved its previously printed nature, in the sense that much of the increased functionality which is potentially offered by this new medium has yet to be exploited. In the present article an attempt is made to discuss two previously published papers, one in 1953 and the other in 2010, and to illustrate how additional functionality can be implemented in the form of accessible data sourced from quantum mechanical calculation and how subsequent discourse in the form of blogs may add to the process. In this sense, the reader of this article is invited to try for themselves whether these enhancements improve their scientific understanding, and whether such enhanced journals are good models for the future evolution of the genre.
The first journal devoted exclusively to science is generally accepted to have first appeared in 1665 as the Philosophical Transactions (of the Royal Society). The inaugural issue , which famously carries an account by Robert Boyle of "a very odd monstrous calf", is perhaps not science as we know it nowadays, but it does remind one rather of what one might find in a personal blog, a similarity I will return to later. The structure of the scientific journal and the articles published by this means evolved constantly and mostly incrementally during the next 330 years. One of the more significant, but nevertheless still incremental changes was adding an online presence during the late 1990s. Indeed, some journals founded in the late 1990s offered only an online version  and there are signs that some older journals may be preparing to abandon the (relatively expensive) printed form. Access to the online versions in 2011 is predominantly via a format which is perhaps best described as digital paper (PDF), although most journals also offer the articles in an alternative hypertext (HTML) format. There is no journal yet that has adopted a format increasingly used for books, the epub/epub3 standard . On the horizon is also the fifth major evolution of hypertext markup language known as HTML5 [4, 5], which strives to offer a richer interactive medium to the reader. Certainly machines (e.g. such as those working on behalf of search engines such as Google) now also automatically process the articles published in the modern journal, indexing the full-textual content, adding rich metadata on the topics therein described, and noting (but largely incapable of truly indexing the context of) the images and figures. Software can also usefully replace the old processes of binding the printed journal and the storage of volumes on shelves in a library or an office, a process delightfully described (by biologists, not chemists) as defrosting the digital library. These processes largely address only the bibliographic issues (via rich metadata harvesting) rather than attempting to defrost the scientific or chemical content itself. It is the issues involved in defrosting the latter type of information and data that the present article addresses.
Data has always been the "elephant in the room" of scientific publishing. Because the costs of printing and distributing paper are still significant to this day, print was never really been considered a viable mechanism for distributing the (often very large amounts of) data, whether raw, or partially processed, on which almost all scientific models, theories and their interpretations are based. Instead, starting in the early 1990s and coincident with the first introduction of the Internet, many science journals offered an annex to the main journal in the form of supporting or supplemental information. This was provided in final form by the authors themselves, and the journal itself added little extra value such as indexing to this (often purely visual) content. It was very much up to an interested reader to add their own value to any (visual, textual or numerical) supporting data that might be associated with an article.
The long view over a 350 year period is that these evolutions of the journal could be regarded as largely relating to the production and delivery processes of journals, and arguably have not been matched by similar advances in how scientists consume or use journals. In this essay, I will analyze two chemical articles, published in respectively 1953 and 2010, from the point of view of how the original journal presented the scientific discourse, what the limitations of that presentation might have been, and the prospects of how it could evolve into a step-change rather than incremental change in that discourse.
Armed only with the one diagram actually published, curiosity might lead one to pose a scientific question such as "How did Watson and Crick assign the helix as right rather than left handed"? In other words, on what data did they base that conclusion? This does matter! For example, some 733 articles have appeared in the science literature over the last 20 years or so where DNA is represented as having left-handed helicity, in most cases certainly erroneously . Coincidentally, similar issues of left or right-handedness were to be found when Pauling presented his α-helix models of proteins. In fact, almost all protein helices exhibit right-handedness . A partial answer to that question is actually given in what is called the full version  of the preliminary article  (published as it happens by the Royal Society). Here we are told the following:
that both chains follow right handed helices ...
because left handed helices can only be constructed by violating permissible van der Waals contacts.
We are informed that such permissible contacts include the approach of any two hydrogen atoms in the molecule to a distance of no less than 2.1Å.
We are not however informed what the violations might be in a left handed helix that excludes this model. In other words, just how close can two hydrogen atoms separated (for intramolecular contacts) by at least four bonds approach? In fact, distances of ~1.85Å or less have been observed .
In this same full article by Watson and Crick , we are given a table of numerical (polar) coordinates describing the positions of twelve key atoms, but it would have taken a very determined scientist to have used only this combination of information to easily confirm the assertion that a left-handed helix is excluded. Perhaps the lack of a model with which the reader could experiment might account for the relatively slow recognition of the importance of this article in the immediate years following its publication, and the observation that whilst a physical model of DNA had of course been built, it was only available for viewing (but not modifying) by visiting Cambridge!
Relative thermodynamic energies (kcal mol-1)a
Total energy (duplex)
ΔG298 single chain
This is a model of a DNA duplex tetramer, built using only the bases CGCG or ATAT in this example, with inclusion of three phosphate groups and calculated for both the left- and right-handed helical form. The first of these was the one deprecated by Watson and Crick on the basis that the model violates permissible van der Waals contacts. The geometry is optimized to high convergence using a recent density functional formalism (ωB97XD)  which incorporates a correction for the attractive dispersion component of the van der Waals interactions. Justification for the use of this functional in describing hydrogen bonding has recently been published . A 6-311G(d,p) basis set results in the wavefunction being described by up to 3468 basis functions, close to the practical limit using standard computing resources available in 2010. The ionic nature of the system, deriving from the phosphate groups, was treated using a self-consistent-reaction-field continuum solvent (water) [15, 16] as implemented in the Gaussian09 package, revisions A.02 and B.01. 12. In such a model, duplex formation by combining two tri-anionic chains is nevertheless exothermic in the computed free energy (Table 1), which suggests the model is not physically unrealistic. Although in principle a full ion-pair resulting from inclusion of a solvated positive counterion (typically Na+ or NH4+ with additional water molecules) could also be treated using this method , the resulting model is too large and complex for the current available computational resources.
The reader is invited to load the Z-d(CGCG)2 coordinates in Figure 3. The lengths of the van der Waals H...H attractions and the nucleophilic O...C attractions from the ribose ether oxygen to the electrophilic carbons on the guanine base have been enumerated. Because of the complex 3D nature of this molecule, they can only be truly perceived if the structure itself can be viewed from any desired angle (something clearly not possible in a conventional journal diagram). It also allows successive layers to be viewed, each perhaps concentrating on a particular aspect, without destroying or overwhelming the initial simple elegance of the overall concept (of a double helix). The identified ~2.8Å O...C interactions to the guanine are unique to the Z or left handed helical form, and since the sum of the vdW radii  of these two contact atoms is ~3.22Å, these are presumed to be (electrostatically) attractive. The alternative B-d(CGCG) 2 stereoisomer reveals these O...C contacts are absent, being instead replaced by hydrogen bonds (~1.9-2.1Å) between the ribose ether-oxygen and the NH2 hydrogens of the guanine. The sum of the O and H vdW radii is ~2.64Å, which suggests these are significantly attractive hydrogen bonds. There are additional C-H...O-P hydrogen bonded contacts of ~2.1-2.2Å. A similar divergence of attractive interactions emerges for chains built of AT bases. The Z-d(ATAT)2 duplex has only one ~2.8Å O...C interaction to the adenine, with three others having rather longer lengths (~3.0-3.1Å). The B-d(ATAT)2 duplex instead displays C-H...O contacts of ~2.4-2.5Å.
Z-d(CGCG)2 is stabilized (inter alia) by a short contact between a carbon on the guanine and the ribose ether oxygen, of which there are four per four base pairs.
These contacts are replaced in B-d(CGCG)2 by NH hydrogen bonded contacts to the ribose ether oxygen.
In Z-d(ATAT)2, the O-contacts to the adenine are much longer, which
in B-d(ATAT)2, are replaced by short CH...O contacts.
By embedding access to accurate coordinate data within Figure 3, the reader can select whatever level of detail they desire from the diagram. Part of the origins of the relative stability the Z- and B- helical forms is not simply due to the presence (or in this case absence) of "violation of permissible van der Waals contacts", but also to several types of less common but nevertheless attractive interactions which may not have been inferred by building physical models alone. Such additional insights may in turn impact upon e. g. modeling one remarkable property of the DNA polymer, its ability to be stretched to almost twice its normal length without breaking .
It is of course the accumulation of these effects that determines the overall stability of the structure (Table 1). The thermodynamic quantities are computed with inclusion of thermal energies, obtained by solving the appropriate partition functions using calculated vibrational frequencies. Since these require second derivatives of energy with respect to coordinates, a smaller basis set 6-31G(d) was used for the purpose (a calculation time of ~4 days on a 12-core processor is typical). The dispersion corrections were obtained at the slightly higher 6-311G(d,p) basis set level to allow interactions to H to be modelled more realistically.
These energies reveal some surprises. Firstly, the free energy for forming the duplex from the separated chains is significantly exothermic, despite the electrostatic repulsions resulting from each chain carrying a 3- charge. For the resulting helix, the B-d(CGCG)2 form is 11.9 kcal/mol less stable in terms of total free energy than the Z-isomer, but is 4.2 kcal/mol more stable for the dispersion/van der Waals term, the criterion suggested by Watson and Crick as generally discriminating against the Z-form (although without a specification of the base type used for the model). The greater stability of the Z-form arises from a contribution of 3.9 from the entropy and 8.0 kcal/mol from the (zero-point energy corrected) enthalpy, which dominates the less favourable dispersion term.
The formation of a B-d(ATAT)2 duplex is less exothermic than that of the CG duplex. It is now favoured by 5.2 kcal/mol over the Z-isomer in terms of free energy and by 12.8 kcal/mol in terms of dispersion contributions. The assertion often made  that the Z-helix is favoured by CG rich oligomers and the B-helix by AT-rich forms is thus confirmed by these calculations.
Armed with optimized coordinates which include the weaker interactions between atoms one can annotate the basic models revealed in Figure 3 with other (computed) properties. For example the optical rotation [α]589 has the value +62° for Z-d(CGCG)2 and -137° for the B-diastereomer, perhaps surprisingly small values for such an apparently asymmetric molecule. Also surprisingly, the corresponding experimental measurement does not appear to have been reported. Optical rotations are known to be rather fragile, being sensitive to small variations in conformation and solvation, but the electronic circular dichroism spectrum is regarded as rather more robust. Unlike other forms of spectroscopy such as NMR or IR, which can be used to infer structure from simple rules based on the functional groups present (in other words, local properties), these chiro-optical properties tend to be more characteristic of the global features of the molecule. As a result, they can be very difficult to interpret without a reasonably accurate model based on these global properties. A quantum mechanical computation of the molecular wavefunction is one such model, and it is now increasingly routinely used to help interpret optical rotations, electronic (and vibrational) circular dichroism spectra. Theory can now handle molecules containing ~250-254 atoms, such as the DNA tetramers modelled here. Annotation of the models with the calculated ECD spectra is included here in the hope they might prove useful for the interpretation of the experimental spectra.
This example has illustrated how access to accurate data can help provide additional insights into the factors controlling the stability of molecular structures. In this case, the factors controlling the helical stability of DNA duplexes can be teased out. By incorporating these models directly into the journal article (and providing links to digital repositories where a more complete dataset can be acquired if needed) the readers of the journal have an opportunity to discover their own insights within their own spheres of interest.
Much of the scientific insight was carried in the form of four colour figures, all presented conventionally as single layered graphics with the viewpoint selected by the authors. Information on acquisition of the data on which these figures were based was given as citation 28 in that article, which lists deposition numbers (CCDC 764864-764868) and a URL that would enable a CIF file for each entry to be downloaded. Whilst this retrieval process is not entirely automatic, it does take only a few minutes to acquire the data. The usefulness of the file is of course predicated on the reader also having access to appropriate software for analysis of a file in this format. It is also important to note that a CIF file allows inspection only of the refined crystallographic model presented in the article and the statistics associated with that model; it does not allow the user access to the underlying (hkl) diffraction data which would allow other models to be refined and assessed.
The reaction scheme reported  for photochemical generation of trapped 1,3-dimethylcyclobutadiene is shown below in Figure 5. It differs from the original in showing a thermally activated reaction arrow connecting the 1,3-dimethylcyclobutadiene 4 to 2. This last possibility is not explicitly discussed in the original report , although there is there an implicit assumption that this process is slow at the temperature of the experiment, 175K. The original article therefore seeks to persuade the reader on the basis of crystallographic evidence that the structure of 3 or 4 has been established, with the aid of the four colour figures included in the article, and (optionally for the reader) with acquisition of the CIF files.
The theme of the present article is to ask how a reader's experience and perception of a scientific article might be enhanced or simply altered by adopting new forms of presentation. In Figure 5, the relevant CIF file can be loaded as a second layer into the reaction diagram. Because of the relatively large number of host atoms involved, the effect can be somewhat overwhelming when this is done and the interpretation may also be made more complex by the presence of disorder in the guest. A further layer of interpretation can be added by annotating the diagram with selected atom-atom distances; the reader can use the display software to added further such annotations of their own if they wish. There are many other actions the reader can perform at this point . A further, this time smaller, alternative layer that contains only the kernel of the scientific problem (as perceived by the present author, which may or may not correspond to the perception of the original  authors) has been added here, and again four key measurement annotations made, together with selected bonds highlighted in a different colour.
What are the kinetics of the reverse reaction of 4 to give 2 at 175K?
Does the crystallographic evidence convince that the guest is best described as 1,3-dimethylcyclobutadiene in close proximity to a detached molecule of carbon dioxide?
More specifically, how should the interaction between the labelled atoms C2 and C3 be interpreted? Should it be considered a strong van der Waals contact, as suggested by the original authors  or as a covalent bond? The same question might apply to another atom pair, O1 and C6 also connecting carbon dioxide and the cyclobutadiene.
Likewise, how should the angles O1-C2-O7 or C2-C3-H10 be interpreted?
The reader may note a common theme emerging between these questions and the origins of helical stability in DNA as discussed above.
The first of these questions was in fact posed in the form of a blog, written by the present author  and based on chemical precedent and entropic arguments. It was posted in August 2010, little more than a month after the original report was first published. The precedent for this form of discourse when addressing a scientific issue had already been established [27, 28]. Questions 2-4 emerged more conventionally and a little later in November 2010 in the same journal as the original article, and took the form of comments submitted by two independent groups [29, 30]. The original authors have a right of reply to such comments, which they took . These various participants in the debate all had access to the same CIF data as is transcluded into Figure 5. The debate to this point was summarized in a second blog post , and this and the original post themselves attracted ~15 responses in the form of appended comments. These posed further questions, on themes such as the computed structure of 1,3-dimethylcyclobutadiene, a debate on how much energy was required for angular distortion of O1=C2=O7 as an isolated molecule, and whether molecule 2 is transparent to light in the 320 to 500nm excitation range employed by the original experiments. This latter point was followed up by calculations of the UV-visible absorption spectrum of 2 inside the host cavity, also appended to the blog, and finally by calculations of the predicted vibrational spectra. The next stage in the discourse occurred in a conventional journal , taking the form of a set of calculations on the likely barrier preventing 4 and carbon dioxide from recombining inside the host cavity, and addressing question 1 above in more complete detail. This article did have one less conventional aspect; in the "rich HTML" version, an interactive version of the table of data was made available  in very much the manner adopted for Figures 3 and 5 in the present article. Additionally, there were links in this table to digital repository entries , which would enable any interested reader to access the complete archived details of all the calculations reported in that article.
Shortly after this last article was published, in December 2010, several publishers chose to highlight this emerging debate with editorial blog posts of their own [35–38]. These posts in turn attracted further comments, including several by one of the original authors. One comment in particular  entitled "Request of calculated structure data" highlighted an important aspect concerning the accessibility of previously reported data . This alludes to the "rich HTML" table, and the observation that it is important to provide information on the file formats in which data is held, so that appropriate conversions if needed and concomitant visualization can be performed. This particular query was answered in the form of another blog post , and applies directly to the issue of how to re-use data associated with the current article (Figures 3 and 5).
The scientific discourse described above regarding the nature of the species in a host crystal lattice is still ongoing, and so a final consensus (if ever achieved) cannot be reported at the time of writing. It is noteworthy that the primary (hkl) crystallographic data relating to the original measurements has been provided upon request  and so further analysis of alternative crystallographic refinement models is now possible.
The two scientific examples discussed in this article span 57 years, a relatively short period in the history of the scientific journal. The first is arguably the most influential scientific article of the 20th century, and clearly the absence of data associated with it has not held back its recognition as such. What is also clear is that addition of such data, albeit 57 years after the original report, may have the potential to reveal further insights into the structure of DNA that may not have hitherto been highlighted. Whether such a data-rich reformulation of the original problem has any measure of impact remains to be established. The second article is only months old, but in that brief period has been subjected to the kind of scrutiny that can only be achieved by having access to rich data sets. One might fairly conclude that the scientific article has evolved to enable that scrutiny. The article that you are now reading I suggest is one model for how such scientific discourse can be both improved and accelerated. It remains to be seen if scientists are prepared to author such articles in the future. There is an early example  of an article where both the discourse and the data supporting that discussion were seamlessly integrated into one (XML-based) document, with the presentation being made available to the reader by application of suitable stylesheet-based transformations. The production of an article in this form was however non trivial. Since then tools have appeared to facilitate the process [42, 43] and the task now much be to reach both the hearts and the minds of scientific authors to encourage them to start adopting this form of enhanced scientific discourse.
Derived from a presentation given for a symposium celebrating the career of Peter Murray-Rust, in Cambridge UK on January 17, 2011.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.