Paper Cuts: 2007

Tuesday, 5 June 2007

A Forensic Study of Fingerprints

I really like the recent work by S. Joshua Swamidass and Pierre Baldi on comparing binary fingerprints [1], [2].

I'll focus on ref [1] here. In this study, Swamidass and Baldi show that simply comparing the number of bits set in two fingerprints can quickly rule out compounds whose Tanimoto coefficient with the query cannot exceed a given threshold. To be exact, the maximum possible Tanimoto similarity between two fingerprints is min(A,B)/max(A,B), where A and B are the numbers of bits set in each fingerprint.

In other words, if you are looking for all compounds in a library that have a Tanimoto coefficient with a target compound of greater than 0.7, you probably don't have to consider the majority of the compounds simply on the basis of the number of bits they have set. This simple rule can make this procedure much more efficient.
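To make the bit-count rule concrete, here is a minimal Python sketch of how such a screen might look. It is not the authors' implementation: the function names, the use of plain Python integers as fingerprints, and the toy 8-bit library are all assumptions made for illustration.

```python
from typing import List, Tuple


def popcount(fp: int) -> int:
    """Number of bits set in a fingerprint stored as a Python int."""
    return bin(fp).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: bits in common divided by bits set in either."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # assumed convention for two empty fingerprints
    return popcount(a & b) / union


def upper_bound(count_a: int, count_b: int) -> float:
    """Largest Tanimoto possible given only the bit counts: min(A,B)/max(A,B)."""
    largest = max(count_a, count_b)
    if largest == 0:
        return 1.0
    return min(count_a, count_b) / largest


def search(query: int, library: List[int], threshold: float) -> Tuple[List[int], int]:
    """Return indices of compounds with Tanimoto > threshold, plus the
    number of compounds rejected on bit counts alone."""
    query_count = popcount(query)
    hits = []
    skipped = 0
    for index, fp in enumerate(library):
        # If even the best case cannot beat the threshold, skip the full comparison.
        if upper_bound(query_count, popcount(fp)) <= threshold:
            skipped += 1
            continue
        if tanimoto(query, fp) > threshold:
            hits.append(index)
    return hits, skipped


# Toy 8-bit fingerprints (hypothetical data).
query = 0b11110000
library = [0b11110000, 0b11100000, 0b00000001, 0b11111111, 0b11110001, 0b00001111]
hits, skipped = search(query, library, 0.7)
print(hits, skipped)
```

Note that the bound only rules compounds out. The last fingerprint in the toy library has the same number of bits set as the query, so it passes the screen, but it shares no bits with the query and fails the full comparison.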

But the authors don't stop there. They set out to completely nail this question, so they also look at other similarity measures, and at the bounds on similarity when using a multiple-molecule query. And if that weren't enough, they then lay on the maths and simulations, and go after equations for the resulting speedup for fingerprints of different lengths and different similarity thresholds.

This paper has got it all. It's comprehensive, thorough, and has a really useful take home message that all cheminformaticians who deal with fingerprints should bear in mind.

References:

[1] Bounds and Algorithms for Fast Exact Searches of Chemical Fingerprints in Linear and Sublinear Time
S. Joshua Swamidass and Pierre Baldi
J. Chem. Inf. Model., 47 (2), 302-317, 2007.

[2] Mathematical Correction for Fingerprint Similarity Measures to Improve Chemical Retrieval
S. Joshua Swamidass and Pierre Baldi
J. Chem. Inf. Model., 47 (3), 952-964, 2007.

Tuesday, 24 April 2007

How similar are two reaction mechanisms?

Using Reaction Mechanism to Measure Enzyme Similarity. Noel M. O'Boyle, Gemma L. Holliday, Daniel E. Almonacid and John B.O. Mitchell. Journal of Molecular Biology, 2007, 368, 1484-1499.

Since most cheminformaticians do not regularly read JMB, I thought it might be useful to write a word or two about a recent publication of mine, which makes the following claims:

it presents the first method to measure the similarity of two explicit reaction mechanisms
it shows that this is a useful thing to do, especially in the context of biological reactions

The method used can be applied to any comparison of two sequences of sets. A pairwise similarity matrix of mechanism steps is calculated using Tanimoto coefficients or Euclidean distance (see paper for details), and the mechanisms are then aligned and scored using the Needleman-Wunsch algorithm.
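As a rough illustration of the idea, the following Python sketch scores two mechanisms by aligning their steps with Needleman-Wunsch, using the Tanimoto coefficient between steps as the match score. It is a sketch under stated assumptions, not the code from the paper: representing each step as a set of string labels, the labels themselves, the zero gap penalty, and the function names are all hypothetical, and the paper's own step representation and scoring details differ.

```python
from typing import FrozenSet, List, Sequence

Step = FrozenSet[str]  # hypothetical: a step is a set of descriptive labels


def step_similarity(a: Step, b: Step) -> float:
    """Tanimoto coefficient between two steps treated as sets."""
    union = len(a | b)
    if union == 0:
        return 1.0  # assumed convention for two empty steps
    return len(a & b) / union


def alignment_score(mech_a: Sequence[Step], mech_b: Sequence[Step],
                    gap: float = 0.0) -> float:
    """Needleman-Wunsch global alignment score of two mechanisms.

    The gap penalty (added once per unaligned step) is an assumed parameter.
    """
    n, m = len(mech_a), len(mech_b)
    score: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + step_similarity(mech_a[i - 1], mech_b[j - 1])
            skip_a = score[i - 1][j] + gap  # step i of A is left unaligned
            skip_b = score[i][j - 1] + gap  # step j of B is left unaligned
            score[i][j] = max(match, skip_a, skip_b)
    return score[n][m]


# Two toy mechanisms (hypothetical labels, not MACiE's vocabulary).
mech_1 = [frozenset({"proton transfer"}),
          frozenset({"nucleophilic attack", "bond formation"}),
          frozenset({"elimination"})]
mech_2 = [frozenset({"proton transfer"}),
          frozenset({"elimination"})]
print(alignment_score(mech_1, mech_2))
```

In this toy case the first and last steps of the longer mechanism align with the two steps of the shorter one, and the middle step is left facing a gap.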

The mechanisms of biological reactions are taken from the MACiE database [1] of enzyme reaction mechanisms which has been developed over several years by the Mitchell Group, in collaboration with Prof. Janet Thornton (EBI, UK) and Prof. Peter Murray-Rust (Uni. of Cambridge, UK). Reaction mechanism data is taken directly from the literature, mainly from experimental work although in some cases from theoretical studies. GLH and DEA have put in a lot of hours entering this data, standardising it, developing a dictionary of terms, and so on.

The principal results were that:

the proposed method could identify similar mechanisms, and found some cases of convergent evolution of chemical mechanism
similar EC numbers indicate similar mechanism but only once the subclass is considered (that is, the class on its own is not very indicative of the mechanism)

The most similar previous work was by Latino and Aires-de-Sousa [2], who used a self-organising map to classify enzyme reactions in the KEGG database. There, similarity was determined by the values of Gasteiger's descriptors, which represent inductive and resonance effects around the reactive centre.

Both of these studies highlight the fact that applying cheminformatic techniques to biological problems can yield some interesting results.

[1] G.L. Holliday, G.J. Bartlett, D.E. Almonacid, N.M. O'Boyle, P. Murray-Rust, J.M. Thornton and J.B.O. Mitchell, MACiE: a database of enzyme reaction mechanisms. Bioinformatics, 2005, 21, 4315-4316. [Open Access]

[2] D.A.R.S. Latino and J. Aires-de-Sousa, Genome-scale classification of metabolic reactions: a cheminformatics approach. Angew. Chem. Int. Ed., 2006, 45, 2066–2069.

Tuesday, 3 April 2007

Noncanonical interactions in protein-ligand complexes

Propensities of Polar and Aromatic Amino Acids in Noncanonical Interactions: Nonbonded Contacts Analysis of Protein-Ligand Complexes in Crystal Structures. Yumi N. Imai, Yoshihisa Inoue, and Yoshio Yamamoto. J. Med. Chem., 2007, 50, 1189-1196.

Looking at a protein-ligand complex, it is quite easy to spot any hydrogen bonding interactions. This paper examines so-called ‘noncanonical’ interactions, i.e. pretty much all other non-bonded interactions between a protein and a ligand. The authors have analysed the PDB (or rather the tidied up version of the PDB in Relibase+) looking for these noncanonical interactions. The results are divided up per protein amino acid involved. There is much discussion of pi interactions, as well as interactions where the C-H bond acts as a ‘hydrogen-bond’ donor.

The most interesting results of all concern some very rare interactions, for some of which the authors are not sure how or why they occur. Are these artefacts, or are they real? The authors make the point that the resolution of the structures involved was quite good, but I think it would have been useful to analyse these cases further, to see whether these interactions are ‘forced’ by the presence of other favourable interactions, or whether that area of the protein was quite flexible. It would also have been nice to get some idea of how common the various interactions are relative to each other. Although some percentages are mentioned in the text, the figures are not complete, and it is not always clear whether a particular percentage refers to all interactions or just to the noncanonical ones.

All in all, a paper that provides useful qualitative information on the sorts of noncanonical interactions that occur in proteins. These data may inspire the development of new terms for docking functions, which typically concentrate on hydrogen bonding and lipophilic interactions.
