| Scoring Functions
The rDock master scoring function is a weighted sum of intermolecular, ligand intramolecular, site intramolecular, and external restraint terms (Equation 1). The main changes to the original RiboDock scoring function are: i) the replacement of the crude steric potentials (Slipo and Srep) with a true van der Waals potential, Svdw, ii) the introduction of two generalised terms for all short range attractive (Spolar) and repulsive (Srepul) polar interactions, the implementation of a fast weighed solvent accessible surface area (WSAS) solvation term, and the addition of explicit dihedral potentials.
van der Waals potential
We have replaced Slipo and Srep with a true vdW potential similar to that used by GOLD . Atom types and vdW radii were taken from the Tripos 5.2 force field. Energy well depths are switchable between the original Tripos 5.2 values and those used by GOLD (calculated from the atomic polarisability and ionisation potentials of the atom types involved). Additional atom types were created for carbons with implicit hydrogens, as the Tripos force field uses an all-atom representation. vdW radii for implicit hydrogen types are increased by 0.1A for each implicit hydrogen, with well depths unchanged. The functional form is switchable between a softer 4-8 and a harder 6-12 potential. A quadratic potential is used at short range to prevent excessive energy penalties for atomic clashes. The potential is truncated at longer range (1.5 x sum of vdW radii).
Empirical attractive and repulsive polar potentials
We use an empirical Bohm-like potential to score hydrogen-bonding and other short-range polar interactions. The original RiboDock polar terms (SH-bond, SposC-acc, Sacc-acc, Sdon-don) are generalised and condensed into two scoring functions, Spolar and Srepul (Equation 2), which deal with attractive and repulsive interactions respectively. Six types of polar interaction centres are considered: hydrogen bond donors (DON), metal ions (M+), positively charged carbons (C+, as found at the centre of guanidinium, amidinium and imidazole groups), hydrogen bond acceptors with pronounced lone pair directionality (ACC_LP), acceptors with in-plane preference but limited lone-pair directionality (ACC_PLANE), and all remaining acceptors (ACC). The ACC_LP type is used for carboxylate oxygens and Osp2 atoms in RNA bases, with ACC_PLANE used for other Osp2 acceptors.
Individual interaction scores are the product of simple scaling functions for geometric variables, formal charges and local neighbour density. Metals are assigned a uniform formal charge of +1. C+ is considered to be a "weak" donor in this context and scores are scaled by 50% relative to conventional donors by the assignment of sgn(i)=0.5 in Equation 2. No pseudo-formal charges are assigned to selected RNA base atoms any more. The geometric functions minimally include an interaction distance term, with the majority including angular terms dependent on the type of the interaction centres. Geometric parameters are summarised in Table 1, and the angular functions in Table 2.
The most notable improvements to RiboDock are that attractive (hydrogen bond and metal) interactions with ACC_LP and ACC_PLANE acceptors include terms to enforce the relevant lone pair directionality. These replace the aACC dependence, which is retained for the ACC acceptor type. Note that no distinction between acceptor types is made for attractive interactions with guanidinium carbons, or for repulsive interactions between acceptors. In these circumstances all acceptors are treated as type ACC. Attractive interactions between guanidinium carbons (C+) and all acceptor types, which in RiboDock were described by only a distance function, SposC-acc, now include angular functions around the carbon and acceptor groups. Repulsive interactions between donors, and between acceptors, also have an angular dependence. This allows a stronger weight, and a longer distance range, to be used to penalise disallowed head-to-head interactions without forbidding allowable contacts. One of the issues in RiboDock was that it was not possible to include neutral acceptors in the acceptor-acceptor repulsion term with a simple distance function.
The desolvation potential in rDock combines a weighted solvent accessible surface area approach [WSAS ] with a rapid probabilistic approximation to the calculation of solvent accessible surface areas based on pairwise interatomic distances and radii (Equation 3). The calculation is fast enough therefore to be used in docking. We have redefined the solvation atom types compared to the original work and recalibrated the weights against the same training set of experimental solvation free energies in water (395 molecules). The total number of atom types (50, including 6 specifically for ionic groups and metals) is slightly lower than in original work. Our atom types reflect the fact that rDock uses implicit non-polar hydrogens. The majority of types are a combination of hybridisation state and the number of implicit or explicit hydrogens. The performance of the WSAS model over the training set (Table 3) is satisfactory and is only slightly reduced compared to the original work, which used an analytical calculation of solvent accessible surface areas and explicit non-polar hydrogens.
Ssolv is calculated as the change in solvation energy of the ligand and the docking site upon binding of the ligand. The reference energies are taken from the initial conformations of the ligand and site (as read from file) and not from the current pose under evaluation. This is done to take into account any changes to intramolecular solvation energy. Strictly speaking the intramolecular components should be reported separately under Sintra and Ssite but this is not done for reasons of computational efficiency.
We constructed a combined set of protein-ligand and RNA-ligand complexes for validation of rDock. Molecular data files for the protein-ligand complexes were extracted from the downloaded CCDC/Astex "clean-list" and used without substantive modification. The only change was to convert ligand MOL2 files to MDL SD format using Corina, leaving the coordinates and protonation states intact. Protein MOL2 files were read directly. The ten RNA-ligand NMR structures from the RiboDock validation set were supplemented with five RNA-ligand crystal structures (1f1t, 1f27, 1j7t, 1lc4, 1mwl) prepared in a similar way. All 15 RNA-ligand structures have measured binding affinities.
58 complexes (43 protein-ligand and 15 RNA-ligand) were selected for the initial fitting of component scoring function weights. Protein-ligand structures were chosen of any X-ray resolution that had readily available experimental binding affinities . The list of PDB codes and binding affinities are provided as supplementary information. 102 complexes were used for the main validation of native docking accuracy, consisting of 87 of the 92 entries in the high-resolution (R<2Å) "clean-list" (covalently bound ligands removed - 1aec, 1b59, 1tpp, 1vgc, 4est), and the 15 RNA-ligand complexes. 21 of the protein-ligand entries, and all of the RNA-ligand entries, are members of both validation sets. The performance of the most promising scoring functions were checked against the entire non-covalently bound CCDC/Astex "clean-list" (213 out of the original 224 entries).
Intermolecular scoring functions under evaluation
Component weights (W) for each term in the intermolecular scoring function (Sinter) were obtained by least squares regression of the component scores to Gbind values for the binding affinity validation set described above (58 entries). Each ligand was subjected first to simplex minimisation in the docking site, starting from the crystallographic pose, to relieve any minor non-bonded clashes with the site. The scoring function used for minimisation was initialised with reasonable manually assigned weights. If the fitted weights deviated significantly from the initial values the procedure was repeated until convergence. Certain weights (Wrepul, Wrot, Wconst) were constrained to have positive values to avoid non-physical, artifactual models. Note that the presence of Wrot and Wconst in Sinter improves the quality of the fit to the binding affinities but does not impact on native ligand docking accuracy.
Ten intermolecular scoring functions were derived with various combinations of terms (Table 4). The same ligand intramolecular weights are used in each case to avoid any differences in ligand conformational energies affecting the docking results. SF0 is a baseline scoring function that has the van der Waals potential only. SF1 adds a simplified polar potential, without f2 (formal charge) and f3 (neighbour density) scaling functions, and with a single acceptor type (ACC) that lacks lone-pair directionality. SF2 has the full polar potential (f2 and f3 scaling functions, ACC, ACC_LP and ACC_PLANE acceptor types) and adds the repulsive polar potential. SF3 has the same functional form as SF2 but with empirical weights in regular use at RiboTargets. SF4 replaces the repulsive polar potential with the WSAS desolvation potential described above. SF5 has the same functional form as SF4 but with empirical weights in regular use at RiboTargets. SF6 combines the repulsive polar and desolvation potentials. SF7 has the same functional form as SF2 and SF3 but with weights for Wvdw and Wpolar taken from SF5. SF8 and SF9 add the aromatic term to SF3 and SF5 respectively.
There is surprisingly little variation in correlation coefficient (R) and root mean square error (RMSE) in predicted binding energy over the ten scoring functions (Table 4). The best results are obtained with SF4 (R=0.67, RMSE=9.6 kJ/mol).