Copyright © 2006
Vernalis
Revision History
Revision 2006.2 release | 10/04/2007 | Rod Hubbard
Revision 2006.1 release | 20/09/2006 | David Morley
Revision Initial draft | 16/03/2006 | David Morley
David Morley - Enspiral Discovery Ltd. - d.morley@enspiral-discovery.com
University
of
release – rdock@ysbl.york.ac.uk
Table of Contents
Section A –
Using the distribution as is
1. Installing the rDock release
Section B – Building a new version of rDock
rDock is a high-throughput
molecular docking platform for protein and RNA targets. Under development at
Vernalis (formerly RiboTargets) since 1998, the software (formerly known as
RiboDock[1]),
scoring functions, and search protocols have been refined continuously to meet
the demands of in-house Structure-Based Drug Discovery (SBDD) projects. Since September 2006, the software has been
developed and distributed through an academic network maintained at the
The major components of the
platform include fast intermolecular scoring functions (van der Waals, polar,
desolvation) validated against protein and RNA targets, a Genetic
Algorithm-based stochastic search engine, a wide variety of external
SBDD-derived restraint terms (tethered template, pharmacophore, NOE distance
restraints), and novel Genetic Programming-based post-docking filtering. A
variety of scripts are available to perform automated validation experiments and
to launch virtual screening campaigns.
This introductory guide is aimed
at new users of rDock.
The guide is in two parts:
Section A describes how to
install from the distribution tape and run some test calculations with the
pre-compiled version of the program.
This is all most users will need to get started.
Section B contains information
about compiling and running the program, with additional information about test
suites.
Once you have installed a
version, you should look at the Reference Guide for more detailed documentation
on the usage of rDock. In addition, the
rDock web site (http://www.ysbl.york.ac.uk/rDock)
has more information, test suites and ancillary programs available.
The standard distribution includes a pre-compiled and linked version of rDock that should run on most Linux operating systems. The version was compiled without shared object libraries on Fedora Core 3 and has been tested on Fedora Core 4, RHEL 4 and SuSE 10. If you try the program on other operating systems, please email rDock@ysbl.york.ac.uk and let us know how you got on.
1. Create a new directory for building rDock.
$ mkdir ~/rDock2006.2
The directory you created is referred to as $RBT_ROOT[2] in the subsequent steps.
2. Copy or download the distribution files to $RBT_ROOT.
$ cp ~/mydownloads/rDock_2006.2_src.tgz ~/rDock2006.2
3. Extract the distribution.
$ cd ~/rDock2006.2
$ tar xvzf rDock_2006.2_src.tgz
4. Edit the setup_rDock script.
Edit the file ~/rDock2006.2/setup_rDock to define the RBT_ROOT environment variable.
If you are using rDock frequently, you may want to source this setup
file in your .cshrc file.
An example set of files is provided in the directory $RBT_ROOT/testing.
Move to this directory:
$ cd $RBT_ROOT/testing
Run the run_rDock_test script:
$ source run_rDock_test
This executes a sequence of commands. In brief:
The program rDcavity takes the description of the target binding site from 5abp.prm and generates a cavity (5abp.as) used in the docking calculations. In this case, the docking cavity is defined by the crystallographic position of the ligand in 5abp_c.sd.
The program rDock performs 10 docking runs, trying to fit the ligand 5abp_l.sd into the ligand cavity. The attempts are output to 5abp_test.sd.
The program rDrms computes the RMSD between the 5abp_c.sd ligand and the different poses in 5abp_test.sd and puts the result into 5abp_rms.sd.
The Perl script sdsort sorts 5abp_rms.sd so that the first pose has the best score for docking to the receptor.
You can use visualisation software (such
as the free DS Visualizer from Accelrys) to look at the various protein and
ligand files and demonstrate for yourself that the docking has worked.
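One quick sanity check on the test output is to count the poses written to an SD file. The sketch below is not part of the rDock distribution: count_poses is a hypothetical helper that relies only on the SD file convention that each record ends with a line beginning with four dollar signs.

```shell
# Hypothetical helper (not shipped with rDock): count the records in an
# SD file. Each record is terminated by a line starting with "$$$$".
count_poses() {
    grep -c '^\$\$\$\$' "$1"
}
```

After the test script has finished, running count_poses 5abp_test.sd should report one record per docking run, assuming each run writes a single pose to the output file.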
This section describes the minimal set of steps
required to build rDock from the source code distribution, and to run one of
the automated validation experiments provided in the test suite distribution.
The instructions assume that you are comfortable with simple Linux command line
administration tasks, and with building Linux applications from make files.
Compilers. rDock is supplied as source code, which means that
you will have to compile the binary files (run-time libraries and executable programs)
before you can use them. rDock has been developed largely on the Linux
operating system, most recently with the GNU g++ compiler under SuSE 9.2 Pro.
The code will almost certainly compile and run under other Linux distributions
with little or no modification, but no other distributions or compilers have
been tested extensively to date.
Condor. Although the rDock executables can be run directly
from the command line, the validation experiment scripts use the Condor High Throughput Computing software[3]
for automating the process of distributing and managing individual docking jobs
across a cluster of compute machines. Condor is available free of charge for a
wide variety of platforms. Installing and configuring Condor is beyond the
scope of this guide as the Condor configuration is very site-dependent.
For full production use, you would
typically compile rDock on a separate build machine and run the docking
calculations on a cluster of compute machines. However, for the purposes of
getting started, these instructions assume that you will be compiling rDock and
running the initial validation experiments on the same machine, and that you
have installed and configured a Personal Condor to manage the jobs on
that machine.
Required packages. Make sure you have the following packages
installed on your machine before you continue. The versions listed are
appropriate for SuSE 9.2 Pro; other versions may be required for your
particular Linux distribution.
Table 1.1. Required packages for building and running rDock
Package | Description | Required at | Version
gcc | GNU C compiler | Compile-time | 3.3.4
g++ | GNU C++ compiler | Compile-time | 3.3.4
popt-devel | C++ command-line argument processing | Compile-time | 1.7-190
cppunit | C++ unit test framework | Compile-time | 1.10.2
popt | C++ command-line argument processing | Run-time | 1.7-190
Condor | Distributed resource management | Run-time | 6.6.10
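Before building, it can help to confirm that the command-line tools are on your PATH. The following is a minimal sketch, not part of the distribution: missing_tools is a hypothetical helper name, and it checks executables only, so libraries such as popt and cppunit still need to be verified with your distribution's package manager.

```shell
# Hypothetical helper: print the name of every command that cannot be
# found on PATH, one per line. Returns non-zero if any are missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}
```

For example, missing_tools gcc g++ make condor_submit prints nothing when all four commands are available.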
The rDock source files and test suite
files are provided as independent gzipped tar (.tgz) distributions. Depending
on your requirements, the two distributions can be unpacked to entirely
separate locations, or can be unpacked under the same location. In this example
they are unpacked under the same location.
Table 2.1. rDock distribution files
File | Description
rDock_[CODELINE]_[BUILDNUM]_src.tgz | rDock source distribution
RBT_TEST_[DATE].tgz | Test suite data files and scripts
where [CODELINE], [BUILDNUM], and [DATE] will vary depending on the release. [CODELINE] represents the major version string (for example, 2006.1), [BUILDNUM] represents the minor version number (for example, 865) and [DATE] represents a date string (for example, 20060313).
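If you script the unpacking, the version components can be recovered from the source distribution file name. This is a sketch under the naming pattern given above; parse_src_name is a hypothetical helper, and it reports an empty build number when the name carries no [BUILDNUM] part.

```shell
# Hypothetical helper: split rDock_[CODELINE]_[BUILDNUM]_src.tgz into
# its version components using POSIX parameter expansion.
parse_src_name() {
    name=${1##*/}            # drop any leading directory
    name=${name#rDock_}      # drop the fixed prefix
    name=${name%_src.tgz}    # drop the fixed suffix
    case "$name" in
        *_*) codeline=${name%_*}     # text before the last underscore
             buildnum=${name##*_} ;; # text after the last underscore
        *)   codeline=$name          # no build number in the name
             buildnum= ;;
    esac
    echo "codeline=$codeline buildnum=$buildnum"
}

parse_src_name rDock_2006.1_865_src.tgz
```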
Procedure 2.1. Example unpacking procedure
1. Create a new directory for building rDock.
$ mkdir ~/dev
The directory you created is referred to as [BUILDDIR] in the subsequent steps.
2. Copy or download the distribution files to [BUILDDIR].
$ cp ~/mydownloads/rDock_2006.1_865_src.tgz ~/dev/
$ cp ~/mydownloads/RBT_TEST_20060313.tgz ~/dev/
3. Extract the distributions.
$ cd ~/dev/
$ tar xvzf rDock_2006.1_865_src.tgz
$ tar xvzf RBT_TEST_20060313.tgz
The distributions contain files with relative path
names, and you should find the following subdirectories created under
[BUILDDIR]. Note that the ./rDock/2006.1
subdirectory may have a different name depending
on the major version string (see above).
$ find . -type d
.
./rDock
./rDock/2006.1
./rDock/2006.1/fw
./rDock/2006.1/bin
./rDock/2006.1/lib
./rDock/2006.1/src
./rDock/2006.1/src/GP
./rDock/2006.1/src/exe
./rDock/2006.1/src/lib
./rDock/2006.1/src/daylight
./rDock/2006.1/data
./rDock/2006.1/data/sf
./rDock/2006.1/data/pmf
./rDock/2006.1/data/pmf/smoothed
./rDock/2006.1/data/scripts
./rDock/2006.1/data/filters
./rDock/2006.1/docs
./rDock/2006.1/docs/images
./rDock/2006.1/build
./rDock/2006.1/build/test
./rDock/2006.1/build/test/RBT_HOME
./rDock/2006.1/build/tmakelib
./rDock/2006.1/build/tmakelib/unix
./rDock/2006.1/build/tmakelib/linux-pathCC-64
./rDock/2006.1/build/tmakelib/linux-g++-64
./rDock/2006.1/build/tmakelib/linux-g++
./rDock/2006.1/import
./rDock/2006.1/import/nr
./rDock/2006.1/import/nr/src
./rDock/2006.1/import/nr/include
./rDock/2006.1/import/simplex
./rDock/2006.1/import/simplex/src
./rDock/2006.1/import/simplex/include
./rDock/2006.1/include
./rDock/2006.1/include/GP
./RBT_TEST
./RBT_TEST/na
./RBT_TEST/bin
./RBT_TEST/ccdc_astex
4. Make a note of the following locations for later use.
The test suite root directory is [BUILDDIR]/RBT_TEST/ and will be referred to as [RBT_TEST] in later instructions. In this example, [RBT_TEST] is ~/dev/RBT_TEST/.
The rDock root directory is [BUILDDIR]/rDock/[CODELINE] and will be referred to as [RBT_ROOT] in later instructions. In this example, [RBT_ROOT] is ~/dev/rDock/2006.1/.
rDock is written in C++ (with a small amount of C code from Numerical Recipes) and makes heavy use of the C++ Standard Template Library (STL). The majority of the source code is compiled into a single shared library (libRbt.so). The executable programs themselves are relatively light-weight command-line applications linked with libRbt.so.
The tmake build system (from Trolltech) is used to generate makefiles automatically for a particular build target (i.e. combination of operating system and compiler). The source distribution comes with tmake templates defining the compiler options and flags for three Linux build targets (linux-g++, linux-g++-64, and linux-pathCC-64). The build targets have been tested under SuSE 9.2 (2.6.8-24.18 kernel) with GNU g++ 3.3.4 and PathScale pathCC 2.1.
Table 3.1. Standard tmake build targets provided
Target name | Architecture | Compiler | Compiler flags (release build)
 | 32-bit Intel |  | 
 | 64-bit AMD Opteron |  | 
 | 64-bit AMD Opteron |  | 
Customising the tmake template for a build target. If none of the tmake templates are suitable for your machine, or if you wish to customise the compiler options, you should first customise one of the existing templates. The tmake template files are stored under [RBT_ROOT]/build/tmakelib/. Locate and edit the tmake.conf file for the build target you wish to customise. For example, to customise the linux-g++ build target, edit [RBT_ROOT]/build/tmakelib/linux-g++/tmake.conf and localise the values to suit your compiler.
Procedure 3.1. rDock build procedure
To build rDock, first go to the [RBT_ROOT]/build/ directory.
$ cd [RBT_ROOT]/build
1. Compile
Make one of the build targets listed below.
$ make linux-g++
$ make linux-g++-64
$ make linux-pathCC-64
2. Test
Run the rDock unit tests to check build integrity. If no failed tests are reported you should be all set.
$ make test
3. Install (optional)
You can either run rDock directly from the build location (only recommended for initial testing), or install the binaries and data files to a new location (recommended for production use). To install in a new location [INSTALLDIR], create a binary distribution file, copy the distribution file to [INSTALLDIR], and extract the files.
$ make dist
$ mkdir -p [INSTALLDIR]
$ cp rDock_[CODELINE].tgz [INSTALLDIR]
$ cd [INSTALLDIR]
$ tar xvzf rDock_[CODELINE].tgz
4. Cleanup (optional)
To remove all intermediate build files from [RBT_ROOT]/build/, leaving just the final executables (in [RBT_ROOT]/bin/) and shared libraries (in [RBT_ROOT]/lib/):
$ make clean
To remove the final executables and shared libraries as well, returning to a source-only distribution:
$ make distclean
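If you are unsure which GNU build target to try first, the machine architecture reported by uname -m is a reasonable starting point. The mapping below is an assumption made for illustration, not something defined by the rDock makefiles: suggest_target is a hypothetical helper that proposes linux-g++-64 for 64-bit x86 machines and linux-g++ otherwise. It does not consider the PathScale target.

```shell
# Hypothetical helper: suggest a tmake build target name for an
# architecture string as printed by `uname -m`.
# Assumed mapping: 64-bit x86 -> linux-g++-64, anything else -> linux-g++.
suggest_target() {
    case "$1" in
        x86_64|amd64) echo "linux-g++-64" ;;
        *)            echo "linux-g++" ;;
    esac
}

# Print the suggestion for the current machine.
suggest_target "$(uname -m)"
```

The suggested name can then be passed to make from the [RBT_ROOT]/build/ directory. On architectures other than x86 the templates will probably need customising first.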
Now that you have successfully built the rDock executables you should change your working directory away from the installed locations of [RBT_TEST] and [RBT_ROOT], for example to a personal project directory. It is not recommended to run validation experiments directly within the [RBT_TEST] and [RBT_ROOT] directory hierarchy.
The cpu_benchmark experiment. The cpu_benchmark experiment consists of nine protein-ligand complexes and two RNA-ligand complexes. It represents a small subset of the full rDock docking accuracy validation set and is presented here because the calculations should be short enough to run on a single workstation in a few hours. The full experiments require access to a cluster of machines to run in a tractable length of time. All ligands are non-covalently bound and were chosen to have a range of rotatable bond counts (from 0 to 8).
Table 4.1. Complexes included in the cpu_benchmark experiment
Type | PDB codes | File source
Protein-ligand |  | Files used intact from CCDC/Astex "clean" high-resolution test set
RNA-ligand |  | Prepared by Vernalis
Configuring the environment. You need to define two environment variables to reflect the location of the test suite [RBT_TEST] and the rDock build you wish to test [RBT_ROOT]. Either customise the example setup_validation script (bash shell) provided in the [RBT_TEST] directory and source the file, or define the environment variables from the command line. You should also add the [RBT_TEST]/bin and [RBT_ROOT]/bin directories to your path, and add [RBT_ROOT]/lib to your library path. For example, using the directory names from the example unpacking procedure:
$ export RBT_TEST=~/dev/RBT_TEST
$ export RBT_ROOT=~/dev/rDock/2006.1
$ export PATH=${RBT_ROOT}/bin:${RBT_TEST}/bin:${PATH}
$ export LD_LIBRARY_PATH=${RBT_ROOT}/lib:${LD_LIBRARY_PATH}
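One detail worth knowing about the last export: if LD_LIBRARY_PATH was previously empty, the result ends with a trailing colon, and the dynamic linker treats an empty entry as the current working directory. A slightly more defensive variant is sketched below. prepend_path is a hypothetical helper, and the RBT_ROOT default is simply the example location used in this guide.

```shell
# Hypothetical helper: prepend a directory to a colon-separated path
# value without leaving an empty entry when the old value is empty.
prepend_path() {
    # $1 = directory to add, $2 = existing value (may be empty)
    echo "$1${2:+:$2}"
}

# Example location from the unpacking procedure; adjust to your setup.
RBT_ROOT=${RBT_ROOT:-$HOME/dev/rDock/2006.1}
LD_LIBRARY_PATH=$(prepend_path "$RBT_ROOT/lib" "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
```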
Preparing the docking sites. rDock requires a docking site file (.as suffix) to be defined for each receptor. The .as file (also known as an active site file) defines a volume which represents the region of the receptor into which each ligand should be docked. This process is known as cavity mapping and needs to be done only once for each receptor prior to running the experiments. The cavity mapping process is quite quick (from a few seconds to a few minutes per complex). The active site files (.as) themselves and associated rDock output log files are generated "in situ" in $RBT_TEST/ccdc_astex/ and $RBT_TEST/na/. Make sure you have write access to these directories. The only files you will see in your current working directory are the Condor .cmd and .log files. Wait until all Condor jobs have completed before moving to the next step.
$ make_cavities cpu_benchmark
$ condor_submit cav_cpu_benchmark.cmd
Running the experiment.
$ run_cpu_benchmark [TESTDIR]
where [TESTDIR] is the name of the subdirectory that will be created under your present working directory for the experiment output files. This command creates two Condor command files and automatically submits the jobs:
|
|
As a guide, the jobs require around 100 minutes total CPU time on a dual-processor AMD Opteron 248 with 2GB RAM, for an average of around three seconds per individual docking run.
Output files. After all Condor jobs have completed you should find the following output files for each complex in [TESTDIR]/SF3_100/ and [TESTDIR]/SF5_100.
|
|
|
Each .sd file is accompanied by associated .out and .err files which contain the standard output (rDock output) and standard error for each calculation.
Table 4.2. rDock score components output as SD data fields
Data field | Description
SCORE | Total docking score = SCORE.INTER + SCORE.INTRA + SCORE.SYSTEM + SCORE.RESTR
SCORE.INTER | Total receptor-ligand intermolecular score
SCORE.INTRA | Total ligand intramolecular score
SCORE.SYSTEM | Total intra-receptor, intra-solvent, and receptor-solvent score
SCORE.RESTR | Total external restraint penalty score
SCORE.INTER/INTRA/SYSTEM are weighted sums of the individual scoring function terms listed below. Note that the individual scoring function terms are output as raw values, and must be multiplied by their respective weights (not shown here) in order to reconstitute the total scores.
SCORE.*.VDW | vdW scores (truncated Tripos 6-12)
SCORE.*.POLAR | Attractive polar interactions (empirical geometric function of distances/angles)
SCORE.*.REPUL | Repulsive polar interactions (SF3 scoring function only)
SCORE.*.SOLV | Desolvation score (weighted SASA, SF5 scoring function only)
SCORE.*.DIHEDRAL | Dihedral score calculated over rotatable bonds only
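The relationship between the total score and its four top-level components in the table above can be checked with a one-line sum. This is only an illustrative sketch: total_score is a hypothetical helper, and the values passed to it below are made up rather than taken from a real docking run.

```shell
# Hypothetical helper: add the four top-level score components.
# Arguments: SCORE.INTER SCORE.INTRA SCORE.SYSTEM SCORE.RESTR
total_score() {
    awk -v inter="$1" -v intra="$2" -v sys="$3" -v restr="$4" \
        'BEGIN { printf "%.3f\n", inter + intra + sys + restr }'
}

# Made-up component values, for illustration only.
total_score -30.5 -2.25 0.5 1.0
```

Comparing the result with the SCORE field of the same pose is a simple consistency check. The same approach does not work for the individual terms below the top level, because those are output unweighted.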
Reporting the scores. The rDock scores for each pose are embedded within each SD data record as listed in Table 4.2, “rDock score components output as SD data fields”. You can use the sdreport script to output the tabulated scores for each pose. For example:
$ sdreport -t x_1koc.sd
will output the top-level score components (SCORE, SCORE.INTER, SCORE.INTRA, SCORE.RESTR - also SCORE.INTER.VDW) in tab-delimited format. Alternatively you can explicitly list the data fields you wish to output:
$ sdreport -tSCORE.INTER.POLAR,SCORE.INTER.VDW x_1koc.sd
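To see how these fields are stored, it may help to pull one out by hand. In an SD file each data field is a header line starting with ">" that contains the field name in angle brackets, followed by the value on the next line. The sketch below is not a replacement for sdreport; sd_field is a hypothetical helper that prints the value of one named field for every record.

```shell
# Hypothetical helper: print the value of one SD data field for each
# record. $1 = field name (e.g. SCORE), $2 = SD file.
sd_field() {
    awk -v name="$1" '
        # Header lines look like ">  <SCORE>"; the value is on the next line.
        /^>/ && index($0, "<" name ">") { getline; print }
    ' "$2"
}
```

For example, sd_field SCORE.INTER x_1koc.sd would list the intermolecular score of each pose, one per line. Matching on the closing bracket means that asking for SCORE does not also return SCORE.INTER.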
Calculating the docking accuracy (RMSD). Use the rmsreport script to generate the SCORE vs RMSD data files for further analysis, where RMSD is the Root Mean Squared Deviation between a docked ligand pose and the crystallographic reference ligand pose (over non-hydrogens only). Run rmsreport within each experiment directory, with no arguments.
$ cd [TESTDIR]/SF3_100
$ rmsreport
$ cd [TESTDIR]/SF5_100
$ rmsreport
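For readers unfamiliar with the measure, the RMSD itself is straightforward to compute once the atoms of the two poses are matched up. The sketch below is not how rmsreport is implemented. It assumes two hypothetical plain-text files, each holding one "x y z" line per heavy atom in the same atom order, and takes the square root of the mean squared distance between corresponding atoms.

```shell
# Hypothetical helper: RMSD between two coordinate lists.
# $1, $2 = files with one "x y z" line per atom, in the same atom order.
rmsd() {
    paste "$1" "$2" | awk '
        { dx = $1 - $4; dy = $2 - $5; dz = $3 - $6
          sum += dx*dx + dy*dy + dz*dz; n++ }
        END { if (n > 0) printf "%.3f\n", sqrt(sum / n) }
    '
}
```

This simple version takes no account of molecular symmetry and does no hydrogen filtering of its own.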
rmsreport generates a [PDB].rmcs file and an rms[PDB].sd file for each complex. rms[PDB].sd is a sorted, filtered copy of x_[PDB].sd with the RMSD value added as an additional data field. [PDB].rmcs is a tabulated text file with six columns where:
|
|
|
|
|
|
Poses that have an excessive cavity penalty (SCORE.RESTR.CAVITY > 1) are removed from the RMSD analysis by rmsreport. A cavity penalty indicates that the ligand pose is not entirely within the defined docking volume, and hence the reported scores cannot be trusted.
Example results. View the contents of each .rmcs file. For example, to look at the top three
poses for each complex:
$ head -3 *.rmcs
You should find that most of the top-scoring poses have an RMSD of less than 2 Å. Sample results for the SF3 scoring function are shown below, although as rDock uses a stochastic search your results may differ.
== 1byj.rmcs ==
1 -40.955 -33.472 -7.482 2.855 0.000
2 -40.063 -38.091 -1.972 1.457 0.000
3 -38.215 -35.048 -3.168 7.630 0.000
== 1c1e.rmcs ==
1 -27.945 -23.581 0.000 0.265 0.000
2 -27.945 -23.582 0.000 0.264 0.000
3 -27.943 -23.578 0.000 0.263 0.000
== 1cil.rmcs ==
1 -32.052 -28.150 0.517 0.736 0.000
2 -31.312 -27.746 0.844 0.847 0.000
3 -31.116 -26.617 -0.077 1.293 0.000
== 1d3h.rmcs ==
1 -33.697 -22.345 -1.669 2.081 0.000
2 -33.691 -22.268 -1.752 2.059 0.000
3 -33.684 -22.191 -1.801 2.055 0.000
== 1koc.rmcs ==
1 -36.951 -37.164 -0.751 2.549 0.000
2 -36.562 -37.267 -0.341 2.452 0.000
3 -36.538 -37.317 -0.339 2.393 0.000
== 1lna.rmcs ==
1 -23.888 -24.112 0.224 1.255 0.000
2 -23.765 -23.956 0.191 1.114 0.000
3 -23.720 -23.496 -0.224 1.142 0.000
== 1mld.rmcs ==
1 -28.396 -28.905 0.957 1.697 0.000
2 -28.215 -28.151 0.410 1.246 0.000
3 -28.124 -28.232 0.436 1.508 0.000
== 1mts.rmcs ==
1 -38.309 -34.850 -0.727 1.313 0.000
2 -38.162 -35.351 -0.395 1.389 0.000
3 -37.852 -34.274 -0.869 1.157 0.000
== 1wap.rmcs ==
1 -42.289 -34.939 0.040 0.494 0.000
2 -41.818 -34.907 0.206 0.459 0.000
3 -41.769 -35.130 0.209 0.939 0.000
== 5abp.rmcs ==
1 -43.287 -38.021 0.370 0.886 0.000
2 -42.811 -37.664 0.367 0.893 0.000
3 -42.684 -37.426 0.361 0.907 0.000
== 6rnt.rmcs ==
1 -29.970 -26.189 -4.422 1.760 0.000
2 -29.368 -24.156 -5.368 1.356 0.000
3 -29.233 -24.059 -5.349 1.406 0.000
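To summarise your own results, you can count how many complexes have a top-ranked pose under the RMSD cutoff. The sketch below makes one assumption you should verify against your files: the column of the .rmcs file that holds the RMSD is passed in explicitly (it looks like the fifth column in the sample above, but treat that as a guess). count_top_below is a hypothetical helper name.

```shell
# Hypothetical helper: count files whose first line has a value below
# a cutoff in the given column.
# $1 = column number holding the RMSD (an assumption to verify),
# $2 = cutoff, remaining arguments = .rmcs files
count_top_below() {
    col=$1; cutoff=$2; shift 2
    awk -v col="$col" -v cutoff="$cutoff" '
        FNR == 1 { total++; if ($col + 0 < cutoff + 0) hits++ }
        END { printf "%d of %d\n", hits, total }
    ' "$@"
}
```

For example, count_top_below 5 2 *.rmcs reports the number of complexes whose first-ranked pose falls below 2 in column five.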
[1] Validation of an empirical RNA-ligand scoring function for fast flexible docking using RiboDock, SD Morley and M Afshar, J. Comput.-Aided Mol. Des., 18 (2004) 189-208.
[2] You will see many references to Rbt in the documentation, but particularly in the source code. This is because the program was initially written within the company RiboTargets.
[3] Condor is available from http://www.cs.wisc.edu/condor