Information and FAQ

Visualisation of Molecules using RasMol

The full functionality of the Proteinase Inhibitor Database requires a molecular graphics program, e.g. RasMol, to display the enzymes and the inhibitors quickly and easily. RasMol is a free molecular graphics program intended for the visualisation of proteins, nucleic acids and small molecules. The program was written by Dr. Roger Sayle (Glaxo Research and Development, Greenford, UK).

RasMol reads in molecular co-ordinate files in a number of formats (including PDB format) and interactively displays the molecule on the screen in a variety of colour schemes and representations e.g. spacefilling (CPK) spheres, macromolecular ribbons or wireframe. The displayed molecule may be rotated, translated and zoomed interactively using the mouse.

Protein Structure Group Users:

If you are within the Protein Structure Group, York, the RasMol program is available but it is necessary to configure your personal files so that Netscape will automatically use RasMol.

Edit the .mime.types file to include:
chemical/x-pdb pdb PDB

Edit the .mailcap file to include:
# Rasmol
chemical/x-pdb;rasmol %s

Non-Protein Structure Group Users:

RasMol can be downloaded over the Internet from the RasMol home page: http://www.umass.edu/microbio/rasmol by following the instructions found there. The program will run on a wide range of systems including SGI, Macintosh and IBM PC.


Web Page Organisation

The web pages were set up to reflect the organisation of proteinases into gene families within catalytic classes: e.g. Serine; subtilisin-like.

Each gene family page has a quick reference diagram that describes the main chain hydrogen bonding observed for the overlapped inhibitors in the active site.


Database Construction

The first step in the construction of the database was to search the current Brookhaven Protein Databank (October 1995 release) for proteinase crystal structures of all four catalytic classes. The search was performed separately for each gene family using the BLAST program for proteins. A target sequence, derived from a representative of each gene family, was used for matching in each search, e.g. papain (9pap.pdb) for the papain-like cysteine proteinases. Each sequence identified by the search was returned with a probability value between 1 x 10^-18 and 1. A value of 1 x 10^-18 signifies a low probability of the sequences matching by chance, i.e. a genuinely related protein. For values between 1 x 10^-4 and 1 there is a high probability that the similarity occurred by chance, so these sequences were excluded from the database. In addition, only crystal structures were chosen for the database; NMR structures and homology models were not included.
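The probability filter described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the database: the function name, the hit list and the placeholder codes are hypothetical, and it assumes that a value of exactly 1 x 10^-4 counts as excluded.

```python
# Minimal sketch of the probability filter described in the text.
# Assumption: the boundary value 1e-4 itself is treated as excluded.
EXCLUSION_THRESHOLD = 1e-4


def select_related(hits, threshold=EXCLUSION_THRESHOLD):
    """Keep hits whose probability of matching by chance is below the threshold."""
    return [(code, p) for code, p in hits if p < threshold]


# Hypothetical (identifier, probability) pairs; placeholders, not real search output.
example_hits = [
    ("hitA", 1e-18),  # very unlikely to match by chance: kept
    ("hitB", 3e-7),   # kept
    ("hitC", 1e-4),   # on the boundary: excluded under the assumption above
    ("hitD", 0.5),    # likely a chance similarity: excluded
]

kept = select_related(example_hits)
for code, p in kept:
    print(code, p)
```

Only the first two placeholder hits survive the filter; whether the original work treated the boundary value in this way is not stated.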

The results of the search identified PDB files (protein databank file format) that could then be copied into a local memory area for easy access. The structures for each gene family were collated and processed. The processing used simple shell programming, manual editing of the PDB files and a variety of programs (described below).


Overlapping the Crystal Structures

All the protein crystal structures were overlapped using a program called MNYFIT. Overlapping was performed with respect to the conserved active site residues. In each case the program started from a minimum of three residues and then aligned the sequences to determine conserved residues. The first two listed structures were fitted to a common coordinate frame with subsequent structures being added to it.


Extracting the Inhibitor Coordinates from the Original PDB Files

The inhibitor PDB files for the database could be created by naming all inhibitor atoms as HETATM records in the original PDB file and then using the UNIX command grep to extract each line that contained the word HETATM.
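For readers without access to grep, the same extraction can be sketched in a few lines of Python. This is an illustration only: the function name and the toy records are hypothetical. Unlike grep, which matches the word anywhere in a line, the sketch tests the record name at the start of the line (columns 1-6 of the PDB format), which gives the same result for ordinary coordinate records.

```python
def extract_hetatm(pdb_lines):
    """Return the lines whose record name (columns 1-6) is HETATM."""
    return [line for line in pdb_lines if line.startswith("HETATM")]


# Toy records for illustration; the coordinates are invented.
toy_pdb = [
    "ATOM      1  CA  ALA A   1      11.000  12.000  13.000",
    "HETATM    2  C1  INH I   1      14.000  12.000  13.000",
    "HETATM    3  O1  INH I   1      15.000  12.000  13.000",
    "END",
]

inhibitor_lines = extract_hetatm(toy_pdb)
for line in inhibitor_lines:
    print(line)
```

To process a real file, the lines could be read with open("file.pdb") and the result written to a new file; the file name here is a placeholder.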

For the PDB files of the inhibitor bound in the active site, the program SITE was used. This is part of a suite of programs called JOY. SITE is a program that writes out the neighbouring residues for all the HETATM records in a file. The cutoff distance used was 4.5 Angstroms. Provided that just one atom is within the cutoff distance, the whole residue is written out. Care was taken to remove extraneous HETATM records, e.g. calcium ions, so that only the active site residues were written out.
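The neighbour selection described above can be sketched in Python as follows. This is not the SITE program and makes no claim about how SITE is implemented; it only illustrates the rule that a whole residue is kept when any one of its atoms lies within the cutoff of a HETATM atom. All function names and the toy coordinates are hypothetical, the cutoff is assumed to be inclusive, and only the protein (ATOM) records are returned.

```python
import math

CUTOFF = 4.5  # Angstroms, the value quoted in the text


def make_record(record, serial, name, res_name, chain, res_seq, x, y, z):
    """Build a fixed-column PDB coordinate line (helper for the toy data only)."""
    return "%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        record, serial, name, res_name, chain, res_seq, x, y, z)


def parse_atom(line):
    """Pull the record name, residue identity and coordinates from one line."""
    return {
        "record": line[0:6].strip(),
        "residue": (line[17:20].strip(), line[21], int(line[22:26])),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "line": line,
    }


def neighbouring_residues(pdb_lines, cutoff=CUTOFF):
    """Return every ATOM line of each residue with an atom within cutoff of a HETATM atom."""
    atoms = [parse_atom(l) for l in pdb_lines if l.startswith(("ATOM", "HETATM"))]
    ligand = [a for a in atoms if a["record"] == "HETATM"]
    protein = [a for a in atoms if a["record"] == "ATOM"]
    close = set()
    for atom in protein:
        # One atom inside the cutoff is enough to select the whole residue.
        if any(math.dist(atom["xyz"], h["xyz"]) <= cutoff for h in ligand):
            close.add(atom["residue"])
    return [a["line"] for a in protein if a["residue"] in close]


# Toy structure: two residues and one inhibitor atom, all on the x axis.
toy_pdb = [
    make_record("ATOM", 1, " N", "ALA", "A", 1, 0.0, 0.0, 0.0),    # 5.5 A from inhibitor
    make_record("ATOM", 2, " CA", "ALA", "A", 1, 1.5, 0.0, 0.0),   # 4.0 A from inhibitor
    make_record("ATOM", 3, " N", "GLY", "A", 2, 12.0, 0.0, 0.0),   # 6.5 A from inhibitor
    make_record("ATOM", 4, " CA", "GLY", "A", 2, 13.5, 0.0, 0.0),  # 8.0 A from inhibitor
    make_record("HETATM", 5, " C1", "INH", "I", 1, 5.5, 0.0, 0.0),
]

site_lines = neighbouring_residues(toy_pdb)
for line in site_lines:
    print(line)
```

In the toy data only one ALA atom is inside the cutoff, yet both ALA atoms are written out, while GLY is dropped. Extraneous HETATM records such as calcium ions would need to be removed from the input beforehand, as the text notes.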


James Bray. Page included on 7/10/96