Authors: A.A.Vagin, J.Richelle, S.J.Wodak.
email: alexei@ysbl.york.ac.uk
Reference: A.A.Vaguine, J.Richelle, S.J.Wodak. SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Cryst. (1999). D55, 191-205.
Copy the file sfcheck.tar.gz
and uncompress it (`gunzip sfcheck.tar.gz').
After untarring `sfcheck.tar' (command: tar xvf sfcheck.tar) you will get an sfcheck directory with src, doc and bin subdirectories. To build the executable, go to src
You can also download binaries (executable files),
all with the memory allocation option:
sfcheck_linux.gz
sfcheck_mac.gz
sfcheck_macintel.gz
sfcheck for windows
You can use this version in the same ways as the previous one:
1. by command (batch) file
2. interactively
3. by ccp4i

New style of use: you can run the program from a command string with options (without any keywords):

sfcheck -f file_sf_mtz_or_cif_or_map -m model_pdb_or_cif -out out -nomit Nomit -mem Nm -na Na -scl map_scale_factor -map -invert -origin -h -r -po path_out -ps path_scratch -lf label_F -lsf label_sigF -li label_I -lsi label_sigI -lfree label_free_flag -lp label_phases

h       = help and information about mtz labels
r       = rest some special files (.dst,...)
out     = y - see nomit option
          a - program creates CIFile (sfcheck.hkl) with anisothermal corrected Fobs
          u - CIFile with detwinned data
map     = an extracted density map will be created (sfcheck_ext.map), or a new map if the input was a map (sfcheck.map). Useful to prepare a mirror and/or scaled map
invert  = mirror map will be used
origin  = set origin of map to 0 0 0
nomit   = number of cycles of the omit procedure. 2 is a good choice. It takes time. If OUT = Y, the program creates CIFile (sfcheck.hkl) with omit phases
Nm      = memory request in Mb (for f90 only)
Na      = maximal number of atoms in the model
label_* = labels for mtz_file

For example:
sfcheck -f file.mtz
or
sfcheck -m file.pdb
or
sfcheck -f file.mtz -m file.pdb
or
sfcheck -f file.mtz -m file.pdb -nomit 2 -map -out y
or
sfcheck -f file.mtz -lf FP -lsf SIGFP
or
sfcheck -f file.mtz -h
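The option-style call can also be assembled from a script. The following is a minimal Python sketch, not part of SFCHECK: the helper name build_sfcheck_command and its arguments are hypothetical, and only flags listed above are used. It builds the argument list and prints the command line without running the program.

```python
import shlex


def build_sfcheck_command(sf_file=None, model_file=None, nomit=None,
                          out=None, make_map=False, labels=None):
    """Assemble an argument list for the option-style sfcheck call.

    Hypothetical helper: it only arranges flags documented above
    (-f, -m, -nomit, -map, -out and the MTZ label flags).
    """
    if sf_file is None and model_file is None:
        raise ValueError("give a structure-factor file, a model file, or both")
    argv = ["sfcheck"]
    if sf_file is not None:
        argv += ["-f", sf_file]
    if model_file is not None:
        argv += ["-m", model_file]
    if nomit is not None:
        argv += ["-nomit", str(nomit)]
    if make_map:
        argv.append("-map")
    if out is not None:
        argv += ["-out", out]
    # labels maps a label flag (e.g. "-lf") to an MTZ column label
    for flag, label in (labels or {}).items():
        argv += [flag, label]
    return argv


cmd = build_sfcheck_command("file.mtz", "file.pdb", nomit=2,
                            make_map=True, out="y")
print(shlex.join(cmd))  # sfcheck -f file.mtz -m file.pdb -nomit 2 -map -out y
```

The list form can be passed to subprocess.run if sfcheck is installed and on the PATH; that step is left out here so the sketch runs anywhere.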
1. Crystal:
   cell parameters and space group
2. Model:
   number of atoms
   number of water molecules
   solvent content
   <B> for model
   Matthews coefficient and corresponding solvent %
   reported resolution
   reported R-factor
3. Refinement:
   refinement program
   resolution range for refinement
   reported sigma cut-off for refinement
   reported R-factor
   reported Rfree
4. Structure factors:
   number of reflections
   number of reflections with I > sigma
   number of reflections with I > 3sigma
   resolution range
   completeness
   R-standard (sum(sigma)/sum(F))
   Wilson plot (amplitudes vs. resolution)
   overall B-factor by Patterson origin peak and by Wilson plot
   optical resolution
   expected minimal error in coordinates
   Anisotropic distribution of Structure Factors - ratio of Eigen values
5. Model vs. structure factors:
   R-factor
   Correlation coefficient
   R-factor for reported resolution range and sigma cut-off
   Rfree
   Luzzati plot (R-factor vs. resolution)
   coordinate error from Luzzati plot
   expected maximal error in coordinates
   DPI
   Patterson scaling - scale, Badd
   Anisothermal scaling - betas: b11,b22,b33,b12,b13,b23
   Solvent correction - Ks,Bs

Optical resolution

Optical resolution is defined as the expected minimum distance between two resolved peaks in the electron density map. With a single-Gaussian approximation of the shape of the atomic peak, the minimum distance between two resolved peaks is twice the standard deviation "sigma", i.e. the width of the atomic peak W (W = 2 sigma). The expected width of the atomic peak W is computed as

W = sqrt ( 2 (sigma_patt^{2} + sigma_res^{2}) )

where
sigma_patt - standard deviation of the Gaussian corresponding to the Patterson origin peak.
sigma_res  - standard deviation of the Gaussian corresponding to the origin peak of the spherical interference function, which is the Fourier transform of the sphere in reciprocal space with radius 1/d_min.
             sigma_res = 0.356 d_min.
d_min      - minimum d-spacing, the "nominal resolution".
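As a worked illustration of the formula for W, here is a minimal Python sketch. The function name optical_resolution is hypothetical, and sigma_patt is taken as an input; in SFCHECK it comes from fitting the Patterson origin peak, which is not reproduced here.

```python
import math

# sigma_res = 0.356 * d_min, as given in the text
SIGMA_RES_FACTOR = 0.356


def optical_resolution(sigma_patt, d_min):
    """Expected width of the atomic peak, W = sqrt(2 (sigma_patt^2 + sigma_res^2)).

    sigma_patt: standard deviation of the Patterson origin peak (Angstrom),
                assumed to be already known.
    d_min:      nominal resolution (Angstrom).
    """
    sigma_res = SIGMA_RES_FACTOR * d_min
    return math.sqrt(2.0 * (sigma_patt ** 2 + sigma_res ** 2))


# Series-termination part only (an atom with B = 0, so sigma_patt = 0):
print(round(optical_resolution(0.0, 2.0), 3))
```

With sigma_patt = 0 the result depends on d_min alone, which is the quantity shown in the plot of optical resolution for an atom with B=0.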
The "expected optical resolution for complete data set" is calculated as above but using all reflections, with the value for each missing reflection being the average value in the corresponding resolution shell. The plot of optical resolution for an atom with B=0 demonstrates the behavior of the part of the optical resolution that corresponds to series termination. (for the proof see Appendix)

Patterson scaling

Scaling in SFCHECK is based on the Patterson origin peak, which is approximated as a Gaussian. Compared to the conventional scaling by the Wilson plot, this method is particularly advantageous when only low resolution data are available. The program gives overall B-factors estimated by both methods.

Low resolution cut-off

Disordered solvent contributes to diffraction at low resolution. However, removing low resolution data from the calculations results in a series termination effect which is noticeable in the electron density at the surface of the molecule. To reduce the influence of low resolution terms, SFCHECK applies a "soft" low resolution cut-off to structure factors according to the formula:

Fnew = Fold (1-exp(-Boff*s^{2})) , where Boff = 2dmax^{2}

The program uses Boff = 256

Scaling

The program scales Fobs and Fcalc by the Patterson origin peak using all data and applying Boff.
First, it computes Boverall for the observed and calculated amplitudes.
Second, it makes the width of the calculated peak equal to the observed one, i.e. computes an additional thermal factor Badd:

Badd = Boverall_obs - Boverall_calc

Third, it computes the scale factor for Fcalc:

scale = sqrt ( sum(Fobs^{2}*(1-exp(-Boff*s^{2}))) / sum(Fcalc^{2}*exp(-Badd*s^{2})*(1-exp(-Boff*s^{2}))) )

Finally we have:

Fcalc_scaled = Fcalc * scale * exp(-Badd*s^{2})

The program computes the R-factor and correlation coefficient for all data, applying the soft low resolution cut-off as described above.
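The scaling formulas can be written out as a short Python sketch. This is an illustration of the equations as printed, not SFCHECK's own code: the function names are hypothetical, Badd is taken as given (the fit of Boverall to the Patterson origin peak is not reproduced), and s^{2} is passed in directly, in whatever convention the program uses for s.

```python
import math

B_OFF = 256.0  # value the text says the program uses


def soft_cutoff(s2, b_off=B_OFF):
    """Soft low-resolution weight 1 - exp(-Boff * s^2)."""
    return 1.0 - math.exp(-b_off * s2)


def patterson_scale(f_obs, f_calc, s2, b_add, b_off=B_OFF):
    """Scale factor for Fcalc, following the 'scale' formula above."""
    num = sum(fo ** 2 * soft_cutoff(s, b_off)
              for fo, s in zip(f_obs, s2))
    den = sum(fc ** 2 * math.exp(-b_add * s) * soft_cutoff(s, b_off)
              for fc, s in zip(f_calc, s2))
    return math.sqrt(num / den)


def scale_fcalc(f_calc, s2, scale, b_add):
    """Fcalc_scaled = Fcalc * scale * exp(-Badd * s^2)."""
    return [fc * scale * math.exp(-b_add * s) for fc, s in zip(f_calc, s2)]


# Toy data (made up): Fcalc is exactly half of Fobs and Badd = 0,
# so the scale factor must come out as 2.
f_obs = [10.0, 20.0, 30.0]
f_calc = [5.0, 10.0, 15.0]
s2 = [0.01, 0.04, 0.09]
k = patterson_scale(f_obs, f_calc, s2, b_add=0.0)
print(k, scale_fcalc(f_calc, s2, k, b_add=0.0))
```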
The program also computes the R-factor and correlation coefficient for the reported resolution range and reported sigma cut-off without applying Boff. If the Fobs file contains reflections marked with the Rfree flag, the program computes Rfree.

Completeness

Missing data are restored by using the average values of intensities for the corresponding resolution shell. The program produces a plot of completeness vs. resolution and a plot of the average radial completeness in polar coordinates theta and phi.

Expected minimal error

The minimal coordinate error is estimated using experimental sigmas(F). The standard deviation of atomic coordinates is

sig_min(r) = sqrt(3)*sigma(slope)/curvature

where sigma(slope) is the standard deviation of the slope of the electron density in the x direction (along A) and curvature is the average curvature of the electron density at the atomic peak center. They are computed as:

sigma(slope) = (2pi*sqrt(sum(h^{2}*(sigF)^{2})))/(VOL*A)
( Cruickshank,D.W.J. (1949) Acta Cryst. 2, 65.)

curvature = (2pi^{2}*sum(h^{2}*F))/(VOL*A^{2})
( Murshudov et al., (1997) Acta Cryst. D53, 240.)

VOL - volume of cell
A   - cell parameter
h   - Miller index
summation over all reflections

If there is no experimental sigma for the observed data, the program uses sigma = Fobs * 0.04 for all reflections.

Expected maximal error

The expected maximal error in coordinates is estimated from the difference between |Fobs| and |Fcalc|:

sig_max(r) = sqrt(3)*sigma(slope)/curvature
sigma(slope) = (2pi*sqrt(sum(h^{2}*(Fobs-Fcalc)^{2})))/(VOL*A)
curvature = (2pi^{2}*sum(h^{2}*F))/(VOL*A^{2})

For missing reflections the program uses the average value of sigma(Fobs) for the corresponding resolution shell instead of (Fobs-Fcalc).

DPI - diffraction-data precision indicator

Cruickshank's method of estimating the coordinate error.
( The Refinement of Macromolecular Structures. Proceedings of CCP4 Study Weekend, pp 11-22, 1996)

sig(x) = sqrt(Natoms/(Nobs-4Natoms)) C^{-1/3} dmin Rfact

where
C      - fractional completeness
Rfact  - conventional crystallographic R-factor
Natoms - number of atoms
Nobs   - number of reflections
dmin   - maximal resolution

If Rfree flags are specified, the program uses Murshudov's approach to calculate DPI:
(Newsletter on protein crystallography, Daresbury Laboratory, (1997) 33, pp 25-30.)

sig(x) = sqrt(Natoms/Nobs) C^{-1/3} dmin Rfree

Luzzati plot (R-factor vs. resolution)

The program computes the average radial error <delta> in coordinates from the Luzzati plot.

<delta(r)> = 1.6 sig(x)

Solvent content

Solvent content is the fraction of the unit cell volume not occupied by the model. The model consists of ALL atoms present in the coordinate file.

Residual factor Rmerge

Rmerge(I) = sum_i (sum_j |Ij - <I>|) / sum_i (sum_j (<I>))

Ij    = the intensity of the jth observation of reflection i
<I>   = the mean of the intensities of all observations of reflection i
sum_i is taken over all reflections
sum_j is taken over all observations of each reflection
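The DPI and Rmerge definitions can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the DPI formulas are read with a square root and with C raised to the power -1/3 (an interpretation of the notation), and the input numbers are made up.

```python
import math


def dpi(n_atoms, n_obs, completeness, d_min, r_factor):
    """Cruickshank DPI: sqrt(Natoms/(Nobs-4Natoms)) * C^(-1/3) * dmin * Rfact."""
    return (math.sqrt(n_atoms / (n_obs - 4 * n_atoms))
            * completeness ** (-1.0 / 3.0) * d_min * r_factor)


def dpi_free(n_atoms, n_obs, completeness, d_min, r_free):
    """Rfree-based variant: sqrt(Natoms/Nobs) * C^(-1/3) * dmin * Rfree."""
    return (math.sqrt(n_atoms / n_obs)
            * completeness ** (-1.0 / 3.0) * d_min * r_free)


def r_merge(observations):
    """Rmerge(I) for a dict mapping each reflection (h, k, l) to the list
    of its observed intensities Ij."""
    num = 0.0
    den = 0.0
    for intensities in observations.values():
        mean_i = sum(intensities) / len(intensities)
        num += sum(abs(i - mean_i) for i in intensities)
        den += mean_i * len(intensities)  # sum_j <I>
    return num / den


print(round(dpi(1000, 14000, 1.0, 2.0, 0.20), 4))
print(r_merge({(1, 0, 0): [10.0, 12.0], (0, 1, 0): [5.0, 5.0]}))
```

Note that the Cruickshank form is only defined when Nobs exceeds 4*Natoms; the sketch does not guard against that case.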
Local error estimation (plotted for each residue, for the backbone and for the side chain):
1. Amplitude of displacement of atoms from electron density
2. Density correlation coefficient
3. Density index
4. B-factor
5. Index of connectivity

Displacement

The displacement of atoms from the electron density is estimated from the difference (Fobs - Fcalc) map. The displacement vector is the ratio of the gradient of the difference density to the curvature. The amplitude of the displacement vector is an indicator of the positional error.

Correlation coefficient

The density correlation coefficient is calculated for each residue from the atomic densities of the (2Fobs-Fcalc) map - "Robs" and of the model map (Fcalc) - "Rcalc":

D_corr = <Robs><Rcalc>/sqrt(<Robs^{2}><Rcalc^{2}>)

where
<Robs>  is the mean of the "observed" densities of the atoms of the residue (backbone or side chain).
<Rcalc> is the mean of the "calculated" densities of the atoms of the residue.

The value of the density for an atom from the map R(x) is:

Dens = sum_i ( R(xi) * Ratom(xi - xa) ) / sum_i ( Ratom(xi - xa) )

where
Ratom(x) is the atomic electron density at grid point x.
xa - vector of the centre of the atom.
xi - vector of the i-th grid point.
The sum is taken over all grid points whose distance from the centre of the atom is less than Radius_limit. For all atoms Radius_limit = 2.5 A.

Index of density and index of connectivity

The index of connectivity is based on the product of the (2Fobs-Fcalc) electron density values for the backbone atoms N, CA and C, i.e. it is the geometric mean value for these atoms. Low values of this index indicate breaks in the backbone electron density, which may be due to flexibility of the chain or incorrect tracing. The index of density is a similar indicator which is calculated for all atoms of a given residue.
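The per-atom density, the correlation coefficient and the index of connectivity can be illustrated with a small Python sketch. The function names are hypothetical and several points are assumptions: <Robs^{2}> is read as the mean of the squared densities, the index of connectivity is taken as the cube root of the product for N, CA and C, and a non-positive product is reported as 0 (the text does not say how such cases are handled). Grid values within Radius_limit are assumed to be already selected.

```python
import math


def atom_density(map_values, atom_weights):
    """Dens = sum_i R(xi) * Ratom(xi - xa) / sum_i Ratom(xi - xa).

    map_values:   map density R(xi) at the grid points near the atom
    atom_weights: atomic density Ratom(xi - xa) at the same grid points
    """
    num = sum(r * w for r, w in zip(map_values, atom_weights))
    return num / sum(atom_weights)


def density_correlation(obs, calc):
    """D_corr = <Robs><Rcalc> / sqrt(<Robs^2><Rcalc^2>) over the atoms of a residue."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_calc = sum(calc) / n
    mean_obs2 = sum(v * v for v in obs) / n    # assumed: mean of squares
    mean_calc2 = sum(v * v for v in calc) / n
    return mean_obs * mean_calc / math.sqrt(mean_obs2 * mean_calc2)


def connectivity_index(dens_n, dens_ca, dens_c):
    """Geometric mean of the densities at N, CA and C (assumed reading)."""
    product = dens_n * dens_ca * dens_c
    if product <= 0.0:
        return 0.0  # assumption: treat missing or negative density as a break
    return product ** (1.0 / 3.0)


print(atom_density([1.0, 3.0], [1.0, 1.0]))
print(density_correlation([1.0, 1.0], [2.0, 2.0]))
print(connectivity_index(1.0, 8.0, 1.0))
```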
An omit map is a way to reduce the model bias in the electron density calculated with model phases. SFCHECK produces the so-called total omit map by an automatic procedure. First, the initial (Fobs, PHImodel) map is divided into N boxes. For each box, the electron density in it is set to zero and new phases are calculated from this modified map. A new map is calculated using these phases and Fobs. This map contains the omit map for the given box, which is stored until the procedure has been repeated for all boxes. At the end, all the boxes with omit maps compose the total omit map. Phases calculated from the total omit map are combined with the initial phases. The whole procedure may be repeated (keyword NOMIT). Note: it is time consuming! The program can create an output file with omit phases (see keyword OUT).
The program can also be run with a single input file, either coordinates or structure factors. In this case it reports the information derived from that file, without local error estimation.
The program checks for merohedral twinning.
Perfect twinning test: <I^{2}>/<I>^{2}
If possible, the program also computes the partial twinning test:
H = |I(h1)-I(h2)|/(I(h1)+I(h2))
Alpha (twinning fraction) = 1/2 - <H>
If 0.05 < Alpha < 0.45, the program can create an output file with detwinned data (see keyword OUT).
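The two twinning statistics can be sketched in Python as follows. The function names are hypothetical; the pairing of twin-related reflections h1 and h2 depends on the twin law and is assumed to have been done already, and pairs with zero total intensity are skipped (an assumption, to avoid division by zero).

```python
def perfect_twinning_ratio(intensities):
    """Perfect twinning test: <I^2> / <I>^2."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / mean_i ** 2


def twinning_fraction(pairs):
    """Alpha = 1/2 - <H>, with H = |I(h1) - I(h2)| / (I(h1) + I(h2)).

    pairs: list of (I(h1), I(h2)) for twin-related reflections.
    """
    h_values = [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]
    return 0.5 - sum(h_values) / len(h_values)


def can_detwin(alpha):
    """Range in which the text says detwinned data can be written."""
    return 0.05 < alpha < 0.45


alpha = twinning_fraction([(3.0, 2.0), (6.0, 4.0)])  # made-up intensities
print(alpha, can_detwin(alpha))
```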
It is easy to use SFCHECK interactively, but it can also be run in batch mode. The best and easiest way to prepare a command file is to run SFCHECK once by dialogue. If a sfcheck.log file was assigned (first request), the program creates a command (batch) file (sfcheck.bat) automatically.
See some command (batch) file examples.
All keywords must be preceded by an underscore (e.g. _DOC). The available keywords are:
The first keyword must always be defined:
DOC
One or both of these keywords must be defined:
FILE_C
FILE_F
Other keywords:
NOMIT OUT MAP PATH_SCR TEST SCL INVER
To get started with SFCHECK interactively, you first have to answer this question:
Do you want to have FILE-DOCUMENT sfcheck.log? < N | Y >
_DOC:
Default: <N>
The DOC-file contains the protocol of the running of the program. With the DOC-file, the program creates a command (batch) file: sfcheck.bat.
You can also use the keyword DOC to redirect the output files:
sfcheck.log sfcheck.bat sfcheck_XXX.ps sfcheck.hkl sfcheck_ext.map sfcheck.map
to a special directory ( _DOC Y>path or _DOC >path). Examples:
_DOC Y>/y/people/alexei/ or _DOC >/y/people/alexei/
Default: < >
Default: < >
When using an MTZ file, MTZ keywords must be used (or program will use default values).
Default: <0>
<nomit> is the number of cycles of the omit procedure. 2 is a good choice.
Default: <N>
Default: <N>
Default: < >
Default: <1 >
Default: <N>
Default: <N>
The output information is presented in the PostScript file:
  sfcheck_<identifier>.ps
  sfcheck_map.ps (if input was map)
A simple ASCII version of this file is in:
  sfcheck.log
The program can also create:
  a new formatted CIFile of Fobs: sfcheck.hkl (keyword OUT)
  a file of density around model: sfcheck_ext.map (keyword MAP) /CCP4 format for CCP4 distribution or BLANC format/
  a new map if input was map: sfcheck.map
Some other files will not be deleted if keyword TEST = Y. These files have the internal format of the BLANC program suite (see file README by ftp from anonymous @ftp.ysbl.yorK.ac.uk) and can be used by programs of this suite.
  sfcheck_fob.dat     - BLANC_Fobserved_file
  sfcheck_ph.dat      - BLANC_phases of the model
  sfcheck_omit_ph.dat - BLANC_omit_phases
  sfcheck_detwin.dat  - BLANC_detwinned_Fobs
You can use the keyword PATH_SCR to redirect all scratch files to a special directory. Example:
_PATH_SCR /y/people/alexei/
You can use the keyword DOC to redirect the output files:
  sfcheck_<identifier>.ps sfcheck.hkl sfcheck_ext.map sfcheck.map
and also (if keyword TEST = Y)
  sfcheck_fob.dat sfcheck_ph.dat sfcheck_omit_ph.dat sfcheck_detwin.dat
to a special directory. Examples:
_DOC Y>path or _DOC >path
There is a CCP4 version of SFCHECK which can read an MTZ file or an EM map (CCP4 format) and create a file with the extracted density around the model, or a new mirror and/or scaled map (CCP4 format).
1. This possibility uses the CCP4 libraries. You must set up CCP4 beforehand.
2. Keywords for reading an MTZ file. The following keywords are necessary only for an MTZ file:
   F     - label of F or F(+)
   SIGF  - label of sigma F or sigma F(+)
   F-    - label of F(-)
   SIGF- - label of sigma F(-)
   FREE  - label of Free_flag
   I     - label of I or I(+)
   SIGI  - label of sigma I or sigma I(+)
   I-    - label of I(-)
   SIGI- - label of sigma I(-)
# --------------------------------
sfcheck <<stop
# --------------------------------
#
_DOC  Y
#
_FILE_C model.pdb
_FILE_F fobs.cif
#
#
_END
stop
In this case all output files will be in the directory /y/people/alexei/ and all scratch files will be created in the directory /y/people/alexei/work/
# --------------------------------
sfcheck <<stop
# --------------------------------
#
_DOC  >/y/people/alexei/
#
_FILE_C model.pdb
_FILE_F fobs.cif
#
#
_NOMIT 2
_OUT Y
_path_scr /y/people/alexei/work/
_END
stop
In this case the coordinate file is not used.
# --------------------------------
sfcheck <<stop
# --------------------------------
#
_DOC  Y
#
_FILE_C
_FILE_F p1.mtz
#
_F FO
_SIGF SDFO
_END
stop
1. Input PDB_file of coordinates

The input PDB_file of coordinates must contain the CRYST1 card with the unit cell and the space group name. The program can use the information from HEADER, SCALE, MTRIX and REMARK cards.

2. Input formatted file of structure factors

This file of structure factors must be a PDB-format file or a CIFile which contains indices and structure factors or intensities (a simple formatted file with "h,k,l,|F|,sig(F)" or "h,k,l,|F|" and without titles is also acceptable). The best is CIFile.

A. Example of a CIFile of structure factor amplitudes:

data_structure_9ins
_entry.id 9ins
_struct.title ' insuline 9ins'
_cell.length_a 100.000
_cell.length_b 100.000
_cell.length_c 100.000
_cell.angle_alpha 90.000
_cell.angle_beta 90.000
_cell.angle_gamma 90.000
_symmetry.space_group_name_H-M 'P 1 21 1'
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.F_meas_au_sigma
 2  3  4 12.3 1.2
-2 -3 -4 11.4 1.1
 .  .  .  .    .
 .  .  .  .    .

or just:

data_structure_9ins
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.F_meas_au_sigma
 2  3  4 12.3 1.2
-2 -3 -4 11.4 1.1
 .  .  .  .    .
 .  .  .  .    .

For intensities use:

_refln.intensity_meas
_refln.intensity_sigma

B. Example of a PDB file of structure factor amplitudes:

HEADER    R2SARSF                                 15-JAN-91
COMPND    RIBONUCLEASE SA (E.C.3.1.4.8) COMPLEX WITH 3'-*GUANYLIC ACID
SOURCE    (STREPTOMYCES $AUREOFACIENS)
AUTHOR    J.SEVCIK,E.J.DODSON,G.G.DODSON
CRYST1   64.900   78.320   38.790  90.00  90.00  90.00 P 21 21 21    8
CONTNT    H,K,L,S,FOBS,SIGMA(FOBS)
FORMAT    (2(I3,2I4,2F7.0,F6.0,9X))
COORDS    2SAR
REMARK   1 TWO REFLECTIONS PER RECORD.
REMARK   2 DMIN=1.85, DMAX=16.28
CHKSUM   1 MIN H=0,MAX H=34,MIN K=0,MAX K=41,MIN L=0,MAX L=20
CHKSUM   2 TOTAL NUMBER OF REFLECTIONS=17346
CHKSUM   3 TOTAL NUMBER OF REFLECTION RECORDS=8673
CHKSUM   4 SUM OF FOBS=0.235499E+07
  0   0   3     60      9    16           0   0   4    106    307    25
  0   0   5    166     23    20           0   0   6    239    657    52
  0   0   7    326      0    38           0   0   8    425    511    40
  .   .   .      .      .     .           .   .   .      .      .     .

C. Example of a simple formatted file of structure factor amplitudes, which is assumed to contain H,K,L,F,sig(F):

 2  3  4 12.3 1.2
-2 -3 -4 11.4 1.1
 .  .  .  .    .
 .  .  .  .    .

or without sig(F):

 2  3  4 12.3
-2 -3 -4 11.4
 .  .  .  .
 .  .  .  .

The length of file records must not exceed 80 characters. The format of the records is free, i.e. data must be separated by blanks (be careful - some PDB files do not satisfy this rule).
The program uses the information about cell parameters and space group from the coordinate file and ignores such information in the structure factor file.
Memory control parameters ( in main_sfcheck_ccp4.f ):

C     MEMORY  - memory for densities, gradients, coordinates, ...
      PARAMETER ( MEMORY=5000000)
      REAL POOL(MEMORY)
C     NCRDMAX - maximal number of coordinates
      PARAMETER ( NCRDMAX=200000)
C     IPRSYM  - maximal number of symmetry operators
      PARAMETER ( IPRSYM=96 )
      INTEGER*2 ISYM(5,3,IPRSYM)

ISYM   - integer*2 array for cryst.symmetry operators
IPRSYM - dimension of integer*2 array ISYM(5,3,IPRSYM), the maximal number of cryst.symmetry operators.
MEMORY - dimension of array POOL.
         MEMORY = MAPMAX + (NCRDMAX/2)*5 , where MAPMAX - maximal size of XY-section (NX*NY)
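The relation MEMORY = MAPMAX + (NCRDMAX/2)*5 can be checked before rebuilding with a few lines of Python. This is only arithmetic on the formula above; the function name and the example grid size are hypothetical, and NCRDMAX/2 is treated as Fortran integer division.

```python
def required_memory(nx, ny, ncrdmax):
    """POOL size needed for an NX*NY section and NCRDMAX coordinates,
    from MEMORY = MAPMAX + (NCRDMAX/2)*5 with MAPMAX = NX*NY."""
    mapmax = nx * ny
    return mapmax + (ncrdmax // 2) * 5  # integer division, as in Fortran


MEMORY = 5000000   # default from main_sfcheck_ccp4.f
NCRDMAX = 200000   # default from main_sfcheck_ccp4.f

# Hypothetical map section of 1000 x 1000 grid points:
need = required_memory(1000, 1000, NCRDMAX)
print(need, need <= MEMORY)

# Largest XY-section that fits with the default parameters:
print(MEMORY - (NCRDMAX // 2) * 5)
```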
Estimation of the width of atomic peak by the Patterson origin peak.

The Fourier transform of an atomic Gaussian

  1/(2pi sigma_four^{2})^{3/2} * exp( -r^{2}/(2 sigma_four^{2}) )

where sigma_four is the standard deviation of the Gaussian, is also a Gaussian:

  exp( -B s^{2}/4 )     where B = 8pi^{2} sigma_four^{2}

The Patterson function, which is calculated as the Fourier transform of the square of the reciprocal space Gaussian

  exp( -2 B s^{2}/4 )

is also a Gaussian, with standard deviation (for infinite Fourier series)

  sigma_patt_0^{2} = 2B/(8pi^{2}) = 2 sigma_four^{2}

The effect of series termination of the Fourier transform can be considered as the product, in reciprocal space, of the infinite number of Fourier coefficients and the sphere with radius 1/d_min, where d_min is the minimum d-spacing. The product in reciprocal space corresponds to a convolution in Patterson space. The Fourier image of the sphere is the spherical interference function T(r) (Int.Tables,1993,vol B,p247):

  T(r) = 3 ( sin(x) - x cos(x) ) / x^{3}     where x = 2pi r (1/d_min)

Using Taylor's expansion, the origin peak of the function T(r) can be approximated by a Gaussian:

  exp( -r^{2}/(2 sigma_res^{2}) )

where sigma_res is the standard deviation of the Gaussian.

  sigma_res = ( d_min * sqrt(5) )/ 2pi = 0.356 * d_min

This result is identical to the optical definition of resolution (Blundell,1976), (James,1948) as twice the distance from the maximum to the first zero of the image of a point source. In the 3-dimensional case the coordinate of the first zero is 0.715 d_min ~ 2 sigma_res.

The standard deviation 'sigma' of a Gaussian which is the convolution of two Gaussians with standard deviations sigma_1 and sigma_2 is

  sigma^{2} = sigma_1^{2} + sigma_2^{2}

Therefore the standard deviation of the Patterson origin peak with finite Fourier series is

  sigma_patt^{2} = sigma_patt_0^{2} + sigma_res^{2}

The standard deviation of the expected atomic peak for finite Fourier series is

  sigma_four^{2} = sigma_patt_0^{2}/2 + sigma_res^{2}
                 = sigma_patt^{2}/2 + sigma_res^{2}/2

Finally, the expected width of the atomic peak is:

  W = 2 sigma_four = sqrt ( 2 ( sigma_patt^{2} + sigma_res^{2}) )
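The Gaussian approximation of the interference function can be checked numerically. The following Python sketch (function names are hypothetical) evaluates T(r) and the Gaussian with sigma_res = d_min*sqrt(5)/(2pi) near the origin and prints the coefficient that appears as 0.356 in the text.

```python
import math


def interference(r, d_min):
    """Spherical interference function T(r) = 3 (sin x - x cos x) / x^3,
    with x = 2 pi r / d_min."""
    x = 2.0 * math.pi * r / d_min
    if x == 0.0:
        return 1.0  # limit of T(r) at the origin
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3


def gaussian_approx(r, d_min):
    """Gaussian exp(-r^2 / (2 sigma_res^2)) with sigma_res = d_min sqrt(5) / (2 pi)."""
    sigma_res = d_min * math.sqrt(5.0) / (2.0 * math.pi)
    return math.exp(-r ** 2 / (2.0 * sigma_res ** 2))


print(round(math.sqrt(5.0) / (2.0 * math.pi), 3))  # coefficient of d_min

d_min = 2.0
for r in (0.0, 0.1, 0.2, 0.4):
    print(r, round(interference(r, d_min), 4), round(gaussian_approx(r, d_min), 4))
```

The two functions agree closely near the origin and drift apart as r approaches the first zero of T(r), which is why the approximation is applied only to the origin peak.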