
Re: [ccp4bb]: Enforcing a sigma(f) cutoff



***  For details on how to be removed from this list visit the  ***
***          CCP4 home page http://www.ccp4.ac.uk         ***

Dear Dr. Berry and everyone else on the CCP4BB,

Perhaps I should put my question into a larger context, as your 
inferences are 90% correct.  I am working on adapting our 
conformational search program for semi-automatic structure 
determination.  I've been re-solving PDB structures in maps phased with 
the PDB structure.  Although this is an unrealistically favorable 
scenario, you have to start somewhere.  And an inability to generate 
structures with R-factors equivalent to the published structure, 
starting with perfect phases, is worrisome.

> Mark, Is your R-factor as well as R-free higher than published?

The models that I generate consistently have R and R-free values about 
0.02 higher than the published structure: 0.17 vs. 0.15 for R and 0.23 
vs. 0.21 for R-free.  The new structures include exactly the same number of 
waters, but include some sidechain atoms (e.g., on LYS) that are not 
present in the PDB structure.

One substantial difference between the PDB structure and my models is 
the bulk solvent correction.  The PDB structure uses no bulk solvent 
model, while I am using the "babinet model with mask" with the standard 
REFMAC5 parameters.  My understanding is that the bulk solvent correction 
should reduce the R-factors calculated for my structures, so this 
doesn't explain why I get higher R-factors.  In fact, a couple of steps 
of constrained refinement in REFMAC5 of the PDB structure produce a 
structure with even lower R-factors (0.14), though I cannot of course 
calculate a meaningful R-free factor, so it's hard to tell whether this 
is just over-fitting of the PDB structure.

> If you use the exact deposited structure including waters,
> and do no or minimal (single rigid body) refinement, do you still
> get a high R-free?

Using the exact deposited structure including waters, SFCHECK reports 
an R factor slightly larger than the published value (0.16 vs. 0.15).  As 
the structure factors deposited in the PDB include both working and 
free reflections, an increase in the R factor is expected when it is 
calculated against this composite set.  The R-free is, as you suggest, 
around the same value, because my new R-free set consists mostly of 
reflections that the original authors used as working reflections.
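
The composite-set effect can be illustrated with a minimal sketch.  It 
assumes the conventional definition R = sum(||Fo| - k|Fc||) / sum(|Fo|) 
with a single overall scale k; the function name and the toy amplitudes 
are made up for illustration and are not what SFCHECK or REFMAC5 
actually do internally (real programs use more elaborate scaling).

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Conventional R = sum(| |Fo| - k*|Fc| |) / sum(|Fo|).

    `scale` is the overall factor k; the toy data below are assumed to be
    on a common scale already, so k = 1.0.
    """
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


# Hypothetical amplitudes: the working reflections fit better than the
# free ones, as expected for a refined model.
work_obs, work_calc = [100.0, 80.0, 60.0, 40.0], [95.0, 84.0, 57.0, 42.0]
free_obs, free_calc = [90.0, 50.0], [81.0, 55.0]

r_work = r_factor(work_obs, work_calc)
r_free = r_factor(free_obs, free_calc)
# R against the composite (working + free) set, as when all deposited
# structure factors are used together.
r_all = r_factor(work_obs + free_obs, work_calc + free_calc)

print(f"R-work {r_work:.3f}  R-free {r_free:.3f}  R-all {r_all:.3f}")
```

With a fixed scale, the composite R is a weighted combination of the 
working and free terms, so it falls between the two values.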

> Or do you see the discrepancy only when you re-solve the
> structure by molecular replacement using a related protein,
> in which case your structure may really not be as good as the
> published one? Did the original authors have the advantage of
> experimental phases, multiple crystal forms, or something?

I've been re-solving the structure in a map phased by the PDB structure.  
So I have "perfect" phases with respect to the PDB structure.  This 
rules out issues with initial phase estimates.

The responses to my question, all of which have been very thoughtful 
and helpful, lead to the following question: what is a statistically 
significant difference in the R (or R-free) statistic?  If the R and 
related factors can be influenced by the solvent model, and in fact 
differ substantially when calculated by different programs, should a 
difference of 2% in the R or R-free be dismissed as insignificant?  Or 
is it an indication that the model can still be improved, for instance 
through careful analysis of omit maps?  I am aware of Ian Tickle's work 
analyzing the properties of the R-free, but is there other work that 
discusses these issues?
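
One generic way to get a rough feel for the sampling noise in an R-free 
value is to bootstrap over the test-set reflections.  The sketch below 
is only that: it is not taken from this thread or from Tickle's 
analysis, the function names and the synthetic amplitudes are 
hypothetical, and it captures only the noise from the finite size of 
the test set, not differences due to solvent models or programs.

```python
import random
import statistics


def r_value(pairs):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) for (Fo, Fc) pairs on a common scale."""
    return sum(abs(fo - fc) for fo, fc in pairs) / sum(fo for fo, _ in pairs)


def bootstrap_r_spread(pairs, n_resamples=1000, seed=0):
    """Standard deviation of R over bootstrap resamples of the reflections."""
    rng = random.Random(seed)
    n = len(pairs)
    values = []
    for _ in range(n_resamples):
        # Draw n reflections with replacement and recompute R.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        values.append(r_value(sample))
    return statistics.pstdev(values)


def synthetic_free_set(n, seed):
    """Made-up (Fo, Fc) pairs with roughly 25% relative error (illustration only)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        fo = rng.uniform(10.0, 200.0)
        fc = abs(fo * (1.0 + rng.gauss(0.0, 0.25)))
        pairs.append((fo, fc))
    return pairs


spread_small = bootstrap_r_spread(synthetic_free_set(100, seed=1))
spread_large = bootstrap_r_spread(synthetic_free_set(2000, seed=2))
print(f"spread with 100 free reflections:  {spread_small:.4f}")
print(f"spread with 2000 free reflections: {spread_large:.4f}")
```

Comparing the two printed spreads shows how the size of the test set 
limits how small a difference in R-free can be resolved from the 
sampling noise alone.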

Thanks again,

Mark

> The PDB deposition only involves the dataset against which the
> structure was refined, so that's all you have to work with,
> but in reality a lot of other data may have been used in
> building and refining the model.
>
> Ed
>
> Mark DePristo wrote:
>> Hello,
>> I have a general question about refinement with refmac5.  I've been 
>> re-solving a protein structure against its structure factors 
>> (obtained from PDB).  One consistent difference between my new 
>> structure and the PDB structure is that the PDB structure uses a 
>> sigma(f) cutoff of 0.0, as seen in the REMARK in the pdb file:
>> REMARK   3   DATA CUTOFF            (SIGMA(F)) : 0.000
>> My structure's PDB file, as generated by refmac5, has the line
>> REMARK   3   DATA CUTOFF            (SIGMA(F)) : NONE
>> For the life of me I cannot figure out how to enforce a sigma(f) 
>> cutoff in refmac5.  There are some suggestions (in the archives of 
>> ccp4bb) that such a cutoff can introduce artifacts, as the sigma(f) 
>> cutoff disproportionately affects the weaker reflections.  
>> Regardless, I am suspicious that this cutoff accounts for the 
>> slightly higher R and R-Free values (~ 2%) of my new structure.  Is 
>> this a reasonable interpretation?  If so, how can I enforce a 
>> sigma(f) data cutoff?
>> Thanks,
>> Mark
>> Mark DePristo
>> Ph.D. Candidate
>> Dept. of Biochemistry
>> Cambridge University
>> mdepristo@cryst.bioc.cam.ac.uk
>> http://www-cryst.bioc.cam.ac.uk/~mdepristo/
>
>
> -- 
> Edward A. Berry, MailStop 3-250
> Lawrence Berkeley National Laboratory
> 1 Cyclotron Road, Berkeley, CA 94720
>
> Phone: +1-510-486-4335
> Fax: +1-510-486-6059
> Jfax +1-510-588-4829 (you send fax, I receive email)
> LBNL Emergency status number: 800-445-5830 - Call after "The Big One"
> 	to see if the lab slid off into the bay.
> e-mail: EABerry@lbl.gov
> http://www.lbl.gov/LBL-Programs/pbd/xl_research/scientists/BerryE.htm
> http://www.lbl.gov/~berry/
> http://www.lbl.gov/~berry/berrygroup.html
>
>
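
For reference, the sigma(F) cutoff discussed in the quoted message can 
be sketched as a simple filter.  This assumes the cutoff means "keep 
reflections with |F| >= n * sigma(F)"; the function name and the 
reflection values are hypothetical, and this is not a REFMAC5 keyword 
or option.

```python
def apply_sigma_cutoff(reflections, n_sigma):
    """Keep (F, sigma_F) pairs with F >= n_sigma * sigma_F.

    Assumed convention for illustration; individual programs may define
    the cutoff differently (e.g. strict inequality).
    """
    return [(f, sig) for f, sig in reflections if f >= n_sigma * sig]


# Made-up amplitudes and sigmas: two strong and two weak reflections.
reflections = [(120.0, 4.0), (35.0, 5.0), (6.0, 4.0), (2.0, 3.0)]

kept_at_zero = apply_sigma_cutoff(reflections, 0.0)
kept_at_two = apply_sigma_cutoff(reflections, 2.0)

print(len(kept_at_zero), "reflections kept at a 0.0 sigma cutoff")
print(len(kept_at_two), "reflections kept at a 2.0 sigma cutoff")
```

Under this convention a cutoff of 0.0 rejects no reflection with a 
non-negative amplitude, whereas a cutoff of 2.0 removes only the weak 
reflections.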

Mark DePristo
Ph.D. Candidate
Dept. of Biochemistry
Cambridge University
mdepristo@cryst.bioc.cam.ac.uk
http://www-cryst.bioc.cam.ac.uk/~mdepristo/