
[ccp4bb]: SUMMARY: pseudo symmetry question




Dear All,

Here is a belated summary of responses to my question about pseudo-symmetry
as well as a description of our conclusions. Here is the original question
just to remind you:

----------------------------------------------------------
Dear All,

I'm afraid this is a bit of a long one.........

I wonder if I could get some help on a problem that is driving us mad here.
We have a 1.65 A resolution data set collected from a rhombohedral crystal
(there was also a low resolution pass at 2.8 A). All data were processed and
merged with DENZO/SCALEPACK. If we don't try too hard in the autoindexing we
get the following cell:

rhombohedral setting:	56.646   56.646   56.646   92.678   92.678   92.678
hexagonal setting:	81.960   81.960   93.417   90.000   90.000  120.000 

The data set then processes nicely as R32 with a completeness of 100% and an
overall Rmerge of 4.4% (11.0% for outer shell). If we do it as R3 it gives
virtually identical statistics. In R32 we get one molecule per AU with AMoRe.
This refines reasonably well (Rfree ~ 24%) but a few bond distances
persistently misbehave. If we do it as R3 with 2 molecules per AU, we can get
the Rfree down to ~ 22%. However the operation relating one molecule to the
other appears to be a near perfect crystallographic one. When the 2
independently refined molecules are superposed the differences are minimal
and I would say not significant. I think the improvement in Rfree is simply
due to allowing the refinement more freedom.

At this point we went back to the raw data and took a closer look. It turns
out that there are alternate layers of strong and very weak reflections. If
we drop the peak picking threshold in DENZO until it finds a significant
number of these weaker reflections we then get the following cell:

rhombohedral setting:	78.313   78.313   78.313   63.129   63.129   63.129
hexagonal setting:	81.987   81.987  187.165   90.000   90.000  120.000

i.e. the c axis is twice as long in the hex setting. This also processes
nicely as R32 with a completeness of 100% and an overall Rmerge of 5.0%
(17.3% for outer shell). When we tried to run AMoRe on this it gave some
very strange results - some problem with defining the Cheshire cell for non
primitive space groups (although it does say in the manual that you can get
problems with rhombohedral cells). We assumed the weak reflections mean that
we have a repeating unit consisting of two of the "small" cells with the
molecules in virtually the same orientation. So we generated a second copy
of the molecule from our small R32 cell by applying half a unit cell
translation along c. This model was then put into rigid body refinement in
REFMAC, but Rfree was very high ~ 60% even though the packing looked fine.
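
As a quick sanity check, the rhombohedral and hexagonal settings quoted above
can be related with the standard rhombohedral-to-hexagonal cell formulae. The
following is only a minimal sketch in Python; the function name is made up for
illustration and does not come from DENZO or any CCP4 program.

```python
import math

def rhombohedral_to_hexagonal(a_rh, alpha_deg):
    """Return (a_hex, c_hex) for a rhombohedral cell of edge a_rh and angle alpha.

    Standard relations: a_hex = 2 a sin(alpha/2),
                        c_hex = a sqrt(3 (1 + 2 cos(alpha))).
    """
    alpha = math.radians(alpha_deg)
    a_hex = 2.0 * a_rh * math.sin(alpha / 2.0)
    c_hex = a_rh * math.sqrt(3.0 * (1.0 + 2.0 * math.cos(alpha)))
    return a_hex, c_hex

# Cell parameters as quoted in the message (Angstrom, degrees).
small = rhombohedral_to_hexagonal(56.646, 92.678)
big = rhombohedral_to_hexagonal(78.313, 63.129)

print("small cell: a_hex = %.3f, c_hex = %.3f" % small)
print("big cell:   a_hex = %.3f, c_hex = %.3f" % big)
print("c(big) / c(small) = %.3f" % (big[1] / small[1]))
```

Both cells should come out with essentially the same hexagonal a axis and a c
axis that differs by a factor very close to 2, in line with the numbers above.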

If we look at the outputs from truncate (attached) they are a bit strange,
in particular for the big cell. Look at cumulative intensity distributions
and moment2 values for acentrics. There is also a bit of a bump in the
Wilson plot at about 2 Ang resolution.

As an added complication there is a very strong non-crystallographic 2-fold
axis relating one half of the molecule to the other, so that it looks
virtually identical if you invert it. This means you have to be very careful
which way up your MR solution is. 

We have tried submitting the data to the "Crystal twinning server"
(http://www.doe-mbi.ucla.edu/Services/Twinning/) but the twin fraction is
less than 2%.

So does anyone have any suggestions as to how we proceed? Obviously the easy
way out is to ignore the weak data and go with the small cell - after all an
Rfree of below 25% is definitely publishable!!

Many thanks in advance

Dave Lawson

-------------------------------------------------------
On the basis of several comments we tried a few things and I posted a
follow-up message:

-----------------------------------------------------
We are still struggling with this one, which is why we haven't posted a
summary yet. We are now working in the big R3 cell with 4 molecules in the
AU. AMoRe didn't work with a single molecule as the search model, but we
were successful when we used the dimer from the small R3 cell - confused?

Anyway, the maps look quite nice, but the refinement is a bit disappointing:
Rfac=25%, Rfree=28% (they were 19 and 22 in the small R3 cell). Also the
overall FOM in REFMAC is 0.51 (it was 0.84). Could this be because the
alternate layers of strong and weak reflections make it difficult to scale
Fobs to Fcalc in REFMAC? The data have been processed with DENZO/SCALEPACK,
maybe the situation could be improved using MOSFLM/SCALA. Unfortunately we
can't get MOSFLM to correctly autoindex in the large cell as the spot
picking routine struggles to find the weaker spots - it just gives us the
small cell. Is there a way to convert a DENZO autoindexing solution into
MOSFLM format so that we can proceed with this approach??

Thanks once again,

Dave

--------------------------------------------------------

Here is a summary of the comments received:

There were several responses including ones from Gerard Kleywegt and
Devapriya Choudhury suggesting that our problem had many similarities with a
case described in Carredano et al., Acta Cryst D56, 313-321 (2000). 

- our problem is indeed very similar, but we don't see alternate layers of
molecules with high and low B-factors. Also in this paper there weren't
alternate layers of strong and weak reflections. They stated that the
pseudo-equivalent reflections have the same amplitude within experimental
error, thus Rmerge does not discriminate between R3 and R32 which agrees
with our observations.

David Borhani reported a similar problem in P21 which was so close to
P212121 that it could actually be solved in the latter. 

Ulrich Baumann made a few suggestions, most of which we had already tried,
and also pointed out that the MR probably didn't work well in the long cell
because the stronger reflections will dominate the result whilst the weak
ones will contribute very little.

- good point

John Jenkins commented that there is a form of disorder giving well defined
spots at non-integral lattice positions which is discussed in Giacovazzo's
book. 

- I don't think this is the case here.

Aleks Roszak mentioned similar experiences in a number of systems including
R3 and suggested that we may have a superlattice. 

- I agree!

Johan Turkenburg said that the cum. int. distributions look fine. With every
other layer weak you have many more weak reflections than you would
theoretically expect for a structure with random atoms (which is where the
theoretical plots originate from). Or in other words, having two molecules
in almost the same orientation is indeed far from random! 

- good points.

Yong Wang had a similar problem where there were two slightly mis-oriented
molecules in the big cell (perfectly oriented in the small cell). They
eventually published in the small cell however. The conversion from small to
large cell (presumably in R3) however is not simply adding another pair of
molecules related to the first by a half cell translation along z. He
suggested taking the latter (m1 + m1 shifted in 0.5 z) and rotating it by 60
degree, which should give nearly identical R factors back in the big cell.
In fact, the molecule does not rotate, it is just the axis system rotated
when moving to the big cell.

- see below next comment

Eleanor made similar remarks: you need to reindex somehow. For R3(2) the
requirement is that -h+k+l = 3n. When you double l (i.e. set l2 = 2*l1) this
no longer holds unless you also change the direction of the l axis, i.e. you
need to reindex as -k,-h,-2l.
If -h1 + k1 + l1 = 3n, then -h1 + k1 + l1 - 3l1 (= -h1 + k1 - 2l1) = 3n'.
So reindexing as -k,-h,-2l gives +k - h - 2l, which is OK.
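
The argument above can be checked by brute force. The sketch below (plain
Python, names invented for illustration) generates small-cell indices that
obey -h+k+l = 3n, then compares simply doubling l with the suggested
reindexing -k,-h,-2l. It only tests the reflection condition and says nothing
about which setting a particular program expects.

```python
def obeys_condition(h, k, l):
    """Reflection condition quoted above for R3/R32 (hexagonal axes): -h + k + l = 3n."""
    return (-h + k + l) % 3 == 0

index_range = range(-4, 5)
small_cell = [
    (h, k, l)
    for h in index_range
    for k in index_range
    for l in index_range
    if obeys_condition(h, k, l)
]

# Naive conversion to the doubled cell: keep h and k, double l.
naive = [(h, k, 2 * l) for (h, k, l) in small_cell]
# Suggested conversion: reindex as -k, -h, -2l.
reindexed = [(-k, -h, -2 * l) for (h, k, l) in small_cell]

naive_failures = [hkl for hkl in naive if not obeys_condition(*hkl)]
reindex_failures = [hkl for hkl in reindexed if not obeys_condition(*hkl)]

print("small-cell reflections tested:", len(small_cell))
print("violations after h,k,2l:      ", len(naive_failures))
print("violations after -k,-h,-2l:   ", len(reindex_failures))
```

The naive doubling fails whenever the original l is not a multiple of 3,
whereas the reindexed set should show no violations at all.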

- I'm having trouble getting my head round this but if you rotate the
(correct) solution in big cell onto that in the small cell using LSQKAB we
get the following:

CROWTHER ALPHA BETA GAMMA     56.59279   179.79269   176.43974
  SPHERICAL POLARS OMEGA PHI CHI     89.89634    30.07633   179.90738
  DIRECTION COSINES OF ROTATION AXIS      0.86536     0.50115     0.00181

SO there is not just a simple translation relating one cell to the other. I
should also point out that the missetting angles from autoindexing in DENZO
are also different.
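
For what it is worth, the polar angles and direction cosines in the LSQKAB
output are consistent with each other. A minimal sketch, assuming the usual
spherical-polar convention (omega measured from z, phi measured from x in the
xy plane), which is an assumption here but does reproduce the quoted numbers:

```python
import math

def polar_to_direction_cosines(omega_deg, phi_deg):
    """Direction cosines of a rotation axis from polar angles omega and phi (degrees)."""
    omega = math.radians(omega_deg)
    phi = math.radians(phi_deg)
    return (
        math.sin(omega) * math.cos(phi),
        math.sin(omega) * math.sin(phi),
        math.cos(omega),
    )

# OMEGA and PHI as printed by LSQKAB above; CHI (179.90738) is the rotation angle.
axis = polar_to_direction_cosines(89.89634, 30.07633)
print("direction cosines: %.5f %.5f %.5f" % axis)
```

In other words the operation is very nearly a 180 degree rotation about an
axis lying in the xy plane of the orthogonal frame, about 30 degrees from x.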

Anne Bloomer commented that there do seem to be many cases of such
pseudo-symmetry with Fabs, and she has had 2 of them! 

- ours is not an Fab however, but a binding protein.


Andrew Leslie mentioned that there is a program called denzo2mosflm written by
Phil Evans (as did Phil Evans, Henry Bellamy and Ana Gonzalez) which will do
the transformation, but added that the overall Rfactor and Rfree will ALWAYS
be much worse in the real cell SIMPLY because half of the data is
systematically very weak. If you were to calculate R and Rfree for the
l=even data (try it !!), you should find that they are very much better, if
you do it for the l=odd data, you will find they are dreadful.

- good points
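
The effect Andrew describes can be illustrated with a toy calculation. The
numbers below are invented purely for illustration (they are NOT from our
data); the sketch just shows how the conventional R factor behaves when it is
computed separately for l even and l odd.

```python
def r_factor(pairs):
    """Conventional R = sum|Fobs - Fcalc| / sum(Fobs) over (Fobs, Fcalc) pairs."""
    numerator = sum(abs(fo - fc) for fo, fc in pairs)
    denominator = sum(fo for fo, _ in pairs)
    return numerator / denominator

# Hypothetical (h, k, l, Fobs, Fcalc): strong l-even and weak l-odd reflections
# with similar absolute errors.
reflections = [
    (1, 0, 4, 100.0, 90.0),
    (0, 1, 2, 80.0, 88.0),
    (1, 1, 3, 10.0, 5.0),
    (0, 2, 1, 8.0, 13.0),
]

even = [(fo, fc) for (h, k, l, fo, fc) in reflections if l % 2 == 0]
odd = [(fo, fc) for (h, k, l, fo, fc) in reflections if l % 2 == 1]
everything = [(fo, fc) for (h, k, l, fo, fc) in reflections]

r_even = r_factor(even)
r_odd = r_factor(odd)
r_all = r_factor(everything)

print("R (l = 2n)   = %.3f" % r_even)
print("R (l = 2n+1) = %.3f" % r_odd)
print("R (all l)    = %.3f" % r_all)
```

With comparable absolute errors the weak layers give a far higher R simply
because the denominator is small, and adding them pushes the overall R above
the value for the strong layers alone.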

Valerie Pye suggested converting .x files from Denzo into a format to be
read into SCALA.

Pierre Rizkallah suggested that to get MOSFLM to deal with your larger cell,
just let autoindexing do what it wants, then change the c-cell dimension in
the menu top left. Predict after that and confirm that it is looking for the
weak spots. (However, the observations of Yong Wang and Eleanor above
suggest that this wouldn't work for R3 although I haven't tried it). He also
mentioned that if the two pairs of molecules are perfectly aligned in the
model, the structure factor calculations will give exact zeros for the
intermediate layers. You will have to use the rigid body refinement in
REFMAC to break out of this. Incidentally, when you TRUNCATE the data in the
large cell, check the N(z) plot, and make sure that the observed curves are
to the left of the theoretical ones. This is an indication of the data being
weaker than the cell symmetry suggests, but that is exactly what you have:
An accidental alignment in the cell which renders the intermediate layers
weak.

- more good points
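
Pierre's point about exact zeros is easy to demonstrate with a toy model. The
sketch below uses random unit scatterers in a P1-like cell (an assumption made
only to keep the example short; it ignores the rhombohedral symmetry and the
reindexing issue discussed above). A second copy shifted by exactly c/2 gives
zero structure factors for odd l, and a small extra shift makes them weak but
non-zero.

```python
import cmath
import random

def structure_factor(atoms, h, k, l):
    """F(hkl) for unit point scatterers at fractional coordinates (x, y, z)."""
    return sum(cmath.exp(2j * cmath.pi * (h * x + k * y + l * z)) for x, y, z in atoms)

random.seed(0)
# First copy: 20 random atoms in the lower half of the cell.
copy1 = [(random.random(), random.random(), 0.5 * random.random()) for _ in range(20)]
# Second copy: exact translation by half the cell along c.
copy2_exact = [(x, y, z + 0.5) for x, y, z in copy1]
# Second copy with the translational symmetry slightly broken (extra 0.01 along a).
copy2_broken = [(x + 0.01, y, z + 0.5) for x, y, z in copy1]

f_odd_exact = abs(structure_factor(copy1 + copy2_exact, 1, 2, 3))
f_odd_broken = abs(structure_factor(copy1 + copy2_broken, 1, 2, 3))
f_even_exact = abs(structure_factor(copy1 + copy2_exact, 1, 2, 4))
f_even_single = abs(structure_factor(copy1, 1, 2, 4))

print("|F(1,2,3)| exact c/2 shift:  %.2e" % f_odd_exact)
print("|F(1,2,3)| slightly broken:  %.2e" % f_odd_broken)
print("|F(1,2,4)| exact c/2 shift:  %.3f (single copy: %.3f)" % (f_even_exact, f_even_single))
```

This is why a model built by a pure half-cell translation cannot account for
the weak layers until something (e.g. rigid body refinement) moves the copies
off the exact translation.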

Felix Vajdos had a similar situation (see Vajdos et al. Protein Science, 6:
2297-2307 (1997)) that he never quite satisfactorily refined.  They had a
small (P43) cell which turned out to be a sublattice of a larger P41 cell
(P43 is a subgroup of P41 and vice versa provided the c-axis is increased by
a factor of 3). He raised some important observations: (1) Translational
pseudo-symmetry is particularly insidious because the weak reflections arise
due to "breaks" in the crystallographic symmetry. So these reflections,
which are important for correctly modelling the structure, are also among
the most poorly measured and therefore difficult to refine against. (2) The
presence of systematically weaker reflections results in a very non-normal
distribution of structure factor amplitudes, which means that it becomes
much more difficult to interpret the R-value.  The presence of weaker
reflections has the effect of systematically decreasing the denominator in
the R-value, thus raising its value. (3) They found that the correlation
coefficient proved to be a more reliable indicator of the progress of
refinement.  Thus for a structure (1.58 Angstrom), they saw correlations of
~0.9 (free ~0.87) even though the Rfactor was 39.6% and Rfree was 46.1%! He
suggested if you can get the information you want from refinement in the
pseudo-lattice, then do so, and in the paper simply mention the difficulties
encountered in the refinement in the true-lattice.

- I think we will do this!
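
To make Felix's point (3) concrete, here is a small sketch comparing the
R-value with a linear correlation coefficient on the same amplitudes. The
amplitudes are invented for illustration only, and the plain Pearson
correlation used here is an assumption; it may not be exactly the statistic
used in the paper.

```python
import math

def r_factor(fobs, fcalc):
    """Conventional R = sum|Fobs - Fcalc| / sum(Fobs)."""
    return sum(abs(fo - fc) for fo, fc in zip(fobs, fcalc)) / sum(fobs)

def correlation(fobs, fcalc):
    """Pearson linear correlation coefficient between Fobs and Fcalc."""
    n = len(fobs)
    mean_o = sum(fobs) / n
    mean_c = sum(fcalc) / n
    cov = sum((fo - mean_o) * (fc - mean_c) for fo, fc in zip(fobs, fcalc))
    var_o = sum((fo - mean_o) ** 2 for fo in fobs)
    var_c = sum((fc - mean_c) ** 2 for fc in fcalc)
    return cov / math.sqrt(var_o * var_c)

# Hypothetical data set dominated by weak reflections.
fobs = [50.0, 40.0, 6.0, 5.0, 4.0, 3.0, 5.0, 4.0]
fcalc = [45.0, 44.0, 9.0, 2.0, 7.0, 1.0, 8.0, 2.0]

r_value = r_factor(fobs, fcalc)
cc = correlation(fobs, fcalc)
print("R = %.3f, correlation = %.3f" % (r_value, cc))
```

The many weak terms inflate R while the correlation stays high, which is the
behaviour described above.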

Eleanor again: Of course Rfactors are always higher for weak reflections; if
you run old RSTATS I think you can get the rfactors as a function of |F| as
well as resolution.
Or split the file into two - hk 2l, and hk 2l+1 using mtzutils
RZONE 0 0 1 2 0 gives l=2n and RZONE 0 0 1 2 1 gives l=2n+1 and run
rstats on the two subsets..
I suspect you have really good Rfactors for the stronger data and it is
OK in fact.
 ( You were presumably careful to make sure the pseudo-R32 equivalents
are both either Free or non_free..).

- Yes we adopted a similar procedure to that used in Carredano et al., Acta
Cryst D56, 313-321 (2000). Basically apply Rfree to data set processed in
R32 and then expand to R3 using SFTOOLS. Then we CADed these Rfree flags
onto our data set processed in R3. Whereas in the paper they just used the
expanded R32 dataset.

Ana Gonzalez reported a similar problem: P21 crystal, with a 2 fold NCS axis
almost parallel to b. The maps looked very good (these were also high
resolution data), but the r-factor got stuck just below 30% (r-free around
32%). Later on we collected a native data set, which happened to crystallize
in the smaller cell (no NCS) and the previous model refined easily to about
15% without doing anything to the protein chain.  I always had the strong
feeling that it was the weak "in between" reflections which were to blame
for the high r-factor in the case of the former data set.

Remy Loris said that what we describe is not pseudo-symmetry but a
superlattice. Refining such a thing is known to be a pain in the ass even in
small molecule crystallography (I agree!!!!). The fact that your R and Rfree
go up when refining in the larger cell is absolutely normal since you are
adding a whole load of relatively weak (but perfectly valid) reflections.
Remember that R-factors are unweighted statistics. Therefore the correct
description for your structure is when using the larger cell which results
in higher R and Rfree. It is definitely not a scaling problem. A
superlattice means that the internal symmetry in the smaller cell is broken,
but that it still holds approximately. There are, however, some small
differences introduced and all information relating to these differences is
present in the layers with weak spots (so, DON'T use sigma cut-offs to get
lower R-factors! It will just hide the correct structure). Using only the
small cell will provide you with an average structure. In fact you should
not try to compare R/Rfree between the large and small cells since they are
not calculated using the same sets of data. The fact that your density in
the large unit cell is really clear means that what you are doing is
probably correct despite the higher Rfree.

- yet more good points

As a result of several suggestions we split the R3 big cell data set into
the strong and weak components using MTZUTILS and carried out the following
analysis:

Subset      Relative <I>    <I>/<sigI>, outer shell (1.65 A)    Rcryst (Rfree) (%)
All l            50                     6.2                         25.5 (28.1)
l = 2n          100                     9.3                         21.0 (24.0)
l = 2n+1          1                     0.5                         50.8 (52.1)

So the weak reflections are REALLY weak!

It's good to see that we are not the only ones with this kind of problem and
that there is no clear-cut solution - other than to crystallize in another
space group! I think we will use the small R3 cell after all.

Once again, thanks to everyone who contributed. I apologise if I have
mis-quoted or missed anyone out.

Dave

Dr. David M. Lawson
Biological Chemistry Dept.,
John Innes Centre,
Norwich,
NR4 7UH, UK.
Tel: +44-(0)1603-450725
Fax: +44-(0)1603-450018
Email: david.lawson@bbsrc.ac.uk