Space group assignment in the presence of pseudosymmetry and twinning

The crystal structure of Ferrochelatase-1 (HemH) from Bacillus anthracis presented a moderately difficult case for space group assignment. The lattice parameters, merging statistics and systematic absences favoured the space group P21212. A molecular replacement (MR) solution with high contrast was found using PDB entry 1ak1 as a search model (73% sequence identity). However, this solution could not be refined (the lowest R-free achieved was about 40%). The structure was eventually solved in a monoclinic space group.

The data for this tutorial are in the directory HemH_tutorial, which contains the two files described below.

The file p222.mtz contains unmerged HemH data in the space group P222. Because the data are unmerged, the space group assignment only affects the order and indexing of the observations, and it can be changed with pointless without any loss of information.

The file 1ak1.pdb, downloaded from the PDB, is the crystal structure of Ferrochelatase-1 from Bacillus subtilis, with one protein molecule in the asymmetric unit. It will be used as the search model for molecular replacement.

The programs used are pointless, scala, ctruncate, molrep and refmac, all included in CCP4 and run through ccp4i. There will be three attempts at structure solution. The first stops after scala, once we notice that parts of the data need to be removed. The second could be stopped after the twinning tests in ctruncate, but we continue in order to see the molecular replacement and refinement statistics. The third gives the correct structure and space group. Step-by-step instructions are given below, with remarks at important checkpoints.

1 First, we examine the data set p222.mtz and identify bad data.

1.1 Automatic determination of the space group using pointless

- Data Reduction > Import Integrated Data > Import Unmerged Data (Pointless)

- Keep default selections "Determine Laue Group" and "Input ... MTZ file"

- Define input mtz-file (MTZ #1): p222.mtz

- Select "Write output reflections in the best space/point group"

- Define the name of output MTZ-file, e.g. test.mtz

- Select "Use reference setting for primitive orthorhombic..." (This is to avoid non-standard settings such as P21221)

- Run > Run Now, and then Close the job window

- In the main ccp4i window: View Files from Job > View Log File

Check the end of the log-file: this automatic run selects P21212.

1.2 Check data with scala

- Data Reduction > Scale and Merge Intensities

- Deselect "Run Ctruncate..." (we will run it separately)

- Select input file (test.mtz) and Run

- View Files from Job > View Log Graphs

(A) Use the plots "Rmerge v Batch for all runs" (section "Analysis against all Batches...") and "Rmerge v Resolution" (section "Analysis against Resolution") to identify bad batches and choose the useful resolution range.
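The statistic in these plots can be sketched as follows. This is a minimal illustration, assuming the observations have already been reduced to their unique hkl; scala's own processing (scaling, partials, outlier rejection) is more involved, and the 0.5 limit is only the rough figure used in 2.2, not a fixed rule. The function and variable names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Hashable, Iterable, List, Tuple

# One observation: (unique hkl after symmetry reduction, group, intensity).
# "group" is a batch number for Rmerge v Batch, or a resolution-shell index
# for Rmerge v Resolution.
Observation = Tuple[Tuple[int, int, int], Hashable, float]


def rmerge_by_group(observations: Iterable[Observation]) -> Dict[Hashable, float]:
    """Rmerge = sum |I_i - <I>| / sum I_i, accumulated separately for each group.

    <I> is the mean over all observations of a unique reflection, whichever
    group they belong to. Reflections measured only once are ignored.
    """
    obs: List[Observation] = list(observations)
    by_hkl: Dict[Tuple[int, int, int], List[float]] = defaultdict(list)
    for hkl, _, intensity in obs:
        by_hkl[hkl].append(intensity)
    means = {hkl: fmean(v) for hkl, v in by_hkl.items() if len(v) > 1}

    num: Dict[Hashable, float] = defaultdict(float)
    den: Dict[Hashable, float] = defaultdict(float)
    for hkl, group, intensity in obs:
        if hkl in means:
            num[group] += abs(intensity - means[hkl])
            den[group] += intensity
    return {g: num[g] / den[g] for g in den if den[g] > 0}


def flag_groups(rmerge: Dict[Hashable, float], limit: float = 0.5) -> List[Hashable]:
    """Groups (batches or shells) whose Rmerge exceeds an assumed cut-off."""
    return sorted(g for g, r in rmerge.items() if r > limit)
```

Because the mean intensity is shared by all batches, a single bad batch also raises the Rmerge of the good ones somewhat, which is one reason bad batches are excluded and the data re-merged rather than simply down-weighted.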

2 At this step, we exclude bad data and attempt structure solution in the automatically determined space group. Indications of incorrect symmetry assignment, as well as some misleading statistics, will be discussed.

2.1 Rerun the pointless job from 1.1 with an explicit high-resolution limit and with the bad batches excluded.

- Select the previous pointless job and press "Rerun Job"

- Expand the folder "Exclude Data" and input the ranges of batches to exclude and the high-resolution cut-off defined in (A).

- Redefine the name of output MTZ-file, e.g. pointless1.mtz and Run

- View Files from Job > View Log File

(B) Compare the log-files from 1.1 and 2.1, paying attention to the first two tables, highlighted in red. There is an important difference: in 2.1, one of the three two-fold symmetry operations in the first table has a substantially lower R-merge than the other two and, correspondingly, one of the P12/m1 entries has a lower R-merge than both the other two P12/m1 entries and Pmmm. Nevertheless, pointless chooses Pmmm as the top hit, because it leaves the analysis of twinning to ctruncate, which is run later.
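The per-operation comparison in the first table can be illustrated by an R-merge computed only over pairs of reflections related by a single two-fold axis. This is a simplified sketch, assuming a dictionary of intensities indexed in the orthorhombic cell; pointless itself uses more elaborate scores (including correlation coefficients) on scaled, unmerged data. The names are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

HKL = Tuple[int, int, int]

# Action of two-fold rotations about the cell axes on reflection indices.
TWO_FOLDS: Dict[str, Callable[[HKL], HKL]] = {
    "2 along a": lambda h: (h[0], -h[1], -h[2]),
    "2 along b": lambda h: (-h[0], h[1], -h[2]),
    "2 along c": lambda h: (-h[0], -h[1], h[2]),
}


def pair_rmerge(intensities: Dict[HKL, float], op: Callable[[HKL], HKL]) -> float:
    """R-merge over reflection pairs related by one operation.

    For a pair (I1, I2) with mean m: sum |I_i - m| = |I1 - I2| and sum I_i = I1 + I2.
    Each pair is counted once; reflections mapped onto themselves are skipped.
    """
    num = den = 0.0
    seen: Set[frozenset] = set()
    for hkl, i1 in intensities.items():
        mate = op(hkl)
        if mate == hkl or mate not in intensities:
            continue
        pair = frozenset((hkl, mate))
        if pair in seen:
            continue
        seen.add(pair)
        i2 = intensities[mate]
        num += abs(i1 - i2)
        den += i1 + i2
    return num / den if den else float("nan")
```

With a genuine Laue group mmm all three operations would give similarly low values; one clearly lower value, as seen in 2.1, points to a single true two-fold, with the other two possibly mimicked by twinning.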

2.2 Rerun the scala job from 1.2 with pointless1.mtz as input and compare the R-merge plots with those from 1.2. R-merge should now be below 0.5 for all included batches and resolution shells. However, the average R-merge is still high: even the "good" portion of the data is not very good.

(C) Look at the sections "Axial reflections..." in scala's View Files from Job > View Log Graphs. For P21212, systematic absences are expected along h and k but not along l. The absences along h are quite convincing and, as expected, there are none along l. However, three reflections with odd k have I/sig(I) > 4. This can be considered a minor argument against the current space group assignment. On the other hand, it is a good example of how misleading pseudo-absences can be.
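The check made by eye in the "Axial reflections" plots can be sketched as follows: list odd-index axial reflections that are strong although an assumed 21 axis along that direction should make them absent. The sketch assumes merged (I, sigma) values keyed by hkl in the reference P21212 setting (21 axes along a and b); the threshold of 4 follows the text, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

HKL = Tuple[int, int, int]
AXIS_INDEX = {"a": 0, "b": 1, "c": 2}


def screw_axis_violations(
    reflections: Dict[HKL, Tuple[float, float]],
    screw_axes: Tuple[str, ...] = ("a", "b"),  # 21 axes of P21212 (reference setting)
    threshold: float = 4.0,
) -> Dict[str, List[HKL]]:
    """For each assumed 21 axis, return axial reflections with an odd index and I/sig(I) > threshold."""
    result: Dict[str, List[HKL]] = {}
    for axis in screw_axes:
        i = AXIS_INDEX[axis]
        hits = []
        for hkl, (intensity, sigma) in reflections.items():
            on_axis = all(hkl[j] == 0 for j in range(3) if j != i)
            # Python's % gives 1 for negative odd indices as well.
            if on_axis and hkl[i] % 2 == 1 and sigma > 0 and intensity / sigma > threshold:
                hits.append(hkl)
        result[axis] = sorted(hits)
    return result
```

An empty list for one axis and a few hits for the other is the pattern described above; the sketch only counts violations and cannot tell a true screw axis from pseudo-absences caused by pseudosymmetry.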

2.3 Conversion of I to F and intensity statistics with ctruncate

- Data Reduction > Check Data Quality > Analysis with ctruncate

- Select "Input data are intensities from Scala"

- Select "... add FreeR ..."

- Select input mtz-file pointless1_scala1.mtz, keep default output mtz-file name and Run

- View Files from Job > View Log Graphs

N.B. If ctruncate fails, the plots of interest may still be available. If so, use the ctruncate plots for data analysis, but generate the mtz-file with Fs using the old truncate program (rerun the ctruncate job with the check-box "Use old Truncate program" selected).

(D) The plots of interest are the section "L test for twinning", the plot "4th moment of E..." in the section "Acentric moments of E for k=1,2,3", and the section "Cumulative intensity distribution". All the experimental curves deviate appreciably from the theoretical curves for untwinned data. The data therefore appear to be twinned; however, these tests say nothing about the crystal symmetry, and it is not clear whether the Laue group is (i) mmm or (ii) a subgroup of mmm. Case (i) would require special relations between the unit cell parameters, making the lattice symmetry higher than mmm so that it contains the twin operations. In that case, ctruncate would have found the additional lattice symmetry operations and the H test would appear in the list of available plots, as in 3.2 below. As it does not, we may assume case (ii) and conclude that the current space group is incorrect. The next two steps therefore only demonstrate how molecular replacement and refinement behave with incorrectly merged data, and how misleading this behaviour can be (see 3.3 for explanations).
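The theoretical curves in these plots can be reproduced with a short simulation. The sketch assumes acentric reflections with exponentially distributed (Wilson) intensities, treats the local pairs of the L test as independent, and models a perfect twin as the average of two independent intensities; ctruncate's actual procedure (resolution binning, choice of local pairs, centric reflections) is not reproduced, and the function names are hypothetical.

```python
import math
import random
from statistics import fmean
from typing import List, Sequence, Tuple


def l_test(pairs: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """<|L|> and <L^2>, with L = (I1 - I2) / (I1 + I2) for local pairs of acentric reflections."""
    ls = [(a - b) / (a + b) for a, b in pairs if a + b > 0]
    return fmean(abs(x) for x in ls), fmean(x * x for x in ls)


def fourth_moment(intensities: Sequence[float]) -> float:
    """<E^4> = <I^2> / <I>^2 within one resolution shell."""
    m = fmean(intensities)
    return fmean(i * i for i in intensities) / (m * m)


def n_of_z(z: float, perfect_twin: bool = False) -> float:
    """Theoretical cumulative intensity distribution N(z) for acentric data, z = I/<I>."""
    if perfect_twin:
        return 1.0 - (1.0 + 2.0 * z) * math.exp(-2.0 * z)
    return 1.0 - math.exp(-z)


def simulate(n: int, perfect_twin: bool, rng: random.Random) -> List[float]:
    """Acentric intensities with unit mean; a perfect twin averages two independent ones."""
    if perfect_twin:
        return [0.5 * (rng.expovariate(1.0) + rng.expovariate(1.0)) for _ in range(n)]
    return [rng.expovariate(1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    for twin in (False, True):
        data = simulate(100_000, twin, rng)
        pairs = list(zip(data[0::2], data[1::2]))
        abs_l, l2 = l_test(pairs)
        print(f"perfect_twin={twin}: <|L|>={abs_l:.3f} <L^2>={l2:.3f} "
              f"<E^4>={fourth_moment(data):.3f} N(1)={n_of_z(1.0, twin):.3f}")
    # Theoretical values: untwinned <|L|>=1/2, <L^2>=1/3, <E^4>=2;
    # perfect twin <|L|>=3/8, <L^2>=1/5, <E^4>=3/2.
```

Partial twins fall between the two sets of reference values, which is why these tests indicate twinning but, as noted above, not the crystal symmetry.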

2.4 Run molrep with the Fs generated in P21212 in 2.3.

- Molecular replacement > Run Molrep - Auto MR

- Keep defaults

- Input files: pointless1_truncate1.mtz and 1ak1.pdb

- Change the name of the output pdb-file to e.g. pointless1_molrep1.pdb to indicate the current data set and then Run

- View Files from Job > View Log File

(E) Of interest is the "Score" column in the last table, "Summary". (The score is the correlation coefficient multiplied by the packing function.) At first glance everything looks fine, as the contrast is huge: compare the score of the first solution with those of all the others. Note, however, that the first line of the "Summary" table starts with something like " S_1_5". This means that the solution with the best score corresponds to the highest peak of the rotation function but only to the fifth highest peak of the translation function. With a correctly specified space group, such a result would be very unlikely for a search model with 73% sequence identity to the target. (Significant conformational changes can be ruled out because the rotation function value for the best solution is very high; see the first table in the log-file.) This argument is rather subtle, and refinement of the MR solution may give clearer evidence.
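The reasoning about the " S_1_5" label can be made explicit. The sketch assumes, as described above, that the label gives the rank of the rotation-function peak followed by the rank of the translation-function peak, and that the score is the correlation coefficient times the packing function. The label parser and the numbers in the test are hypothetical, not molrep output from this data set.

```python
import re
from typing import List, NamedTuple, Tuple


class MRSolution(NamedTuple):
    label: str      # e.g. " S_1_5"; assumed format S_<RF rank>_<TF rank>
    cc: float       # correlation coefficient
    packing: float  # packing function

    @property
    def score(self) -> float:
        return self.cc * self.packing


def ranks(label: str) -> Tuple[int, int]:
    """Return (rotation-function rank, translation-function rank) from a label like 'S_1_5'."""
    m = re.fullmatch(r"\s*S_(\d+)_(\d+)\s*", label)
    if m is None:
        raise ValueError(f"unexpected label: {label!r}")
    return int(m.group(1)), int(m.group(2))


def best_is_suspicious(solutions: List[MRSolution]) -> bool:
    """True if the top-scoring solution is not also the top translation-function peak."""
    best = max(solutions, key=lambda s: s.score)
    return ranks(best.label)[1] != 1
```

The flag does not prove the space group wrong; it only marks the situation described above, where high contrast coexists with a poorly ranked translation-function peak.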

2.5 Refinement of the MR solution

- Refinement > Run Refmac5

- Define input files: pointless1_truncate1.mtz and pointless1_molrep1.pdb

- Keep defaults for other parameters and Run

- View Files from Job > View Log File

(F) Check the R-factors in the last table of the log-file. Such final values would be reasonable for a poor MR model, but not for a model with such high sequence identity. More importantly, R-free does not decrease significantly even in the first few cycles of refinement, suggesting that the model is incorrect. Finally, if you were to continue refining and rebuilding this model, you would be unlikely to get R-free below 40%.
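For reference, the R-factors being compared can be computed as below. This is a sketch assuming a single overall least-squares scale factor fitted on the working set; refmac's scaling (bulk solvent, anisotropy) is richer. The key point it illustrates is that R-free is evaluated on the reflections flagged as free in 2.3 and excluded from refinement, so it falls only when the model genuinely improves.

```python
from typing import List, Sequence, Tuple


def r_factors(fobs: Sequence[float], fcalc: Sequence[float],
              free: Sequence[bool]) -> Tuple[float, float]:
    """Return (R-work, R-free) with R = sum | |Fo| - k|Fc| | / sum |Fo|.

    k is one least-squares scale fitted on the working set only, so the
    free set does not influence it.
    """
    work = [(abs(o), abs(c)) for o, c, f in zip(fobs, fcalc, free) if not f]
    test = [(abs(o), abs(c)) for o, c, f in zip(fobs, fcalc, free) if f]
    k = sum(o * c for o, c in work) / sum(c * c for _, c in work)

    def r(pairs: List[Tuple[float, float]]) -> float:
        return sum(abs(o - k * c) for o, c in pairs) / sum(o for o, _ in pairs)

    return r(work), r(test)
```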

Taken together, the observations (B-F) strongly suggest that the current space group assignment is incorrect.

3 Attempt at structure solution using the "second best" space group.

3.1 Rerun the job from 2.1 with the following changes:

- Select "Choose a previous solution" in the second line of the job window

- In the folder "Choose a solution", type "2" in the box next to "Laue group solution number"

- Change output mtz-file name to pointless2.mtz and Run

- View Files from Job > View Log File

The first two tables in the log-file resemble those from step 2.1, but the selected space group has changed to P1211.

3.2 Repeat steps 2.2-2.5, starting from pointless2.mtz, and pay attention to the following points.

(G) The space group selected by pointless in 3.1 is P1211. We saw in (C) that two coordinate axes have systematically weak odd reflections: one with clear systematic absences and the other with a few relatively strong odd reflections. Check whether the axis with clear systematic absences is the 21 axis of P1211.

(H) Now that the data are not merged into a point group containing the twin operations, a partial twinning test is available. The plot is in the section "H test for twinning..." of ctruncate's View Log Graphs. This test is not very useful for perfect twins, but here it gives a twin fraction of about 0.3 and provides strong evidence for the monoclinic space group and for twinning.
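A sketch of the estimate behind the H test, assuming acentric reflections with Wilson-distributed intensities and pairs already related by the twin operation. It uses the standard partial-twin relations &lt;H&gt; = 1/2 - α and &lt;H²&gt; = (1 - 2α)²/3; the simulated data and function names are hypothetical. It also shows why the test is weak for perfect twins: as α approaches 0.5, H approaches zero for every pair.

```python
import math
import random
from statistics import fmean
from typing import List, Sequence, Tuple


def h_test_alpha(pairs: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Twin fraction from twin-related acentric pairs, H = |I1 - I2| / (I1 + I2).

    For a partial twin (alpha < 0.5): <H> = 1/2 - alpha and <H^2> = (1 - 2 alpha)^2 / 3.
    Returns the estimates from <H> and from <H^2>.
    """
    hs = [abs(a - b) / (a + b) for a, b in pairs if a + b > 0]
    alpha_1 = 0.5 - fmean(hs)
    alpha_2 = 0.5 * (1.0 - math.sqrt(3.0 * fmean(h * h for h in hs)))
    return alpha_1, alpha_2


def twinned_pairs(n: int, alpha: float, rng: random.Random) -> List[Tuple[float, float]]:
    """Simulate observed twin-related pairs from independent acentric intensities."""
    out = []
    for _ in range(n):
        i1, i2 = rng.expovariate(1.0), rng.expovariate(1.0)
        out.append(((1 - alpha) * i1 + alpha * i2, alpha * i1 + (1 - alpha) * i2))
    return out
```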

(I) As expected, molrep finds two molecules in the asymmetric unit instead of the one found in 2.4, since the symmetry is now lower. The contrast remains very high, but there is an important improvement over 2.4 and (E): the solutions with the best scores in the two summary tables are also the best in terms of the translation function ("S_1_1" and "S_2_1").
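The factor of two follows from simple arithmetic: the cell volume is unchanged, but P21212 has four symmetry operations and P1211 only two, so the asymmetric unit doubles. A sketch, using the final cell quoted below; since all angles are 90°, the order in which the edges are assigned to a, b and c does not change the volume.

```python
# Final P1211 cell (all angles 90 degrees); the orthorhombic cell used in 2.4
# has the same edge lengths, possibly in a different order.
A, B, C = 49.9, 109.9, 59.4


def asu_volume(a: float, b: float, c: float, n_symops: int) -> float:
    """Asymmetric-unit volume for a cell with all angles 90 degrees."""
    return a * b * c / n_symops


v_ortho = asu_volume(A, B, C, 4)  # P21212: 4 symmetry operations
v_mono = asu_volume(A, B, C, 2)   # P1211: 2 symmetry operations
print(f"cell {A * B * C:.0f} A^3; ASU {v_ortho:.0f} A^3 in P21212, {v_mono:.0f} A^3 in P1211")
```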

(J) Most importantly, R-free decreases significantly during refinement, and the final R-factors are substantially lower than in 2.5.

Thus the true space group is P1211 with a=49.9 Å, b=109.9 Å, c=59.4 Å and all angles 90°. The data are twinned with a twin fraction of about 0.3, so refinement and model correction should be continued using the twin refinement option in refmac.

N.B. The initially chosen P21212 has two P21 subgroups, one of which is the true space group of the structure. They can be distinguished by the order of the cell parameters.
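The N.B. above can be expressed as a small check. It assumes the reference P21212 setting selected in 1.1 (21 axes along a and b, the plain two-fold along c, whose monoclinic subgroup is P2 rather than P21). Each P21 subgroup keeps one of the two 21 axes, which becomes the unique axis b after reindexing to P1211, so comparing lengths identifies the subgroup. The function names, tolerance and the orthorhombic cell in the test are hypothetical.

```python
from typing import Dict, List, Tuple


def p21_subgroups(cell: Dict[str, float],
                  screw_axes: Tuple[str, ...] = ("a", "b")) -> List[Tuple[str, float]]:
    """One P21 subgroup per 21 axis: (orthorhombic axis kept, its length = monoclinic b)."""
    return [(axis, cell[axis]) for axis in screw_axes]


def retained_screw_axis(cell: Dict[str, float], mono_b: float, tol: float = 0.5) -> List[str]:
    """Orthorhombic 21 axes whose length matches the b edge of the P1211 cell."""
    return [axis for axis, length in p21_subgroups(cell) if abs(length - mono_b) < tol]
```

For this data set the final P1211 cell has b = 109.9 Å, so the retained 21 axis is the 109.9 Å edge of the orthorhombic cell.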

3.3 Analyse the final structure in CA representation using coot. Note the relation between the two molecules in the asymmetric unit. Depending on the choice of asymmetric unit, which was made automatically at the MR step, the two molecules are related by a two-fold rotation about either X or Z, plus a translation. Such NCS interferes with twinning, because the corresponding twin operations are also two-fold rotations about X and Z. This NCS is in fact a P21212 pseudosymmetry, which is why a high-contrast MR solution could be found in P21212.
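Why the NCS can appear as a two-fold about either X or Z: replacing one molecule by its symmetry copy under the 21 axis changes the rotation relating the pair. A minimal sketch using rotation parts only (translations ignored), assuming orthogonal axes with the 21 axis of P1211 along Y; the matrix helper is written out to keep the example dependency-free.

```python
from typing import List

Matrix = List[List[int]]

# Rotation parts of two-fold axes along orthogonal X, Y, Z
# (valid here because all cell angles are 90 degrees).
TWO_X: Matrix = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]
TWO_Y: Matrix = [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]
TWO_Z: Matrix = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]


def matmul(p: Matrix, q: Matrix) -> Matrix:
    """Product of two 3x3 integer matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


# Molecule B = TWO_X * A + t (NCS). Replacing B by its copy under the 21 axis
# along Y (rotation part TWO_Y) gives B' = TWO_Y * TWO_X * A + t',
# i.e. a two-fold about Z now relates A and B'.
assert matmul(TWO_Y, TWO_X) == TWO_Z
# The reverse also holds: a Z relation becomes an X relation.
assert matmul(TWO_Y, TWO_Z) == TWO_X
```

Together with the 21 axis along Y, these NCS two-folds reproduce the rotation parts of all three P21212 operations, which is the pseudosymmetry described above.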