BUCCANEER (CCP4: Supported Program)

NAME

buccaneer - Statistical protein chain tracing

SYNOPSIS

buccaneer_pipeline.py
    -title title
    -mtzin filename
    -seqin filename
    -pdbin filename
    -pdbin-mr filename
    -pdbin-sequence-prior filename
    -pdbout filename
    -colin-fo colpath
    -colin-hl colpath
    -colin-phifom colpath
    -colin-fc colpath
    -colin-free colpath
    -cycles number of cycles
    -jobs CPUs
    -buccaneer-anisotropy-correction
    -buccaneer-build-semet
    -buccaneer-fix-position
    -buccaneer-fast
    -buccaneer-new-residue-name type
    -buccaneer-resolution resolution
    -buccaneer-1st-cycles number of cycles
    -buccaneer-1st-correlation-mode true/false
    -buccaneer-1st-sequence-reliability reliability
    -buccaneer-nth-cycles number of cycles
    -buccaneer-nth-correlation-mode true/false
    -buccaneer-nth-sequence-reliability reliability
    -mtzin-ref filename
    -pdbin-ref filename
    -colin-ref-fo colpath
    -colin-ref-hl colpath
    -refmac-twin true/false
    -refmac-mlhl true/false
    -refmac-weight weight
    -prefix file/directory name
    -verbose verbosity
    -buccaneer-keyword keyword
    -refmac-keyword keyword
    -stdin
[Keyworded input]

DESCRIPTION

'buccaneer' performs statistical chain tracing by identifying connected alpha-carbon positions using a likelihood-based density target. Model building is iterated with refinement in 'refmac' to produce a fairly complete model for use in Coot.

The target distributions are generated by a simulation calculation using a known 'reference' structure for which calculated phases are available. The success of the method is dependent on the features of the reference structure matching those of the unsolved, 'work' structure. For almost all cases, a single reference structure can be used, with modifications automatically applied to the reference structure to match its features to the work structure.

HOW TO RUN BUCCANEER

The buccaneer_pipeline program iterates buccaneer and refmac a specified number of times, given by -cycles. Each buccaneer run involves multiple internal cycles of buccaneer; these may optionally be specified as well, although the defaults are well chosen. A typical calculation is therefore as follows:

Cycle  Action
1      3 cycles of buccaneer
       10 cycles of refmac
2      2 cycles of buccaneer
       10 cycles of refmac
3      2 cycles of buccaneer
       10 cycles of refmac
4      2 cycles of buccaneer
       10 cycles of refmac
5      2 cycles of buccaneer
       10 cycles of refmac
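
The schedule above can be reproduced with a short Python sketch. It is only an illustration, not code from the pipeline: the function name and its parameters are hypothetical, the values 3 and 2 are the documented defaults of -buccaneer-1st-cycles and -buccaneer-nth-cycles, and the values 5 (pipeline cycles) and 10 (refmac cycles) are simply read off the table rather than taken from documented defaults.

```python
def pipeline_schedule(cycles=5, first_cycles=3, nth_cycles=2, refmac_cycles=10):
    """Return (pipeline cycle, action) pairs for a buccaneer/refmac run.

    All names here are hypothetical; cycles=5 and refmac_cycles=10 are
    assumptions taken from the example table, not documented defaults.
    """
    schedule = []
    for cycle in range(1, cycles + 1):
        # The first pipeline cycle uses -buccaneer-1st-cycles,
        # every later one uses -buccaneer-nth-cycles.
        internal = first_cycles if cycle == 1 else nth_cycles
        schedule.append((cycle, f"{internal} cycles of buccaneer"))
        schedule.append((cycle, f"{refmac_cycles} cycles of refmac"))
    return schedule


for cycle, action in pipeline_schedule():
    print(cycle, action)
```

Changing the arguments shows how -cycles, -buccaneer-1st-cycles and -buccaneer-nth-cycles interact: only the first row of the schedule depends on the first-cycle setting.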

A set of reference structures is provided with the program. The default structure, 1TQW, is good for typical protein problems at resolutions up to 1.25A, although in practice including data much beyond 2.0A makes little difference. For exotic cases you might want to provide your own reference structure.

The calculation involves 10 stages:

Finding C-alphas
Candidate C-alpha positions are located by searching the electron density.
Growing fragments
The candidate C-alphas or input chains are grown by adding residues at either end, according to the density.
Joining fragments
Overlapping fragments are joined to make longer chains. If this leads to a junction in a chain, the contested residue is removed.
Linking fragments
Nearby N and C termini are examined to see if they can be linked by a short loop.
Assigning sequence
Likelihood comparison between the density of each residue in the work structure and the residues of the reference structure allows sequence to be assigned to longer fragments.
Correcting sequence
Insertions and deletions in the model building are fixed by rebuilding, where possible.
Filtering fragments in poor density
Residues in poor density are removed.
Building NCS
Any NCS relationships found in the model are used to augment the related chains.
Pruning fragments
Clashing fragments are examined and the one with the worse density is removed. This stage can be disabled by the -no-prune keyword.
Rebuilding
Rebuilding allows side chain atoms and carbonyl oxygens to be rebuilt.

INPUT/OUTPUT FILES

-mtzin
Input 'work' MTZ file. This contains the data for the unknown, work structure. The required columns are F, sigF, and a set of HL coefficients from phasing improvement.
-seqin
[Optional] Input sequence file in any common format, e.g. pir, fasta.
-pdbin
[Optional] Input PDB file containing an initial model to extend.
-pdbin-mr
[Optional] Input PDB file used to determine placement and chain IDs of the new model. The new model will be built in the same place with the same chain IDs.
-pdbin-sequence-prior
[Optional] Input PDB file containing a model with heavy atoms or known residues to help with sequencing.
-pdbout
[Optional] Output PDB file. This will contain the new chain trace.
-pdbin-ref
[Optional] Input PDB file containing the final model for the reference structure.
-mtzin-ref
[Optional] Input 'reference' MTZ file. This contains the data for a known, reference structure. The required columns are F, sigF, and a set of Hendrickson-Lattman (HL) coefficients describing the calculated phases from the final model. Suitable reference structures can be constructed from the PDB using the 'Make Pirate reference' task.
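
As a rough illustration of how these file options combine with the column options described under KEYWORDED INPUT, the following Python sketch assembles an argument list for buccaneer_pipeline.py. The helper function, its parameter names, and the file names (work.mtz, work.seq, built.pdb) are all hypothetical; only the flags themselves come from this page. The sketch does not run the program, and how the script is launched in a given installation is not assumed here.

```python
import shlex


def build_command(mtzin, colin_fo, colin_hl=None, colin_phifom=None,
                  seqin=None, pdbout=None, colin_free=None, cycles=None):
    """Assemble an argument list for buccaneer_pipeline.py (hypothetical helper)."""
    # -colin-hl and -colin-phifom are alternatives: exactly one is expected.
    if (colin_hl is None) == (colin_phifom is None):
        raise ValueError("give exactly one of colin_hl or colin_phifom")
    args = ["buccaneer_pipeline.py", "-mtzin", mtzin, "-colin-fo", colin_fo]
    if colin_hl is not None:
        args += ["-colin-hl", colin_hl]
    else:
        args += ["-colin-phifom", colin_phifom]
    # Optional inputs are only added when a value was supplied.
    optional = [("-seqin", seqin), ("-colin-free", colin_free),
                ("-cycles", cycles), ("-pdbout", pdbout)]
    for flag, value in optional:
        if value is not None:
            args += [flag, str(value)]
    return args


# Hypothetical file names and column labels, for illustration only.
cmd = build_command("work.mtz", "FP,SIGFP", colin_hl="HLA,HLB,HLC,HLD",
                    seqin="work.seq", pdbout="built.pdb", cycles=5)
print(shlex.join(cmd))
```

Keeping the arguments as a list, and joining them only for display, avoids quoting problems if the list is later passed to a process launcher.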

KEYWORDED INPUT

See Note on keyword input.

-colin-fo colpath

Observed F and sigma for work structure. See Note on column paths.

-colin-hl colpath

Hendrickson-Lattman coefficients for work structure. Either -colin-hl or -colin-phifom should be specified, but not both. See Note on column paths.

-colin-phifom colpath

Phase and figure of merit for work structure. Either -colin-hl or -colin-phifom should be specified, but not both. See Note on column paths.

-colin-fc colpath

[Optional] Initial map coefficients (F and phase) for work structure. These must be on the same scale as the observed F's (i.e. do not use after phaser or phenix.refine). See Note on column paths.

-colin-free colpath

[Optional] Free R flags. See Note on column paths.

-cycles number of cycles

[Optional] Number of cycles of building to run. Running multiple cycles leads to a more complete model, although it is not as effective as recycling with refmac.

-jobs CPUs

[Optional] Set number of CPUs to use.

-buccaneer-anisotropy-correction

[Optional] Correct the input F's for anisotropy.

-buccaneer-build-semet

[Optional] Build MSE instead of MET for selenomethionine experiments.

-buccaneer-fix-position

[Optional] Build new model in the same place in the unit cell as the input model.

-buccaneer-fast

[Optional] Use fastest rather than best methods. Typically gives 2-3x speedup for a very similar model, but results vary.

-buccaneer-new-residue-name type

[Optional] Set the name which will be given to newly built residues.

-buccaneer-resolution resolution/A

[Optional] Resolution limit for the calculation. All data are truncated to this limit.

-buccaneer-1st-cycles cycles

[Optional] Number of internal buccaneer cycles to run on the first iteration. Default 3.

-buccaneer-1st-correlation-mode

[Optional] Use the correlation target function for growing new chains and for sequencing. This is less effective for initial building, but better for model completion, especially after molecular replacement. Default false.

-buccaneer-1st-sequence-reliability reliability

[Optional] Values between 0.5 and 1.0 vary the reliability cutoff for docking a sequence. The value is the probability at which the sequence will be accepted. 0.5 means that every sequence will be docked; 1.0 means that no sequences are docked. Default = 0.95.

-buccaneer-nth-cycles cycles

[Optional] Number of internal buccaneer cycles to run on subsequent iterations. Default 2.

-buccaneer-nth-correlation-mode

[Optional] As above, for subsequent cycles. Default true.

-buccaneer-nth-sequence-reliability reliability

[Optional] As above, for subsequent cycles. Default = 0.95.

-colin-ref-fo colpath

[Optional] Observed F and sigma for reference structure. See Note on column paths.

-colin-ref-hl colpath

[Optional] Hendrickson-Lattman coefficients for reference structure. If you do not have these, they can be generated using the accompanying chltofom program. See Note on column paths.

-refmac-twin true/false

[Optional] Use twin refinement in refmac. Default false.

-refmac-mlhl true/false

[Optional] Use MLHL refinement in refmac. Default true.

-refmac-weight weight

[Optional] Geometry weight to use in refmac. Default AUTO.

-prefix file/directory name

[Optional] File prefix or directory name to use for temporary files. If the name ends in a slash, a directory will be created and used. Default buccaneer/.

-verbose verbosity

[Optional] Verbosity level of the program output.

-buccaneer-keyword keyword

Additional keywords to pass to buccaneer. (Can be repeated.)

-refmac-keyword keyword

Additional keywords to pass to refmac. (Can be repeated.)
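
The trailing-slash rule described for -prefix can be sketched in Python as follows. This is an assumption about the behaviour, not the pipeline's own code: the function name is hypothetical, and the way a non-directory prefix is joined to a file name (plain concatenation) is a guess, since this page does not specify it.

```python
import os
import tempfile


def temp_path(prefix, name):
    """Return the path of a temporary file `name` for a given -prefix value.

    Hypothetical helper: plain concatenation for a non-directory prefix is
    an assumption, not documented behaviour.
    """
    if prefix.endswith("/"):
        # Trailing slash: treat the prefix as a directory and create it.
        os.makedirs(prefix, exist_ok=True)
        return os.path.join(prefix, name)
    # Otherwise the prefix is prepended to the file name.
    return prefix + name


# Demonstrate both cases inside a scratch directory that is removed afterwards.
with tempfile.TemporaryDirectory() as scratch:
    print(temp_path(os.path.join(scratch, "buccaneer") + "/", "cycle1.pdb"))
    print(temp_path(os.path.join(scratch, "run1_"), "cycle1.pdb"))
```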

Note on column paths:

When using the command line, MTZ columns are described as groups using a slash-separated format that includes the crystal and dataset names. If your data were generated by another program that uses column groups, you can just specify the name of the group, for example '/native/peak/Fobs'. You can wildcard the crystal and dataset if the file does not contain any duplicate labels, e.g. '/*/*/Fobs'. You can also access traditional non-grouped columns from existing files by giving a comma-separated list of names, e.g. 'FP,SIGFP' or 'HLA,HLB,HLC,HLD'.
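
The two forms of column path can be told apart mechanically. The Python sketch below is a minimal illustration under stated assumptions: the function name and the dictionary layout are hypothetical, and it handles only the two forms described in this note (a three-part slash-separated path and a comma-separated list of labels). It does not read an MTZ file or check that the columns exist.

```python
def parse_colpath(colpath):
    """Classify a column path as a grouped path or a list of plain labels.

    Hypothetical helper covering only the two forms described in this note.
    """
    if colpath.startswith("/"):
        # Grouped form: /crystal/dataset/group, where '*' is a wildcard.
        parts = colpath.strip("/").split("/")
        if len(parts) != 3:
            raise ValueError("expected /crystal/dataset/group")
        crystal, dataset, group = parts
        return {"kind": "group", "crystal": crystal,
                "dataset": dataset, "group": group}
    # Traditional form: comma-separated column labels.
    labels = [label.strip() for label in colpath.split(",")]
    return {"kind": "labels", "labels": labels}


for example in ("/native/peak/Fobs", "/*/*/Fobs", "HLA,HLB,HLC,HLD"):
    print(example, parse_colpath(example))
```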

Note on keyword input:

Keywords may be given on the command line or, if the '-stdin' flag is specified, on standard input. In the latter case, one keyword is given per line and the leading '-' is optional. The rest of the line is the argument of that keyword, if one is required, so quoting is not used.
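
The standard-input convention can be illustrated with a small Python sketch that converts keyword lines into the equivalent command-line arguments. The function name is hypothetical, the file name and title are made up, and skipping blank lines is an assumption, since this note does not say how they are treated.

```python
def keywords_to_args(lines):
    """Convert '-stdin' style keyword lines to a command-line argument list.

    Hypothetical helper; skipping blank lines is an assumption.
    """
    args = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split into the keyword and the remainder of the line.
        parts = line.split(None, 1)
        keyword = parts[0]
        if not keyword.startswith("-"):
            keyword = "-" + keyword  # the leading '-' is optional on stdin
        args.append(keyword)
        if len(parts) > 1:
            # The rest of the line is one argument, so no quoting is needed.
            args.append(parts[1])
    return args


example = [
    "title my test job",
    "mtzin work.mtz",
    "-colin-fo /*/*/Fobs",
    "buccaneer-fast",
]
print(keywords_to_args(example))
```

Note that the title, which contains spaces, becomes a single argument without any quotes, which is the practical advantage of keyword input on standard input.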

Reading the Output:

The number of residues sequenced and the Free-R factor from refmac are the most useful outputs. These may easily be found using the 'Annotated log file'.

Problems:

AUTHOR

Kevin Cowtan, York.

SEE ALSO