EMAN2 Tomography and Subtomogram/Subtilt Averaging Workflow Tutorial

Important note: Throughout the tutorial you will see a split between (small) and (large) instructions. The (small) tutorial is designed for laptops or basic desktops and can achieve ~12 Å resolution in a time compatible with live tutorial sessions. The (large) tutorial really requires a proper tomography workstation, but can achieve subnanometer resolution. Do not intermix (small) and (large) options below!

Computer Requirements

Note: Wherever EMAN2 asks for the number of threads to use, specify the number of physical cores your machine has. Computers are often advertised as 4 core/8 thread or 8 core/16 thread; running image processing with the advertised thread count will usually make processing slower, not faster. You may optionally exceed the core count by ~25%, e.g. on a 4 core machine, specifying 5 may give a 5-10% speedup over 4.
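The rule of thumb above can be written down as a small helper. This is only an illustrative sketch: the function name suggest_threads is made up for this example, and you must supply the physical core count yourself, since Python's os.cpu_count() typically reports the advertised (hyperthreaded) number.

```python
def suggest_threads(physical_cores: int, oversubscribe: float = 0.25) -> int:
    """Suggest a value for EMAN2 'threads' fields.

    physical_cores: real cores, NOT the advertised hyperthread count.
    oversubscribe:  optional fractional increase (the note suggests ~25%).
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    # Truncate rather than round up, to stay on the conservative side.
    return max(1, int(physical_cores * (1.0 + oversubscribe)))


# A "4 core/8 thread" machine has 4 physical cores, an "8 core/16 thread" one has 8.
for cores in (4, 8):
    print(cores, "cores ->", suggest_threads(cores), "threads")
```

Pass oversubscribe=0 to get the plain core count if the extra threads do not help on your machine.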

Download Data

The tutorial data comes from EMPIAR 10064, but only uses a subset of the data from the mixed CTEM tilt series.

Prepare input files (~2 minutes)

e2projectmanager.py&

Project Manager

When working with your own data:

Tiltseries Alignment and Tomogram Reconstruction (~10 min Small)

Alignment of the tilt-series is performed iteratively in conjunction with tomogram reconstruction. Tomograms are not normally reconstructed at full resolution, generally limited to 1k x 1k or 2k x 2k, but the tilt-series are aligned at full resolution. For high resolution subtomogram averaging, the raw tilt-series data is used, based on coordinates from particle picking in the downsampled tomograms. On a typical workstation reconstruction takes about 4-5 minutes per tomogram.

For the tutorial tilt-series:

If you opted to run without notmp on one or more tilt series:

Tomogram reconstruction

When working with your own data:

Handedness Check

EMAN2 will automatically locate the tilt axis in a tilt series if it is not provided, but there is a 180° ambiguity in this determination. An incorrect choice will lead to structures with the incorrect handedness, and may produce suboptimal CTF correction. In some data sets (not the tutorial) this may lead to particles with mixed handedness. Since we specified tltax above, this step isn't necessary for the tutorial, but you can run it to see what the results look like.

EMAN2 includes a novel procedure for resolving this ambiguity from a tilt series based on defocus estimates across the tilted images. The tutorial data set comes out correctly without running this check, but when working with your own data, this step is highly recommended. Once you know the correct tilt axis direction for a given microscope/camera, you shouldn't need to run this test on every data set. It may still be worth doing, however, since various configuration/software errors on the instrument could cause inconsistent results, particularly with a change of magnification.

For the tutorial tilt-series:

You will need to look at the console where you launched e2projectmanager to see the results of the test. It should look something like:

Average score: Current hand - 4.133, flipped hand - 3.290
Defocus std: Current hand - 0.110, flipped hand - 0.165
Current hand is better than the flipped hand in 86.4% tilt images
The handedness (--tltax=-4.1) seems to be correct. Rerun CTF estimation without the checkhand option to finish the process.

If you run this check on multiple images and they consistently indicate a flipped tilt axis/handedness, then you need to return to the previous step (Tomogram Reconstruction) and redo the reconstruction for all tomograms, with the correct tilt axis entered in the corresponding box. The same tilt axis should be used for all tilt series collected under the same conditions on the same instrument. The automatic value may vary a little among micrographs; just compute the approximate average or median value.
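If you have run the check on several tilt series, the median tilt axis can be computed with a few lines of Python. The sketch below is not part of EMAN2. It assumes the console lines look like the sample output shown above, and the second and third --tltax values in the example log are invented for illustration.

```python
import re
import statistics

# Matches the "--tltax=<number>" fragment seen in the sample console output.
TLTAX_RE = re.compile(r"--tltax=(-?\d+(?:\.\d+)?)")


def tltax_from_log(text):
    """Collect every --tltax value found in pasted console output."""
    return [float(value) for value in TLTAX_RE.findall(text)]


def consensus_tltax(values):
    """Median of the per-tilt-series tilt axis angles (robust to one outlier)."""
    if not values:
        raise ValueError("no tilt axis values found")
    return statistics.median(values)


# First line is from the tutorial output; the other two values are hypothetical.
sample_log = """
The handedness (--tltax=-4.1) seems to be correct.
The handedness (--tltax=-3.8) seems to be correct.
The handedness (--tltax=-4.4) seems to be correct.
"""

print("tilt axis to enter:", consensus_tltax(tltax_from_log(sample_log)))
```

The median is used rather than the mean so that a single badly determined tilt series does not shift the value.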

Note: This method removes almost all of the ambiguity about particle handedness. The one potential issue is that the MRC file format uses a non-conventional origin for images. If the data collection software doesn't take this into account, the images may be flipped when written to disk. The easiest way to check the software is to collect 2 images of the same target, save them directly into different file formats, and then check (in EMAN2) whether the two images appear to have the same handedness. If not, it is likely that the MRC files are incorrect.

CTF Estimation (<5 min)

Do NOT forget this step!

This step will determine the defocus as a function of location for each tilt in each tilt series. This information is stored in the headers of particles as they are extracted from the tilt series, and used for CTF correction during subtomogram averaging. If you forget to do this, you will need to re-run the particle extraction step, which is quite time consuming.
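To see why defocus depends on location within a tilted image, consider a flat specimen tilted by an angle θ: a point whose projected distance from the tilt axis is x sits at a height of x·tan(θ) above or below the axis, so its defocus differs from the mean by that amount. The sketch below illustrates only this geometry. It is not EMAN2's estimator, the function and parameter names are made up, and the sign (hand) is exactly the ambiguity addressed by the handedness check.

```python
import math


def local_defocus(mean_defocus_um, x_from_axis_um, tilt_deg, hand=1):
    """Defocus at a point in a tilted image of a flat specimen.

    mean_defocus_um: defocus on the tilt axis (micrometers)
    x_from_axis_um:  signed distance from the tilt axis, measured in the image plane
    tilt_deg:        stage tilt angle in degrees
    hand:            +1 or -1; which side of the axis is more defocused
    """
    if hand not in (1, -1):
        raise ValueError("hand must be +1 or -1")
    height = x_from_axis_um * math.tan(math.radians(tilt_deg))
    return mean_defocus_um + hand * height


# Hypothetical numbers: 3 um mean defocus, 0.5 um either side of the axis.
for tilt in (0, 30, 60):
    near = local_defocus(3.0, -0.5, tilt)
    far = local_defocus(3.0, +0.5, tilt)
    print(f"tilt {tilt:>2} deg: {near:.2f} um to {far:.2f} um")
```

At zero tilt the defocus is uniform; at high tilt the gradient across the image becomes large, which is what makes the correct and flipped hands distinguishable.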

For the tutorial tilt-series:

When working with your own data:

Note: this program is only estimating CTF parameters, taking tilt into account. It is not performing any phase-flipping corrections on whole tomograms. CTF correction is performed later as a per-particle process. This process requires metadata determined during tilt-series alignment, so it cannot be used with tomograms reconstructed using other software packages.

Note: In EMAN2 snapshots from 2022 or later it is possible, after CTF correction, to return to the 3-D reconstruction step and produce CTF corrected whole tomograms, but this does nothing useful when following the EMAN2 pipeline. If you wish to compare EMAN2 tomograms with those from other software that performs CTF correction, this could potentially be useful.

Tomogram reconstruction evaluation

Tomogram evaluation

Analysis and visualization -> Evaluate tomograms can be used to evaluate the quality of your tilt series alignments and tomogram reconstructions. This tool will show more information as you progress through the tutorial, but can be used already at this point to make various assessments of your tomograms. Note that some of this information may not be available if you had notmp checked during the reconstruction.

The correctrot option often does not work well on tutorial tomo3. If you go through the tomograms you should see ribosomes spanning the plane of the image for all of the tomograms. If tomo3 shows only a narrow band containing ribosomes, return to the tomogram reconstruction step above, and re-run the process for only that tomogram (uncheck alltiltseries, select tiltseries/*tomo3.hdf, and also uncheck correctrot).

Particle Picking Choices

There are 4 different tools you can use for particle picking in EMAN2 as of Feb 2022:

  1. A new deep-learning based 3-D picker. Not available in 2.91, must use a recent (2022+) snapshot.
  2. Abusing the deep-learning based segmentation tool for particle picking purposes
  3. Manual particle picking
  4. Template based picking (usually seeded with some manual picking results)

For live versions of this tutorial, we use the older manual+template based approach as it requires no specific hardware, and is a good learning experience, but the deep learning 3-D picker is a good choice for typical situations. For cellular tomograms, the annotation tool approach may still be a good choice.

New Deep-Learning 3-D Picker (recommended approach)

This is the easiest approach by far, and is basically a single step process, but is only available in recent (2022+) snapshots of EMAN2.

Tomogram annotation (GPU recommended)

2D particle picking

For a detailed description of how to use the annotation tool, see: TomoSeg

Here is a brief summary of the annotation-based approach:

Manual particle picking (10-15 min)

3D particle picking

Particle extraction (~2 min for a few manually picked particles, up to hours for a couple of thousand)

Note that the resources this step requires differ vastly depending on whether you are extracting only a few manually selected particles for later template matching, or have already selected hundreds of particles from each tomogram.

The reduced 1k x 1k (or 2k) tomograms are used only as a reference to identify the location of the objects to be averaged. Now that we have particle locations, the software returns to the original tilt-series, extracts a per-particle tilt-series, and reconstructs each particle in 3-D independently at full resolution. Since this is performing a full resolution reconstruction of each particle it is somewhat resource intensive.
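The relationship between positions picked in a reduced tomogram and the full-resolution tilt series is a simple scale factor. The sketch below illustrates only that arithmetic. EMAN2 does this for you internally, and its actual coordinate conventions (origin, units, stored metadata) are not reproduced here. The image sizes and the picked position are hypothetical.

```python
def shrink_factor(full_size_px, tomogram_size_px):
    """Ratio between the full-resolution image size and the reduced tomogram size."""
    if tomogram_size_px <= 0:
        raise ValueError("tomogram_size_px must be positive")
    return full_size_px / tomogram_size_px


def to_full_resolution(coord, shrink):
    """Scale an (x, y, z) position from reduced-tomogram voxels to full-resolution pixels."""
    return tuple(c * shrink for c in coord)


# Hypothetical example: 4096-pixel tilt images reconstructed as a 1k tomogram.
shrink = shrink_factor(4096, 1024)
picked = (100, 200, 30)  # position picked in the 1k tomogram
print("shrink:", shrink, "full-resolution position:", to_full_resolution(picked, shrink))
```

This is also why picking accuracy in the reduced tomogram matters: a one voxel error there corresponds to several pixels at full sampling.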

For the tutorial tilt-series:

For your own data:

Initial model generation (10 - 60 min)

Intuitively it seems that, since the particles are already in 3-D, an "initial model" should not be necessary. Unfortunately, due to the missing wedge and the low resolution of one individual particle (particularly from cells), it is actually critical to make a good starting average. Historically it has been challenging to get a good starting model, depending on the shape of the molecule. This new procedure based on stochastic gradient descent has proven to be quite robust, but it is difficult for the computer to tell when it has converged sufficiently. For this reason, the default behavior is to run much longer than is normally required, and have a human decide when it's "good enough" and terminate the process. If you use a small shrink value and let it run to completion, it can take some time to run. This is harmless, but unnecessary. While the older procedure (see Old Initial Model Generator below) remains fully functional, a new program available since 2021 does a much more efficient job of making initial models.

New Initial Model Generator

Initial model generation

The new initial model generator was only added to e2projectmanager in Feb 2022, so if you have an older version, you may need to run it from the command-line. If it isn't available from the command-line either, you should probably update.

If you do not see New initial model generator in e2projectmanager, you can run it from the command line, replacing appropriate options:

e2spt_sgd_new.py sets/initribo.lst --res 50 --niter 10 --shrink 2 --parallel thread:4

The second program will produce output like:

Gathering metadata...
 69/69
iter 0, class 0:
17 jobs on 4 CPUs
iter 1, class 0:
17 jobs on 4 CPUs
iter 2, class 0:
17 jobs on 4 CPUs

Once it gets past 3-4 iterations, you can use the browser to look in sptsgd_00, and double-click on output_cls0.hdf. This file will change after each iteration completes. It contains the results of the most recent iteration. When you are satisfied with the quality of the initial model, you can kill it with the task manager in e2projectmanager.
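Because output_cls0.hdf is rewritten after each iteration, you can also detect a finished iteration from a script by watching the file's modification time. This is a generic filesystem sketch using only the Python standard library, not an EMAN2 feature, and the function name and polling values are arbitrary choices.

```python
import os
import time


def wait_for_new_iteration(path, last_mtime=None, poll_s=5.0, timeout_s=3600.0):
    """Wait until `path` exists and is newer than `last_mtime`.

    Returns the new modification time, or None if `timeout_s` elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            mtime = os.path.getmtime(path)
        except OSError:
            mtime = None  # file not written yet
        if mtime is not None and (last_mtime is None or mtime > last_mtime):
            return mtime
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)


# Example use (commented out so that importing this sketch does not block):
# seen = None
# while True:
#     seen = wait_for_new_iteration("sptsgd_00/output_cls0.hdf", seen)
#     if seen is None:
#         break
#     print("new iteration written at", time.ctime(seen))
```

A changed modification time only tells you the file was touched; it may still be in the middle of being written, so wait a moment before opening it in the browser.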

For your own data:

Old Initial Model Generator

This section describes the older program, which is still functional but not recommended. Skip it if you already used the new version above.

For the tutorial tilt-series:

For your own data:

Template matching (ONLY if you manually boxed above)

If you manually boxed only a few particles in the steps above rather than using one of the semi-automated methods to pick the full 1000-3000 particles, then you will now need to use template matching to box out all of the particles. If you already have a set of 1000-3000 particles, skip ahead to New integrated refinement program.

Note that here, and everywhere else in the tomography pipeline, reconstructed particles have positive contrast (look white in projection) and tomograms/tilt series have negative contrast (look dark in projection). If you wish to use a reference volume from the PDB or somesuch, then it should have positive contrast as is normal in the single particle CryoEM field.

For your own data:

Particle extraction (ONLY if not done for full data above)

If you already extracted a complete set of particles (not just the few initial references) above, you don't need to repeat it again here.

Since this involves several hundred particles instead of 30-50, it will take quite a lot longer to run.

For the tutorial tilt-series:

Build Sets (again)

It's harmless to repeat this, as it's almost instantaneous, and overwrites existing files.

New integrated refinement program

There is a new refinement program which implements both traditional subtomogram averaging and subtilt refinement in a single program. This is an alternative to the next two major sections (Subtomogram Refinement and Subtilt Refinement). The full tutorial on the new program is here. It was integrated into e2projectmanager in mid-2022, so make sure you have an up to date version.

Small (large below)

In the small tutorial we will do this as 2 sequential refinement runs for efficiency, and to show how you can continue a refinement. The initial refinement should turn the initial model into something ribosome shaped, but at low resolution. The second refinement should get to ~12 Å resolution.

e2spt_refine_new.py --ptcls sets/ribo.lst --ref sptsgd_00/output_cls0.hdf --iters p,p,t --goldstandard --startres 50 --tophat local --parallel thread:4 --threads 4

When that's done, you should have a spt_00 folder containing the results of the 3 iterations we requested. You may take a look at:

The 3rd map should look considerably more like a ribosome than the initial model (threed_00.hdf), but the resolution will still be limited. In the next run, we will do more subtilt refinement to push the resolution.
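If you prefer to launch the first refinement command above from a script, it can be assembled as an argument list so that the two thread settings cannot drift apart. This is a sketch: the helper name first_refine_command is invented, and it uses only the options that appear in the tutorial command, with the same values.

```python
import shlex


def first_refine_command(ptcls, ref, threads, iters="p,p,t"):
    """Build the argument list for the first (small) refinement run."""
    return [
        "e2spt_refine_new.py",
        "--ptcls", ptcls,
        "--ref", ref,
        "--iters", iters,
        "--goldstandard",
        "--startres", "50",
        "--tophat", "local",
        # --parallel and --threads are both derived from one value
        "--parallel", f"thread:{threads}",
        "--threads", str(threads),
    ]


cmd = first_refine_command("sets/ribo.lst", "sptsgd_00/output_cls0.hdf", threads=4)
print(shlex.join(cmd))
# To actually run it from the project directory:
#   import subprocess; subprocess.run(cmd, check=True)
```

Printing the joined command first lets you compare it against the tutorial text before committing to a long run.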

We could have done this all in a single run by simply including the additional --iters letters, except for one key addition: the --maxres 12 argument, which sets the highest resolution information to consider in the refinement. If --maxres is not specified, the limit is determined automatically based on the results of the previous iteration. The automatic limit makes the iterations run faster, but it may make them progress towards the final resolution more slowly. So, we use the automatic method for 3 iterations to get the shape correct quickly, then push the resolution in the second run.

e2spt_refine_new.py --ptcls spt_00/aliptcls3d_02.lst --ref spt_00/threed_03.hdf --loadali2d spt_00/aliptcls2d_03.lst --loadali3d --goldcontinue --iters t,p,t,r,d --keep 0.95 --tophat local --parallel thread:4 --threads 4 --maxres 12 

That's it. You hopefully have a ~12 Å resolution map. If you wish to try to push the resolution further, you can follow the Large version of the tutorial.

Skip the next 2 sections about the old refinement. Other notes are below that.

Large

The Large data set should have 1000-3000 particles at full sampling, and should readily achieve ~6.5 Å resolution. This is really designed to run on a workstation, not a laptop.

e2spt_refine_new.py --ptcls=sets/tomobox.lst --ref=sptsgd_00/output_cls0.hdf --startres=50.0 --goldstandard --sym=c1 --iters=p,p,p,t,t,t --keep=0.95 --tophat=global --parallel=thread:4:/home/username/tmp --threads=4 --m3dthread

If you like, you could use the modified iters sequence p,p,p,t,t,r,d,t instead, but this will take considerably longer to run and will likely produce only very slightly improved results. This alternative adds in-plane rotational subtilt refinement and defocus refinement.
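The --iters strings used in this tutorial can be read as one letter per iteration. The sketch below spells a sequence out in words. The letter meanings are an interpretation of this tutorial's text only (p compared to subtomogram refinement, t to subtilt refinement, r and d to the rotational and defocus steps mentioned above), not taken from the program's own documentation, and the program may accept letters not covered here.

```python
# Letter meanings as read from this tutorial's text (an assumption, see above).
ITER_KINDS = {
    "p": "subtomogram (3-D particle) alignment",
    "t": "subtilt refinement",
    "r": "in-plane rotational subtilt refinement",
    "d": "defocus refinement",
}


def describe_iters(iters):
    """Return a list of (iteration number, letter, description) tuples."""
    steps = []
    for number, letter in enumerate(iters.split(","), start=1):
        letter = letter.strip()
        if letter not in ITER_KINDS:
            raise ValueError(f"letter {letter!r} is not covered by this tutorial")
        steps.append((number, letter, ITER_KINDS[letter]))
    return steps


for number, letter, meaning in describe_iters("p,p,p,t,t,r,d,t"):
    print(f"iteration {number}: {letter} = {meaning}")
```

Writing the sequence out this way makes it easier to see that the longer alternative adds three iterations beyond the default p,p,p,t,t,t run only in kind, not in count: it has 8 steps instead of 6.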

You can (and probably should) look at the results while the job is running. You should have a spt_00 folder containing the results of each completed iteration. You may take a look at:

That's it. You hopefully have a ~6.5 Å resolution map when the job finishes, and it should get pretty close to that after 4-6 iterations.

Skip the next 2 sections about the old refinement. Other notes are below that.

Old Subtomogram refinement (~1 hr/iteration)

3D refinement

As an alternative to the new integrated tool above, the older pair of programs is still available. You shouldn't need to do both approaches. This step is similar to the "p" iterations above, though it uses an older algorithm.

This step performs a conventional iterative subtomogram averaging using the full set of particles. Typically it will achieve resolutions in the 15-25 Å range with a reasonable number of particles. As it involves 3-D alignment of the full set of particles multiple times, it takes a significant amount of compute time. Higher resolutions are achieved in the next stage after this (subtilt refinement).

For the tutorial tilt-series:

Results will gradually appear in spt_XX/. Feel free to look at intermediate results with the EMAN2 file browser as they appear.

For your own data:

Old Subtilt refinement (~9 hr/iteration)

Subtilt refinement directory

This is the second half of the old refinement strategy. It is conceptually similar to the t,p and r iterations in the newer integrated program above.

With the results of a good subtomogram alignment/average, we are now ready to switch to alignment of the individual particle images in each tilt, along with per-particle-per-tilt CTF correction and other refinements. This is effectively a hybrid of single particle analysis and subtomogram averaging, and can readily achieve subnanometer resolution IF the data is of sufficient quality. The tutorial data set is, but many cellular tomograms, for example, are not collected with high resolution in mind, and even with this sort of refinement will be unable to achieve resolutions better than 10-30 Å, depending on the data. This process is completely automatic, based on all of the metadata collected up to this point. While it is possible to perform "subtomogram refinement" with subtomograms from any tomogram, Subtilt Refinement cannot operate properly unless all preceding steps occurred within EMAN2.

For the tutorial tilt series:

For your own data:

Congratulations! The final result of the tutorial will be found in "subtlt_00/". The final 3-D map will be "threed_04.hdf" with the default parameters. The final gold standard resolution curve will be "fsc_maskedtight_04.txt". The optional steps below are tools you can use to evaluate your results in more detail.
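To turn the resolution curve into a single number, you can look for the point where the FSC first falls below 0.143, the threshold conventionally used for gold standard curves. The sketch below assumes the text file holds two whitespace-separated columns, spatial frequency in 1/Å and FSC value; check your own file before relying on it. The function names and the example curve are invented for illustration.

```python
def load_fsc(path):
    """Read (spatial frequency, FSC) pairs from a two-column text file."""
    curve = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            s, fsc = line.split()[:2]
            curve.append((float(s), float(fsc)))
    return curve


def resolution_at(curve, threshold=0.143):
    """Resolution in Å where the FSC first drops below `threshold`, else None."""
    for (s0, f0), (s1, f1) in zip(curve, curve[1:]):
        if f0 >= threshold > f1:
            # Linear interpolation between the two bracketing points.
            s = s0 + (f0 - threshold) * (s1 - s0) / (f0 - f1)
            if s > 0:
                return 1.0 / s
    return None


# Made-up curve for illustration only.
example = [(0.0, 1.0), (0.05, 0.9), (0.1, 0.243), (0.2, 0.043)]
print("resolution (Å):", resolution_at(example))
# With real data: resolution_at(load_fsc("subtlt_00/fsc_maskedtight_04.txt"))
```

If the curve never crosses the threshold, the function returns None, which usually means the resolution is limited by sampling rather than by the data.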

Refinement evaluation (optional)

Refinement evaluation

This tool helps visualize and compare results from multiple subtomogram refinement runs.

EMAN2/e2TomoSmall (last edited 2024-07-14 21:59:58 by MuyuanChen)