Angular sampling, number of projections and resolution
Note: If you are running e2refine_easy, you should not really need to worry about this parameter. It will be computed for you based on your project parameters. If you use speed=5, the angular sampling will be a bit finer than required based on the discussion below. If you lower speed to 1, angular sampling is increased roughly 2-fold (making the refinements take considerably longer). If you want to experiment with angular step, I suggest first decreasing '--speed' to see if you get an improvement, before considering overriding the default parameters.
Historically in EMAN, you needed to specify a value for 'ang=', the angular spacing of the projections used to classify the particles. In EMAN2.1 this is normally computed for you, but a general discussion of the principles involved can still be useful, as this is the basis for how EMAN2.1 decides what to use. In essence these are very simple geometrical arguments, but in practice you will need to understand Fourier transforms pretty well to be able to fully follow the arguments on this page. Even if you don't fully follow, the practical advice can still be followed.
If you hope to achieve resolution X, how fine an angular sampling do you need to have, or, to put it a different way, how many different particle orientations do you need ? To an extent, the answer to this question depends on your situation. The classic answer to this question, dating back to the 60s (?) is pi*D/d, where D is the object size and d is the targeted resolution. However this gives the value for tomographic sampling, ie - only looking at particle orientations evenly spread around the equator of the specimen. In single particle analysis, we (hopefully) have particles in completely random orientations on the grid. While in practice there will often be a preferred orientation, which may lead to anisotropic resolution in the final structure, the sampling required to achieve a given resolution is still the same.
Let's begin by considering the number of projections produced on the unit sphere as a function of the angular step between projections. This will vary slightly with different algorithms used to spread the points as equally as possible on the sphere, but for the default EMAN algorithm, it looks like this:
As you can see, even on a log scale, this curve falls rapidly with angular step. The relationship is actually fairly simple, however, and
P = 20602/(A^2 * S) gives a very close approximation, where P is the number of projections, A is the angular step in degrees and S is the symmetry number (number of asymmetric units, eg 60 for icos, 14 for D7, etc.)
Due to the squared relationship you can see that going from a 2 degree angular step to a 1 degree angular step will make refinements take 4x longer to run, and will produce class averages with 1/4 as many particles. But how many projections are really necessary ?
This question is easiest to address in Fourier space. The projection theorem tells us that any projection image of a 3-D structure is simply a 2-D slice through the origin of the Fourier transform of the 3-D structure. That is, one can quickly compute projections of a map (with some nasty artifacts, unfortunately) by computing the FFT of a 3-D structure, extracting a 2-D slice, then computing the IFT of the slice. Conversely, when reconstructing a 3-D map, the FFT of all of the particles or class-averages can be computed, then inserted into an empty volume in Fourier space, each in the correct orientation. In practice, this process is a bit tricky and requires interpolation from a 2-D grid to a 3-D grid, which produces artifacts, and so on, however, the basic concept holds. Now, in Fourier space, radius corresponds to resolution. Higher radii correspond to finer levels of detail in the real-space 3-D structure. If you picture inserting slices in various orientations, it should be clear that as you get to higher and higher radii, the slices will cover less and less of the unit sphere. To truly achieve resolution X, we need to have complete coverage of this sphere with some level of sampling. The amount of sampling is determined by three factors:
- the box-size of our particle (in pixels)
- the radius in Fourier space of our targeted resolution
- How interpolation is performed in Fourier space (the so-called interpolation kernel)
These two images show how Fourier space is filled by slices (box size of 512x512x512):
|
|
Clearly the finer sampling produces better filling in Fourier space, but at 7.5 degrees with this box size, filling is far from complete near the edge of the box.
What we need to compute, then, is the Fourier shell coverage percentage for different situations. That is, what fraction of the shell is completely sampled by the given slices at a given radius (resolution). We can forget the actual resolution for a moment, and simply consider the geometrical problem of inserting slices with a 2-voxel wide interpolation function into a box at different N-voxel radii.
Extremely Conservative Approximation
Let's begin with an extremely conservative estimate of the number of projections we need. In this estimate, we required that Fourier space be 99.5% filled with at least 1.5 samples in every voxel at Nyquist frequency (the highest conceivable resolution for a given box size). While we could come up with some theoretical estimates for these values, the actual numbers will depend quite strongly on the specific algorithm used for slice insertion, so we take the experimentalist's approach and simply compute the coverage at Nyquist for different box sizes as a function of the number of uniformly distributed projections, requiring that each voxel have a summed weight of at least 1.5. This means we only consider a voxel to be 'covered' if its value has been contributed to by at least 2 pixels in the projection images. Doing this little computational experiment gives the following:
Each of these curves represents a different number of projections, and shows the %coverage as a function of radius in Fourier space. As you can see, due to thresholding and other effects, these curves are highly nonlinear. Using our selected threshold value of 99.5% coverage, we can then compute the sampling required to achieve full sampling at Nyquist, and using the relationship computed above, further convert that to an angular step:
Reassuringly, the first plot shows a linear relationship between the number of slices and the desired box size. This is equivalent to the original pi*D/d estimate based on tomographic sampling. The fact that this curve should be linear is simply based on the fact that we are sampling a 3-D space with a 2-D function, so the number of samples must increase linearly with size. What we are really computing here is the coefficient (slope) of this line for a specific set of algorithms. The relationship to angular step again becomes nonlinear, which is less convenient, but still a useful number to have when trying to conceptualize these results. Very often the angular spacing must also be specified to the reconstruction software.
This represents an extremely conservative estimate for the number of projections you need to generate for each box size. There should really never be any reason to use finer sampling than the numbers presented here in any practical reconstruction problem. While there are a few caveats to this statement related to preferred orientation, this result can be used as a limiting value, and save considerable wasted CPU time.
An Alternative Approximation
Overestimating the number of projections required for a reconstruction can lead to considerable waste in both human and computer time, as well as, in some cases, produce lower quality structures than could be achieved with more reasonable values. However, as it turns out, the difference between this alternative approximation and the conservative approximation only differs by roughly a factor of two in number of projections. For this modified estimate, we use a reduced requirement (0.5 weighted voxel conribution instead of 1.5 above) for samples contributing to voxels and we only require full sampling at 2/3 Nyquist rather than Nyquist.
For a typical refinement anything between this value (~1.5x the conservative angular step) and the conservative value itself should be a reasonable choice.
The one exception to this rule is cases where the data is massively oversampled. In most cases it is best to target A/pix = 3-4x the desired targeted resolution. That is, if the target resolution is 4 A, a sampling between 1 and 1.333 A/pix would be most appropriate. That would put the FSC resolution threshold between 1/2 and 2/3 Nyquist, and be nicely computationally efficient. However, if, for example, we were targeting 10 A resolution, and still used a 1 A/pix sampling, our FSC curve would cross threshold at 1/5 Nyquist, leaving most of Fourier space full of noise. In situations like this, the answer is not to decrease the angular step to achieve the full sampling described above, but rather to downsample the data before processing to bring it into a reasonable range.
Caveat
You may find, if using a deterministic projection generation scheme (the default EMAN2.1 method includes a small random factor, and is thus not strictly deterministic), it is possible to produce FSC curves which do 'funny' things, such as falling to zero as they should, then rising back to higher values at higher resolutions. While there are other effects, such as masking artifacts, which can also cause this behavior, a common cause is the pattern of missing data, or patterned insertion at high resolution caused by inadequate angular sampling. While there are often other approaches for correcting this problem, and it should not normally occur in EMAN2.1, decreasing the angular step can also reduce this issue.
Practical Advice
The number of projections you want to have should be roughly between 2.5x and 4.5x the box size divided by the symmetry. That is, if you have a box size of 128x128 with no symmetry, you should have 320 - 576 projections. You can convert these numbers to angles using the formula above ( sqrt(20602/#proj) ), meaning your angular step (delta=) should be between 6 and 8 degrees for a box size of 128, assuming you are targeting 2/3 Nyquist resolution, and your particle occupies the recommended amount of the box.