Welcome to EMAN. This document will take you through the first steps of performing a single particle reconstruction. Please note that these instructions are designed to be a starting point only. While you should be able to achieve a reasonably good initial reconstruction by following them, there are many subtleties involved in single particle reconstructions, and most of the refinement commands have a wide range of options for specific situations. If you run into problems, don't be shy about emailing for help (sludtke@bcm.tmc.edu). User feedback is how we improve the software.
By way of introduction for beginners, a few EMAN basics:
You didn't specify a particle box size. This is required for most of the memory evaluation. When you decide what box size to use in step 1, press the step 1 button again and fill this value in.
Generally 3x oversampling is suggested, so, with a target resolution of 25 A, the optimal A/pix would be about 8.3 (25/3). If your final resolution goal is somewhere around 8 A, and you are simply starting with a low resolution, this is fine. Otherwise you should consider rescaling the data closer to the optimal sampling. This can make a dramatic improvement in reconstruction speed, and in some cases will even improve the final model. The boxed out particles can be reduced in size with proc2d and the shrink option if desired.
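The sampling arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 2.8 A/pix scan value is made up for the example, and the 3x oversampling figure is the rule of thumb quoted above rather than a hard limit.

```python
def optimal_apix(target_resolution, oversampling=3.0):
    """Suggested sampling (A/pixel) for a target resolution given in A."""
    return target_resolution / oversampling


def shrink_ratio(current_apix, target_resolution, oversampling=3.0):
    """Ratio of suggested sampling to current sampling.

    Values well above 1 mean the data is sampled more finely than needed;
    values <= 1 mean no shrinking is called for.
    """
    return optimal_apix(target_resolution, oversampling) / current_apix


if __name__ == "__main__":
    print(round(optimal_apix(25.0), 1))        # 25 A target -> 8.3 A/pix
    print(round(shrink_ratio(2.8, 25.0), 2))   # hypothetical data at 2.8 A/pix
```

A ratio near 3 would suggest shrinking the particles by roughly that factor. Check the proc2d documentation for the exact behaviour of its shrink option before relying on this.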
The individual particles must be located in each micrograph. In EMAN, the program for doing this is called 'boxer'. Note that this program uses a lot of memory (it loads the entire image into memory). To determine how much memory an image will require, type:
iminfo imagefile
Your machine should have about 1.5-2 times this much memory. If your machine has less than this much memory, do NOT run the following command, or your machine will begin to swap heavily, and will generally become very slow. If you DO have enough memory, run the following command:
boxer imagefile
On an SGI, you can find out how much memory your machine has with the 'hinv' command. On Linux machines, just 'cat /proc/meminfo'. If your machine doesn't have enough memory, boxer may allow you to split the image into pieces. See the boxer documentation for more info on this, and for detailed instructions on using boxer. In many cases particles can be selected semiautomatically, using programs like 'batchboxer' or internal features in 'boxer'. At this point we'll stick to manual particle selection.
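The memory check described above can be sketched as follows, assuming a Linux machine. The function names are hypothetical, and image_bytes is the figure you read off iminfo yourself, converted to bytes. The sample text stands in for the real contents of /proc/meminfo.

```python
def parse_memtotal_bytes(meminfo_text):
    """Return MemTotal in bytes from the text of /proc/meminfo (Linux)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like: "MemTotal:       16316412 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def safe_to_run_boxer(image_bytes, physical_bytes, factor=2.0):
    """True if physical memory is at least `factor` times the image size.

    factor=2.0 is the cautious end of the 1.5-2x range given above.
    """
    return physical_bytes >= factor * image_bytes


if __name__ == "__main__":
    # Sample text; on a real machine use open("/proc/meminfo").read()
    sample = "MemTotal:        2048000 kB\nMemFree:          512000 kB\n"
    mem = parse_memtotal_bytes(sample)
    print(safe_to_run_boxer(800 * 1024 * 1024, mem))   # an 800 MB micrograph
```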
Select the particles from each micrograph using boxer. When you save the particles, use a different file for the particles from each micrograph. They will be combined later. At this point, you also want to use a box size that's about 25-50% larger than your particles. That is, if the smallest box that you could possibly use is 80 pixels, you should start with a box size of 100-120 pixels. Technically, any box size can be used, but things will run faster if the box size is a multiple of 8, or at least 4. Some good sizes are: 32, 40, 48, 56, 64, 72, 80, 96, 112, 128, and 160. Do not use a box size smaller than 32 pixels.
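As a rough illustration of the box size rule above, the following sketch picks candidates from the listed sizes that fall 25-50% above the tightest possible box. The function name and the fallback behaviour are assumptions made for this example.

```python
GOOD_SIZES = [32, 40, 48, 56, 64, 72, 80, 96, 112, 128, 160]  # sizes listed above


def suggest_box_sizes(tightest_box, low=1.25, high=1.50):
    """Listed sizes that are 25-50% larger than the tightest possible box."""
    lower, upper = tightest_box * low, tightest_box * high
    picks = [s for s in GOOD_SIZES if lower <= s <= upper]
    if not picks:
        # Fall back to the next listed size above the lower bound, if any.
        picks = [s for s in GOOD_SIZES if s >= lower][:1]
    return picks


if __name__ == "__main__":
    print(suggest_box_sizes(80))   # tightest box of 80 pixels -> [112]
```

For the 80 pixel example in the text, the 100-120 pixel range contains only one of the listed sizes, 112. Other multiples of 8 in that range (104, 120) would also satisfy the rule.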
This is currently the most difficult part of performing a reconstruction in EMAN, largely because it is one of the most important when a high resolution reconstruction is the goal. You should probably read the manual section on CTF Correction to get the proper background before trying to proceed.
There are two methods for determining CTF parameters in EMAN. The 'classic' method is to use 'ctfit' to manually determine the parameters of each micrograph. The other method is fully automatic, but requires that you have a 1D structure factor file for your molecule.
The trick, of course, is where to get a 1D structure factor. Unfortunately this isn't a trivial question. The first and best option is to perform an x-ray solution scattering experiment at a small-angle x-ray scattering beamline. Of course, this requires a substantial amount of protein at fairly high concentrations, as well as time on an appropriate beamline. Since this won't be an option in most cases we need to look for alternative methods.
As it happens, if you take a sampling of different structures from the PDB, generate 1D structure factors from each, and plot them all on a semilog scale, you immediately notice that at resolutions higher than ~15 A, all proteins have a very similar structure factor. There will be subtle variations depending on the secondary structure distribution of the protein, but the overall shape of the curves at high resolution is remarkably consistent. The low resolution section of the curve, by contrast, varies considerably with the tertiary and quaternary structures of the protein. It is possible, using ctfit, to manually fit the power spectra from several micrographs collected at different defocuses simultaneously. This is not a trivial process to explain. If you wish to attempt this, contact sludtke@bcm.tmc.edu for assistance.
Run ctfit and make sure the microscope voltage, Cs and A/pix values are correct. Then, using the 'Read Clip Set' item in the 'File' menu, read the particles from each micrograph into ctfit. Alternatively, you can invoke ctfit with a list of image files to open when you run the program. Each file you read will appear as a separate item in the list in the upper left of the control panel. For each file displayed in this list, 2 lines will be drawn in the plot window. One line will be smooth, and one line will be somewhat jagged. The smooth line represents the current CTF model based on the parameters set with the top 9 sliders in the control panel. The 'jagged' line represents the power spectrum of the images you read in. You will probably want to read the manual section on CTF parameter determination in ctfit before proceeding.
You will now need to determine the 8 CTF parameters for each micrograph. This is a nontrivial process, and is difficult to describe. The best description currently exists in the ctfit documentation mentioned above. The suggested method is to use x-ray scattering data for your specimen if you have it. Of course, you probably don't, in which case, for optimal results, you'll need to generate a predicted structure factor from your data. We used to suggest preparing a structure factor from a PDB file (and this technique may still be suitable for some uses). However, in the current EMAN version we suggest making more 'aggressive' use of the structure factor file, and structure factors computed from PDB files suffer from significant solvation artifacts. We are not currently aware of any software that can solve this problem. The best current technique is to perform simultaneous fitting of 3-5 micrographs to produce a reasonable low-resolution structure factor, then combine this with a canonical solution scattering curve using sfmerge.py. Again, the details of this process are beyond the scope of this document. You might consider looking through the material from the Dec 2002 EMAN workshop as well.
EMAN does not currently do astigmatism correction, so if some images are astigmatic or have a significant amount of drift, they should be excluded from the reconstruction.
Once the parameters have been determined, highlight the first data set, and use the 'Phase Correct' item on the 'Process' menu. Repeat this process for the other data sets. This will generate a new file for each data set with '.fix' inserted in the name. All of the data in these files has now been phase corrected, and the CTF parameters you determined have been stored in the headers of each particle image. You will now use these '.fix' images for the remainder of the processing.
Note: there is an alternative to using 'Phase Correct' on each image manually. A command called applyctf can perform the same function. Just invoke applyctf with the 'flipphase setparm' options, along with one method of specifying the CTF parameters. Please read the program documentation before attempting this.
At this point, you should also make a note of the maximum resolution of your images. One of the 8 parameters you determined for each image is the envelope function width (which can also be displayed as a B factor). When displayed as 'Envelope', this number represents approximately the highest resolution you are likely to achieve in a reconstruction using this data set. Record the average value of this number for a few of the close to focus images for use in step 2.
This is a simple step. Take all of the image files you are going to use in your reconstruction and combine them into a file called 'start.hed'. For example, if you have data files: 2345.fix.hed, 2346.fix.hed and 2347.fix.hed, you would do (proc2d appends to output files):
rm start.hed start.img
proc2d 2345.fix.hed start.hed
proc2d 2346.fix.hed start.hed
proc2d 2347.fix.hed start.hed
Just to keep things neat, at this point you might want to make a subdirectory for all of the raw data, e.g.:
mkdir raw-data
mv * raw-data     (ignore the warning message this produces)
mv raw-data/start.* .
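With many micrographs, typing one proc2d line per file gets tedious. A minimal sketch that only builds the command lines for the combining step (it does not run them) might look like this. The function name is hypothetical, and 'rm -f' is used in place of plain 'rm' so that a missing start.hed is not an error.

```python
def combine_commands(inputs, output="start.hed"):
    """Build the shell commands that merge several particle files.

    Relies on the behaviour noted above: proc2d appends to an existing
    output file, so any old output is removed first.
    """
    stem = output[:-4] if output.endswith(".hed") else output
    cmds = ["rm -f %s.hed %s.img" % (stem, stem)]
    cmds += ["proc2d %s %s" % (name, output) for name in inputs]
    return cmds


if __name__ == "__main__":
    for cmd in combine_commands(["2345.fix.hed", "2346.fix.hed", "2347.fix.hed"]):
        print(cmd)
```

Printing the commands first, then pasting them into a shell once they look right, keeps the step easy to audit.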
Note: There is a widespread problem with single files larger than 2 GB in size. While EMAN contains the necessary code to deal with such files, in many cases the operating system may have problems. So, if you suspect that your image data may exceed this size, it is a good idea to use the workaround described in the EMAN FAQ.
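To judge whether the combined file is likely to cross the 2 GB mark, a back-of-the-envelope estimate is usually enough. The sketch below assumes square particles stored as 32-bit floating point values (4 bytes per pixel). That storage format is an assumption, so adjust bytes_per_pixel if your files differ. The function names are hypothetical.

```python
TWO_GB = 2 ** 31  # bytes


def stack_bytes(n_particles, box_size, bytes_per_pixel=4):
    """Approximate size of the image data for a stack of square particles.

    bytes_per_pixel=4 assumes 32-bit floating point pixels (an assumption).
    """
    return n_particles * box_size * box_size * bytes_per_pixel


def exceeds_2gb(n_particles, box_size, bytes_per_pixel=4):
    """True if the estimated image data is larger than 2 GB."""
    return stack_bytes(n_particles, box_size, bytes_per_pixel) > TWO_GB


if __name__ == "__main__":
    print(stack_bytes(10000, 128))   # 10,000 particles in 128 pixel boxes
    print(exceeds_2gb(40000, 128))   # 40,000 particles in 128 pixel boxes
```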
Near the origin in Fourier space, there is a very strong component due to the structure factor and incoherent scattering. This term is so strong that interpolation errors here could potentially interfere with alignment in the reconstruction process. However, EMAN now contains built in factors that largely compensate for this effect. Nonetheless, some people may like to apply a small high-pass filter to the particles before reconstruction. Usually 1 pixel is sufficient:
proc2d start.hed start.hed hp=1 inplace [invert]
The hp option does high-pass filtering, and the inplace option tells proc2d not to append to the output file, but to overwrite the input images in the same location in the file. If necessary, add the invert option to reverse the density of your particles. EMAN assumes that positive values (white) indicate high density. For cryo images, that means the protein should appear white against the water background. Use invert if your protein looks darker than the background.
At the end of the reconstruction, the unhp= option in proc3d can be used to undo the highpass filter.
An alternative to this centering technique is to use the new multireference-based automatic boxing routine in 'boxer', which does a pretty good job of centering. Unfortunately, generating the appropriate references requires a preliminary 3D model. If you have a preliminary model, you can use makeboxref.py to generate the necessary references. If you decide to use cenalignint, run:
cenalignint start.hed mask=<mask> [frac=<num>/<denom>]
This program will read ALL of the particles into memory, and effectively make 2 copies of each. That means your computer should have 3 times as much physical memory as the size reported by 'iminfo start.hed'. If this is not the case, you should use the frac=<n>/<d> option. This causes only a fraction of the data to be processed in each run. For example, if you have 1/3 as much memory as you need, you'd do:
cenalignint start.hed maxshift=<max> frac=0/3
cenalignint start.hed maxshift=<max> frac=1/3
cenalignint start.hed maxshift=<max> frac=2/3
Replace <max> with the maximum shift, in pixels, that should be used to center the particles. If you don't specify one, 1/4 of the box size will be used. If the particles are already fairly well centered, using a small value here will prevent erroneous centering with large translations.
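The choice of denominator for the frac option follows directly from the 3x memory rule above. The sketch below works out how many pieces are needed and prints the matching command lines. The function name is hypothetical, and the sizes are whatever you read from iminfo and your system, in bytes.

```python
def cenalign_commands(data_bytes, physical_bytes, maxshift, copies=3):
    """Commands for centering start.hed in pieces that fit in memory.

    copies=3 reflects the note above that about 3x the data size is needed.
    """
    needed = copies * data_bytes
    # Ceiling division: number of pieces so that each piece fits in memory.
    pieces = max(1, -(-needed // physical_bytes))
    if pieces == 1:
        return ["cenalignint start.hed maxshift=%d" % maxshift]
    return ["cenalignint start.hed maxshift=%d frac=%d/%d" % (maxshift, i, pieces)
            for i in range(pieces)]


if __name__ == "__main__":
    # 900 MB of particles on a machine with 900 MB of memory: 1/3 of what is needed
    mb = 1024 * 1024
    for cmd in cenalign_commands(900 * mb, 900 * mb, maxshift=8):
        print(cmd)
```

The example reproduces the three frac=0/3, 1/3, 2/3 runs shown above. Leaving some headroom for the operating system (for example by passing a smaller physical_bytes) is probably wise.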
This program will generate 3 new image files: ali.img contains the centered particles after processing. bad.img contains the particles that were rejected because of ambiguous alignment. Finally, avg.img contains the average images after each iteration of the alignment. This third file can be examined to determine the size of your particle. Find the radius a pixel or two outside the outermost whitish ring in the last image in avg.hed. This is the mask radius you should use from here on. Go back to 'step 1' in eman and enter the correct value if you haven't already.
Once you're satisfied with the results of the centering, copy ali.hed/img over start.hed/img. If you're concerned about being able to retrace your steps, you may wish to make a copy of start first. The main reason for this step is to get the centering good enough that you can reduce the box size somewhat. You probably still want to leave about 15% padding around your particle. So, if your maximum particle dimension is 64 pixels, and you used a 100 pixel box, you might reduce this to 80 pixels now (remember this number should be divisible by 8), like so:
proc2d ali.hed start.hed clip=80
rm ali* avg* bad*
Note that it's not necessary to get perfect centering, just good enough so the particles don't get chopped off at the edge of the box. The smaller box size is very important for speed. A 20% box size reduction may mean as much as a factor of 2 increase in reconstruction speed.
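The clip value in the example can be derived from the particle size and the suggested padding. The sketch below does that arithmetic. The function names are hypothetical, and the cubic scaling used for the speed estimate is an assumption chosen because it roughly matches the "factor of 2" figure quoted above, not a measured EMAN property.

```python
import math


def clipped_box_size(particle_size, padding=0.15, multiple=8):
    """Smallest multiple of `multiple` leaving `padding` around the particle."""
    return int(math.ceil(particle_size * (1.0 + padding) / multiple)) * multiple


def rough_speedup(old_box, new_box, exponent=3):
    """Speed ratio if run time grew as box_size**exponent.

    The exponent is an assumption, not a measured EMAN figure.
    """
    return (float(old_box) / new_box) ** exponent


if __name__ == "__main__":
    new_box = clipped_box_size(64)                # 64 pixel particle -> 80
    print(new_box)
    print(round(rough_speedup(100, new_box), 2))  # 100 -> 80 pixel box
```

For the 64 pixel particle in a 100 pixel box, this gives the 80 pixel box used in the command above. Under the cubic assumption, the 20% reduction works out to a speedup of a little under 2.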