EMAN supports many image formats, and it does so transparently. That is, in general, any EMAN program can read any image in a wide variety of formats without you having to do anything special. EMAN currently supports reading SPIDER, IMAGIC, MRC, Gatan DM2, Gatan DM3, PIF and ICOS formats. TIFF images are now natively supported using libtiff, so you should be able to read 16-bit TIFFs directly. Most generic image formats like TIFF, GIF, PGM, BMP, PNG, etc. are also supported if you have the ImageMagick package installed on your machine. Because Gatan keeps changing the format, we cannot guarantee that DM3 file reading will be perfect.
For image writing, EMAN supports most of the above formats as well. However, most EMAN programs default to IMAGIC format (for 2D) and MRC format (for 3D). To convert to a different format, use 'proc2d' for 2D images and 'proc3d' for 3D images.
Some of you may also be aware of the 'byte ordering' issue. Different machines (SGI vs Intel, for example) store their numbers in the opposite byte-order. Often this means files generated on one machine will be unreadable on machines using the opposite convention. However, EMAN handles this problem as well. Any supported image can be read regardless of byte-order. When writing images EMAN uses the native byte order of the machine the software is being run on.
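To make the byte-order issue concrete, here is a minimal Python sketch of how a reader can cope with files written under either convention. This is not EMAN's actual code. The function name, the 4-byte integer header field and the "plausible image dimension" heuristic are all assumptions made for illustration.

```python
import struct
import sys

def detect_byte_order(raw, expected_max=32768):
    """Guess the byte order of a 4-byte integer header field.

    Hypothetical heuristic (an assumption, not EMAN's method): an image
    dimension is a small positive number, so whichever interpretation
    falls in 1..expected_max is taken to be the correct one.
    """
    little = struct.unpack("<i", raw)[0]
    big = struct.unpack(">i", raw)[0]
    if 1 <= little <= expected_max:
        return "little"
    if 1 <= big <= expected_max:
        return "big"
    return "unknown"

# The same value (512) as written by machines with opposite conventions.
from_intel = struct.pack("<i", 512)   # little-endian bytes
from_sgi = struct.pack(">i", 512)     # big-endian bytes

print(detect_byte_order(from_intel))
print(detect_byte_order(from_sgi))
print("this machine is %s-endian" % sys.byteorder)
```

A reader that knows the byte order of the file can then unpack every value with the matching "<" or ">" prefix, regardless of the machine it is running on.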
The documentation really needs to address this, but doesn't. There are two reasons for this, though. First, it is really difficult to describe this adequately textually. Second, you really need to have a sound understanding of the mathematics being used in CTF correction to use this method properly and avoid doing bad things to your structure (without realizing it). That said, I've found that it may not be quite so bad as I make it out to be.
One other note. Many people (myself included) have suggested generating a structure factor curve computationally from a PDB structure of a similar protein. As it turns out, this is a very difficult thing to do, largely because solvent effects have a profound effect on the overall shape of this curve. Current software (2003) used by the solution scattering community can accurately predict peak locations, etc., but it doesn't have the correct overall shape, and should not be used for EM work. Perhaps this situation will improve in the future.
Still, there is a way to get the necessary curve. It isn't perfect, but it's probably adequate in most cases. The basic idea is to use several sets of particles from images at different defocuses. You then simultaneously fit the CTF of these data sets such that the CTF curve is a reasonable fit, and simultaneously the predicted structure factor for all of the curves matches pretty well at low resolution. This process must be done manually using ctfit, but once you have a result, you should be able to do most of your fitting with the automated program 'fitctf'.
The optimal way to approach this problem is to have some sort of solution scattering curve on hand. This curve is simply used for scaling the data and for getting a general idea of a reasonable B-factor and amplitude contrast. It will not impose its features on the final structure factor. It is also not strictly necessary; it is possible to proceed without one. The 'groel.sm' curve (native GroEL structure factor) is probably adequate for most cases. Then do the following:
Good questions! To find out how many particles were included in the class-averages, type (for example) 'iminfo classes.4.hed all'. The last number on each line is the number of particles included in that class-average. At the end a total number of particles included in the classes file will be shown.
Now this is where it gets tricky. If all of the class-averages were used in the reconstruction, you'd be done. However, some class-averages may get excluded (depending on the value you select for hard=). In addition, if you use 3dit= or 3dit2=, some class-averages may get excluded. However, they are not necessarily the same class averages that are excluded from the original make3d reconstruction. make3d will output how many original particles were included in the final reconstruction as part of its output on the screen. This is probably the best answer you'll get, but it isn't stored anywhere. Generally when I talk about the number of particles used in a reconstruction, I'm referring to the 'iminfo classes.4.hed all' method.
The next part is a little trickier. A complete record of particles excluded from the class-averages is kept (along with classification information) for all iterations, in 'particle.log'. This file has a variety of different information in it, depending on the first character of the line. Lines starting with 'X' indicate excluded particle numbers. If going through this file is too much of a pain, you can rerun the 'classalignall' command with the 'badfile' option. This will create a set of files containing the excluded images for whatever options you provide to classesbymra.
Note that the particle.log file can also be used to recreate the 'cls' files from any particular iteration using the 'clsregen' command.
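If you do want to go through particle.log by hand, a small filter can help. The sketch below relies only on the one fact stated above, that lines starting with 'X' record excluded particles. The function name is hypothetical, and the sample lines are made up purely to exercise the filter. The real layout of the rest of each line may differ, so the remainder is returned as raw text rather than parsed.

```python
def excluded_lines(log_lines):
    """Return the remainder of every line whose first character is 'X'.

    Nothing is assumed about what follows the 'X'; the text is
    returned as-is (minus surrounding whitespace).
    """
    return [line[1:].strip() for line in log_lines if line.startswith("X")]

# Made-up sample lines, NOT real particle.log content.
sample = ["X 12", "C 3 40", "X 57", ""]
print(excluded_lines(sample))

# With a real file (assuming it is in the current directory):
# with open("particle.log") as f:
#     for entry in excluded_lines(f):
#         print(entry)
```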
Not exactly. The curve from your data (the power spectrum) is equal to noise + ctf * envelope * structure factor. The curve you're fitting with is just noise + ctf * envelope. The structure factor is missing, and it is very important. This problem can be tackled in three ways, described at http://ncmi.bioch.bcm.tmc.edu/~stevel/EMAN/doc/ctfc/ctfc.html . If you're in the situation where you don't have x-ray data (which you presumably don't), the best results are generally achieved via the following process:
1) You will need several data sets at different defocuses.
2) read the first data set into CTFIT
3) Set the 'amp' to zero, then use the 4 noise sliders to fit the
background by passing through the zeros of the ctf, and matching the
high-resolution end of the curve (where the zeros are no longer visible).
4) Increase the amplitude and adjust the defocus and envelope function as
best you can. You should be able to determine the defocus quite
accurately. The envelope function is somewhat arbitrary.
5) Read in the second data set, and repeat this process for it.
6) Bring up a second plot window, and set it to display the structure
factor. This will show the structure factor for each displayed data set,
calculated from the data. This is not a very accurate calculation, but
it's generally good enough.
7) Now, without spoiling the fitting you've just done, adjust all of the
parameters of the 2 data sets such that the structure factor curves match
as well as you can. Don't worry too much about the divergence at high
frequency. Work on getting a match out to the first or second zero, then
just try to get the general trend at high frequency to be the same.
8) Continue to add the other data sets in the same way.
9) When you've got them all fitted satisfactorily, use the 'phase
correct' option as described in the instructions.
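To see why the structure factor can be estimated once the noise, ctf and envelope have been fitted, rearrange the relation given at the start of this answer: structure factor = (power spectrum - noise) / (ctf * envelope). The Python sketch below does just that, point by point. It is only an illustration of the arithmetic, not what ctfit does internally. The function name, the 'min_signal' cutoff and all the numbers are assumptions.

```python
def structure_factor(power, noise, ctf, envelope, min_signal=1e-3):
    """Solve power = noise + ctf*envelope*sf for sf at each point.

    All arguments are equal-length lists sampled at the same spatial
    frequencies. Near a zero of the ctf the division is meaningless,
    so None is returned for that point (min_signal is an arbitrary,
    assumed cutoff).
    """
    result = []
    for p, n, c, e in zip(power, noise, ctf, envelope):
        signal = c * e
        if abs(signal) < min_signal:
            result.append(None)
        else:
            result.append((p - n) / signal)
    return result

# Synthetic numbers, chosen only so the answer is easy to check by hand.
noise = [1.0, 1.0, 1.0]
ctf = [0.5, 0.0, 0.8]
envelope = [1.0, 0.9, 0.5]
true_sf = [4.0, 2.0, 1.0]
power = [n + c * e * s for n, c, e, s in zip(noise, ctf, envelope, true_sf)]

print(structure_factor(power, noise, ctf, envelope))
```

The middle point sits on a ctf zero and cannot be recovered from this data set alone. That is why several data sets at different defocuses are needed: their zeros fall at different frequencies.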
Gosh, I hope so! Seriously, there are a few issues to be aware of. First, EMAN generally keeps the entire set of particles in a single image file stack. A really big reconstruction might cause this file to become bigger than 2 gigabytes, which is a problem for some parts of certain operating systems and a lot of other software packages. While EMAN does the proper things to support files larger than 2 GB, sometimes the underlying system still won't allow it. However, there is a good workaround. EMAN supports a special file format called the LST format. This format is basically a text file containing a list of images in other files. For example, if you have too much data to put into a single start.hed/img file, do the following:
You now have 2 files (start.hed/img) which appear to EMAN to be equivalent to a big IMAGIC file, without any of the normal 2 GB limitations. Note, however, that you cannot add new images to this file with proc2d. It isn't really an image file.
There are other issues. Some are of concern when there are a lot of pixels in each image and others are of concern when there are a lot of images. In the first case, memory on the computer is the biggest problem. For example, if you were trying to reconstruct a 512x512x512 volume, each volume dataset requires 512 MB. Several programs require enough memory for 2 or 3 3D models. So any machine used to process this dataset would need to have at least 2 GB of RAM. There are too many issues involved to cover all possibilities here. In general, I'd say yes, EMAN can handle really big problems. If you run into problems, email me, and I'll try to help you resolve them.
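The 512 MB figure above is simple arithmetic, and the same calculation works for any box size. The sketch below assumes 4-byte floating-point voxels, which is what reproduces the 512 MB number. The function name is hypothetical.

```python
def volume_megabytes(box_size, bytes_per_voxel=4):
    """Memory for one cubic volume, in MB (1 MB = 1024*1024 bytes).

    bytes_per_voxel=4 assumes single-precision floating-point voxels.
    """
    voxels = box_size ** 3
    return voxels * bytes_per_voxel / (1024.0 * 1024.0)

one_volume = volume_megabytes(512)
print(one_volume)          # memory for a single 512x512x512 volume
print(3 * one_volume)      # a program holding 3 such volumes at once
print(volume_megabytes(256))
```

Three 512x512x512 volumes already come to 1536 MB before the operating system and the images themselves are counted, which is where the 2 GB recommendation comes from. Halving the box size cuts the requirement by a factor of 8.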
The answer depends on the source images. EMAN reads most EM file formats directly. If you have each particle in a separate image file, for example, img001.img, img002.img, etc., then the following command would do it (csh):
foreach i (img*.img)
proc2d $i start.hed
end
-or- (zsh)
for i in img*.img
do
proc2d $i start.hed
done
It doesn't matter what file format the source images are in. Any EMAN program transparently reads any supported image type (byte order doesn't matter either). If the images are already in a Spider stack file called, for example, part.spi, the following would do it:
proc2d part.spi part.hed
All of the 2D EMAN commands currently write Imagic files by
default. 3D commands write MRC format by default. There are options in
proc2d and proc3d for writing to other file formats.
Tricky. There is a possibility that the answer is "you can't". In most cases, however, it's possible to get a pretty accurate answer. In cases where the symmetry of the particle is unknown, the ability to distinguish between different symmetries is proportional to the overall contrast in the image. In cryo-EM there is always a tradeoff between contrast and resolution, so the best thing to do if you're trying to determine symmetry is exactly the opposite of what you'd do for high resolution. That is, take some micrographs in negative stain, or in ice, fairly far from focus at low voltage. This will provide the best overall contrast for an attempt to determine symmetry.
Once you've collected high-contrast data, there are a number of techniques to try to determine symmetry. For particles with a suspected Cn or Dn symmetry, startcsym is a good starting point. By running it several times, once with each possible symmetry, you can see how well each one fits the data. Frequently, comparing the symmetrized model in sym.hed with the class-averages in classes.hed will give the first indications of the true symmetry.
The next step is to try to refine each of the possible initial models and see if they 'fall apart' during refinement. This should resolve the symmetry question IF you have sufficient contrast, and IF your particles are in fairly random orientations. If the contrast is too low, or there is a strongly preferred orientation, however, an accurate answer may not be possible.
If the first technique fails, there are other possibilities, like using multivariate statistical analysis on an aligned set of raw particles. These issues are too complicated for discussion in this FAQ.
No, the docs don't really explain all of the text output at this point. I can tell you what the numbers are, but I don't think it's going to help you very much. While you may be able to judge the quality of an individual particle when compared to a good model using the quality factor, the numbers really won't tell you what you're trying to find out. There are just too many variables involved.
If you're anxious that things are going too slow, the best approach is to increase the angular step for the first couple of refinement iterations. For an asymmetric model you could go as high as 15 or 18 degrees for the first round or two. That should be enough to tell you if the model is reasonable.
As we tried to impress in the documentation, asymmetric models can be very tricky. It depends on their overall shape. If, for example, you have something 'L' shaped, then getting a good starting model shouldn't be difficult at all. However, if you have something that's basically round with a few lumps, it may actually be impossible to generate an unambiguous, accurate starting model. It is actually possible to have a set of random projections which can produce several DIFFERENT models, all of which are consistent with the data at some resolution.
StartAny uses c1startup, so no, there's no difference. The routine it uses isn't all that great. For 'easy' models it will work pretty well, but in tough cases, it may just come up with something completely wrong. In these cases, there are really only two good solutions in EMAN right now:
1) if your model may have a pseudosymmetry, ie - it's vaguely cylindrical in shape or something, you can often use the startcsym routine and get something that's good enough to start.
2) Final resort. Use tomography. If you're comfortable with it, then you might actually start here. Anyway, the idea is simple enough: take a tilt series (you will probably have to use stain or glucose for this). EMAN has a few experimental programs for generating a 3D model from the tilt series and aligning/averaging several such 3D models to generate a starting model. Even this approach isn't perfect (at least the simple implementation EMAN uses isn't), but we have used it with some success on a few projects.
To answer your question anyway: the output from classesbymra looks like:
0 -> 256 (506.86)
1 -> 296 (508.74)
2 -> 278 (502.86)
3 -> 273 (504.82)
The first line is saying that particle 0 (the first one in start.hed) looked the most like projection number 256. The quality factor was 506.86. The interpretation of the quality factors can depend on the shape of your model and the box size, etc.
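If you want to look at these numbers in bulk, the lines are easy to parse. The Python sketch below handles only the three-field layout shown in the sample above (particle number, projection number, quality factor) and skips anything else. The function and variable names are hypothetical, and this is not part of EMAN.

```python
import re

# particle -> projection (quality), as in the sample output above
LINE_RE = re.compile(r"^\s*(\d+)\s*->\s*(\d+)\s*\(\s*([-+0-9.eE]+)\s*\)\s*$")

def parse_classification(lines):
    """Return a list of (particle, projection, quality) tuples.

    Lines that do not match the 'N -> M (Q)' pattern are ignored.
    """
    results = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            results.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
    return results

sample = """0 -> 256 (506.86)
1 -> 296 (508.74)
2 -> 278 (502.86)
3 -> 273 (504.82)""".splitlines()

for particle, projection, quality in parse_classification(sample):
    print(particle, projection, quality)
```

Remember that the quality factors depend on the shape of your model and the box size, so they are mainly useful for comparing particles within a single run.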
import EMAN
help(EMAN.EMData)
help(EMAN.Euler)
Here's the script:
# This reads a text file with space-separated Euler triplets
# and generates projections
import math
from EMAN import *

infile=open("eulers.txt","r")
lines=infile.readlines()
infile.close()

# Ok, this next line is not all that transparent, there
# are other ways to do this, but it is a useful construct
# converts a set of input lines into a list of tuples
eulers=map(lambda x:tuple(map(lambda y:float(y)*math.pi/180.0,x.split())),lines)

e=Euler()

# read the volume data
data=EMData()
data.readImage("model.mrc",-1)

for euler in eulers:
    e.setByType(euler,Euler.MRC)
    # -4 is the best real-space projection mode
    out=data.project3d(e.alt(),e.az(),e.phi(),-4)
    out.writeImage("proj.hed",-1)    # file type determined by extension
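As the comment in the script admits, the nested map/lambda line is not very transparent. One of the "other ways to do this" is an explicit loop, sketched below in plain Python with no EMAN calls. The function name is hypothetical. Unlike the one-liner, this version skips blank lines instead of turning them into empty tuples, which is a deliberate (assumed) design choice.

```python
import math

def parse_eulers(lines):
    """Convert lines of space-separated angles in degrees into
    a list of tuples of angles in radians."""
    eulers = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue    # skip blank lines
        eulers.append(tuple(math.radians(float(f)) for f in fields))
    return eulers

# Example input in the same form as eulers.txt, one triplet per line.
example = ["0 90 180", "", "45 0 0"]
for triplet in parse_eulers(example):
    print(triplet)
```

The result can be used in place of the 'eulers' list in the script above.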