There are several possible causes for this. The first possibility is that your particles in start.hed and your initial 3D model in threed.0a.mrc aren't the same size. Do an 'iminfo start.hed' and 'iminfo threed.0a.mrc' and make sure they're the same size in pixels.
Aside from that, by far the most likely problem is that you used the 'ctfc=' or 'ctfcw=' options in the 'refine' command but didn't properly prepare the input particles with 'ctfit', 'fitctf' and/or 'applyctf'. The answer here is RTFM (read the _ manual). If you want to see whether your particles are prepared for CTF correction, run the 'eman' file browser and look at the start.img file. In the text box below the image, you should see something like:
'!--2.38 252 1.22 0.15 0 6.84 2.54 1.43 400 4.1 2.7'
Your numbers will be different, of course, but there should be a row of numbers beginning with '!-'. If you don't see this, then your images haven't been properly phase flipped.
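If you'd rather check this programmatically than in the file browser, the marker is easy to test for. A minimal sketch (a hypothetical helper, not an EMAN API; the meaning of each field varies between EMAN versions, so this only detects and splits the row of numbers):

```python
def has_ctf_params(comment):
    """Return the list of CTF numbers if an IMAGIC comment string starts
    with the EMAN '!-' marker, else None.  Field meanings are
    version-dependent, so this only detects/parses the row of numbers."""
    if not comment.startswith("!-"):
        return None
    # drop the leading '!-' marker; the first number may itself be negative,
    # which is why the example above reads '!--2.38'
    return [float(x) for x in comment[2:].split()]

params = has_ctf_params("!--2.38 252 1.22 0.15 0 6.84 2.54 1.43 400 4.1 2.7")
```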
EMAN supports a lot of different formats, and it does so transparently. That is, in general, any EMAN program can read any image in a wide variety of formats without you having to do anything special. EMAN currently supports reading SPIDER, IMAGIC, MRC, Gatan DM2, Gatan DM3, PIF and ICOS formats. TIFF images are now natively supported using libtiff, so you should be able to directly read 16-bit TIFFs. Most generic image formats (TIFF, GIF, PGM, BMP, PNG, etc.) are also supported if you have the ImageMagick package installed on your machine. Because Gatan constantly changes things, we cannot guarantee that DM3 file reading will be perfect.
For image writing, EMAN supports most of the above formats as well.
However, most EMAN programs default to IMAGIC format (for 2D) and MRC
format (for 3D). To convert to a different format, use 'proc2d' for 2D
images and 'proc3d' for 3D images.
Some of you may also be aware of the 'byte ordering' issue. Different machines (SGI vs. Intel, for example) store their numbers in opposite byte orders. Often this means files generated on one machine will be unreadable on machines using the opposite convention. However, EMAN handles this problem as well. Any supported image can be read regardless of byte order. When writing images, EMAN uses the native byte order of the machine the software is being run on.
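The idea behind this kind of transparent reading is simple enough to sketch. This is a generic illustration of the sanity-check approach (not EMAN's actual source): read a header field in one byte order and, if the value is absurd, swap.

```python
import struct

def mrc_dims(header_bytes):
    """Return (nx, ny, nz) from the first 12 bytes of an MRC header,
    auto-detecting byte order: if the little-endian reading gives
    implausible dimensions, fall back to big-endian."""
    nx, ny, nz = struct.unpack("<3i", header_bytes[:12])
    if not all(0 < d < 100000 for d in (nx, ny, nz)):
        nx, ny, nz = struct.unpack(">3i", header_bytes[:12])
    return nx, ny, nz
```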
This can be done in EMAN, though it
doesn't use rotational power spectra. Real-space approaches are more
accurate, though proper centering is critical. Past attempts at the
rotational power spectrum approach (on several test cases) showed it to be
unreliable and imprecise.
First, center the particles:
cenalignint particles.hed maxshift=<pixels>
(Warning: this can use a lot of memory. You should have 3x as much RAM as the size of the file you operate on. If not, use the frac= option.)
- or -
proc2d particles.hed centered.hed <center | acfcenter>
One of those three should do a decent job centering your particles (they do not need to be in the same orientation).
Then take the centered data and run : startcsym centered.hed <# top view particles to keep> sym=<trial symmetry>
While this is also designed to look for side views, it will find top views (with the corresponding symmetry) very nicely.
So, pick a trial symmetry and run startcsym. Then look at the first image in classes.hed and the first image in sym.hed. The first image in
classes.hed is an unsymmetrized particle with the strongest specified
symmetry. The first image in sym.hed is a symmetrized version. If the two
look the same and have a visible symmetry, you've probably got the right
answer. Repeat for all possible symmetries. The answer will usually stand
out very clearly, and can be presented in publication by showing the 2
images side-by-side for each trial symmetry. Note that there are some
known situations (detached virus portal complexes, for example) where a
single data set may contain particles with multiple symmetries.
Also see the related question below.
The documentation really needs to address this, but doesn't. There are two reasons for this, though. First, it is really difficult to describe this adequately textually. Second, you really need to have a sound understanding of the mathematics being used in CTF correction to use this method properly and avoid doing bad things to your structure (without realizing it). That said, I've found that it may not be quite so bad as I make it out to be.
One other note. Many people (myself included) have suggested generating a structure factor curve computationally from a PDB structure of a similar protein. As it turns out, this is a very difficult thing to do, largely because solvent effects have a profound effect on the overall shape of this curve. Current software (2003) used by the solution scattering community can accurately predict peak locations, etc., but it doesn't have the correct overall shape, and should not be used for EM work. Perhaps this situation will improve in the future.
Still, there is a way to get the necessary curve. It isn't perfect, but it's probably adequate in most cases. The basic idea is to use several sets of particles from images at different defocuses. You then simultaneously fit the CTF of these data sets such that the CTF curve is a reasonable fit, and simultaneously the predicted structure factor for all of the curves matches pretty well at low resolution. This process must be done manually using ctfit, but once you have a result, you should be able to do most of your fitting with the automated program 'fitctf'.
The optimal way to approach this problem is to have some sort of solution scattering curve on hand. This curve is simply used for scaling the data and getting some general idea of a reasonable B-factor and amplitude contrast. It will not impose its features on the final structure factor. It is also not strictly necessary; it is possible to proceed without one. The 'groel.sm' curve (the native GroEL structure factor) is probably adequate for most cases. Then do the following:
Good questions! To find out how many particles were included in the class-averages, type (for example) 'iminfo classes.4.hed all'. The last number on each line is the number of particles included in that class-average. At the end a total number of particles included in the classes file will be shown.
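If you need this total in a script, the arithmetic is trivial. A quick sketch, assuming (as described above) one line per class-average with the particle count as the last number; the sample lines are hypothetical:

```python
def total_particles(iminfo_lines):
    """Sum the last number on each per-class line of
    'iminfo classes.4.hed all' output (layout as described above)."""
    return sum(int(line.split()[-1]) for line in iminfo_lines)

# hypothetical per-class lines; only the trailing count matters here
sample = ["classes.4.hed 1 64x64 37", "classes.4.hed 2 64x64 42"]
```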
Now this is where it gets tricky. If all of the class-averages were used in the reconstruction, you'd be done. However, some class-averages may get excluded (depending on the value you select for hard=). In addition, if you use 3dit= or 3dit2=, some class-averages may get excluded. However, they are not necessarily the same class averages that are excluded from the original make3d reconstruction. make3d will output how many original particles were included in the final reconstruction as part of its output on the screen. This is probably the best answer you'll get, but it isn't stored anywhere. Generally when I talk about the number of particles used in a reconstruction, I'm referring to the 'iminfo classes.4.hed all' method.
The next part is a little trickier. A complete record of particles excluded from the class-averages is kept (along with classification information) for all iterations in 'particle.log'. This file contains a variety of different information, depending on the first character of each line. Lines starting with 'X' indicate excluded particle numbers. If going through this file is too much of a pain, you can rerun the 'classalignall' command with the 'badfile' option. This will create a set of files containing the excluded images for whatever options you provide to classesbymra.
Note that the particle.log file can also be used to recreate the 'cls' files from any particular iteration using the 'clsregen' command.
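Scanning particle.log yourself is also straightforward. A short sketch (the exact column layout is an assumption; only the leading 'X' is documented above):

```python
def excluded_particles(log_lines):
    """Collect particle numbers from particle.log lines starting with 'X'.
    Assumes the particle number is the second whitespace-separated field."""
    return [int(line.split()[1]) for line in log_lines if line.startswith("X")]

# hypothetical log fragment
log = ["X 12 3", "C 7 0 12.5", "X 31 3"]
```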
Not exactly. The curve from your data (the power spectrum) is equal to noise + ctf * envelope * structure factor. The curve you're fitting with is just noise + ctf * envelope. The structure factor is missing, and is very important. This problem can be tackled three ways, described in: http://ncmi.bioch.bcm.tmc.edu/~stevel/EMAN/doc/ctfc/ctfc.html If you're in the situation where you don't have x-ray data (which you presumably don't), the best results are generally achieved via the following process:
1) you will need several data sets at different defocuses
2) read the first data set into CTFIT
3) set the 'amp' to zero, then use the 4 noise sliders to fit the
background by passing through the zeros of the ctf, and matching the
high-resolution end of the curve (where the zeros are no longer visible).
4) Increase the amplitude and adjust the defocus and envelope function as
best you can. You should be able to determine the defocus quite
accurately. The envelope function is somewhat arbitrary.
5) Read in the second data set, and repeat this process for it.
6) Bring up a second plot window, and set it to display the structure
factor. This will show the structure factor for each displayed data set,
calculated from the data. This is not a very accurate calculation, but
it's generally good enough.
7) Now, without spoiling the fitting you've just done, adjust all of the
parameters of the 2 data sets such that the structure factor curves match
as well as you can. Don't worry too much about the divergence at high
frequency. Work on getting a match out to the first or second zero, then
just try to get the general trend at high frequency to be the same.
8) continue to add the other data sets in the same way.
9) When you've got them all fitted satisfactorily, use the 'phase
correct' option as described in the instructions.
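For reference, the curve being fitted in steps 3-4 has the general form noise + ctf * envelope. Below is a generic textbook sketch of the ctf and envelope terms, not EMAN's exact parameterization; the units are assumptions (s in 1/Angstrom, defocus and Cs in Angstroms, underfocus positive):

```python
import math

def ctf(s, defocus, cs, voltage_kv, amp_contrast):
    """Weak-phase CTF: sqrt(1-A^2)*sin(gamma) + A*cos(gamma)."""
    v = voltage_kv * 1000.0
    # relativistic electron wavelength in Angstroms
    lam = 12.2639 / math.sqrt(v + 0.97845e-6 * v * v)
    gamma = math.pi * lam * s * s * (defocus - 0.5 * cs * lam * lam * s * s)
    a = amp_contrast
    return math.sqrt(1.0 - a * a) * math.sin(gamma) + a * math.cos(gamma)

def envelope(s, b_factor):
    """Simple Gaussian (B-factor) falloff."""
    return math.exp(-b_factor * s * s / 4.0)
```

Note that at s = 0 the ctf reduces to the amplitude contrast, which is why the zero-frequency end of the curve never quite reaches zero.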
Gosh, I hope so! Seriously, there are a few issues to be aware of. First, one factor often of concern is that EMAN generally keeps the entire set of particles in a single image file stack. A really big reconstruction might cause this file to become bigger than 2 gigabytes, which is a problem for some parts of certain operating systems and a lot of other software packages. While EMAN does the proper things to support files larger than 2 GB, sometimes the underlying system still won't allow it. However, there is a good workaround. EMAN supports a special file format called the LST format. This format is basically a text file containing a list of images in other files. For example, if you have too much data to put into a single start.hed/img file, do the following:
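The LST format itself is simple enough to show. Purely as a hedged illustration (the file names are hypothetical): an LST file is a plain text file beginning with a '#LST' magic line, followed by one image per line as an image number plus the file that actually contains it:

```
#LST
0	part1.hed
1	part1.hed
0	part2.hed
```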
You now have 2 files (start.hed/img) which appear to EMAN to be equivalent to a big IMAGIC file, without any of the normal 2 GB limitations. Note, however, that you cannot add new images to this file with proc2d; it isn't really an image file.
There are other issues. Some are of concern when there are a lot of pixels in each image and others are of concern when there are a lot of images. In the first case, memory on the computer is the biggest problem. For example, if you were trying to reconstruct a 512x512x512 volume, each volume dataset requires 512 MB. Several programs require enough memory for 2 or 3 3D models. So any machine used to process this dataset would need to have at least 2 GB of RAM. There are too many issues involved to cover all possibilities here. In general, I'd say yes, EMAN can handle really big problems. If you run into problems, email me, and I'll try to help you resolve them.
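The arithmetic behind the 512 MB figure above is simple; a quick sketch:

```python
def volume_mb(n, bytes_per_voxel=4):
    """Memory for one n^3 volume of 4-byte floats, in megabytes."""
    return n ** 3 * bytes_per_voxel / (1024.0 * 1024.0)
```

So two or three such volumes in core already approaches 1.5 GB, which is where the 2 GB RAM figure comes from.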
The answer depends on the source images. EMAN reads most EM file formats directly. If you have each particle in a separate image file, for example img001.img, img002.img, etc., then the following would do it (csh/zsh):
foreach i (img*.img)
proc2d $i start.hed
end
-or- (sh/bash)
for i in img*.img
do
proc2d $i start.hed
done
It doesn't matter what file format the source images are in. Any EMAN program transparently reads any supported image type (byte order doesn't matter either). If the images are already in a SPIDER stack file called, for example, part.spi, the following would do it:
proc2d part.spi part.hed
All of the 2D EMAN commands currently write IMAGIC files by default. 3D commands write MRC format by default. There are options in proc2d and proc3d for writing to other file formats.
Tricky. There is a possibility that the answer is "you can't". In most cases, however, it's possible to get a pretty accurate answer. In cases where the symmetry of the particle is unknown, the ability to distinguish between different symmetries is proportional to the overall contrast in the image. In cryo-EM there is always a tradeoff between contrast and resolution, so the best thing to do if you're trying to determine symmetry is exactly the opposite of what you'd do for high resolution. That is, take some micrographs in negative stain, or in ice, fairly far from focus at low voltage. This will provide the best overall contrast for an attempt to determine symmetry.
Once you've collected high-contrast data, there are a number of techniques to try to determine symmetry. For particles with a suspected Cn or Dn symmetry, startcsym is a good starting point. By running it several times with each possible symmetry you can see how well each one fits the data. Frequently, comparing the symmetrized model in sym.hed with the class-averages in classes.hed will give the first indications of the true symmetry.
The next step is to try to refine each of the possible initial models and see if they 'fall apart' during refinement. This should resolve the symmetry question IF you have sufficient contrast, and IF your particles are in fairly random orientations. If the contrast is too low, or there is a strongly preferred orientation, however, an accurate answer may not be possible.
If the first technique fails, there are other possibilities, like using multivariate statistical analysis on an aligned set of raw particles. These issues are too complicated for discussion in this FAQ.
No, the docs don't really explain all of the text output at this point. I can tell you what the numbers are, but I don't think it's going to help you very much. While you may be able to judge the quality of an individual particle when compared to a good model using the quality factor, they really won't tell you what you're trying to find out. There are just too many variables involved.
If you're anxious that things are going too slowly, the best approach is to increase the angular step for the first couple of refinement iterations. For an asymmetric model you could go as high as 15 or 18 degrees for the first round or two. That should be enough to tell you whether the model is reasonable.
As we tried to impress in the documentation, asymmetric models can be very tricky. It depends on their overall shape. If, for example, you have something 'L' shaped, then getting a good starting model shouldn't be difficult at all. However, if you have something that's basically round with a few lumps, it may actually be impossible to generate an unambiguous, accurate starting model. It is actually possible to have a set of random projections which can produce several DIFFERENT models, all of which are consistent with the data at some resolution.
StartAny uses c1startup, so no, there's no difference. The routine it uses isn't all that great. For 'easy' models it will work pretty well, but in tough cases, it may just come up with something completely wrong. In these cases, there are really only two good solutions in EMAN right now:
1) if your model may have a pseudosymmetry, i.e. it's vaguely cylindrical in shape or something, you can often use the startcsym routine and get something that's good enough to start.
2) Final resort: use tomography. If you're comfortable with it, then you might actually start here. Anyway, the idea is simple enough: take a tilt series (you'll probably have to use stain or glucose for this). EMAN has a few experimental programs for generating a 3D model from the tilt series and aligning/averaging several such 3D models to generate a starting model. Even this approach isn't perfect (at least the simple implementation EMAN uses isn't), but we have used it with some success on a few projects.
To answer your question anyway: the output from classesbymra looks like:
0 -> 256 (506.86)
1 -> 296 (508.74)
2 -> 278 (502.86)
3 -> 273 (504.82)
The first line is saying that particle 0 (the first one in start.hed) looked the most like projection number 256. The quality factor was 506.86. The interpretation of the quality factors can depend on the shape of your model and the box size, etc.
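If you need these numbers in a script, the lines are easy to parse. A minimal sketch (output layout as shown above; the helper name is hypothetical):

```python
import re

def parse_cls_line(line):
    """Parse a 'particle -> projection (quality)' line from classesbymra."""
    m = re.match(r"(\d+)\s*->\s*(\d+)\s*\(([\d.]+)\)", line)
    return int(m.group(1)), int(m.group(2)), float(m.group(3))
```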
A good question. This feature is not well documented. The best approach is:
1) generate a set of projections, 'proj.hed'
2) run 'ctfit proj.hed'
3) On the 'Process' menu select 'Simulate' -> 'RT CTF Sim'. This will open a file dialog; select proj.hed again.
4) A window with a picture of your first projection will appear. As you modify the CTF parameters, the appearance of this image will be updated continuously.
5) Set the desired parameters. A reasonable set for a decent FEG scope is:
200 kV, 1 mm Cs
Noise: 0, 1, 6, 3
Amp Cont: 10
Envelope: 7.5
Defocus: whatever you want, usually 1-4 microns
6) This will set the basic parameters. The one parameter you cannot define a 'typical' value for is 'Amp', since it depends on the amplitude of your projections (usually you'd start out with a normalized 3D model). Basically you adjust 'Amp' to achieve the desired signal to noise ratio visually. Once you've applied the CTF to the entire set of projections, you can read the simulated set back into ctfit and redetermine the SNR.
7) Select 'Process' -> 'Simulate' -> 'Apply CTF 2D'. This will not overwrite the input file; it will create a new file with the CTF parameters determined.
Note: if you want to do this on many files, you can use 'applyctf' once you have a good set of parameters.
You can get interactive documentation on the relevant classes with:
import EMAN
help(EMAN.EMData)
help(EMAN.Euler)
Here's the script:
# This reads a text file with one space-separated Euler triplet per line
# and generates projections
from EMAN import *
import math
infile=open("eulers.txt","r")
lines=infile.readlines()
infile.close()
# Ok, this next line is not all that transparent. There are other ways to
# do this, but it is a useful construct: it converts the input lines into
# a list of tuples (degrees converted to radians)
eulers=map(lambda x:tuple(map(lambda y:float(y)*math.pi/180.0,x.split())),lines)
e=Euler()
# read the volume data
data=EMData()
data.readImage("model.mrc",-1)
for euler in eulers:
    e.setByType(euler,Euler.MRC)
    # -4 is the best real-space projection mode
    out=data.project3d(e.alt(),e.az(),e.phi(),-4)
    out.writeImage("proj.hed",-1) # file type determined by extension