Differences between revisions 1 and 14 (spanning 13 versions)
Revision 1 as of 2011-06-30 19:43:42
Size: 7575
Editor: SteveLudtke
Comment:
Revision 14 as of 2011-07-13 15:43:56
Size: 12639
Editor: SteveLudtke
Comment:
Line 5: Line 5:
on just about any GUI window will bring up a powerful 'control-panel' for manipulating the image and display. However, in addition to the 2-D image display widgets in EMAN1, EMAN2 includes a powerful
on just about any window will bring up a powerful 'control-panel' for that window. In addition to the 2-D image display widgets in EMAN1, EMAN2 includes a powerful
Line 8: Line 7:
Chimera, they do provide users with a quick way of looking at their 3-D results. Chimera, they do provide users with a quick way of looking at 3-D maps and other 3-D data.
Line 10: Line 9:
Here is a list of some of the most important points:
Also :
Line 14: Line 12:
 * All EMAN2 programs will respond to '-h' or '--help' at the command line, and do not require a GUI to view.  * All EMAN2 programs will respond to '-h' or '--help' at the command line (no GUI required to view this help).
Line 18: Line 16:
 * EMAN2 uses an [[EMAN2/DatabaseWarning|embedded (no database server) database system]] for storing a lot of image data and metadata. This system is both extremely useful and powerful, and at times very frustrating and irritating.  * EMAN2 uses an [[EMAN2/DatabaseWarning|embedded (no database server) database system]] for storing a lot of image data and metadata. This system is extremely useful and powerful, but at times can also be very frustrating and irritating.
Line 28: Line 26:
||EMAN1||EMAN2||Comments|| ||'''EMAN1'''||'''EMAN2'''||'''Comments'''||
Line 41: Line 39:
There are, of course, many others as well. There are, of course, many others.
Line 44: Line 42:
In EMAN1, every different filter or option had its own name. For example in proc2d, you had 'lp' for a low-pass Gaussian filter or 'tlp' for a sigmoidal filter. In refine, you would pick between 'phasecls', 'fscls' or 'dfilt'. Every time we wanted to add a new capability, we had to code it into every program and implement new option names, etc. In addition, when using 'lp' you also had to know that the value you specified was a radius in Fourier pixels, unless you also specified apix= in which case it would be 1/half width in A. This is messy, and tends to cause mistakes. In EMAN1, each filter or option had its own name. For example in proc2d, you had 'lp' for a low-pass Gaussian filter or 'tlp' for a sigmoidal filter. In refine, you would pick between 'phasecls', 'fscls' or 'dfilt'. Every time we wanted to add a new capability, we had to code it into every program and implement new option names, etc. In addition, when using 'lp' you also had to know that the value you specified was a radius in Fourier pixels, unless you also specified apix= in which case it would be 1/half width in A. This is messy, and tends to cause mistakes.
Line 67: Line 65:

== Everything is Saved and (hopefully) Defined ==
While EMAN1 did preserve some of the information generated during refinement, there were some omissions that people found
frustrating. In EMAN2, we try to preserve everything computed during the refinement (with a few impractical exceptions).
While this can take a lot of extra disk space during processing, you are always free to delete any intermediate files you
don't want, and disk space is increasingly cheap. The [[EMAN2|EMAN2 Wiki]] contains pages documenting everything we store:
 * [[Eman2Metadata|metadata stored in image headers]]
 * [[Eman2AppMetadata|metadata associated with a project]]
 * [[EMAN2/Concepts|information on intermediate files, conventions, etc.]]

== Where Have the LST Files Gone? ==
While EMAN2 can read EMAN1 style LST files, they are not used in any of EMAN2's standard processes. Instead, there is the concept of a 'virtual database'. EMAN2 stores most image data in a serverless database system based on BerkeleyDB. These database files can contain metadata (information about the images) as well as data (the images themselves). In a 'virtual database', the metadata is stored, but the image data is drawn from a different database. This mechanism is used for 'sets' in the EMAN2 workflow, permitting you to try processing your data using various subsets of the data. It is worth taking a little time to [[EMAN2/DatabaseWarning|read about the database]].

== Workflow ==
EMAN2 has adopted a workflow system for most common operations: ''e2workflow.py''. This system can take you step by step through processes such as single particle reconstruction, single particle tomography, random conical tilt, etc.

== Browser ==
''e2display.py'' is a file browser and display program, which can examine any supported file, including the BerkeleyDB database files. It can also be launched from the workflow interface. When browsing files, remember that right-clicking on a file will bring up a menu of options other than the default (double-click) visualization.

== Single Particle Refinement in EMAN2 vs EMAN1 ==
In this section, we consider how traditional single particle refinements worked in EMAN1, and how they now work in EMAN2. One of the largest differences is that, in response to many user requests, EMAN2 now saves all intermediate information,
and leaves it to the user to delete things they don't need (after all, disk space has become relatively cheap). Preserving this information also permits a number of new algorithms to be considered, which were not feasible
in EMAN1.

'''The refinement strategy in EMAN1 is:'''

 1. start with projections and an initial 3-D model
 1. reproject the 3-D model (''project3d'')
 1. reference-based classification of particles (''classesbymra'')
 1. iterative class-averaging (''classalignall'' -> ''classalign2'')
 1. reconstruction by direct Fourier inversion (''make3d'')
 1. post-processing (masking, mass adjustment, filtration)
 1. iterate -> 2

'''EMAN2, without the ''twostage'' option, is very similar:'''
 1. start with projections and an initial 3-D model
 1. reproject the 3-D model (''e2project3d'')
 1. reference-based classification of particles
  a. compute a similarity matrix between the particles and the projections (''e2simmx'')
  a. classify the particles based on the similarity matrix (''e2classify'')
 1. iterative class-averaging (''e2classaverage'')
 1. reconstruction by direct Fourier inversion (''e2make3d'')
 1. post-processing (masking, mass adjustment, filtration)
 1. iterate -> 2

'''Finally, we consider how the ''twostage'' option, which increases overall speed by 2-30x, impacts the process:'''
 1. start with projections and an initial 3-D model
 1. reproject the 3-D model (''e2project3d'')
 1. reference-based classification of particles
  a. compute a similarity matrix between the particles and the projections (''e2simmx2stage'')
   i. compute a similarity matrix between all of the projections (''e2simmx'')
   i. identify a subset of projections which are the most similar
   i. align and average the similar projections together to produce a reduced representation for initial classification
   i. compute a similarity matrix between the particles and the reduced set of averaged projections (''e2simmx'')
   i. identify the best N reduced representation projections for each particle to identify the specific orientations we must check
   i. compute the normal similarity matrix between particles and the full projections, but this matrix is sparsely populated (this sparseness is where the time savings occur)
  a. classify the particles based on the similarity matrix (''e2classify'')
 1. iterative class-averaging (''e2classaverage'')
 1. reconstruction by direct Fourier inversion (''e2make3d'')
 1. post-processing (masking, mass adjustment, filtration)
 1. iterate -> 2

'''There are many output files produced by refinement in EMAN2, following its strategy of keeping everything unless the user explicitly removes it.
These files are documented in the last section [[EMAN2/Concepts|here]].'''

An overview of changes between EMAN1 and EMAN2

We have tried to preserve as many of the conventions from EMAN1 as possible, to limit the difficulty in making the transition. For example, just like EMAN1, middle-clicking (alt-click on Mac) on just about any window will bring up a powerful 'control-panel' for that window. In addition to the 2-D image display widgets in EMAN1, EMAN2 includes a powerful set of 3-D display widgets. While in no way are these designed to compete with dedicated packages like Chimera, they do provide users with a quick way of looking at 3-D maps and other 3-D data.

Also:

  • All EMAN2 programs start with 'e2' and end with '.py' (because they are python scripts and Windows requires this)
  • Most EMAN1 programs have a direct EMAN2 analog, for example, proc2d has become e2proc2d.py, and iminfo has become e2iminfo.py
  • All EMAN2 programs will respond to '-h' or '--help' at the command line (no GUI required to view this help).
  • All EMAN2 command-line programs, including the GUI, are written in Python, which gives advanced users a lot of flexibility without having to recompile EMAN2 from source.
  • The command-line options in EMAN2 have adopted the standard Unix style rather than the EMAN1 style, for example 'proc2d abc.hed def.spi clip=64,64' has become 'e2proc2d.py abc.hed def.spi --clip=64,64'. In many, but not all, cases, the option names are the same.
  • While EMAN2 supports all of the file formats (and more) from EMAN1, if you wish to make sure all of the metadata is stored during processing, you must use either HDF or the BDB database.

  • EMAN2 uses an embedded (no database server) database system for storing a lot of image data and metadata. This system is extremely useful and powerful, but at times can also be very frustrating and irritating.

  • EMAN2 supports a modular system for parallel processing, and can support:
    • multi-cpu (threaded) parallelism on the local computer
    • ad-hoc, distributed processing (somewhat like SETI@home)
    • MPI (though there are a few specific requirements)
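The change in command-line style described in the list above is mechanical enough to sketch in a few lines of Python. The helper below is hypothetical and not part of EMAN2. It rewrites only the syntax: it prefixes the program name with 'e2', appends '.py', and turns key=value arguments into --key=value. Since option names are the same in many, but not all, cases, the result should always be checked against the program's --help output.

```python
def eman1_to_eman2_syntax(command):
    """Rewrite an EMAN1-style command line in EMAN2 (Unix) option style.

    Hypothetical helper for illustration only: it changes syntax, not
    option names, so the result may still need manual adjustment.
    """
    words = command.split()
    program, args = words[0], words[1:]
    new_args = []
    for arg in args:
        # EMAN1 options look like key=value; file names contain no '='
        if "=" in arg and not arg.startswith("-"):
            new_args.append("--" + arg)
        else:
            new_args.append(arg)
    return " ".join(["e2" + program + ".py"] + new_args)


print(eman1_to_eman2_syntax("proc2d abc.hed def.spi clip=64,64"))
# prints: e2proc2d.py abc.hed def.spi --clip=64,64
```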

Translation Table

Here are a few of the more common EMAN1 commands and their EMAN2 equivalents:

||'''EMAN1'''||'''EMAN2'''||'''Comments'''||
||proc2d||e2proc2d.py||The ordering of command-line options matters in EMAN2, and it is possible to specify a series of ordered image processing operations in one command. EMAN2 can also work with 3-D MRC image stacks in addition to traditional multi-image files.||
||proc3d||e2proc3d.py||See proc2d. EMAN2 can also store sets of 3-D volumes in a single file (HDF and BDB only).||
||iminfo||e2iminfo.py and e2bdb.py||e2bdb.py works only with BDB databases, but has similar functions. e2iminfo.py can also work with BDB databases.||
||speedtest||e2speedtest.py||The numbers from EMAN1 and EMAN2 are not directly comparable, but have a similar range.||
||refine||e2refine.py||MANY more options in EMAN2. In particular, note the --twostage option, which can produce speedups of 5-25x while retaining accuracy.||
||refine2d||e2refine2d.py||Much faster than EMAN1, with some minor changes in operation.||
||multirefine||e2refinemulti.py and e2classifyligand.py||Files are organized very differently than in EMAN1, but the program functions in a similar way (though much faster). e2classifyligand.py is a different program, but can be used for 2-way splits of data.||
||eman||e2workflow.py||The workflow interface replaces EMAN1's 'custom tutorials'.||
||boxer||e2boxer.py||Works completely differently than EMAN1. Same overall purpose and name.||
||v2, v4, eman browser||e2display.py||e2display.py provides a browser, or can be launched on a single image file from the command line, and shows files of any supported type.||
||ctfit, fitctf||e2ctf.py||CTF determination is now fully automatic, including structure factor determination, for 90% of specimens. Easier to use from the workflow.||

There are, of course, many others.

Everything is Modular

In EMAN1, each filter or option had its own name. For example in proc2d, you had 'lp' for a low-pass Gaussian filter or 'tlp' for a sigmoidal filter. In refine, you would pick between 'phasecls', 'fscls' or 'dfilt'. Every time we wanted to add a new capability, we had to code it into every program and implement new option names, etc. In addition, when using 'lp' you also had to know that the value you specified was a radius in Fourier pixels, unless you also specified apix= in which case it would be 1/half width in A. This is messy, and tends to cause mistakes.

What's the alternative? A modular system. In EMAN2, each class of algorithm, such as filters (processors), aligners, cmps (similarity metrics), etc., maintains a list of all available algorithms, and any program using one of these categories can use any algorithm from the list. Each modular operation takes a list of parameters, and parameter names are kept consistent between algorithms whenever possible.
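The idea of each category maintaining its own list of algorithms can be illustrated with a toy registry in Python. This is only a sketch of the general pattern, not EMAN2's implementation: the REGISTRY dictionary, the register decorator and the 'toy.*' algorithm names are all made up for this example, and the "images" are plain lists of numbers.

```python
# Toy registry illustrating a modular algorithm system.
# Every name here is hypothetical; this is not EMAN2 code.
REGISTRY = {"processors": {}, "cmps": {}}


def register(category, name):
    """Decorator that adds a function to the list for its category."""
    def wrap(func):
        REGISTRY[category][name] = func
        return func
    return wrap


@register("processors", "toy.add")
def add_const(values, value=0.0):
    return [v + value for v in values]


@register("processors", "toy.scale")
def scale(values, value=1.0):
    return [v * value for v in values]


@register("cmps", "toy.sqeuclidean")
def sq_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def run_processor(name, values, **params):
    """Any program can call any registered processor by name."""
    return REGISTRY["processors"][name](values, **params)


print(sorted(REGISTRY["processors"]))  # names available in one category
print(run_processor("toy.scale", [1.0, 2.0], value=3.0))
```

Adding a new algorithm means registering one more function; programs that look algorithms up by name pick it up without any change to their own option handling.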

For example, say you had a 3-D model and you wanted to high-pass filter it, mask it with a sharp spherical mask, then low-pass filter the final result. In EMAN1, the only safe way to do it was a series of 3 commands:

proc3d model.mrc model.mrc hp=100 apix=2
proc3d model.mrc model.mrc mask=42
proc3d model.mrc model.mrc lp=10 apix=2

In EMAN2, it can be done with a single command which, although more verbose, has clear and readable options.

e2proc3d.py model.mrc model.mrc --process=filter.highpass.gauss:cutoff_freq=.01 --process=mask.sharp:outer_radius=42 --process=filter.lowpass.gauss:cutoff_freq=.1

and if you wanted to use a hyperbolic tangent lowpass filter instead of a gaussian, you would simply replace 'filter.lowpass.gauss' with 'filter.lowpass.tanh'. The parameter would be exactly the same.
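The name:key=value form used by --process in the command above is easy to take apart, which is part of what makes the options readable. The function below is a minimal sketch of such a parser, written for this page; it is not the parser EMAN2 itself uses, and the function name is hypothetical.

```python
def parse_modular_option(spec):
    """Split 'name:key=value:key=value' into (name, {key: value}).

    Values are converted to float when they parse as numbers and are
    otherwise kept as strings.
    """
    name, *pairs = spec.split(":")
    params = {}
    for pair in pairs:
        key, value = pair.split("=", 1)
        try:
            params[key] = float(value)
        except ValueError:
            params[key] = value
    return name, params


# The three operations from the e2proc3d.py example, in order
chain = [
    "filter.highpass.gauss:cutoff_freq=.01",
    "mask.sharp:outer_radius=42",
    "filter.lowpass.gauss:cutoff_freq=.1",
]
for spec in chain:
    print(parse_modular_option(spec))

# Swapping the filter type changes only the name, not the parameters
name, params = parse_modular_option(chain[2].replace("gauss", "tanh"))
print(name, params)
```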

Of course, if the system is modular, you need some mechanism to find out what the available options are. The GUI will present you with a menu of the available options. For the command line, or if you want the detailed documentation for any particular option, use the e2help.py command. For example, to list all of the processors, which include filters, masks, mathematical operations, etc. (178 of them at last count), you would type 'e2help.py processors'. This will give a list of 1 processor per line with parameter names. If you want more details, and a definition of each parameter, then 'e2help.py processors -v 2' will give you a more detailed listing.

Similarly, say you want to specify what similarity metric is used when comparing particles to projections during classification. You can get a list by typing 'e2help.py cmps'. To get a list of the categories available in e2help, just type 'e2help.py' with no options and it will list the categories (at present: processors, cmps, aligners, averagers, projectors, reconstructors, analyzers, symmetries and orientgens).

This system currently embodies over 240 different algorithms for a wide range of different purposes. If you have some image processing task, chances are that e2proc2d.py or e2proc3d.py with the --process option can meet your need. In addition, there is a GUI, e2filtertool.py, which allows you to graphically create filter chains and adjust their parameters interactively. Don't know how much to low-pass filter that model? Run e2filtertool.py on it, and you can play with different filters and parameters to your heart's content.

Everything is Saved and (hopefully) Defined

While EMAN1 did preserve some of the information generated during refinement, there were some omissions that people found frustrating. In EMAN2, we try to preserve everything computed during the refinement (with a few impractical exceptions). While this can take a lot of extra disk space during processing, you are always free to delete any intermediate files you don't want, and disk space is increasingly cheap. The EMAN2 Wiki contains pages documenting everything we store:

  • metadata stored in image headers
  • metadata associated with a project
  • information on intermediate files, conventions, etc.

Where Have the LST Files Gone?

While EMAN2 can read EMAN1 style LST files, they are not used in any of EMAN2's standard processes. Instead, there is the concept of a 'virtual database'. EMAN2 stores most image data in a serverless database system based on BerkeleyDB. These database files can contain metadata (information about the images) as well as data (the images themselves). In a 'virtual database', the metadata is stored, but the image data is drawn from a different database. This mechanism is used for 'sets' in the EMAN2 workflow, permitting you to try processing your data using various subsets of the data. It is worth taking a little time to read about the database.
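The distinction between a database that holds images and a 'virtual database' that holds only metadata plus references can be sketched with two small Python classes. This is a conceptual toy using plain lists and dicts. It does not use BerkeleyDB, does not reflect EMAN2's actual storage layout, and every class and field name in it (including 'defocus' and 'source_index') is invented for the example.

```python
class ToyDatabase:
    """Holds image data and per-image metadata (plain Python objects)."""

    def __init__(self, images, metadata):
        self.images = images
        self.metadata = metadata


class ToyVirtualDatabase:
    """Stores its own metadata but only references to the image data."""

    def __init__(self, source, indices):
        self.source = source
        self.indices = list(indices)
        # each entry records where the image data actually lives
        self.metadata = [
            dict(source.metadata[i], source_index=i) for i in self.indices
        ]

    def __len__(self):
        return len(self.indices)

    def image(self, n):
        # the data is drawn from the source database, never copied
        return self.source.images[self.indices[n]]


particles = ToyDatabase(
    images=[[0, 1], [2, 3], [4, 5], [6, 7]],
    metadata=[{"defocus": d} for d in (1.2, 1.9, 2.4, 3.1)],
)
# a 'set' containing only the particles with defocus below 2.0
selected = [i for i, m in enumerate(particles.metadata) if m["defocus"] < 2.0]
close_to_focus = ToyVirtualDatabase(particles, selected)
print(len(close_to_focus), close_to_focus.image(1))
```

Because the set stores references rather than copies, several overlapping subsets of the same data can coexist at little extra cost in disk space.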

Workflow

EMAN2 has adopted a workflow system for most common operations: e2workflow.py. This system can take you step by step through processes such as single particle reconstruction, single particle tomography, random conical tilt, etc.

Browser

e2display.py is a file browser and display program, which can examine any supported file, including the BerkeleyDB database files. It can also be launched from the workflow interface. When browsing files, remember that right-clicking on a file will bring up a menu of options other than the default (double-click) visualization.

Single Particle Refinement in EMAN2 vs EMAN1

In this section, we consider how traditional single particle refinements worked in EMAN1, and how they now work in EMAN2. One of the largest differences is that, in response to many user requests, EMAN2 now saves all intermediate information, and leaves it to the user to delete things they don't need (after all, disk space has become relatively cheap). Preserving this information also permits a number of new algorithms to be considered, which were not feasible in EMAN1.

The refinement strategy in EMAN1 is:

  1. start with projections and an initial 3-D model

  2. reproject the 3-D model (project3d)

  3. reference-based classification of particles (classesbymra)

  4. iterative class-averaging (classalignall -> classalign2)

  5. reconstruction by direct Fourier inversion (make3d)

  6. post-processing (masking, mass adjustment, filtration)
  7. iterate -> 2

EMAN2, without the twostage option, is very similar:

  1. start with projections and an initial 3-D model

  2. reproject the 3-D model (e2project3d)

  3. reference-based classification of particles
    1. compute a similarity matrix between the particles and the projections (e2simmx)

    2. classify the particles based on the similarity matrix (e2classify)

  4. iterative class-averaging (e2classaverage)

  5. reconstruction by direct Fourier inversion (e2make3d)

  6. post-processing (masking, mass adjustment, filtration)
  7. iterate -> 2
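Steps 3 and 4 of the list above (similarity matrix, classification, class-averaging) can be sketched on toy data. The code below is a conceptual illustration only: images are short lists of numbers, the similarity measure is a plain sum of squared differences chosen for this example, and no alignment is performed. None of the function names correspond to EMAN2 programs or API calls.

```python
def similarity_matrix(particles, projections):
    """One row per particle, one column per projection; lower is more similar."""
    return [
        [sum((a - b) ** 2 for a, b in zip(ptcl, proj)) for proj in projections]
        for ptcl in particles
    ]


def classify(simmx):
    """Assign each particle to its best-matching projection."""
    return [row.index(min(row)) for row in simmx]


def class_averages(particles, classes, n_classes):
    """Average the particles in each class (no alignment in this toy)."""
    averages = []
    for c in range(n_classes):
        members = [p for p, k in zip(particles, classes) if k == c]
        if members:
            averages.append([sum(col) / len(members) for col in zip(*members)])
        else:
            averages.append(None)  # no particle matched this projection
    return averages


projections = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
particles = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [2.2, 0.1]]

simmx = similarity_matrix(particles, projections)
classes = classify(simmx)
print(classes)
print(class_averages(particles, classes, len(projections)))
```

In a real refinement the class averages would then go to the 3-D reconstruction step, and the loop would restart from the reprojection of the new model.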

Finally, we consider how the twostage option, which increases overall speed by 2-30x, impacts the process:

  1. start with projections and an initial 3-D model

  2. reproject the 3-D model (e2project3d)

  3. reference-based classification of particles
    1. compute a similarity matrix between the particles and the projections (e2simmx2stage)

      1. compute a similarity matrix between all of the projections (e2simmx)

      2. identify a subset of projections which are the most similar
      3. align and average the similar projections together to produce a reduced representation for initial classification

      4. compute a similarity matrix between the particles and the reduced set of averaged projections (e2simmx)

      5. identify the best N reduced representation projections for each particle to identify the specific orientations we must check
      6. compute the normal similarity matrix between particles and the full projections, but this matrix is sparsely populated (this sparseness is where the time savings occur)
    2. classify the particles based on the similarity matrix (e2classify)

  4. iterative class-averaging (e2classaverage)

  5. reconstruction by direct Fourier inversion (e2make3d)

  6. post-processing (masking, mass adjustment, filtration)
  7. iterate -> 2
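The source of the two-stage time savings, a sparsely populated similarity matrix, can be shown with a small Python sketch. It assumes the grouping of similar projections (sub-steps 1 and 2) has already been done and is passed in as a list of index groups. The distance measure, the data and all names are invented for this illustration, and no alignment is performed, so this shows only the bookkeeping and not what e2simmx2stage actually computes.

```python
def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_stage_simmx(particles, projections, groups, best_n=1):
    """Sparse particle/projection matrix; None marks skipped comparisons.

    `groups` lists, for each reduced reference, the indices of the
    similar projections that are averaged together to make it.
    """
    # reduced references: the average of the projections in each group
    reduced = [
        [sum(col) / len(g) for col in zip(*(projections[i] for i in g))]
        for g in groups
    ]
    simmx, n_compared = [], 0
    for ptcl in particles:
        # coarse comparison against the reduced set only
        coarse = [sqdist(ptcl, ref) for ref in reduced]
        n_compared += len(reduced)
        best = sorted(range(len(groups)), key=coarse.__getitem__)[:best_n]
        # full comparison, restricted to projections in the best groups
        row = [None] * len(projections)
        for g in best:
            for i in groups[g]:
                row[i] = sqdist(ptcl, projections[i])
                n_compared += 1
        simmx.append(row)
    return simmx, n_compared


projections = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0], [9.0, 0.0], [9.2, 0.0]]
groups = [[0, 1], [2, 3], [4, 5]]  # assumed grouping of similar projections
particles = [[0.15, 0.0], [5.05, 5.0], [9.3, 0.1]]

simmx, n_compared = two_stage_simmx(particles, projections, groups)
classes = [min((v, i) for i, v in enumerate(row) if v is not None)[1] for row in simmx]
print(classes, n_compared, "comparisons instead of", len(particles) * len(projections))
```

With only six projections the saving is small; the benefit grows as the number of projections becomes large relative to the number of groups that must be checked for each particle.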

There are many output files produced by refinement in EMAN2, following its strategy of keeping everything unless the user explicitly removes it. These files are documented in the last section here.

EMAN2/Eman1Transition/Eman1v2 (last edited 2014-04-11 12:54:55 by SteveLudtke)