Dealing with image data on disk in EMAN2

EMAN2 supports a variety of mechanisms for dealing with your data on disk. Virtually all cryo-EM file formats are supported as well as some good generic formats. In addition, EMAN2 has a local embedded database storage scheme used heavily during processing. This mechanism is faster than typical direct file access, and permits easy logging of tasks and book-keeping. Finally, we support communications with the EMEN2 OODB, permitting things like directly reading image data from a centralized database for processing.

File Formats

EMAN2 supports the following file formats:

Format           Access (R = read, W = write)
HDF5             R/W
MRC/CCP4         R/W
IMAGIC           R/W
SPIDER           R/W
PIF              R/W
ICOS             R/W
VTK              R/W
PGM              R/W
Amira            R/W
Xplor            W
Gatan DM2        R
Gatan DM3        R
TIFF             R/W
Scans-a-lot      R
LST              R/W
PNG              R/W
Video-4-Linux    R
JPEG             W
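
Because some formats are read-only or write-only, it can help to check a planned conversion against the list above before running it. The following is only a sketch: the dictionary is transcribed by hand from the list above, and the names FORMAT_ACCESS, can_read, can_write and check_conversion are hypothetical helpers, not part of EMAN2.

```python
# Access modes transcribed from the format list above ("R" = read, "W" = write).
# This table and the helpers below are illustrative, not an EMAN2 API.
FORMAT_ACCESS = {
    "HDF5": "RW", "MRC/CCP4": "RW", "IMAGIC": "RW", "SPIDER": "RW",
    "PIF": "RW", "ICOS": "RW", "VTK": "RW", "PGM": "RW", "Amira": "RW",
    "Xplor": "W", "Gatan DM2": "R", "Gatan DM3": "R", "TIFF": "RW",
    "Scans-a-lot": "R", "LST": "RW", "PNG": "RW", "Video-4-Linux": "R",
    "JPEG": "W",
}


def can_read(fmt):
    """True if the list marks the format as readable."""
    return "R" in FORMAT_ACCESS[fmt]


def can_write(fmt):
    """True if the list marks the format as writable."""
    return "W" in FORMAT_ACCESS[fmt]


def check_conversion(src_fmt, dst_fmt):
    """Return None if src -> dst is possible, else a short reason string."""
    if not can_read(src_fmt):
        return "%s cannot be read" % src_fmt
    if not can_write(dst_fmt):
        return "%s cannot be written" % dst_fmt
    return None


if __name__ == "__main__":
    print(check_conversion("Gatan DM3", "HDF5"))
    print(check_conversion("HDF5", "Gatan DM3"))
```

Keying the table by format name rather than by file extension avoids guessing which extensions each format uses.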

To convert from one format to another, use e2proc2d.py for 2-D images and e2proc3d.py for 3-D images. The basic usage, e2proc2d.py <infile> <outfile>, simply converts from one file format to another. By default, the image type of the output file is determined from its file extension. Both programs also have options for specifying the file type when it would otherwise be ambiguous.
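
For scripted conversions, the basic usage described above can be wrapped in a few lines of Python. This is a minimal sketch, assuming the EMAN2 programs are on your PATH; build_convert_command and convert are hypothetical helper names, and the file names are placeholders. Only the plain <infile> <outfile> form is used, since the type-selection options are not described here.

```python
import shutil
import subprocess

# Program names come from the text above; the helpers are illustrative only.
PROGRAMS = {2: "e2proc2d.py", 3: "e2proc3d.py"}


def build_convert_command(infile, outfile, ndim=2):
    """Return the argument list for a plain format conversion.

    The output format is left to the program, which infers it from the
    extension of `outfile`.
    """
    if ndim not in PROGRAMS:
        raise ValueError("ndim must be 2 or 3, got %r" % (ndim,))
    return [PROGRAMS[ndim], infile, outfile]


def convert(infile, outfile, ndim=2):
    """Run the conversion if the program can be found on PATH.

    Returns False when the program is not installed; raises
    subprocess.CalledProcessError if it runs but exits with an error.
    """
    cmd = build_convert_command(infile, outfile, ndim)
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


if __name__ == "__main__":
    # Placeholder file names; substitute your own data.
    print(build_convert_command("particles.mrc", "particles.hdf"))
    print(build_convert_command("volume.mrc", "volume.hdf", ndim=3))
```

Separating command construction from execution lets the command be logged or inspected before anything is run.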

Any program in EMAN2 should be able to read and write any of the above file formats seamlessly, subject to the access modes listed and to each format's own limitations. We attempt to preserve as much metadata as possible, but some formats simply aren't very flexible in this regard. The only format supporting EMAN2's full model for associating attributes with individual images is HDF5, which is the format we encourage for general use and file interchange, and which should be considered the default format for EMAN2. Unfortunately, while HDF5 is exceptionally flexible and portable, its performance on large image stacks is substantially worse than that of the simpler flat-file formats. For this reason, the primary storage mechanism in EMAN2 for internal processing is a BerkeleyDB-based embedded database system.

EMAN2 Embedded Database

You'll note that whenever you run an EMAN2 program in a new directory, a subdirectory called EMAN2DB is also created. In EMAN1, a hidden file '.emanlog' was created, and this simple file contained a history of all of the EMAN1 commands run in that directory. In EMAN2, we have converted to a model where most of the image data being processed is stored in an 'embedded database' in the local directory rather than in the traditional MRC/IMAGIC/SPIDER files. Images may still be copied between this database and conventional files, but by storing data internally, we gain a (sometimes substantial) performance benefit, have much more flexibility in how metadata (known as 'header information') is stored, and permit much better tracking of what tasks have been completed on each data item. This idea might take some getting used to, but we hope you will appreciate its elegance once you do.