Extending EMAN2 or connecting EMAN2 to external software
Let me clarify the possible ways of having external code interact with EMAN2/SPARX, to see if that helps the discussion:
- Use EMAN2/SPARX as a C++ library for an external package, to gain access to some of its image processing capabilities
- Use EMAN2/SPARX at the Python level, and interact with other software with Python wrapping
- Write a wrapper module to permit EMAN2/SPARX to exchange images/metadata between packages
- Add new image processing functions to the EMAN2/SPARX modular core, then use them in new high-level applications written in Python
- Add new capabilities to EMAN2/SPARX by linking to another existing external library
In more detail, taking each option in the same order:
- While this is certainly possible, I don't know of anyone who has actually tried it, since most of the people developing for EMAN2/SPARX are part of the project. The only real use case I can come up with is a large external package that you want to extend with functionality that is already available in EMAN2/SPARX. If your goal is to add new functionality to EMAN2/SPARX (I don't really know what your goal is), this is not a good approach.
- This can work, provided both EMAN2/SPARX and the other package are compiled against the same Python interpreter. Then you just have to come up with data exchange methods. NumPy is a logical choice for this, though it is a bit wasteful in terms of memory and computation.
- This is the approach we're taking to interface EMAN2 with standalone external software. We already have fairly extensive file-format compatibility and coordinate/transform compatibility in the EMAN2 core. For example, to make FREALIGN work from EMAN2, we would write a wrapper script to dump files into a FREALIGN-compatible format and produce the necessary metadata text files. FREALIGN would then be launched, and the final results re-imported into the system. This approach is good if your goal is to gain access to EMAN2/SPARX GUI components and image processing routines without having to rewrite the external software at all.
- The preferred approach. Take existing discrete algorithms from the non-EMAN2/SPARX code and add them to EMAN2's Factory Method-based modular core library as new C++ algorithms. These will automatically get wrapped into Python and the GUIs. We have done what we can to make this process as easy as possible, but it will certainly require some refactoring of the existing code. Significantly, the core image processing algorithms in the C++ library are prohibited from doing file I/O directly, both because they may be called from a variety of contexts where the filesystem may not be available (for example, when running in parallel) and so that they always take advantage of transparent support for all file formats. All file I/O is localized to one specific C++ module, or to the Python level. That is, if one were porting a large monolithic program to be integrated with EMAN2, the existing code would be broken up into individual algorithms and added as modules to the core, then a new high-level program would be written in Python to replicate the original program.
- This choice is tricky. EMAN2 does have a number of dependencies, to support various file formats and other functionality. We do, for example, have the GSL (GNU Scientific Library) as a dependency for certain matrix math functions. It would certainly be possible to wrap an external library into EMAN2/SPARX this way, but that would require making the external library a dependency of EMAN2, and the library would then have to be distributed with all EMAN2 binaries. Generally we are VERY resistant to doing this, as each new library adds to the complexity of the compilation/distribution system.
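
To make the wrapper-script option (the third one above) more concrete, here is a minimal sketch of the export / launch / re-import cycle in Python. Everything in it is an assumption made for illustration: the file names `params.txt` and `result.txt`, the text layout, and the function `run_external` are hypothetical, and the "external program" is a tiny stand-in script rather than FREALIGN or any real package. A real wrapper would write the external tool's actual input format and would use EMAN2's own file-format support for the image data.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_external(particles, command, workdir):
    """Export data, run an external program, and re-import its result.

    `particles` is a list of (x, y) coordinates standing in for real
    metadata; `command` is the external program's argument list.
    """
    workdir = Path(workdir)
    params = workdir / "params.txt"   # hypothetical metadata file name
    result = workdir / "result.txt"   # hypothetical output file name

    # Step 1: dump metadata in the plain-text layout the external tool expects.
    with params.open("w") as fh:
        for i, (x, y) in enumerate(particles):
            fh.write(f"{i} {x:.2f} {y:.2f}\n")

    # Step 2: launch the external program; raise if it exits with an error.
    subprocess.run(command + [str(params), str(result)], check=True)

    # Step 3: re-import the results as in-memory Python objects.
    with result.open() as fh:
        return [tuple(float(v) for v in line.split()[1:])
                for line in fh if line.strip()]


# Stand-in for the external program: a short Python script that shifts
# every coordinate by +1. A real wrapper would name the external binary here.
FAKE_TOOL = [
    sys.executable, "-c",
    "import sys\n"
    "with open(sys.argv[1]) as src, open(sys.argv[2], 'w') as dst:\n"
    "    for line in src:\n"
    "        i, x, y = line.split()\n"
    "        dst.write(f'{i} {float(x) + 1} {float(y) + 1}\\n')\n",
]


def demo():
    """Run the full cycle in a temporary directory and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        return run_external([(1.0, 2.0), (3.5, 4.5)], FAKE_TOOL, tmp)
```

The point of the design is that the external program is treated as a black box: only the wrapper knows about its file formats, so the external code does not need to change.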
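
The preferred option (the fourth one above) relies on a factory-style modular core. The sketch below shows the general idea in plain Python, not EMAN2's actual C++ API: algorithms register under a name, work only on in-memory data, and a separate high-level "program" chains them together. The names `register`, `get_processor`, `run_pipeline`, and the processor names `"threshold"` and `"normalize"` are all hypothetical, and a flat list of floats stands in for an image.

```python
import statistics

# Registry mapping a processor name to the class that implements it.
_PROCESSORS = {}


def register(name):
    """Class decorator that adds a processor class to the registry."""
    def wrap(cls):
        _PROCESSORS[name] = cls
        return cls
    return wrap


def get_processor(name, **params):
    """Factory function: build a processor by name with keyword parameters."""
    try:
        cls = _PROCESSORS[name]
    except KeyError:
        raise ValueError(f"unknown processor: {name!r}") from None
    return cls(**params)


@register("threshold")  # hypothetical name for this sketch
class Threshold:
    """Set every pixel below `minval` to zero."""

    def __init__(self, minval=0.0):
        self.minval = minval

    def process(self, pixels):
        # No file access here: the caller supplies and receives the data.
        return [v if v >= self.minval else 0.0 for v in pixels]


@register("normalize")  # hypothetical name for this sketch
class Normalize:
    """Shift and scale to zero mean and unit standard deviation."""

    def process(self, pixels):
        mean = statistics.fmean(pixels)
        sd = statistics.pstdev(pixels)
        return [(v - mean) / sd for v in pixels]


def run_pipeline(pixels, steps):
    """High-level 'program': chain registered algorithms by name."""
    for name, params in steps:
        pixels = get_processor(name, **params).process(pixels)
    return pixels
```

A call such as `run_pipeline(pixels, [("threshold", {"minval": 1.5}), ("normalize", {})])` plays the role of the new high-level program: all reading and writing of files would happen around it, never inside the individual algorithms.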