Q. What is the current eman2 development model? Anybody with cvs access can change anything, or is there a "gatekeeper" who adds patches to the current code, and thus has ultimate authority over what gets into the program? What I mean is who decides whether a given change is desirable or not? Do you care if something that you are philosophically opposed to gets added to the code?
A. There aren't THAT many people with CVS access right now, so it isn't a major issue yet. As it stands now, anyone with CVS access is permitted to check changes in. Certainly anyone is allowed to add completely new routines, like new processors, new reconstruction routines, etc.
Q. Is there a policy about changing code submitted by somebody else? Similarly, is there anything that prevents (or facilitates) having several groups working on the same piece of code.
A. CVS automatically allows people to work on the
same code
simultaneously. If two people make conflicting changes (ie - change the
same line of code), then the person who checks their changes back into
CVS last has
to go through the 'merging' process before CVS allows their changes to
be committed. ie -
Note that, a record still exists of A's changes, so if B changes something in a bad way, it is still possible to check out A's version. Regarding changing someone elses code, I am certainly concerned about the issue of someone 'fixing' something and unintentionally breaking it. For now, please check with me before you spend time changing an existing piece of code. If there is a good reason to leave it as it is, you can simply make a different routine with the desired functionality. However, if something is really broken, or can be substantially improved, I'm all in favor of this happening. Just drop me a quick message explaining what you intend. For example, the background fitting routine you described in this message was never actually completed, and did not work, so putting in a functional version is a welcome addition.
Q. What's the policy of changing someone else's code?
A. I agree in general that changing someone elses code should only be done with permission of the original author. I would (for now) still like to act as a clearinghouse for this sort of activity, because there may be issues that the original author isn't even aware of. It is always possible to find out who wrote a particular routine through CVS, but this can be a bit annoying to accomplish. Right now we put author stamps at the top of each source code file.
Q. Is it clear who wrote a particular piece of code?
A. It is always possible to find out who wrote a particular routine through CVS, but this can be a bit annoying to accomplish. Right now we put author stamps at the top of each source code file. Starting now, why don't we set a policy that each new function/method that is added to EMAN2 should have an author/date stamp in the docstring for the method. I don't think this needs to extend to small changes within a routine. It is always possible to check CVS for detailed changes.
Q. User docs currently seem to be quite thin. Is that by design? Moreover, do the auto-generated docs adequately allow developers to determine how and by what a particular chunk of code is used. Also, is there a simple way to find out whether a given procedure used by multiple pieces of code and if yes, by which ones?
A. Well, there aren't really any user docs yet because there aren't really any user programs yet. There are just a few that we've started writing over the last few weeks. These should all be internally documented, ie e2pdb2mrc --help will provide some documentation, but this needs to be improved, and we do need to develop a script to collect this information into web pages. Right now our effort is mostly focused on programming, so the docs have also focused that way.
Q. What is the reason for having monolithic source files (such as processor.cpp) instead of having separate source files for each distinct piece of code? There are advantages to both solutions...
A. Simply to prevent having too many source files scattered around. If every Processor was in an individual file, there would be a tremendous number of files. As any particular file gets too large, we generally split it into a few smaller pieces. Since CVS takes care of the changes, and the code for a particular function can be localized within the source file, there aren't any major problems with this approach. One problem with the individual file method is that it tends to dramatically increase build and install times. This is a somewhat arbitrary policy.
Q. What is the policy on "stolen" code (such as that taken from Numerical Recipes)?
A. Clearly taking obvious copyrighted code is a BAD thing, especially when it has been made clear that such use is not permitted. However, I think NRC has actually relaxed their polices a bit recently. In most cases equivalent publicly available code can be found. We have already adopted GSL (Gnu Scientific Library), Opt++ and FFTW as dependencies. So routines from any of those toolkits can be used freely. If there are other toolkits you find you need, you need to talk to me before adding new dependencies to EMAN2 as a whole. More dependencies are generally a bad thing.
Q. May one incorporate pure C (or even pure Fortran) routines, particularly numerical routines, into eman2?
A. No Fortran. This is strictly due to the added compilation difficulties it poses. Pure C is a bit trickier, I certainly have no objections to C-like C++, but to work in the framework, the 'functions' need to at least be defined as static methods in some classes, for example, in Util class or EMUtil class.
Q. Is there a specification somewhere that distinguishes fatal from recoverable errors?
A. We are trying to adopt try/except exception processing for errors. The details for this are still being worked out. Fatal errors should be avoided if possible in the libraries, but when they occur, we should have a central routine to log the error and exit (EMAN1 did). This is in flux right now I'd say.
Q. Are operations on arrays / matrices defined anywhere in the code? Similarly, why does the EMData class define an image as a low-level C chunk of memory, with the user constantly performing on-the-fly index calculations?
A. Using C++ abstractions for the low level image storage (like an array of arrays) generally imposes a substantial speed penalty. There IS a higher level abstraction for accessing the image data get_value_at, set_value_at, etc. These are generally inlined functions, so the ones that don't check limits are pretty much the same speed as doing the index calculations yourself. Currently there aren't any routines for treating EMData objects as vectors/matrices/tensors, but this could be done. Liwei did make a matrix class, but this is used strictly for transformation matrices (3x3 or 4x4). GSL has a large set of matrix math routines. The main reason we haven't settled on anything is that we haven't completely decided on a particular linear algebra suite. We could use numpy Python array on the python side, but this wouldn't provide C++ functionality. We could use GSL, but then we need to expose all of the appropriate routines on the python side, or we could decide to go full bore and adopt something like Lapack. This is still open for debate...
Q. Is there a testing policy or framework?
A. We would like to have a full set of unit-tests, so every method in EMAN2 will have a corresponding test. We are working on this. Bugfixes are still a higher priority right now. If you guys are actively working on EMAN2, I would strongly suggest doing frequent CVS updates. Wen and I have begun writing user-level programs now, and finding many bugs, which are rapidly getting fixed. Still, there are probably quite a few problems.
Q. How does eman2 report results (such as a goodness-of-fit, etcetera) to the user when manipulating images? The LOG facility appears to exist mainly for error/warning reporting. Some commands can produce copious output, foe example the ramp removal can print the coeffs of the fit. Should they be printed, and if yes, where?
A. Here is what I would propose: