Parallel Processing in EMAN

EMAN uses a very portable and very coarse-grained type of parallelism. It can function on virtually any multiple processor supercomputer (shared or distributed memory), and can even run on clusters of connected workstations (with certain restrictions). Many of the EMAN command line programs accept the 'proc=' argument. You can find out whether an individual command accepts this argument by typing its name followed by 'help', e.g. 'refine help'. Any command that takes this argument implements parallelism in the same way.

The form for specifying the number of processors to use is [proc=<min>[,<max>]]. That is, 'proc=5' and 'proc=5,10' would be valid specifications. In the first case, 5 processors would be used. In the second case, a minimum of 5 processors would be used at all times, but if the machine load fell to reasonable levels, EMAN would progressively use more processors (up to 10), periodically dropping down to the minimum number to allow other jobs time to start. On clusters of workstations, EMAN will perform dynamic load balancing. That is, if it is configured for 10 workstations and only 5 are used, it will use the 5 workstations with the lowest load. Note, however, that if you give a minimum number of processors greater than the number available, it will run multiple jobs on a single processor, causing those jobs to run very slowly.
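To make the two accepted forms concrete, here is a minimal Python sketch of how a 'proc=<min>[,<max>]' specification could be split into a minimum and a maximum. It is only an illustration: the function name parse_proc is hypothetical and this is not EMAN's own parser.

```python
def parse_proc(arg):
    """Split 'proc=<min>[,<max>]' into a (minimum, maximum) pair of ints.

    Hypothetical helper for illustration only; not part of EMAN.
    """
    name, sep, value = arg.partition("=")
    if name != "proc" or not sep:
        raise ValueError("expected proc=<min>[,<max>], got %r" % arg)

    parts = value.split(",")
    if len(parts) not in (1, 2):
        raise ValueError("expected one or two numbers, got %r" % value)

    minimum = int(parts[0])
    # With a single number, the processor count is fixed: min == max.
    maximum = int(parts[1]) if len(parts) == 2 else minimum

    if minimum < 1 or maximum < minimum:
        raise ValueError("need 1 <= min <= max, got %r" % value)
    return minimum, maximum


print(parse_proc("proc=5"))     # (5, 5)
print(parse_proc("proc=5,10"))  # (5, 10)
```

A fixed count is represented here as a range whose two ends are equal, so code using the result only has to handle one case.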

On a multiple processor computer, when specifying a fixed number of processors to use, no configuration is necessary. Simply specify the number of processors to use, and it will run. However, if you wish to use a variable number of processors or run on workstation clusters, you must create a configuration file in the directory where you run the command. Typically this is the project directory where you run the 'refine' command (which takes the proc= argument).

The configuration file is called '.mparm'. All parallelism is implemented through the 'runpar' command. Once you have created a '.mparm' file, you can test it by running 'runpar test'. This will test each configured machine and return its current load. The reported loads can also help you decide how many processors to specify when running a job.

The format of the .mparm file is quite simple. Each line contains the specification for 1 computer. The first line MUST be for 'localhost', and contains information on the machine from which the jobs will be run. Let's begin with the configuration for a single supercomputer with 24 processors. In this case, the .mparm file would contain 1 line:

rsh	24	1	localhost	/homes/stevel/testdir

The 'rsh' and the '1' are present in every line and are reserved for future expansion. The entries on each line should be separated by <tab> characters. The '24' is the number of processors on the machine. 'localhost' is the machine name (the first line should always be localhost). The final entry is the path to the working directory as it appears on that computer.

To use EMAN on clusters of computers, the directory where processing is being done MUST be cross-mounted on all of the computers being used. However, the directory isn't always mounted in the same place on every machine. In most workstation setups, the path will be the same on all lines, but if your local configuration has the directory mounted in different places on different machines, you can give a different path on each line.

The second requirement for using EMAN on workstation clusters is that the machines MUST be configured to allow rsh/rlogin/rcp commands to work without a password. Some sites do not allow this due to security concerns. If this is the case, there is no way around it, and you'll have to use EMAN on one machine. To test this, log in to the machine you'll be starting your EMAN jobs from and 'rlogin <machine name>' to each of the machines you plan to use. If you are prompted for a password when you do this, hope is not lost yet. You can create a file in your home directory called '.rhosts', and add an entry for the machine you plan to start your jobs from (see the 'rhosts' man page). If that solves the problem, set up your .mparm file and do a 'runpar test'. If this reports the machine load for each configured computer, everything should be fine.

Note that the machines must NOT be mixed byte-order. That is, it's fine to use a cluster of SGIs or a cluster of PCs, but don't mix the two.

Here is an example configuration for 4 machines, 2 of which have 2 processors:

rsh     2       1       localhost       /homes/stevel/tst
rsh     2       1       fire    /homes/stevel/tst
rsh     1       1       phoenix /homes/stevel/tst
rsh     1       1       tiger   /homes/stevel/tst
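
As a way of checking a .mparm file before running a job, the following Python sketch reads the five entries on each line and verifies that the first line is for localhost. The names Host and parse_mparm are hypothetical, and the sketch assumes paths contain no whitespace; it is not how 'runpar' itself reads the file.

```python
from collections import namedtuple

# One record per line of .mparm. The field names are illustrative labels
# chosen for this sketch, not names used by EMAN.
Host = namedtuple("Host", "method nproc reserved name path")


def parse_mparm(text):
    """Parse the contents of a .mparm file into a list of Host records."""
    hosts = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        # The file should use tabs; splitting on any whitespace is a
        # leniency of this sketch and assumes paths contain no spaces.
        fields = line.split()
        if len(fields) != 5:
            raise ValueError("line %d: expected 5 entries, got %d"
                             % (lineno, len(fields)))
        method, nproc, reserved, name, path = fields
        hosts.append(Host(method, int(nproc), int(reserved), name, path))

    if not hosts or hosts[0].name != "localhost":
        raise ValueError("the first line must be for localhost")
    return hosts


example = (
    "rsh\t2\t1\tlocalhost\t/homes/stevel/tst\n"
    "rsh\t2\t1\tfire\t/homes/stevel/tst\n"
    "rsh\t1\t1\tphoenix\t/homes/stevel/tst\n"
    "rsh\t1\t1\ttiger\t/homes/stevel/tst\n"
)
hosts = parse_mparm(example)
print(sum(h.nproc for h in hosts))  # total configured processors: 6
```

The total printed at the end is the largest processor count that can be requested for this configuration without running more than one job per processor.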

Please note that the automatic load balancing feature is far from perfect. You may find cases where the machine EMAN chooses for a job is not the optimal one. Also, the current version does not have any support for including information about the relative speed of each machine. While it will do some amount of load balancing, if you have one machine that's 5 times faster than another, it's generally a good idea to exclude the slower machine. It won't add much in most cases, and it can potentially slow things down quite a bit. To determine the relative speed of a computer (as far as EMAN is concerned), run 'speedtest'. This will give a single number which is fairly representative of how fast a refinement will run on that machine.
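
Since relative speed has to be taken into account by hand, the following Python sketch shows one way to combine the two ideas: drop machines that are 5 or more times slower than the fastest one, then prefer the least loaded of the rest. It is not EMAN's actual selection algorithm. The function pick_hosts, the load values, and the speed values are all made up for illustration, and the sketch assumes that a larger speed number means a faster machine, which should be checked against the actual 'speedtest' output.

```python
def pick_hosts(loads, speeds, n, max_ratio=5.0):
    """Return up to n host names, least loaded first.

    loads  -- host name -> current load (lower is better)
    speeds -- host name -> speed figure (ASSUMED: higher is faster)
    Hosts that are max_ratio or more times slower than the fastest
    host are left out entirely.
    """
    fastest = max(speeds.values())
    usable = [h for h in loads if speeds[h] * max_ratio > fastest]
    return sorted(usable, key=lambda h: loads[h])[:n]


# Made-up numbers, for illustration only.
loads = {"localhost": 0.9, "fire": 0.1, "phoenix": 0.3, "tiger": 0.0}
speeds = {"localhost": 100, "fire": 100, "phoenix": 90, "tiger": 15}

print(pick_hosts(loads, speeds, 2))  # ['fire', 'phoenix']
```

In this example 'tiger' is idle but is left out because it is more than 5 times slower than the fastest machine. In practice the same effect is achieved by omitting its line from the .mparm file.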


EMAN 1.0 beta1, last modified 1/9/00