EMAN uses a very portable and very coarse-grained type of parallelism. It can function on virtually any multiprocessor supercomputer (shared or distributed memory), and can even run on clusters of connected workstations (with certain restrictions). Many of the EMAN command line programs accept the 'proc=' argument. You can discover whether an individual command accepts this argument by typing its name followed by 'help', e.g. 'refine help'. Every command which takes this argument implements parallelism in the same way (with 2 specific experimental exceptions).
The form for specifying the number of processors to use is [proc=<min>[,<max>]]. That is, 'proc=5' and 'proc=5,10' would be valid specifications. In the first case, 5 processors would be used. In the second case, a minimum of 5 processors would be used at all times, but if the machine load fell to reasonable levels, it would progressively use more processors, periodically dropping down to the minimum number to allow other jobs time to start. Note, however, if you give a minimum number of processors greater than the available processors, it will run multiple jobs on a single processor, causing all jobs to run very slowly.
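As an illustration of the 'proc=<min>[,<max>]' form described above, here is a minimal Python sketch that splits such an argument into its minimum and maximum. It is not part of EMAN; the function name parse_proc is hypothetical, and treating a missing maximum as equal to the minimum is an assumption based on the 'proc=5' example.

```python
def parse_proc(arg):
    """Split 'proc=<min>[,<max>]' into a (min, max) tuple of ints.

    Hypothetical helper, not EMAN code. When no maximum is given,
    the maximum is assumed to equal the minimum (a fixed count).
    """
    key, sep, value = arg.partition("=")
    if key != "proc" or not sep:
        raise ValueError(f"not a proc= argument: {arg!r}")
    parts = value.split(",")
    if len(parts) not in (1, 2):
        raise ValueError(f"expected proc=<min>[,<max>], got {arg!r}")
    lo = int(parts[0])
    hi = int(parts[1]) if len(parts) == 2 else lo
    if lo < 1 or hi < lo:
        raise ValueError(f"need 1 <= min <= max, got {arg!r}")
    return lo, hi


def oversubscribed(min_procs, available):
    """True if the requested minimum exceeds the processors available,
    the case the text warns will make all jobs run very slowly."""
    return min_procs > available
```

For example, parse_proc('proc=5,10') gives (5, 10), and oversubscribed(5, 4) is True, which is the situation to avoid.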
On a multiple processor computer, when specifying a fixed number of processors to use, no configuration is necessary. Simply specify the number of processors to use, and it will run. However, if you wish to use a variable number of processors or run on workstation clusters, you must create a configuration file in each project directory where you run parallel commands (those which take the proc= argument). Failing to set this file up in each directory can result in accidentally running many jobs on a single processor, potentially causing the system to require a reboot.
The configuration file is called '.mparm'. The format of this file is quite simple. Each line contains the specification for 1 computer. The first line should be for 'localhost', and contains information on the machine from which the jobs will be run. The only exception to this rule is a cluster with a host node where the actual 'refine' command is run; there the localhost line may be omitted. Note, however, that refine performs some jobs in serial. That is, even if you omit the localhost line, refine may run some small single-processor jobs on the local node. Let's begin with the configuration for a single supercomputer with 24 processors. In this case, the .mparm file would contain 1 line:
rsh 24 1 localhost /homes/stevel/testdir
The 'rsh' and the '1' are present in every line and are reserved for future expansion. There should be <tab>'s between the entries on the line. The '24' is obviously the number of processors on the machine. 'localhost' is the machine name. The final entry is the path to the local directory on the remote computer. It is very important to remember this when copying .mparm files from one directory to another. It is critical that the directory specification in the file be changed to match its new location. This is inconvenient and will be corrected in the next version, but for now it's necessary.
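Because the directory entry must be updated whenever a .mparm file is copied, it can be convenient to generate the line rather than edit it by hand. The following Python sketch builds one tab-separated line in the format described above. It is not part of EMAN; the function name mparm_line is hypothetical, and defaulting to the current working directory is an assumption that only makes sense when the script is run from the project directory.

```python
import os


def mparm_line(nproc, host="localhost", directory=None):
    """Build one .mparm line: rsh <nproc> 1 <host> <directory>.

    Hypothetical helper, not EMAN code. Fields are joined with tabs,
    as the format requires. If no directory is given, the current
    working directory is used (assumes you run this from the
    project directory).
    """
    if directory is None:
        directory = os.getcwd()
    return "\t".join(["rsh", str(nproc), "1", host, directory])


if __name__ == "__main__":
    # Write a single-machine .mparm for a 24-processor computer
    # into the directory the script is run from.
    with open(".mparm", "w") as f:
        f.write(mparm_line(24) + "\n")
```

Running this in each project directory avoids leaving a stale path behind after copying a .mparm file from another project.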
To use EMAN on clusters of computers, the directory where processing is being done MUST be cross-mounted on all of the computers being used. However, on some machines, the directories aren't always mounted in the same place. In most workstation setups, the path will be the same on all lines of the .mparm file, but if your local configuration has the directory mounted in different places on different machines, you can give each machine's line the path as it appears on that machine.
The second requirement to use EMAN on workstation clusters is that the machines MUST be configured to allow rsh/rlogin/rcp commands to work (with no password). Some sites do not allow this due to security concerns. If this is the case, there is no way around it, and you'll have to use EMAN on one machine. The way to test this is to log in to the machine you'll be starting your EMAN jobs from and 'rlogin <machine name>' to each of the machines you plan to use. If you are prompted for a password when you do this, hope is not lost yet. You can create a file in your home directory called '.rhosts', and add an entry for the machine you plan to start your jobs from (see the 'rhosts' man page). If that does not solve the problem, contact your sysop.
Note that the machines must NOT be mixed byte-order. That is, it's fine to use a cluster of SGIs or a cluster of PCs, but don't mix the two for a single job.
Here is an example of a 4 machine configuration, 2 of which have 2 processors:
rsh 2 1 localhost /homes/stevel/tst
rsh 2 1 fire /homes/stevel/tst
rsh 1 1 phoenix /homes/stevel/tst
rsh 1 1 tiger /homes/stevel/tst
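To check a .mparm file before starting a job, the format described above can be read back with a few lines of Python. This is a sketch only and is not how EMAN itself reads the file; the names Node and parse_mparm are hypothetical. It assumes paths contain no whitespace, accepts any whitespace between fields, and skips blank lines, all of which are choices of this sketch rather than documented EMAN behaviour.

```python
from typing import NamedTuple


class Node(NamedTuple):
    nproc: int       # number of processors on the machine
    host: str        # machine name
    directory: str   # project directory as mounted on that machine


def parse_mparm(text):
    """Parse .mparm contents into a list of Node entries.

    Hypothetical helper, not EMAN code. Each non-blank line must
    have five fields: rsh <nproc> 1 <host> <directory>.
    """
    nodes = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # this sketch ignores blank lines
        fields = line.split()
        if len(fields) != 5 or fields[0] != "rsh" or fields[2] != "1":
            raise ValueError(f"line {lineno}: unexpected format: {line!r}")
        nodes.append(Node(int(fields[1]), fields[3], fields[4]))
    return nodes


example = """\
rsh 2 1 localhost /homes/stevel/tst
rsh 2 1 fire /homes/stevel/tst
rsh 1 1 phoenix /homes/stevel/tst
rsh 1 1 tiger /homes/stevel/tst
"""

nodes = parse_mparm(example)
total = sum(n.nproc for n in nodes)  # processors across all machines
```

For the 4 machine example, total is 6, which is the largest value of proc= that does not put more than one job on a processor.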
Current distributions of Linux have some known problems with their NFS implementations. Specifically, if two nodes append to the same file within a few seconds of each other, the file will be corrupted. Lacking a better solution, the rather extreme measure of writing a custom fileserver for EMAN was undertaken. When the cluster version of EMAN is run, all file write operations are passed through a single-threaded fileserver running within runpar on the host machine. Not only does this avoid NFS bugs, but it seems to run much faster than NFS did. File reads are still performed through NFS, thus taking advantage of local file-caching. This feature is disabled in the single processor and shared memory versions of the EMAN binaries.
The previous version of EMAN contained an experimental load-balancing feature to help level machine load on clusters. This option worked poorly and has now been removed. In the version of EMAN compiled for clusters, a fixed number of processors is used, and the nodes are filled in order (although 1 job will be put on each node before a second is put on a 2-processor node).
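The fill order described above can be sketched in Python as repeated passes over the node list, each pass giving one job to every node that still has a free processor. This is one reading of the description, not EMAN's actual scheduler; the function name fill_order is hypothetical, and raising an error when more jobs are requested than there are processors is a choice of this sketch.

```python
def fill_order(nodes, njobs):
    """Return the host for each job, in the order jobs are placed.

    nodes: list of (host, nproc) pairs in .mparm order.
    Hypothetical helper, not EMAN code. Every node receives one job
    before any node receives a second.
    """
    capacity = sum(nproc for _, nproc in nodes)
    if njobs > capacity:
        raise ValueError(f"{njobs} jobs requested, only {capacity} processors")
    order = []
    pass_no = 0
    while len(order) < njobs:
        for host, nproc in nodes:
            # A node takes part in this pass only if it has more
            # processors than the number of passes already made.
            if nproc > pass_no and len(order) < njobs:
                order.append(host)
        pass_no += 1
    return order


cluster = [("localhost", 2), ("fire", 2), ("phoenix", 1), ("tiger", 1)]
placement = fill_order(cluster, 5)
```

With 5 jobs on the 4 machine example, placement is localhost, fire, phoenix, tiger, and then localhost again for the fifth job.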
Also, the current version does not have any support for including information about the relative speed of each machine. While it will do some amount of load balancing, if you have one machine that's 5 times faster than another, it's generally a good idea to exclude the slower machine. It won't add much in most cases, and it can potentially slow things down quite a bit. To determine the relative speed of a computer (as far as EMAN is concerned), run 'speedtest'. This will give a single number which is fairly representative of how fast a refinement will run on that machine.
Note: As of Mar 2001, Athlon 1.2 GHz/133 FSB processors are the best available. While the P4 has potential, with current Linux compilers, it will generally perform slower than a fast PIII. The Athlon also has a much better floating point unit than the PIII/4, and handles math-intensive work better. For comparison, an Athlon 1.2/133 benchmarks twice as fast as a PIII/800 running EMAN code (it's also much less expensive).
EMAN 1.0 beta1, last modified 3/22/01