Parallel Processing in EMAN2
EMAN2 uses a modular strategy for running commands in parallel. That is, you can choose different ways to run EMAN2 programs in parallel, depending on your environment. We now support three distinct methods for parallelism, and each has its own page of documentation.
Which option is best? If you are running on a single machine/node, then Threaded is by far the most efficient option, and the easiest to use as well. If you are running on a few nodes of a single cluster, use MPI. In many cases a single cluster node has enough cores that running Threaded parallelism on one cluster node at a time is a reasonable choice. MPI setup can be painful for people not familiar with clusters, whereas Threaded can be used without any extra configuration.
Please follow the appropriate link:
Threaded - This is for use on a single computer with multiple processors (cores) or on a single node of a cluster. EMAN2 can make very efficient use of all of these cores, but this mode will ONLY work if you want to run on a single computer.
MPI - This is the standard parallelism method used on virtually all large clusters nowadays. It will require a small amount of custom installation for your specific cluster, even if you are using a binary distribution of EMAN2. Follow this link for more details.
Distributed - This was the original parallelism method developed for EMAN2. It has not really been developed or actively used for at least 5 years, with MPI now preferred for clusters and Threaded preferred for individual computers.
--threads option - In addition to --parallel, some commands have a --threads option. A few commands cannot be run using the generic multi-computer parallelism provided by --parallel, but may still be able to take advantage of multiple cores on a single machine. Set --threads to the number of available processors on a single computer. It should be specified in addition to --parallel when both are available.
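As a rough illustration of the guidance above, the following Python sketch picks a mode from the number of nodes and assembles the matching options. The helper names are hypothetical, and the `thread:<n>` and `mpi:<n>:<scratch>` value formats are assumptions that this page does not state; check the Threaded and MPI pages for the exact syntax your installation expects.

```python
def choose_parallel_mode(num_nodes):
    """Pick a mode following the page's advice:
    Threaded for a single machine/node, MPI for several nodes."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return "thread" if num_nodes == 1 else "mpi"


def build_parallel_args(num_nodes, cores_per_node,
                        scratch_dir="/tmp", has_threads_option=False):
    """Return a list of command-line options for an EMAN2 program.

    ASSUMPTION: the value formats 'thread:<n>' and 'mpi:<n>:<scratch>'
    are illustrative only; consult the linked pages for the real syntax.
    """
    mode = choose_parallel_mode(num_nodes)
    if mode == "thread":
        spec = f"thread:{cores_per_node}"
    else:
        total_cores = num_nodes * cores_per_node
        spec = f"mpi:{total_cores}:{scratch_dir}"

    args = [f"--parallel={spec}"]
    # Per the text above, --threads is given in addition to --parallel
    # for commands that offer both options.
    if has_threads_option:
        args.append(f"--threads={cores_per_node}")
    return args


if __name__ == "__main__":
    # Single workstation with 8 cores, command that also accepts --threads.
    print(" ".join(build_parallel_args(1, 8, has_threads_option=True)))
```

The resulting list can be appended to the argument list of whichever EMAN2 command you are running. Only pass `has_threads_option=True` for commands that actually provide --threads.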
Note: All three parallelism options have been fully supported and stable since early 2011. Both MPI and Distributed (DC) have been tested on jobs using at least 256 cores, for multiple days, and are in routine use on large refinement jobs at multiple sites. That said, DC and MPI can both take a little effort to set up on a new system, particularly if you have no past experience with cluster computing. We are happy to help if you have difficulties.