Parallel Processing in EMAN2
EMAN2 uses a modular strategy for running commands in parallel: you can choose among several ways to run EMAN2 programs in parallel, depending on your environment. We now support three distinct methods for parallelism, and each has its own page of documentation.
Which option is best? If you are running on a single machine/node, Threaded is by far the most efficient option, and also the easiest to use. If you are running on a few nodes of a single cluster, I would suggest MPI as probably the easiest option, and the one that will cause your sysadmin the fewest headaches, though this may not be true on all clusters. DC (the Distributed option below) is most appropriate when you are trying to use multiple independent computers, or to combine the resources of multiple clusters. In a sense it is the most flexible: nodes can be added and removed at any time during the job, and DC will make efficient use of whatever is available at that moment. However, it takes considerably more work to use, is somewhat complicated, and the network policies on some clusters will not permit its use.
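The rule of thumb above can be written down as a small decision function. This is only an illustrative sketch: `suggest_parallel_mode` and its parameters are hypothetical names invented for this example and are not part of EMAN2, and the rule ignores local factors (such as cluster network policy) that may change the answer.

```python
def suggest_parallel_mode(num_nodes: int, single_cluster: bool = True) -> str:
    """Return the parallelism method suggested by the rule of thumb above.

    Hypothetical helper, not an EMAN2 function.
    num_nodes      -- number of machines/nodes the job will run on
    single_cluster -- True if all nodes belong to one cluster
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if num_nodes == 1:
        # One machine: threads are the most efficient and easiest option.
        return "Threaded"
    if single_cluster:
        # Several nodes on one cluster: MPI is usually the easiest choice.
        return "MPI"
    # Independent computers or several clusters combined.
    return "Distributed"


if __name__ == "__main__":
    print(suggest_parallel_mode(1))
    print(suggest_parallel_mode(8))
    print(suggest_parallel_mode(8, single_cluster=False))
```

Treat the result as a starting point only; as noted above, MPI may not be the easiest option on every cluster.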
Please follow the appropriate link:
Threaded - This is for use on a single computer with multiple processors (cores). For example, the Core2Duo processors of a few years ago had 2 cores. In 2010, individual computers often have single or dual processors with 2, 4 or 6 cores each, for a total of up to 12 cores. EMAN2 can make very efficient use of all of these cores, but this mode will ONLY work if you want to run on a single computer.
MPI - This is the standard parallelism method used on virtually all large clusters nowadays. It will require a small amount of custom installation for your specific cluster, even if you are using a binary distribution of EMAN2. Follow the MPI link for more details.
Distributed - This was the original parallelism method developed for EMAN2. It can be used on anything from sets of workstations to multiple clusters, and can dynamically change how many processors it is using during a single run. This allows you, for example, to make use of idle cycles on lab workstations at night, but reduce the load during the day for normal use. It is very flexible, but requires some effort and a knowledgeable user to configure and use.
As of 12/17/2010 MPI parallelism is now functioning, but still under testing/optimization.