Parallel Processing in EMAN2
EMAN2 uses a modular strategy for running commands in parallel. That is, you can choose different ways to run EMAN2 programs in parallel, depending on your environment. Unfortunately, as of May 2009, the parallelism infrastructure is just beginning to come together. It should be gradually fleshed out over 2009. At the moment, only one parallelism infrastructure is fully functional.
Programs with parallelism support will take the --parallel command line option as follows:
--parallel=<type>:<option>=<value>:<option>=<value>:...
For example, for the distributed parallelism model: --parallel=dc:localhost:9990
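To make the option format concrete, here is a minimal Python sketch of how such a string could be split into its parts. It is only an illustration: parse_parallel is a hypothetical helper, not EMAN2's actual parser, and it assumes that fields without an "=" (such as the host and port in the dc example) are simply positional values.

```python
def parse_parallel(spec):
    """Split a --parallel value into (type, positional values, named options).

    Hypothetical helper for illustration only; not part of EMAN2.
    """
    parts = spec.split(":")
    ptype = parts[0]
    positional = []
    named = {}
    for part in parts[1:]:
        if "=" in part:
            # <option>=<value> field
            key, value = part.split("=", 1)
            named[key] = value
        else:
            # bare field, e.g. the host or port in dc:localhost:9990
            positional.append(part)
    return ptype, positional, named


print(parse_parallel("dc:localhost:9990"))
# → ('dc', ['localhost', '9990'], {})
```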
GPGPU Computing
While not precisely a parallelism methodology, this technique makes use of the GPU (graphics processing unit) found in most modern PCs to dramatically accelerate many image processing algorithms. At present (summer 2009) we are at the initial stages of implementing GPGPU support using Nvidia's CUDA infrastructure. We will likely move to OpenCL in the future as it becomes a stable platform. We have implemented only a few algorithms using this methodology to date, and we will need to implement and optimize virtually all of them before this becomes a viable platform for day-to-day use. However, we have demonstrated speedups of as much as 100x in select algorithms, meaning a desktop PC with a GPU could easily become the equivalent of a small Linux cluster. All of the GPGPU code is available in the nightly source snapshots, but you are encouraged to contact sludtke@bcm.edu if you are interested in experimenting with this technology.
Local Machine (multiple cores)
Not yet implemented; please use Distributed Computing.
Distributed Computing
Introduction
This is the sort of parallelism made famous by projects like SETI@home and Folding@home. The general idea is that you have a list of small jobs to do, and a bunch of computers with spare cycles willing to help out with the computation. The number of computers willing to do computations may vary with time, and a computer may agree to do a computation but then fail to complete it. This is a very flexible parallelism model, which can be adapted to individual computers with multiple cores as well as Linux clusters or sets of workstations lying around the lab.
There are 3 components to this system:
User Application (customer) <==> Server <==> Compute Nodes (client)
The user application builds a list of computational tasks that it needs to have completed, then sends the list to the server. Compute nodes with nothing to do then contact the server and request tasks to compute. The server sends the tasks out to the clients. When the client finishes the requested computation, results are sent back to the server. The user application then requests the results from the server and completes processing. As long as the number of tasks to complete is larger than the number of clients servicing requests, this is an extremely efficient infrastructure.
Internally things are somewhat more complicated and tackle issues such as data caching on the clients, how to handle clients that die in the middle of processing, etc., but the basic concept is quite straightforward.
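The basic flow described above can be sketched as a toy Python model. This is purely illustrative and assumes nothing about EMAN2's internals: the "server" is just an in-process queue, the "clients" are threads, the computation is a placeholder (squaring a number), and run_tasks is a hypothetical name. Caching and recovery from dead clients are deliberately left out.

```python
import queue
import threading


def run_tasks(tasks, n_clients):
    """Toy model: the customer submits tasks, clients pull them, results come back."""
    pending = queue.Queue()      # stands in for the server's task list
    results = {}                 # stands in for results held by the server
    lock = threading.Lock()

    # Customer: send the list of tasks to the "server"
    for task_id, value in enumerate(tasks):
        pending.put((task_id, value))

    def client():
        # Client: keep requesting tasks until none are left
        while True:
            try:
                task_id, value = pending.get_nowait()
            except queue.Empty:
                return
            result = value * value   # placeholder for the real computation
            with lock:
                results[task_id] = result

    workers = [threading.Thread(target=client) for _ in range(n_clients)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Customer: collect the results in the original task order
    return [results[i] for i in range(len(tasks))]


print(run_tasks([1, 2, 3, 4, 5], n_clients=2))
# → [1, 4, 9, 16, 25]
```

Because each client simply asks for the next task whenever it is idle, the work balances itself as long as there are more tasks than clients, which is the efficiency condition mentioned above.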
With any of the e2parallel.py commands below, you may consider adding the --verbose=1 option to see more of what it's doing.
How to use Distributed Computing in EMAN2
To use distributed computing, there are three basic steps:
- Run a server on a machine that the clients can communicate with
- Run some number of clients pointing at the server
- Run an EMAN2 program with the --parallel option
What follows are specific instructions for doing this under 3 different scenarios.
Using DC on a single multi-core workstation
- Ideally your data will be stored on a hard drive physically connected to the workstation (not on a shared network drive)
- Make an empty directory on a local hard drive
- Run a server on the workstation from the empty directory you just created: e2parallel.py dcserver
- The server will print a message saying what port it's running on. This will usually be 9990. If it is something else, make a note of it.
- Run one client for each core you want to use for processing: e2parallel.py dcclient --server=localhost --port=9990 (replace the port with the correct number if necessary)
- Run your EMAN2 programs with the option --parallel=dc:localhost:9990 (again, use the right port number)
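As a convenience, the commands in the steps above can be assembled with a few lines of Python. This sketch only builds the command strings quoted in the list; it does not launch anything. dc_commands is a hypothetical helper, and the default of 4 clients is an arbitrary assumption (use one per core you want to dedicate to processing).

```python
def dc_commands(server="localhost", port=9990, n_clients=4):
    """Return (server command, list of client commands, --parallel option).

    Hypothetical helper; it only formats the commands shown in the steps.
    """
    server_cmd = "e2parallel.py dcserver"
    client_cmd = f"e2parallel.py dcclient --server={server} --port={port}"
    parallel_opt = f"--parallel=dc:{server}:{port}"
    # One identical client command per core to be used
    return server_cmd, [client_cmd] * n_clients, parallel_opt


server_cmd, client_cmds, parallel_opt = dc_commands(n_clients=2)
print(server_cmd)            # run this first, from the empty directory
for cmd in client_cmds:
    print(cmd)               # run each of these once the server is up
print(parallel_opt)          # append this to your EMAN2 program's command line
```

If the server reports a port other than 9990, pass that value as port so the client commands and the --parallel option stay consistent.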
Using DC on a Linux cluster
- The server should run on the node (often the head node or a specialized 'storage node') with a direct physical connection to the storage
- If you want to use clients from multiple clusters, then remember all of the clients must be able to make a network connection to the server machine
- Run a server on the head node, in an empty directory on the local hard drive: e2parallel.py dcserver
- The server will print a message saying what port it's running on. This will usually be 9990. If it is something else, make a note of it.
- On each node, run one client for each core you want to use for processing: e2parallel.py dcclient --server=<server> --port=9990 (replace the server hostname and port with the correct values)
- Run your EMAN2 programs with the option --parallel=dc:<server>:9990 (again, use the right port number and server hostname)
Using DC on a set of workstations
- The server should run on a computer with a direct physical connection to the storage
- All of the clients must be able to make a network connection to the server machine
- Run a server on the desired machine, in an empty directory on the local hard drive: e2parallel.py dcserver
- The server will print a message saying what port it's running on. This will usually be 9990. If it is something else, make a note of it.
- On each computer, run one client for each core you want to use for processing: e2parallel.py dcclient --server=<server> --port=9990 (replace the server hostname and port with the correct values)
- Run your EMAN2 programs with the option --parallel=dc:<server>:9990 (again, use the right port number and server hostname)
For all of the above, once you have finished running your jobs, kill the server, then run 'e2parallel.py dckillclients' from the same directory. When it stops printing 'client killed' messages, you can kill this server as well.
MPI
Sorry, we haven't had a chance to finish this yet. For the moment you will have to use the Distributed Computing mode on clusters.