What sort of desktop computer should I get for EMAN2 reconstructions?

(Almost) Timeless Recommendations

Since I won't always update this page every few months, let me give a few general tips:

  • As of 2015, Intel processors maintain a very substantial computational advantage over AMD for the sort of processing we do. I would tend to say it isn't even worth considering AMD as an alternative at this point in time. That could change, as it did in the mid-2000s, but there is no sign of it yet.
  • For the sort of processing we typically do, scaling is basically linear with CPU speed, and almost linear with number of cores. So, when pricing, optimize the product of these two numbers.
  • There are still some tasks that do NOT scale with number of cores, so if the product of speed * cores is similar, opt for the machine with fewer faster cores.
  • Try for at least ~4 GB/core of RAM, with an absolute minimum of 2 GB/core. This is particularly true if you plan to use other image processing software as well.
  • For a number of tasks, most specifically direct detector movie processing, disk speed is critical. There are 2 ways to get fast disk access:
    • A RAID array. If you set up a hardware RAID 5 with 8 drives, you get ~7x the performance of a single drive.
    • Drive speed. Standard 3.5" hard drives are typically ~150 MB/sec. SSDs are more often in the ~600 MB/sec range (and this is improving with time), but at this time they are still significantly more expensive, and have long-term reliability issues when used heavily. If a RAID isn't an option, consider an SSD for daily use with a standard drive for backup/archive.
  • You really don't need a $4000 graphics card unless you have specific software you need to run on it. Even then for a typical desktop system a high-end gaming GPU will generally get you about the same performance as the much more expensive units marketed for science. If you don't have GPU specific software, there is no point in spending more than $200-300 for a mid-tier graphics card.
  • Get a big monitor with high resolution, or multiple monitors. Small 4K TVs are quite cheap now, and you really want the extra resolution for working with large images and other purposes.
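As a rough illustration of the pricing tips above, here is a minimal Python sketch that ranks candidate machines by the product of clock speed and core count, drops anything below the 2 GB/core minimum, and breaks near-ties in favor of fewer, faster cores. The `Machine` class, the `pick` function, the 5% tie tolerance, and the example machines are all hypothetical; this is only one way to encode the rule of thumb, not a benchmark.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str         # hypothetical label for a candidate machine
    cores: int        # physical cores
    clock_ghz: float  # sustained clock speed in GHz
    ram_gb: int       # installed RAM in GB

    @property
    def throughput(self) -> float:
        """Rough throughput proxy: clock speed times core count."""
        return self.clock_ghz * self.cores

    @property
    def ram_per_core(self) -> float:
        return self.ram_gb / self.cores


def pick(machines, tolerance=0.05, min_ram_per_core=2.0):
    """Return the preferred machine, or None if none has enough RAM.

    Machines within `tolerance` of the best speed*cores product are
    treated as equivalent, and the one with the fewest cores wins,
    since some tasks do not scale with core count.
    """
    eligible = [m for m in machines if m.ram_per_core >= min_ram_per_core]
    if not eligible:
        return None
    best = max(m.throughput for m in eligible)
    near_best = [m for m in eligible if m.throughput >= best * (1 - tolerance)]
    return min(near_best, key=lambda m: m.cores)


# Made-up candidates, purely for illustration.
candidates = [
    Machine("A: 8 cores @ 3.0 GHz", cores=8, clock_ghz=3.0, ram_gb=32),
    Machine("B: 6 cores @ 3.9 GHz", cores=6, clock_ghz=3.9, ram_gb=32),
    Machine("C: 16 cores @ 2.6 GHz", cores=16, clock_ghz=2.6, ram_gb=16),
]
choice = pick(candidates)
print(choice.name if choice else "no machine meets the RAM minimum")
```

Here machine C has the largest product but only 1 GB/core, so it is excluded; A and B are within 5% of each other, so the sketch prefers B for its fewer, faster cores. In practice price would be folded in as well, for example by dividing the product by cost.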

Update late 2014

Earlier this year I updated my workstation to get something for <$10k that would be optimal for processing movies as well as single particle reconstruction for small-medium projects. We have purchased a couple of machines like this, and they are quite cost effective and perform very well. Here are the basic specs:

  • Case with 8 hot-swap drives : Supermicro SC743 TQ-865B-SQ - tower - 4U
  • SUPERMICRO X9DAE Motherboard
  • 2x Intel Xeon E5-2650v2 (8 core, 2.6 - 3.4 GHz)
  • 128 GB DDR3 RAM
  • 8x 4TB WD Black drives
  • PCIe RAID controller with cables - LSI MegaRAID SAS 9271-8i Kit

This was earlier in the year, so there are probably better processor choices now, but this machine performs very well. Disk performance is ~1.2 - 1.5 GB/sec with ~24 TB of storage. It has 16 cores (32 threads), and hyperthreading actually helps here (~30% speed boost over using 16 threads). Total cost (self-assembled) was ~$8000. Some of the prices will have fallen since then.

Of course this is just representative, and you can likely get a vendor to build you one for about the same price.

Update late 2013

In the Intel lineup today, for a basic machine (<$2000), I would probably lean towards a single 6-core i7-3930K Sandy Bridge-E 3.2GHz (3.8GHz Turbo). If you want to go all-out, a machine with dual 8-core Xeon (E5-2690 Sandy Bridge-EP 2.90GHz) processors is currently at the high end of the lineup, but this will run you about $4000 just for the two CPUs, so you could easily hit $5000-7000 for a machine like this with a decent amount of RAM. My normal RAM recommendations haven't changed much: 2 GB/core is enough for most applications, but if you want to do tomography or deal with large viruses at high resolution, you may want 64+ GB (regardless of the number of cores). RAM is cheap enough that 64 GB isn't all that expensive.

For many projects, you can get away with relatively little in modern computer terms. My quad-core mac laptop, for example, can refine a ribosome to ~12 Å resolution overnight very easily. It's when you start pushing for higher resolutions or larger structures that the computing needs really increase, and in such situations you are probably better off getting some time on a cluster, instead of paying $10k for a super-duper workstation. My recommendation would be to get a high clock speed single CPU computer with 4 or 6 cores for desktop use.

SSD hard drives are (each) ~4x faster than traditional spinning drives. They have improved dramatically in recent years (as shown by their use in all current Mac laptops). They are still expensive, but much less so than they used to be. Many tasks in cryo-EM data processing, particularly with DDD movie data, are disk-limited, so you can improve the interactivity of your computer dramatically by at least supplementing your regular hard-drives with SSDs.

Two options for SSD use:

  • single 256 or 512GB SSD drive used for booting.
    • Make SURE that NO swap space is allocated on the SSD. SSDs have limited lifetimes, and swap involves a lot of reading/writing. IMHO, there is no need for any swap at all on a computer with 16+ GB of RAM, and not much reason to get a machine with less RAM than this.
    • You should have a nightly backup (rsync is a good tool for this) from your SSD drive to another hard drive. Uncomfortably often when an SSD fails, it fails catastrophically (no data recovery at all).
  • SSD RAID0 (data striping with no redundancy)
    • 4x 1TB SSD drives will cost ~$2000 (early 2014)
    • configured as a RAID0 (data striping with no redundancy) gives ~4TB usable space
    • can achieve speeds better than 1 GB/sec! (typical hard drives are ~120-150 MB/sec, i.e. >8x faster)
    • Extremely useful when working with large tomograms or direct-detector movie mode data!
    • However, if any 1 drive fails, all data is lost, so these should be for active data processing only, and should have a nightly backup onto a traditional 4TB drive (rsync again).
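The nightly rsync backup mentioned above could be sketched roughly as follows. This is only an illustration: the function names and the example paths are hypothetical, and it assumes rsync is installed and that the script is run once a night by a scheduler such as cron. The `-a` (archive) and `--delete` options are standard rsync flags; `--delete` makes the backup an exact mirror, so files removed from the SSD also disappear from the backup on the next run.

```python
import subprocess


def rsync_command(source, dest, delete=True):
    """Build an rsync command that mirrors `source` into `dest`.

    The trailing slash on the source tells rsync to copy the
    directory's contents rather than the directory itself.
    """
    cmd = ["rsync", "-a"]
    if delete:
        # Mirror deletions too; drop this to keep removed files in the backup.
        cmd.append("--delete")
    cmd += [source.rstrip("/") + "/", dest]
    return cmd


def run_backup(source, dest):
    """Run the backup, raising CalledProcessError if rsync fails."""
    return subprocess.run(rsync_command(source, dest), check=True)
```

For example, `run_backup("/data/active", "/mnt/archive/active")` would copy a (hypothetical) SSD working directory onto a traditional drive mounted at `/mnt/archive`. If you would rather keep files that were deleted from the SSD, pass `delete=False` when building the command.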

Regular spinning hard drives: Note that you can also get ~1GB/sec performance out of spinning drives if you get a large enough RAID array. For example if you get an 8 drive RAID and put high speed regular hard drives in it, and have a high speed interconnect (NOT SAN), you can get pretty good performance.
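To see where numbers like ~1 GB/sec come from, here is a small Python sketch of the idealized arithmetic. It assumes sequential throughput scales with the number of data-bearing drives (all drives for RAID 0, all but one for RAID 5); the function name is hypothetical, and real arrays will come in below these figures because of controller, bus, and filesystem overhead.

```python
def ideal_raid(level, n_drives, drive_tb, drive_mb_s):
    """Return (usable capacity in TB, sequential throughput in MB/sec).

    Idealized upper bounds only: RAID 0 stripes across every drive,
    RAID 5 gives up one drive's worth of space to parity.
    """
    if level == 0:
        data_drives = n_drives
    elif level == 5:
        if n_drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        data_drives = n_drives - 1
    else:
        raise ValueError("only RAID 0 and RAID 5 are modeled here")
    return data_drives * drive_tb, data_drives * drive_mb_s


# 8 spinning drives at ~150 MB/sec each in RAID 5
capacity_tb, speed_mb_s = ideal_raid(5, 8, 4.0, 150.0)
print(f"{capacity_tb:.0f} TB raw, ~{speed_mb_s:.0f} MB/sec ideal")
```

With eight ~150 MB/sec drives in RAID 5, the ideal figure is 7 x 150 = 1050 MB/sec, which is where the "~7x a single drive" and "~1GB/sec" estimates come from.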

A note on GPU computing: EMAN2 does support GPUs; however, there are many caveats:

  • You have to compile EMAN2 from source to get this capability (we haven't come up with a strategy for distributing usable GPU binaries due to library versioning issues)
  • The only application where the GPU provides enough speedup to be worthwhile in EMAN2 is single particle tomography. For regular single particle analysis, most modern CPUs (with multiple cores) can outpace a GPU.
  • If you do decide to try GPUs on a workstation, A: make sure you get Nvidia, we only support CUDA. B: don't waste your money on Tesla cards, just buy a high-end consumer gaming card. Performance will be nearly the same at ~1/10 the cost.

Suggestion as of 3/20/2012

Sandy Bridge Xeons are now available, and I've been getting questions about which computer to get again. Note that Macs are still using the earlier Westmere technology. Anyway, here's a quick analysis:

Sandy Bridge Xeons (E5-2600 series) have finally become available, but aren't available in Macs yet. Certainly the Mac Pro loaded with 12 cores will give you the best available performance on a Mac right now. However, it is very far from the most cost-effective solution. So, it really depends on your budget and goals. Westmere still offers a decent price-performance ratio if you want dual CPUs. If you are happy with a single CPU, I'd say Core i5s are actually the way to go (this is what I just set up in my home PC).

Here is a rough comparison of 3 machines I use:

  • Linux - 12 core Xeon X5675 (3.07 GHz, Westmere): Speedtest = 4100/core -> ~50,000 total (2 CPUs, ~$2880 total)
  • Mac - 12 core Xeon (2.66 GHz): Speedtest = 3000/core -> ~36,000 total
  • Linux - 4 core i5-2500 (3.3 GHz + turbo): Speedtest = 6400 (turbo), 5600 (sustained) per core -> ~22,000 total (1 CPU, ~$210)

Now, the Sandy Bridge Xeons have just been released. Take, for example, a dual 8-core system:

  • 16 core E5-2690 (2.9 GHz): Speedtest (estimated) = 5650 (turbo), 4950 (sustained) per core -> ~80,000 total (2 CPUs, ~$4050)

Now, the costs I gave above are just for the CPUs. If you wanted to build, for example, several of the Core i5 systems and use them in parallel, you'd need a motherboard, case, memory, etc. for them as well. A barebones Core i5 PC with 8 GB of RAM and a 2TB drive would run you ~$650.

If you built a 16 core system around the E5-2690:

  • $4050 - CPUs
  • $600 - motherboard
  • $200 - case
  • $150 - power supply
  • $300 - 32 GB RAM
  • $500 - 4x 2TB drives (equivalent)

So ~$5800 for the (almost) equivalent 16 core machine vs $2600 for 4 of the 4-core i5 systems.

That is, you pay ~2x for the privilege of having it all integrated into a single box. Of course, that buys you a bit of flexibility as well, and saves you a lot of effort in configuration and running in parallel, etc. It also gives you 32 GB of RAM on one machine, which can be useful for dealing with large volume data, visualization, etc.

On the Mac side, a 12-core 2.93 GHz Westmere system with 2 GB/core of RAM would cost ~$8000 and give a speedtest score of ~45,000. That is ~40% more expensive and about 1/2 the speed of a single Linux box with the 16 core config, and 3x as expensive and 1/2 the speed of the Core i5 solution.
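The arithmetic behind this comparison can be reproduced with a short Python sketch. The prices and speedtest scores are the rough 2012 figures quoted above; the variable names are hypothetical, and treating four separate i5 boxes as 4x the score of one assumes perfect parallel scaling, which is optimistic.

```python
# Component costs (USD) for the 16-core E5-2690 build quoted above.
e5_build = {
    "CPUs": 4050,
    "motherboard": 600,
    "case": 200,
    "power supply": 150,
    "32 GB RAM": 300,
    "4x 2TB drives": 500,
}

# (total cost in USD, total speedtest score) for each option.
# The i5 entry assumes four ~$650 boxes scale perfectly in parallel.
options = {
    "16-core E5-2690 box": (sum(e5_build.values()), 80_000),
    "4x quad-core i5 boxes": (4 * 650, 4 * 22_000),
    "12-core Mac (Westmere)": (8000, 45_000),
}


def points_per_dollar(cost, score):
    """Speedtest points bought per dollar spent."""
    return score / cost


ranked = sorted(options, key=lambda k: points_per_dollar(*options[k]), reverse=True)
for name in ranked:
    cost, score = options[name]
    print(f"{name}: ${cost}, {score} points, {points_per_dollar(cost, score):.1f} points/$")
```

By this measure the four i5 boxes come out well ahead, followed by the single 16-core box, with the Mac last. That matches the ratios in the text, though it ignores the convenience and the larger shared RAM of a single machine.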

Please keep in mind that this is just a quick estimate, and that actual prices can vary considerably, but as you can see, the decision you make will depend a lot on your goals and your budget.

Suggestion as of 12/1/2011

Obviously for large jobs you're going to need access to a linux cluster, but regardless you will still need a desktop workstation.

A complete answer to the question depends a bit upon your budgetary constraints, or lack thereof. As you are probably aware, at the 'high end', computers rapidly become more expensive for marginal gains in performance. Generally speaking, we tend to build our own Linux boxes in-house rather than purchasing prebuilt ones, both as a cost-saving measure and to ensure future upgradability. Then again, there is nothing wrong with most available commercial prebuilt PCs as long as you get the correct components. For a minimal cost-effective workstation, I would suggest:

  • Sandy Bridge series processor; the quad-core Core i7-2600K is a good choice
    • If you can get one of the new 6-core versions, that would be 50% more performance
    • Note that Sandy Bridge significantly outperforms the previous generation, so going with a 6-core from the pre-Sandy Bridge series is not a great choice
    • If you can afford a dual processor configuration with dual 6-core Xeons, you will presently have to go with the previous generation, as the Sandy Bridge Xeons won't be out for a while. This configuration (12 cores, last gen) is worthwhile, but expensive.
  • RAM - 3-6 GB/core is what I'd recommend for image processing
    • This depends a bit on the target application. For large viruses, you may wish to get more RAM/core
    • The performance benefit of high-speed RAM is rarely worth the cost. Get the fastest you can without breaking the bank
  • Disk - we would generally get something like four 2 TB drives for data/user files, configured as software RAID 5, with a small (~100 GB) SSD as a boot drive. Current Intel SSDs are good for this purpose.
    • Note that other than the very fastest SSD drives, none of the drives can actually keep up with the latest SATA busses anyway, so going out of your way to get the superfast SATA drive is kind of pointless
  • Video - Get an NVIDIA card, NOT ATI, particularly if you plan on doing stereo. This will also get you some CUDA capabilities. A reasonably high-end GeForce with a good amount of RAM is generally fine with some caveats below.

  • Stereo - This is a tricky and complicated issue. There are 2 main choices:
    • Active stereo
      • Requires a 120 Hz stereo-capable 1080p display, AND, importantly, a Quadro series Nvidia graphics card (to do stereo in a window under Linux!). Note that you will have difficulties making most consumer '3D TVs' work with this setup, though some will. The most reliable option is to get a monitor designed for stereo use with Nvidia cards (Acer makes a decent 24"). Note that this also requires a dual-link DVI port.
    • Passive
      • By FAR the easiest and cheapest option, which also allows multiple users with cheap passive glasses. It also does NOT require an expensive Quadro video card. Chimera and many other programs have built-in support for 'interleaved' stereo, which they implement without support from the underlying Nvidia driver, so you can do it even with cheap graphics cards. Only disadvantage is that you lose 50% of your vertical resolution. Personally this doesn't bother me overly. The other minor issue is that over the last couple of years these have been hard to find. Finally, LG came out with one which can be easily purchased again, though I confess we haven't purchased one of these new ones yet. Does not require dual-link DVI.
  • Monitor - Dual monitor setups can be very useful for image processing. If you can afford it, I would suggest a high-resolution 30" primary display with a passive stereo secondary display. If you get an active stereo secondary display, you will need 2 dual-link DVI outputs on your graphics card.

Hope that helps.

Note that these are just my own personal opinions, and do not represent an official recommendation from anyone other than myself. Your mileage may vary.

EMAN2/FAQ/Computer (last edited 2023-08-31 17:58:11 by SteveLudtke)