Archive for November, 2008

How many cores do we really need?

Sunday, November 2nd, 2008

After CPU manufacturers hit the frequency wall, adding cores became the new way of making “better” processors. However, there does not seem to be much discussion of whether additional cores actually improve performance for common, real-life use cases. After the release of a new CPU we see a slew of performance reports, most of which use synthetic benchmarks. If those benchmarks are able to take advantage of additional cores, then we often see significant performance improvements as the number of cores increases.

While it may be hard to conduct a precise performance comparison for real use cases (e.g., how does one measure the performance of a word processor?), we can identify a set of common applications used in a particular setting. We can then analyze the typical load of each of these applications and reason about whether additional cores will improve their performance.

It seems natural to divide all computer use cases into three broad categories: Desktops (including laptops), Workstations, and Servers. I am intentionally ignoring high-performance computing (HPC) as too specialized. Typical applications for a desktop machine include an office suite, email client, web browser, instant messenger, audio/video player, and a photo management application. Desktops for home use normally also include games.

The common property of most of these applications is that they are user-input and/or network-bound. That is, they spend most of their time waiting for user input or for data arriving over the network. The few applications that do have CPU-intensive workloads (e.g., the audio/video player and the photo management application) are not easily parallelizable. Only the photo management application and games have the potential to perform several CPU-intensive tasks concurrently (e.g., enhancing or resizing a batch of photos). For games, however, a more powerful graphics card is often a cheaper and more effective way to increase performance.
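To make the batch-photo case concrete, here is a minimal Python sketch of how independent, CPU-bound tasks spread across cores. The `process_photo` function is a hypothetical stand-in for real image work, not an actual image-processing routine:

```python
from concurrent.futures import ProcessPoolExecutor

def process_photo(photo_id: int) -> int:
    # Hypothetical stand-in for a CPU-intensive operation such as
    # resizing or enhancing one photo; real code would call into
    # an image library here instead of doing busy arithmetic.
    total = 0
    for i in range(50_000):
        total = (total + photo_id * i) % 1_000_003
    return total

def process_batch(photo_ids, workers=4):
    # Each photo is independent of the others, so the batch splits
    # cleanly across cores. A process pool is used because the work
    # is CPU-bound; one worker per core is a typical choice.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_photo, photo_ids))
```

This is exactly the kind of embarrassingly parallel desktop workload where extra cores pay off: on a quad-core machine a batch of four photos can finish in roughly the time a single photo takes sequentially.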

From this analysis it becomes quite obvious that the typical desktop CPU usage pattern is a mostly idle state with bursts of activity, usually associated with responding to user input or the arrival of network data. It is also clear that adding a second or any subsequent core to a desktop won’t improve the performance of its common applications, while a better-performing single-core CPU probably would. A second core might benefit a few applications and can also improve the responsiveness of the system when a CPU-intensive task is running in the background (e.g., batch photo processing). Furthermore, having extra cores in a power-constrained machine (e.g., a laptop) can actually be a disadvantage unless the extra cores can be completely shut down.

Alternative paths to improving the performance of desktop systems include a higher-performance memory subsystem (faster memory buses and larger caches) as well as specialized processors. An example of the latter approach is the use of a modern GPU’s stream processing capabilities in general applications.

Besides the desktop applications mentioned above, workstations usually run one or more specialized applications, such as compilers, CAD applications, or graphics/video/sound editing software, which normally have CPU-intensive workloads. Having additional CPUs and/or cores in a workstation often improves the performance of the specialized application, but to what degree depends on how well the workload can be processed in parallel. For example, C and C++ compilation can often be performed in parallel and, on big projects, one can add extra cores and achieve better build times until the memory or disk subsystem becomes a bottleneck. On the other hand, single-stream video encoding can be a lot harder to parallelize.
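The diminishing returns from extra cores can be quantified with Amdahl’s law: if a fraction p of the work parallelizes, the speedup on n cores is 1 / ((1 − p) + p/n). A small sketch, with the 90%-parallelizable build used purely as an illustrative assumption:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    # Amdahl's law: the serial fraction (1 - p) caps the overall
    # speedup no matter how many cores are added.
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / cores)

# A build that is 90% parallelizable (the rest being serial work
# such as linking):
#   2 cores  -> ~1.8x
#   8 cores  -> ~4.7x
#   64 cores -> ~8.8x, approaching the 10x ceiling of 1 / 0.1
```

This is why extra workstation cores help up to a point and then stall: the serial portion (and, in practice, the memory and disk subsystems) sets a hard ceiling.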

Servers are where multi-core CPUs have the most potential. Server applications are naturally parallelizable since they often need to perform the same or similar tasks concurrently. However, some server applications, for example database management software, may have a hard time scaling to a truly large number of CPUs/cores because of the need for exclusive access to shared resources. Other applications, for example web servers and stateless application servers, can take advantage of truly massively parallel systems. Virtualization software is another class of application that can benefit greatly from multi-core CPUs.