CG Performance Results

Hardware and software used:

  • Intel Core 2 Duo E8500 (3.16 GHz, FSB 1333 MHz, 6 MB Cache)
  • OCZ Reaper HPC Edition 4GB (2048MB x 2) DDR2 1066 MHz
  • XFX GeForce GTX 280 1024MB DDR3 XXX (GX-280N-ZDDU) (670 MHz, 1024MB DDR3@2500MHz)
  • Ubuntu Linux 9.04 32-bit
  • CUDA Driver 2.3 (190.18 Beta)
  • nvcc 0.2.1221 compiler for the CUDA version, gcc 4.3.1 for the OpenMP version, both with -O3

Results

cg_time.png
Figure 1. Time comparison between the CUDA and OpenMP versions of the CG benchmark.

cg_mops.png
Figure 2. Comparison of millions of operations per second between the CUDA and OpenMP versions of the CG benchmark.

Figures 1 and 2 show the execution time in seconds and the throughput in millions of operations per second for the CUDA and OpenMP versions of the CG benchmark. For the smaller instance (CG W), the dual-core processor outperformed the CUDA GPU. This is mostly due to the limited parallelism of the instance and its sparse memory accesses. Still, the GPU maintained the same throughput for both instances, while the dual-core throughput dropped for the larger instance. The increase in parallelism offset the increase in sparse accesses on the GPU, but not on the CPU.
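To illustrate the sparse memory accesses discussed above, here is a minimal sketch of a sparse matrix-vector product, the operation that dominates the CG kernel. It assumes the matrix is stored in compressed sparse row (CSR) form; the function and parameter names (spmv_csr, row_start, col_idx, val) are hypothetical and are not taken from the benchmark source. The point to notice is the indirect read x[col_idx[k]], whose address depends on the matrix data and is therefore hard to cache on a CPU or coalesce on a GPU.

```c
/*
 * Illustrative sketch only, not the benchmark's actual code.
 * Computes y = A * x for a matrix A stored in CSR form:
 *   row_start has n_rows + 1 entries; row i owns the entries
 *   row_start[i] .. row_start[i + 1] - 1 of col_idx and val.
 */
void spmv_csr(int n_rows, const int *row_start, const int *col_idx,
              const double *val, const double *x, double *y)
{
    /* Rows are independent, so they can be split among threads. */
    #pragma omp parallel for
    for (int i = 0; i < n_rows; i++) {
        double sum = 0.0;
        for (int k = row_start[i]; k < row_start[i + 1]; k++) {
            /* Indirect, data-dependent access into x. */
            sum += val[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}
```

The available parallelism is bounded by the number of rows, which is consistent with the smaller instance offering too little work to keep a GPU busy.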

Conclusions

Sparse memory accesses and synchronizations severely limited GPU performance. Still, this kind of application does not achieve large speedups on other parallel systems either.
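As a rough illustration of where the synchronizations come from: each iteration of the conjugate gradient method computes dot products, and every dot product is a reduction whose scalar result is needed before the next step can begin. The sketch below uses an OpenMP reduction; the function name dot_product is hypothetical and the code is not taken from the benchmark source.

```c
/*
 * Illustrative sketch only, not the benchmark's actual code.
 * Returns the dot product of a and b, both of length n.
 */
double dot_product(int n, const double *a, const double *b)
{
    double sum = 0.0;
    /* Each thread accumulates a private partial sum; the partial
       sums are combined when the loop ends, which is an implicit
       synchronization point for all threads. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

On a GPU, the equivalent reduction spans many thread blocks and generally needs either a separate kernel launch or another form of device-wide synchronization, so its relative cost tends to grow when the per-iteration work is small.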

Last edited Mar 10, 2010 at 11:37 AM by Pilla, version 3
