Abstract
For the past several years, the potential benefits of GPU acceleration for reservoir simulation have been the subject of intense study by many researchers. To date, results have been mixed: the promise of the hundreds of processors available on the GPU has not translated into the anticipated parallel speedups. The reasons for this are many, but the question remains whether GPU acceleration can become a viable solution for reservoir simulation. To address this question, an experimental investigation of GPU acceleration was undertaken. Experimental parameters included not only simulation model size but also the number of GPUs. Models ranging up to millions of gridblocks were accelerated with up to four GPUs, each with hundreds of processors. A highly parallel, simple linear equation solver was the main focus of the study. Results for a single GPU indicated that speedups of approximately 25 to 45 could readily be achieved, provided attention is paid to the use of shared memory, memory allocation that reduces bank conflicts, warp synchronization, memory coalescing, and efficient use of registers. When the number of GPUs was increased from one to four, poor scalability was observed for the smaller simulation problems because overhead dominated the run time. Finally, a unique mixed precision algorithm showed excellent promise for improving GPU performance and scalability, to speedups greater than a factor of one hundred with four GPU accelerators. The mixed precision algorithm performs the preconditioning in single precision while the orthogonal acceleration and update are carried out in double precision, resulting in higher processor performance and lower memory access requirements.
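As a rough illustration of the mixed precision split summarized above, the following CUDA sketch applies a simple Jacobi (diagonal) preconditioner in single precision while the residual, orthogonal acceleration, and update vectors remain in double precision. The kernel names, the Jacobi choice of preconditioner, and the reduction layout are illustrative assumptions and are not taken from the paper.

// Hypothetical sketch of the mixed precision split: single precision for the
// preconditioner application, double precision for the inner products used by
// the orthogonal acceleration and update.
#include <cuda_runtime.h>

// Apply a Jacobi (diagonal) preconditioner in single precision.  The residual
// r and the result z are kept in double precision by the outer solver; only
// the multiply is done in float, halving the memory traffic for inv_diag_sp.
__global__ void apply_jacobi_precond_sp(const double* __restrict__ r,
                                        const float*  __restrict__ inv_diag_sp,
                                        double*       __restrict__ z,
                                        int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float zi = (float)r[i] * inv_diag_sp[i];  // demote, precondition in float,
        z[i] = (double)zi;                        // then promote the result
    }
}

// Double precision dot product for the acceleration and update steps; a
// shared-memory tree reduction keeps partial sums on chip.  Launch with a
// power-of-two block size and blockDim.x * sizeof(double) bytes of dynamic
// shared memory, then finish the reduction over 'partial' on the host or in
// a second kernel.
__global__ void dot_dp(const double* __restrict__ x,
                       const double* __restrict__ y,
                       double* __restrict__ partial,
                       int n)
{
    extern __shared__ double s[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    s[tid] = (i < n) ? x[i] * y[i] : 0.0;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = s[0];
}

The intent of such a split is that the preconditioner, typically the dominant consumer of memory bandwidth, runs at single precision rates, while the double precision acceleration and update steps help maintain the accuracy of the outer iteration.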