Abstract
Graphics Processing Units (GPUs) bring the promise of supercomputing power for a fraction of the cost of traditional supercomputing, with possible speed-ups of one or two orders of magnitude over comparable CPU hardware. The rapid development of both proprietary libraries, such as NVIDIA's CUDA, and an open standard, OpenCL, has opened the doors to the GPU's cheap computing power. Unfortunately, random number generators (RNGs) have been slow to catch up with the rapid expansion of GPU computing. Few types of RNGs are available for GPUs, and the statistical quality of those provided with standard libraries is frequently unknown. Because particular RNGs may offer better statistical quality for certain applications, new kinds of RNGs must be made available for GPU computing to bring the full power of GPUs to different kinds of research. In particular, Lagged-Fibonacci Generators (LFGs) have been difficult to implement on memory-constrained GPUs because of their large state space. This is unfortunate, since LFGs have excellent statistical properties for many applications. In this paper, we discuss our implementation of memory-efficient, integer, cycle-split, additive and multiplicative LFGs for both CUDA and OpenCL. The multiplicative LFG has not previously been implemented for GPUs or as a split-stream parallel generator. We also discuss portability and reproducibility between CPUs and GPUs.
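The following minimal Python sketch shows the two recurrences. The additive LFG computes x_n = x_{n-j} + x_{n-k} mod 2^32, and the multiplicative LFG computes x_n = x_{n-j} * x_{n-k} mod 2^32. This is not the authors' CUDA/OpenCL implementation, and it does not show cycle splitting. The class name `LaggedFibonacci`, the lags (j, k) = (5, 17) and the LCG-based seeding are illustrative assumptions. The sketch does show where the large state comes from: every generator must retain its last k words. It also shows why the multiplicative variant needs odd seeds.

```python
MASK32 = 0xFFFFFFFF  # arithmetic modulo 2**32, as with 32-bit unsigned integers


class LaggedFibonacci:
    """Sketch of a lagged-Fibonacci generator x_n = x_{n-j} (op) x_{n-k} mod 2**32.

    Assumption: lags (5, 17) and the LCG seeding below are illustrative only,
    not the parameters or seeding scheme used in the paper.
    """

    def __init__(self, seed, j=5, k=17, op="add"):
        if not 0 < j < k:
            raise ValueError("lags must satisfy 0 < j < k")
        if op not in ("add", "mul"):
            raise ValueError("op must be 'add' or 'mul'")
        self.j, self.k, self.op = j, k, op
        # Hypothetical seeding: fill k words from a 32-bit LCG. Forcing every
        # word odd is required by the multiplicative LFG (odd * odd stays odd)
        # and guarantees at least one odd word for the additive LFG.
        x = seed & MASK32
        self.state = []  # circular buffer holding the last k outputs
        for _ in range(k):
            x = (1664525 * x + 1013904223) & MASK32
            self.state.append(x | 1)
        self.pos = 0  # index of x_{n-k}, the oldest word in the buffer

    def next_u32(self):
        a = self.state[(self.pos + self.k - self.j) % self.k]  # x_{n-j}
        b = self.state[self.pos]                               # x_{n-k}
        x = (a + b) & MASK32 if self.op == "add" else (a * b) & MASK32
        self.state[self.pos] = x  # x_n overwrites x_{n-k}, no longer needed
        self.pos = (self.pos + 1) % self.k
        return x


# Usage: one additive and one multiplicative stream from the same seed.
add_gen = LaggedFibonacci(seed=12345, op="add")
mul_gen = LaggedFibonacci(seed=12345, op="mul")
print([add_gen.next_u32() for _ in range(3)])
print([mul_gen.next_u32() for _ in range(3)])
```

The circular buffer avoids shifting the state on each call. Only one word is written per output, which matters when many such generators must share limited GPU memory.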
Funding source: NASA
Award Identifier / Grant number: NNX12CD32P
Funding source: NVIDIA
Funding source: Intel
© 2015 by De Gruyter