Loading...
Parallel Multiple Precision Division by a Single Precision Divisor (IEEE)
Publication Year:
2012
Abstract
We report an algorithm for division of a multi-precision integer by a single-precision value using a graphics processing unit (GPU). Our algorithm combines a parallel version of Jebelean's exact division algorithm with a left-to-right algorithm for computing the borrow chain, to relax the requirement of exactness. We also employ Takahashi's recently reported cyclic reduction technique [10] for GPU division to further enhance performance. The result is that our algorithm is asymptotically faster, at O(n/p + log p), than Takahashi's algorithm at O(n/p log p). We report results for dividends with precisions of 1024, 2048, and 4096 bits running on an NVIDIA GTX 480, and show that, for non-constant divisors, our algorithm is 20% slower at 1024 bits (due to startup overhead), by 2048 we are 40% faster, and at 4096 bits we are able to run 2.5 times faster. For division by constants, with precomputed tables, our algorithm is faster at all sizes with a speedup ranging from 2.3 to 6 times faster.
Paper available at IEEE.
Institution:
Computer Science Department, University of Massachusetts Amherst, MA 01003-4610, USA
Groups:

BayWebSoft