Apr 2, 2014

The Cyclic Data Distribution

The cyclic distribution of data is a good strategy to solve the load balance problem is parallel computing. Here we discuss the 1D cyclic distribution.

cyclic distribution

Let $v$ be a 1D vector, $\dim v = N$. $P$ is the number of processors and $r$ denotes the rank of a processor, $0 \le r < P$. Suppose the global index of the vector is $I, (I = 0, 1, \cdots N-1)$ and the local index is $i, (i = 0, 1, \cdots n_r-1)$. Then in the 1D cyclic data distribution, \[
    I = i \cdot P + r \;; \quad r = I \bmod P, \; i = \lfloor I/P \rfloor.  
\] The size of the local array on processor $r$ is $n_r = \left\lfloor \frac{N-1-r}{P} \right\rfloor$ + 1. The cyclic distribution is a permutation. We rarely use the global id after permutation, it still may be useful to know: $ I' = (I\bmod P) * \lfloor N/P \rfloor + \lfloor I/P \rfloor + \min(I\bmod P, N\bmod P)$.

example:

$v = (0, 1,2,3,4,5,6,7,8)$, $N = 9, P = 2$. The global ids are in bold font.

r \ i | 0  1  2  3  4    |  nr
 - - - - - - - - - - - - - - - -   
 0   | 0  2  4  6  8    |  5
 1   | 1  3  5  7        |  4

Fig. 1-1, the original matrix
Fig 1-2, the matrix after cyclic redistribution on each dimension

block cyclic distribution

The block cyclic distribution is another scheme that can reconcile the need of load balance and the need of block distribution of data. In addition to the quantities we have introduced before, we need to introduce the block number $b, (b\gt 0)$. In the 1D block cyclic distribution scheme, \[
I = \left( \left\lfloor i / b \right\rfloor P + r \right) \cdot b + ( i \bmod  b) \;; \quad
r = \left\lfloor (I \bmod bP ) / b \right\rfloor, \;
i = \left\lfloor I / b P \right\rfloor \cdot b + (I \bmod b).  \] The size of the local array on processor $r$ is $n_r = \left\lfloor \frac{N-1-r\cdot b}{b P}\right\rfloor b + b + \delta \left(r, \left\lfloor (N-1) / b \right\rfloor \bmod P \right) \cdot \left( (N-1) \bmod b  + 1 - b \right) $.

example:

$v = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)$, $N = 10, P = 2, b = 3$. The global ids are in bold font.

r \ i | 0  1  2   3  4  5 |  nr
 - - - - - - - - - - - - - - - -   
 0   | 0  1  2   6  7  8  |  6
 1   | 3  4  5   9          |  4