
Jul 13, 2013

On the Spin in Relativistic Dynamics


Spin is an intrinsic property of a particle associated with rotational symmetry. Roughly speaking, it is the intrinsic part of the angular momentum. The measured angular momentum J differs between reference frames. It is tempting to decompose it as J = L + S, where L is called the orbital angular momentum and S the spin angular momentum, or spin for short. The orbital angular momentum should vanish in the particle rest frame P = 0. In non-relativistic quantum mechanics, it is defined as L = X×P. But it is not clear a priori that such a decomposition is always available in relativistic dynamics. Another way of defining spin is to measure the angular momentum in the particle rest frame. In relativistic dynamics, however, there is no unique Lorentz transformation that transforms a momentum state to the particle rest frame. An ambiguity thus exists in the definition of spin.

Nevertheless, spin can be defined formally as a vector operator satisfying the following conditions:
[S_i, S_j] = i\varepsilon_{ijk}S_k \; (i,j,k = 1,2,3); \qquad [S_i, P_j] = 0; \qquad \mathbf{S} = \mathbf{J} \text{ if } P = p_c.
The last condition should be understood on momentum states: \mathbf{S}|p,\sigma\rangle = \mathbf{J}|p,\sigma\rangle if p = p_c. The spin operator \mathbf{S} described above is not covariant. The canonical example of p_c is \mathbf{p}_c = 0.

Spin as an operator must also depend on the specific realization of the space-time symmetry (the Poincaré symmetry), i.e. the representation. According to Wigner's theorem, a symmetry transformation in quantum mechanics must be realized either as a unitary or as an anti-unitary operator. Because the Lorentz group is non-compact, all of its unitary representations have to be infinite-dimensional, namely fields. Nevertheless, a relativistic theory can still have a finite-dimensional representation. The prominent example is the Dirac theory of relativistic electrons.

Poincaré algebra

Consider the Poincaré algebra,
[P^\mu, P^\nu] = 0;
[P^\lambda, M^{\mu\nu}] = i\left(g^{\lambda\mu}P^\nu - g^{\lambda\nu}P^\mu\right);
[M^{\lambda\rho}, M^{\mu\nu}] = i\left(g^{\lambda\nu}M^{\rho\mu} + g^{\rho\mu}M^{\lambda\nu} - g^{\lambda\mu}M^{\rho\nu} - g^{\rho\nu}M^{\lambda\mu}\right);
P^\mu and M^{\mu\nu} are the 10 generators of the Poincaré algebra. The six independent components of M^{\mu\nu} are J_k \equiv \frac{1}{2}\epsilon_{ijk}M^{ij} \; (i,j,k = 1,2,3), the angular momenta, and K_i \equiv M^{0i} \; (i = 1,2,3), the boosts.

The metric tensor is g^{\mu\nu} = g_{\mu\nu} = \mathrm{diag}(1, -1, -1, -1).
P^2 = P_\mu P^\mu = M^2 is a Casimir element, known as the invariant mass squared. In this post, we only consider the massive case M^2 > 0 for simplicity.

Pauli-Lubanski Pseudo Vector

W^\mu = \frac{1}{2}\varepsilon^{\mu\nu\kappa\rho}M_{\nu\kappa}P_\rho, where \varepsilon^{\mu\nu\kappa\lambda} is the Levi-Civita tensor. It can be shown that W^0 = \mathbf{J}\cdot\mathbf{P} and \mathbf{W} = \mathbf{K}\times\mathbf{P} + P^0\mathbf{J}. The Pauli-Lubanski vector satisfies,
  1. P_\mu W^\mu = 0;
  2. [P^\mu, W^\nu] = 0;
  3. [M^{\mu\nu}, W^\kappa] = i\left(g^{\kappa\mu}W^\nu - g^{\kappa\nu}W^\mu\right);
  4. [W^\mu, W^\nu] = i\varepsilon^{\mu\nu\kappa\rho}W_\kappa P_\rho;
W^\mu are the generators of the stabilizer group of each P^\mu. For irreducible representations, W^2 = W_\mu W^\mu is a Casimir operator (Casimir element). To evaluate its scalar value, we can study it in the center-of-momentum frame where \mathbf{P} = 0: W^\mu = M(0, \mathbf{J}). So W^2 = -M^2\mathbf{J}^2 = -M^2 s(s+1). Note that P^2 and W^2 are the only two independent polynomial invariant operators that commute with all the generators (the Casimir elements). Here the total spin appears naturally. But W^\mu/M (or \mathbf{W}/M) does not satisfy the angular momentum commutation relations, hence it is still not a spin operator.
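Spelling this out as a quick check (using the components W^0 = \mathbf{J}\cdot\mathbf{P} and \mathbf{W} = \mathbf{K}\times\mathbf{P} + P^0\mathbf{J} listed above): in the rest frame P^\mu = (M, \mathbf{0}),
W^0 = \mathbf{J}\cdot\mathbf{P} = 0, \qquad \mathbf{W} = \mathbf{K}\times\mathbf{P} + P^0\mathbf{J} = M\mathbf{J},
so that
W^2 = W_\mu W^\mu = (W^0)^2 - \mathbf{W}\cdot\mathbf{W} = -M^2\mathbf{J}^2 = -M^2 s(s+1).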

It is convenient to define a Pauli-Lubanski tensor, W^{\mu\nu} \equiv \frac{1}{i}[W^\mu, W^\nu] = \frac{1}{g}\left\{M^{\mu\nu}P^2 - M^{\mu\lambda}P_\lambda P^\nu + M^{\nu\lambda}P_\lambda P^\mu\right\} (with g = \det g_{\mu\nu} = -1).

Before we continue our discussion of the spin operator, let's see how the Casimir elements help to identify irreducible representations. The Casimir elements P^2 and W^2 belong to a set of mutually commuting operators. \{P^2, W^2, P^\mu, W^0\} is one possible set of mutually commuting operators. It is also customary to choose the helicity h = W^0/|\mathbf{P}| = \hat{P}\cdot\mathbf{J} instead of W^0. Therefore, particles (irreps.) can be identified by their invariant mass, momentum, spin and helicity, |M, p^\mu, s, h\rangle. It is also possible to choose S_z (the spin projection in the z direction), which we'll define later, to identify the irreps. In fact, the z-direction is often chosen along the longitudinal direction \hat{P}. In that case, the spin projection S_z is the same as the helicity operator h.

"Relativistic Spin Operator" via Lorentz Transformed Pauli-Lubanski Vector

As we have stated in the beginning, it seems reasonable to measure the spin operator by a Lorentz transformation to the particle rest frame. In the literature spin defined in this way is called the "relativistic spin". We shall explore the idea in this section. 

Let S_p be the spin operator that depends on the momentum p. We know its operator value at p = 0: S_0 = \mathbf{J}. To define its operator value at arbitrary momentum p, we require \langle\psi|S_p|\psi\rangle = \langle\psi|U(L_p)S_0U(L_p^{-1})|\psi\rangle, where L_p^{-1} is a Lorentz transformation that takes a particle with momentum p to the particle rest frame: L_p^{-1}p = (M, \mathbf{0}). But there is a subtle technicality here: a Lorentz transformation demands a covariant 4-vector, whereas \mathbf{J} is only a 3-vector. One important observation is that the Pauli-Lubanski vector has the same expectation value as M(0, \mathbf{J}) at p = 0. We simply put \langle\psi|(0, S_p)|\psi\rangle = \langle\psi|U(L_p)(0, S_0)U(L_p^{-1})|\psi\rangle = \frac{1}{M}\langle\psi|U(L_p)\,W\,U(L_p^{-1})|\psi\rangle. Therefore, (0, S_p) = \frac{1}{M}U(L_p)\,W\,U(L_p^{-1}) = \frac{1}{M}L_p^{-1}W.
The Lorentz transformation L_p is not unique. Let L_p and L'_p be two such Lorentz transformations. Then L_p^{-1}L'_p(M, \mathbf{0}) = (M, \mathbf{0}), so L = L_p^{-1}L'_p belongs to the little group \mathcal{L} = \{L \in SO(3,1)\,|\,L(M, \mathbf{0}) = (M, \mathbf{0})\}. Conversely, let L_p \in SO(3,1) be some Lorentz transformation with L_p(M, \mathbf{0}) = p; then for any L \in \mathcal{L}, L_pL also takes (M, \mathbf{0}) to p. For the Lorentz group, the little group for massive states M^2 > 0 is \mathcal{L} = SO(3), the 3d rotation group. This rotation is called the (generalized) Melosh-Wigner rotation.

It is convenient to choose L_p to be the standard boost (rotationless boost):
(L_p)^0{}_0 = p^0/M; \quad (L_p)^i{}_0 = (L_p)^0{}_i = p^i/M; \quad (L_p)^i{}_j = \delta_{ij} + \frac{p_i p_j}{M(p^0+M)}.
To obtain L_p^{-1}, one simply replaces \mathbf{p} with -\mathbf{p}. For the standard boost L_p, S_p = \frac{1}{M}\left(\mathbf{W} - \frac{\mathbf{p}\,W^0}{p^0+M}\right). We note that the zero-component is \frac{1}{M}\left(\frac{p^0}{M}W^0 - \frac{p^i}{M}W^i\right) = \frac{1}{M^2}P_\mu W^\mu = 0, which justifies the notation (0, S_p). To extend this to the spin operator, we simply promote the momentum p to the momentum operator P. Furthermore, S^2 = -(0, \mathbf{S})^2 = -\frac{1}{M^2}(L_p^{-1}W)^2 = -\frac{1}{M^2}W^2 = s(s+1). It can be checked that \mathbf{S} indeed satisfies the commutation relations for the spin operator. Therefore, \mathbf{S} = \frac{1}{M}\left(\mathbf{W} - \frac{\mathbf{P}\,W^0}{P^0+M}\right) is a spin operator. In the literature, this spin operator is also called the canonical (relativistic) spin [3], or simply the relativistic spin.
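As a consistency check (a sketch using the rest-frame values W^0 = 0 and \mathbf{W} = M\mathbf{J} from above): in the limit \mathbf{P} \to 0,
\mathbf{S} = \frac{1}{M}\left(\mathbf{W} - \frac{\mathbf{P}\,W^0}{P^0+M}\right) \;\to\; \frac{1}{M}\,M\mathbf{J} = \mathbf{J},
so the canonical spin reduces to the total angular momentum in the particle rest frame, as required by the defining conditions at the beginning.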

A nice feature of the canonical spin is that its longitudinal component \hat{P}\cdot\mathbf{S} = \hat{P}\cdot\mathbf{J} \equiv h is the helicity operator. This is consistent with non-relativistic quantum mechanics, where \mathbf{J} = \mathbf{X}\times\mathbf{P} + \mathbf{S}, hence h \equiv \hat{P}\cdot\mathbf{J} = \hat{P}\cdot\mathbf{S}.

As we have stated, there are however other valid spins resulting from rotations of the canonical spin. One prominent example is the light-cone spin. The light-cone representation of a 4-vector a = (a^0, \mathbf{a}) is defined as a^\pm = a^0 \pm a^3, \; \mathbf{a}_\perp = (a^1, a^2), with the inner product a\cdot b = \frac{1}{2}a^+b^- + \frac{1}{2}a^-b^+ - \mathbf{a}_\perp\cdot\mathbf{b}_\perp. The standard light-cone Lorentz boost (rotationless boost) is,
(L_p^{-1})^+{}_\mu = \frac{M}{p^+}\omega_\mu; \quad (L_p^{-1})^-{}_\mu = \frac{2p_\mu}{M} - \frac{M}{p^+}\omega_\mu; \quad (L_p^{-1})^i{}_+ = -\frac{p^i}{p^+}; \quad (L_p^{-1})^i{}_- = 0; \quad (L_p^{-1})^i{}_j = \delta_{ij},
where \omega^\mu is the null vector with \omega\cdot a = a^+. The corresponding spin operator is S_z = S^+ = \frac{W^+}{P^+}, \quad \mathbf{S}_\perp = \frac{1}{M}\left(\mathbf{W}_\perp - \mathbf{P}_\perp\frac{W^+}{P^+}\right), and \mathbf{S}_{LC} = (\mathbf{S}_\perp, S^+).

The light-cone spin projection S^+ = J^3 + \frac{\epsilon^{ij}B^iP^j}{P^+} is kinematical, while \mathbf{S}_\perp is dynamical. It is not difficult to understand the light-front spin projection if one notes that M^{\mu\nu} \sim X^\mu P^\nu - X^\nu P^\mu. Then the transverse boost B^i = M^{+i} = X^+P^i - X^iP^+ = -X^iP^+ at x^+ = 0, the light-front quantization surface. Therefore X^i = -B^i/P^+, and hence S^+ = J^3 - \epsilon^{ij}X^iP^j. The second part, X^1P^2 - X^2P^1 = L_z, is the orbital angular momentum in the z (or longitudinal) direction. The expression makes perfect sense by stating that spin is angular momentum minus orbital angular momentum, S^+ = J^+ - L^+, where I have replaced the z direction with the longitudinal direction.

For light-cone dynamics, one nice feature of the light-cone spin is that it incorporates the helicity along the longitudinal momentum P^+, because the longitudinal momentum is easier to access in light-cone dynamics. Another advantage of the light-cone spin is that for massless particles, where W^\mu = sP^\mu, the light-cone spin gives the non-vanishing result \mathbf{S}_{LC} = s\hat{z}, whereas the canonical spin vector for massless particles has to be defined separately. The light-cone spin and the canonical spin can be related by a Melosh rotation. Inspired by the light-cone spin, the canonical spin may be better termed the equal-time spin.

Ji and Mitchell have constructed a spin operator within an interpolation-angle formulation that gives the equal-time spin and the light-cone spin in the instant-form and light-front limits, respectively [7].

Angular Momentum Decomposition

In non-relativistic quantum mechanics, the angular momentum can be decomposed into an orbital part and a spin part: \mathbf{J} = \mathbf{X}\times\mathbf{P} + \mathbf{S}. This can be generalized to relativistic dynamics through the angular momentum tensor: M^{\mu\nu} = L^{\mu\nu} + S^{\mu\nu}, with [S^{\mu\nu}, P^\lambda] = 0, where L^{\mu\nu} = i(p^\mu\partial^\nu_p - p^\nu\partial^\mu_p) is the orbital angular momentum tensor. It may also be defined from some position operator X^\mu by L^{\mu\nu} = \frac{1}{2}\{X^\mu, P^\nu\} - \frac{1}{2}\{X^\nu, P^\mu\}. The 3-vector angular momentum is defined as J^i = \frac{1}{2}\epsilon^{ijk}M^{jk}. Similarly, the spin operator may be defined as S^i = \frac{1}{2}\epsilon^{ijk}S^{jk}. In addition, define a dipole vector D^i = S^{0i}. It is easy to see that \varepsilon^{\mu\nu\kappa\lambda}L_{\nu\kappa}P_\lambda = 0. Therefore, W^\mu = \frac{1}{2}\varepsilon^{\mu\nu\kappa\lambda}S_{\nu\kappa}P_\lambda. So W^0 = \mathbf{S}\cdot\mathbf{P} and \mathbf{W} = \mathbf{D}\times\mathbf{P} + P^0\mathbf{S}. If S^{\mu\nu} is linear in W^\mu, its general form is S^{\mu\nu} = \varepsilon^{\mu\nu\kappa\rho}W_\kappa(aP_\rho + b\eta_\rho), where \eta^\rho is a constant 4-vector and a, b are scalars. Substituting this back into the Pauli-Lubanski vector, we obtain aP^2 + b\,\eta\cdot P = 1. The corresponding spin vector is \mathbf{S} = (aP^0 + b\eta^0)\mathbf{W} - W^0(a\mathbf{P} + b\boldsymbol{\eta}), with \mathbf{D} = \mathbf{W}\times(a\mathbf{P} + b\boldsymbol{\eta}). The spin supplementary condition (SSC) gives an additional constraint on the spin tensor S^{\mu\nu}. There are three popular SSCs:
  1. (Møller): S^{\mu\nu}\eta_\nu = 0;
  2. (Fokker-Synge-Pryce, Covariant): S^{\mu\nu}P_\nu = 0;
  3. (Newton-Wigner, Canonical): M S^{\mu\nu}\eta_\nu + S^{\mu\nu}P_\nu = 0;
Each SSC gives a definition of the spin tensor.

  1. (Møller): S^{\mu\nu} = \varepsilon^{\mu\nu\rho\kappa}W_\rho\eta_\kappa/\eta\cdot P, \; (a = 0, \; b = 1/\eta\cdot P);
  2. (Fokker-Synge-Pryce, Covariant): S^{\mu\nu} = \varepsilon^{\mu\nu\rho\kappa}W_\rho P_\kappa/P^2, \; (a = 1/P^2, \; b = 0);
  3. (Newton-Wigner, Canonical): S^{\mu\nu} = \varepsilon^{\mu\nu\rho\kappa}W_\rho(\eta_\kappa + P_\kappa/M)/(M + \eta\cdot P),
     (a = 1/(M(M + P\cdot\eta)), \; b = 1/(M + P\cdot\eta));
Case 1: \mathbf{S} is the equal-time spin operator. Then \mathbf{S}^2 = -W^2/M^2 = s(s+1). We conclude that
a = \frac{1}{M(P^0 \pm M)}, \quad b\eta^0 = \frac{1}{P^0 \pm M}, \quad \boldsymbol{\eta} = 0. This corresponds to the Newton-Wigner SSC, and the resulting spin vector is just the canonical spin operator that we obtained in the previous section.
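As a quick check of the constraint aP^2 + b\,\eta\cdot P = 1 for these values (a sketch, taking the + branch and \eta^\mu = (\eta^0, \mathbf{0})):
aP^2 + b\,\eta\cdot P = \frac{M^2}{M(P^0+M)} + \frac{P^0}{P^0+M} = \frac{M+P^0}{P^0+M} = 1.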

Case 2: \mathbf{S} is not the equal-time spin operator, but a combination \lambda\mathbf{S} + \mu\mathbf{D} is. Let's evaluate \frac{1}{2}S_{\mu\nu}S^{\mu\nu} = \mathbf{S}^2 - \mathbf{D}^2 for the Fokker-Synge-Pryce spin tensor, because it is a genuine Lorentz scalar: \frac{1}{2}S_{\mu\nu}S^{\mu\nu} = -W^2/M^2 = s(s+1). Apparently, \mathbf{S}^2 \ne s(s+1). Let's redefine a spin operator, \mathbf{S}' = \mathbf{S} \pm \mathbf{D} = \frac{1}{M^2}\left(P^0\mathbf{W} - W^0\mathbf{P} \pm \mathbf{W}\times\mathbf{P}\right); then \mathbf{S}'^2 = s(s+1). It can be checked that \mathbf{S}' satisfies the SO(3) Lie algebra.

Case 3: the light-cone spin. 

Newton-Wigner Position Operator

\mathbf{X}_{NW} = \frac{1}{2}\left\{\mathbf{K}, \frac{1}{P^0}\right\} - \frac{1}{M}\frac{\mathbf{P}\times\mathbf{W}}{P^0(P^0+M)}. The nice part of the Newton-Wigner operator is that it satisfies [X^i_{NW}, P^j] = i\delta^{ij}, \quad [X^i_{NW}, X^j_{NW}] = 0, \quad \mathbf{J} = \mathbf{X}_{NW}\times\mathbf{P} + \mathbf{S}.

Field Decomposition

Recall that the Lorentz transformation of a quantum field is,
(\Lambda\varphi)_a(x) = \sum_b D_{ab}(\Lambda)\varphi_b(\Lambda^{-1}x), \qquad [\varphi_a(x), M^{\mu\nu}] = i(x^\mu\partial^\nu - x^\nu\partial^\mu)\varphi_a(x) + \mathcal{S}^{\mu\nu}_{ab}\varphi_b(x),
where (\Lambda\varphi)_a(x) \equiv U(\Lambda^{-1})\varphi_a(x)U(\Lambda), U(\Lambda) = e^{\frac{i}{2}\omega_{\mu\nu}M^{\mu\nu}}, D(\Lambda) = e^{\frac{i}{2}\omega_{\mu\nu}\mathcal{S}^{\mu\nu}}. It's easy to recognize that D(\Lambda) is a finite-dimensional representation of the Lorentz group. According to Noether's theorem, the conserved angular momentum is \mathbf{J} = \int d^3x\, \bar\psi\gamma^0\left(\mathbf{r}\times(-i\nabla) + \frac{1}{2}\boldsymbol{\Sigma}\right)\psi. It seems natural to write \mathbf{J} = \mathbf{X}\times\mathbf{P} + \frac{1}{2}\boldsymbol{\Sigma} or M^{\mu\nu} = L^{\mu\nu} + \mathcal{S}^{\mu\nu}, namely S^{\mu\nu} = \mathcal{S}^{\mu\nu}. But this is not strictly correct. The spin tensor S^{\mu\nu} is a Hermitian operator in Hilbert space, whereas \mathcal{S}^{\mu\nu} is only a finite-dimensional linear operator. In fact, as we have stated in the beginning, \mathcal{S}^{\mu\nu} cannot be Hermitian. It would be utterly wrong to identify \mathcal{S}^{\mu\nu} as the spin tensor, although, as we'll see below, \mathcal{S}^{\mu\nu} does tell us some information about the spin. We should note that most of the trouble is caused by the boosts. As spin is closely related to the rotation properties, the finite-dimensional \mathcal{S}^{\mu\nu} does tell us a great deal of information.

Finite-Dimensional Representations

The finite-dimensional irreducible representations of the Lorentz group are identified with the Casimir elements of the complex Lie algebra: N^i_\pm = \frac{1}{2}(J^i \pm iK^i), \; i = 1,2,3, with [N^i_\pm, N^j_\pm] = i\epsilon^{ijk}N^k_\pm, \; [N^i_+, N^j_-] = 0. The Casimirs are N^2_+ and N^2_-. For irreducible representations, N^2_\pm = n_\pm(n_\pm + 1), n_\pm = 0, \frac{1}{2}, 1, \frac{3}{2}, \ldots. We'll use (n_+, n_-) to identify each irrep. The dimension of this irrep is (2n_+ + 1)(2n_- + 1).

The simplest non-trivial irreps are the 2d spinor representations (\frac{1}{2}, 0) and (0, \frac{1}{2}), known as the left-handed and right-handed Weyl spinors, respectively. Note that they are NOT field representations. For the spinor representations, M^{\mu\nu}_L = \frac{i}{4}(\sigma^\mu\bar\sigma^\nu - \sigma^\nu\bar\sigma^\mu) and M^{\mu\nu}_R = \frac{i}{4}(\bar\sigma^\mu\sigma^\nu - \bar\sigma^\nu\sigma^\mu), where \sigma^\mu = (I, \boldsymbol{\sigma}), \bar\sigma^\mu = (I, -\boldsymbol{\sigma}). Since Weyl spinors are not field representations, all of the angular momentum is intrinsic. They obviously have total spin 1/2. The spin operator is \mathbf{S} = \mathbf{J} = \frac{1}{2}\boldsymbol{\sigma}.

The reducible 4d representation (\frac{1}{2}, 0) \oplus (0, \frac{1}{2}) is called the Dirac spinor. The Dirac spinor also has spin 1/2. It is worth mentioning that the angular momentum (hence the spin) is written in terms of the infamous \gamma-matrices: M^{\mu\nu} = \frac{i}{4}[\gamma^\mu, \gamma^\nu]. The spin operator is \mathbf{S} = \frac{1}{2}\begin{pmatrix}\boldsymbol{\sigma} & 0 \\ 0 & \boldsymbol{\sigma}\end{pmatrix} \equiv \frac{1}{2}\boldsymbol{\Sigma}.

The irreducible 4d representation (\frac{1}{2}, \frac{1}{2}) is the vector representation. The matrix elements of the group generators are (M^{\mu\nu})_{\alpha\beta} = i(\delta^\mu_\alpha\delta^\nu_\beta - \delta^\nu_\alpha\delta^\mu_\beta). The spin operator is [S^i]_{jk} = -i\epsilon^{ijk}.

Infinite-Dimensional Representations

The Poincaré group is represented by unitary operators U(\Lambda, a): U(\Lambda, a)|p, \sigma\rangle = e^{-ip\cdot a}\sum_{\sigma'}C_{\sigma'\sigma}(\Lambda, p)|\Lambda p, \sigma'\rangle, where |p, \sigma\rangle is a shortcut for |M, p, s, \sigma\rangle. In the Wigner classification, the representation of the Lorentz group is U(\Lambda, a)|p, \sigma\rangle = \left(\frac{N_p}{N_{\Lambda p}}\right)e^{-ip\cdot a}\sum_{\sigma'}D^{(s)}_{\sigma'\sigma}(W(\Lambda, p))|\Lambda p, \sigma'\rangle, where W(\Lambda, p) = L^{-1}(\Lambda p)\Lambda L(p) is the Wigner rotation with respect to the standard vector k = (M, \mathbf{0}), Wk = k, a group element of the little group w_k of the vector k. D(W) is the representation of the little group. In the massive-particle case, w_k is the 3d rotation group SO(3). L(p), with L(p)k = p, is some standard Lorentz transformation. N_p is a normalization factor that can be chosen to be N_p = 1.
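That W(\Lambda, p) is indeed an element of the little group follows in one line from the definitions L(p)k = p and L^{-1}(\Lambda p)\,(\Lambda p) = k:
W(\Lambda, p)\,k = L^{-1}(\Lambda p)\,\Lambda\,L(p)\,k = L^{-1}(\Lambda p)\,(\Lambda p) = k.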

The infinitesimal transformation is U(1 + \delta\omega) \simeq 1 + \frac{i}{2}\delta\omega_{\mu\nu}M^{\mu\nu}. Therefore,
M^{\mu\nu} = i(p^\mu\partial^\nu_p - p^\nu\partial^\mu_p) + S^{\mu\nu},
S^{\mu\nu} = L(p)^\mu{}_\alpha L(p)^\nu{}_\beta\,\sigma^{\alpha\beta} - \frac{1}{2}\left(p^\mu(\partial^\nu_p L^{-1}(p))^\alpha{}_\kappa L(p)^\kappa{}_\beta - p^\nu(\partial^\mu_p L^{-1}(p))^\alpha{}_\kappa L(p)^\kappa{}_\beta\right)\sigma^\beta{}_\alpha,
where \sigma^{\mu\nu} are the generators of the 3d rotation group SO(3) in the representation labeled by total spin s. Here M^{\mu\nu} has been decomposed into two parts, with L^{\mu\nu} \equiv i(p^\mu\partial^\nu_p - p^\nu\partial^\mu_p) apparently being the orbital angular momentum. The spin operator depends on the momentum p^\mu of the particle as well as on the choice of the Lorentz transformation L(p) with L(p)k = p. That means there are infinitely many definitions of the spin operator.

Dirac Theory - the "Relativistic Wave Equation Theory"

Finally, we have to address the Dirac Theory. I first want to quote Schwinger and Weinberg to remind the readers:

"The picture of an infinite sea of negative energy electrons is now best regarded as a historical curiosity and forgotten." - Julia Schwinger

"... quantum field theory is the way it is because it is the only way to reconcile the principles of quantum mechanics with those of special relativity." - Steven Weinberg

Nevertheless, the success of the Dirac theory in describing relativistic spin-1/2 single-particle states makes it worth touring the theory and finding the corresponding spin operators.

Recall that the Dirac spinor is a finite-dimensional representation. It has no space-time dependence. It can be used to describe an electron with some standard momentum p_s. The Dirac spinor has spin \mathbf{S} = \mathbf{J} = \frac{1}{2}\boldsymbol{\Sigma}. To obtain the electron wavefunction and operators with another momentum p^\mu, one can simply apply a Lorentz boost L(p_s, p), with L(p_s, p)p_s = p, to a frame in which the electron has momentum p. It is customary to choose p_s = (M, \mathbf{0}). Similar to what we have analysed above, there are infinitely many definitions of the spin vector corresponding to different choices of the Lorentz boost L(p_s, p). We first note that the generators of the Lorentz group are
\mathbf{J} = \frac{1}{2}\boldsymbol{\Sigma} = \frac{1}{2}\begin{pmatrix}\boldsymbol{\sigma} & 0 \\ 0 & \boldsymbol{\sigma}\end{pmatrix}, \qquad \mathbf{K} = \frac{i}{2}\gamma^0\boldsymbol{\gamma} = \frac{i}{2}\boldsymbol{\alpha} = \frac{i}{2}\begin{pmatrix}0 & \boldsymbol{\sigma} \\ \boldsymbol{\sigma} & 0\end{pmatrix}.
Then the canonical spin is,
\mathbf{S} = \frac{1}{2M}\left(P^0\boldsymbol{\Sigma} + i\gamma^0\boldsymbol{\gamma}\times\mathbf{P} - \frac{\mathbf{P}\,(\mathbf{P}\cdot\boldsymbol{\Sigma})}{P^0+M}\right).
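This expression follows by inserting the Dirac-representation generators into the canonical-spin formula of the previous sections (a short sketch):
\mathbf{W} = \mathbf{K}\times\mathbf{P} + P^0\mathbf{J} = \frac{i}{2}\gamma^0\boldsymbol{\gamma}\times\mathbf{P} + \frac{P^0}{2}\boldsymbol{\Sigma}, \qquad W^0 = \mathbf{J}\cdot\mathbf{P} = \frac{1}{2}\boldsymbol{\Sigma}\cdot\mathbf{P},
so that
\mathbf{S} = \frac{1}{M}\left(\mathbf{W} - \frac{W^0\,\mathbf{P}}{P^0+M}\right) = \frac{1}{2M}\left(P^0\boldsymbol{\Sigma} + i\gamma^0\boldsymbol{\gamma}\times\mathbf{P} - \frac{\mathbf{P}\,(\mathbf{P}\cdot\boldsymbol{\Sigma})}{P^0+M}\right).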

Foldy-Wouthuysen Spin Operator

In the Foldy-Wouthuysen (FW) representation the positive modes and negative modes of the Dirac theory decouple. The transformation is very useful for obtaining relativistic corrections to a non-relativistic theory. In the free theory, the transformation that carries the Dirac theory to the Foldy-Wouthuysen representation is F(p) = \exp[\boldsymbol{\gamma}\cdot\hat{\mathbf{p}}\,\theta] = \cos\theta + \boldsymbol{\gamma}\cdot\hat{\mathbf{p}}\,\sin\theta, where \theta = \frac{1}{2}\arctan\frac{|\mathbf{p}|}{M}.
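The closed form of the exponential is a consequence of the Clifford algebra (a one-line check, using the metric convention above): since \{\gamma^i, \gamma^j\} = 2g^{ij} = -2\delta^{ij},
(\boldsymbol{\gamma}\cdot\hat{\mathbf{p}})^2 = \frac{1}{2}\{\gamma^i, \gamma^j\}\,\hat{p}_i\hat{p}_j = -\hat{\mathbf{p}}\cdot\hat{\mathbf{p}} = -1,
so \exp[\boldsymbol{\gamma}\cdot\hat{\mathbf{p}}\,\theta] = \cos\theta + \boldsymbol{\gamma}\cdot\hat{\mathbf{p}}\,\sin\theta, in complete analogy with e^{i\theta} = \cos\theta + i\sin\theta.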

The Hamiltonian in the FW representation becomes
H_D = \gamma^0\boldsymbol{\gamma}\cdot\mathbf{p} + \gamma^0 m \;\longrightarrow\; H_{FW} = \gamma^0 p^0.
In the massless limit, the FW representation becomes the chiral representation.

The FW spin is defined as the inverse-transformed spinor spin: F(p)\,\mathbf{S}_{FW}\,F^{-1}(p) = \frac{1}{2}\boldsymbol{\Sigma}. Then,
\mathbf{S}_{FW} = \frac{1}{2P^0}\left(M\boldsymbol{\Sigma} - i\gamma^0\boldsymbol{\gamma}\times\mathbf{P} + \frac{\mathbf{P}\,(\mathbf{P}\cdot\boldsymbol{\Sigma})}{P^0+M}\right).


references:

[1]: N. N. Bogolubov, A. A. Logunov, I. T. Todorov, Introduction to Axiomatic Quantum Field Theory, Mathematical Physics Monograph No. 18, W. A. Benjamin, Reading, Massachusetts, 1975.
[2]: Steven Weinberg, The Quantum Theory of Fields, p. 635, Cambridge University Press, Cambridge, UK, 1995, ISBN 0521550017.
[3]: W. N. Polyzou, W. Gloeckle, H. Witala, Spin in relativistic quantum theory, arXiv:1208.5840v1.
[4]: L. L. Foldy and S. A. Wouthuysen, On the Dirac Theory of Spin 1/2 Particles and Its Non-Relativistic Limit, Phys. Rev. 78, 29 (1950).
[5]: T. D. Newton and E. P. Wigner, Localized States for Elementary Systems, Rev. Mod. Phys. 21, 400 (1949). URL: http://rmp.aps.org/pdf/RMP/v21/i3/p400_1
[6]: Gordon N. Fleming, Covariant Position Operators, Spin, and Locality, Phys. Rev. 137, B188 (1965).
[7]: Chueng-Ryong Ji and Chad Mitchell, Poincaré Invariant Algebra From Instant to Light-Front Quantization, Phys. Rev. D 64, 085013 (2001); arXiv:hep-ph/0105193v1.

To Be Continued ...


Jul 11, 2013

Connect to Remote MathKernels

Suppose you have a powerful workstation in your office and a laptop at home. Suppose you prefer working in your comfortable apartment, but now you have some work to do with Mathematica. Your laptop is convenient for visualizing data but incapable of doing the heavy computing tasks. You may think of ssh X11 forwarding:
ssh -X username@hostname.domain
The problem is, with limited network bandwidth, X11 forwarding is simply not stable enough for a typical Mathematica task. The good news is that Mathematica allows you to use remote MathKernel(s) with a local FrontEnd.

1. Connect to Remote MathKernel(s);




Unfortunately, the default (Basic Options or Advanced Options) configuration does not always work for typical home networks (router, firewall, NAT, etc.). This error message may pop up:


The problem is that your laptop's IP address is not known. For example, you may hide behind a wireless router and a firewall. The way to solve the problem is to tell the MathKernel(s) your port and IP address (your machine may not have one!). If you do have an IP, replace `ipadress` with it.

You can get your public IP simply by googling "my ip". Of course, your public IP may not be the IP of your machine.

If you are using a dynamic IP address, remember your IP usually changes within a few days.

In the presence of a firewall and a router, your machine may not have its own public IP address. Instead, your router has a public IP (again, it could be dynamic or static), say, 12.34.5.6, and your machine is assigned a local IP address, say, 192.168.1.2. The remote MathKernel(s) will have trouble talking to your local machine. What you need is a secure link.

VPN

The first choice is a VPN. With a VPN, you can happily work with an abstract address such as foo.vpn.thirdparty.com, regardless of the actual changes of your IP address.


mathssh is a Mathematica implementation of the SSH secure shell built on Java. One can also use ssh directly, by replacing the `java -jar mathssh` part of the kernel launch command with `ssh`.


SSH Reverse Tunneling 

SSH reverse tunneling (a.k.a. an SSH tunnel) is a secure method. Interested users can check out reference [3] and the man page of the ssh -R option. A free package is provided by Sascha Kratky; both Unix-like (Mac & Linux) and Windows versions are available (see Ref [5]).
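For illustration, a reverse tunnel set up by hand might look like the following (the host name and the port 22000 are placeholders, not something required by the package):

ssh -R 22000:localhost:22 username@hostname.domain

This asks the workstation to listen on its port 22000 and forward that traffic back to port 22 (ssh) on the laptop, so the remote side can reach a machine hiding behind a router.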

Port Forwarding

Port forwarding is another, though less safe, approach. Namely, you can forward your router's port 22 (the very port that ssh uses) to your local IP, say 192.168.1.2.

Warning: port forwarding is not recommended because it is not safe to expose your ssh port. Your computer will be under frequent attack (just like a public server). To improve security, you may want to disable the root login on the ssh server.

vim /etc/ssh/sshd_config
# Add or uncomment this line:
PermitRootLogin no

As noted above, your public IP can be found by googling "my ip". Your local IP can be obtained with the following command (replace wlan0 with eth0 if you are using a wired connection):
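For example, on a typical Linux laptop (the interface name wlan0 is only illustrative):

ifconfig wlan0 | grep inet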



Ref
[0]: How to | Connect to a Remote Kernel
[1]: Remote kernel for Mathematica via home router
[2]: FAQ Remote Kernels
[3]: Reverse SSH tunnel or connecting to computer behind NAT router
[4]: Manually Creating a Remote Kernel Connection
[5]: Remote Kernel Strategies by Sascha Kratky


2. BTW: Connect to Remote Parallel Kernels

Setting up remote Parallel Kernels is much easier if the master kernel is local. The master and the slaves can simply communicate through ssh.


Jul 2, 2013

On the Parallelism of Symmetric Sparse Matrix Vector Multiplication

Symmetric Sparse Matrix 

Sparse matrices are matrices populated primarily with zeros. Iterative linear solvers, such as the Lanczos and Arnoldi procedures, are among the most important computational tools for solving large symmetric (Hermitian) sparse linear problems. The fundamental operation in these iterative methods is the sparse matrix-vector multiplication (SpMV).

Sparse matrices can be stored in several formats. The most flexible one is the coordinate storage scheme (COO). In this scheme, one uses three arrays (two integer and one scalar):
row(nnz), col(nnz), mat(nnz)
to store the row indices, column indices and the values of the nonzero matrix elements. It's easy to convert the coordinate storage scheme into other schemes such as the compressed column/row scheme (CCS/CRS). For SpMV, the coordinate scheme is as efficient as CCS/CRS. It demands 2 nnz (integer) + nnz (scalar) memory, compared with (nnz + n) + nnz for CCS/CRS, where n is the dimensionality of the matrix and nnz is the number of nonzeros, with n^2 \gg nnz \ge n for most useful cases. Only half of the matrix is stored, because the matrix is symmetric (Hermitian).
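As an illustration (a small made-up symmetric matrix, just for concreteness), the three coordinate arrays for the lower triangle would be:

program coo_example
implicit none
! lower triangle of the symmetric matrix
!     ( 4  1  0 )
! A = ( 1  3  2 )
!     ( 0  2  5 )
integer, parameter :: nnz = 5
integer :: row(nnz), col(nnz)
double precision :: mat(nnz)

  row = (/ 1, 2, 2, 3, 3 /)
  col = (/ 1, 1, 2, 2, 3 /)
  mat = (/ 4d0, 1d0, 3d0, 2d0, 5d0 /)
  print *, 'stored nonzeros:', nnz
end program coo_example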


Serial Algorithm

subroutine SpMV(nnz, row, col, mat, n, x, y, ierr)
implicit none
! A*x + y -> y
! n is the dimension of the vectors x, y
! nnz is the number of nonzeros
! only the lower (or upper) triangle of the symmetric matrix is stored
integer, intent(in) :: nnz, n, row(nnz), col(nnz)
double precision, intent(in) :: mat(nnz), x(n)
double precision, intent(inout) :: y(n)
integer, intent(out) :: ierr
integer :: i

  do i = 1, nnz

    y(row(i)) = y(row(i)) + mat(i)*x(col(i))
    if(row(i) .ne. col(i)) then
      ! mirror contribution from the stored triangle;
      ! for a complex Hermitian matrix, use conjg(mat(i)) here
      y(col(i)) = y(col(i)) + mat(i)*x(row(i))
    end if

  end do
  ierr = 0
end subroutine
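A minimal usage sketch (assuming this driver is compiled together with the subroutine above; the 3×3 matrix is the same made-up example as before):

program test_spmv
implicit none
integer, parameter :: n = 3, nnz = 5
integer :: row(nnz), col(nnz), ierr
double precision :: mat(nnz), x(n), y(n)

  row = (/ 1, 2, 2, 3, 3 /)
  col = (/ 1, 1, 2, 2, 3 /)
  mat = (/ 4d0, 1d0, 3d0, 2d0, 5d0 /)
  x = 1d0
  y = 0d0
  call SpMV(nnz, row, col, mat, n, x, y, ierr)
  print *, y   ! expect 5.0, 6.0, 7.0
end program test_spmv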


Parallel Algorithm

Here we only talk about MPI parallelism. In reality, it is most efficient to use both MPI and OpenMP. The OpenMP parallelism can be used in the serial part (e.g., the local SpMV loops).

2D cyclic distribution

A parallel SpMV depends on the distribution of the matrix across the processors. A 1D distribution has a serious load-balancing problem. A simple and popular scheme is the 2D cyclic distribution.

Left: A naive 2D distribution. Center: A cyclic 2D distribution. Right: A cyclic 2D distribution with non-square block matrices.
In this scheme, the n_p processors are aligned as an n_d \times n_d 2D grid. For symmetric matrices, only half of the grid is needed. Then, for n_p processors, n_d(n_d+1)/2 \le n_p, i.e. n_d = \text{floor}\left(\frac{\sqrt{1+8 n_p}-1}{2}\right). It is useful to assign each processor a 2D coordinate (I, J), where I, J = 1, 2, 3, \ldots, n_d. We will discuss the processor assignment later.
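For example, n_p = 6 gives n_d = \text{floor}\left(\frac{\sqrt{49}-1}{2}\right) = 3, which uses all 6 = n_d(n_d+1)/2 processes, while n_p = 8 also gives n_d = 3, leaving two processes idle.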

The vector x is stored on the diagonal and the matrix A is distributed across the processor grid. More specifically, vector element xi is stored in processor (I,I), where I = \mod(i, n_d) + 1. Matrix element A_{ij} is distributed to processor (I, J), I = \mod(i, n_d)+1, J = \mod(j,n_d)+1, forming a local block matrix A_{IJ}.

The local blocks need not be square matrices, except the diagonal blocks. Let n_I be the dimensionality of the I-th diagonal block. It is easy to see that the row and column dimensionalities of the block A_{IJ} are n_I and n_J, respectively.  n_I = \text{floor}\left(\frac{n-1}{n_d}\right) + \vartheta\left(n-\text{floor}\left(\frac{n-1}{n_d}\right)\cdot n_d - I \ge 0\right)

Processor Assignment: Diagonal Major (DM) Scheme

In the DM scheme, the processors are assigned to the matrix blocks according to their diagonal coordinate:

Matrix block A_{IJ} is on processor p = F(I-J-1)+J-1, where F(x) = (x+1) (2 n_d - x)/2. In practice, one knows the local process id p and wants to determine which matrix block is stored on it:  \begin{split} J &= p+1-F(G(p)); \\ I &= J+G(p)+1, \end{split}  where G(x) = \text{floor}\left(n_d - \sqrt{\left(n_d+\frac{1}{2}\right)^2-2x} - \frac{1}{2}\right).
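As a small worked example (a sketch for n_d = 3): the diagonal blocks A_{11}, A_{22}, A_{33} sit on p = 0, 1, 2; the first sub-diagonal blocks A_{21}, A_{32} sit on p = F(0)+J-1 = 3, 4; and A_{31} sits on p = F(1)+1-1 = 5. Inverting for p = 4: G(4) = \text{floor}\left(3-\sqrt{3.5^2-8}-\frac{1}{2}\right) = 0, so J = 4+1-F(0) = 2 and I = J+G(4)+1 = 3, recovering A_{32}.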

To store the node information, we introduce a FORTRAN module (a struct is a similar structure in C):
module nodeinfo
implicit none

integer :: nproc, myrank, myrow, mycol, ndiag, idiag;
integer :: proc_comm, row_comm, col_comm, diag_comm

contains

subroutine initnodeinfo
use mpi
implicit none
integer :: ierr, color

! determine ndiag, nproc and rank
  call mpi_comm_size(mpi_comm_world, nproc, ierr)
  call mpi_comm_rank(mpi_comm_world, myrank, ierr)
  ndiag = floor(0.5*sqrt(1.+8.*nproc)-0.5)
  
  color = 0
  if(ndiag*(ndiag+1)/2 > myrank) then
    color = 1
  endif

  call mpi_comm_split(mpi_comm_world, color, myrank, proc_comm, ierr)
  
  call mpi_comm_size(proc_comm, nproc, ierr)
  call mpi_comm_rank(proc_comm, myrank, ierr)

! determine myrow and mycol i.e. (I,J) the block ID
  mycol = 1+myrank-F(G(myrank,ndiag),ndiag)
  myrow = G(myrank,ndiag)+mycol+1
  idiag = myrow - mycol

! create row_comm and col_comm, diag_comm
  call mpi_comm_split(proc_comm, myrow, abs(idiag), row_comm, ierr)
  call mpi_comm_split(proc_comm, mycol, abs(idiag), col_comm, ierr)
  call mpi_comm_split(proc_comm, idiag, myrank, diag_comm, ierr)
  
end subroutine
  
integer function F(x, nd)
implicit none
integer, intent(in) :: x, nd

  F = (x+1)*(2*nd-x)/2

end function

integer function G(x, nd)
implicit none
integer, intent(in) :: x, nd

  G = floor(nd - 0.5 - sqrt((nd+0.5)**2 - 2.*x))

end function

integer function nloc(n, nd, i)
implicit none
integer, intent(in) :: n, nd, i

  nloc = (n-1)/nd
  if(n.ge.((n-1)/nd)*nd+i) then
    nloc = nloc + 1
  end if

end function

end module

The problem of the DM scheme is that the number of processors in each row & column communicator is not the same. Indeed, it is not well balanced: some rows & columns have n_d (ndiag) processors, some of them have only 1. This problem can be solved by using the so-called Balanced Column Major scheme.


Processor Assignment: Balanced Column Major (BCM) Scheme


In the BCM scheme, each row or column has \frac{n_d+1}{2} blocks.

The corresponding matrix blocks distribution on processor p is,
if p\le \text{ceiling}\left(\frac{n_d+1}{2}\right)\left[ n_d+1-\text{ceiling}\left(\frac{n_d+1}{2}\right)\right]-1 then \begin{split} J &= \text{floor}\left(\frac{p}{\text{ceiling}\left(\frac{n_d+1}{2}\right)}\right)+1; \\ I &= \mod\left( p, \text{ceiling}\left(\frac{n_d+1}{2}\right)\right)+J; \end{split}
else,
\begin{split} J &= n_d+2-\text{ceiling}\left(\frac{n_d+1}{2}\right)+\text{floor}\left( \frac{q}{\text{floor}\left(\frac{n_d+1}{2}\right)}\right); \\ I &= \mod\left( \mod\left(q, \text{floor}\left(\frac{n_d+1}{2}\right)\right) +J-1, n_d\right)+1; \end{split} where q = p-\text{ceiling}\left(\frac{n_d+1}{2}\right)\left[n_d+1-\text{ceiling}\left(\frac{n_d+1}{2}\right)\right].

The nodeinfo.mod can be modified accordingly.

The Parallel SpMV Algorithm Based on the 2D cyclic Distribution

A = L + D + L^t, where L is a strict lower triangular matrix and D is a diagonal matrix. T = L+D.
A x + y \to y \Rightarrow T x + (x^t L)^t + y \to y
subroutine SpMV(nnz, row, col, mat, ncol, x, yt, nrow, y, xt, ierr)
use nodeinfo
use mpi
implicit none
! A*x + y -> y
! A = L + D + L^t, where L is a strict lower triangular matrix, 
! D is a diagonal matrix, L^t is the transpose of L.
! Let T = L+D; A = T + L^t.  
! A*x = T*x + L^t*x = T*x + (x^t*L)^t
! 
! x/y is the local vector. Only x/y on the diagonal are initialized/output
! xt/yt is the local vector of the transpose vector of x/y
! note their local dimensionalities nrow, ncol are not the same as the original vector.
integer, intent(in) :: nnz, ncol, nrow, row(nnz), col(nnz)
integer, intent(out) :: ierr
double precision, intent(in) :: mat(nnz)
double precision, intent(inout) :: x(ncol), yt(ncol), y(nrow), xt(nrow)
integer :: i

! (L+D)*x + y -> y
  call mpi_bcast(x, ncol, mpi_real8, 0, col_comm, ierr)
  
! local SpMV
! initialize the local accumulator: the diagonal processor starts from y,
! the off-diagonal processors start from zero
  if(idiag.eq.0) then
    xt = y
  else
    xt = 0d0
  end if
  do i = 1, nnz
    xt( 1+(row(i)-1)/ndiag ) = xt( 1+(row(i)-1)/ndiag ) + mat(i)*x( 1+(col(i)-1)/ndiag )
  end do

  call mpi_reduce(xt, y, nrow, mpi_real8, mpi_sum, 0, row_comm, ierr)

! x^t * L -> y^t
! broadcast the row piece of x along the row, then reuse x as the local
! accumulator for the transpose contributions (the original x is saved in xt
! on the diagonal and restored at the end)
  if(idiag.eq.0) then
    xt = x
  end if
  call mpi_bcast(xt, nrow, mpi_real8, 0, row_comm, ierr)
  x = 0d0

! local SpMV for the strictly lower triangle (transposed);
! for a complex Hermitian matrix, use conjg(mat(i)) here
  do i = 1, nnz
    if( row(i) .ne. col(i) ) then
      x( 1+(col(i)-1)/ndiag ) = x( 1+(col(i)-1)/ndiag ) + xt( 1+(row(i)-1)/ndiag )*mat(i)
    end if
  end do

  call mpi_reduce(x, yt, ncol, mpi_real8, mpi_sum, 0, col_comm, ierr)

  if(idiag.eq.0) then
    y = y + yt
    x = xt
  endif

end subroutine