
Jul 13, 2013

On the Spin in Relativistic Dynamics


Spin is an intrinsic property of a particle associated with rotational symmetry. Roughly speaking, it is the intrinsic part of the angular momentum. The measured angular momentum J differs between reference frames. It is tempting to decompose it as J = L + S, where L is called the orbital angular momentum and S the spin angular momentum, or spin for short. The orbital angular momentum should vanish in the particle rest frame P = 0. In non-relativistic quantum mechanics, it is defined as L = X×P. But it is not clear a priori that such a decomposition is always available in relativistic dynamics. Another way of defining spin is to measure the angular momentum in the particle rest frame. In relativistic dynamics, however, there is no unique Lorentz transformation that transforms a momentum state to the particle rest frame. An ambiguity thus exists in the definition of spin.

Nevertheless, spin can be defined formally as a vector operator satisfying the following conditions:
[S_i, S_j] = i\varepsilon_{ijk}S_k \; (i,j,k = 1,2,3); \qquad [S_i, P_j] = 0; \qquad \mathbf{S} = \mathbf{J} \text{ if } P = p_c.
The last condition should be understood on momentum states: \mathbf{S}|p,\sigma\rangle = \mathbf{J}|p,\sigma\rangle if p = p_c. The spin operator \mathbf{S} described above is not covariant. The canonical example of p_c is \mathbf{p}_c = 0.

Spin as an operator must also depend on the specific realization of the space-time symmetry (the Poincaré symmetry), i.e. the representation. According to Wigner's theorem, a symmetry transformation in quantum mechanics must be realized either as a unitary or as an anti-unitary operator. Because the Lorentz group is non-compact, all of its unitary representations have to be infinite-dimensional, namely fields. Nevertheless, a relativistic theory can still have a finite-dimensional representation. The prominent example is the Dirac theory of relativistic electrons.

Poincaré algebra

Consider the Poincaré algebra,
[P^\mu, P^\nu] = 0;
[P^\lambda, M^{\mu\nu}] = i\left(g^{\lambda\mu}P^\nu - g^{\lambda\nu}P^\mu\right);
[M^{\lambda\rho}, M^{\mu\nu}] = i\left(g^{\lambda\nu}M^{\rho\mu} + g^{\rho\mu}M^{\lambda\nu} - g^{\lambda\mu}M^{\rho\nu} - g^{\rho\nu}M^{\lambda\mu}\right);
P^\mu and M^{\mu\nu} are the 10 generators of the Poincaré algebra. The six independent components of M^{\mu\nu} are J_k \equiv \frac{1}{2}\epsilon_{ijk}M^{ij} \; (i,j,k = 1,2,3), the angular momenta, and K_i \equiv M^{0i} \; (i = 1,2,3), the boosts.

The metric tensor is g^{\mu\nu} = g_{\mu\nu} = \mathrm{diag}(1, -1, -1, -1).
P^2 = P_\mu P^\mu = M^2 is a Casimir element, known as the invariant mass squared. In this post, we only consider the massive case M^2 > 0 for simplicity.

Pauli-Lubanski Pseudo Vector

W^\mu = \frac{1}{2}\varepsilon^{\mu\nu\kappa\rho}M_{\nu\kappa}P_\rho, where \varepsilon^{\mu\nu\kappa\lambda} is the Levi-Civita tensor. It can be shown that W^0 = \mathbf{J}\cdot\mathbf{P} and \mathbf{W} = \mathbf{K}\times\mathbf{P} + P^0\mathbf{J}. The Pauli-Lubanski vector satisfies,
  1. P_\mu W^\mu = 0;
  2. [P^\mu, W^\nu] = 0;
  3. [M^{\mu\nu}, W^\kappa] = i\left(g^{\kappa\mu}W^\nu - g^{\kappa\nu}W^\mu\right);
  4. [W^\mu, W^\nu] = i\varepsilon^{\mu\nu\kappa\rho}W_\kappa P_\rho;
W^\mu are the generators of the stabilizer group of each P^\mu. For irreducible representations, W^2 = W_\mu W^\mu is a Casimir operator (Casimir element). To evaluate its scalar value, we can study it in the center-of-momentum frame where \mathbf{P} = 0: W^\mu = M(0, \mathbf{J}). So W^2 = -M^2\mathbf{J}^2 = -M^2 s(s+1). Note that P^2 and W^2 are the only two independent polynomial invariant operators that commute with all the generators (the Casimir elements). Here the total spin appears naturally. But W^\mu/M (or \mathbf{W}/M) does not satisfy the angular momentum commutation relations, hence it is still not a spin operator.
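Spelling this out as a quick check (using the components W^0 = \mathbf{J}\cdot\mathbf{P} and \mathbf{W} = \mathbf{K}\times\mathbf{P} + P^0\mathbf{J} listed above): in the rest frame P^\mu = (M, \mathbf{0}),
W^0 = \mathbf{J}\cdot\mathbf{P} = 0, \qquad \mathbf{W} = \mathbf{K}\times\mathbf{P} + P^0\mathbf{J} = M\mathbf{J},
so that
W^2 = W_\mu W^\mu = (W^0)^2 - \mathbf{W}\cdot\mathbf{W} = -M^2\mathbf{J}^2 = -M^2 s(s+1).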

It is convenient to define a Pauli-Lubanski tensor, W^{\mu\nu} \equiv \frac{1}{i}[W^\mu, W^\nu] = \frac{1}{g}\left\{M^{\mu\nu}P^2 - M^{\mu\lambda}P_\lambda P^\nu + M^{\nu\lambda}P_\lambda P^\mu\right\} (with g = \det g_{\mu\nu} = -1).

Before we continue our discussion of the spin operator, let's see how the Casimir elements help to identify irreducible representations. The Casimir elements P^2 and W^2 belong to a set of mutually commuting operators. \{P^2, W^2, P^\mu, W^0\} is one possible set of mutually commuting operators. It is also customary to choose the helicity h = W^0/|\mathbf{P}| = \hat{P}\cdot\mathbf{J} instead of W^0. Therefore, particles (irreps.) can be identified by their invariant mass, momentum, spin and helicity, |M, p^\mu, s, h\rangle. It is also possible to choose S_z (the spin projection in the z direction), which we'll define later, to identify the irreps. In fact, the z-direction is often chosen along the longitudinal direction \hat{P}. In that case, the spin projection S_z is the same as the helicity operator h.

"Relativistic Spin Operator" via Lorentz Transformed Pauli-Lubanski Vector

As we have stated in the beginning, it seems reasonable to measure the spin operator by a Lorentz transformation to the particle rest frame. In the literature spin defined in this way is called the "relativistic spin". We shall explore the idea in this section. 

Let S_p be the spin operator that depends on the momentum p. We know its operator value at p = 0: S_0 = \mathbf{J}. To define its operator value at arbitrary momentum p, we require \langle\psi|S_p|\psi\rangle = \langle\psi|U(L_p)S_0U(L_p^{-1})|\psi\rangle, where L_p^{-1} is a Lorentz transformation that takes a particle with momentum p to the particle rest frame: L_p^{-1}p = (M, \mathbf{0}). But there is a subtle technicality here: a Lorentz transformation demands a covariant 4-vector, whereas \mathbf{J} is only a 3-vector. One important observation is that the Pauli-Lubanski vector has the same expectation value as M(0, \mathbf{J}) at p = 0. We simply put \langle\psi|(0, S_p)|\psi\rangle = \langle\psi|U(L_p)(0, S_0)U(L_p^{-1})|\psi\rangle = \frac{1}{M}\langle\psi|U(L_p)\,W\,U(L_p^{-1})|\psi\rangle. Therefore, (0, S_p) = \frac{1}{M}U(L_p)\,W\,U(L_p^{-1}) = \frac{1}{M}L_p^{-1}W.
The Lorentz transformation L_p is not unique. Let L_p and L'_p be two such Lorentz transformations. Then L_p^{-1}L'_p(M, \mathbf{0}) = (M, \mathbf{0}), so L = L_p^{-1}L'_p belongs to the little group \mathcal{L} = \{L \in SO(3,1)\,|\,L(M, \mathbf{0}) = (M, \mathbf{0})\}. Conversely, let L_p \in SO(3,1) be some Lorentz transformation with L_p(M, \mathbf{0}) = p; then for any L \in \mathcal{L}, L_pL also takes (M, \mathbf{0}) to p. For the Lorentz group, the little group for massive states M^2 > 0 is \mathcal{L} = SO(3), the 3d rotation group. This rotation is called the (generalized) Melosh-Wigner rotation.

It is convenient to choose L_p to be the standard boost (rotationless boost):
(L_p)^0{}_0 = p^0/M; \quad (L_p)^i{}_0 = (L_p)^0{}_i = p^i/M; \quad (L_p)^i{}_j = \delta_{ij} + \frac{p_i p_j}{M(p^0+M)}.
To obtain L_p^{-1}, one simply replaces \mathbf{p} with -\mathbf{p}. For the standard boost L_p, S_p = \frac{1}{M}\left(\mathbf{W} - \frac{\mathbf{p}\,W^0}{p^0+M}\right). We note that the zero-component is \frac{1}{M}\left(\frac{p^0}{M}W^0 - \frac{p^i}{M}W^i\right) = \frac{1}{M^2}P_\mu W^\mu = 0, which justifies the notation (0, S_p). To extend this to the spin operator, we simply promote the momentum p to the momentum operator P. Furthermore, S^2 = -(0, \mathbf{S})^2 = -\frac{1}{M^2}(L_p^{-1}W)^2 = -\frac{1}{M^2}W^2 = s(s+1). It can be checked that \mathbf{S} indeed satisfies the commutation relations for the spin operator. Therefore, \mathbf{S} = \frac{1}{M}\left(\mathbf{W} - \frac{\mathbf{P}\,W^0}{P^0+M}\right) is a spin operator. In the literature, this spin operator is also called the canonical (relativistic) spin [3], or simply the relativistic spin.
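As a consistency check (a sketch using the rest-frame values W^0 = 0 and \mathbf{W} = M\mathbf{J} from above): in the limit \mathbf{P} \to 0,
\mathbf{S} = \frac{1}{M}\left(\mathbf{W} - \frac{\mathbf{P}\,W^0}{P^0+M}\right) \;\to\; \frac{1}{M}\,M\mathbf{J} = \mathbf{J},
so the canonical spin reduces to the total angular momentum in the particle rest frame, as required by the defining conditions at the beginning.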

A nice feature of the canonical spin is that its longitudinal component \hat{P}\cdot\mathbf{S} = \hat{P}\cdot\mathbf{J} \equiv h is the helicity operator. This is consistent with non-relativistic quantum mechanics, where \mathbf{J} = \mathbf{X}\times\mathbf{P} + \mathbf{S}, hence h \equiv \hat{P}\cdot\mathbf{J} = \hat{P}\cdot\mathbf{S}.

As we have stated, there are however other valid spins resulting from rotations of the canonical spin. One prominent example is the light-cone spin. The light-cone representation of a 4-vector a = (a^0, \mathbf{a}) is defined as a^\pm = a^0 \pm a^3, \; \mathbf{a}_\perp = (a^1, a^2), with the inner product a\cdot b = \frac{1}{2}a^+b^- + \frac{1}{2}a^-b^+ - \mathbf{a}_\perp\cdot\mathbf{b}_\perp. The standard light-cone Lorentz boost (rotationless boost) is,
(L_p^{-1})^+{}_\mu = \frac{M}{p^+}\omega_\mu; \quad (L_p^{-1})^-{}_\mu = \frac{2p_\mu}{M} - \frac{M}{p^+}\omega_\mu; \quad (L_p^{-1})^i{}_+ = -\frac{p^i}{p^+}; \quad (L_p^{-1})^i{}_- = 0; \quad (L_p^{-1})^i{}_j = \delta_{ij},
where \omega^\mu is the null vector with \omega\cdot a = a^+. The corresponding spin operator is S_z = S^+ = \frac{W^+}{P^+}, \quad \mathbf{S}_\perp = \frac{1}{M}\left(\mathbf{W}_\perp - \mathbf{P}_\perp\frac{W^+}{P^+}\right), and \mathbf{S}_{LC} = (\mathbf{S}_\perp, S^+).

The light-cone spin projection S^+ = J^3 + \frac{\epsilon^{ij}B^iP^j}{P^+} is kinematical, while \mathbf{S}_\perp is dynamical. It is not difficult to understand the light-front spin projection if one notes that M^{\mu\nu} \sim X^\mu P^\nu - X^\nu P^\mu. Then the transverse boost B^i = M^{+i} = X^+P^i - X^iP^+ = -X^iP^+ at x^+ = 0, the light-front quantization surface. Therefore X^i = -B^i/P^+, and hence S^+ = J^3 - \epsilon^{ij}X^iP^j. The second part, X^1P^2 - X^2P^1 = L_z, is the orbital angular momentum in the z (or longitudinal) direction. The expression makes perfect sense by stating that spin is angular momentum minus orbital angular momentum, S^+ = J^+ - L^+, where I have replaced the z direction with the longitudinal direction.

For light-cone dynamics, one nice feature of the light-cone spin is that it incorporates the helicity along the longitudinal momentum P^+, because the longitudinal momentum is easier to access in light-cone dynamics. Another advantage of the light-cone spin is that for massless particles, where W^\mu = sP^\mu, the light-cone spin gives the non-vanishing result \mathbf{S}_{LC} = s\hat{z}, whereas the canonical spin vector for massless particles has to be defined separately. The light-cone spin and the canonical spin can be related by a Melosh rotation. Inspired by the light-cone spin, the canonical spin may be better termed the equal-time spin.

Ji and Mitchell have constructed a spin operator within an interpolation-angle formulation that gives the equal-time spin and the light-cone spin in the instant-form and light-front limits, respectively [7].

Angular Momentum Decomposition

In non-relativistic quantum mechanics, the angular momentum can be decomposed into an orbital part and a spin part: \mathbf{J} = \mathbf{X}\times\mathbf{P} + \mathbf{S}. This can be generalized to relativistic dynamics through the angular momentum tensor: M^{\mu\nu} = L^{\mu\nu} + S^{\mu\nu}, with [S^{\mu\nu}, P^\lambda] = 0, where L^{\mu\nu} = i(p^\mu\partial^\nu_p - p^\nu\partial^\mu_p) is the orbital angular momentum tensor. It may also be defined from some position operator X^\mu by L^{\mu\nu} = \frac{1}{2}\{X^\mu, P^\nu\} - \frac{1}{2}\{X^\nu, P^\mu\}. The 3-vector angular momentum is defined as J^i = \frac{1}{2}\epsilon^{ijk}M^{jk}. Similarly, the spin operator may be defined as S^i = \frac{1}{2}\epsilon^{ijk}S^{jk}. In addition, define a dipole vector D^i = S^{0i}. It is easy to see that \varepsilon^{\mu\nu\kappa\lambda}L_{\nu\kappa}P_\lambda = 0. Therefore, W^\mu = \frac{1}{2}\varepsilon^{\mu\nu\kappa\lambda}S_{\nu\kappa}P_\lambda. So W^0 = \mathbf{S}\cdot\mathbf{P} and \mathbf{W} = \mathbf{D}\times\mathbf{P} + P^0\mathbf{S}. If S^{\mu\nu} is linear in W^\mu, its general form is S^{\mu\nu} = \varepsilon^{\mu\nu\kappa\rho}W_\kappa(aP_\rho + b\eta_\rho), where \eta^\rho is a constant 4-vector and a, b are scalars. Substituting this back into the Pauli-Lubanski vector, we obtain aP^2 + b\,\eta\cdot P = 1. The corresponding spin vector is \mathbf{S} = (aP^0 + b\eta^0)\mathbf{W} - W^0(a\mathbf{P} + b\boldsymbol{\eta}), with \mathbf{D} = \mathbf{W}\times(a\mathbf{P} + b\boldsymbol{\eta}). The spin supplementary condition (SSC) gives an additional constraint on the spin tensor S^{\mu\nu}. There are three popular SSCs:
  1. (Møller): S^{\mu\nu}\eta_\nu = 0;
  2. (Fokker-Synge-Pryce, Covariant): S^{\mu\nu}P_\nu = 0;
  3. (Newton-Wigner, Canonical): M S^{\mu\nu}\eta_\nu + S^{\mu\nu}P_\nu = 0;
Each SSC gives a definition of the spin tensor.

  1. (Møller): S^{\mu\nu} = \varepsilon^{\mu\nu\rho\kappa}W_\rho\eta_\kappa/\eta\cdot P, \; (a = 0, \; b = 1/\eta\cdot P);
  2. (Fokker-Synge-Pryce, Covariant): S^{\mu\nu} = \varepsilon^{\mu\nu\rho\kappa}W_\rho P_\kappa/P^2, \; (a = 1/P^2, \; b = 0);
  3. (Newton-Wigner, Canonical): S^{\mu\nu} = \varepsilon^{\mu\nu\rho\kappa}W_\rho(\eta_\kappa + P_\kappa/M)/(M + \eta\cdot P),
     (a = 1/(M(M + P\cdot\eta)), \; b = 1/(M + P\cdot\eta));
Case 1: \mathbf{S} is the equal-time spin operator. Then \mathbf{S}^2 = -W^2/M^2 = s(s+1). We conclude that
a = \frac{1}{M(P^0 \pm M)}, \quad b\eta^0 = \frac{1}{P^0 \pm M}, \quad \boldsymbol{\eta} = 0. This corresponds to the Newton-Wigner SSC, and the resulting spin vector is just the canonical spin operator that we obtained in the previous section.
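As a quick check of the constraint aP^2 + b\,\eta\cdot P = 1 for these values (a sketch, taking the + branch and \eta^\mu = (\eta^0, \mathbf{0})):
aP^2 + b\,\eta\cdot P = \frac{M^2}{M(P^0+M)} + \frac{P^0}{P^0+M} = \frac{M+P^0}{P^0+M} = 1.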

Case 2: \mathbf{S} is not the equal-time spin operator, but a combination \lambda\mathbf{S} + \mu\mathbf{D} is. Let's evaluate \frac{1}{2}S_{\mu\nu}S^{\mu\nu} = \mathbf{S}^2 - \mathbf{D}^2 for the Fokker-Synge-Pryce spin tensor, because it is a genuine Lorentz scalar: \frac{1}{2}S_{\mu\nu}S^{\mu\nu} = -W^2/M^2 = s(s+1). Apparently, \mathbf{S}^2 \ne s(s+1). Let's redefine a spin operator, \mathbf{S}' = \mathbf{S} \pm \mathbf{D} = \frac{1}{M^2}\left(P^0\mathbf{W} - W^0\mathbf{P} \pm \mathbf{W}\times\mathbf{P}\right); then \mathbf{S}'^2 = s(s+1). It can be checked that \mathbf{S}' satisfies the SO(3) Lie algebra.

Case 3: the light-cone spin. 

Newton-Wigner Position Operator

\mathbf{X}_{NW} = \frac{1}{2}\left\{\mathbf{K}, \frac{1}{P^0}\right\} - \frac{1}{M}\frac{\mathbf{P}\times\mathbf{W}}{P^0(P^0+M)}. The nice part of the Newton-Wigner operator is that it satisfies [X^i_{NW}, P^j] = i\delta^{ij}, \quad [X^i_{NW}, X^j_{NW}] = 0, \quad \mathbf{J} = \mathbf{X}_{NW}\times\mathbf{P} + \mathbf{S}.

Field Decomposition

Recall that the Lorentz transformation of a quantum field is,
(\Lambda\varphi)_a(x) = \sum_b D_{ab}(\Lambda)\varphi_b(\Lambda^{-1}x), \qquad [\varphi_a(x), M^{\mu\nu}] = i(x^\mu\partial^\nu - x^\nu\partial^\mu)\varphi_a(x) + \mathcal{S}^{\mu\nu}_{ab}\varphi_b(x),
where (\Lambda\varphi)_a(x) \equiv U(\Lambda^{-1})\varphi_a(x)U(\Lambda), U(\Lambda) = e^{\frac{i}{2}\omega_{\mu\nu}M^{\mu\nu}}, D(\Lambda) = e^{\frac{i}{2}\omega_{\mu\nu}\mathcal{S}^{\mu\nu}}. It's easy to recognize that D(\Lambda) is a finite-dimensional representation of the Lorentz group. According to Noether's theorem, the conserved angular momentum is \mathbf{J} = \int d^3x\, \bar\psi\gamma^0\left(\mathbf{r}\times(-i\nabla) + \frac{1}{2}\boldsymbol{\Sigma}\right)\psi. It seems natural to write \mathbf{J} = \mathbf{X}\times\mathbf{P} + \frac{1}{2}\boldsymbol{\Sigma} or M^{\mu\nu} = L^{\mu\nu} + \mathcal{S}^{\mu\nu}, namely S^{\mu\nu} = \mathcal{S}^{\mu\nu}. But this is not strictly correct. The spin tensor S^{\mu\nu} is a Hermitian operator in Hilbert space, whereas \mathcal{S}^{\mu\nu} is only a finite-dimensional linear operator. In fact, as we have stated in the beginning, \mathcal{S}^{\mu\nu} cannot be Hermitian. It would be utterly wrong to identify \mathcal{S}^{\mu\nu} as the spin tensor, although, as we'll see below, \mathcal{S}^{\mu\nu} does tell us some information about the spin. We should note that most of the trouble is caused by the boosts. As spin is closely related to the rotation properties, the finite-dimensional \mathcal{S}^{\mu\nu} does tell us a great deal of information.

Finite-Dimensional Representations

The finite-dimensional irreducible representations of the Lorentz group are identified with the Casimir elements of the complex Lie algebra: N^i_\pm = \frac{1}{2}(J^i \pm iK^i), \; i = 1,2,3, with [N^i_\pm, N^j_\pm] = i\epsilon^{ijk}N^k_\pm, \; [N^i_+, N^j_-] = 0. The Casimirs are N^2_+ and N^2_-. For irreducible representations, N^2_\pm = n_\pm(n_\pm + 1), n_\pm = 0, \frac{1}{2}, 1, \frac{3}{2}, \ldots. We'll use (n_+, n_-) to identify each irrep. The dimension of this irrep is (2n_+ + 1)(2n_- + 1).

The simplest non-trivial irreps are the 2d spinor representations (\frac{1}{2}, 0) and (0, \frac{1}{2}), known as the left-handed and right-handed Weyl spinors, respectively. Note that they are NOT field representations. For the spinor representations, M^{\mu\nu}_L = \frac{i}{4}(\sigma^\mu\bar\sigma^\nu - \sigma^\nu\bar\sigma^\mu) and M^{\mu\nu}_R = \frac{i}{4}(\bar\sigma^\mu\sigma^\nu - \bar\sigma^\nu\sigma^\mu), where \sigma^\mu = (I, \boldsymbol{\sigma}), \bar\sigma^\mu = (I, -\boldsymbol{\sigma}). Since Weyl spinors are not field representations, all of the angular momentum is intrinsic. They obviously have total spin 1/2. The spin operator is \mathbf{S} = \mathbf{J} = \frac{1}{2}\boldsymbol{\sigma}.

The reducible 4d representation (\frac{1}{2}, 0) \oplus (0, \frac{1}{2}) is called the Dirac spinor. The Dirac spinor also has spin 1/2. It is worth mentioning that the angular momentum (hence the spin) is written in terms of the infamous \gamma-matrices: M^{\mu\nu} = \frac{i}{4}[\gamma^\mu, \gamma^\nu]. The spin operator is \mathbf{S} = \frac{1}{2}\begin{pmatrix}\boldsymbol{\sigma} & 0 \\ 0 & \boldsymbol{\sigma}\end{pmatrix} \equiv \frac{1}{2}\boldsymbol{\Sigma}.

The irreducible 4d representation (\frac{1}{2}, \frac{1}{2}) is the vector representation. The matrix elements of the group generators are (M^{\mu\nu})_{\alpha\beta} = i(\delta^\mu_\alpha\delta^\nu_\beta - \delta^\nu_\alpha\delta^\mu_\beta). The spin operator is [S^i]_{jk} = -i\epsilon^{ijk}.

Infinite-Dimensional Representations

The Poincaré group is represented by unitary operators U(\Lambda, a): U(\Lambda, a)|p, \sigma\rangle = e^{-ip\cdot a}\sum_{\sigma'}C_{\sigma'\sigma}(\Lambda, p)|\Lambda p, \sigma'\rangle, where |p, \sigma\rangle is a shortcut for |M, p, s, \sigma\rangle. In the Wigner classification, the representation of the Lorentz group is U(\Lambda, a)|p, \sigma\rangle = \left(\frac{N_p}{N_{\Lambda p}}\right)e^{-ip\cdot a}\sum_{\sigma'}D^{(s)}_{\sigma'\sigma}(W(\Lambda, p))|\Lambda p, \sigma'\rangle, where W(\Lambda, p) = L^{-1}(\Lambda p)\Lambda L(p) is the Wigner rotation with respect to the standard vector k = (M, \mathbf{0}), Wk = k, a group element of the little group w_k of the vector k. D(W) is the representation of the little group. In the massive-particle case, w_k is the 3d rotation group SO(3). L(p), with L(p)k = p, is some standard Lorentz transformation. N_p is a normalization factor that can be chosen to be N_p = 1.
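That W(\Lambda, p) is indeed an element of the little group follows in one line from the definitions L(p)k = p and L^{-1}(\Lambda p)\,(\Lambda p) = k:
W(\Lambda, p)\,k = L^{-1}(\Lambda p)\,\Lambda\,L(p)\,k = L^{-1}(\Lambda p)\,(\Lambda p) = k.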

The infinitesimal transformation is U(1 + \delta\omega) \simeq 1 + \frac{i}{2}\delta\omega_{\mu\nu}M^{\mu\nu}. Therefore,
M^{\mu\nu} = i(p^\mu\partial^\nu_p - p^\nu\partial^\mu_p) + S^{\mu\nu},
S^{\mu\nu} = L(p)^\mu{}_\alpha L(p)^\nu{}_\beta\,\sigma^{\alpha\beta} - \frac{1}{2}\left(p^\mu(\partial^\nu_p L^{-1}(p))^\alpha{}_\kappa L(p)^\kappa{}_\beta - p^\nu(\partial^\mu_p L^{-1}(p))^\alpha{}_\kappa L(p)^\kappa{}_\beta\right)\sigma^\beta{}_\alpha,
where \sigma^{\mu\nu} are the generators of the 3d rotation group SO(3) in the representation labeled by total spin s. Here M^{\mu\nu} has been decomposed into two parts, with L^{\mu\nu} \equiv i(p^\mu\partial^\nu_p - p^\nu\partial^\mu_p) apparently being the orbital angular momentum. The spin operator depends on the momentum p^\mu of the particle as well as on the choice of the Lorentz transformation L(p) with L(p)k = p. That means there are infinitely many definitions of the spin operator.

Dirac Theory - the "Relativistic Wave Equation Theory"

Finally, we have to address the Dirac Theory. I first want to quote Schwinger and Weinberg to remind the readers:

"The picture of an infinite sea of negative energy electrons is now best regarded as a historical curiosity and forgotten." - Julia Schwinger

"... quantum field theory is the way it is because it is the only way to reconcile the principles of quantum mechanics with those of special relativity." - Steven Weinberg

Nevertheless, the success of the Dirac theory in describing relativistic spin-1/2 single-particle states makes it worth touring the theory and finding the corresponding spin operators.

Recall that the Dirac spinor is a finite-dimensional representation. It has no space-time dependence. It can be used to describe an electron with some standard momentum p_s. The Dirac spinor has spin \mathbf{S} = \mathbf{J} = \frac{1}{2}\boldsymbol{\Sigma}. To obtain the electron wavefunction and operators with another momentum p^\mu, one can simply apply a Lorentz boost L(p_s, p), with L(p_s, p)p_s = p, to a frame in which the electron has momentum p. It is customary to choose p_s = (M, \mathbf{0}). Similar to what we have analysed above, there are infinitely many definitions of the spin vector corresponding to different choices of the Lorentz boost L(p_s, p). We first note that the generators of the Lorentz group are
\mathbf{J} = \frac{1}{2}\boldsymbol{\Sigma} = \frac{1}{2}\begin{pmatrix}\boldsymbol{\sigma} & 0 \\ 0 & \boldsymbol{\sigma}\end{pmatrix}, \qquad \mathbf{K} = \frac{i}{2}\gamma^0\boldsymbol{\gamma} = \frac{i}{2}\boldsymbol{\alpha} = \frac{i}{2}\begin{pmatrix}0 & \boldsymbol{\sigma} \\ \boldsymbol{\sigma} & 0\end{pmatrix}.
Then the canonical spin is,
\mathbf{S} = \frac{1}{2M}\left(P^0\boldsymbol{\Sigma} + i\gamma^0\boldsymbol{\gamma}\times\mathbf{P} - \frac{\mathbf{P}\,(\mathbf{P}\cdot\boldsymbol{\Sigma})}{P^0+M}\right).
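This expression follows by inserting the Dirac-representation generators into the canonical-spin formula of the previous sections (a short sketch):
\mathbf{W} = \mathbf{K}\times\mathbf{P} + P^0\mathbf{J} = \frac{i}{2}\gamma^0\boldsymbol{\gamma}\times\mathbf{P} + \frac{P^0}{2}\boldsymbol{\Sigma}, \qquad W^0 = \mathbf{J}\cdot\mathbf{P} = \frac{1}{2}\boldsymbol{\Sigma}\cdot\mathbf{P},
so that
\mathbf{S} = \frac{1}{M}\left(\mathbf{W} - \frac{W^0\,\mathbf{P}}{P^0+M}\right) = \frac{1}{2M}\left(P^0\boldsymbol{\Sigma} + i\gamma^0\boldsymbol{\gamma}\times\mathbf{P} - \frac{\mathbf{P}\,(\mathbf{P}\cdot\boldsymbol{\Sigma})}{P^0+M}\right).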

Foldy-Wouthuysen Spin Operator

In the Foldy-Wouthuysen (FW) representation the positive modes and negative modes of the Dirac theory decouple. The transformation is very useful for obtaining relativistic corrections to a non-relativistic theory. In the free theory, the transformation that carries the Dirac theory to the Foldy-Wouthuysen representation is F(p) = \exp[\boldsymbol{\gamma}\cdot\hat{\mathbf{p}}\,\theta] = \cos\theta + \boldsymbol{\gamma}\cdot\hat{\mathbf{p}}\,\sin\theta, where \theta = \frac{1}{2}\arctan\frac{|\mathbf{p}|}{M}.
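The closed form of the exponential is a consequence of the Clifford algebra (a one-line check, using the metric convention above): since \{\gamma^i, \gamma^j\} = 2g^{ij} = -2\delta^{ij},
(\boldsymbol{\gamma}\cdot\hat{\mathbf{p}})^2 = \frac{1}{2}\{\gamma^i, \gamma^j\}\,\hat{p}_i\hat{p}_j = -\hat{\mathbf{p}}\cdot\hat{\mathbf{p}} = -1,
so \exp[\boldsymbol{\gamma}\cdot\hat{\mathbf{p}}\,\theta] = \cos\theta + \boldsymbol{\gamma}\cdot\hat{\mathbf{p}}\,\sin\theta, in complete analogy with e^{i\theta} = \cos\theta + i\sin\theta.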

The Hamiltonian in the FW representation becomes
H_D = \gamma^0\boldsymbol{\gamma}\cdot\mathbf{p} + \gamma^0 m \;\longrightarrow\; H_{FW} = \gamma^0 p^0.
In the massless limit, the FW representation becomes the chiral representation.

The FW spin is defined as the inverse-transformed spinor spin: F(p)\,\mathbf{S}_{FW}\,F^{-1}(p) = \frac{1}{2}\boldsymbol{\Sigma}. Then,
\mathbf{S}_{FW} = \frac{1}{2P^0}\left(M\boldsymbol{\Sigma} - i\gamma^0\boldsymbol{\gamma}\times\mathbf{P} + \frac{\mathbf{P}\,(\mathbf{P}\cdot\boldsymbol{\Sigma})}{P^0+M}\right).


references:

[1]: N. N. Bogolubov, A. A. Logunov, I. T. Todorov, Introduction to Axiomatic Quantum Field Theory, Mathematical Physics Monograph No. 18, W. A. Benjamin, Reading, Massachusetts, 1975.
[2]: Steven Weinberg, The Quantum Theory of Fields, p. 635, Cambridge University Press, Cambridge, UK, 1995, ISBN 0521550017.
[3]: W. N. Polyzou, W. Gloeckle, H. Witala, Spin in relativistic quantum theory, arXiv:1208.5840v1.
[4]: L. L. Foldy and S. A. Wouthuysen, On the Dirac Theory of Spin 1/2 Particles and Its Non-Relativistic Limit, Phys. Rev. 78, 29 (1950).
[5]: T. D. Newton and E. P. Wigner, Localized States for Elementary Systems, Rev. Mod. Phys. 21, 400 (1949). URL: http://rmp.aps.org/pdf/RMP/v21/i3/p400_1
[6]: Gordon N. Fleming, Covariant Position Operators, Spin, and Locality, Phys. Rev. 137, B188 (1965).
[7]: Chueng-Ryong Ji and Chad Mitchell, Poincaré Invariant Algebra From Instant to Light-Front Quantization, Phys. Rev. D 64, 085013 (2001); arXiv:hep-ph/0105193v1.

To Be Continued ...


Jul 11, 2013

Connect to Remote MathKernels

Suppose you have a powerful workstation in your office and a laptop at home. Suppose you prefer working in your comfortable apartment, but now you have some work to do with Mathematica. Your laptop is convenient for visualizing data but incapable of doing the heavy computing tasks. You may think of ssh X11 forwarding:
ssh -X username@hostname.domain
The problem is, with limited network bandwidth, X11 forwarding is simply not stable enough for a typical Mathematica task. The good news is that Mathematica allows you to use remote MathKernel(s) with a local FrontEnd.

1. Connect to Remote MathKernel(s);




Unfortunately, the default (Basic Options or Advanced Options) configuration does not always work for typical home networks (router, firewall, NAT, etc.). This error message may pop up:


The problem is that your laptop's IP address is not known. For example, you may hide behind a wireless router and a firewall. The way to solve the problem is to tell the MathKernel(s) your port and IP address (your machine may not have one!). If you do have an IP, replace `ipadress` with it.

You can get your public IP simply by googling "my ip". Of course, your public IP may not be the IP of your machine.

If you are using a dynamic IP address, remember your IP usually changes within a few days.

In the presence of a firewall and a router, your machine may not have its own public IP address. Instead, your router has a public IP (again, it could be dynamic or static), say, 12.34.5.6, and your machine is assigned a local IP address, say, 192.168.1.2. The remote MathKernel(s) will have trouble talking to your local machine. What you need is a secure link.

VPN

The first choice is a VPN. With a VPN, you can happily work with an abstract address such as foo.vpn.thirdparty.com, regardless of the actual changes of your IP address.


mathssh is a Mathematica implementation of the SSH secure shell built on Java. One can also use ssh directly, by replacing the `java -jar mathssh` part of the kernel launch command with `ssh`.


SSH Reverse Tunneling 

SSH reverse tunneling (a.k.a. an SSH tunnel) is a secure method. Interested users can check out reference [3] and the man page of the ssh -R option. A free package is provided by Sascha Kratky; both Unix-like (Mac & Linux) and Windows versions are available (see Ref [5]).
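For illustration, a reverse tunnel set up by hand might look like the following (the host name and the port 22000 are placeholders, not something required by the package):

ssh -R 22000:localhost:22 username@hostname.domain

This asks the workstation to listen on its port 22000 and forward that traffic back to port 22 (ssh) on the laptop, so the remote side can reach a machine hiding behind a router.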

Port Forwarding

Port forwarding is another, though less safe, approach. Namely, you can forward your router's port 22 (the very port that ssh uses) to your local IP, say 192.168.1.2.

Warning: port forwarding is not recommended because it is not safe to expose your ssh port. Your computer will be under frequent attack (just like a public server). To improve security, you may want to disable the root login on the ssh server.

vim /etc/ssh/sshd_config
# Add or uncomment this line:
PermitRootLogin no

As noted above, your public IP can be found by googling "my ip". Your local IP can be obtained with the following command (replace wlan0 with eth0 if you are using a wired connection):
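For example, on a typical Linux laptop (the interface name wlan0 is only illustrative):

ifconfig wlan0 | grep inet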



Ref
[0]: How to | Connect to a Remote Kernel
[1]: Remote kernel for Mathematica via home router
[2]: FAQ Remote Kernels
[3]: Reverse SSH tunnel or connecting to computer behind NAT router
[4]: Manually Creating a Remote Kernel Connection
[5]: Remote Kernel Strategies by Sascha Kratky


2. BTW: Connect to Remote Parallel Kernels

Setting up remote Parallel Kernels is much easier if the master kernel is local. The master and the slaves can simply communicate through ssh.


Jul 2, 2013

On the Parallelism of Symmetric Sparse Matrix Vector Multiplication

Symmetric Sparse Matrix 

Sparse matrices are matrices populated primarily with zeros. Iterative linear solvers, such as the Lanczos and Arnoldi procedures, are among the most important computational tools for solving large symmetric (Hermitian) sparse linear problems. The fundamental operation in these iterative methods is the sparse matrix-vector multiplication (SpMV).

Sparse matrices can be stored in several formats. The most flexible one is the coordinate storage scheme (COO). In this scheme, one uses three arrays (two integer and one scalar):
row(nnz), col(nnz), mat(nnz)
to store the row indices, column indices and the values of the nonzero matrix elements. It's easy to convert the coordinate storage scheme into other schemes such as the compressed column/row scheme (CCS/CRS). For SpMV, the coordinate scheme is as efficient as CCS/CRS. It demands 2 nnz (integer) + nnz (scalar) memory, compared with (nnz + n) + nnz for CCS/CRS, where n is the dimensionality of the matrix and nnz is the number of nonzeros, with n^2 \gg nnz \ge n for most useful cases. Only half of the matrix is stored, because the matrix is symmetric (Hermitian).
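As an illustration (a small made-up symmetric matrix, just for concreteness), the three coordinate arrays for the lower triangle would be:

program coo_example
implicit none
! lower triangle of the symmetric matrix
!     ( 4  1  0 )
! A = ( 1  3  2 )
!     ( 0  2  5 )
integer, parameter :: nnz = 5
integer :: row(nnz), col(nnz)
double precision :: mat(nnz)

  row = (/ 1, 2, 2, 3, 3 /)
  col = (/ 1, 1, 2, 2, 3 /)
  mat = (/ 4d0, 1d0, 3d0, 2d0, 5d0 /)
  print *, 'stored nonzeros:', nnz
end program coo_example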


Serial Algorithm

subroutine SpMV(nnz, row, col, mat, n, x, y, ierr)
implicit none
! A*x + y -> y
! n is the dimension of the vectors x, y
! nnz is the number of nonzeros
! only the lower (or upper) triangle of the symmetric matrix is stored
integer, intent(in) :: nnz, n, row(nnz), col(nnz)
double precision, intent(in) :: mat(nnz), x(n)
double precision, intent(inout) :: y(n)
integer, intent(out) :: ierr
integer :: i

  do i = 1, nnz

    y(row(i)) = y(row(i)) + mat(i)*x(col(i))
    if(row(i) .ne. col(i)) then
      ! mirror contribution from the stored triangle;
      ! for a complex Hermitian matrix, use conjg(mat(i)) here
      y(col(i)) = y(col(i)) + mat(i)*x(row(i))
    end if

  end do
  ierr = 0
end subroutine
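A minimal usage sketch (assuming this driver is compiled together with the subroutine above; the 3×3 matrix is the same made-up example as before):

program test_spmv
implicit none
integer, parameter :: n = 3, nnz = 5
integer :: row(nnz), col(nnz), ierr
double precision :: mat(nnz), x(n), y(n)

  row = (/ 1, 2, 2, 3, 3 /)
  col = (/ 1, 1, 2, 2, 3 /)
  mat = (/ 4d0, 1d0, 3d0, 2d0, 5d0 /)
  x = 1d0
  y = 0d0
  call SpMV(nnz, row, col, mat, n, x, y, ierr)
  print *, y   ! expect 5.0, 6.0, 7.0
end program test_spmv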


Parallel Algorithm

Here we only talk about MPI parallelism. In reality, it is most efficient to use both MPI and OpenMP. The OpenMP parallelism can be used in the serial part (e.g., the local SpMV loops).

2D cyclic distribution

A parallel SpMV depends on the distribution of the matrix across the processors. A 1D distribution has a serious load-balancing problem. A simple and popular scheme is the 2D cyclic distribution.

Left: A naive 2D distribution. Center: A cyclic 2D distribution. Right: A cyclic 2D distribution with non-square block matrices.
In this scheme, the n_p processors are aligned as an n_d \times n_d 2D grid. For symmetric matrices, only half of the grid is needed. Then, for n_p processors, n_d(n_d+1)/2 \le n_p, i.e. n_d = \text{floor}\left(\frac{\sqrt{1+8 n_p}-1}{2}\right). It is useful to assign each processor a 2D coordinate (I, J), where I, J = 1, 2, 3, \ldots, n_d. We will discuss the processor assignment later.
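For example, n_p = 6 gives n_d = \text{floor}\left(\frac{\sqrt{49}-1}{2}\right) = 3, which uses all 6 = n_d(n_d+1)/2 processes, while n_p = 8 also gives n_d = 3, leaving two processes idle.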

The vector x is stored on the diagonal and the matrix A is distributed across the processor grid. More specifically, vector element xi is stored in processor (I,I), where I = \mod(i, n_d) + 1. Matrix element A_{ij} is distributed to processor (I, J), I = \mod(i, n_d)+1, J = \mod(j,n_d)+1, forming a local block matrix A_{IJ}.

The local blocks need not be square matrices, except the diagonal blocks. Let n_I be the dimensionality of the I-th diagonal block. It is easy to see that the row and column dimensionalities of the block A_{IJ} are n_I and n_J, respectively.  n_I = \text{floor}\left(\frac{n-1}{n_d}\right) + \vartheta\left(n-\text{floor}\left(\frac{n-1}{n_d}\right)\cdot n_d - I \ge 0\right)

Processor Assignment: Diagonal Major (DM) Scheme

In the DM scheme, the processors are assigned to the matrix blocks according to their diagonal coordinate:

Matrix block A_{IJ} is on processor p = F(I-J-1)+J-1, where F(x) = (x+1) (2 n_d - x)/2. In practice, one knows the local process id p and wants to determine which matrix block is stored on it:  \begin{split} J &= p+1-F(G(p)); \\ I &= J+G(p)+1, \end{split}  where G(x) = \text{floor}\left(n_d - \sqrt{\left(n_d+\frac{1}{2}\right)^2-2x} - \frac{1}{2}\right).
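As a small worked example (a sketch for n_d = 3): the diagonal blocks A_{11}, A_{22}, A_{33} sit on p = 0, 1, 2; the first sub-diagonal blocks A_{21}, A_{32} sit on p = F(0)+J-1 = 3, 4; and A_{31} sits on p = F(1)+1-1 = 5. Inverting for p = 4: G(4) = \text{floor}\left(3-\sqrt{3.5^2-8}-\frac{1}{2}\right) = 0, so J = 4+1-F(0) = 2 and I = J+G(4)+1 = 3, recovering A_{32}.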

To store the node information, we introduce a FORTRAN module (a struct is a similar structure in C):
module nodeinfo
implicit none

integer :: nproc, myrank, myrow, mycol, ndiag, idiag;
integer :: proc_comm, row_comm, col_comm, diag_comm

contains

subroutine initnodeinfo
use mpi
implicit none
integer :: ierr, color

! determine ndiag, nproc and rank
  call mpi_comm_size(mpi_comm_world, nproc, ierr)
  call mpi_comm_rank(mpi_comm_world, myrank, ierr)
  ndiag = floor(0.5*sqrt(1.+8.*nproc)-0.5)
  
  color = 0
  if(ndiag*(ndiag+1)/2 > myrank) then
    color = 1
  endif

  call mpi_comm_split(mpi_comm_world, color, myrank, proc_comm, ierr)
  
  call mpi_comm_size(proc_comm, nproc, ierr)
  call mpi_comm_rank(proc_comm, myrank, ierr)

! determine myrow and mycol i.e. (I,J) the block ID
  mycol = 1+myrank-F(G(myrank,ndiag),ndiag)
  myrow = G(myrank,ndiag)+mycol+1
  idiag = myrow - mycol

! create row_comm and col_comm, diag_comm
  call mpi_comm_split(proc_comm, myrow, abs(idiag), row_comm, ierr)
  call mpi_comm_split(proc_comm, mycol, abs(idiag), col_comm, ierr)
  call mpi_comm_split(proc_comm, idiag, myrank, diag_comm, ierr)
  
end subroutine
  
integer function F(x, nd)
implicit none
integer, intent(in) :: x, nd

  F = (x+1)*(2*nd-x)/2

end function

integer function G(x, nd)
implicit none
integer, intent(in) :: x, nd

  G = floor(nd - 0.5 - sqrt((nd+0.5)**2 - 2.*x))

end function

integer function nloc(n, nd, i)
implicit none
integer, intent(in) :: n, nd, i

  nloc = (n-1)/nd
  if(n.ge.((n-1)/nd)*nd+i) then
    nloc = nloc + 1
  end if

end function

end module

The problem of the DM scheme is that the number of processors in each row & column communicator is not the same. Indeed, it is not well balanced: some rows & columns have n_d (ndiag) processors, some of them have only 1. This problem can be solved by using the so-called Balanced Column Major scheme.


Processor Assignment: Balanced Column Major (BCM) Scheme


In the BCM scheme, each row or column has \frac{n_d+1}{2} blocks.

The corresponding matrix blocks distribution on processor p is,
if p\le \text{ceiling}\left(\frac{n_d+1}{2}\right)\left[ n_d+1-\text{ceiling}\left(\frac{n_d+1}{2}\right)\right]-1 then \begin{split} J &= \text{floor}\left(\frac{p}{\text{ceiling}\left(\frac{n_d+1}{2}\right)}\right)+1; \\ I &= \mod\left( p, \text{ceiling}\left(\frac{n_d+1}{2}\right)\right)+J; \end{split}
else,
\begin{split} J &= n_d+2-\text{ceiling}\left(\frac{n_d+1}{2}\right)+\text{floor}\left( \frac{q}{\text{floor}\left(\frac{n_d+1}{2}\right)}\right); \\ I &= \mod\left( \mod\left(q, \text{floor}\left(\frac{n_d+1}{2}\right)\right) +J-1, n_d\right)+1; \end{split} where q = p-\text{ceiling}\left(\frac{n_d+1}{2}\right)\left[n_d+1-\text{ceiling}\left(\frac{n_d+1}{2}\right)\right].

The nodeinfo.mod can be modified accordingly.

The Parallel SpMV Algorithm Based on the 2D cyclic Distribution

A = L + D + L^t, where L is a strict lower triangular matrix and D is a diagonal matrix. T = L+D.
A x + y \to y \Rightarrow T x + (x^t L)^t + y \to y
subroutine SpMV(nnz, row, col, mat, ncol, x, yt, nrow, y, xt, ierr)
use nodeinfo
use mpi
implicit none
! A*x + y -> y
! A = L + D + L^t, where L is a strict lower triangular matrix, 
! D is a diagonal matrix, L^t is the transpose of L.
! Let T = L+D; A = T + L^t.  
! A*x = T*x + L^t*x = T*x + (x^t*L)^t
! 
! x/y is the local vector. Only x/y on the diagonal are initialized/output
! xt/yt is the local vector of the transpose vector of x/y
! note their local dimensionalities nrow, ncol are not the same as the original vector.
integer, intent(in) :: nnz, ncol, nrow, row(nnz), col(nnz)
integer, intent(out) :: ierr
double precision, intent(in) :: mat(nnz)
double precision, intent(inout) :: x(ncol), yt(ncol), y(nrow), xt(nrow)
integer :: i

! (L+D)*x + y -> y
  call mpi_bcast(x, ncol, mpi_real8, 0, col_comm, ierr)
  
! local SpMV
! initialize the local accumulator: the diagonal processor starts from y,
! the off-diagonal processors start from zero
  if(idiag.eq.0) then
    xt = y
  else
    xt = 0d0
  end if
  do i = 1, nnz
    xt( 1+(row(i)-1)/ndiag ) = xt( 1+(row(i)-1)/ndiag ) + mat(i)*x( 1+(col(i)-1)/ndiag )
  end do

  call mpi_reduce(xt, y, nrow, mpi_real8, mpi_sum, 0, row_comm, ierr)

! x^t * L -> y^t
! broadcast the row piece of x along the row, then reuse x as the local
! accumulator for the transpose contributions (the original x is saved in xt
! on the diagonal and restored at the end)
  if(idiag.eq.0) then
    xt = x
  end if
  call mpi_bcast(xt, nrow, mpi_real8, 0, row_comm, ierr)
  x = 0d0

! local SpMV for the strictly lower triangle (transposed);
! for a complex Hermitian matrix, use conjg(mat(i)) here
  do i = 1, nnz
    if( row(i) .ne. col(i) ) then
      x( 1+(col(i)-1)/ndiag ) = x( 1+(col(i)-1)/ndiag ) + xt( 1+(row(i)-1)/ndiag )*mat(i)
    end if
  end do

  call mpi_reduce(x, yt, ncol, mpi_real8, mpi_sum, 0, col_comm, ierr)

  if(idiag.eq.0) then
    y = y + yt
    x = xt
  endif

end subroutine