Dec 24, 2014

The Golden-Thompson Inequality and the Upper bound of Quantum Partition Function


Let $A$ and $B$ be two Hermitian matrices. The famous Baker-Campell-Hausdorff formula says, \[
\exp\big( A + B \big) = \exp \big( A \big) \exp\big( B \big) \exp\big( - [A, B]/2 \big) \exp\big( [A, [A, B]]/6 + [B, [A,B]]/3 \big)\cdots
\] In general, $ \exp(A+B) \ne \exp (A) \exp(B) $. Applying the trace-determinant relation $\det \exp A = \exp \mathrm{tr} A$, however, we still have \[
\det\exp(A+B) = \det \exp(A) \exp(B).
\]

In a similar fashion, the Golden-Thompson theorem states, \[
\mathrm{tr}\exp(A+B) \le \mathrm{tr} \exp(A) \exp(B),
\] the equality holds iff $[A, B] = 0$.

As an application, let's consider $-\beta^{-1}A = \hat p^2/(2m), -\beta^{-1}B = V(\hat q)$, and $[\hat q, \hat p] = i$. $-\beta^{-1}(A + B) = \hat p^2/(2m) + V(\hat q) = \hat H$ is the single-particle Hamiltonian. Apparently, $Z(\beta) = \mathrm{tr} \exp( -\beta \hat H )$ is the partition function of the one-body system. Now consider the right-hand side: \[
\begin{split}
& \mathrm{tr} \exp(-\beta \hat p^2/(2m)) \exp(-\beta V(\hat q) ) \\
& = \int dq \int \frac{dp}{2\pi} \langle q | \exp(-\beta \hat p^2/(2m)) | p \rangle \langle p | \exp(-\beta V(\hat q) ) | q \rangle \\
& = \int dx \int \frac{dp}{2\pi} \exp(-\beta p^2/(2m))  \exp(-\beta V(q) )  \langle q | p \rangle \langle p | q \rangle \\
& = \int dx \int \frac{dp}{2\pi} \exp(-\beta ( p^2/(2m) + V(q)) ) \\
& =  \int dx \int \frac{dp}{2\pi} \exp( -\beta H ) = Z_\text{cl}(\beta).
\end{split}
\] The last line is the partition function of a classical single-particle system.

Therefore, the Golden-Thompson theorem says, the classical partition function is the upper bounds of the quantum partition function, $Z(\beta) \le Z_\text{cl}(\beta)$. Replace the single-particle operator and single-particle basis with the many-body ones, or even second quantized fields, it is straight-forward to show the general relation between quantum and classical partition functions.


Oct 29, 2014

Disturbing a Quantum Field

Lesson from Condensed Matter Physics (CMP)

In condensed matter systems, it is useful to study the collective response to external fields. For example, when adding an electric field $\vec E$ to a metal, a current $\vec J$ would be generated. We call the response the conductivity, which is unique for each material, thus reflects internal properties of the system. Quantitatively, the conductivity is defined in terms of the external field and the respond current \[
J(t, \vec x) = \int \mathrm d^3x' \int_{-\infty}^t \mathrm dt' \sigma(t-t', \vec x - \vec x') E(t', \vec x').
\] The upper limit of the temporal integral is due to causality. In general, the conductivity may depends on the external field $\vec E$ (non-linear effect). In the weak field limit $E \to 0$,  however, it is independent to $E$ (linear response) and provides an important probe for the property of the material. There are also other linear responses of the condensed matter systems. The Green-Kubo formula (Melville Green 1954, Ryogo Kubo 1957) provides a neat recipe to compute the linear response of the condensed matter system to an external field.

Suppose a weak external field $F(t, \vec x)$ couples to a condensed matter system through $H_\mathrm{int} = \int \mathrm d^3x F(t, \vec x) A(t, \vec x)$. The system was originally described by the density operator $\varrho =Z^{-1} \exp \big[ -\beta (H - \mu N) \big]$, where $Z = \mathrm{tr} \exp \big[ -\beta (H - \mu N) \big]$. The linear response theory concerns the expectation value of an observable $B(t, \vec x)$: \[
\langle  B(t, \vec x) \rangle_F - \langle  B(t, \vec x) \rangle= \mathrm{tr} \left[ \big(\varrho_F(t)-\varrho(t)\big) B(t, \vec x) \right] \equiv \int \mathrm dt' \int \mathrm d^3x' \chi_{AB}(t-t',\vec x-\vec x') F(t', \vec x')
\] where $\varrho_F(t)$ is the density matrix in the perturbed system. The Green-Kubo formula asserts that the linear transportation coefficient \[
\chi_{AB}(t-t', \vec x - \vec x') = -i \Theta(t-t') \langle [B(t, \vec x), A(t', \vec x')] \rangle.
\] For example, the conductivity is related to the current-current correlation function*, \[
\sigma_{AB}(t-t', \vec x - \vec x') = -i \Theta(t-t') \langle [\partial_t^{-1}J(t, \vec x), \partial_{t'}^{-1}J(t', \vec x')] \rangle.
\]

*In general, the current is a four-vector hence the response is a tensor. One should consider the contribution from all components, which gives rise to several transport coefficients: the longitudinal conductivity, the density responses as well as the Hall conductivity etc.

Quantum Field Theory (QFT)

The vacuum state of QFT $|\Omega\rangle$ is intrinsically many-body (even the free field theory!). Let's disturb the QFT vacuum and measure the linear response.


Let $J_r(x)$ be a classical source. We couple it to the quantum field $\varphi(x)$ through $\mathscr H_\mathrm{int}(x) = A_r(x) J_r(x)$, where $A$ is a local operator constructed from $\varphi$. Then we measure the vacuum expectation value (VEV) of an observable $B_s(x)$ (To measure this observable, we may, for example, couple the field to a classical field. That's why we often consider the case $A = B$, which we sill talk about later.): $\langle B_s(x) \rangle_J \equiv Z_J^{-1} \langle \Omega | B_s(x) | \Omega \rangle_J$, where $Z_J = \langle \Omega | \Omega\rangle_J$ is the perturbed partition function. In the weak source limit ($J\to 0$), the response should be linear, similar to the condensed matter system: \[
\langle B_s(x) \rangle = \int \mathrm d^4 x' \chi_{sr}^{BA}(x,x') J_r(x') + \mathcal O(J^2).
\] Let $Z$ be the unperturbed partition function. In the weak external field limit $J \to 0$, \[
Z_J = Z \Big(1 - i \int \mathrm d^4x \langle A_r(x) \Omega \rangle J_r(x) - \frac{1}{2} \int \mathrm d^4x\mathrm d^4y \langle \mathcal T\big\{ A_r(x) A_s(y) \big\} \rangle J_r(x) J_s(y) \Big).
\] and $\langle \Omega | B_s(x) | \Omega \rangle_J$ is, \[
 \langle \Omega | B_s(x) | \Omega \rangle_J = \langle \Omega | B_s(x) | \Omega \rangle
  -i \int \mathrm d^4 x' \langle \Omega | \mathcal T \big\{B_s(x)A_r(x')\big\} | \Omega\rangle J_r(x').
\] Then the vacuum expectation value (VEV) \[
\langle B_s(x) \rangle_J
= \langle B_s(x) \rangle + i \int \mathrm d^4x' \langle A_r(x') \rangle \langle B_s(x) \rangle J_r(x') - i \int \mathrm d^4 x' \langle \mathcal T \big\{B_s(x) A_r(x')\big\} \rangle J_r(x').
\] It is convenient to work with "renormalized" operators that have vanishing VEV without the presence of the external source. For example, we may define: $B^R_s(x) = B_s(x) - \langle B_s(x)\rangle$. It is easy to see,\[
\langle B^R_s(x) \rangle_J
=- i \int \mathrm d^4 x' \langle \mathcal T \big\{B^R_s(x) A^R_r(x')\big\} \rangle J_r(x').
\] Therefore, the transport coefficient $\chi(x,x') \propto \langle \mathcal T \big\{B^R_s(x) A^R_r(x')\big\} \rangle$. Note that causality is preserved by time-ordering operator $\mathcal T$.

From now on, we'll assume all the operators have been properly renormalized, unless elsewhere stated.

Example 1: Field Propagation

$A = B = \varphi$, where $\varphi(x)$ is the renormalized field such that $\langle \Omega |\varphi(x)|\Omega\rangle = 0$. $\langle\varphi(x)\rangle_J$ represents the amplitude for finding a physical particle in the disturbed vacuum. \[
\langle\varphi_a(x)\rangle_J =- i \int \mathrm d^4 x' \langle \mathcal T \big\{\varphi_a(x) \varphi_b(x')\big\} \rangle J_b(x'),
\] Here $D_{ab}(x-x') \equiv i\langle \mathcal T \big\{\varphi_a(x) \varphi_b(x')\big\} \rangle$ is nothing but the Feynman propagator. This means sense physically: the classical source $J$ creates a physical particle at $x'$, and then the particle propagate to $x$ to be detected.

Note that we are not doing perturbation theory. The graphical representations are not necessarily Feynman diagrams.

Example 2: Vacuum Polarization

Let $A = B = J^\mu(x)$, where $J^\mu$ is the electromagnetic current. We couple a classical electromagnetic field $\mathcal A_\mu(x) e^{-\epsilon |x^0|}, (\epsilon\to 0^+)$ to the vacuum and measure the current: \[
\langle J^\mu(x) \rangle_{\mathcal A} = -i \int \mathrm d^4x' \left< \mathcal T \big\{J^\mu(x)J^\nu(x') \big\}  \right> \mathcal A_\nu(x') e^{-\epsilon |x'^0|}.
\] The linear transport coefficient is called the polarization tensor: \[
\Pi^{\mu\nu}(x-x') \equiv \left< \mathcal T \big\{J^\mu(x)J^\nu(x') \big\}  \right>.
\] Consider a free Dirac field, \[
\psi(x) = \sum_{s=\pm} \int\frac{\mathrm d^3 k}{(2\pi)^32\omega_p} \Big[ u_s(k) b_s(k) e^{ik\cdot x} + v_s(k) d^\dagger_s(k) e^{-ik\cdot x} \Big].
\] The electromagnetic current is $J^\mu(x) = \bar\psi(x)\gamma^\mu\psi(x)$. Applying Wick theorem, the polarization tensor is \[
\Pi^{\mu\nu}(x-x') = \mathrm{tr} \Big[ \gamma^\mu S(x-x') \gamma^\nu S(x-x')  \Big] - \mathrm{tr} \Big[\gamma^\mu S(x-x)\Big] \mathrm{tr} \Big[\gamma^\nu S(x'-x')\Big].
\]

Now consider a perturbative spinor electrodynamics. Let's denote the free Dirac action as $S_0$, the
interaction as $S_\mathrm{int} = \int\mathrm d^4x \bar\psi(x)\gamma^\mu\psi(x) A_\mu(x)$.

 
Here we are only concerning the linear effect. One may well ask the question of the induced current by a strong classical source field. The problem is called the Schwinger effect. It turns out, in the semi-classical approximation, the partition function is \[
Z_\mathcal{A} \equiv \langle \Omega | \Omega \rangle_\mathcal{A}  \approx e^{iS_\mathrm{eff}} \] Therefore, the pair production probability (or rather the vacuum decay probability) \[
P = 1 - e^{-2\mathrm{Im}S_\mathrm{eff}}. \] Considering only the one-loop effect in a constant E-field, Schwinger calculated the vacuum decay rate (Phys. Rev. 82, 664 (1951)), \[
\frac{\mathrm dN}{\mathrm dV \mathrm dt} = \frac{(eE)^2}{4\pi^3}\sum_{n=1}^\infty \frac{1}{n^2} e^{-n\pi E_c/E} \] where $E_c = \frac{m_e^2c^3}{e\hbar} \sim 10^{18} \text{V/m}$ is a super-duperly strong field! However, it may be found in: a) heavy ion collision; b) magnetar; c) condensed matter emulated QED (e.g., graphene); d) high energy lasers (still a long way to go).

Example 3: Hadron Structure

Let $A = B = J^\mu(x)$, where $J^\mu$ is the electromagnetic current. The linear response can also be used to study a bound state. The key is to "create" and "annihilate" a bound state from the vacuum with the field operator, \[
|\psi(z)\rangle \equiv \lim_{z^0\to-\infty}\psi(z) |\Omega\rangle, \\
\langle\psi(y) | \equiv \lim_{y^0\to+\infty} \langle \Omega | \bar\psi(y).
\] We couple a classical electromagnetic field $\mathcal A_\mu(x) e^{-\epsilon |x^0|}, (\epsilon\to 0^+)$ to a physical particle through the minimal coupling $\mathcal A_\mu J^\mu$. Then we measure the charge distribution: \[
\langle \psi(y) | J^\mu(x) |\psi(z)\rangle_{\mathcal A} = -i \int \mathrm d^4x' \left< \psi(y)\right| \mathcal T \big\{J^\mu(x)J^\nu(x') \big\} \left| \psi(z)\right> \mathcal A_\nu(x') e^{-\epsilon |x'^0|}
\] The particle propagation before and after the experiment is no interest to us. Let's do Fourier transform:\[
\psi(z) |\Omega\rangle = \int \frac{\mathrm d^3 p}{(2\pi)^32\omega_p} \tilde\psi(p) e^{ip\cdot x} |p,\sigma\rangle, \quad (\omega_p = \sqrt{m^2+\mathbf p^2})
\] Therefore, let's study the current distribution of plane wave modes: \[
\langle p',\sigma' | J^\mu(x) | p,\sigma \rangle_{\mathcal A} = -i \int \mathrm d^4x' \langle p',\sigma' | \mathcal T \big\{J^\mu(x)J^\nu(x') \big\} | p,\sigma \rangle \mathcal A_\nu(x') e^{-\epsilon |x'^0|}
\]


Beyond Linear Response

The linear response can be use to formulate the perturbation theory. Let's disturb a scalar field $\varphi(x)$ with a classical source $j$. The new partition function is, \[
\begin{split}
Z_j &= \int \mathcal D_\varphi \, \exp \Big( iS + i\int\mathrm d^4x\, j(x) \varphi(x) \Big)
=\langle\Omega\mid \mathcal T \exp\Big( i\int \mathrm d^4x\, j(x)\varphi(x)\Big) \mid \Omega\rangle\\
&= \sum_{n=0}^\infty \frac{i^n}{n!}\int\mathrm d^4x_1\,\cdots\mathrm d^4x_n\, j(x_1) \cdots j(x_n) \langle\Omega\mid\mathcal T\varphi(x_1)\varphi(x_2)\cdots \varphi(x_n)\mid\Omega\rangle \\
&= \sum_{n=0}^\infty \frac{i^n}{n!}\int\mathrm d^4x_1\,\cdots\mathrm d^4x_n\, j(x_1) \cdots j(x_n) G(x_1, x_2, \cdots, x_n)
\end{split}
\] where $G(x_1,x_2,\cdots,x_n) = \langle\Omega\mid\mathcal T\varphi(x_1)\varphi(x_2)\cdots \varphi(x_n)\mid\Omega\rangle$ is the $n$-point correlation function (aka. $n$-point causal correlator). The translational symmetry implies $G(x_1,\cdots,x_n) = G(x_1-a,\cdots,x_n-a)$. In the momentum space, \[
G(x_1, \cdots, x_n) = \int\frac{\mathrm d^4p_1}{(2\pi)^4} \cdots \frac{\mathrm d^4p_n}{(2\pi)^4} \exp\Big( ip_1\cdot x_1+\cdots + ip_n\cdot x_n\Big) G(p_1,\cdots,p_n)
\] The translational symmetry implies $G(p_1,\cdots,p_n) = (2\pi)^4\delta^4(p_1+\cdots+p_n)\tilde G(p_1,\cdots,p_n)$.

The correlators $G(x_1,x_2,\cdots,x_n)$ can be represented by the sum of graphs subject to $n$ external legs, $G(x_1,\cdots,x_n) = \sum_{g} D_g(x_1,\cdots, x_n)$, known as the Feynman diagrams.
A diagrammatic representation of a n-point causal correlator

Next, it is convenient to work with the irreducible diagrams (represented by connected graphs), $C$, and the corresponding correlator $G_c(x_1,x_2,\cdots,x_n)$. Apparently, any diagram $D_g$ with topology $g$ can be written as a product of its connected components, \[
\frac{1}{n_g!} \frac{i^{n_g}}{|\mathrm{Aut}\,g|}\int\mathrm d^4x_1\,\cdots\mathrm d^4x_{n_g}\, j(x_1) \cdots j(x_{n_g}) D_g(x_1, \cdots, x_{n_g}) \\
=  \prod_{s\in S_g} \frac{1}{m_s!} \bigg( \frac{1}{n_s!}\frac{i^{n_s}}{|\mathrm{Aut}\, s|}\int\mathrm d^4x_1\,\cdots\mathrm d^4x_{n_g}\, j(x_1) \cdots j(x_{n_g}) D_s \bigg)^{m_s}
\] where $S_g$ be the set of connected subgraphs of graph $g$, $m_s$ is the multiplicity of $s\in S_g$ in $g$, that is, $g$ consists of $m_s$ copies of $s$; $n_s$ evaluates the number of external legs in graph $s$ and $n_g = \sum_{s\in S_g} m_s n_s$. The factor $\frac{1}{n_s!}$ came from the multinomial coefficients ${n_g \choose n_{s_1}, n_{s_2}, \cdots}}$, which represents the number of ways to split the $n_g$ external sources.

Then the disturbed partition function $Z_j$, \[
\begin{split}
Z_j &= \sum_{n=0}^\infty \frac{i^n}{n!}\int\mathrm d^4x_1\,\cdots\mathrm d^4x_n\, j(x_1) \cdots j(x_n) G(x_1, \cdots, x_n) \\
&= \sum_{n=0}^\infty \frac{i^n}{n!} \int\mathrm d^4x_1\,\cdots\mathrm d^4x_n\, j(x_1) \cdots j(x_n) \sum_{g, n_g=n} \frac{1}{|\mathrm {Aut}\,g| } D_g(x_1, \cdots, x_n) \\
&= \sum_{g} \prod_{s\in S_g} \frac{1}{m_s!} \bigg(\frac{1}{n_s!}\frac{i^{n_s}}{|\mathrm{Aut}\, s|} \int\mathrm d^4x_1\,\cdots\mathrm d^4x_{n_s}\, j(x_1) \cdots j(x_{n_s}) D_s \bigg)^{m_s} \\
\end{split}
\] In the last line, $s \sim \frac{1}{n_s!}\frac{i^{n_s}}{|\mathrm{Aut}\, s|} \int\mathrm d^4x_1\,\cdots\mathrm d^4x_{n_s}\, j(x_1) \cdots j(x_{n_s}) D_s$ is the expression for connect graph $s$. Compare the last line with multinomial expansion $(x_1+x_2+\cdots + x_k)^n = \sum_{m_1,m_2,\cdots,m_k} {n \choose m_1,m_2,\cdots,m_k} x_1^{m_1}x_2^{m_2}\cdots x_k^{m_k}$. Therefore, we can change the order of summation and multiplication and \[
Z_j = \exp \sum_{n=0}^\infty \frac{i^n}{n!} \int \mathrm d^4x_1\,\cdots\mathrm d^4x_n\, j(x_1) \cdots j(x_n) \sum_{g\in C, n_c=n} D_c(x_1,x_2,\cdots,x_n).
\] Here $C$ is the collection of all connected Feynman diagrams. This result is the linked cluster theorem, which relates the free energy with the connect diagrams, \[
iW_j = \ln Z_j = \sum_{n=0}^\infty \frac{i^n}{n!}\int\mathrm d^4x_1\,\cdots\mathrm d^4x_n\, j(x_1) \cdots j(x_n) G_c(x_1, x_2, \cdots, x_n).
\] This theorem can be proven more rigorously from induction, by defining the connected correlator $G_c(x_1, \cdots, x_n) = G(x_1, \cdots, x_n) - \sum_{P}\prod_{p\in P} G_c(\{x\}_p)$.

Consider the vacuum expectation value (VEV) of the field $\varphi(x)$, \[
\langle \varphi(x) \rangle_j = Z_j^{-1} \frac{1}{i}\frac{\delta}{\delta j(x)} Z_j
= \frac{\delta}{\delta j(x)} W_j
\] Let's do a Legendre transformation to introduce a new quantity (the minus sign is the convention): \[
-\Gamma_j = \int \mathrm d^4x\, \delta/\delta j(x) W_j \cdot j(x) - W_j
\]

Oct 23, 2014

Cosmology and the First Law of Thermal Dynamics


According to the standard cosmology, our universe is expanding. During the process, the content of the Universe is also changing. For example, about 13.7 billion years ago, the Universe was dominated by dark matter; however, today it is the dark energy that rules the sky. According to the standard cosmological model, Lambda-CMD, the energy density of the dark energy $\rho_\Lambda = \frac{\Lambda}{8\pi G}$ ($\Lambda$ is Einstein's cosmological constant) does not change during the expansion of the Universe. That is why the dark energy is increasing as the Universe becomes larger. There is no known  interaction between the dark energy and the ordinary matter (including dark matter) except for gravity - which according to the general relativity is merely the curvature of the Universe. There is no exchange of energy. So where does this large amount of energy come from? Does the first law of thermal dynamics (the law of energy conservation) still hold? Given the "dark" nature of the dark energy, one may turn to consider physical matters, such as photons - the cosmic microwave background radiation (CMB). As the Universe expands, the photons are redshifted, i.e., their frequencies $\nu$ become smaller $\nu/(1+z), z>0$. Since the Universe is extremely empty, the loss of photons due to reaction is negligible. Therefore, the total energy of the radiation $E = N_\gamma h \nu$ would decrease as the Universe expands. So, again, where does the energy go?

Fig. 1, Left: an artist's impression of the chronology of the Universe. Right: the contents of the Universe according to WMAP project. Credit: NASA / WMAP Science Team
The answer is in the Friedmann's equations themselves. Recall the Robertson-Walker metric \[
\mathrm ds^2 = \mathrm dt^2 - a^2(t) \left( \frac{\mathrm dr^2}{1-k r^2} + r^2 \mathrm d\theta^2 + r^2 \sin^2\theta\mathrm d\phi^2 \right) \] where, $k$ represent the curvature of the space, and $a(t)$ is the scale factor that satisfies the Friedmann's equations state, \[
\begin{split}
\left(\frac{\dot a}{a} \right)^2 &= \frac{8\pi G}{3} \rho + \frac{\Lambda}{3} - \frac{k}{a^2} \\
\frac{\ddot a}{a} &= -\frac{4\pi G}{3} (\rho + 3p) + \frac{\Lambda}{3}
\end{split}
\] Take the derivative on the first equation and substitute the second one into it and then we get, \[
\dot \rho = -3\frac{\dot a}{a} (\rho + p ) \Rightarrow \mathrm d\rho = -3\frac{\mathrm da}{a}(\rho + p).
\] The volume of the Universe expands as $V = V_0 a^3$ or $\frac{\mathrm dV}{V} = 3 \frac{\mathrm da}{a}$. Here $\rho$ is the energy density of the matters (baryon matter, radiation and dark matter etc). Then the internal energy is $U = \rho V$ or $\mathrm dU = \mathrm d\rho V + \rho \mathrm dV$. Plug in the expression for $\mathrm dV$ and $\mathrm d\rho$. We arrive, \[
\mathrm dU + p \mathrm dV = 0! \] This is precisely the first law of thermal dynamics (for matters) for an adiabatic expansion. Therefore, the loss of radiation energy is due to the work of the CMB ratiation done to the Universe, which makes the Universe expands.

Now, what about the dark energy? Where does the dark energy come from? Simply consider the internal energy of the dark energy: $\mathrm d U_\Lambda = \rho_\Lambda \mathrm d V$. This is because the energy density of the dark energy is constant. Add the work term, $\mathrm dU_\Lambda + p_\Lambda \mathrm dV= (\rho_\Lambda + p_\Lambda) \mathrm dV$. Now we need a relation that relates the energy density and the pressure, which is called the equation of state (EoS). To obtain the EoS for dark energy, we can look at the second Friedmann's equation. If the cosmological constant is treated as some energy, the right-hand side of the equation should be written as \[
\frac{\ddot a}{a} = -\frac{4\pi G}{3} (\rho + 3p) -\frac{4\pi G}{3} (\rho_\Lambda + 3p_\Lambda)
\] Namely, $4\pi G (\rho_\Lambda + 3p_\Lambda) = -\Lambda$. But $\rho_\Lambda = \frac{\Lambda}{8\pi G}$. Therefore, $p_\Lambda = - \rho_\Lambda$. This is the EoS for the dark energy, which says the dark energy has negative pressure! Turn back to the thermal dynamical equation, clearly $\mathrm dU_\Lambda + p_\Lambda \mathrm dV = 0$. Again, the dark energy also obeys the first law of thermal dynamics. It turns out the dark energy was created from the negative pressure and the work done by the Universe.



Sep 13, 2014

Universe in the Size-Mass Diagram

Introduction

The fundamental goal of science is to describe the material world. Our knowledge of the natural world can be categorized in different ways. The conventional wisdom is division of fields, such as physics, chemistry, biology etc. Fields of subject is not the only logical way to organize our knowledge of the Universe. Chronology, for example, presents the topic in a unified theme, time. The Universe was created from a quantum singularity called the Big Bang. As time went on, fundamental forces and matters were created and formed stars, planets, galaxies and clusters. Since the birth the Earth, the focus can be shifts to the geological history and then biological history of the Earth. At last, the latest chapter is the history of humanity.

Fig. 1, The Chronology of the Universe.

Complementing to time, the structure of matter is another unified theme to understand our Universe. In this description, physics, again, plays a particular role, as it offers insights to most natural entities including the smallest (subatomic particles) and the largest (cosmos itself). Traditionally, the study of physics can be divided into two paradigms: the first one concerns the fundamental laws of the Universe; the second one concerns the evolution and, particularly, the structure of physical systems governed by the law of physics. The fundamental rules of the nature, to our best knowledge today, are summarized to the standard model (SM) of particle physics and Einstein's general theory of relativity (GR). As it turns out, the SM recipe has to include what we call the elementary particles such as electron, quark and photons, whose structure are completely unknown. Then, in the study of matter structures, the fundamental laws of physics (including the elementary particles) are taken as the first principle.

Fig. 2, The Recipe of the Microscopic World - the Standard Model of Particle Physics. LEFT: the prescription, RIGHT: the ingredients.

Size-Mass Diagram

Our universe has been an extremely diverse "ecosystem". Matters come in vastly different scales in terms of either size or mass or any other characteristics. To provide a schematic view of the structures of matters, we need some universal physical characteristics. Mass is the first on the list. Einstein's special theory of relativity unified mass and energy (related by $E = mc^2$). Therefore, our "mass" would naturally includes "energy". For the second property, we chose the characteristic size of the system. Every matter has a finite spatial extension, except for elementary particles, which are the first principle by definition. Moreover, the size of a physical system is usually closely related to the structure. Knowing the mass and the size, the structure of the system is roughly known. But the size-mass diagram (see Fig. 3) can reveal more than the mass and size hierarchies themselves. Matters share the same structural mechanism tend to align in a line.

Fig. 3, Size-Mass Diagram.

Elements

The ancients thought the world was made of five elements. Not until the modern era, it was realized that there are hundreds of elements, making of most matters we are familiar with. Matters from elements lie between the cyan line (gas) and the brown line (condensed matter - such as liquids and solids) in the diagram. Hydrogen atom is the smallest and best known atom. Here is also where most of the science delve. Physics focus on systems in the scale of single atom and below. Molecules constitute the realm of chemistry. Carbon based large molecules (such as proteins) lay the foundation of biology (or so-called life science). Biologists concerns a vast domain from virus to human being. Human community is also the subject of research - the so-called social sciences - arguably the most renowned ones, which anthropology, sociology, economy, linguistics and political sciences. The largest coherent community is our world, which includes 70 billion homo sapiens populating the land surface of the Globe. Our engineering, extends more vast scales than ourselves. Transistors in newest commercial silicon chips are only 14 nm in size, representing the state-of-the-art micro-fabrication. One of the largest man-made projects, the USS Nimitz aircraft carrier is of 333 meters, 100 million kilograms. Anything greater than humanity was considered deity. Large amount of matter, mountains and lakes for example, are the subject of geology. Moving up the scale, large objects are usually in the sky. Moon is the satellite of our planet. Mars, Venus and others are peer planets. Sun is a star. Gravity becomes important in the structure.

Greatness and Beyond

Heaven has its own hierarchy. Moon circles around the Earth. Earth and other planets circle around the Sun. This is the area of astronomy and astrophysics. Not until Issac Newton, the hidden force behind stars and planets are known, the gravity. The gravity of Sun pulls Earth, Mars, Venus and other planets around it and makes a planetary system. Stars are also bound by gravity. Gazillions of stars form a galaxy. Galaxies make up groups, clusters and then superclusters. Superclusters connects into filaments and great walls. These structures extends vast distance in the vacuum. Even light, the record-holder of speed, takes millions of year to travel across them. The large scale structure ends at about 100 Mpc or $3\times 10^{24}$ meters. This is known as the end of greatness. Above 100 Mpc, the universe looks pretty smooth and isotropic. Then there rises the study of the Universe itself, cosmology.

But stars and galaxies may not the whole story of the Universe. Study of the galaxy motion and other evidence suggests the Universe has its dark side.

A New Front

The 20th century witness the discovery of a new front in the subatomic scale. In the microscopic realm, objects (microscopic objects are usually called particles) start getting peculiar as we move close to the line of Compton wavelength (green line). The Compton wavelength is define as $\lambda_c = \frac{h}{mc}$, where $m$ is the rest mass, $h$ and $c$ are two constants - Planck constant and the speed of light (Why don't we call it Einstein's constant or Maxwell's constant or Michelson's constant?). Compton wavelength is a universal limit on the measurement of the position of a particle. Due to quantum mechanics, the position of a particle around and below the Compton wavelength scale becomes fuzzy. This phenomenon is called "zitterbewegung", German for "trembling motion". Therefore, if the size of a particle is near or smaller than its Compton wavelength, it has to be described by quantum mechanics. If the (special) relativistic effect is also prominent, quantum field theory applies. Microscopic particles, from the hydrogen atom to the family of hadrons, all reside in this realm. Particularly, the elementary particles, who have been treated as point-like particles, also obey quantum mechanics and Einstein's special theory of relativity. In fact, the entire playbook of these particles, the standard model of particle physics is a manifestation of the quantum field theory.

To study the structure of a microscopic system, the probe has to be smaller than its Compton wavelength. The experimental probes used in these fields are high-energy beams (including light - from laser to X-Ray and $\gamma$-Ray etc al.). Physicists like labeling their probes by their energy $E$ instead of their wavelength $\lambda$. The particle of the beams are highly relativistic such that $\lambda$ and $E$ obey the de Broglie relation: $\lambda = \frac{hc}{E}$, which is similar to the Compton wavelength as the rest energy $E_0 = m c^2$. Therefore, to probe proton ($m_p \simeq 1 \mathrm{GeV}/c^2$), it is necessary to use  > GeV beams. The German probe HERA, for example, used 27.5GeV electron beams. These probes need so much energy that billion-dollar facilities are built to accelerate the probing particle beams - that's why they are called accelerators.

Fig. 4, Evolution of Large Colliders. from: Jordan Nash, "Current and Future Developments in Accelerator Facilities".
The standard model describes physics around 100 GeV or $10^{-17}$ meters. LHC, the largest collider ever built, with maximum energy 8 TeV (or 8000 GeV - to present day - it will be upgraded to 14 TeV), is dedicated to study the standard model and beyond. Physics beyond TeV scale is only subjected to speculation.

It is believed that the strong and electroweak (electromagnetic and weak force are unified at > 246 GeV scale, known as the electroweak scale) forces are unified at $10^{16}$ GeV, the GUT (Grand Unification Theory ) scale. As the Compton wavelength decreases, the mass of the particle increases. Both factors increase the local gravitational field $f \sim \frac{GM}{r^2}$. At the length scales we know today, the local gravitation field of a microscopic particle is negligible. However, at about ~$10^{19}$ GeV scale (or $10^{-35}$ meters, known as the Planck scale), the local gravity field of the particles would be strong enough to challenge our current understanding of the law of physics. In fact, the space and time below the Planck scale is so twisted that we are not sure if it is still meaningful to talk about size.

The Horizon

The gray line describes black holes. This line is defined by the event horizon (Schwartzschild radius for most objects) $r_s = \frac{2G M}{c^2}$. Inside the event horizon, the gravity is so strong that even light can not get out. Therefore, no information inside the event horizon can be revealed to the outside observers, as its name suggests. It is believed that inside the black hole nothing can stop matter from forever collapsing towards the center. Einstein's general theory of relativity forbids us even to across the line of black holes. Along this line, there lives many monsters of the Universe (or rather black hole candidates). Cygnus X-1 is one of the most famous black hole. Here X stands for X-Ray. Like other compact objects, stellar black holes are usually powerful X-Ray sources. This is because objects are heated up (due to the loss of gravitational energy) as they falling into compact stars. The list of black holes extends from the extremely compact stellar objects to the less compact yet super-massive black holes sitting in the center of galaxies. At one end of the line, there lies the observable universe! It suggests that we may be living inside a super-duper black hole. The line intersects with the Compton wavelength line at the other end. It has been suggested that general relativity has to be modified to accommodate the quantum effect. The hypothetical particle sitting at the intersection is called a Planck particle, which is also a microscopic black hole.

White dwarfs and neutron stars are other examples of the compact objects. In a white dwarf, gravity is balanced by the degenerate press from the electron. If the mass of a star exceeds 1.456 solar mass, known as the Chandrasekhar limit, the star would collapse into a neutron star. Neutron stars are made of nuclear matters. The density of neutron stars are similar to that of nuclei, the microscopic members of the nuclear matter family. In fact, neutron stars can be viewed as gigantic nuclei. A lot of their properties can be estimated (or rather extrapolated) from nuclei. On the other hand, neutron stars do show strong local gravity. This makes them look like black holes.



Aug 2, 2014

$\beta$ and $\gamma^0$

$\gamma^0$ with the other three gamma matices $\vec\gamma$, furnishes a representation of the Clifford algebra $\mathrm{C\ell}_{1,3}(\mathbb R)$ \[\{ \gamma^\mu, \gamma^\nu \} = 2 g^{\mu\nu}, \] where $\{X, Y\} = XY + YX$, $g^{\mu\nu}$ is the Minkowski space metric.

The parity matrix $\beta$ is a spinor representation of the parity operator: \[
P^{-1} \Psi(x) P = \beta \Psi(\mathcal P\cdot x)
\] where $\mathcal P^\mu_\nu = \mathrm{diag}\{1, -1, -1, -1\}$ is the 4-vector representation of the parity operator. The operator $P$ here, of course, furnishes a field representation, with the help of $\beta$ and $\mathcal P$. Then $\beta$ satisfies the parity relations: $\beta^2 = 1$, $\beta^{-1} \vec\gamma \beta = -\vec\gamma$, $\beta^{-1} \gamma^0 \beta = \gamma^0$.

It is popular to take $\beta = \gamma^0$. But this may not always be the correct one. The reason is the metric tensor g. There are two popular sign conventions of $g^{\mu\nu}$: $\mathrm{diag}\{+1, -1, -1, -1\}$ and $\mathrm{diag}\{-1, +1, +1, +1\}$. Under the first convention, $\gamma^0 \gamma^0 = 1$. $\beta$ is readily to be chosen as $\gamma^0$. It is easy to check all the parity relations are satisfied. Under the second conventions, however, $\gamma^0 \gamma^0 =-1$, which means $\beta$ should be $\pm i\gamma^0$.

Of course, all unitary representations in quantum theory is only determined up to an over all phase factor: \[
P^{-1} \Psi(x) P = \exp(i\eta) \beta \Psi(\mathcal P\cdot x), \quad (\eta \in \mathbb R)
\] It is possible to choose phase factors such that $\beta = \gamma^0$ always holds. For example, for $g_{00} = +1$, we choose $\exp(i\eta) = 1$; for $g_{00}=-1$, we choose $\exp(i\eta) = i$. This is essentially what is done in the literature, such as Peskin & Schroeder, Mark Srednicki etc. But keep in mind that, the phase factors for T, C, and P are not all independent. It is important to be self-consistent at the end of the day. Weinberg's book keeps the phase factors open.

Jul 25, 2014

Charge Symmetry

Classical Example


Under the charge conjugation (C for short), charge $q \to -q$, electric field $\mathbf E \to - \mathbf E$, but the solution of the dynamical equation, the position vector, remains invariant $\mathbf r(t) \to \mathbf r(t)$.

In the theories particle and antiparticle do not mix (e.g. Schroedinger's Equation, Pauli's Equation), charge symmetry is almost trivial.

Dirac Theory

Dirac equation, \[
(i\partial_\mu \gamma^\mu - eA_\mu\gamma^\mu + m)\psi(x) = 0
\] describes the relativistic motion of electrons as well as positrons. $e = 1.602176565(35)\times 10^{-19} \mathrm{C}$ is just a positive number, so-called elementary charge. In general, the solution $\psi(x)$ contains both the electron state and the positron state. Here we take $g_{\mu\nu} = \mathrm{diag}\{-,+,+,+\}$.

In free particle theory where $A = 0$, there are four independent plane wave solutions: two positive energy solutions describe electrons $u_s(p) e^{+ip\cdot x}, \; s=\pm\frac{1}{2}$ and two negative energy solutions describe the positrons $v_s(p) e^{-ip\cdot x},\; s=\pm\frac{1}{2}$, $E_{\mathbf p} = \sqrt{\mathbf p^2+m^2}$. A general solution (a wave packet) is a superposition of the two pieces: $\psi(x) \equiv \psi^+(x) + \psi^-(x)$ where \[
\psi^+(x) = \sum_{s=\pm}\int \frac{\mathrm{d}^3p}{(2\pi)^32E_{\mathbf p}} b_s(\mathbf p) u_s(p) e^{+ip\cdot x}, \\
\psi^-(x) = \sum_{s=\pm}\int \frac{\mathrm{d}^3p}{(2\pi)^32E_{\mathbf p}} d^*_s(\mathbf p) v_s(p) e^{-ip\cdot x}. \] $b, d^*$ are some c-number smooth functions.

As a relativistic wave function approach, the charge conjugation would be implemented as a "unitary" spinor matrix: $C \bar C \triangleq C (\beta C^\dagger \beta) \equiv 1$. This matrix should transform a plane wave electron to a plane wave positron or vice versa, i.e. $C u_s(p)e^{+ip\cdot x} \sim v_s(p) e^{-ip\cdot x}$. We can immediately see that this is not possible if the charge conjugation spares the exponential function. We conclude that in Dirac theory, the charge conjugation must be implemented as an anti-unitary spinor operator. Thus, we require \[
C \left( u_s(p) e^{+ip\cdot x}\right) =\eta_c v^*_s(p) e^{-ip\cdot x} \\
C \left( v_s(p) e^{-ip\cdot x} \right) = \xi_c u^*_s(p) e^{+ip\cdot x} \] where $|\eta_c| = |\xi_c| = 1$. For simplicity, we'll take these constant phases to unity. The charge conjugated field is denoted as, \[
C\psi(x) \equiv \psi^c(x) = \sum_{s=\pm}\int \frac{\mathrm{d}^3p}{(2\pi)^32E_{\mathbf p}} \Big[ b^*_s(\mathbf p) v^*_s(p) e^{-ip\cdot x} + d_s(\mathbf p) u^*_s(p) e^{+ip\cdot x} \Big]
\]
It is conventional to define the unitary part of $C$ by a new spinor matrix $\mathcal C$ (curly C),
\[ C \psi(x) \triangleq \mathcal C \beta \psi^*(x) \equiv \mathcal C \bar\psi^T(x) \] (Here adding a $\beta$ is just a convention.). It is easy to see, $\mathcal C$ is unitary $\mathcal{C} \bar{\mathcal{C}}= 1$. An example of the choice of $\mathcal C$ is (Srednicki p. 242, 2007), \[
\mathcal C =
\begin{pmatrix}
0 & -1 & 0 & 0 \\
+1 & 0 & 0 & 0 \\
0 & 0 & 0 & +1 \\
0 & 0 & -1 & 0
\end{pmatrix}
\] The theory must be invariant under the charge conjugation, or $\mathcal L \overset{C}{\to} \mathcal L$, which implies $\mathcal C \gamma_\mu = - \gamma_\mu^T \mathcal C$. Applying to the $u,v$ spinors, $\mathcal C \bar u^T_s(p) = v_s(p), \mathcal C \bar v^T_s(p) = u_s(p)$.

Heuristically, the transformation can be viewed as a two-step procedure:
  1. swap the particle species, by exchanging $u_s(p)$ and $v_s(p)$ (and of course also the sign of the charge);
  2. reverse the time, by conjugating the exponential factor. 

As stated before, the E&M field transforms under the charge conjugation as $A^\mu \overset{C}{\to} -A^\mu$. Then the charge conjugated of the Dirac wave equation is: \[
(i\partial_\mu \gamma^\mu {\color \red +} eA_\mu\gamma^\mu + m){\color \red {\psi^c(x)}} = 0
\] Note the sign change. It is easy to check that if $\psi(x)$ satisfies the original Dirac wave equation,
$\psi^c(x)$ satisfies this equation, which just confirms that the Dirac theory is invariant under the charge conjugation.

To compare with the classical charge conjugation, the dynamical equation (the equation of motion) is still invariant under the charge conjugation. But the solution of it would change under the charge conjugation, $\psi(x) \overset{C}{\to} \psi^c(x)$.

Quantum field theory

In quantum field theory, charge conjugation is implemented as a unitary operator, which we shall call $C$ (not be confused with the anti-unitary spinor operator introduced in the previous section). The definition is simple (We have also taken a particular choice of a possible phase factor. See Weinberg, 2005): \[
C^{-1} b_s(p) C = d_s(p), \quad C^{-1} d_s(p) C = b_s(p), \quad C^{-1} a_\lambda(p) C = - a_\lambda(p), \] where $b_s(p), d_s(p), a_\lambda(p)$ are the electron, positron and photon annihilation operators, respectively.

For the free Dirac field, \[

\psi(x) = \sum_{s=\pm}\int \frac{\mathrm{d}^3p}{(2\pi)^32E_{\mathbf p}} \Big[ b_s(\mathbf p) u_s(p) e^{+ip\cdot x} + d^\dagger_s(\mathbf p) v_s(p) e^{-ip\cdot x} \Big].
\] It can be shown, after some algebra (e.g., Srenicki p.225) that for the quantum fields, \[

C^{-1} \psi(x) C = \mathcal C \bar\psi^T(x) \equiv  \psi^c(x), \quad C^{-1} A^\mu(x) C = - A^\mu(x), \quad C^{-1}\varphi(x)C = \varphi^*(x). \] Then, it can be shown immediately that the Lagrangian is invariant under the charge conjugation.

Alot of notations are abused, although they are different quantities (quantum field theoretical vs. Dirac relativistic wavefunctional). Superficially, this resembles the Dirac equation result introduced in the previous section, although we should point out that here $\psi(x)$ is a field operator instead of a relativistic wave function. But this similarity means that we can do it one way or another. The same answer would be obtained. That's also the reason of the heavy abuse of notations.


To compare with the classical case and the Dirac case, the solution of the dynamical equation changes, $\psi(x) \overset{C}{\to} \psi^c(x)$. Notation-wise, this is similar to the Dirac case. See Table 1.
Table 1




Figure 1. $\langle p s \left| J^\mu(x) \right| p' s'\rangle \equiv e^{-i(p-p')\cdot x} \bar u_s(p)e \Gamma^\mu u_{s'}(p')$




A theory invariant under charge conjugation does not automatically implies a symmetric solution. The symmetry could be broken by either explicit theory (e.g. weak interaction, $\theta$-term etc.), or spontaneous symmetry breaking. In general, the charge distribution, the matrix element of the charge current operator between two physical states can be written as, \[
\langle p s \left| J^\mu(x) \right| p' s' \rangle \equiv e \bar u_s(p) \Gamma^\mu u_{s'}(p') \exp[-i(p-p')\cdot x].
\] Here $\Gamma^\mu = \Gamma^\mu(p-p')$ is some spinor operator. $\Gamma^\mu$ can be represented by the spinor basis, \[ \Gamma^\mu(q) = F_1(q^2) \gamma^\mu + F_2(q^2) \frac{i}{2m_e} \sigma^{\mu\nu} q_\nu + F_3(q^2) \frac{1}{2m_e}\gamma_5 \sigma^{\mu\nu} q_\nu,
\] here $q = p-p'$, $\sigma^{\mu\nu} = \frac{i}{2} [ \gamma^\mu, \gamma^\nu ]$. Under the charge conjugation,
$\gamma^\mu \overset{C}{\to} \mathcal C^{-1}(\gamma^\mu)^T \mathcal C = -\gamma^\mu$;
$\sigma^{\mu\nu} \overset{C}{\to} \mathcal C^{-1}(\sigma{^\mu\nu})^T \mathcal C = -\sigma^{\mu\nu}$;
$\gamma_5\sigma^{\mu\nu} \overset{C}{\to}\mathcal C^{-1}(\gamma_5\sigma{^\mu\nu})^T \mathcal C = -\gamma_5\sigma^{\mu\nu}$.
    The first term corresponds to charge with the charge normalization $F_1(0) = 1$. The second term represents the magnetic moment. $F_2(0) = \frac{\alpha}{2\pi} + \mathcal O(\alpha^2)$. The third term represents an electric dipole moment $d_e = \left|\frac{1}{2m_e}F_3(0)\right| < 10^{-27} e\cdot$ cm.


    Jul 9, 2014

    The Gravity Train Revisited

    Introduction 

    The gravity train is a conceptual of design of transportation making use of the gravity of the planet, first proposed by 17th century scientist Robert Hooke in a letter to Issac Newton. It features a tunnel through the Earth.

    Suppose a straight tunnel was drill through the Earth. A train operating solely on gravity starts from one end of the tunnel. Assuming there is no friction inside the tunnel, and the density of the Earth is uniform. The problem asks how long it takes for the train to arrive at the other end.

    In the classic solution, the spin (rotation) of the Earth is not considered. (Strictly speaking, the rotation effect can be factorized in by using a locally measured gravitational acceleration $g$). The answer then is $t = \pi \sqrt{\frac{R_E}{g}} \simeq 42 \mathrm{min.}$, regardless of the exact location of the tunnel, where $R_E \simeq 6371 \mathrm{km}$ is the radius of the Earth, $g \simeq 9.8 \mathrm{m/s^2}$ is the gravitational acceleration. The rotation of the planet introduces a second time scale: $T_s \simeq 24 \mathrm{hr} \simeq 17 T_G$, where  $T_G = 2\pi \sqrt{\frac{R_E}{g}}$. The separation of the scales validates the neglecting of the Earth spin in the classical solution. Nevertheless, in this essay we want to look at the correction effect introduced by the spin of the Earth.

    Fig. 1

    General Settings

    Let the center of the Earth be the origin of our coordinate system. The tunnel is characterized by the position of the center of the tunnel $\vec d$, and the orientation of the tunnel $\hat \tau$, $\vec d \cdot \hat \tau = 0$. Let $\vec{r} = \vec{d} + x \hat{\tau}$ be the position of the train, where $x$ is the distance of the train from the center of the tunnel, $\vec{v} = \dot x\hat\tau$ the velocity of the train. The spin of the Earth is described by an angular velocity vector $\vec\Omega$. Suppose $\Omega \cos \theta \equiv \vec \Omega \cdot \hat \tau$, $\Omega d \cos\phi = \vec\Omega \cdot \vec d$.

    In the reference frame of the Earth (a non-inertial frame), there are four forces acting on the train: the gravity $\vec F_G = -\frac{GM_Em}{R^3_E} \vec r \equiv -mg \frac{\vec r}{R_E}$, the centrifugal force $\vec F_e = - m \vec\Omega \times (\vec\Omega \times \vec r)$, the Coriolis force $\vec F_c = 2 m \vec v \times \vec \Omega$, and the contact force. Since there is no friction, the contact force simply counter all forces in the direction that perpendicular to the tunnel. We only need to consider the force along the tunnel. $\hat \tau \cdot \vec F_G = -m g \frac{x}{R_E}$, $\hat \tau \cdot \vec F_e = m \Omega^2 x - m \Omega^2 (d \cos\phi + x \cos \theta)$, $\hat \tau \cdot \vec F_c = 0$. The total force along the tunnel is $F = -mg\frac{x}{R_E} + m \Omega^2 ( (1-\cos\theta)x - d\cos\phi)$.

    The equation of motion thus reads, \[
    \ddot x = \left( -\frac{g}{R_E} + \Omega^2 (1-\cos\theta) \right) x - \Omega^2 d \cos\phi
    \] The solution is a harmonic oscillator with angular frequency $\omega = \frac{g}{R_E} - \Omega^2 (1-\cos\theta)$ or period $T = 2\pi \sqrt{\frac{R_E}{g - \Omega^2R_E(1-\cos\theta)}}$. The center of the harmonic oscillator is not in the middle of the tunnel. In stead, it is shifted away from there by $\Delta = \frac{\Omega^2dR_E\cos\phi}{g - \Omega^2 R_E(1-\cos\theta)}$.

    To calculate the travel time, we need to consider two cases. If the train starts from the high altitude side, it arrives at the low altitude terminal in time $t = \pi \sqrt{\frac{R_E}{g - \Omega^2R_E(1-\cos\theta)}}\left(1 - \frac{1}{\pi}\arccos\frac{\sqrt{R_E^2-d^2}-\Delta}{\sqrt{R_E^2-d^2}+\Delta} \right)$. If the train starts from low altitude terminal (farther from the axis of the Earth) with zero velocity, it would never reach the high altitude side. Some minimum velocity is needed in order to arrive the other side, $v_{\min} = 2 \Omega \sqrt{\left( g - \Omega^2R_E (1-\cos\theta) \right) d\cos\phi} \sqrt[4]{1-\frac{d^2}{R^2_E}}$. by doing so, it also takes time $t$ to arrive at the destination.

    Discussions

    The spin of the Earth affects the problem in two ways. The correction in the period of the oscillator is about $\sim \frac{\Omega^2 R_E}{g}\sin^2\frac{\theta}{2} = 0.345\% \sin^2\frac{\theta}{2}$. The second correction is through the displacement of the center of the harmonic oscillator. We compare $\Delta$ and $l = \sqrt{R_E^2 - d^2}$ half the length of the tunnel. $\Delta \simeq \frac{\Omega^2 R_E}{g} \cos\phi d = 0.345371\% \cos\phi \sqrt{R_E^2 - l^2}$. Compare $\Delta_{\max}$ (with $\phi = 0$) and $l$ (see Fig. 2). The correction from $\Delta$ is only small for tunnels either of thousands of kilometers long or well oriented (i.e. choosing a good $\phi$). Thousands of kilometers long tunnel would be for transcontinental express.

    Fig. 2, the numbers for the tunnel length $l$ and maximum shift of the harmonic oscillator center $\Delta_{\max}$ are in kilometers.

    We have only considered the straight tunnels. It turns out, straight tunnels are not the fastest ones for gravity train. The fastest tunnels are the brachistochrone tunnels. The correction due to the Earth rotation can be considered accordingly.

    Jun 23, 2014

    Zeros of the Laguerre Polynomial

    The zeros of the Laguerre polynomials $L_n(x)$ are useful in the Gauss-Laguerre quadrature: \[
    \int_0^\infty \exp(-x) \; f(x) = \sum_{k=1}^n w_k f(x_k) + \frac{n!^2}{(2n)!} f^{(2n)}(\xi)
    \] where $x_k$ is the $k$-th root of the Laguerre polynomial, $0 < x_1 < x_2 < \cdots < x_n$ and \[
    w_k = \frac{x_k}{(n+1)^2 \left[ L_{n+1}(x) \right]^2}.
    \] The figure below is a illustration.

    Fig. 1, A illustration of the Gauss-Laguerre quadrature. The widths of the boxes equal the quadrature weights. The area of the boxes approximates the area of the function.

    The general strategy to compute the zeros of a function numerically is to get a decent analytical estimation and then improve it with Newton iteration or similar methods.

    The Laguerre polynomial $L_n(x)$ has $n$ real non-degenerate zeros. These zeros lies within $\frac{1.20241}{2n+1} < x_1 < \cdots < x_n < 2n-2+\sqrt{1+4(n-1)^2 \cos^2\frac{\pi}{n+1}}$ according to Ismail and Li (1992) [1]. Individual zero can also be bounded $ \frac{j^2_{0,k}}{4n+2} < x_k < \frac{2k+1}{2n+1}(2k+1+\sqrt{(2k+1)^2+\frac{1}{4}})$, where $j_{n,m}$ is the $m$-th zero of the Bessel function $J_n(x)$ Abramowitz and Stegun 22.16 [2]. \[
    j_{n,m} = \theta_{n,m} - \frac{4n^2-1}{8\theta_{n,m}} - \frac{4(4n^2-1)(28n^2-31)}{3(8\theta_{n,m})^3} - \frac{32(4n^2-1)(1328n^4-3928n^2+3779)}{15(8\theta_{n,m})^5} - \mathcal O(\theta_{n,m}^7)
    \] where $\theta_{n,m} = (m+\frac{n}{2}-\frac{1}{4})\pi$. Finally, a fairly decent estimation [3], \[
    x_k = \frac{j_{0,k}^2}{4n+2}\left( 1 + \frac{j^2_{0,k}-2}{12(2n+1)^2} \right) + \mathcal O(1/n^5).
    \] Numerical recipe provides another (poor but somewhat useful) estimation [4] \[
    x_1 = \frac{3}{1+2.4 n}, x_2 = x_1 + \frac{15}{1+0.9 n}, x_k = x_{k-1} + \frac{1+2.55(k-2)}{1.9 (k-2)}.
    \]
    Fig. 2, Estimating the zeros of $L_{16}(x)$ using various results.

    Once we obtain the initial estimations, we can use adaptive method to update them. There are several methods available in the market [5]. Newton-Raphson iteration is the most popular one. Secant method, Halley's method and Schroeder's method are its close cousins. Given the lower and upper bounds, the binary search (or bisection method) is also useful. Brent's method is a combination of the secant method and binary search. Laguerre's method is also a valuable method designed particularly for polynomials. Jenkins-Traub algorithm is a more sophisticated method for polynomials.

    Table: root-finding algorithms used by some popular softwares. (Taken from Mekwi 2001 [5])

    In this particular problem, we want to obtain the zeros of the Laguerre polynomial. The Newton procedure is \[
    x \to x - \frac{x L_n(x)}{n(L_n(x) - L_{n-1}(x))}.
    \] Laguerre polynomial oscillates very rapidly at large $n$ and $x$.  This causes a severe overshoot problem in the iteration process at large $k$ where the estimation formula works poorly. The problem can be partly cured by including a damping function $\rho(x) = \exp{-x/2}$. Then the Newton iteration becomes \[
    x \to x - \frac{x L_n(x)}{n(L_n(x) - L_{n-1}(x)) - \frac{1}{2}x L_n(x)}.
    \] In general, one can tune the iteration procedure with relaxation parameters: \[
    x \to x - \omega \frac{x L_n(x)}{n(L_n(x) - L_{n-1}(x)) - \lambda x L_n(x)}.
    \] However, the problem persists at large $n$. This problem is not limited to the Newton method. The fundamental issue is that the estimation formula becomes less accurate at large $k$. It is more likely to start from the wrong "valley" or hit small $L'_n(x)$. Both cases result in converging to the wrong zero.

    Fig. 3a, Newton iteration.
    Fig. 3b,  Newton iteration improved by relaxation parameters.


    One simple solution of this problem is to lower the order of the polynomial as more and more zeros are found. Suppose we have already got the first $m, (m<n)$ zeros. Let $L_n(x) = (x-x_1)\cdots(x-x_m)p_m(x)$. Then the rest of the zeros of $L_n(x)$ contains in $p_m(x)$. $p_m$ is a lower order polynomial and behaves much smoother than $L_n$. We shall use $p_m$ to do the Newton iteration instead of $L_n$. The iteration procedure becomes \[
     x \to x - \frac{x L_n(x)}{n(L_n(x) - L_{n-1}(x)) -  L_n(x)\sum_{i=1}^{m}\frac{x}{x-x_i} }
    \] Or even, \[
     x \to x - \frac{x L_n(x)}{n(L_n(x) - L_{n-1}(x)) -  L_n(x)\sum_{i=1,i\ne m}^{n}\frac{x}{x-x_i} }
    \] This method can also be tuned with relaxation parameters: \[
     x \to x - \frac{x L_n(x)}{n(L_n(x) - L_{n-1}(x)) -  L_n(x)\left(\lambda x + \mu \sum_{i=1,i\ne m}^{n}\frac{x}{x-x_i}\right) }
    \]

    This method is in fact share the same spirit (and of course weakness) with the Bairstow's method, which works with generic polynomial thus divides the main polynomial by $x^2 + ax + b$. This method is essentially the Aberth-Ehrlich's method [6]. (Of course it has been known before!)

    Note that we always express everything (derivatives) in terms of the Laguerre polynomials, which are evaluated from the recurrence relations: \[
    L_0(x) = 1, \; L_1(x) = 1-x, \quad L_{n+1}(x) = \frac{2n+1-x}{n+1}L_n(x) - \frac{n}{n+1} L_{n-1}(x)
    \] To incorporate the damping factor $\exp(-x/2)$, we can add it to $L_0 \to \exp(-x/2)$ and $L_1 \to (1-x) \exp(-x/2)$. Alternatively, one can compute the ratio $L_n(x)/L'_n(x) = \frac{x}{n(1-r_n)}$, $r_n \equiv \frac{L_{n-1}(x)}{L_n(x)} $ directly: \[
    r_1 = \frac{1}{1-x}, \; r_{n}^{-1}(x) = \frac{2n-1-x}{n} - \frac{n-1}{n} r_{n-1}
    \] which can be written as a (finite) continued fraction: \[
    n r_n(x) = \frac{n^2}{2n-1-x-}\frac{(n-1)^2}{2n-3-x-}\cdots \frac{1}{1-x}, \; \frac{L_n(x)}{L'_n(x)} =\frac{x}{n-n r_n(x)}
    \]

    Below are some applications of this method comparing with the zeros obtained from Mathematica's NSolve function. This method shows more stability at large $n$ and $k$.







    References:

    [1] M.E.H. Ismail, X. Li, Bounds on the extreme zeros of orthogonal polynomials, Proc. Amer. Math. Soc., 115 (1992), pp. 131–140.

    [2]  M. Abramowitz and I. Stegun, eds. (1972), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications, 22.16.

    [3] M. Abramowitz and I. Stegun, eds. (1972), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications, 9.5.12.

    [4] W. H. Press, S. A. Teukolsky, W. T. Wetterling and B. P. Flannery, Numerical Recipes: The Art of Scientific Computing (2007), 3rd Edition, Cambridge University Press.

    [5] Wankere R. Mekwi, Iterative Methods for Roots of Polynomials, MSc thesis (2001), University of Oxford, Trinity.

    [6] O. Aberth,  Iteration methods for finding all zeros of a polynomial simultaneously, Math. Comp. 27, pp. 339-344 (1973); L. W. Ehrlich, A modified Newton method for polynomials, Comm. ACM 10, pp. 107-108, (1967).

    Jun 20, 2014

    Numerical Differentiation

    Let $f(x)$ be a smooth function. We want to take its derivative at some point $x$, numerically. In this post, we assume the function is sampled on an equal length mesh. $h$ is the step, $0 \le h \ll 1$. We consider the leading order and higher order methods. Here by order we mean the order of residual error, $\mathcal O(h^n)$.

    Leading Order Finite Differences

    There are three standard way: the forward difference, the backward difference and the central (or symmetric) difference. They are defined as following: \[
    f'(x) \simeq \frac{f(x+h)-f(x)}{h} + \mathcal O (h), \\
    f'(x) \simeq \frac{f(x)-f(x-h)}{h} + \mathcal O (h), \\
    f'(x) \simeq \frac{f(x+h)-f(x-h)}{2h} + \mathcal O (h^2)
    \] The numerical accuracy can be seen from the Taylor expansion: \[
    f(x+h) = f(x) + h f'(x) + \frac{h^2}{2}f''(x) + \frac{h^3}{3!}f'''(x) + ... \\
    f(x-h) = f(x) - h f'(x) + \frac{h^2}{2}f''(x) - \frac{h^3}{3!}f'''(x) + ....
    \] This method, used recursively, also produces higher order numerical differentiation. Central difference for example: \[
    f^{(n)}(x) \simeq \frac{f^{(n-1)}(x+h) - f^{(n-1)}(x-h)}{2h}
    \] Therefore [theorem 1], \[
    f^{(n)}(x) = \frac{1}{h^n}\sum_{k = 0}^n {n \choose k} (-1)^k f(x+(n/2-k)h) + \frac{n h^2}{4!}f^{(n+2)}(\xi);
    \] The accuracy can be proven by substituting in the Taylor expansion. Just note (Stirling number cf. The Stirling number S(n,k) is the number of ways to partition a set of n objects into k groups.) \[
    S(m,n) \equiv \frac{1}{n!}\sum_{k=0}^n {n \choose k} (-1)^{n-k} k^m = \left\{ \begin{array}{ll} 0, & m<n \\ 1, &m=n \\  n(n+1)/2, &m=n+1\\ S(n-1,m-1) + m S(n-1,m), & \\ \end{array}\right. \] When $n = 1 \bmod 2$, the meshes are on half integers. One can use $h \to 2h$ to recover the integral grid. The series sometimes is defined as an operator known as the central difference operator (Abramowitz and Stegun, $\S$25.1, "Differences"): \[
    \delta_n^{2k} f = \sum_{j=0}^{2k} (-1)^j { 2k \choose j} f_{n+k-j}, \\
    \delta_{n+\frac{1}{2}}^{2k+1} f = \sum_{j=0}^{2k+1} (-1)^j { 2k+1 \choose j} f_{n+k+1-j}.
    \] Then our theorem 1 can be written as: $ \frac{d^n}{d x^n} f(x) = \frac{\delta_0^{n}}{\Delta x^n}  f(x) + \mathcal O(\Delta x^2)$.

    High Order Finite Differences

    In the previous section, we have discussed approximate the differentiation numerically up to leading order $\mathcal O(h)$. The higher order numerical differentiations are rarely used but nevertheless helpful to know. Inspired by the central difference series, in general, we can approximate the $n$-th order derivative by series \[
    f^{(n)}(x) = \sum_k c_k^{(n,r)} f(x + k h) + \mathcal O(h^r)
    \] Obviously, the larger $n$ and/or $r$ is, the more terms we need. We can use the Lagrange polynomial to approximate the function $f(x)$ up to $f^{(n+r)}(\xi)$ and then take the $n$-th order derivative of the Lagrange polynomial. Then \[
    f^{(n)} (x) = \sum_k l_k^{(n)}(x) f(x+kh),
    \] where $l_i(x) = \prod_{j\ne i} \frac{x-x_j}{x_i-x_j}$.

    example: the finite-difference approximation of the first order derivative ($f_n = f(x+nh)/h$): \[
    f'(x) \simeq  \sum_{k=1}^{n} \frac{(-1)^{k-1} (n)_k}{(n+1)^{(k)}}\frac{f(x+kh) - f(x-kh)}{kh},
    \] where $(x)_k = x(x-1)\cdots(x-k+1)$ is the falling factorial and $x^{(k)} = x(x+1)\cdots(x+k-1)$ is the rising factorial.
    second order derivatives ($f_n = f(x+nh)/h^2$): \[
    f''(x) \simeq 2\sum_{k=1}^n \frac{(-1)^{k-1} (n)_k}{(n+1)^{(k)}} \frac{ f(x+kh)+f(x-kh) }{k^2h^2} -  \frac{2H_{n,2}}{ h^2} f(x),
    \] where $H_{n,2} = \sum_{k=1}^n \frac{1}{k^2}$ is called harmonic number.
    third order derivatives ($f_n = f(x+nh)/h^3$):

    In Mathematica, function InterpolationPolynomial[] provides the Lagrange polynomial. The coefficients can be computed as following:
    L[K_] := InterpolatingPolynomial[
      Table[{k, Subscript[f, k]}, {k, -K, K}], x]
    
    fd[K_, n_] := Collect[#, Variables[#]] & @(D[L[K], {x, n}] /. x -> 0)
    

    Apr 2, 2014

    The Cyclic Data Distribution

    The cyclic distribution of data is a good strategy to solve the load balance problem in parallel computing. Here we discuss the 1D cyclic distribution.

    cyclic distribution

    Let $v$ be a 1D vector, $\dim v = N$. $P$ is the number of processors and $r$ denotes the rank of a processor, $0 \le r < P$. Suppose the global index of the vector is $I, (I = 0, 1, \cdots N-1)$ and the local index is $i, (i = 0, 1, \cdots n_r-1)$. Then in the 1D cyclic data distribution, \[
        I = i \cdot P + r \;; \quad r = I \bmod P, \; i = \lfloor I/P \rfloor.  
    \] The size of the local array on processor $r$ is $n_r = \left\lfloor \frac{N-1-r}{P} \right\rfloor$ + 1. The cyclic distribution is a permutation. We rarely use the global id after permutation, it still may be useful to know: $ I' = (I\bmod P) * \lfloor N/P \rfloor + \lfloor I/P \rfloor + \min(I\bmod P, N\bmod P)$.

    example:

    $v = (0, 1,2,3,4,5,6,7,8)$, $N = 9, P = 2$. The global ids are in bold font.

    r \ i | 0  1  2  3  4    |  nr
     - - - - - - - - - - - - - - - -   
     0   | 0  2  4  6  8    |  5
     1   | 1  3  5  7        |  4

    Fig. 1-1, the original matrix
    Fig 1-2, the matrix after cyclic redistribution on each dimension

    block cyclic distribution

    The block cyclic distribution is another scheme that can reconcile the need of load balance and the need of block distribution of data. In addition to the quantities we have introduced before, we need to introduce the block number $b, (b\gt 0)$. In the 1D block cyclic distribution scheme, \[
    I = \left( \left\lfloor i / b \right\rfloor P + r \right) \cdot b + ( i \bmod  b) \;; \quad
    r = \left\lfloor (I \bmod bP ) / b \right\rfloor, \;
    i = \left\lfloor I / b P \right\rfloor \cdot b + (I \bmod b).  \] The size of the local array on processor $r$ is $n_r = \left\lfloor \frac{N-1-r\cdot b}{b P}\right\rfloor b + b + \delta \left(r, \left\lfloor (N-1) / b \right\rfloor \bmod P \right) \cdot \left( (N-1) \bmod b  + 1 - b \right) $.

    example:

    $v = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)$, $N = 10, P = 2, b = 3$. The global ids are in bold font.

    r \ i | 0  1  2   3  4  5 |  nr
     - - - - - - - - - - - - - - - -   
     0   | 0  1  2   6  7  8  |  6
     1   | 3  4  5   9          |  4

    Mar 1, 2014

    The Butterfly Theorem

    Fig. 1, C is the midpoint of a chord EF. PQ and UV are two chords passing through C. PV meets EF at X. UQ meets EF at Y. The butterfly theorem says, CX = CY.    
    The Butterfly Theorem is a classical result in Euclidean geometry. It gives a beautiful shape of butterfly (Fig. 2) along with two equal length segments.

    Fig. 2, the butterfly in the butterfly theorem
    Fig. 3, proof of the theorem. O is the center of the circle. M, N are midpoint of the chord PV and UQ respectively. 

    Poof:

    Let M, N be the midpoint of chord PV and UQ respectively. O is the center of the circle.

    Points P, V, Q, U are on the same circle. So $\angle VPQ = \angle VUQ$ and $\angle PVU = \angle PQU$. So  $\triangle CPV \simeq \triangle CUQ$. So $\frac{MV}{NQ} = \frac{PV/2}{UQ/2} = \frac{PV}{UQ} = \frac{VC}{QC}$.

    $\frac{MV}{NQ} = \frac{VC}{QC}$ plus $\angle PVU  = \angle PQU$ implies $\triangle MVC \simeq \triangle CQN$. Thus $\angle VMC = \angle CNQ$.

    OM is perpendicular to PV. ON is perpendicular to UQ. OC is perpendicular to EF (CX, CY). So O, M, X, C are on the same circle. O, C, Y, N are on the same circle. Thus, $\angle XOC = \angle XMC = \angle YNC = \angle YOC$. Note that OC is perpendicular to EF. Therefore, CX = CY.

    The proof of the theorem gives us another butterfly (Fig. 4), which also consists of a pair of similar triangles.

    Fig. 4.