
Oct 8, 2016

Numerical Solutions of the Non-Relativistic Yukawa Theory

The Yukawa potential (a.k.a. the screened Coulomb potential) $V(r) = -(\alpha/r) \exp(-\mu r)$ is a universal potential for interactions mediated by massive particles in the non-relativistic limit. It is also useful for describing interactions in a medium -- the "screened Coulomb interaction". It is therefore important to find the solutions of the non-relativistic Schroedinger equation with the Yukawa potential: \[
\Big[ -\frac{\nabla^2}{2m} - \frac{\alpha}{r} e^{-\mu r} \Big] \psi(\vec r) = E \, \psi(\vec r),
\] where $m$ is the reduced mass, $\alpha \equiv g^2/(4\pi)$ is the strength of the interaction with $g$ the "charge", and $\mu$ is the mass of the exchanged particle; $\mu=0$ gives the Coulomb potential. Alas, the above eigenvalue problem does not admit an analytic solution except in the Coulomb case. This post describes a numerical method for solving the Schroedinger equation with the Yukawa potential.

Fig. 1: Feynman diagram of a boson (wavy line) mediated interaction.

In the momentum space, the eigenvalue equation becomes an integral equation: \[
\frac{\vec p^2}{2m} \psi(\vec p) - \int \frac{\mathrm{d}^3 p'}{(2\pi)^3} \frac{4\pi \alpha}{(\vec p-\vec p')^2+\mu^2} \psi(\vec p') = E\,\psi(\vec p).
\] Here $\psi(\vec p)$ represents the momentum-space wave functions, related to the coordinate-space wave function $\psi(\vec r)$ by a Fourier transformation, \[
\psi(\vec p) = \int \mathrm d^3r \, e^{i \vec p\cdot \vec r} \psi(\vec r). \\
\psi(\vec r) = \int \frac{\mathrm d^3 p}{(2\pi)^3} \, e^{-i \vec p\cdot \vec r} \psi(\vec p).
\] We abuse the notation $\psi$ for both wave functions. The wave functions are normalized according to \[
\int \mathrm d^3 r\, \psi^*(\vec r) \psi(\vec r) = 1, \\
\int \frac{\mathrm d^3 p}{(2\pi)^3} \psi^*(\vec p) \psi(\vec p) = 1.
\]

Dimension Reduction

Before employing numerical methods, let us first put the equation into a dimensionless form. The natural scale in the problem is the Bohr radius ($\hbar = c = 1$): $a \equiv 1/(\alpha m)$. We also define $\lambda \equiv a\mu$ and $\kappa \equiv 2 m E a^2 = 2 E/ (m \alpha^2) \equiv E / E_B$, with $E_B \equiv \frac{1}{2} \alpha^2 m$ the ground-state binding energy of the Coulomb theory (the Hydrogen atom, positronium and so on), and rescale $\vec p \to a \vec p$, $\vec r \to \vec r/a$. Then the equations become \[
\Big[ - \nabla^2 - \frac{2}{r} e^{-\lambda r} \Big] \psi(\vec r) = \kappa \, \psi(\vec r), \\
\vec p^2 \psi(\vec p) - \int \frac{\mathrm d^3 p'}{(2\pi)^3} \frac{8\pi}{(\vec p - \vec p')^2 + \lambda^2} \psi(\vec p') = \kappa \, \psi(\vec p).
\] The theory has a rotational symmetry and the wave functions can be written as, \[
\psi(\vec r) = R(r) Y_{lm}(\hat r), \quad \psi(\vec p) = \phi(p) Y_{lm}(\hat p).
\] $Y_{lm}$ are the spherical harmonics. They are the eigenfunctions of the angular momentum squared operator $\vec L^2$. Therefore, we only need to solve for the radial part $R(r)$ or $\phi(p)$. They are related by a Hankel transformation: \[
R(r) = \frac{(-i)^\ell}{2\pi^2} \int_0^\infty \mathrm dp \, p^2 j_\ell(pr) \phi(p), \\
\phi(p) = 4\pi\, i^\ell \int_0^\infty \mathrm dr \, r^2 j_\ell(pr) R(r).
\] $j_\ell(z)$ is the spherical Bessel function of the first kind. $Y_{lm}$ are normalized: \[
\int \mathrm d^2\Omega(\hat r) Y^*_{l'm'}(\hat r) Y_{lm}(\hat r) = \delta_{ll'}\delta_{mm'}.
\] Similarly, the radial wave functions are normalized according to: \[
\int_0^\infty \mathrm d r \, r^2 R^2(r) = 1, \\
\frac{1}{8\pi^3} \int_0^\infty \mathrm d p\, p^2 \phi^2(p) = 1.
\]

Coordinate Space

The coordinate-space Schroedinger equation can be written in the spherical coordinates: \[
\Big[ - \frac{1}{r^2}\frac{\partial }{\partial r}\Big( r^2 \frac{\partial }{\partial r} \Big) + \frac{l(l+1)}{r^2} - \frac{2}{r} e^{-\lambda r} \Big] R(r) = \kappa \, R(r). \\
\Leftrightarrow \Big[- \frac{\partial^2 }{\partial r^2} + \frac{l(l+1)}{r^2} - \frac{2}{r} e^{-\lambda r} \Big] u(r) = \kappa \, u(r),
\] where $u(r) = r R(r)$.

The above differential equations can be discretized and solved numerically. In this post, however, we will focus on the momentum space representation and its numerical solutions.

Momentum Space

In the momentum space, Schroedinger equation can be written as \[
p^2 \phi(p) - \frac{4}{\pi} \int_0^\infty \mathrm dp' \, p'^2 K_\ell (p, p') \phi(p') = \kappa \, \phi(p),
\] where \[
K_\ell(p, p') \equiv \int_0^\infty \mathrm dr \, r \exp(-\lambda r) j_\ell(pr) j_\ell(p' r).
\]
Here are the analytical expressions for the first few $K_\ell$: \[
\begin{split}
(S\text{-wave})\quad K_0(p, p') =& \frac{1}{4 pp'}\log \Big[\frac{(p+p')^2+\lambda^2}{(p-p')^2+\lambda^2}\Big]; \\
(P\text{-wave})\quad K_1(p, p') =& \frac{p^2+p'^2+\lambda^2}{4 p^2p'^2}\log\Big[\frac{(p+p')^2+\lambda^2}{(p-p')^2+\lambda^2}\Big]-\frac{1}{2pp'}\\
(D\text{-wave})\quad K_2(p, p') =& \frac{3p^4+2p^2p'^2+3p'^4+6(p^2+p'^2)\lambda^2+3\lambda^4}{16p^3p'^3}\log\Big[\frac{(p+p')^2+\lambda^2}{(p-p')^2+\lambda^2}\Big]
- \frac{3(p^2+p'^2+\lambda^2)}{8p^2p'^2}\\
\vdots&
\end{split}
\] In the general case, $K_\ell$ can be evaluated numerically by quadrature.
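As a sanity check, here is a minimal Python/SciPy sketch (the language and the sample values of $p$, $p'$, $\lambda$ are only illustrative assumptions) that evaluates $K_\ell$ by direct numerical integration and compares it with the analytic $K_0$ above:

import numpy as np
from scipy.integrate import quad
from scipy.special import spherical_jn

def K_numeric(l, p, pp, lam):
    # K_l(p, p') = \int_0^inf dr r e^{-lam r} j_l(p r) j_l(p' r), integrated numerically
    integrand = lambda r: r * np.exp(-lam * r) * spherical_jn(l, p * r) * spherical_jn(l, pp * r)
    val, _ = quad(integrand, 0.0, np.inf, limit=400)
    return val

def K0_analytic(p, pp, lam):
    return np.log(((p + pp)**2 + lam**2) / ((p - pp)**2 + lam**2)) / (4.0 * p * pp)

p, pp, lam = 0.7, 1.3, 0.2
print(K_numeric(0, p, pp, lam), K0_analytic(p, pp, lam))   # the two numbers should agree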

The same quadrature method can be used to discretize the radial integral in the eigenvalue equation. For a general integral, \[
\int \mathrm dx \, f(x) = \sum_{i=1}^N w_i f(x_i) + R[f^{(2N)}(\xi)],
\] where $x_i$ and $w_i$ are pre-chosen abscissas and weights. For the radial integral, we can use the Gauss-Legendre quadrature with a mapping function: \[
\theta: (-1, +1) \to (0, \infty); \\
r_i = \theta(x^\text{gl}_i), \quad w_i = w^\text{gl}_i \, \theta'(x^\text{gl}_i),
\] where $x^\text{gl}_i$, and $w^\text{gl}_i$ are the standard Gauss-Legendre quadrature abscissas and weights.

While in principle the converged result is independent of the mapping function, in practice the choice of the mapping function is very important, as it controls the rate of convergence. It should be chosen to cover the radial extent of the wave function. Because different states have different radial extents, it is often difficult to come up with a universal mapping function that is optimal for all excited states. Fortunately, in most practical cases we are only interested in the lowest few states. Common choices include (with $x$ first mapped to $(0,1)$): $r = x/(1-x)$, $r=a \tan (x \pi/2)$, $r = a (1-e^{-bx})/(e-e^x)$ ...
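As an illustration, here is a minimal Python/NumPy sketch of a mapped Gauss-Legendre rule on $(0,\infty)$ with the mapping $r = x/(1-x)$; the test integrals are chosen only to check the rule:

import numpy as np

def mapped_gauss_legendre(N):
    # Gauss-Legendre nodes/weights on (-1, 1), rescaled to (0, 1), then mapped to (0, inf)
    x, w = np.polynomial.legendre.leggauss(N)
    x, w = 0.5 * (x + 1.0), 0.5 * w      # now on (0, 1)
    r  = x / (1.0 - x)                   # theta: (0, 1) -> (0, inf)
    wr = w / (1.0 - x)**2                # w_i * theta'(x_i)
    return r, wr

r, wr = mapped_gauss_legendre(64)
# quick checks: \int_0^inf e^{-r} dr = 1 and \int_0^inf r^2 e^{-r} dr = 2
print(np.sum(wr * np.exp(-r)), np.sum(wr * r**2 * np.exp(-r)))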

The integral equation becomes, \[
p^2_i \phi(p_i) - \frac{4}{\pi} \sum_j w_j p^2_j K_\ell (p_i, p_j) \phi(p_j) = \kappa \, \phi(p_i).
\] It is convenient to define $v_i \equiv \sqrt{w_i} p_i \phi(p_i)$, with normalization \[
\sum_i v_i^2 = 1.
\] Then, the discretized equation becomes, \[
\sum_j H_{ij} v_j = \kappa \, v_i,
\] where the Hamiltonian matrix $H_{ij}$ reads, \[
H_{ij} = \delta_{ij} p^2_i - (4/\pi) \sqrt{w_i w_j} \, p_i p_j \, K_\ell (p_i, p_j).
\] $H$ is manifestly real and symmetric. The energy eigenvalues and the wave functions can be obtained by numerically diagonalizing the Hamiltonian matrix.


Fig 2: Visualization of the Hamiltonian matrix, $\ell = 0$ (S-wave), quadrature order $N=512$.


The system can be solved at different quadrature orders. The continuum limit can be reached by extrapolating the quadrature order $N\to\infty$.
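Putting the pieces together, here is a minimal Python/NumPy sketch for the S-wave channel, using the analytic $K_0$ and a mapped Gauss-Legendre grid (the grid sizes and the value of $\lambda$ are illustrative; because of the near-singular kernel at small $\lambda$, convergence in $N$ is slow and extrapolation is needed for precise numbers):

import numpy as np

def momentum_grid(N):
    x, w = np.polynomial.legendre.leggauss(N)
    x, w = 0.5 * (x + 1.0), 0.5 * w          # map (-1, 1) -> (0, 1)
    p  = x / (1.0 - x)                        # map (0, 1) -> (0, inf)
    wp = w / (1.0 - x)**2
    return p, wp

def K0(p, pp, lam):
    return np.log(((p + pp)**2 + lam**2) / ((p - pp)**2 + lam**2)) / (4.0 * p * pp)

def spectrum(N, lam):
    # H_ij = delta_ij p_i^2 - (4/pi) sqrt(w_i w_j) p_i p_j K_0(p_i, p_j), real symmetric
    p, w = momentum_grid(N)
    P, PP = np.meshgrid(p, p, indexing="ij")
    H = np.diag(p**2) - (4.0 / np.pi) * np.sqrt(np.outer(w, w)) * P * PP * K0(P, PP, lam)
    return np.linalg.eigvalsh(H)

for N in (64, 128, 256):
    print(N, spectrum(N, lam=0.01)[:3])   # lowest few kappa's; compare with -1/n^2 at small lambda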

Energy Eigenvalues 

The obtained energy eigenvalues as a function of the exchanged particle mass $\mu$ are shown in Fig. 3. At small $\mu$, the binding energies approach the standard Coulomb values $E_n^{(\text{coul})}=-\alpha^2 m / (2n^2)$, where $n=1,2,3,4,\cdots$ is the principal quantum number. The ground state energy is also in good agreement with the upper bound obtained from the variational method, using the Coulomb wave function as the trial function. As $\mu$ increases, the bound states gradually disappear. Even the ground state dissociates once $\mu \gtrsim \alpha m$.


Fig. 3: The S-wave energy eigenvalues as a function of $\mu$. The values are obtained by extrapolating results over quadrature grids of order $N=16, 32, 64, 128, 512, 1024$. The red crosses represent the Coulomb energy levels $E_n^{(\text{coul})} = - \alpha^2 m/(2n^2)$, $n=1,2,3,\cdots$. The red dashed line represents the upper bound obtained from the variational method, using the Coulomb wave function as the trial function.

Wave Functions

Fig. 4: The obtained ground state S-wave momentum-space wave functions for selected $\mu$.

The momentum-space wave functions can be extracted from the obtained eigenvectors (see Figs. 4 and 5). At small $\mu$, the wave functions approach the Coulomb results. As $\mu$ increases, the momentum-space wave function becomes narrower. Furthermore, as $\mu$ increases, the changes in the excited-state wave functions are more dramatic than in the ground-state wave function.


Fig. 5: The obtained radially excited S-wave momentum-space wave functions for selected $\mu$.






Jun 23, 2014

Zeros of the Laguerre Polynomial

The zeros of the Laguerre polynomials $L_n(x)$ are useful in the Gauss-Laguerre quadrature: \[
\int_0^\infty \exp(-x) \, f(x) \, \mathrm dx = \sum_{k=1}^n w_k f(x_k) + \frac{(n!)^2}{(2n)!} f^{(2n)}(\xi)
\] where $x_k$ is the $k$-th root of the Laguerre polynomial, $0 < x_1 < x_2 < \cdots < x_n$ and \[
w_k = \frac{x_k}{(n+1)^2 \left[ L_{n+1}(x_k) \right]^2}.
\] The figure below is an illustration.

Fig. 1, An illustration of the Gauss-Laguerre quadrature. The widths of the boxes equal the quadrature weights. The total area of the boxes approximates the area under the function.
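For reference, here is a tiny Python/SciPy sketch of the rule in action (the monomial test integrands are chosen only because $\int_0^\infty e^{-x} x^k\,\mathrm dx = k!$ is known exactly):

import numpy as np
from math import factorial
from scipy.special import roots_laguerre

n = 8
x, w = roots_laguerre(n)             # zeros of L_n(x) and the Gauss-Laguerre weights
for k in (1, 5, 15):                 # exact (up to roundoff) for degree k <= 2n - 1 = 15
    print(k, np.dot(w, x**k), factorial(k))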

The general strategy to compute the zeros of a function numerically is to get a decent analytical estimation and then improve it with Newton iteration or similar methods.

The Laguerre polynomial $L_n(x)$ has $n$ real non-degenerate zeros. These zeros lie within $\frac{1.20241}{2n+1} < x_1 < \cdots < x_n < 2n-2+\sqrt{1+4(n-1)^2 \cos^2\frac{\pi}{n+1}}$ according to Ismail and Li (1992) [1]. Individual zeros can also be bounded: $ \frac{j^2_{0,k}}{4n+2} < x_k < \frac{2k+1}{2n+1}(2k+1+\sqrt{(2k+1)^2+\frac{1}{4}})$, where $j_{n,m}$ is the $m$-th zero of the Bessel function $J_n(x)$ (Abramowitz and Stegun 22.16 [2]). \[
j_{n,m} = \theta_{n,m} - \frac{4n^2-1}{8\theta_{n,m}} - \frac{4(4n^2-1)(28n^2-31)}{3(8\theta_{n,m})^3} - \frac{32(4n^2-1)(1328n^4-3928n^2+3779)}{15(8\theta_{n,m})^5} + \mathcal O(\theta_{n,m}^{-7}),
\] where $\theta_{n,m} = (m+\frac{n}{2}-\frac{1}{4})\pi$. Finally, a fairly decent estimate [3] is \[
x_k = \frac{j_{0,k}^2}{4n+2}\left( 1 + \frac{j^2_{0,k}-2}{12(2n+1)^2} \right) + \mathcal O(1/n^5).
\] Numerical Recipes provides another (poor but somewhat useful) estimate [4]: \[
x_1 = \frac{3}{1+2.4 n}, \quad x_2 = x_1 + \frac{15}{1+2.5 n}, \quad x_k = x_{k-1} + \frac{1+2.55(k-2)}{1.9 (k-2)}\,(x_{k-1}-x_{k-2}).
\]
Fig. 2, Estimating the zeros of $L_{16}(x)$ using various results.

Once we obtain the initial estimates, we can refine them iteratively. There are several methods available [5]. Newton-Raphson iteration is the most popular one. The secant method, Halley's method and Schroeder's method are its close cousins. Given lower and upper bounds, binary search (the bisection method) is also useful. Brent's method is a combination of the secant method and binary search. Laguerre's method is a valuable method designed particularly for polynomials. The Jenkins-Traub algorithm is a more sophisticated method for polynomials.

Table: root-finding algorithms used by some popular software packages. (Taken from Mekwi 2001 [5])

In this particular problem, we want to obtain the zeros of the Laguerre polynomial. The Newton procedure is \[
x \to x - \frac{x L_n(x)}{n(L_n(x) - L_{n-1}(x))}.
\] The Laguerre polynomial oscillates very rapidly at large $n$ and $x$. This causes a severe overshoot problem in the iteration at large $k$, where the estimation formulas work poorly. The problem can be partly cured by including a damping function $\rho(x) = \exp(-x/2)$, i.e. iterating on $\rho(x)\,L_n(x)$. Then the Newton iteration becomes \[
x \to x - \frac{x L_n(x)}{n(L_n(x) - L_{n-1}(x)) - \frac{1}{2}x L_n(x)}.
\] In general, one can tune the iteration procedure with relaxation parameters: \[
x \to x - \omega \frac{x L_n(x)}{n(L_n(x) - L_{n-1}(x)) - \lambda x L_n(x)}.
\] However, the problem persists at large $n$, and it is not limited to the Newton method. The fundamental issue is that the estimation formulas become less accurate at large $k$: the iteration is more likely to start in the wrong "valley" or to hit a small $L'_n(x)$, and both cases result in convergence to the wrong zero.

Fig. 3a, Newton iteration.
Fig. 3b,  Newton iteration improved by relaxation parameters.


One simple solution to this problem is to lower the order of the polynomial as more and more zeros are found (deflation). Suppose we have already found the first $m$ ($m<n$) zeros. Let $L_n(x) = (x-x_1)\cdots(x-x_m)\,p_m(x)$. The remaining zeros of $L_n(x)$ are contained in $p_m(x)$, which is a lower-order polynomial and behaves much more smoothly than $L_n$. We therefore use $p_m$ in the Newton iteration instead of $L_n$. The iteration procedure becomes \[
 x \to x - \frac{x L_n(x)}{n(L_n(x) - L_{n-1}(x)) -  L_n(x)\sum_{i=1}^{m}\frac{x}{x-x_i} }
\] Or even, \[
 x \to x - \frac{x L_n(x)}{n(L_n(x) - L_{n-1}(x)) -  L_n(x)\sum_{i=1,i\ne m}^{n}\frac{x}{x-x_i} }
\] This method can also be tuned with relaxation parameters: \[
 x \to x - \frac{x L_n(x)}{n(L_n(x) - L_{n-1}(x)) -  L_n(x)\left(\lambda x + \mu \sum_{i=1,i\ne m}^{n}\frac{x}{x-x_i}\right) }
\]

This method in fact shares the same spirit (and, of course, weaknesses) as Bairstow's method, which works with generic polynomials and therefore divides the main polynomial by quadratic factors $x^2 + ax + b$. The simultaneous version above is essentially the Aberth-Ehrlich method [6]. (Of course, it has been known before!)

Note that we always express everything (derivatives) in terms of the Laguerre polynomials, which are evaluated from the recurrence relations: \[
L_0(x) = 1, \; L_1(x) = 1-x, \quad L_{n+1}(x) = \frac{2n+1-x}{n+1}L_n(x) - \frac{n}{n+1} L_{n-1}(x)
\] To incorporate the damping factor $\exp(-x/2)$, we can absorb it into the starting values, $L_0 \to \exp(-x/2)$ and $L_1 \to (1-x) \exp(-x/2)$. Alternatively, one can compute the ratio $L_n(x)/L'_n(x) = \frac{x}{n(1-r_n)}$, $r_n \equiv \frac{L_{n-1}(x)}{L_n(x)}$, directly: \[
r_1 = \frac{1}{1-x}, \; r_{n}^{-1}(x) = \frac{2n-1-x}{n} - \frac{n-1}{n} r_{n-1}
\] which can be written as a (finite) continued fraction: \[
n r_n(x) = \frac{n^2}{2n-1-x-}\frac{(n-1)^2}{2n-3-x-}\cdots \frac{1}{1-x}, \; \frac{L_n(x)}{L'_n(x)} =\frac{x}{n-n r_n(x)}
\]
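Here is a minimal Python/SciPy sketch of the whole procedure — the estimate from [3] (with the Bessel zeros $j_{0,k}$ taken from scipy.special.jn_zeros) as the starting point, the recurrence above for $L_n$ and $L_{n-1}$, and the deflated Newton step; the choice $n=16$ and the tolerance are illustrative:

import numpy as np
from scipy.special import jn_zeros

def laguerre_pair(n, x):
    # return (L_n(x), L_{n-1}(x)) from the three-term recurrence
    Lm, L = 1.0, 1.0 - x
    for m in range(1, n):
        Lm, L = L, ((2*m + 1 - x) * L - m * Lm) / (m + 1)
    return L, Lm

def laguerre_zeros(n, iters=50):
    j0 = jn_zeros(0, n)
    x0 = j0**2 / (4*n + 2) * (1 + (j0**2 - 2) / (12 * (2*n + 1)**2))   # initial estimates
    roots = []
    for k in range(n):
        xk = x0[k]
        for _ in range(iters):
            Ln, Lnm1 = laguerre_pair(n, xk)
            # deflated Newton step: divide out the roots already found
            denom = n * (Ln - Lnm1) - Ln * sum(xk / (xk - r) for r in roots)
            step = xk * Ln / denom
            xk -= step
            if abs(step) < 1e-14 * max(1.0, abs(xk)):
                break
        roots.append(xk)
    return np.array(roots)

print(laguerre_zeros(16))   # compare, e.g., with Mathematica's NSolve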

Below are some applications of this method, compared with the zeros obtained from Mathematica's NSolve function. This method shows better stability at large $n$ and $k$.







References:

[1] M.E.H. Ismail, X. Li, Bounds on the extreme zeros of orthogonal polynomials, Proc. Amer. Math. Soc., 115 (1992), pp. 131–140.

[2]  M. Abramowitz and I. Stegun, eds. (1972), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications, 22.16.

[3] M. Abramowitz and I. Stegun, eds. (1972), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications, 9.5.12.

[4] W. H. Press, S. A. Teukolsky, W. T. Wetterling and B. P. Flannery, Numerical Recipes: The Art of Scientific Computing (2007), 3rd Edition, Cambridge University Press.

[5] Wankere R. Mekwi, Iterative Methods for Roots of Polynomials, MSc thesis (2001), University of Oxford, Trinity.

[6] O. Aberth,  Iteration methods for finding all zeros of a polynomial simultaneously, Math. Comp. 27, pp. 339-344 (1973); L. W. Ehrlich, A modified Newton method for polynomials, Comm. ACM 10, pp. 107-108, (1967).

Jun 20, 2014

Numerical Differentiation

Let $f(x)$ be a smooth function. We want to compute its derivative at some point $x$ numerically. In this post we assume the function is sampled on an equally spaced mesh with step $h$, $0 < h \ll 1$. We consider leading-order and higher-order methods; here by order we mean the order of the residual error, $\mathcal O(h^n)$.

Leading Order Finite Differences

There are three standard ways: the forward difference, the backward difference, and the central (or symmetric) difference. They are defined as follows: \[
f'(x) = \frac{f(x+h)-f(x)}{h} + \mathcal O (h), \\
f'(x) = \frac{f(x)-f(x-h)}{h} + \mathcal O (h), \\
f'(x) = \frac{f(x+h)-f(x-h)}{2h} + \mathcal O (h^2).
\] The numerical accuracy can be seen from the Taylor expansion: \[
f(x+h) = f(x) + h f'(x) + \frac{h^2}{2}f''(x) + \frac{h^3}{3!}f'''(x) + \cdots, \\
f(x-h) = f(x) - h f'(x) + \frac{h^2}{2}f''(x) - \frac{h^3}{3!}f'''(x) + \cdots.
\] This method, applied recursively, also produces numerical approximations of higher derivatives. For example, with the central difference: \[
f^{(n)}(x) \simeq \frac{f^{(n-1)}(x+h) - f^{(n-1)}(x-h)}{2h}
\] Therefore [theorem 1], \[
f^{(n)}(x) = \frac{1}{h^n}\sum_{k = 0}^n {n \choose k} (-1)^k f(x+(n/2-k)h) + \frac{n h^2}{4!}f^{(n+2)}(\xi);
\] The accuracy can be proven by substituting in the Taylor expansion. The key identity involves the Stirling numbers of the second kind ($S(m,n)$ counts the ways to partition a set of $m$ objects into $n$ nonempty groups): \[
S(m,n) \equiv \frac{1}{n!}\sum_{k=0}^n {n \choose k} (-1)^{n-k} k^m = \left\{ \begin{array}{ll} 0, & m<n \\ 1, & m=n \\ n(n+1)/2, & m=n+1 \end{array}\right. \qquad S(m,n) = S(m-1,n-1) + n\, S(m-1,n).
\] When $n$ is odd, the mesh points lie on half integers; one can use $h \to 2h$ to recover the integer grid. The series is sometimes written in terms of the central difference operator (Abramowitz and Stegun, $\S$25.1, "Differences"): \[
\delta_n^{2k} f = \sum_{j=0}^{2k} (-1)^j { 2k \choose j} f_{n+k-j}, \\
\delta_{n+\frac{1}{2}}^{2k+1} f = \sum_{j=0}^{2k+1} (-1)^j { 2k+1 \choose j} f_{n+k+1-j}.
\] Then our theorem 1 can be written as: $ \frac{d^n}{d x^n} f(x) = \frac{\delta_0^{n}}{\Delta x^n}  f(x) + \mathcal O(\Delta x^2)$.
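A quick numerical check of theorem 1 (a Python sketch; the test function $\sin x$ and the steps are arbitrary choices):

import numpy as np
from math import comb

def central_derivative(f, x, n, h):
    # f^(n)(x) ~ h^{-n} sum_k C(n, k) (-1)^k f(x + (n/2 - k) h), error O(h^2)
    ks = np.arange(n + 1)
    coeff = np.array([comb(n, k) * (-1)**k for k in ks])
    return np.dot(coeff, f(x + (n / 2 - ks) * h)) / h**n

x = 0.3
for n, exact in [(1, np.cos(x)), (2, -np.sin(x)), (3, -np.cos(x))]:
    for h in (0.1, 0.05):
        print(n, h, abs(central_derivative(np.sin, x, n, h) - exact))   # error drops ~4x as h halves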

High Order Finite Differences

In the previous section, we discussed approximating derivatives numerically at leading order. Higher-order formulas are used less often but are nevertheless helpful to know. Inspired by the central-difference series, in general we can approximate the $n$-th order derivative by a series \[
f^{(n)}(x) = \sum_k c_k^{(n,r)} f(x + k h) + \mathcal O(h^r)
\] Obviously, the larger $n$ and/or $r$ is, the more terms we need. We can use the Lagrange polynomial on the mesh points to approximate the function $f(x)$ (with a remainder involving $f^{(n+r)}(\xi)$) and then take the $n$-th order derivative of the Lagrange polynomial. Then \[
f^{(n)} (x) = \sum_k l_k^{(n)}(x) f(x+kh),
\] where $l_i(x) = \prod_{j\ne i} \frac{x-x_j}{x_i-x_j}$ and $x_j = x + jh$.

Example: the finite-difference approximation of the first-order derivative: \[
f'(x) \simeq  \sum_{k=1}^{n} \frac{(-1)^{k-1} (n)_k}{(n+1)^{(k)}}\frac{f(x+kh) - f(x-kh)}{kh},
\] where $(x)_k = x(x-1)\cdots(x-k+1)$ is the falling factorial and $x^{(k)} = x(x+1)\cdots(x+k-1)$ is the rising factorial.
Second-order derivative: \[
f''(x) \simeq 2\sum_{k=1}^n \frac{(-1)^{k-1} (n)_k}{(n+1)^{(k)}} \frac{ f(x+kh)+f(x-kh) }{k^2h^2} -  \frac{2H_{n,2}}{ h^2} f(x),
\] where $H_{n,2} = \sum_{k=1}^n \frac{1}{k^2}$ is the generalized harmonic number.
Third-order and higher derivatives can be obtained in the same way from the Lagrange polynomial.
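As a quick check of the two series above, here is a small Python sketch (the test function $e^x$ and the step are arbitrary choices):

import numpy as np

def falling(x, k):   # (x)_k = x (x-1) ... (x-k+1)
    out = 1.0
    for j in range(k):
        out *= (x - j)
    return out

def rising(x, k):    # x^{(k)} = x (x+1) ... (x+k-1)
    out = 1.0
    for j in range(k):
        out *= (x + j)
    return out

def d1(f, x, h, n):
    return sum((-1)**(k-1) * falling(n, k) / rising(n + 1, k)
               * (f(x + k*h) - f(x - k*h)) / (k*h) for k in range(1, n + 1))

def d2(f, x, h, n):
    Hn2 = sum(1.0 / k**2 for k in range(1, n + 1))
    return 2 * sum((-1)**(k-1) * falling(n, k) / rising(n + 1, k)
                   * (f(x + k*h) + f(x - k*h)) / (k*h)**2 for k in range(1, n + 1)) \
           - 2 * Hn2 / h**2 * f(x)

f, x, h = np.exp, 0.4, 0.1
print(d1(f, x, h, 3) - np.exp(x), d2(f, x, h, 3) - np.exp(x))   # both errors are tiny for n = 3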

In Mathematica, the function InterpolatingPolynomial[] provides the Lagrange polynomial. The coefficients can be computed as follows:
L[K_] := InterpolatingPolynomial[
  Table[{k, Subscript[f, k]}, {k, -K, K}], x]

fd[K_, n_] := Collect[#, Variables[#]] & @(D[L[K], {x, n}] /. x -> 0)

Nov 1, 2013

Nonlinear Least Squares Regression

The nonlinear least squares fit

The nonlinear least squares fit (regression) minimizes the objective function \[ S(\theta) = \sum_i w_i (y_i - f(x_i; \theta))^2
\] with non-negative weights $w_i \ge 0$, for a given set of data $\{ (x_1, y_1), (x_2,y_2),\cdots \}$ and a model function $f(x; \theta) \equiv f(x;\theta_1,\theta_2,\cdots, \theta_p)$.

The minimum is among the stationary points: \[
0 = \frac{\partial S}{\partial \theta_r} = -2 \sum_i w_i (y_i - f(x_i;\theta))\frac{\partial f(x_i;\theta)}{\partial \theta_r}, \quad r = 1,2,\cdots, p. \]

Gauss-Newton algorithm

Newton's method for finding a local root of a function $\varphi(x)$ is \[
x_{n+1} = x_n - \frac{\varphi(x_n)}{\varphi'(x_n)}.
 \] In this problem, $\varphi(\theta) = \nabla S(\theta)$, i.e. we look for a root of the gradient. Newton's method then reads \[
\theta^{(k+1)} = \theta^{(k)} - (H S(\theta^{(k)}))^{-1} \nabla S(\theta^{(k)})
\] where $[H S(\theta)]_{rs} = \frac{\partial^2 S}{\partial \theta_r \partial \theta_s}$ is the Hessian matrix of $S$. It is easy to show \[
\left[ H S \right]_{rs} = 2 \sum_i w_i \frac{\partial f_i}{\partial\theta_r}\frac{\partial f_i}{\partial \theta_s} - 2 \sum_i w_i (y_i - f(x_i)) \frac{\partial^2 f_i}{\partial\theta_r\partial\theta_s}
\] Ignoring the second term (the residuals and the second derivatives should be small near the minimum), the Gauss-Newton algorithm gives the iteration procedure \[
\theta^{(k+1)} = \theta^{(k)} + \left(\mathbf{J}^T \mathbf{W}\, \mathbf{J}\right)^{-1} \mathbf{J}^T \mathbf{W} \left(\mathbf{y}-f(\mathbf{x};\theta^{(k)}) \right),
\] where the Jacobian $\mathbf{J}$ and the weight matrix $\mathbf{W}$ are defined below.

Levenberg-Marquardt algorithm

The Levenberg-Marquardt algorithm is one of the most important algorithms built on the Gauss-Newton algorithm. Let $\theta_r$, $r = 1,2,\cdots, p$, be the fitting parameters (such as $a,b,c$ in the example below). Define the Jacobian \[
J_{i,r} = \frac{\partial f(x_i;\theta)}{\partial \theta_r}
\] Define the weight matrix $ W_{ij} = w_i \delta_{ij}$. The iterative procedure is \[
\mathbf{\theta}^{(k+1)} = \mathbf{\theta}^{(k)} + \alpha \left(\lambda \mathbf{\Lambda} + \mathbf{J}^T \mathbf{W}\, \mathbf{J}\right)^{-1} \mathbf{J}^T \mathbf{W} \left(\mathbf{y} - f(\mathbf{x};\mathbf{\theta}^{(k)})\right),
\] where $\mathbf{\Lambda}$ is a diagonal matrix, chosen either as the identity matrix or as the diagonal part of $\mathbf{J}^T \mathbf{W} \mathbf{J}$, and $0<\alpha<1$ is a small positive relaxation parameter.

At convergence, the solution $\bar\theta$ satisfies $\mathbf{J}^T_{\bar\theta} \mathbf{W} (\mathbf{y} - f(\mathbf{x};\bar{\mathbf{\theta}})) = 0$, or \[
\sum_i w_i  \frac{\partial f(x_i; \bar\theta)}{\partial \theta_r} (y_i - f(x_i; \bar\theta)) = 0
\] The choice of $\lambda$ is very important. A nonzero $\lambda$ can help avoid local minima, but a large $\lambda$ is likely to slow down the convergence. A useful strategy is to start with a nonzero $\lambda$ and decrease it as the iteration converges.
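Here is a minimal Python/NumPy sketch of the iteration above, with a forward-difference Jacobian; the model, the synthetic data, the starting point and the $\lambda$ schedule are all illustrative assumptions:

import numpy as np

def jacobian(f, x, theta, eps=1e-7):
    # forward-difference Jacobian J[i, r] = d f(x_i; theta) / d theta_r
    J = np.empty((len(x), len(theta)))
    f0 = f(x, theta)
    for r in range(len(theta)):
        d = np.zeros_like(theta); d[r] = eps
        J[:, r] = (f(x, theta + d) - f0) / eps
    return J

def levenberg_marquardt(f, x, y, w, theta, lam=1e-2, alpha=1.0, iters=100):
    W = np.diag(w)
    for _ in range(iters):
        J = jacobian(f, x, theta)
        r = y - f(x, theta)
        A = J.T @ W @ J
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ W @ r)
        theta = theta + alpha * step
        lam *= 0.7               # decrease lambda as the iteration converges
    return theta

model = lambda x, t: t[0] + t[1] / x**t[2]          # f(x) = a + b / x^c
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 40)
y = model(x, np.array([1.0, 2.0, 1.5])) + 0.01 * rng.standard_normal(x.size)
print(levenberg_marquardt(model, x, y, np.ones_like(x), np.array([0.5, 1.0, 1.0])))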

Parameter Error Estimation

It is important to attach estimated error bars/bands to the fitted function (or the fitted parameters). Let's write the data as $ y_i = f(x_i) + \xi_i$, where $\xi_i$ are the residuals. If we assume $\xi_i$, $i=1,2,\cdots, n$, are $n$ independent normally distributed random variables, the likelihood of the fitted parameters can be written as \[
L(\theta_r, \sigma)
= N_\sigma \exp\left[ -\frac{1}{2\sigma^2}\sum_i w_i (y_i-f(x_i;\theta_r))^2\right] \\
\simeq N'_\sigma \exp\left[-\frac{1}{2\sigma^2} \sum_{rs} (\theta_r-\bar\theta_r) A_{rs} (\theta_s-\bar\theta_s) \right]
\] where $A_{rs} = \sum_i w_i  \frac{\partial f(x_i;\bar\theta)}{\partial\theta_r} \frac{\partial f(x_i;\bar\theta)}{\partial\theta_s} = \left[ \mathbf{J}^T \cdot \mathbf{W} \cdot \mathbf{J} \right]_{rs}
$  and $N_\sigma, N'_\sigma$ are normalization factors.

The variance $\sigma^2$ can be estimated by the estimator \[
\hat{\sigma}^2 = \frac{1}{n-p}\sum_i w_i (y_i - f(x_i; \bar\theta))^2 = \frac{S({\bar\theta})}{n-p}.
 \] The covariance matrix of the parameters is, \[ \text{Cov}(\theta) = \hat\sigma^2 A^{-1}
\] It is also useful to find the prediction band. The squared width of the band is the variance of the function $f(x; \theta)$. The "delta method" gives an approximate way to calculate $\text{Var}\left[ f(x; \theta) \right]$ from $\text{Cov}[ \theta ]$: \[
\sigma_f^2(x) = \text{Var}\left[ f(x; \bar\theta) \right] = \frac{\partial f^T(x;\bar\theta)}{\partial\mathbf{\theta}}\cdot\text{Cov}( \mathbf\theta) \cdot \frac{\partial f(x;\bar\theta)}{\partial\mathbf{\theta}} \\
 = \hat\sigma^2 \sum_{r,s} \frac{\partial f(x;\bar\theta)}{\partial\theta_r}\left[ A^{-1} \right]_{rs}\frac{\partial f(x;\bar\theta)}{\partial\theta_s}
\]
Finally, the distribution of the residuals can be tested by the standard $\chi^2$ test.

Example: a power-law fitting

In our example, we will explore fitting some data to the power-law function $f(x) = a + b/x^c $.
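Below is a small Python/SciPy sketch of such a fit on synthetic data (the true parameters, the noise level, and the starting values are made up for illustration); scipy.optimize.curve_fit returns the fitted parameters together with their covariance matrix, and the delta method of the previous section gives the width of the error band:

import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    return a + b / x**c

rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 50)
y = f(x, 1.0, 2.0, 1.5) + 0.02 * rng.standard_normal(x.size)

popt, pcov = curve_fit(f, x, y, p0=(0.5, 1.0, 1.0))    # pcov plays the role of sigma^2 A^{-1}

def band(xs, eps=1e-6):
    # delta method: sigma_f^2(x) = grad_theta f . Cov(theta) . grad_theta f
    g = np.array([(f(xs, *(popt + eps * np.eye(3)[r])) - f(xs, *popt)) / eps for r in range(3)])
    return np.sqrt(np.einsum('ri,rs,si->i', g, pcov, g))

xs = np.linspace(1.0, 10.0, 200)
print(popt)            # fitted a, b, c
print(band(xs)[:5])    # one-sigma width of the fitted curve at the first few points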


Fig. 1: (Color lines) Black, the theoretical curve; Red, fit with uniform weights; Green, fit with weights proportional to $x$.



The corresponding covariance matrices for the parameters $a,b,c$.


The diagonal terms give the variances of the parameters.


Oct 16, 2013

Zeros of the Legendre Polynomials


Legendre functions (Legendre polynomials) are the solutions of the following linear Ordinary Differential Equation (ODE), \[
\frac{\mathrm{d}}{\mathrm{d} x}\left[ (1-x^2) \frac{\mathrm{d}}{\mathrm{d}x} P_n(x) \right] + n(n+1)P_n(x) = 0
\]

The Legendre function $P_n(x)$ is a degree-$n$ polynomial. It has $n$ distinct roots within $(-1,1)$: $x_{n,k}$, $k=1,2,\cdots, n$, with $x_{n,1} < x_{n,2} < \cdots < x_{n,n}$. There is no closed-form expression for the roots of high-order Legendre polynomials, so it is useful to know analytical bounds on the roots.

According to Bruns, Markoff, Stieltjes and Szego [1], the roots satisfy the following inequality, \[
\cos\frac{(n-k+1)\pi}{n+1} < x_{n,k} < \cos\frac{(n-k+\frac{3}{4})\pi}{n+\frac{1}{2}}
 \] for $k = 1, \cdots \left[\frac{n}{2}\right]$. For the other half, recall $x_{n,n-k+1} = -x_{n,k}$.

It may also be useful to give a pair of global bounds, \[
\cos\frac{(n-k+\frac{1}{2})\pi}{n+\frac{1}{2}} < x_{n,k} < \cos\frac{(n-k+1)\pi}{n+\frac{1}{2}}
\] for $k=1,2,\cdots n$, although this is a coarse one.

The roots of the Legendre polynomials also admit an asymptotic expansion due to Francesco Tricomi [2]. Let $\theta_{n,k} = \frac{n-k+3/4}{n+1/2}\pi$. Then the $k$-th root (in ascending order) is \[
x_{n,k} = \left\{1 - \frac{1}{8n^2} + \frac{1}{8n^3} - \frac{1}{384n^4}\left( 39 - \frac{28}{\sin^2\theta_{n,k}} \right) + \mathcal{O}(n^{-5})\right\} \cos\theta_{n,k}, \] which can be improved by Newton's iteration $x_{\nu+1} = x_{\nu} - \frac{1-x_\nu^2}{n} \frac{P_n(x_\nu)}{P_{n-1}(x_\nu) -x_\nu P_n(x_\nu)}$.
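A minimal Python/NumPy sketch of this recipe (the choice $n=16$ is illustrative; the nodes from numpy's leggauss are used only as a cross-check):

import numpy as np

def legendre_pair(n, x):
    # return (P_n(x), P_{n-1}(x)) from the three-term recurrence
    Pm, P = np.ones_like(x), x
    for m in range(1, n):
        Pm, P = P, ((2*m + 1) * x * P - m * Pm) / (m + 1)
    return P, Pm

def legendre_zeros(n, iters=10):
    k = np.arange(1, n + 1)
    theta = (n - k + 0.75) / (n + 0.5) * np.pi
    x = (1 - 1/(8*n**2) + 1/(8*n**3)
         - (39 - 28 / np.sin(theta)**2) / (384 * n**4)) * np.cos(theta)   # Tricomi estimate
    for _ in range(iters):
        P, Pm = legendre_pair(n, x)
        x = x - (1 - x**2) / n * P / (Pm - x * P)                          # Newton step
    return x

n = 16
print(np.max(np.abs(legendre_zeros(n) - np.polynomial.legendre.leggauss(n)[0])))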

Another approximation due to Gatteschi and Pittaluga [3] is \[
x_{n,k} = \cos\left\{ \theta_{n,k} + \frac{1}{16(n+1/2)^2}(\cot\frac{1}{2}\theta_{n,k} - \tan\frac{1}{2}\theta_{n,k}) \right\} + \mathcal{O}(n^{-4}).
\]

Let $x_{n,k}$ be the $k$-th root of $P_n(x)$ and $\xi_{n,k}$ be its approximation.

Conversely, the relation $x_{n,k} \simeq \cos\theta_{n,k}$ is a pretty good estimator of the index $k$ of a Gaussian node from its value $x_{n,k}$.


Last but not least, the zeros are the nodes of the Gaussian quadrature. \[
 \int_{-1}^{+1} \mathrm{d}x f(x) \simeq \sum_{k=1}^n w_{n,k} f(x_{n,k})
\] where $w_{n,k} = \frac{2}{(1-x_{n,k}^2) \left[ P'_n(x_{n,k}) \right]^2}$ are called the Gaussian weights. The Gaussian quadrature can be illustrated by the figure below ($n = 16$).
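Given the nodes, the weights follow directly; here is a short Python sketch, using the standard identity $(1-x^2)P_n'(x) = n\,(P_{n-1}(x) - x P_n(x))$ and an arbitrary test integral:

import numpy as np

def legendre_pair(n, x):
    Pm, P = np.ones_like(x), x
    for m in range(1, n):
        Pm, P = P, ((2*m + 1) * x * P - m * Pm) / (m + 1)
    return P, Pm

n = 16
x = np.polynomial.legendre.leggauss(n)[0]       # nodes (zeros of P_n)
P, Pm = legendre_pair(n, x)
dP = n * (Pm - x * P) / (1 - x**2)              # P_n'(x)
w = 2.0 / ((1 - x**2) * dP**2)                  # Gaussian weights
print(np.dot(w, np.cos(x)), 2 * np.sin(1.0))    # \int_{-1}^{1} cos(x) dx = 2 sin(1)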

References:

[1] Gabriel Szego, Inequalities for the Zeros of Legendre Polynomials and Related Functions, Transactions of the American Mathematical Society, Vol. 39, No. 1 (1936)

[2] F. G. Tricomi, Sugli zeri dei polinomi sferici ed ultrasferici, Ann. Mat. Pura Appl., 31 (1950), pp. 93–97; F.G. Lether and P.R. Wenston, Minimax approximations to the zeros of $P_n(x)$ and Gauss-Legendre quadrature, Journal of Computational and Applied Mathematics Volume 59, Issue 2, 19 May 1995, Pages 245–252

[3] L. Gatteschi and G. Pittaluga, An asymptotic expansion for the zeros of Jacobi polynomials, in Mathematical Analysis. Teubner-Texte Math., 79 (1985), pp. 70–86

Source codes:

John Burkardt (GNU LGPL): http://people.sc.fsu.edu/~jburkardt/cpp_src/quadrule/quadrule.html

pomax.github.io: http://pomax.github.io/bezierinfo/legendre-gauss.html

rosettacode: http://rosettacode.org/wiki/Numerical_integration/Gauss-Legendre_Quadrature

Mathematica:
Needs["NumericalDifferentialEquationAnalysis`"]

GaussianQuadratureWeights[n, a, b, prec]