
Nov 1, 2013

Nonlinear Least Squares Regression

The nonlinear least squares fit

The nonlinear least squares fit (regression) minimizes the target action $$S(\theta) = \sum_i w_i \left(y_i - f(x_i;\theta)\right)^2$$ with positive weights $w_i > 0$, for a given set of data $\{(x_1,y_1),(x_2,y_2),\dots\}$ and a form function $f(x;\theta) \equiv f(x;\theta_1,\theta_2,\dots,\theta_p)$.

The minimum lies at an extremum: $$0 = \frac{\partial S}{\partial \theta_r} = -2\sum_i w_i \left(y_i - f(x_i;\theta)\right)\frac{\partial f(x_i;\theta)}{\partial \theta_r}, \qquad r = 1,2,\dots,p.$$
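As a concrete reference for these formulas, here is a minimal NumPy sketch of the action $S(\theta)$ and its gradient. The callables `f` (the model) and `jac` (its parameter Jacobian) are illustrative names of my own, not from the original post.

```python
import numpy as np

def action(theta, x, y, w, f):
    """S(theta) = sum_i w_i * (y_i - f(x_i; theta))**2."""
    r = y - f(x, theta)                  # residuals y_i - f(x_i; theta)
    return np.sum(w * r**2)

def action_grad(theta, x, y, w, f, jac):
    """dS/dtheta_r = -2 sum_i w_i (y_i - f(x_i; theta)) df/dtheta_r."""
    r = y - f(x, theta)
    J = jac(x, theta)                    # J[i, r] = df(x_i; theta)/dtheta_r
    return -2.0 * J.T @ (w * r)
```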

Gauss-Newton algorithm

Newton's method for finding a local root of a function $\varphi(x)$ is $$x_{n+1} = x_n - \frac{\varphi(x_n)}{\varphi'(x_n)}.$$ In this problem $\varphi(\theta) = \nabla S(\theta)$, so Newton's method can be realized as $$\theta^{(k+1)} = \theta^{(k)} - \left(H_S(\theta^{(k)})\right)^{-1}\nabla S(\theta^{(k)}),$$ where $[H_S(\theta)]_{rs} = \frac{\partial^2 S}{\partial\theta_r\,\partial\theta_s}$ is the Hessian matrix of $S$ (equivalently, the Jacobian of $\nabla S$). It is easy to show that $$[H_S]_{rs} = 2\sum_i w_i \frac{\partial f_i}{\partial\theta_r}\frac{\partial f_i}{\partial\theta_s} - 2\sum_i w_i \left(y_i - f(x_i)\right)\frac{\partial^2 f_i}{\partial\theta_r\,\partial\theta_s}.$$ We ignore the second term, since the residuals and the second derivatives should be small near the minimum. Define the Jacobian $J_{i,r} = \partial f(x_i;\theta)/\partial\theta_r$ and the weight matrix $W_{ij} = w_i\delta_{ij}$. The Gauss-Newton algorithm then gives the iteration procedure $$\theta^{(k+1)} = \theta^{(k)} + \left(J^T W J\right)^{-1} J^T W \left(y - f(x;\theta^{(k)})\right).$$
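A minimal sketch of this iteration, under the same conventions as above (`f` is the model, `jac` returns $J_{i,r}$); the iteration cap and convergence tolerance are arbitrary choices, not from the post.

```python
import numpy as np

def gauss_newton(f, jac, x, y, w, theta0, n_iter=100, tol=1e-10):
    """Iterate theta <- theta + (J^T W J)^{-1} J^T W (y - f(x; theta))."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        r = y - f(x, theta)             # residuals
        J = jac(x, theta)               # J[i, r] = df(x_i; theta)/dtheta_r
        JTW = J.T * w                   # J^T W, with W = diag(w_i)
        step = np.linalg.solve(JTW @ J, JTW @ r)
        theta = theta + step
        if np.max(np.abs(step)) < tol:  # stop once the update is negligible
            break
    return theta
```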

Levenberg-Marquardt algorithm

The Levenberg-Marquardt algorithm is one of the most important algorithms based on the Gauss-Newton algorithm. The iterative procedure is $$\theta^{(k+1)} = \theta^{(k)} + \alpha\left(\lambda\Lambda + J^T W J\right)^{-1} J^T W \left(y - f(x;\theta^{(k)})\right),$$ where $\Lambda$ is a diagonal matrix, chosen as either the identity matrix or the diagonal part of $J^T W J$, and $0 < \alpha < 1$ is a small positive number.

Therefore, the solution $\bar\theta$ satisfies the equation $$J^T_{\bar\theta} W \left(y - f(x;\bar\theta)\right) = 0, \qquad \text{i.e.} \qquad \sum_i w_i \frac{\partial f(x_i;\bar\theta)}{\partial\theta_r}\left(y_i - f(x_i;\bar\theta)\right) = 0.$$ The choice of $\lambda$ is very important: a nonzero $\lambda$ can help avoid local minima, but a large $\lambda$ is likely to slow down the convergence. A useful strategy is to choose a nonzero $\lambda$ to start with and decrease $\lambda$ as the iteration converges.
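The update and the $\lambda$ strategy might be sketched as follows. The accept/reject schedule (halve $\lambda$ on an improving step, double it otherwise) is one common way to realize "decrease $\lambda$ as the iteration converges"; the factors of 2 and the defaults are my own choices.

```python
import numpy as np

def levenberg_marquardt(f, jac, x, y, w, theta0,
                        lam=1e-2, alpha=1.0, n_iter=200, tol=1e-10):
    """theta <- theta + alpha (lam*Lambda + J^T W J)^{-1} J^T W (y - f)."""
    theta = np.asarray(theta0, dtype=float)
    S_old = np.sum(w * (y - f(x, theta))**2)
    for _ in range(n_iter):
        r = y - f(x, theta)
        J = jac(x, theta)
        JTW = J.T * w                   # J^T W
        A = JTW @ J
        Lam = np.diag(np.diag(A))       # Marquardt's choice: diag(J^T W J)
        step = alpha * np.linalg.solve(lam * Lam + A, JTW @ r)
        S_new = np.sum(w * (y - f(x, theta + step))**2)
        if S_new < S_old:               # accept the step, relax the damping
            theta, S_old = theta + step, S_new
            lam *= 0.5
        else:                           # reject the step, increase the damping
            lam *= 2.0
        if np.max(np.abs(step)) < tol:
            break
    return theta
```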

Parameter Error Estimation

It is important to attach estimated error bars/bands to the fitted function (or the fitted parameters). Let's write the data as $y_i = f(x_i) + \xi_i$, where $\xi_i$ are the residuals. If we assume $\xi_i,\ i=1,2,\dots,n$ are $n$ independent, normally distributed random variables, the likelihood of the fitted parameters can be written as $$L(\theta,\sigma) = N_\sigma \exp\left[-\frac{1}{2\sigma^2}\sum_i w_i\left(y_i - f(x_i;\theta)\right)^2\right] \approx N'_\sigma \exp\left[-\frac{1}{2\sigma^2}\sum_{rs}\left(\theta_r - \bar\theta_r\right) A_{rs}\left(\theta_s - \bar\theta_s\right)\right],$$ where $$A_{rs} = \sum_i w_i \frac{\partial f(x_i;\bar\theta)}{\partial\theta_r}\frac{\partial f(x_i;\bar\theta)}{\partial\theta_s} = \left[J^T W J\right]_{rs}$$ and $N_\sigma$, $N'_\sigma$ are normalization factors.

The variance $\sigma^2$ can be estimated by the estimator $$\hat\sigma^2 = \frac{1}{n-p}\sum_i w_i\left(y_i - f(x_i;\bar\theta)\right)^2 = \frac{S(\bar\theta)}{n-p}.$$ The covariance matrix of the parameters is $$\mathrm{Cov}(\theta) = \hat\sigma^2 A^{-1}.$$ It is also useful to find the prediction band. The width of the band is the standard deviation $\sigma_f$ of the function $f(x;\theta)$. The "delta method" gives an approximate way to calculate $\mathrm{Var}[f(x;\theta)]$ from $\mathrm{Cov}[\theta]$: $$\sigma_f^2(x) = \mathrm{Var}\left[f(x;\bar\theta)\right] = \left(\frac{\partial f(x;\bar\theta)}{\partial\theta}\right)^{T}\mathrm{Cov}(\theta)\,\frac{\partial f(x;\bar\theta)}{\partial\theta} = \hat\sigma^2 \sum_{r,s}\frac{\partial f(x;\bar\theta)}{\partial\theta_r}\left[A^{-1}\right]_{rs}\frac{\partial f(x;\bar\theta)}{\partial\theta_s}.$$
Finally, the distribution of the residuals can be tested by the standard $\chi^2$-test.
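A sketch of this error machinery, assuming the same conventions as above: the estimator $\hat\sigma^2$, the parameter covariance $\hat\sigma^2 A^{-1}$, and the delta-method band $\sigma_f(x)$ evaluated on a user-supplied plotting grid. The function name and signature are illustrative.

```python
import numpy as np

def fit_errors(f, jac, x, y, w, theta_hat, x_grid):
    """Return (sigma2_hat, Cov(theta), sigma_f on x_grid)."""
    n, p = len(x), len(theta_hat)
    sigma2 = np.sum(w * (y - f(x, theta_hat))**2) / (n - p)   # S(theta)/(n-p)
    J = jac(x, theta_hat)
    cov = sigma2 * np.linalg.inv((J.T * w) @ J)               # sigma^2 A^{-1}
    G = jac(x_grid, theta_hat)          # df/dtheta_r on the plotting grid
    sigma_f = np.sqrt(np.einsum('ir,rs,is->i', G, cov, G))    # delta method
    return sigma2, cov, sigma_f
```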

Example: a power-law fit

In our example, we will explore fitting some data to the power-law function $f(x) = a + b/x^c$.
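The original data set is not recoverable here, so the sketch below generates its own synthetic data (true parameters and noise level chosen arbitrarily) and performs the two weighted fits with `scipy.optimize.curve_fit`, whose default solver is a Levenberg-Marquardt routine; weights $w_i$ enter through `sigma` as $\sigma_i = 1/\sqrt{w_i}$.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    """Power-law model f(x) = a + b / x**c."""
    return a + b / x**c

rng = np.random.default_rng(42)
x = np.linspace(1.0, 10.0, 50)
y = f(x, 1.0, 2.0, 1.5) + 0.05 * rng.standard_normal(x.size)  # synthetic data

# Fit with uniform weights (w_i = 1).
p_uniform, cov_uniform = curve_fit(f, x, y, p0=(1.0, 1.0, 1.0))

# Fit with weights proportional to x (w_i = x_i, i.e. sigma_i = 1/sqrt(x_i)).
p_wx, cov_wx = curve_fit(f, x, y, p0=(1.0, 1.0, 1.0), sigma=1.0/np.sqrt(x))

print("uniform weights:", p_uniform)
print("weights ~ x:   ", p_wx)
```

The returned covariance matrices correspond to $\mathrm{Cov}(\theta) = \hat\sigma^2 A^{-1}$ above, since `curve_fit` rescales by the reduced residual variance by default.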


Fig. 1: (Color lines) Black, the theoretical curve; Red, fit with uniform weights; Green, fit with weights proportional to $x$.

The corresponding covariance matrices for the parameters $a, b, c$: the diagonal terms give the variances of the parameters.
