The nonlinear least squares fit
The nonlinear least squares fit (regression) minimizes the target action
$$S(\theta)=\sum_i w_i\,\bigl(y_i-f(x_i;\theta)\bigr)^2$$
with nonnegative weights $w_i\ge 0$, for a given set of data $\{(x_1,y_1),(x_2,y_2),\cdots\}$ and a model function $f(x;\theta)\equiv f(x;\theta_1,\theta_2,\cdots,\theta_p)$.
The minimum lies at an extremum:
$$0=\frac{\partial S}{\partial\theta_r}=-2\sum_i w_i\,\bigl(y_i-f(x_i;\theta)\bigr)\,\frac{\partial f(x_i;\theta)}{\partial\theta_r},\qquad r=1,2,\cdots,p.$$
Gauss-Newton algorithm
Newton's method for finding a local root of a function $\varphi(x)$ is
$$x_{n+1}=x_n-\frac{\varphi(x_n)}{\varphi'(x_n)}.$$
In this problem $\varphi(\theta)=S'(\theta)$, i.e. we look for a zero of the gradient. Newton's method can therefore be written as
$$\theta^{(k+1)}=\theta^{(k)}-\bigl(H_S(\theta^{(k)})\bigr)^{-1}\nabla S(\theta^{(k)}),$$
where $[H_S(\theta)]_{rs}=\frac{\partial^2 S}{\partial\theta_r\,\partial\theta_s}$ is the Hessian matrix of $S$. It is easy to show that
$$[H_S]_{rs}=2\sum_i w_i\frac{\partial f_i}{\partial\theta_r}\frac{\partial f_i}{\partial\theta_s}-2\sum_i w_i\bigl(y_i-f(x_i)\bigr)\frac{\partial^2 f_i}{\partial\theta_r\,\partial\theta_s}.$$
The second term can be ignored, since near the solution both the residuals and the second derivatives should be small. The Gauss-Newton algorithm then gives the iteration procedure
$$\theta^{(k+1)}=\theta^{(k)}+\bigl(J^T\cdot W\cdot J\bigr)^{-1}J^T\cdot W\cdot\bigl(y-f(x;\theta^{(k)})\bigr),$$
with the Jacobian $J$ and the weight matrix $W$ defined in the next section.
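A minimal sketch of this iteration in Python with NumPy is shown below; the model function `f`, its Jacobian `jac`, the data arrays, the weights and the convergence tolerance are placeholders chosen for the illustration, not part of the original post.

```python
import numpy as np

def gauss_newton(f, jac, x, y, w, theta0, n_iter=50, tol=1e-10):
    """Gauss-Newton: theta <- theta + (J^T W J)^{-1} J^T W (y - f(x; theta))."""
    theta = np.asarray(theta0, dtype=float)
    W = np.diag(w)                        # W_ij = w_i delta_ij
    for _ in range(n_iter):
        r = y - f(x, theta)               # residuals y_i - f(x_i; theta)
        J = jac(x, theta)                 # J_{i,r} = d f(x_i; theta) / d theta_r
        step = np.linalg.solve(J.T @ W @ J, J.T @ W @ r)
        theta = theta + step
        if np.max(np.abs(step)) < tol:    # stop when the update becomes negligible
            break
    return theta
```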
Levenberg-Marquardt algorithm
The Levenberg-Marquardt algorithm is one of the most important algorithms built on the Gauss-Newton algorithm. Let $\theta_r,\ r=1,2,\cdots,p$ be the fitting parameters (such as $a,b,c$). Define the Jacobian
$$J_{i,r}=\frac{\partial f(x_i;\theta)}{\partial\theta_r}$$
and the weight matrix $W_{ij}=w_i\delta_{ij}$. The iterative procedure is
$$\theta^{(k+1)}=\theta^{(k)}+\alpha\,\bigl(\lambda\Lambda+J^T\cdot W\cdot J\bigr)^{-1}J^T\cdot W\cdot\bigl(y-f(x;\theta^{(k)})\bigr),$$
where $\Lambda$ is a diagonal matrix, chosen either as the identity matrix or as the diagonal part of $J^T\cdot W\cdot J$, and $0<\alpha<1$ is a small positive step factor. At convergence the solution $\bar\theta$ satisfies
$$J^T_{\bar\theta}\cdot W\cdot\bigl(y-f(x;\bar\theta)\bigr)=0,\qquad\text{i.e.}\qquad \sum_i w_i\frac{\partial f(x_i;\bar\theta)}{\partial\theta_r}\bigl(y_i-f(x_i;\bar\theta)\bigr)=0.$$
The choice of $\lambda$ is very important: a nonzero $\lambda$ can help avoid local minima, but a large $\lambda$ is likely to slow down the convergence. A useful strategy is to choose a nonzero $\lambda$ to start with and decrease $\lambda$ as the iteration converges.
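The same sketch with the damping term added gives a minimal Levenberg-Marquardt variant; the initial $\lambda$, the step factor $\alpha$ and the simple halving schedule for $\lambda$ are assumptions made for the illustration, not prescriptions from the text.

```python
import numpy as np

def levenberg_marquardt(f, jac, x, y, w, theta0,
                        lam=1e-2, alpha=0.5, n_iter=200, tol=1e-10):
    """Damped update: theta <- theta + alpha (lam*Lambda + J^T W J)^{-1} J^T W (y - f)."""
    theta = np.asarray(theta0, dtype=float)
    W = np.diag(w)
    for _ in range(n_iter):
        r = y - f(x, theta)
        J = jac(x, theta)
        A = J.T @ W @ J
        Lam = np.diag(np.diag(A))         # Lambda: diagonal part of J^T W J
        step = alpha * np.linalg.solve(lam * Lam + A, J.T @ W @ r)
        theta = theta + step
        lam *= 0.5                        # decrease lambda as the iteration converges
        if np.max(np.abs(step)) < tol:
            break
    return theta
```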
Parameter Error Estimation
It is important to attach estimated error bars/bands to the fitted function (or to the fitted parameters). Write the data as $y_i=f(x_i)+\xi_i$, where the $\xi_i$ are the residuals, and assume that $\xi_i,\ i=1,2,\cdots,n$ are $n$ independent, normally distributed random variables. The likelihood of the fitted parameters can then be written as
$$L(\theta_r,\sigma)=N_\sigma\exp\Bigl[-\frac{1}{2\sigma^2}\sum_i w_i\bigl(y_i-f(x_i;\theta_r)\bigr)^2\Bigr]\simeq N'_\sigma\exp\Bigl[-\frac{1}{2\sigma^2}\sum_{rs}(\theta_r-\bar\theta_r)A_{rs}(\theta_s-\bar\theta_s)\Bigr],$$
where
$$A_{rs}=\sum_i w_i\frac{\partial f(x_i;\bar\theta)}{\partial\theta_r}\frac{\partial f(x_i;\bar\theta)}{\partial\theta_s}=[J^T\cdot W\cdot J]_{rs}$$
and $N_\sigma,\ N'_\sigma$ are normalization factors.
The variance $\sigma^2$ can be estimated by the estimator
$$\hat\sigma^2=\frac{1}{n-p}\sum_i w_i\bigl(y_i-f(x_i;\bar\theta)\bigr)^2=\frac{S(\bar\theta)}{n-p}.$$
The covariance matrix of the parameters is then
$$\mathrm{Cov}(\theta)=\hat\sigma^2 A^{-1}.$$
It is also useful to find the prediction band. The width of the band is set by the variance of the function $f(x;\theta)$. The "delta method" gives an approximate way to calculate $\mathrm{Var}[f(x;\theta)]$ from $\mathrm{Cov}[\theta]$:
$$\sigma_f^2(x)=\mathrm{Var}[f(x;\bar\theta)]=\frac{\partial f^T(x;\bar\theta)}{\partial\theta}\cdot\mathrm{Cov}(\theta)\cdot\frac{\partial f(x;\bar\theta)}{\partial\theta}=\hat\sigma^2\sum_{r,s}\frac{\partial f(x;\bar\theta)}{\partial\theta_r}[A^{-1}]_{rs}\frac{\partial f(x;\bar\theta)}{\partial\theta_s}.$$
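As a sketch, these estimates translate into a few lines of NumPy; the function names and the choice to evaluate the band only at the data points are assumptions for the illustration (for a full band one would evaluate the gradient on a grid of $x$ values).

```python
import numpy as np

def fit_errors(f, jac, x, y, w, theta_bar):
    """Parameter covariance and 1-sigma band of f, following the formulas above."""
    n, p = len(x), len(theta_bar)
    W = np.diag(w)
    r = y - f(x, theta_bar)
    sigma2_hat = np.sum(w * r**2) / (n - p)       # sigma^2_hat = S(theta_bar)/(n-p)
    J = jac(x, theta_bar)                         # J_{i,r} evaluated at theta_bar
    A = J.T @ W @ J
    cov = sigma2_hat * np.linalg.inv(A)           # Cov(theta) = sigma^2_hat A^{-1}
    # delta method: Var[f(x_i)] = grad_f(x_i)^T Cov(theta) grad_f(x_i)
    var_f = np.einsum('ir,rs,is->i', J, cov, J)
    return cov, np.sqrt(var_f)                    # sigma_f(x_i) at each data point
```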
Finally, the distribution of the residuals can be tested with the standard $\chi^2$-test.
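One possible realization, assuming the test is a chi-square comparison of the binned, standardized residuals against a normal distribution (the binning and range are assumptions for the illustration):

```python
import numpy as np
from scipy import stats

def residual_chi2_test(resid, n_bins=10):
    """Chi-square test of the standardized residuals against a normal shape."""
    z = (resid - resid.mean()) / resid.std(ddof=1)
    edges = np.linspace(-3.0, 3.0, n_bins + 1)
    observed, _ = np.histogram(z, bins=edges)
    expected = len(z) * np.diff(stats.norm.cdf(edges))
    expected *= observed.sum() / expected.sum()   # match total counts in the binned range
    return stats.chisquare(observed, expected, ddof=2)  # 2 dof used by mean and std
```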
Example: a power-law fitting
In our example, we will explore fitting some data to the power-law function $f(x)=a+b/x^c$.
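A minimal end-to-end sketch using scipy.optimize.curve_fit (whose default method is a Levenberg-Marquardt routine); the synthetic data, noise level, true parameters and starting values are all assumptions made for the illustration, not the data of the original post.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b, c):
    """Model f(x) = a + b / x^c."""
    return a + b / x**c

# Synthetic data for illustration: true parameters a=1, b=2, c=0.5 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
y = power_law(x, 1.0, 2.0, 0.5) + rng.normal(scale=0.05, size=x.size)

theta_hat, cov = curve_fit(power_law, x, y, p0=[1.0, 1.0, 1.0])
perr = np.sqrt(np.diag(cov))              # 1-sigma errors from the covariance matrix
print("a, b, c =", theta_hat, "+/-", perr)
```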