Nov 11, 2013

the graph of the international trade relations

Mathematica has provided excellent database of the countries in the function CountryData. I am particularly interested in the trade relations. CountryData[country, "ImportPartners"] and CountryData[country, "ExportPartners"] produce the leading import and export partners of the country. We can construct a graph based on the global trade relations.

trade = 
  Tally[Sort /@ 
              "ImportPartners"] /. _Missing -> Null, 
            CountryData[#, "ExportPartners"] /. _Missing -> Null}], 
          MemberQ[CountryData[], #] &]}] & /@ CountryData[], 1]];

tradegraph = 
   trade[[All, 1, 1]] \[UndirectedEdge] 
    trade[[All, 1, 2]]], 
  EdgeWeight -> trade[[All, 2]]]

Fig. 1, the graph of the international trade relations.

We can make it more accessible by coloring the countries.

by continent:

    trade[[All, 1, 1]] \[UndirectedEdge] 
     trade[[All, 1, 2]]], 
   EdgeWeight -> trade[[All, 2]],
   VertexShapeFunction -> ({ColorData[3, 
           CountryData[#2, "Continent"] ]], Opacity[.8], 
       EdgeForm[{Thin, LightGray}], Disk[#, .08], Darker@Red, 
       Text[Style[CountryData[#2, "CountryCode"], Bold], #1] } &), 
   ImageSize -> 800],
   MapIndexed[{ColorData[3, First@#2], 
         Riffle[StringCases[#1, RegularExpression["[A-Z][a-z]+"]], 
          " "]], Bold, 22], {40, First@#2*60}]} &, 
   ImageSize -> {Automatic, 300}]

Fig. 2,  (Colored by Continent) the graph of the international trade relations.

by trade values, 

Fig. 3,  (Colored by Continent) the graph of the international trade relations.

Fig. 4, colored by the import and export amount separated.

We can explore some further graph structures inside the international trade relations. What we focus here is the the graph communities. The communities characterize the clustering property of a graph. Apply the Mathematica function FindGraphCommunities[], we obtain several trade clusters. The leading 3 communities correspond to the three known trade zones: the Transatlantic Trade Zone including US. and the European Union; the Asia-Pacific Trade Zone including China, Japan, Korea Australia and the ASEANs; the South-America Trade Zone including Brazil, Argentina etc. Our conclusion is further justified if we color the countries inside the trade zone by their continent (Fig. 6).
Fig. 5, the graph communities of the international trade relations.
Fig. 6, trade communities and the geographic distribution. 

Nov 1, 2013

Nonlinear Least Square Regression

The nonlinear least square fit

The nonlinear least square fit (regression) minimizes the target action: \[ S(\theta) = \sum_i w_i (y_i - f(x_i; \theta))^2
\] with positive weights $w_i \ge 0$, for a given set of data $\{ (x_1, y_1), (x_2,y_2),\cdots \}$ and a form function $f(x; \theta) \equiv f(x;\theta_1,\theta_2,\cdots \theta_p)$.

The minima lies in the extrema: \[
0 = \frac{\partial S}{\partial \theta_r} = -2 \sum_i w_i (y_i - f(x_i;\theta))\frac{\partial f(x_i;\theta)}{\partial \theta_r}. \quad r = 1,2,\cdots p \]

Gauss-Newton algorithm

The Newton's method for solving for a local root of a function $\varphi(x)$ is, \[
x_{n+1} = x_n - \frac{\varphi'(x_n)}{\varphi(x_n)}
 \] In this problem, $\varphi(\theta) = S'(\theta)$. Therefore, the Newton's method can be realized as \[
\theta^{(k+1)} = \theta^{(k)} - (H S(\theta^{(k)}))^{-1} \nabla S(\theta^{(k)})
\] where $[H S(\theta)]_{rs} = {\partial^2 S}{\partial \theta_r \partial \theta_s}$ is the Hesse matrix (Jacob matrix) of $S$. It is easy to show \[
\left[ H S \right]_{rs} = 2 \sum_i w_i \frac{\partial f_i}{\partial\theta_r}\frac{\partial f_i}{\partial \theta_s} - 2 \sum_i w_i (y_i - f(x_i)) \frac{\partial^2 f_i}{\partial\theta_r\partial\theta_s}
\] Ignore the second term, as the residue and the second derivative should be small. Then the Gauss-Newton algorithm gives the iteration procedure: \[
\theta^{(k+1)} = \theta^{(k)} + \frac{\mathbf{J}^T\cdot \mathbf{W}\cdot (\mathbf{y}-f(\mathbf{x};\theta^{(k)}) )}{\mathbf{J}^T\cdot\mathbf{W}\cdot\mathbf{J}}

Levenberg-Marquardt algorithm

Levenberg-Marquardt algorithm is one of the most important algorithm based on the Gauss-Newton algorithm. Let $\theta_r, r = 1,2,\cdots p$ be the fitting parameters such as $a,b,c$. Define the Jacobian \[
J_{i,r} = \frac{\partial f(x_i;\theta)}{\partial \theta_r}
\] Define $ W_{ij} = w_i \delta_{ij}$. The iterative procedure is: \[
\mathbf{\theta}^{(k+1)} = \mathbf{\theta}^{(k)} + \alpha \frac{\mathbf{J}^T \cdot \mathbf{W} \cdot (\mathbf{y} - f(\mathbf{x};\mathbf{\theta}^{(k)}))}{\lambda \mathbf{\Lambda} + \mathbf{J}^T \cdot \mathbf{W} \cdot \mathbf{J}}
\] where $\mathbf{\Lambda}$ is a diagonal matrix, either chosen as the identity matrix or the diagonal term of $\mathbf{J}^T \cdot \mathbf{W} \cdot \mathbf{J}$, $0<\alpha<1$ is a small positive number.

Therefore, the solution $\bar\theta$ satisfies equation $\mathbf{J}^T_{\bar\theta} \cdot \mathbf{W} \cdot (\mathbf{y} - f(\mathbf{x};\bar{\mathbf{\theta}})) = 0$ or \[
\sum_i w_i  \frac{\partial f(x_i; \bar\theta)}{\partial \theta_r} (y_i - f(x_i; \bar\theta)) = 0
\] The choice of $\lambda$ is very important. Nonzero $\lambda$ can avoid local minimum but large $\lambda$ is likely to slow down the convergence. A useful strategy is to to choose a nonzero $\lambda$ to start with and decrease $\lambda$ as the iteration converges.

Parameter Error Estimation

It is important to attach estimated error bars/bands to the fitted function (or the fitted parameters). Let's write the data as $ y_i = f(x_i) + \xi_i$, where $\xi_i$ are the residuals. If we assume $\xi_i, i=1,2,3,\cdots n$ are $n$ independent random variables in Normal distribution. The likelihood of the fitted parameters can be written as \[
L(\theta_r, \sigma)
= N_\sigma \exp\left[ -\frac{1}{2\sigma^2}\sum_i w_i (y_i-f(x_i;\theta_r))^2\right] \\
\simeq N'_\sigma \exp\left[-\frac{1}{2\sigma^2} \sum_{rs} (\theta_r-\bar\theta_r) A_{rs} (\theta_s-\bar\theta_s) \right]
\] where $A_{rs} = \sum_i w_i  \frac{\partial f(x_i;\bar\theta)}{\partial\theta_r} \frac{\partial f(x_i;\bar\theta)}{\partial\theta_s} = \left[ \mathbf{J}^T \cdot \mathbf{W} \cdot \mathbf{J} \right]_{rs}
$  and $N_\sigma, N'_\sigma$ are normalization factors.

The variance $\sigma^2$ can be estimated by the estimator \[
\hat{\sigma}^2 = \frac{1}{n-p}\sum_i w_i (y_i - f(x_i; \bar\theta))^2 = \frac{S({\bar\theta})}{n-p}.
 \] The covariance matrix of the parameters is, \[ \text{Cov}(\theta) = \hat\sigma^2 A^{-1}
\] It's also useful to find the prediction band. The width of the band is the variance of function $f(x; \theta)$. The "Delta Method" gives an approximative way to calculate $\text{Var}\left[ f(x; \theta) \right]$ from $\text{Cov}[ \theta ]$. \[
\sigma_f(x) = \text{Var}\left[ f(x; \bar\theta) \right] = \frac{\partial f^T(x;\bar\theta)}{\partial\mathbf{\theta}}\cdot\text{Cov}( \mathbf\theta) \cdot \frac{\partial f(x;\bar\theta)}{\partial\mathbf{\theta}} \\
 = \hat\sigma^2 \sum_{r,s} \frac{\partial f(x;\bar\theta)}{\partial\theta_r}\left[ A^{-1} \right]_{rs}\frac{\partial f(x;\bar\theta)}{\partial\theta_s}
Finally,the distribution of the residuals can be test by the standard $\chi^2$-test.

Example: a power-law fitting

In our example, we will explore fitting some data to the power-law function $f(x) = a + b/x^c $.

Fig. 1: (Color Lines) Black, the theoretical curve; Red, fit with uniform weights; Green fit with weights proportional to $x$

The corresponding covariance matrices for the parameters $a,b,c$.

The diagonal terms gives the variance of the parameters.