$\hbar = c = 1$ : QR algorithm as a sophisticated power iteration

Jan 14, 2012


The QR algorithm is one of the most popular diagonalization algorithms. In this post, we compare it with several variations of the power iteration. Throughout this post, we consider only positive-definite Hermitian matrices. If a matrix is not positive definite, one can always apply a shift, $A \to A + s I$ with $s > |\min\{\lambda_i, i = 1, 2, \cdots, n \}|$, which raises every eigenvalue by $s$ and leaves the eigenvectors unchanged.
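A quick numerical check of the shift, as a minimal sketch assuming NumPy and a real symmetric (hence Hermitian) matrix. The helper `shift_to_positive_definite` and its `margin` parameter are hypothetical. For illustration the smallest eigenvalue is computed directly with `np.linalg.eigvalsh`, which is circular in practice. The point is only to show that the shift moves the eigenvalues by $s$ and keeps the eigenvectors.

```python
import numpy as np

def shift_to_positive_definite(a, margin=1.0):
    """Hypothetical helper: return (A + s I, s) with s = |lambda_min| + margin > |lambda_min|."""
    lam_min = np.linalg.eigvalsh(a)[0]      # eigvalsh returns eigenvalues in ascending order
    s = abs(lam_min) + margin
    return a + s * np.eye(a.shape[0]), s

rng = np.random.default_rng(0)
m = rng.standard_normal((5, 5))
a = (m + m.T) / 2                           # random real symmetric matrix, generally indefinite
b, s = shift_to_positive_definite(a)

lam, x = np.linalg.eigh(a)                  # eigenpairs of the original matrix
print(np.linalg.eigvalsh(b).min() > 0)      # the shifted matrix is positive definite
print(np.allclose(b @ x, x * (lam + s)))    # same eigenvectors, eigenvalues shifted by s
```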

QR algorithm

Recall that the QR decomposition (QR factorization) of a matrix $A$ factorizes it into the product of an orthogonal (unitary, in the complex case) matrix $Q$ and an upper triangular matrix $R$, i.e. $A = QR$. The QR decomposition can be seen as the orthogonalization of the column vectors of $A$: the columns $a_i \quad (i = 1, \cdots, n)$ of $A = [a_1, a_2, \cdots, a_n]$ are orthogonalized into $Q = [q_1, q_2, \cdots, q_n]$ such that $a_i \in \text{span}(q_1, q_2, \cdots, q_i)$. Based on the QR decomposition, the QR algorithm for diagonalizing a matrix $A$ reads:

$ A_0 = A $;
$ A_{k-1} \to Q_k R_k $ (QR decomposition);
$ A_k = R_k Q_k $.

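Since $R_k = Q_k^\dagger A_{k-1}$, the last two steps give $A_k = R_k Q_k = Q^\dagger_k A_{k-1} Q_k$. Hence $A_k \simeq A_{k-1} \simeq A_{k-2} \simeq \cdots \simeq A_0 = A$, i.e. every $A_k$ is unitarily similar to $A$ and has the same eigenvalues. A theorem guarantees that, under certain conditions (roughly, that the eigenvalues have distinct magnitudes), the sequence converges to a diagonal matrix. We won't discuss the theorem here.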
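The three steps above can be sketched in a few lines of NumPy. This is a minimal, unshifted version under stated assumptions: a real symmetric positive-definite input, `np.linalg.qr` for the factorization, and a fixed iteration count instead of a convergence test. The function name `qr_algorithm` and the test matrix are hypothetical. Accumulating the product $Q_1 Q_2 \cdots Q_k$ is an addition for illustration: since $A_k = (Q_1 \cdots Q_k)^\dagger A (Q_1 \cdots Q_k)$, its columns approximate the eigenvectors once $A_k$ is nearly diagonal.

```python
import numpy as np

def qr_algorithm(a, num_iter=500):
    """Unshifted QR algorithm (sketch): A_{k-1} = Q_k R_k, then A_k = R_k Q_k."""
    a_k = np.array(a, dtype=float)      # A_0 = A
    q_acc = np.eye(a_k.shape[0])        # running product Q_1 Q_2 ... Q_k
    for _ in range(num_iter):
        q, r = np.linalg.qr(a_k)        # QR decomposition of A_{k-1}
        a_k = r @ q                     # A_k = R_k Q_k = Q_k^T A_{k-1} Q_k
        q_acc = q_acc @ q
    return a_k, q_acc

# Assumed test matrix with known eigenvalues 4, 3, 2, 1.
rng = np.random.default_rng(1)
x, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix
a = x @ np.diag([4.0, 3.0, 2.0, 1.0]) @ x.T

a_k, vecs = qr_algorithm(a)
print(np.round(np.diag(a_k), 6))    # should approach the eigenvalues, largest first
```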

Power iteration

Power iteration is fairly simple. Choose a random vector $v = \sum_i \alpha_i x_i$, where $x_i \quad (i = 1, 2, \cdots, n)$ are the eigenvectors of $A$ (they form a basis because $A$ is Hermitian). Then
\[
A^k v = \sum_i \alpha_i \lambda_i^k x_i.
\]
Therefore, if $k$ is sufficiently large and $\alpha_m \neq 0$, the term belonging to $\lambda_m \equiv \max \{ \lambda_i, i = 1, 2, \cdots, n \}$ dominates, so $A^k v \approx \alpha_m \lambda_m^k x_m$. In practice, the vector is normalized at every step to avoid overflow or underflow:

$ v_0 = \text{ random unit vector }$;
$ v_k = A \cdot v_{k-1} $;
$ v_k \leftarrow v_k / \| v_k \|$.
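A minimal NumPy sketch of these three steps, assuming a real symmetric positive-definite matrix and a fixed number of iterations. The function name `power_iteration` and the test matrix are hypothetical. Returning the Rayleigh quotient $v^\dagger A v$ as the eigenvalue estimate is an addition not described in the post.

```python
import numpy as np

def power_iteration(a, num_iter=1000, seed=0):
    """Power iteration sketch: v_k = A v_{k-1}, normalized at every step.

    Returns (v^T A v, v); the Rayleigh quotient is an added eigenvalue estimate.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(a.shape[0])
    v /= np.linalg.norm(v)          # v_0: random unit vector
    for _ in range(num_iter):
        v = a @ v                   # v_k = A v_{k-1}
        v /= np.linalg.norm(v)      # v_k <- v_k / ||v_k||
    return v @ a @ v, v

rng = np.random.default_rng(1)
x, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix
a = x @ np.diag([4.0, 3.0, 2.0, 1.0]) @ x.T        # assumed test matrix, lambda_m = 4
lam, v = power_iteration(a)
print(lam, abs(v @ x[:, 0]))   # should be very close to 4 and 1 (v parallel to x_m)
```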

There are two issues with the power iteration. First, if the largest eigen-pair (eigenvalue and eigenvector) is degenerate, the iterated vector converges only to some vector in the corresponding eigenspace, and which vector depends on the starting vector. Second, it yields only the largest eigen-pair.

Simultaneous power iteration

To overcome the first issue, several vectors can be iterated simultaneously. This method is known as simultaneous power iteration. The algorithm can be described as follows:

$ V_0 = [ v^{(0)}_1, v^{(0)}_2, \cdots, v^{(0)}_m ]$, where $v^{(0)}_i \quad (i = 1, 2, \cdots, m;\ m \le n)$ are random vectors;
$ V_k = A \cdot V_{k-1} $;
$ v^{(k)}_i \leftarrow v^{(k)}_i / \| v^{(k)}_i \| \quad (i = 1, 2, \cdots, m)$.

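If $m$ is at least the degeneracy of the largest eigenvalue, the resulting vectors generically span the degenerate eigenspace, from which an orthonormal basis of that eigenspace can be constructed.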
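The scheme above, sketched in NumPy under the same assumptions (real symmetric positive-definite matrix, fixed iteration count; the function name and test matrix are hypothetical). The demo matrix has a doubly degenerate largest eigenvalue, 5. The final QR decomposition is used merely to turn the converged vectors into an orthonormal basis of that eigenspace.

```python
import numpy as np

def simultaneous_power_iteration(a, m, num_iter=200, seed=0):
    """Iterate m vectors at once: V_k = A V_{k-1}, then normalize each column."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((a.shape[0], m))   # V_0: m random column vectors
    for _ in range(num_iter):
        v = a @ v                              # V_k = A V_{k-1}
        v /= np.linalg.norm(v, axis=0)         # normalize every column separately
    return v

# Assumed test matrix: eigenvalue 5 is doubly degenerate.
rng = np.random.default_rng(1)
x, _ = np.linalg.qr(rng.standard_normal((4, 4)))
a = x @ np.diag([5.0, 5.0, 2.0, 1.0]) @ x.T

v = simultaneous_power_iteration(a, m=2)
basis, _ = np.linalg.qr(v)                     # orthonormal basis of span(v_1, v_2)
print(np.allclose(a @ basis, 5.0 * basis))     # the basis spans the degenerate eigenspace
```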

Orthogonal power iteration

Simultaneous power iteration does not resolve the second issue. In principle, once the first eigenvector is obtained, it can be projected out of the whole space, and power iteration on the remaining subspace then yields the second largest eigen-pair. However, this naive approach is impractical, because any numerical error reintroduces a component along the removed eigenvector, which subsequent iterations then amplify. To keep the computed eigenvectors out, we can re-orthogonalize the vectors after each iteration, and the QR decomposition described above performs exactly this orthogonalization. This iteration scheme is known as orthogonal power iteration. The algorithm is sketched below:

$ V_0 $ is an $n \times n$ random matrix;
$ V_{k-1} \to Q_k R_k $ (QR decomposition);
$ V_k = A Q_k $.
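A NumPy sketch of orthogonal power iteration, again assuming a real symmetric positive-definite matrix and a fixed iteration count (the function name and the demo matrix are hypothetical). It returns $Q_k$ together with $Q_k^\dagger V_k = Q_k^\dagger A Q_k$. With all $n$ columns and distinct eigenvalues, the columns of $Q_k$ approach the eigenvectors (up to sign) in order of decreasing eigenvalue.

```python
import numpy as np

def orthogonal_power_iteration(a, num_iter=500, seed=0):
    """V_{k-1} = Q_k R_k (QR decomposition), then V_k = A Q_k."""
    n = a.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((n, n))     # V_0: random n x n matrix
    for _ in range(num_iter):
        q, _ = np.linalg.qr(v)          # re-orthogonalize the columns of V_{k-1}
        v = a @ q                       # V_k = A Q_k
    return q, q.T @ v                   # Q_k and Q_k^T V_k = Q_k^T A Q_k

rng = np.random.default_rng(1)
x, _ = np.linalg.qr(rng.standard_normal((4, 4)))
a = x @ np.diag([4.0, 3.0, 2.0, 1.0]) @ x.T     # assumed test matrix

q, a_k = orthogonal_power_iteration(a)
print(np.round(np.diag(a_k), 6))     # should approach 4, 3, 2, 1
print(np.round(np.abs(q.T @ x), 6))  # should approach the identity (eigenvectors up to sign)
```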

Power iteration vs. QR algorithm

Now let's compare orthogonal power iteration with the QR algorithm. In orthogonal power iteration, define $ A_k \equiv Q_k^\dagger V_k = Q_k^\dagger A Q_k$ and $U_k \equiv Q_{k-1}^\dagger Q_k$. One can then verify:

$U_k$ is orthogonal;
$A_{k-1} = Q_{k-1}^{\dagger} V_{k-1} = Q_{k-1}^{\dagger} Q_k R_k = U_k R_k$;
$A_k = Q_k^\dagger A Q_k = Q_k^\dagger Q_{k-1} A_{k-1} Q_{k-1}^\dagger Q_k = U_k^\dagger A_{k-1} U_k$.

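In fact, since $U_k$ is orthogonal and $R_k$ is upper triangular, $A_{k-1} = U_k R_k$ is a QR decomposition of $A_{k-1}$, and then $A_k = U_k^\dagger A_{k-1} U_k = R_k U_k$, which is exactly the update step of the QR algorithm. That is to say, orthogonal power iteration QR-factorizes $V_{k-1}$, whereas the QR algorithm QR-factorizes $A_{k-1}$, and the two methods are equivalent. If we further set $Q_0 = I$, so that $V_0 = A$, the matrices $A_k$ form a sequence identical to that of the QR algorithm (provided both use the same sign convention for the QR decomposition, e.g. a positive diagonal in $R$, which makes it unique for nonsingular matrices). The benefit of the QR algorithm is that it works directly with the quantities we need: the diagonal of $A_k$ converges to the eigenvalues, and the eigenvectors are accumulated as $Q_k = U_1 U_2 \cdots U_k$.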
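The equivalence can be checked numerically. The sketch below (NumPy, real symmetric test matrix; all function names are hypothetical) runs both methods from $Q_0 = I$, i.e. $V_0 = A$. Because `np.linalg.qr` does not guarantee a positive diagonal in $R$, the helper `qr_pos` flips signs to enforce that convention, so both methods use the same, unique QR decomposition.

```python
import numpy as np

def qr_pos(m):
    """QR decomposition normalized so that R has a non-negative diagonal."""
    q, r = np.linalg.qr(m)
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs, signs[:, None] * r    # (Q S)(S R) = Q R because S^2 = I

def qr_algorithm_sequence(a, num_iter):
    """A_1, ..., A_num_iter from the QR algorithm: A_{k-1} = Q_k R_k, A_k = R_k Q_k."""
    a_k, seq = a.copy(), []
    for _ in range(num_iter):
        q, r = qr_pos(a_k)
        a_k = r @ q
        seq.append(a_k)
    return seq

def orthogonal_iteration_sequence(a, num_iter):
    """A_k = Q_k^T V_k from orthogonal power iteration started at V_0 = A."""
    v, seq = a.copy(), []                   # Q_0 = I  =>  V_0 = A Q_0 = A
    for _ in range(num_iter):
        q, _ = qr_pos(v)                    # V_{k-1} = Q_k R_k
        v = a @ q                           # V_k = A Q_k
        seq.append(q.T @ v)                 # A_k = Q_k^T A Q_k
    return seq

rng = np.random.default_rng(1)
x, _ = np.linalg.qr(rng.standard_normal((4, 4)))
a = x @ np.diag([4.0, 3.0, 2.0, 1.0]) @ x.T     # assumed test matrix

seq_qr = qr_algorithm_sequence(a, 30)
seq_orth = orthogonal_iteration_sequence(a, 30)
print(max(np.abs(b1 - b2).max() for b1, b2 in zip(seq_qr, seq_orth)))  # rounding-level difference
```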

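This comparison shows that the QR algorithm is equivalent to a variation of the power iteration, namely orthogonal power iteration started from $V_0 = A$. We can therefore expect it to share the performance characteristics of the power iteration (convergence rate, memory consumption, etc.); for example, in both methods the convergence is governed by ratios of successive eigenvalues $\lambda_{i+1}/\lambda_i$ (eigenvalues sorted in decreasing order).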
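To illustrate the shared convergence behaviour, the sketch below (NumPy; the $2 \times 2$ test matrix with eigenvalues 4 and 2 and the rotation angle are assumptions) runs the unshifted QR algorithm and tracks the off-diagonal entry of $A_k$. The ratio of successive magnitudes should approach $\lambda_2/\lambda_1 = 0.5$, the same ratio by which each step of the plain power iteration shrinks the error in the direction of the dominant eigenvector.

```python
import numpy as np

theta = 0.3                                   # arbitrary rotation angle (assumed)
x = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
a = x @ np.diag([4.0, 2.0]) @ x.T             # eigenvalues 4 and 2, ratio 0.5

a_k = a.copy()
off_diag = []
for _ in range(20):
    q, r = np.linalg.qr(a_k)                  # A_{k-1} = Q_k R_k
    a_k = r @ q                               # A_k = R_k Q_k
    off_diag.append(abs(a_k[1, 0]))           # size of the remaining off-diagonal entry

ratios = [off_diag[i + 1] / off_diag[i] for i in range(len(off_diag) - 1)]
print(ratios[-1])                             # close to lambda_2 / lambda_1 = 0.5
```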
