Power method for obtaining the top eigenvector

The power method (also known as power iteration) is a simple algorithm for obtaining the “top” eigenvector and eigenvalue of a matrix.

Let $A$ be an $n \times n$ matrix with real-valued entries. Assume that:

1. $A$ is diagonalizable, or equivalently, $A$ has $n$ linearly independent eigenvectors $v_1, \dots, v_n$. (From here on, I assume that the $v_i$‘s have unit norm.)
2. We can order the eigenvectors such that the eigenvalues associated with them (i.e. $\lambda_i$ such that $Av_i = \lambda_i v_i$) satisfy the ordering $|\lambda_1 | > |\lambda_2| \geq \dots \geq |\lambda_n|$. What is important here is the strict inequality between $|\lambda_1 |$ and $|\lambda_2|$.

The power method can be described as follows:

1. Initialize $b_0 \in \mathbb{R}^n$ to be some non-zero vector.
2. For each $t = 0, 1, \dots$, set $b_{k+1} = \dfrac{Ab_k}{\| Ab_k \|}$, and set $\mu_{k+1} = b_{k+1}^T A b_{k+1}$.

If $A$ satisfies the two assumptions above and $b_0$ has a non-zero component in the direction of $v_1$, then $b_k$ converges to $v_1$, and $\mu_k$ converges to $\lambda_1$. (This reference points out that if $\lambda_1$ is negative, $\{ b_k \}$ might end up “flip-flopping” between $v_1$ and $-v_1$. One way to avoid this is to always insist on $b_k$ having a non-negative first entry.)

The proof is pretty simple and can be found on the first page of the second reference. Section 6.2 of the third reference has some practical tips for how to use the power method in practice.

Can we say something about the convergence rate of the power method? In essence, the convergence rate depends on how close $|\lambda_1|$ is to $|\lambda_2|$: the larger $\left| \dfrac{\lambda_1}{\lambda_2} \right|$ is, the faster the power method converges. Theorem 5.6 of the last reference makes this precise:

Write our initialization $b_0$ as a linear combination of the eigenvectors: $b_0 = \sum_{i=1}^n \alpha_i v_i$. Assume that $\alpha_1 \neq 0$. Then there exists a constant $C > 0$ such that \begin{aligned} \| b_k - v_1 \|_2 \leq C \left| \frac{\lambda_2}{\lambda_1} \right|^k \end{aligned} for all $k \geq 1$.

What if we are interested in eigenvalues/eigenvectors other than the top ones? We can use the power method to find $\lambda_1$ and $v_1$. It is then easy to show that the matrix $A' = A - \lambda_1 v_1 v_1^T$

has eigenvalues $0, \lambda_2, \dots, \lambda_n$ and the same eigenvectors as $A$. We can then apply the power method to $A'$ to get $\lambda_2$ and $v_2$, and so on. This known as the power method with deflation. It should be noted that the deflation method becomes more inaccurate as we compute smaller and smaller eigenvalues/eigenvectors, because the errors of approximation for the earlier eigenvalues/eigenvectors add up.

References: