I learned in my intro linear algebra class, taken in freshman fall, that Gauss-Jordan elimination inverts¹ matrices. Embarrassingly, until today, I didn’t know why Gauss-Jordan worked. It turns out that the reason it works is simple — in that, after a first course in linalg, you could’ve invented Gauss-Jordan elimination given infinite intelligence.

First, let’s review what Gauss-Jordan elimination is. I’ll rip off Wikipedia’s example. Say we want to invert the following matrix:

\[A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}.\]

We “augment” it with the identity matrix like so:

\[\left[ \begin{array}{ccc|ccc} 2 & -1 & 0 & 1 & 0 & 0 \\ -1 & 2 & -1 & 0 & 1 & 0 \\ 0 & -1 & 2 & 0 & 0 & 1 \end{array} \right].\]

And then we keep performing “elementary row operations” on the augmented matrix (each operation acts on a full row, so it hits the original matrix and the identity copy simultaneously), specifically

  • swapping rows,
  • scaling every element in a row by a constant, or
  • adding a multiple of a row to another row,

until we get the identity matrix on the left side:

\[\left[ \begin{array}{ccc|ccc} 1 & 0 & 0 & \frac{3}{4} & \frac{1}{2} & \frac{1}{4} \\ 0 & 1 & 0 & \frac{1}{2} & 1 & \frac{1}{2} \\ 0 & 0 & 1 & \frac{1}{4} & \frac{1}{2} & \frac{3}{4} \end{array} \right].\]

The matrix on the right is the inverse of the matrix we started with:

\[A^{-1} = \begin{bmatrix} \frac{3}{4} & \frac{1}{2} & \frac{1}{4} \\ \frac{1}{2} & 1 & \frac{1}{2} \\ \frac{1}{4} & \frac{1}{2} & \frac{3}{4} \end{bmatrix}.\]

That’s Gauss-Jordan elimination.
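
If you’d rather see the procedure as code, here is a minimal sketch of my own in Python (the post itself has no code), using exact Fraction arithmetic so the quarters and halves come out exactly. It skips the error handling a real implementation would need:

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    """Invert a square matrix via Gauss-Jordan elimination.

    A minimal sketch: exact Fraction arithmetic keeps the entries exact,
    and singular matrices are not handled.
    """
    n = len(A)
    # Augment each row of A with the corresponding row of the identity.
    aug = [[Fraction(x) for x in row] + [Fraction(int(r == c)) for c in range(n)]
           for r, row in enumerate(A)]
    for col in range(n):
        # Swap a row with a nonzero entry into the pivot position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot equals 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Add multiples of the pivot row to zero out the rest of the column.
        for r in range(n):
            if r != col:
                m = aug[r][col]
                aug[r] = [x - m * y for x, y in zip(aug[r], aug[col])]
    # The left half is now I; the right half is the inverse.
    return [row[n:] for row in aug]

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
for row in gauss_jordan_inverse(A):
    print([str(x) for x in row])
# ['3/4', '1/2', '1/4']
# ['1/2', '1', '1/2']
# ['1/4', '1/2', '3/4']
```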

To understand why this algorithm works, we first need to realize that the elementary row operations I mentioned above are matrix multiplications in disguise. (It’ll be a fun exercise to find the matrices for each operation before I spoil the answers below!)

Let’s start with this matrix:

\[\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}.\]

Say we want to swap the first and second rows, like so:

\[\begin{bmatrix} d & e & f \\ a & b & c \\ g & h & i \end{bmatrix}.\]

That’s just left-multiplication by a distorted-looking identity matrix!

\[\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = \begin{bmatrix} d & e & f \\ a & b & c \\ g & h & i \end{bmatrix}.\]
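
You can verify this identity symbolically; here’s a quick check using SymPy (my choice of tool, not something from the original post):

```python
import sympy as sp

a, b, c, d, e, f, g, h, i = sp.symbols('a b c d e f g h i')
M = sp.Matrix([[a, b, c], [d, e, f], [g, h, i]])

# The identity with its first two rows swapped: left-multiplying by it
# swaps the first two rows of M.
P = sp.Matrix([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
print(P * M)  # Matrix([[d, e, f], [a, b, c], [g, h, i]])
```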

Say we want to scale the third row by a constant $k$:

\[\begin{bmatrix} a & b & c \\ d & e & f \\ kg & kh & ki \end{bmatrix}.\]

That’s left-multiplication by an almost-identity matrix:

\[\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & k \end{bmatrix} \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ kg & kh & ki \end{bmatrix}.\]

Say we want to add $l$ times the second row to the third row:

\[\begin{bmatrix} a & b & c \\ d & e & f \\ g + ld & h + le & i + lf \end{bmatrix}.\]

That’s a left-multiplication too:

\[\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l & 1 \end{bmatrix} \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g + ld & h + le & i + lf \end{bmatrix}.\]
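
The other two identities check out the same way in SymPy (again a sketch of my own, with the same symbolic matrix as before):

```python
import sympy as sp

a, b, c, d, e, f, g, h, i, k, l = sp.symbols('a b c d e f g h i k l')
M = sp.Matrix([[a, b, c], [d, e, f], [g, h, i]])

# Scaling the third row by k: the identity with its (3, 3) entry set to k.
S = sp.Matrix([[1, 0, 0], [0, 1, 0], [0, 0, k]])
print(S * M)  # Matrix([[a, b, c], [d, e, f], [g*k, h*k, i*k]])

# Adding l times the second row to the third: the identity with l in the (3, 2) slot.
T = sp.Matrix([[1, 0, 0], [0, 1, 0], [0, l, 1]])
print(T * M)  # Matrix([[a, b, c], [d, e, f], [d*l + g, e*l + h, f*l + i]])
```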

Now for the punchline: why Gauss-Jordan works. Remember we have this augmented matrix:

\[\left[ A \mid I \right].\]

We take it through a series of elementary row operations (which are really left-multiplications by a series of matrices $E_1$, $E_2$, and so on up to $E_n$):

\[\left[E_n \cdots E_1 A \mid E_n \cdots E_1 I\right]\]

to get this:

\[\left[ I \mid (E_n \cdots E_1)I \right].\]

The point is this: if the combined effect of the row operations is to take $A$ to $I$ on the left side, then $E_n \cdots E_1 A = I$, which is exactly what it means for $E_n \cdots E_1$ to be $A^{-1}$ (for square matrices, a one-sided inverse is automatically two-sided).

\[\begin{aligned} \left[E_n \cdots E_1 A \mid E_n \cdots E_1 I\right] &= \left[ A^{-1}A \mid A^{-1}I \right] \\ &= \left[ I \mid A^{-1} \right] \end{aligned}.\]

Since the identity matrix on the right undergoes the same operations as $A$ does on the left, it ends up as just $A^{-1}$.
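
Here’s a numerical version of this argument, a sketch of my own in NumPy: reduce $A$ to $I$ by left-multiplying with explicit elementary matrices, accumulate their product as you go, and confirm the product is $A^{-1}$:

```python
import numpy as np

def elem_scale(n, i, k):
    """Elementary matrix that scales row i by k."""
    E = np.eye(n)
    E[i, i] = k
    return E

def elem_add(n, i, j, l):
    """Elementary matrix that adds l times row j to row i."""
    E = np.eye(n)
    E[i, j] = l
    return E

A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
n = len(A)
M = A.copy()
product = np.eye(n)  # running product E_k ... E_1, starting from I

# This particular A never needs a row swap; a general routine would also pivot.
for col in range(n):
    E = elem_scale(n, col, 1.0 / M[col, col])  # make the pivot 1
    M, product = E @ M, E @ product
    for row in range(n):
        if row != col:
            E = elem_add(n, row, col, -M[row, col])  # clear the rest of the column
            M, product = E @ M, E @ product

print(np.allclose(M, np.eye(n)))               # True: A has been reduced to I
print(np.allclose(product, np.linalg.inv(A)))  # True: E_n ... E_1 equals A^{-1}
```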

  1. The geometric intuition is that inversion undoes the transformation the matrix represents. Another intuition is that if a linear transformation is a communication channel that encodes $x$ as $y$, the inverse is a decoder that recovers $x$ from $y$.