A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is linear if it satisfies two properties:

  • Additivity: $f(x + y) = f(x) + f(y), \forall x, y \in \mathbb{R}^n$
  • Homogeneity: $f(c x) = c \cdot f(x), \forall x \in \mathbb{R}^n, \forall c \in \mathbb{R}$¹

It’s well known that matrix multiplication is a linear function: if $A \in \mathbb{R}^{m \times n}$, $v, w \in \mathbb{R}^n$, and $c \in \mathbb{R}$, then both $A(v + w) = Av + Aw$ (additivity) and $A(cv) = c(Av)$ (homogeneity) hold. Less well known is that the converse is also true: any linear function $f : \mathbb{R}^n \to \mathbb{R}^m$ can be written as a matrix multiplication $f(x) = Ax$ for some $A \in \mathbb{R}^{m \times n}$! In fact, this follows directly from the definition of a linear function. Here’s why:
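Here’s a quick numerical sanity check of both properties (a sketch using NumPy; the shapes, seed, and scalar below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))          # an arbitrary A in R^{3x4}
v, w = rng.standard_normal(4), rng.standard_normal(4)
c = 2.5

# Additivity: A(v + w) == Av + Aw
assert np.allclose(A @ (v + w), A @ v + A @ w)
# Homogeneity: A(cv) == c(Av)
assert np.allclose(A @ (c * v), c * (A @ v))
```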

Consider the standard basis for $\mathbb{R}^n$: $e_1, \ldots, e_n$, where $e_i$ has a $1$ in position $i$ and $0$s elsewhere. Any vector $x \in \mathbb{R}^n$ can be written as a linear combination of these basis vectors: $x = x_1 e_1 + \cdots + x_n e_n$. Now, because $f$ is linear, we can distribute $f$ across this sum (additivity) and pull the scalars out (homogeneity): $f(x_1 e_1 + \cdots + x_n e_n) = x_1 f(e_1) + \cdots + x_n f(e_n)$. Here’s the key step: we can construct a matrix $A$ whose $i$th column is precisely $f(e_i)$:

\[A = \left[ f(e_1) \; \ldots \; f(e_n) \right].\]

When we multiply $A$ by $x$, matrix-vector multiplication forms exactly this linear combination of the columns of $A$ weighted by the entries of $x$, so we get

\[Ax = x_1 f(e_1) + \cdots + x_n f(e_n) = f(x).\]

Because the columns of $A$ track where the basis vectors land, the matrix captures the behavior of $f$ on every possible input.
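Here’s a sketch of this construction in NumPy. The particular $f$ below is just a stand-in (any linear function would do); the point is that $A$ is built purely by evaluating $f$ on the standard basis:

```python
import numpy as np

def f(x):
    """A stand-in linear function R^3 -> R^2."""
    x1, x2, x3 = x
    return np.array([2 * x1 - x3, x2 + 5 * x3])

n = 3
# Column i of A is f(e_i), where e_i is the ith standard basis vector.
A = np.column_stack([f(np.eye(n)[:, i]) for i in range(n)])

x = np.array([1.0, -2.0, 0.5])
assert np.allclose(A @ x, f(x))  # the matrix reproduces f on any input
```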

A particularly elegant application of this idea is taking derivatives of polynomials. First, note that differentiation is a linear operator on polynomials:

  • Additivity: for any polynomials $f(x)$ and $g(x)$, $\frac{d}{dx}[f(x) + g(x)] = \frac{d}{dx}f(x) + \frac{d}{dx}g(x)$.
  • Homogeneity: for any polynomial $f(x)$ and any scalar $c$, $\frac{d}{dx} [c \cdot f(x)] = c \cdot \frac{d}{dx} f(x)$.

So. Is there a way to represent the derivative operator as a matrix? Turns out, yes.

For the space of polynomials of degree at most $n$, the standard basis is $\left\lbrace 1, x, \ldots, x^n \right\rbrace$. So a degree-3 polynomial $p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3$ can be represented as the vector

\[p = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}.\]

The derivative operator $D$ maps each basis element as follows:

  • $D(1) = 0$,
  • $D(x) = 1$,
  • $D(x^2) = 2x$, and
  • $D(x^3) = 3x^2$.

(All of these follow immediately from the power rule $\frac{d}{dx} x^n = nx^{n-1}$.)

And sure enough, following the recipe from before (each column is the image of a basis element, written in coordinates), we can represent the derivative operator as a matrix:

\[D = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}\]

Let’s apply this matrix to a specific polynomial, say $p(x) = 3 + 2x - x^2 + 4x^3$. Using just the rules of differentiation, we know that $p'(x) = 2 - 2x + 12x^2$.

We get the same result the matrix multiplication way:

\[\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \\ -1 \\ 4 \end{bmatrix} = \begin{bmatrix} 2 \\ -2 \\ 12 \\ 0 \end{bmatrix}\]

The vector on the right can be read off as the polynomial $2 - 2x + 12x^2$.
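The same recipe works in any degree. Here’s a sketch that builds the derivative matrix for polynomials of degree at most $n$ and checks it against NumPy’s own polynomial derivative (both use coefficients in ascending order, matching our basis):

```python
import numpy as np
from numpy.polynomial import polynomial as P

def derivative_matrix(n):
    """The (n+1) x (n+1) matrix of d/dx on polynomials of degree <= n,
    in the basis {1, x, ..., x^n}."""
    D = np.zeros((n + 1, n + 1))
    for k in range(1, n + 1):
        D[k - 1, k] = k  # d/dx x^k = k x^{k-1}
    return D

p = np.array([3.0, 2.0, -1.0, 4.0])        # 3 + 2x - x^2 + 4x^3
dp = derivative_matrix(3) @ p
print(dp)                                   # [ 2. -2. 12.  0.]
assert np.allclose(dp[:-1], P.polyder(p))   # matches NumPy's polyder
```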

  1. A good example of a nonlinear function is the affine function $f(x) = x + b$ with $b \neq 0$. It violates both additivity (since $f(x + y) = (x + y) + b = x + y + b$ but $f(x) + f(y) = (x + b) + (y + b) = x + y + 2b$) and homogeneity (since $f(cx) = cx + b$ but $c \cdot f(x) = c(x + b) = cx + cb$). Other examples are squaring, since $f(x + y) = x^2 + y^2 + 2xy$ is not $f(x) + f(y) = x^2 + y^2$ and $f(cx) = c^2 x^2$ is not $c \cdot f(x) = c \cdot x^2$, and ReLU, which violates additivity when inputs have different signs, as in $f(-1 + 2) = f(1) = 1$ but $f(-1) + f(2) = 0 + 2 = 2$. It’s fun that all of these are obviously nonlinear (their plots are not straight lines through the origin!) even though it’s not obvious why they fail the formal definition of linearity (additivity plus homogeneity). Would you have guessed which of the two conditions ReLU fails, for instance?
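  If you want to poke at these examples yourself, here’s a quick numerical check (a sketch in NumPy; the specific inputs are arbitrary):

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)

# The footnote's ReLU example: additivity fails for mixed-sign inputs.
print(relu(-1 + 2))            # 1.0
print(relu(-1) + relu(2))      # 2.0

# The affine f(x) = x + b with b != 0 fails both properties:
b = 1.0
f = lambda x: x + b
x, y, c = 3.0, 4.0, 2.0
print(f(x + y), f(x) + f(y))   # 8.0 vs 9.0 (off by b)
print(f(c * x), c * f(x))      # 7.0 vs 8.0 (off by (c - 1) * b)
```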