In math, ideas as they are formally defined are often distinct from how we interpret them. A great example is the notion of linear independence. A set of vectors $\{ v_1, \ldots, v_n \}$ is defined as independent if

\[a_1 v_1 + \cdots + a_n v_n = 0\]

has only the trivial solution $a_1 = \cdots = a_n = 0$.
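For instance, in $\mathbb{R}^2$ the vectors $(1, 0)$, $(0, 1)$, and $(1, 1)$ are dependent, since

\[1 \cdot (1, 0) + 1 \cdot (0, 1) - 1 \cdot (1, 1) = 0\]

is a nontrivial solution.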

But this is almost never how we actually think about linear independence. Instead, we usually think of it in one of two ways:

  • The coefficients in $a_1 v_1 + \cdots + a_n v_n$ are uniquely determined, i.e., no other choice of coefficients produces that same vector: if $a_1 v_1 + \cdots + a_n v_n = b_1 v_1 + \cdots + b_n v_n$, then $a_1 = b_1, \ldots, a_n = b_n$.
  • No vector $v_i$ can be expressed as a linear combination of the other vectors $v_1, \ldots, v_{i - 1}, v_{i + 1}, \ldots, v_n$.

We can show (and textbooks do) that all three characterizations of independence (the formal definition plus the two above) are equivalent.
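One direction is a one-line computation: if $a_1 v_1 + \cdots + a_n v_n = b_1 v_1 + \cdots + b_n v_n$, then

\[(a_1 - b_1) v_1 + \cdots + (a_n - b_n) v_n = 0,\]

and the formal definition forces $a_i = b_i$ for every $i$. The other directions are similar unwindings.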

So, if we rarely think of independence using the formal definition while solving problems, why do we still define it that way? It’s because showing that the formal condition holds for a set of vectors is generally easier than showing that the more intuitive conditions hold.
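To see just how mechanical the formal check is, here’s a minimal sketch (the `is_independent` helper and the numpy rank test are my own illustration, and the test only holds up to floating-point tolerance):

```python
import numpy as np

# Independence check straight from the formal definition: stack the
# vectors as columns; a_1 v_1 + ... + a_n v_n = 0 has only the trivial
# solution exactly when the matrix has full column rank.
def is_independent(vectors):
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == A.shape[1]

print(is_independent([np.array([1, 0, 0]), np.array([0, 1, 0])]))  # True
print(is_independent([np.array([1, 0]), np.array([0, 1]),
                      np.array([1, 1])]))  # False: (1, 1) = (1, 0) + (0, 1)
```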

Another example: a basis is defined as a set of vectors that’s both independent and spanning. But it’s almost always more helpful to think of it as a minimal set of vectors that spans the space. Why do we use the formal definition? Again, because it’s easier to verify for a given set of vectors.
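To make the “minimal spanning set” picture concrete, here’s a sketch (my own illustration, not from any particular textbook) that greedily extracts a basis: keep each vector only if it isn’t already a combination of the vectors kept so far.

```python
import numpy as np

def extract_basis(vectors):
    """Keep a vector only if it adds a new direction, i.e., raises the rank."""
    basis = []
    for v in vectors:
        candidate = basis + [v]
        if np.linalg.matrix_rank(np.column_stack(candidate)) == len(candidate):
            basis.append(v)
    return basis

vs = [np.array([1.0, 0, 0]), np.array([2.0, 0, 0]),
      np.array([0, 1.0, 0]), np.array([1.0, 1.0, 0])]
print(extract_basis(vs))  # keeps (1, 0, 0) and (0, 1, 0): a minimal spanning set
```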

But this also makes it dangerous to introduce students to independence and bases through the formal definitions. A first course should have what Terry Tao calls the pre-rigorous elements: the “examples, fuzzy notions, and hand-waving”.¹ At any step in a formal proof, there’s a vast tree of possible next steps, and you can cut it down to size only with intuitive mental models. For example, I recently convinced myself why the rank-nullity theorem makes intuitive sense (and no, the “nullspace vectors get crushed down” intuition, which emphasizes how a mapping preserves “information” between spaces, does not work for me). Coming up with my informal “proof” would have been impossible if I had known independence only by its formal definition.
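For reference, the theorem says that for a matrix $A$ mapping $\mathbb{R}^n$ to $\mathbb{R}^m$, $\operatorname{rank}(A) + \dim \mathcal{N}(A) = n$. A quick numerical sanity check (my own sketch, using scipy’s null_space):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # a map from R^3 to R^2

rank = np.linalg.matrix_rank(A)   # dimension of the range
nullity = null_space(A).shape[1]  # dimension of the nullspace
assert rank + nullity == A.shape[1]  # rank + nullity = dim of the domain
print(rank, nullity)              # 2 1
```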

Closely related to this is the idea of motivating definitions well:

Mathematicians pride themselves on writing proofs of propositions in an elegant way, but frequently (maybe even usually?) neglect to formally write motivations of definitions with the same elegance, efficiency, and sometimes beauty, and neglect to assign exercises in which the student is challenged to prove that a definition is the only (sometimes up to some logically equivalent formulations) one that satisfies some desiderata. Often a textbook will just say dogmatically “A left and right conjugate sub-hypopotamus is a thing that blah blah blah …” As I have remarked elsewhere in this forum, it is usually considered licit to define the concept of “group” by saying a group is a set with a binary operation satisfying etc. etc. etc. and then go on to prove a zillion theorems in group theory, rather than showing at the outset how the concept developed from a variety of concrete examples involving transformations in geometry, bijections, matrices, etc. …

So, suppose that our customs in regard to writing motivations of definitions were like those of our customs in regard to writing proofs. We want them to be complete, correct, informative, satisfying to reasonable demands for justification, elegant, beautiful, comprehensible to the intended audience, and as simple as possible subject to the foregoing constraints. Particularly clever motivations would be published in things like the Monthly in the same way that novel or unusually nice new proofs of old theorems are now published. A particularly brilliant motivation for a new definition might be the whole of the topic of a paper in a research journal, maybe in rare cases winning a Fields medal.

On the need to invest in intuitive mental models, see Richard Feynman, quoted in John Wentworth’s great post “What Are You Tracking In Your Head?”:

I had a scheme, which I still use today when somebody is explaining something that I’m trying to understand: I keep making up examples. For instance, the mathematicians would come in with a terrific theorem, and they’re all excited. As they’re telling me the conditions of the theorem, I construct something which fits all the conditions. You know, you have a set (one ball) – disjoint (two balls). Then the balls turn colors, grow hairs, or whatever, in my head as they put more conditions on. Finally they state the theorem, which is some dumb thing about the ball which isn’t true for my hairy green ball thing, so I say, ‘False!’

Relearning your field is a great way to pick up the intuitions you missed on your first pass. This seems especially true of linear algebra to me. It’s fractally complicated, and almost universally seems to require multiple courses or books to learn well.

To pick up (and remember) intuitions, it also helps to write down what you’ve done, especially in an expository way with your past self as the target audience. As Henrik Karlsson puts it:

As I type, I’m often in a fluid mode—writing at the speed of thought. I feel confident about what I’m saying. But as soon as I stop, the thoughts solidify, rigid on the page, and, as I read what I’ve written, I see cracks spreading through my ideas. What seemed right in my head fell to pieces on the page.

Seeing your ideas crumble can be a frustrating experience, but it is the point if you are writing to think. You want it to break. It is in the cracks the light shines in.

  1. Stephen Boyd’s linear dynamical systems lectures are particularly great in this regard. For example, he introduces matrix entries $A_{i, j}$ as gain factors from the $j$th input ($x_j$) to the $i$th output ($y_i$), and the nullspace $\mathcal{N}(A)$ as the ambiguity in $x$ given the sensor measurement $y = Ax$: if $z \in \mathcal{N}(A)$, then $z$ is undetectable from the sensors (we get zero readings for it), and $x$ and $x + z$ are indistinguishable since $Ax = A(x + z)$. This interpretation of the nullspace implies that vectors mapping to zero are bad for measurement problems (since the state cannot be unambiguously recovered) but great for design problems (since they afford you degrees of freedom you can use to optimize your design against other considerations).
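A tiny numerical illustration of that ambiguity (the matrix and vectors here are my own, not Boyd’s):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])   # sensor matrix with a nontrivial nullspace

x = np.array([1.0, 2.0, 3.0])    # the true state
z = np.array([1.0, -1.0, 0.0])   # z is in N(A): A @ z == 0

print(A @ z)         # [0. 0.]  zero sensor reading, so z is undetectable
print(A @ x)         # [3. 3.]
print(A @ (x + z))   # [3. 3.]  same reading: x and x + z are indistinguishable
```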