What makes a mathematical concept “good”

Today, I’ll discuss a topic that I believe is very important, not only for teaching and explaining math but also for understanding it: what makes some mathematical concepts “good”, and “better” than others. A bit abstract and dry, don’t you think? Here’s a probability example: what justifies the definition of the variance \( \text{var} = E[(x-\mu)^2] \), where \( \mu = E[x] \) is the mean, and of the standard deviation \( \sigma = \sqrt{\text{var}} \)? If a student asked you that question, would you be able to give them a good answer?
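As a concrete reference point, here is a minimal Python sketch that applies the two definitions to a fair six-sided die. The die and the helper name `mean_var_std` are illustrative choices of mine, not anything standard; exact fractions keep the variance exact.

```python
from fractions import Fraction
import math

def mean_var_std(dist):
    """Hypothetical helper: dist maps each value x to its probability P(X = x)."""
    mu = sum(p * x for x, p in dist.items())                 # E[x]
    var = sum(p * (x - mu) ** 2 for x, p in dist.items())    # E[(x - mu)^2]
    return mu, var, math.sqrt(var)                           # sigma = sqrt(var)

die = {k: Fraction(1, 6) for k in range(1, 7)}  # fair die, values 1..6
mu, var, sigma = mean_var_std(die)
print(mu, var, round(sigma, 4))  # 7/2 35/12 1.7078
```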

I believe that the quality of a mathematical concept needs to be measured along two different axes:

  • First of all, our mathematical concept needs to capture some intuitions that are relevant for the problem at hand.
  • Second, our concept needs to be mathematically convenient: it needs to just click in whatever mathematical setting we are working in.

Requiring both is important because it’s often possible to capture the same intuition with many different concepts, but most of them won’t be mathematically convenient. Something I find really mysterious about math is that, when you want to prove something, there usually aren’t many ways of doing it that really “flow”.

Note that we sometimes want to capture the same intuition with several different concepts, because a given concept might only work well in a limited set of circumstances. Outside of those, a competing concept may be a better fit.

Also note that the first criterion can be a bit blurry. As you learn more advanced math, you also learn more abstract concepts, and the intuition these capture can be mostly mathematical. For such concepts, it can be harder to tell my two axes apart. Perhaps the best way to separate them is to think of the first axis as a long-term goal that we care about reaching, and of the second as the convenient properties that make that goal easy to manipulate.

That was way too dry. Let’s get back to examples.

First example: the variance

Let’s try to answer our hypothetical student’s question: why do we define the variance and the standard deviation the way we do? We’ll answer along our two axes.

First: what intuition do the variance and the standard deviation capture? This one is easy enough: they capture the spread of a random variable. It sounds almost tautological, but if a random variable has a bigger variance, it is more variable than one with a smaller variance. The standard deviation measures (very roughly) the width of where the random variable can be around its mean.

At this point, our hypothetical student interjects: there are many other measures which would capture roughly “the width of where the random variable can be around its mean”. For example, the L1 deviation \( \delta = E[|x-\mu|] \). Why don’t we use this one instead?
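The student has a point: both measures capture the spread intuition. The sketch below uses hypothetical helpers `std_dev` and `l1_dev`, applied to a fair die and to a narrower variable uniform on \( \{3, 4\} \) (both have mean 3.5), and both measures agree that the die is the more spread out of the two. So the first axis alone can’t decide between them.

```python
from fractions import Fraction
import math

def std_dev(dist):
    """sigma = sqrt(E[(x - mu)^2]) for a finite distribution {value: probability}."""
    mu = sum(p * x for x, p in dist.items())
    return math.sqrt(sum(p * (x - mu) ** 2 for x, p in dist.items()))

def l1_dev(dist):
    """delta = E[|x - mu|] for a finite distribution {value: probability}."""
    mu = sum(p * x for x, p in dist.items())
    return sum(p * abs(x - mu) for x, p in dist.items())

die = {k: Fraction(1, 6) for k in range(1, 7)}   # spread over 1..6
narrow = {3: Fraction(1, 2), 4: Fraction(1, 2)}  # same mean, less spread

print(l1_dev(die), round(std_dev(die), 4))  # 3/2 1.7078
print(l1_dev(narrow), std_dev(narrow))      # 1/2 0.5
```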

The reason \( \delta \) isn’t great is that it isn’t mathematically convenient. The key property the variance offers is that it adds up over independent variables: if \( x \) and \( y \) are independent, then \( \text{var}(x+y) = \text{var}(x) + \text{var}(y) \). Sums of random variables are ubiquitous, so this is a really important property, and \( \delta \) obeys no comparable rule. There are plenty of other ways in which the variance is a convenient concept, but I believe this to be the key one (though if you have more, please tell me). The standard deviation inherits this convenience, since for independent variables \( \sigma_{x+y} = \sqrt{\sigma_x^2 + \sigma_y^2} \), which makes it a better way of measuring the width of a random variable.
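To make the difference concrete, here is a small sketch that takes two independent fair coin flips with values 0 and 1, computes the exact distribution of their sum by a discrete convolution, and compares both measures. The helper names `add_independent`, `variance` and `l1_dev` are mine, for illustration only. The variances add up; the L1 deviations do not.

```python
from fractions import Fraction

def add_independent(d1, d2):
    """Distribution of X + Y for independent X ~ d1, Y ~ d2 (discrete convolution)."""
    out = {}
    for x, px in d1.items():
        for y, py in d2.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out

def variance(dist):
    mu = sum(p * x for x, p in dist.items())
    return sum(p * (x - mu) ** 2 for x, p in dist.items())

def l1_dev(dist):
    mu = sum(p * x for x, p in dist.items())
    return sum(p * abs(x - mu) for x, p in dist.items())

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
two = add_independent(coin, coin)  # {0: 1/4, 1: 1/2, 2: 1/4}

print(variance(coin) * 2, variance(two))  # 1/2 1/2  (variances add)
print(l1_dev(coin) * 2, l1_dev(two))      # 1 1/2    (L1 deviations do not)
```

Exact fractions are used so that the comparison isn’t blurred by floating-point rounding.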

We can now give a short answer to our student. We define the variance this way because it captures something we care about, the spread of a random variable, and because it has good mathematical properties, a key one being that it adds up over independent variables.

A context in which variance isn’t the good concept

Variance is not always the best measure of a random variable’s width, though, which is why we don’t always want to work with it. For example, when we use Bayes’ formula, we are not working with sums of random variables but with products of density functions. The variance still captures the right intuition, but it stops being so mathematically convenient. One concept that does work well for products of density functions is the Fisher information, but I won’t describe it here because that would take too long.
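Here is a minimal numerical sketch of that contrast, under a strong simplifying assumption: both densities are Gaussian, a case where the Fisher information about the location is \( 1/\sigma^2 \). It multiplies two Gaussian densities on a grid (the grid bounds and resolution are arbitrary choices of mine, as is the helper name `product_of_gaussians`), normalizes the result as Bayes’ formula would, and checks that the Fisher information of the product equals \( 1/\sigma_1^2 + 1/\sigma_2^2 \), while the variances do not add. I wouldn’t expect this exact additivity to carry over to arbitrary densities, so treat it as an illustration of the Gaussian case only.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def product_of_gaussians(mu1, s1, mu2, s2, lo=-20.0, hi=20.0, n=40001):
    """Normalize p1(x) * p2(x) on a uniform grid; return (variance, location Fisher information)."""
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    q = [gaussian_pdf(x, mu1, s1) * gaussian_pdf(x, mu2, s2) for x in xs]
    z = sum(q) * dx  # normalizing constant, as in Bayes' formula
    q = [v / z for v in q]
    mean = sum(x * v for x, v in zip(xs, q)) * dx
    var = sum((x - mean) ** 2 * v for x, v in zip(xs, q)) * dx

    def score(x):
        # d/dx log q = d/dx log p1 + d/dx log p2; the constant z drops out.
        return -(x - mu1) / s1 ** 2 - (x - mu2) / s2 ** 2

    fisher = sum(score(x) ** 2 * v for x, v in zip(xs, q)) * dx  # J = E_q[score^2]
    return var, fisher

s1, s2 = 1.0, 2.0
var, fisher = product_of_gaussians(0.0, s1, 3.0, s2)
print(round(fisher, 6), 1 / s1**2 + 1 / s2**2)  # 1.25 1.25  (Fisher information adds)
print(round(var, 6), s1**2 + s2**2)             # 0.8 5.0    (variances do not)
```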

Why it matters

I believe it’s quite important to keep in mind what makes a good mathematical concept. First of all, when learning a new subject, it’s really important to learn its structure. If, for every mathematical concept you encounter, you can identify what intuitions it captures and the various ways and contexts in which it’s mathematically convenient, that structures your thinking and helps you learn faster and better. Math is all about structure, and making sure your understanding of math is well-structured is very important.

The flip side is that, when teaching, it’s also very important to keep in mind why concepts are good. By structuring your presentations and communicating clearly what makes the concepts you use good, you help your audience understand the subject, and everybody benefits.

Finally, it’s very important to keep in mind what makes a good mathematical concept good when doing research. Sometimes a problem can’t be solved with the concepts a field currently uses. Such problems require developing new concepts, and having a clear idea of what you should be looking for is very important there.


A good concept captures something we care about while being convenient to work with. Keep this in mind when doing research, and when learning and teaching math: being explicit about structure can only help you.

As always, feel free to point out any inaccuracies, errors or spelling mistakes, and to send me comments by email! I’ll be glad to hear from you.