The Median is asymptotically Gaussian

The other day, I came across a really basic fact which I should have known a long time ago: the empirical median is approximately distributed as a Gaussian in the large-data limit! Let me share the proof with you: hopefully this will help me remember both the proof and the theorem in the future.

Let’s start by stating the theorem. Let $X_i$ be $2n+1$ IID random variables, and let $\hat{m}$ be their empirical median (note that since we have an odd number of datapoints, the empirical median is straightforward to find: just order the $X_i$ and take the value at index $n+1$ in the ordered list). The theorem states that, in the limit of a large dataset $n \rightarrow \infty$, the empirical median has a Gaussian distribution. Even better, we know exactly which random variable it is at any finite $n$: a Beta random variable with parameters $(\alpha = n+1, \beta = n+1)$ deformed by the inverse of the CDF $F$ of the $X_i$:

$\displaystyle \widehat{m} = F^{-1} ( \beta ), \qquad \beta \sim \mathrm{Beta}(n+1, n+1)$
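Since the finite-$n$ statement is exact, it is easy to check numerically. Here is a minimal Monte-Carlo sketch (the sample sizes, seed, and variable names are arbitrary choices of mine), comparing empirical medians of standard normal samples with Beta draws pushed through the normal inverse CDF:

```python
import random
import statistics

random.seed(0)

n = 15                                   # 2n + 1 = 31 datapoints per median
trials = 20_000
F_inv = statistics.NormalDist().inv_cdf  # inverse CDF of N(0, 1)

# Direct route: empirical medians of 2n + 1 standard normal samples.
direct = [
    sorted(random.gauss(0, 1) for _ in range(2 * n + 1))[n]
    for _ in range(trials)
]

# Theorem's route: Beta(n+1, n+1) draws pushed through F^{-1}.
via_beta = [F_inv(random.betavariate(n + 1, n + 1)) for _ in range(trials)]

# If the theorem holds, the two samples come from the same distribution,
# so their moments should match up to Monte-Carlo noise.
print(statistics.mean(direct), statistics.mean(via_beta))
print(statistics.stdev(direct), statistics.stdev(via_beta))
```

Both routes should produce draws from the same distribution, so the printed moments agree up to sampling noise.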

In order to prove this result, let’s first focus on the case in which the $X_i$ are uniform random variables on $[0, 1]$. Let’s compute the probability density $p(\widehat{m} = x)$. There are $2n+1$ possibilities for the index of the median, all equivalent by symmetry, so we can focus on the case in which $X_1$ is the median:

$\displaystyle p(\widehat{m} = x) = (2n+1)\, p(\widehat{m} = X_1 = x)$

We then need to worry about the other indexes: $n$ points need to be above $x$ and $n$ need to be below. This gives $\binom{2n}{n}$ possible arrangements, and we can once more focus on a single one: $X_2 \dots X_{n+1}$ are below $x$ and $X_{n+2} \dots X_{2n+1}$ are above.

$\displaystyle p(\widehat{m} = x) = (2n+1)\binom{2n}{n} p(\widehat{m} = X_1 = x \text{ AND } X_2 \dots X_{n+1} < x \text{ AND } X_{n+2} \dots X_{2n+1} > x)$

Finally, the $X_i$ are IID, so the probability above factorizes and is straightforward to compute:

$\displaystyle p(\widehat{m} = x) =(2n+1)\binom{2n}{n} p(X_1 = x) \prod_{i=2}^{n+1} p(X_i < x) \prod_{j=n+2}^{2n+1} p(X_j>x)$

For uniform variables, the density is $p(X_1 = x) = 1$, and the CDF gives $p(X_i < x) = x$ and $p(X_j > x) = 1 - x$, so:

$\displaystyle p(\widehat{m} = x) =(2n+1) \binom{2n}{n} (1-x)^n x^n$

$\displaystyle p(\widehat{m} = x) = \frac{(2n+1)!}{n! n!} (1-x)^n x^n$

which we recognize as a Beta distribution with parameters $(\alpha = n+1, \beta = n+1)$.
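We can check this Beta result directly by simulation. Here is a quick sketch (the value of $n$, the seed, and the number of trials are arbitrary choices) comparing the moments of simulated uniform medians to the known mean and variance of a Beta$(n+1, n+1)$:

```python
import random
import statistics

random.seed(0)

n = 20                      # dataset size is 2n + 1 = 41
trials = 50_000

# Empirical medians of 2n + 1 IID Uniform(0, 1) samples.
medians = [
    sorted(random.random() for _ in range(2 * n + 1))[n]
    for _ in range(trials)
]

# Beta(n+1, n+1) has mean 1/2 and variance 1 / (4 * (2n + 3)).
beta_mean = 0.5
beta_var = 1 / (4 * (2 * n + 3))

print(statistics.mean(medians), beta_mean)
print(statistics.variance(medians), beta_var)
```

The simulated mean and variance should land on the Beta values up to Monte-Carlo noise.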

OK, now we know what happens for uniform distributions. How can we extend this result to the general case? To do so, we have to remember that any random variable can be constructed from a uniform one using the inverse of its CDF. Thus, we can construct the $X_i$ from IID uniform variables $U_i$ as:

$\displaystyle X_i = F^{-1}(U_i)$
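This inverse-CDF construction is easy to illustrate concretely. As a small sketch, here is the case of an Exponential(1) variable, whose inverse CDF is $F^{-1}(u) = -\ln(1-u)$ (the sample size and seed are arbitrary choices):

```python
import math
import random

random.seed(0)

# Inverse-CDF sampling: if U ~ Uniform(0, 1), then X = F^{-1}(U) has CDF F.
# For Exponential(1), F(x) = 1 - exp(-x), so F^{-1}(u) = -ln(1 - u).
def exp_inv_cdf(u):
    return -math.log1p(-u)

samples = [exp_inv_cdf(random.random()) for _ in range(100_000)]

# Exponential(1) has mean 1; the sample mean should be close to it.
print(sum(samples) / len(samples))
```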

Furthermore, the function $F^{-1}$ is monotonic. Thus, the median of the $X_i$ is the image under $F^{-1}$ of the median of the $U_i$. Since the median of the $U_i$ has a Beta distribution, the median of the $X_i$ has a deformed Beta distribution:

$\displaystyle \hat{m} = F^{-1} ( \beta )$

You might notice that we haven’t talked about Gaussians yet, but we’re almost there. Beta distributions become Gaussian, with variance tending to 0, in the limit $\alpha \rightarrow \infty, \beta \rightarrow \infty$ with the ratio $\alpha / \beta$ held constant. This is exactly what happens here when $n \rightarrow \infty$: we have thus proved that in the uniform case the median becomes Gaussian. In the general case, because the variance of the Beta goes to 0 as $n$ grows, the Beta concentrates on a shrinking interval around $1/2$; on that interval the non-linear function $F^{-1}$ is well approximated by its tangent line, so the deformed Beta, and with it the median, also becomes Gaussian.
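To see the Beta-to-Gaussian convergence in action, here is a small sketch comparing quantiles of Beta$(n+1, n+1)$ samples with those of the matching Gaussian, which has mean $1/2$ and variance $1/(4(2n+3))$ (the value of $n$, the seed, and the sample size are arbitrary choices):

```python
import math
import random
import statistics

random.seed(0)

n = 200                      # large n: Beta(n+1, n+1) is close to Gaussian
a = n + 1
samples = sorted(random.betavariate(a, a) for _ in range(50_000))

# Matching Gaussian: mean 1/2, variance 1 / (4 * (2a + 1)) = 1 / (4(2n + 3)).
sigma = math.sqrt(1 / (4 * (2 * a + 1)))
gauss = statistics.NormalDist(mu=0.5, sigma=sigma)

# Compare a few empirical Beta quantiles with the Gaussian quantiles.
for q in (0.1, 0.25, 0.5, 0.75, 0.9):
    emp = samples[int(q * len(samples))]
    print(q, emp, gauss.inv_cdf(q))
```

The two quantile columns should agree closely, and the agreement improves as $n$ grows.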

Note that this property holds more generally for all empirical quantiles of IID datapoints. Would you be able to prove it? You just have to slightly modify the steps presented here.
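As a hint for this generalization: the same counting argument shows that the $k$-th order statistic of $N$ IID uniform variables follows a Beta$(k, N-k+1)$ distribution. A quick simulation sketch (with $N$, $k$, the seed, and the number of trials chosen arbitrarily):

```python
import random
import statistics

random.seed(0)

# The k-th order statistic of N IID Uniform(0, 1) samples is Beta(k, N - k + 1):
# the same argument as for the median, with k - 1 points below and N - k above.
N, k = 19, 5                     # roughly the empirical 25% quantile
trials = 50_000

order_stats = [
    sorted(random.random() for _ in range(N))[k - 1]
    for _ in range(trials)
]

a, b = k, N - k + 1                            # Beta parameters
beta_mean = a / (a + b)                        # = k / (N + 1)
beta_var = a * b / ((a + b) ** 2 * (a + b + 1))

print(statistics.mean(order_stats), beta_mean)
print(statistics.variance(order_stats), beta_var)
```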

Lessons from my first course (2)

Three months later, I’m finally coming back to this blog to finish gathering my thoughts on my first class. And since I took such a long break, the spring semester is now over, which means I can also reflect on my second and third classes (which, I think, went better).

In my previous post, I tried to highlight what went right and what went wrong. Now I’ll try to figure out how to make my next classes better by analyzing why things went the way they did.

One big issue with my first class was that I ended the semester extremely tired, and it happened again this spring. Part of it is understandable: these are my first courses, I have a lot to learn, and preparing each class takes me a long time. Part of the solution will thus come from my summer resolution: finishing August with all my material ready for the autumn-semester classes, and having the material for the spring classes ready by the end of January. I can then hopefully spend only half a day each week preparing each class, leaving me less stressed and less worn out when the end of the semester comes.

The second big issue was that I was blindsided by the fact that the students didn’t like the class. That’s pretty easy to fix, however. This semester, I twice asked the student representatives to gather feedback from their peers so that I could get a better picture of how the students felt. I’ll try to up this to three times, once per month, for my next classes. I think this matters because students are shy about expressing the problems they have with a class: they are willing to give feedback, I just need to ask them for it. My biggest worry is that the feedback won’t be honest; I’ll try to be careful there.

On top of this, I feel like I have learned a little about good teaching practices. I’ll take the time to reflect on that in another blog post. Armed with this new knowledge, I hope that my classes next year can be better!