The Median is asymptotically Gaussian

The other day, I came across a really basic fact that I should have known a long time ago: the empirical median is approximately distributed as a Gaussian in the large-data limit! Let me share the proof with you: hopefully this will make me remember both the proof and the theorem in the future.


Let’s start by stating the theorem. Let X_1, \dots, X_{2n+1} be 2n+1 IID random variables with CDF F, and let \widehat{m} be their empirical median (note that since we have an odd number of datapoints, the empirical median is straightforward to find: just order the X_i and take the value at index n+1 in the ordered list). The theorem states that, in the limit of a large dataset n \rightarrow \infty, the empirical median has a Gaussian distribution. Even better: we know its exact distribution at every finite n: it is a Beta distribution with parameters (\alpha = n+1, \beta = n+1) deformed by the inverse of the CDF of the X_i:

\displaystyle \widehat{m} = F^{-1} ( B ), \qquad B \sim \mathrm{Beta}(n+1, n+1)
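Before diving into the proof, here is a quick numerical sanity check of the statement: a minimal simulation sketch in Python, using NumPy and SciPy. The choice of exponential datapoints, the seed, and the sample sizes are all arbitrary choices on my part:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50                       # dataset size is 2n + 1 = 101
N = 2 * n + 1

# Empirical medians of many exponential datasets (here F^{-1} = stats.expon.ppf).
medians = np.median(rng.exponential(size=(100_000, N)), axis=1)

# The theorem predicts: median ~ F^{-1}(B) with B ~ Beta(n+1, n+1).
predicted = stats.expon.ppf(rng.beta(n + 1, n + 1, size=100_000))

# The quantiles of the two samples should nearly coincide.
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.round(np.quantile(medians, qs), 3))
print(np.round(np.quantile(predicted, qs), 3))
```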


In order to prove this result, let’s first focus on the case in which the X_i are uniform random variables on [0, 1]. Let’s compute the density p(\widehat{m} = x). There are 2n+1 possibilities for the index of the median, all playing symmetric roles, so we can focus on the case in which X_1 is the median and multiply by 2n+1:

\displaystyle p(\widehat{m} = x) = (2n+1) \, p(\widehat{m} = X_1 = x)

We then need to worry about the other indices: n points need to be above x and n need to be below. This gives \binom{2n}{n} possible ways of splitting them. We can once more focus on a single possibility, X_2 \dots X_{n+1} below x and X_{n+2} \dots X_{2n+1} above, and multiply by the count:

\displaystyle p(\widehat{m} = x) = (2n+1)\binom{2n}{n} p(\widehat{m} = X_1 = x \text{ AND } X_2 \dots X_{n+1} < x \text{ AND } X_{n+2} \dots X_{2n+1} > x)

Finally, since all the X_i are IID, the probability above factorizes and is straightforward to compute:

\displaystyle p(\widehat{m} = x) =(2n+1)\binom{2n}{n}  p(X_1 = x) \prod_{i=2}^{n+1} p(X_i < x) \prod_{j=n+2}^{2n+1} p(X_j>x)

\displaystyle p(\widehat{m} = x) =(2n+1) \binom{2n}{n}  (1-x)^n x^n

\displaystyle p(\widehat{m} = x) = \frac{(2n+1)!}{n! n!}  (1-x)^n x^n

which we recognize as a Beta distribution with parameters (\alpha = n+1, \beta = n+1): the density is proportional to x^n (1-x)^n, and the prefactor \frac{(2n+1)!}{n! \, n!} = \frac{1}{B(n+1, n+1)} is exactly the right normalization constant.
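If you want to double-check this derivation numerically, here is a small sketch (the sample sizes and seed are arbitrary; a Kolmogorov-Smirnov test compares the simulated medians to the Beta(n+1, n+1) CDF):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 20                       # 2n + 1 = 41 uniform datapoints per dataset
medians = np.median(rng.uniform(size=(50_000, 2 * n + 1)), axis=1)

# Kolmogorov-Smirnov test against Beta(n+1, n+1): a tiny statistic and a
# large p-value mean the simulated medians match the derived density.
print(stats.kstest(medians, stats.beta(n + 1, n + 1).cdf))
```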


OK. Now we know what happens for uniform distributions. How can we extend that result to the general case? In order to do so, we have to remember that any random variable can be constructed from a uniform random variable on [0, 1] by applying the inverse of its CDF (this is the inverse transform sampling trick). Thus, we can construct the X_i as:

\displaystyle X_i = F^{-1}(U_i)
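As a quick illustration of this trick, here is a sketch taking F to be the standard Gaussian CDF (an arbitrary choice of target distribution; SciPy calls the inverse CDF ppf):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
U = rng.uniform(size=100_000)

# Push uniform samples through the inverse CDF of the target distribution:
# the result is distributed according to that target.
X = stats.norm.ppf(U)
print(X.mean(), X.std())     # should be close to 0 and 1
```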

Furthermore, the function F^{-1} is monotonic, so it preserves the ordering of the sample: the median of the X_i is the image under F^{-1} of the median of the U_i. Since the median of the U_i has a Beta distribution, this means that the median of the X_i has a deformed Beta distribution:

\displaystyle \widehat{m} = F^{-1} ( B ), \qquad B \sim \mathrm{Beta}(n+1, n+1)
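Here is a two-line check that the median indeed commutes with the monotonic map F^{-1} (again with the Gaussian inverse CDF as an arbitrary example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
U = rng.uniform(size=41)     # odd number of points, so the median is unique

# Mapping then taking the median equals taking the median then mapping,
# because a monotonic map preserves the ordering of the sample.
print(np.median(stats.norm.ppf(U)))
print(stats.norm.ppf(np.median(U)))
```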


You might notice that we haven’t talked about Gaussians yet, but we’re almost there. A Beta(\alpha, \beta) distribution becomes Gaussian, with variance tending to 0, in the limit \alpha \rightarrow \infty, \beta \rightarrow \infty with the ratio \alpha / \beta held constant. This is exactly what happens here when n \rightarrow \infty: we have thus proved that in the uniform case, the median becomes Gaussian, concentrated around 1/2 with variance 1/(4(2n+3)) \approx 1/(8n). In the general case, because this variance goes to 0 as n grows, the non-linear function F^{-1} is only ever evaluated in a shrinking neighborhood of 1/2, where it is approximately linear (this is the delta method). A linear function of a Gaussian variable is still Gaussian, so the median of the X_i is also asymptotically Gaussian, centered at the true median m = F^{-1}(1/2).
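Putting the two approximations together gives the classical formula \widehat{m} \approx \mathcal{N}\big(m, 1/(4 N f(m)^2)\big), where N = 2n+1 is the dataset size and f is the density of the X_i. Here is a minimal sketch checking it, with exponential datapoints as an arbitrary example (for which m = \ln 2 and f(m) = 1/2):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
N = 2 * n + 1
medians = np.median(rng.exponential(size=(50_000, N)), axis=1)

# Asymptotic prediction: Gaussian with mean m = F^{-1}(1/2) = ln(2) and
# standard deviation 1 / (2 * sqrt(N) * f(m)), with f(m) = exp(-m) = 1/2.
m = np.log(2)
print(medians.mean(), medians.std())
print(m, 1 / (2 * np.sqrt(N) * np.exp(-m)))
```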


Note that this property holds in general for all empirical quantiles of IID datapoints. Would you be able to prove it? You just have to slightly modify the steps I have presented here.
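And if you want to check your answer numerically: the same counting argument gives a Beta(k, N + 1 - k) distribution for the k-th order statistic of N uniform datapoints, which is what the sketch below tests (the values of N, k and the seed are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
N, k = 101, 25               # k-th order statistic, roughly the 25% quantile

order_stats = np.sort(rng.uniform(size=(50_000, N)), axis=1)[:, k - 1]

# The k-th order statistic of N uniforms should follow Beta(k, N + 1 - k),
# which itself becomes Gaussian as N grows with k/N held fixed.
print(stats.kstest(order_stats, stats.beta(k, N + 1 - k).cdf))
```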