# Struggles with probability topology: strengthening the weak-topology

Today, I continue my series on topologies on probability distributions. This series started when I realized that the common topologies (the weak and total-variation topologies) are extremely weak, in that they don’t imply convergence of moments.

In this series, I mostly looked at alternative topologies: the KL, the MGF, the Wasserstein, etc. Today, I will instead focus on how to strengthen the weak topology: adding conditions that ensure that the limit is better behaved than the weak topology alone guarantees.

## The weak-topology

Let’s first recall how the weak topology works. A sequence of random variables $X_n$ converges weakly to $X$ iff the expected value of every bounded continuous statistic $f(x)$ converges:

$\displaystyle E( f(X_n) ) \rightarrow E( f(X) )$

That’s the basic definition, but the most useful characterization is that weak convergence is equivalent to convergence of the expected values of all statistics of the form $\exp(itx)$ for $t \in \mathbb{R}$ (which you will recognize as the Fourier basis).
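To make this concrete, here is a small numerical sketch (my own illustration, using NumPy and an assumed toy sequence $X_n = N(1/n, 1)$, which converges weakly to $N(0,1)$) that estimates these Fourier statistics by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

def char_fn(samples, t):
    """Monte Carlo estimate of E[exp(i*t*X)] from samples of X."""
    return np.mean(np.exp(1j * t * samples))

# Toy sequence (my choice): X_n = N(1/n, 1) converges weakly to X = N(0, 1).
t = 1.7
target = np.exp(-t**2 / 2)  # characteristic function of N(0, 1) at t
errors = []
for n in [1, 10, 100]:
    x_n = rng.normal(loc=1.0 / n, scale=1.0, size=200_000)
    errors.append(abs(char_fn(x_n, t) - target))
print(errors)  # the estimates approach the limit value as n grows
```

Note that $\exp(itx)$ is bounded and continuous, which is exactly why convergence of these statistics is tied to weak convergence.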

The weak topology is the most usual topology, but it has what I consider to be confusing behavior. Consider the following example:

$X_n$ is a mixture distribution: with probability $\frac{n-1}{n}$, pick a value from a Gaussian centered at 0 with variance 1; with probability $\frac{1}{n}$, pick instead from a Gaussian with mean $n$ and variance 1. The mean of $X_n$ is always 1, and its variance grows to infinity with $n$. However, $X_n$ converges weakly to a Gaussian centered at 0. Faced with a situation like this, I’d rather say that $X_n \rightarrow Y$ where $Y$ is a degenerate pseudo-Gaussian with mean 1 and infinite variance, or refuse to acknowledge that $X_n$ converges.
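This mixture is easy to simulate; the sketch below (a NumPy illustration of my own) shows the mean pinned at 1 and the variance blowing up, while the bulk of the samples settles into $N(0,1)$:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_mixture(n, size):
    """With prob (n-1)/n draw from N(0, 1); with prob 1/n draw from N(n, 1)."""
    outlier = rng.random(size) < 1.0 / n
    return np.where(outlier, rng.normal(n, 1.0, size), rng.normal(0.0, 1.0, size))

for n in [10, 100, 1000]:
    x = sample_mixture(n, 500_000)
    # The mean stays near 1 and the variance grows roughly like n, yet the
    # bulk looks Gaussian: P(|X_n| <= 2) tends to P(|Z| <= 2) for Z ~ N(0,1).
    print(n, x.mean(), x.var(), np.mean(np.abs(x) <= 2.0))
```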

### Ensuring some higher convergence

Weird convergence behavior under the weak topology always has the same structure as the example above: all statistics that grow faster than some critical statistic (in this case, everything that grows faster than $|x|$) see their expected value diverge, while all statistics growing slower than the critical statistic converge to the correct value. Thus, we can strengthen the weak convergence of a given sequence by identifying the “critical statistic” of that sequence, or at least by finding statistics which grow slower than it.

Let’s state this more precisely. If $X_n \rightarrow X$ weakly, and we know that some statistic is bounded along the sequence, $E( f(X_n) ) \leq K$, then the expected value of any statistic which is strictly dominated by $f$ in the tails converges to the correct value: for all continuous $g$ with $\frac{g(x)}{f(x)} \rightarrow 0$ as $|x| \rightarrow \infty$,

$\displaystyle E( g(X_n) ) \rightarrow E( g(X) )$

Why is that the case? Weird behavior under the weak topology is all about the fact that the weak topology controls the tails of the $X_n$ poorly. The condition $E( f(X_n) ) \leq K$ tells us that whatever probability mass sits in the tails is $O(f^{-1})$, so we are safe for all statistics that grow strictly more slowly; this is the classical uniform-integrability argument. Note that a merely bounded ratio $\frac{g}{f} \leq K_g$ is not enough: in the example above, $E( |X_n| )$ is bounded, yet $E( X_n ) = 1$ for all $n$ while $E( X ) = 0$.
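A quick numerical illustration of this on the earlier mixture example (a sketch of my own, not a proof): $f(x) = |x|$ has bounded expectation along the sequence, so the strictly dominated statistic $g(x) = \sqrt{|x|}$ converges to the right value, while the critical statistic $g(x) = x$ does not:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_mixture(n, size):
    """With prob (n-1)/n draw from N(0, 1); with prob 1/n draw from N(n, 1)."""
    outlier = rng.random(size) < 1.0 / n
    return np.where(outlier, rng.normal(n, 1.0, size), rng.normal(0.0, 1.0, size))

z = rng.normal(0.0, 1.0, 500_000)      # samples from the weak limit N(0, 1)
ref = np.mean(np.sqrt(np.abs(z)))      # E[sqrt(|X|)] under the limit

for n in [10, 100, 1000]:
    x = sample_mixture(n, 500_000)
    # f(x) = |x| has bounded expectation along the sequence, so
    # g(x) = sqrt(|x|), with g/f -> 0 in the tails, converges correctly...
    print(n, np.mean(np.abs(x)), np.mean(np.sqrt(np.abs(x))) - ref)
    # ...while g(x) = x, which grows exactly like f, converges to 1, not 0.
    print(n, x.mean())
```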

### Ensuring MGF convergence

The strongest topology I have talked about so far is the MGF topology. MGF convergence implies weak convergence, convergence of all moments, and convergence of all statistics that grow slower than some exponential.

With the above condition, we can ensure MGF convergence from weak convergence AND boundedness of $E( \exp( r |X_n| ) )$ for all $r > 0$ (boundedness at a single $r$ already gives convergence of all statistics strictly dominated by that exponential, hence of the MGF on $|t| < r$). We thus have a way to go from our weakest convergence to our strongest convergence in an instant.
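As a sanity check, here is a toy sequence of my own choosing ($X_n = N(1/n, 1)$) where the exponential statistic is bounded, so the moments converge as promised:

```python
import numpy as np

rng = np.random.default_rng(3)
r = 1.0

# Toy sequence (my choice): X_n = N(1/n, 1) converges weakly to N(0, 1),
# and E[exp(r*|X_n|)] stays bounded, so e.g. the second moment converges to 1.
for n in [1, 10, 100]:
    x = rng.normal(1.0 / n, 1.0, 200_000)
    print(n, np.mean(np.exp(r * np.abs(x))), np.mean(x**2))
```

Contrast this with the mixture example, where $E( \exp( r |X_n| ) )$ blows up for every $r > 0$.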

### Can opposite violations happen?

We know that, under the weak topology, sometimes some expected values are infinite when they should be finite. Can weird violations also happen the opposite way, i.e. can there be $f$ such that $E( f(X) ) = \infty$ but $E( f(X_n) )$ remains bounded?

Thankfully, this can’t happen (for continuous $f$ at least). First, remark that we can restrict ourselves to nonnegative $f$, because for $f$ that aren’t absolutely convergent I don’t know how to define $E( f(X) )$. Showing that $E( f(X_n) )$ eventually exceeds any bound we set is straightforward. Choose $B$, and find a compact interval $I$ such that $E( f(X) 1(X \in I) ) > 2B$, which is possible since $E( f(X) ) = \infty$. After smoothing the indicator into a continuous cutoff, $f(x) 1(x \in I)$ becomes a bounded continuous statistic, so by weak convergence of $X_n$ we get $E( f(X_n) ) > B$ for $n$ large enough.
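As a concrete illustration (a NumPy sketch with an example of my own choosing), take $X$ standard Cauchy and $f(x) = |x|$, so $E( f(X) ) = \infty$, and let $X_n$ be $X$ clipped to $[-n, n]$, which converges weakly to $X$; as the argument predicts, $E( |X_n| )$ exceeds any bound:

```python
import numpy as np

rng = np.random.default_rng(4)

# X = standard Cauchy, f(x) = |x|, so E[f(X)] is infinite.
# X_n = X clipped to [-n, n] converges weakly to X as n grows.
c = rng.standard_cauchy(2_000_000)
means = []
for n in [10, 100, 1000]:
    means.append(np.mean(np.abs(np.clip(c, -n, n))))
print(means)  # E[|X_n|] grows without bound, roughly like (2/pi)*log(n)
```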

## Complete agreement

Let $X_n$ be some weakly convergent sequence with limit $X$. We saw that the only weird thing that can happen is that some statistics with finite expected value under $X$ have a limit expected value that doesn’t match (very possibly infinite). We can then define the absolute strongest convergence, which would require that for any statistic with finite expected value under $X$, the expected values converge to the correct value. In that case, for ALL statistics, the expected value under $X$ matches the limit of the expected values.

In practice, this complete agreement topology is all about tail behavior so it can probably be formalized in that way. I conjecture that something like:

$\displaystyle \frac{p(x)}{p_n(x)} \leq K_1 \text{ and }\frac{p_n(x)}{p(x)} \leq K_2$

and weak convergence would ensure “complete agreement” convergence. If you’re not convinced that my condition works, consider it for some examples like $p(x) \propto x^{-3}$ and $q(x) = \exp(-x)$. I’ll try to prove this in a further post.
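As a quick check on that pair of examples (with normalizations I chose so both are densities on $[1, \infty)$: $p(x) = 2 x^{-3}$ and $q(x) = e^{-(x-1)}$), the density ratio is indeed unbounded, so the conjectured condition correctly rules this pair out:

```python
import numpy as np

x = np.linspace(1.0, 50.0, 1000)
p = 2.0 * x**-3.0        # heavy tail: density ~ x^{-3}, normalized on [1, inf)
q = np.exp(-(x - 1.0))   # light tail: density exp(-(x-1)) on [1, inf)

ratio = p / q
print(ratio.max())  # the ratio p/q blows up: no constant K with p <= K*q exists
# Consistently, f(x) = x**2 has infinite expectation under p but finite under q.
```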

## Summary and conclusion

Today, I presented ways to strengthen the weak topology. We found that having weak convergence and boundedness of the expected value of one statistic ensures the correct convergence of all statistics that grow more slowly. In particular, this enabled us to give a very simple condition to ensure MGF convergence from weak convergence. From this, we also saw that we could define a “complete agreement” topology, which asks that the expected values of all statistics are matched, and conjectured that this simply requires weak convergence and similar tail behavior.

### A technical note

Throughout this post, I have avoided referring to equivalence classes of statistics because I was afraid that this might be confusing for those who are encountering the concept for the first time. For readers who understand (or think they understand) concepts like small-o, big-O, and big-Theta, try to rephrase what I said earlier in those terms.

As always, feel free to correct any inaccuracies, errors, or spelling mistakes, and to send comments to my email! I’ll be glad to hear from you.