Legislating on statistical inference

Attention conservation notice: car insurance companies in France are forbidden
from adapting their premiums to the sex of the insured, even though women are
less likely to have accidents than men. This leads me to write long confusing
paragraphs on statistics versus racism / prejudice versus legislation.

I was recently told by a good friend that car insurance companies in France have
lost the ability to adapt their premiums to the sex of the insured. Insurance
companies would like to be able to do this because women are less likely to have
accidents, and so they should pay lower premiums. More generally, insurance is a
form of betting: the company is offering you a bet on whether or not you will run
into trouble. However, this bet is set up to minimize your risk: if you do run
into trouble, they give you money or aid to make sure that the trouble is
reduced.

Like all people offering bets, insurance companies are thus in the business of
performing accurate inference on whether or not a given person will run into
trouble. For health insurance, they want to predict whether or not the insured
will become sick (and also exactly how sick they might become: a flu is less
expensive to treat than cancer or diabetes).
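To make the betting picture concrete, here is a minimal sketch (with entirely made-up numbers and a hypothetical `fair_premium` helper) of how a premium roughly tracks the expected payout for a group of insured people:

```python
# Illustrative sketch, not how any real insurer prices risk: an actuarially
# "fair" premium is roughly the expected payout, P(claim) * average claim cost,
# plus a loading for the insurer's own costs and margin.

def fair_premium(p_claim: float, avg_claim_cost: float, loading: float = 0.2) -> float:
    """Expected payout, inflated by a loading factor."""
    return p_claim * avg_claim_cost * (1 + loading)

# Hypothetical accident probabilities and claim costs for two groups of drivers.
print(fair_premium(p_claim=0.05, avg_claim_cost=4000))  # lower-risk group
print(fair_premium(p_claim=0.08, avg_claim_cost=4000))  # higher-risk group
```

The point of the sketch is only that the better the insurer's inference about who will run into trouble, the more finely it can split people into groups with different premiums.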

This is where a very ugly little fact of life and of statistics rears its head.
This is something that I’m still grappling with, so excuse any naive thoughts I
might have on this issue. The problem is the following: how do you respond when
someone observes that, for example, black people in the USA commit a
disproportionate fraction of crime? (see this link) Does this observation then
justify having racist thoughts? I’d like to answer a strong “NO” to that
question, but I really can’t find a way to argue it. I can’t find an argument in
which I’m not claiming that skin color is a variable you shouldn’t base your
inference on simply because it is “morally illegal”. This troubles me greatly.

If this is indeed the direction we want to follow, that some features of the
data we collect are “morally illegal”, then can we maybe try to enumerate all
such features? Obviously, sex is also off the table (your algorithm shouldn’t be
sexist). Similarly, sexual orientation is off. Age has a weirder status: you can
obviously be biased against young people and/or old people (and a lot of people
seem to be extremely biased based on age, which I find very weird), but it
somehow feels not as wrong to stereotype people based on age. That seems to
round up all the clearly illegal features. (One “-ism” that is clearly missing
from my list is classism: the prejudice against people who aren’t from the right
social background, like working-class people. This is because I couldn’t find
any clear feature for class.)

We then run into muddy territory: features that by themselves should be
innocent, but that can be extremely indicative, alone or in combination, of an
illegal feature. Consider postal address, for example: it is extremely
indicative of social status and possibly of age or sexual orientation (think of
the Village in New York, le Marais in Paris, etc.). Similarly, names are a big
problem: they are extremely indicative of ancestry. So on and so forth, we find
that many of the features available to insurance companies may be indicative
enough to allow a good inference on the illegal features we were discussing
before.
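A toy simulation makes the proxy problem concrete. Everything below is invented (the groups, the postal areas, the accident rates), but it shows the mechanism: even if the protected attribute is never handed to the insurer, a correlated "innocent" feature lets premiums differ between the groups anyway.

```python
# Toy simulation (all numbers invented) of the proxy-feature problem:
# accident risk depends only on a hidden protected attribute, but the
# postal area is correlated with that attribute, so pricing by area
# ends up pricing by group.

import random

random.seed(0)

def simulate_person():
    group = random.random() < 0.5  # protected attribute, hidden from the insurer
    # 80% of one group lives in area "A", 80% of the other in area "B".
    area = "A" if (random.random() < 0.8) == group else "B"
    # In this toy world, accident risk depends only on the group.
    accident = random.random() < (0.05 if group else 0.08)
    return group, area, accident

people = [simulate_person() for _ in range(100_000)]

def accident_rate(rows):
    return sum(acc for _, _, acc in rows) / len(rows)

# Rates by protected group (what the law forbids using)...
for g in (True, False):
    print("group", g, round(accident_rate([p for p in people if p[0] == g]), 4))

# ...and rates by postal area (what the insurer is allowed to use):
for area in ("A", "B"):
    print("area", area, round(accident_rate([p for p in people if p[1] == area]), 4))
```

Running it, the per-area rates sit close to the per-group rates: pricing by postal area recovers most of the forbidden distinction without ever touching the protected attribute.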

I’m not sure how to solve this issue: forbidding insurance companies by law from
using sex as a predictor is certainly one way, but I’m not sure it’s really the
right solution: it feels more like playing judicial whack-a-mole than like a
constructive, general solution. I’m not even sure there really is a problem.
I’d be curious to hear your thoughts.