Friday, September 09, 2016

Weapons of math destruction are helpful tools

Cathy O'Neill worked as a scholar at Columbia before she became a "quant" for D.E. Shaw in the fall of 2007, making sure that the company, the financial system, and the derivatives in particular would work flawlessly in the following year.

With these achievements, she is back with a book, willing to share her wisdom. I won't link to particular reviews and interviews because I don't think that any individual article in that pile contains innovative ideas. Instead, see e.g. Google News and Google.

Instead, let me summarize her points – which seem to be her only general points. She says that mathematics (big data, statistics, machine learning etc.) is evil because
  1. "It’s like you’re being put into a cult, but you don’t actually believe in it." People are being clumped according to some characteristics and the clumping is imperfect.
  2. "Math is racist." What she actually means is that the big data keep the poor poor and they keep the rich rich.
She therefore recommends abandoning the big data, algorithms, and calculations, and leaving all decisions to politically correct commissars.

But it's obvious that her reasoning – and especially her negative labels and vitriol she attaches to either neutral or truly ingenious methods – is absolutely demagogic or idiotic, depending on whether she realizes that it is idiotic.

First, she objects to the fact that people etc. are being clumped together with others who seem similar – even though they may turn out to be dissimilar in some cases. Indeed, that's the very point of these big data techniques. That's what these methods are all about and what makes them clever. It's rather shocking that this woman could have been hired as a "quant" if she hates the most general idea or goal of this whole occupation. With this "anti-quant" attitude among the "quants", the system couldn't really have worked too well.

Predictions made with these techniques aren't perfect. And there may be "better" or "worse" techniques of this kind. But they're typically better and more accurate than "no techniques" or "extremely primitive techniques". If you don't believe me, check how much more successful the experienced machine learners are in prediction competitions.

One question is which of the algorithms is the best one. That is a very detailed, technical question. But she attacks something more general: the assumption that there exist algorithms or techniques that are "close to the ideal" predictors of certain quantities.

You know, different approaches to a big data problem compete with each other (think of a competition). They reflect the creators' (programmers') philosophies or prejudices but that's OK because the more accurate philosophies or prejudices may end up being selected and the progress towards the impersonal perfection is made in that way.
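The competition between modelling philosophies described above can be sketched in a few lines. Everything here is my own invented illustration, not anything from O'Neill's book: two hypothetical prediction rules embody two "prejudices", and held-out data decides which one survives.

```python
# Toy sketch (invented rules and data): competing prediction "philosophies"
# are scored on held-out outcomes; the more accurate prejudice is selected.

def rule_optimist(x):
    # Philosophy A: almost everyone repays, regardless of characteristics.
    return 0.95

def rule_income(x):
    # Philosophy B: repayment probability tracks the income band x.
    return 0.9 if x >= 2 else 0.4

# Held-out data: (income band, repaid?) pairs the rules were not fit on.
holdout = [(3, 1), (1, 0), (2, 1), (1, 0), (3, 1)]

def loss(rule):
    # Mean squared error of the predicted probabilities.
    return sum((rule(x) - y) ** 2 for x, y in holdout) / len(holdout)

winner = min([rule_optimist, rule_income], key=loss)
print(winner.__name__)  # the philosophy with the smaller loss wins
```

The selection step is impersonal: whatever the programmers' prejudices were, the rule that fits reality better is the one that gets kept.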

Many businesses have to make similar predictions all the time. What is the probability that a given person \(Q\) – who is known to us only through some basic data about the person, \(\lambda_i(Q)\) – will repay the loan without problems? The big-data or machine-learning approach basically tries to reconstruct the most sensible probability \(P(\lambda_i)\) that agrees well enough with all the previous data but isn't too contrived or overfit.

This estimate of \(P(\lambda_i)\) isn't perfect for the person \(Q\) with characteristics \(\lambda_i(Q)\) – after all, the truly most precise probability doesn't depend just on the variables \(\lambda_i(Q)\) but also on some/many other variables \(\mu_j(Q)\) that are not known to the bank – and, equally importantly, on quantities that cannot be known at all because they will be decided "randomly" in the future (by Nature's quantum random generator, to put it in extreme terms). However, it is a much more accurate estimate than e.g. \(P=1/2\) or other similarly naive guesses.
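A minimal sketch of estimating \(P(\lambda_i)\) from past outcomes, with all data invented by me: group past borrowers by their known characteristics \(\lambda_i\) and estimate each group's repayment frequency, with Laplace smoothing so that sparsely populated groups are pulled toward the uninformative \(P=1/2\) rather than overfit.

```python
# Toy estimator (invented data): repayment probability per characteristic
# group, smoothed to avoid overfitting small groups.
from collections import defaultdict

def fit_repayment_probs(history, alpha=1.0):
    """history: list of (characteristics_tuple, repaid_bool) pairs."""
    counts = defaultdict(lambda: [0, 0])  # group -> [repaid, total]
    for lam, repaid in history:
        counts[lam][0] += int(repaid)
        counts[lam][1] += 1
    # Laplace smoothing pulls tiny groups toward 1/2 - exactly the
    # "no method" baseline the surrounding text mentions.
    return {lam: (r + alpha) / (n + 2 * alpha)
            for lam, (r, n) in counts.items()}

history = ([(("employed",), True)] * 90 + [(("employed",), False)] * 10
           + [(("unemployed",), True)] * 4 + [(("unemployed",), False)] * 6)

probs = fit_repayment_probs(history)
print(probs[("employed",)])    # close to the observed 0.9
print(probs[("unemployed",)])  # pulled toward 1/2: the group is small
```

The point of the smoothing parameter is precisely the trade-off in the text: agree with the previous data, but don't be too contrived.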

It's this precision that allows the bank to lend the money at low enough interest rates, which is good for the clients (and it's good for the bank to beat the competing lenders), while avoiding the risk that it will be losing the money. An accurate formula for \(P(\lambda_i)\) is helpful for the company or another user of the algorithms. Because the economy is basically composed of such companies or users, the algorithms are good for the economy as a whole.
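The link between the estimate's precision and the interest rate can be made explicit with a back-of-the-envelope formula (my illustration, not from the post): ignoring recoveries, fees, and risk premia, a lender breaks even when \(p(1+r) = 1 + f\), where \(p\) is the repayment probability and \(f\) the cost of funds.

```python
# Break-even lending rate as a function of the estimated repayment
# probability p. Simplified sketch: no recoveries, fees, or risk premium.

def break_even_rate(p_repay, funding_cost=0.02):
    # Break-even condition: p * (1 + r) = 1 + funding_cost.
    if not 0 < p_repay <= 1:
        raise ValueError("repayment probability must be in (0, 1]")
    return (1 + funding_cost) / p_repay - 1

print(round(break_even_rate(0.98), 4))  # low-risk borrower: cheap credit
print(round(break_even_rate(0.80), 4))  # risky borrower: much higher rate
```

A more accurate \(p\) lets the bank charge good borrowers less without losing money on bad ones, which is the competitive advantage the text describes.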

The algorithms may basically end up saying that if you're a white in the San Francisco area, you are very likely to be an unhinged climate alarmist. Gene Day may protest and say that he's a bigger climate skeptic than I am. And we know that his complaint is mostly justified because we know Gene much better than e.g. a generic employee of an intelligence service that tries to transfer all the dangerous climate alarmists to Gitmo. However, our special knowledge may be inaccessible to the employee of the intelligence service – or it may be ineffective to try to find out additional details.

Because of these limitations, it is often a good idea to only work with a limited set of variables and try to use the big data methods to guess the probability, interpolate and extrapolate other functions etc.

This method may look similar to the fallacies of the anthropic principle. In the anthropic principle, we're being clumped into a group of "intelligent beings" together with some rather randomly selected creatures in assorted different universes that are similar to us in some respects (respects claimed to be important) and dissimilar in others (claimed to be unimportant).

In fundamental science, this separation of characteristics that are important and those that are unimportant is artificial. If we're allowed to use some properties of the world around us to deduce something about the world, we should be allowed to use all of them, and then the anthropic reasoning becomes a useless tautology concluding that "we live in a world that has all the properties of the world in which we live". However, in the practical applications, the isolation of the known variables \(\lambda_i\) may be helpful to get better estimates.

Again, the reason why the big data estimates are helpful is that even if they're imprecise, they're just more precise than guesses based on primitive methods or no methods!

This is what Ms O'Neill demagogically hides. She says that something is wrong with the methods because their predictions are imperfect. But what she should be doing is to compare the precision of the methods with the precision of the simpler methods or non-methods that would be followed if the big data technologies weren't employed. And the answer – at least in many situations where the big data techniques are suitable – is that the big data techniques give better results. So their introduction is an improvement and the criticism referring to their imperfect quality is pure demagogy.
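The comparison that the paragraph above says is being omitted takes a few lines to carry out. With invented numbers: score an informed (still imperfect!) set of probability estimates against the naive \(P=1/2\) guess using the Brier score, i.e. the mean squared error of the predicted probabilities.

```python
# Sketch of the imperfect-vs-baseline comparison (all numbers invented).

def brier(preds, outcomes):
    # Brier score: mean squared error of probabilistic predictions.
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(outcomes)

outcomes = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]  # 1 = loan repaid, 0 = default
informed = [0.9, 0.8, 0.85, 0.2, 0.9, 0.3, 0.8, 0.9, 0.1, 0.85]
naive = [0.5] * len(outcomes)

# The informed estimates are imperfect, yet clearly beat the baseline.
print(brier(informed, outcomes) < brier(naive, outcomes))  # True
```

"Imperfect" and "worse than the alternative" are different claims; only the second one would justify abandoning the method.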

The same comment, "one needs to compare the methodology with its alternatives", applies to her claim that people's fates are sometimes being decided by algorithms that mostly depend on some characteristics that the affected person doesn't know. That's right, the algorithms may be a "black box". But what she hides is that when decisions are being made by powerful human beings, those can be even darker "black boxes" and the true reasons behind their decisions may be even more obscure to the affected person (and more illegitimate and corrupt and personal, too).

Her other, related complaint is that the methods keep the poor poor and they keep the rich rich.

Someone who wants a loan may belong to some "problematic" groups (at least from the lending viewpoint) that are poor, which generally makes them less likely to repay the loans as well. For this reason, they won't get a loan or they will have to pay a higher interest rate, and so on, and it's bad because it amplifies these people's or groups' misery. Cathy O'Neill is a leftist so she doesn't like it.

But the point is that there's absolutely nothing wrong about keeping the rich people rich and keeping the poor people poor. If anything, this result says that the method was neutral and didn't steal anything from anyone.

If you had a method that makes the rich people poor and the poor people rich, that would be bad because it means that the "method" has stolen something from someone – in this case, from the rich ones – and gave it to someone else – in this case, the poor ones. Communist revolutionaries may dream about such things but I am among those who think that this kind of a general revolution is counterproductive, immoral, and dangerous, and those who want to do such things should be neutralized.

Individual people live their lives as individuals – but they are also adding to the "record" of the many groups to which they belong. Every individual is unavoidably attached to – and depends on – these groups to some extent. They partly share their fate. It's actually ironic that leftists – who are mostly collectivists – sometimes try to demonize this fact. When you're a chimp, you just shouldn't be surprised that most people (and big data algorithms) expect that you may learn physics at most up to loop quantum gravity or something like that (even though there is a chance that you are a stringy exception). Their expectations are rational and they have the right to think rationally. It's good for them – and for the world – to think rationally.

Concerning "racism", well, when an impersonal, state-of-the-art machine learning algorithm de facto determines that the people of some skin color are much less likely to repay the loan, it doesn't necessarily mean that there's something immoral about the method, that "mathematics is racist". Most likely, it means that the reality is racist. If you use the term "racism" for the very fact that various (ethnic, racial, and other) groups of people differ – that they have different probabilities of repaying a loan, for example – then the big data methods basically provide us with a proof of racism. You may use all kinds of emotional and would-be insulting words for those insights but that won't change the fact that they are true – and you're a liar and a demagogue if you work hard to deny or obscure them.

When new loans and hiring decisions depend on the previous events, we obtain a "positive feedback loop". Again, even though extreme leftists try to claim otherwise, there is nothing wrong about the existence of positive feedback loops. Negative feedbacks sometimes exist, too.
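The feedback loop just mentioned is easy to simulate. The dynamics below are entirely invented by me (a deterministic toy approval rule and a fixed score gain), purely to show how the loop amplifies a small initial difference.

```python
# Toy positive-feedback loop (invented dynamics): each repaid loan
# nudges a credit score up, which keeps future approvals coming.

def simulate(score, rounds=10, gain=0.05):
    for _ in range(rounds):
        approved = score >= 0.5  # toy approval threshold
        if approved:
            # Assume the loan is repaid; the score improves a bit.
            score = min(1.0, score + gain)
    return score

print(simulate(0.6))  # starts above the bar and keeps improving
print(simulate(0.4))  # starts below the bar and never moves
```

Whether one likes the outcome or not, the loop itself is just causality: past behavior feeding into future opportunities.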

When a person or a group of people – let's say a race in the U.S. – is individually or collectively trying to do better now, they are also trying to make their life easier in the future. What you did in the past has consequences for the present and what you are doing now will have consequences in the future. There is nothing wrong about this fact (about causality). This fact is a part of the motivation that encourages people to do better.

For example, when a kid is learning something useful at school, she is not learning it just because of the grade she will receive tomorrow. There's some probability that the learning will affect whether she will be accepted to a school or hired for a job and what her salary will be in 2035, among other things. Is this influence "immoral" in any sense? I don't think so. If someone tries to demonize this long-term causality of all kinds, she is definitely trying to murder meritocracy, to abolish the basic mechanisms that allow the society to work well and advance.

Some differences tend to accumulate and grow over time and the leftists hate it. But there is nothing wrong about it. These processes have always been essential, are essential, and will be essential for progress – in the evolution of life forms, technological progress, economic progress, social progress, individual progress etc. For example, the economic growth has always depended on the concentration of capital. A necessary condition for money to be useful is that different people have different amounts of it. At least statistically, this fact helps to allocate resources, work, and influence more effectively. Those who can create value – approximately meaning the well-being and satisfaction of other members of the society, as acknowledged by those members – will earn more money, and will therefore be more capable of influencing what's going on in the society.

These positive loops may sometimes make someone's life "too easy". For example, it's natural to assume that the skillful entrepreneurs' wealth is de facto increasing exponentially. For Warren Buffett, it's easier to earn a billion than it is for someone else to earn $100. But there's nothing wrong about this fact, either.
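The exponential point above can be checked with trivial compound-growth arithmetic. The numbers are mine, chosen only for illustration: at the same modest return rate, a large fortune gains more in absolute terms in one year than a small account gains in decades.

```python
# Compound growth illustration (invented rate and amounts).

def grow(principal, rate, years):
    # Value after compounding at `rate` for `years` years.
    return principal * (1 + rate) ** years

big_one_year = grow(1e9, 0.07, 1) - 1e9       # gain on $1B in one year
small_30_years = grow(100.0, 0.07, 30) - 100  # gain on $100 over 30 years

print(big_one_year > small_30_years)  # True, by orders of magnitude
```

Nothing about the arithmetic is sinister; it's the same formula that governs any savings account.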

Cathy O'Neill must understand why and how the things she criticizes work. But as an extreme leftist, she is trying to attach negative emotions to these fundamental properties of a functional world, civilization, or a society, and spread delusional fairy-tales about a world that would be better if it worked totally differently. Such a world wouldn't work well. In fact, it has been experimentally tried and the guinea pigs in these experiments have paid dearly. She must know it but she prefers to repeat the left-wing lies because many people will appreciate her for that.

People have done well enough without computers and without big data algorithms – and indeed, many if not most people making similar decisions are capable of being impartial and fair – but those things make many decisions more accurate, fairer, and more effective. If someone introduces unfairness, it's someone who is trying to beat the impartial or impersonal verdicts of the computers with his own personal, sometimes ideologically or otherwise justified appraisals. O'Neill's screaming that the big data techniques are "unjust" is nothing else than an effort to replace them with much more ineffective, unjust, imprecise, and corrupt procedures.
