“OK, the AI says our sales next month will be $252 million. How confident is it about that? 50%? 70%? 90%?”
Here we have a CEO who has championed the implementation of AI for her company’s sales forecasting. She’s now looking at the first prototype forecast for next month, and naturally she wants to know how trustworthy the prediction is. Her question, however, reveals a common misunderstanding about how probabilities actually work.
If you’re a CEO or any executive in a company that is digitally transforming, and you sometimes feel uncertain about how exactly AI and machine learning outputs should be interpreted, read on – this is for you. At first glance, some of this may seem “nerdy” and irrelevant but trust me – it’s not. What follows are core essentials that you simply have to grasp to be a good CEO/CxO in these times when every company will eventually become an AI company.
AI is not an optional add-on for the strategy of your company. To thrive in this brave new world, you and your board need a good understanding of AI so that you can become one of the winners rather than one of the many companies that will be left behind by the tectonic AI transformation that is taking place right now.
Probability: the language of modern AI
The figure below shows a machine learning predictor. It has learned by observing large amounts of data, and it now produces an output, or prediction, from an input. To do this, it uses a mathematical model whose parameters it learned from that data.
For predicting sales forecasts, the inputs could be historical sales data, macro-economic trends, even predicted air temperature.
The first fact you need to understand is that this output is stochastic. What does that mean? It’s actually quite simple – in fact, you yourself output stochastic variables all the time. Maybe at some point you said to a friend, “our team is going to win the game tomorrow”. That’s a stochastic prediction. You don’t know for sure they will win, but you estimate that it is more likely that they will win than not. Wishful thinking or not, that was a stochastic prediction.
The fact that machine learning works like this is great. It’s what allows it to tackle complex challenges like driving a car or detecting machinery failure long before any human could; reality is way too complex for black-and-white predictions.
Now let’s think about our sales forecast again. The predicted $252 million is stochastic. That means the predictor doesn’t actually say, “It’s going to be $252 million.” Rather, it predicts a probability density function and the $252 million is just a summary statistic of that density. Let’s unpack this.
The “our team will win” prediction is an example of a probability. These are easier to understand than the very closely related probability densities, so we’ll start there.
A probability for an outcome is simply a number between 0% and 100%. P(O=o) is the probability that O will be o. Upper case O is a stochastic variable, meaning it can take on different values and we are not certain what that value will be; lower case o is one specific outcome. In our game example, O is the outcome of the game, and it can take on two values: WIN and LOSE.
Since there are only these two outcomes, we know that P(O = WIN) plus P(O = LOSE) must equal 100%.
What is hard to grasp, even for “techies”, is that when your AI tells you that the probability of winning is 99%, it is not necessarily a good idea to bet all your money on that outcome.
Remember: the output is stochastic. This even applies to probabilities. To understand this, let’s move on to probability densities.
If we want to calculate the weight of a liquid in a container, we can do that by multiplying the volume, say in liters, by the density, say in kilos per liter. 100 liters of a liquid that weighs 2 kilos per liter would weigh 200 kg. Easy.
If the liquid is denser at the bottom and gradually less dense towards the surface, we need to refine this a bit. What we would do is divide the container into thin horizontal layers so we can fairly assume that the density within each layer is constant. Then we multiply the volume of each layer by its density to get the weight of that layer – and finally we sum up all these weights.
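The layer-by-layer calculation can be sketched in a few lines of Python. The numbers here are my own illustrative assumptions (ten layers of 10 liters each, with density falling from 2.0 to 1.1 kilos per liter), not measurements of any real liquid.

```python
# Hypothetical container: 100 liters split into 10 horizontal layers of 10 liters.
# Assumed densities fall from 2.0 kg/l at the bottom to 1.1 kg/l at the top.
layer_volumes = [10.0] * 10                            # liters per layer
layer_densities = [2.0 - 0.1 * i for i in range(10)]   # kg per liter, bottom to top

# Weight of each layer = volume * density; total weight = sum over all layers.
layer_weights = [v * d for v, d in zip(layer_volumes, layer_densities)]
total_weight = sum(layer_weights)

print(f"Total weight: {total_weight:.1f} kg")
```

With a constant density of 2 kilos per liter this would give the 200 kg from before; with the thinner upper layers assumed here, the total comes out lower.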
“Density” sounds odd when we are talking about sales forecasts, but it actually does make sense. In fact, it works quite similarly to the liquid weight example. The figure on the right shows the probability density for the stochastic variable, x, which in our example is sales in dollars. On the y axis, we have the probability density. The higher the density is, the more likely it is that actual sales will end up in that vicinity.
Densities are not quite probabilities though. Just like with the liquid weight calculation, we have to sum over a range, weighting by density, to get the correct result. What a probability density can give you is the probability that the value is in a given range. In the figure, we have defined A and B, and we want to know the probability that sales will fall somewhere in the interval from A to B.
Mathematically, this can be done in different ways. An approximate method is the most intuitive: we subdivide the range into small sections. For each section, we take the average density – the height of each bar – and multiply it by the width of the bar (in this example, the widths are identical). Then we add up the areas of all the bars. In other words, the probability is the area under the curve within the range.
Note that if we define A as the lowest possible value for x and B as the highest, then this probability will add up to 100%. The probability that x takes on an impossible value is 0%, in other words – which is quite comforting.
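For readers who like to see the bars added up, here is a minimal Python sketch of the area calculation. It assumes the forecast density is a bell curve centered on $252 M with a spread of $39 M; the shape, the spread, the range endpoints and the function name `probability_between` are all assumptions of mine for illustration, not something a real predictor is guaranteed to produce.

```python
from statistics import NormalDist

# Assumption: a bell-shaped forecast density centered on $252 M with a
# standard deviation of $39 M. Both numbers are purely illustrative.
forecast = NormalDist(mu=252.0, sigma=39.0)

def probability_between(a, b, n_bars=1000):
    """Approximate P(a <= sales <= b) by summing bar areas under the density."""
    width = (b - a) / n_bars
    area = 0.0
    for i in range(n_bars):
        midpoint = a + (i + 0.5) * width         # center of bar number i
        area += forecast.pdf(midpoint) * width   # bar height times bar width
    return area

p_range = probability_between(230.0, 280.0)  # A = $230 M, B = $280 M
p_all = probability_between(0.0, 600.0)      # effectively every possible value

print(f"P(230 <= sales <= 280) is about {p_range:.1%}")
print(f"P(0 <= sales <= 600) is about {p_all:.1%}")
```

The second number should come out at practically 100%, which is the point made above about covering the whole range of possible values.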
Now we are ready to go back to the original question about the confidence of the sales forecast. How should we interpret what the AI is trying to tell us?
Given what we have talked about it should be clear that for stochastic variables, we are generally going to be talking about ranges, not numbers. So there are two important types of answers we can get:
- What is the probability that the sales forecast will be between A and B dollars?
- What is the range of values that encompasses the center 90% of probability for the forecast?
The first one we covered above. The second is a bit more tricky, but all it’s doing is flipping things around. Instead of going from a range to a probability, we are interested in finding the range from the probability.
To explain this better, we need to understand something called the expected value, often written E(x). This is a summary statistic that simply gives us the average of all possible values, weighted by their probability density. I’ve never liked the “expected” name, because it can mislead non-experts into thinking that it is very likely that we will see the expected value. That is often not true! If the density is flat, then you can get your “expected” value, but it is almost as likely that very different values will be realized.
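To make the point about flat densities concrete, here is a small Python sketch. It compares two assumed bell-shaped densities that share the same expected value of $252 M, one narrow and one flat, and asks how likely it is that sales land within $10 M of that expected value. The spreads, the margin and the helper name `p_near_expected` are illustrative assumptions.

```python
from statistics import NormalDist

expected_value = 252.0  # $ M, identical for both densities

# Assumed spreads: a narrow density ($5 M) and a flat one ($80 M).
narrow = NormalDist(mu=expected_value, sigma=5.0)
flat = NormalDist(mu=expected_value, sigma=80.0)

def p_near_expected(density, margin=10.0):
    """Probability that sales end up within +/- margin of the expected value."""
    return density.cdf(expected_value + margin) - density.cdf(expected_value - margin)

p_narrow = p_near_expected(narrow)
p_flat = p_near_expected(flat)

print(f"Narrow density: {p_narrow:.0%} chance of landing near the expected value")
print(f"Flat density:   {p_flat:.0%} chance of landing near the expected value")
```

Both forecasts would report the same headline number, yet under these assumptions only the narrow one makes that number a likely outcome.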
But E(x) is a good starting point for question 2. A meaningful question is: what is the range that covers 90% of the total area and has E(x) as its midpoint? This gives you what’s known as a credible interval. As shown in the figure, this will leave roughly 5% of the probability at either end (if the distribution were symmetrical, it would be exactly 5%).
So instead of looking at the $252 M, which turned out to be the expected value, we should rather ask for a credible interval. For example, that could be between $191 M and $320 M. This is actionable – now you know what to prepare for. You know that values under $191 M and above $320 M are very unlikely.
There are other ways to define credible intervals, but the principle is the same. So just make sure everyone is clear on the particular definition that was used.
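Here is a rough Python sketch of two such definitions side by side. It assumes the predictor can hand us a large number of simulated sales outcomes; since I don't have a real predictor here, the samples are drawn from a slightly right-skewed distribution whose parameters I picked only so that the expected value lands near $252 M.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumption: 100,000 simulated sales outcomes in $ M from a slightly
# right-skewed (lognormal) distribution with made-up parameters.
samples = sorted(random.lognormvariate(5.52, 0.16) for _ in range(100_000))
n = len(samples)
expected_value = sum(samples) / n

# Definition used in the text: the interval with E(x) as its midpoint
# that holds 90% of the probability.
deviations = sorted(abs(s - expected_value) for s in samples)
half_width = deviations[n * 90 // 100 - 1]
centered = (expected_value - half_width, expected_value + half_width)

# A common alternative: cut off 5% of the probability at each end.
equal_tailed = (samples[n * 5 // 100], samples[n * 95 // 100 - 1])

print(f"E(x): ${expected_value:.0f} M")
print(f"Centered on E(x): ${centered[0]:.0f} M to ${centered[1]:.0f} M")
print(f"Equal-tailed:     ${equal_tailed[0]:.0f} M to ${equal_tailed[1]:.0f} M")
```

Because the assumed distribution is skewed, the two intervals do not coincide, which is exactly why it pays to agree on the definition up front.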
Choices and consequences
In addition to the value of knowing what is likely to happen, you can now also manage risk better and make better choices.
Imagine the following scenario. If sales end up being less than $200 M, our company is going to need to secure additional working capital, say through a funding process. Since this is going to take a long time to complete and the process itself might harm the company (bad publicity), we need to know if that scenario is likely. Now because we have a probabilistic forecast, it is easy to answer the question. As shown in the figure, we just need the probability of the shaded area. If it is high – and the meaning of “high” will be discussed below – then we start raising capital. If not, we don’t.
To make the best decision, we need to calculate the expected loss. If we don’t reach $200 M, but we assumed we would, there is a certain loss. It’s big, because we will run out of cash – say $500 M.
If we act as if we won’t reach it – we start raising capital – but we do reach $200 M, that’s another, smaller loss – say $50 M. Since we know the probability and consequence of each scenario, we can calculate the expected loss of each action. We assume for simplicity that there is zero loss if we take the right action, i.e. raising capital and ending up needing it, or not raising and ending up not needing it.
Say the probability of not making $200 M is 7%. Then the expected loss of raising capital is 93% times $50 M, or $46.5 M. The expected loss of not raising capital is 7% times $500 M, or $35 M. So we should not start raising capital, because not raising has the lower expected loss.
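The same arithmetic as a small Python sketch, using the numbers from the example. The function name `expected_losses` and the break-even calculation are my additions; the break-even simply solves for the probability at which the two expected losses are equal, which is one way to pin down what a “high” probability means in this simplified setup.

```python
def expected_losses(p_shortfall, loss_if_unprepared, loss_if_unneeded):
    """Return (expected loss of raising, expected loss of not raising) in $ M."""
    raise_capital = (1 - p_shortfall) * loss_if_unneeded  # raised, but did not need it
    do_not_raise = p_shortfall * loss_if_unprepared       # needed it, but did not raise
    return raise_capital, do_not_raise

# Numbers from the text: 7% shortfall probability, $500 M and $50 M losses.
loss_raise, loss_wait = expected_losses(0.07, 500.0, 50.0)
decision = "raise capital" if loss_raise < loss_wait else "do not raise capital"

# Break-even: p * 500 = (1 - p) * 50, so p = 50 / (500 + 50).
# Above this probability, raising capital has the lower expected loss.
break_even = 50.0 / (500.0 + 50.0)

print(f"Raise: ${loss_raise:.1f} M, do not raise: ${loss_wait:.1f} M -> {decision}")
print(f"Break-even shortfall probability: {break_even:.1%}")
```

With these losses the break-even sits at roughly 9%, so the 7% from the example falls just below the point where raising capital would become the better choice.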
All this assumes that your model and algorithms are reliable – that is another subject entirely. And several aspects were kept very simple. But you should now be able to see why it is critical to ensure reliable AI, and how powerful it can be as a decision support tool.
So why might it not be a good idea to bet the family fortune on a prediction that the probability of team X winning is 99%? That’s because the 99% is a stochastic variable with its own probability density, and it could be quite flat across a wide range of probabilities even if 99% “pops up” as the expected value. Probabilities over probabilities – yes, I know it’s a lot to take in! But trust me – don’t place the bet without seeing the underlying density first!
Make your processes uncertainty enabled
Transforming your company to an AI company isn’t easy. It takes huge shifts in organization, culture, and processes. Regarding the latter, success requires that business processes are changed so that they are able to work with and decide based on ranges and probabilities, rather than fixed numbers and simple binary decisions. That means getting everyone onboard, starting with the C-level and going all the way to the people working the daily processes. Everyone in your company needs a reasonable grasp of AI – the basics of probabilities given here are just one part of that.
If you’d like help with educating your C-level, board or management teams to derisk and speed up your AI transformation, contact me.