Statistics is hard #479: Absolute and Relative Risk

One theme that constantly pops up in the BI / Analytics / Big Data world is this: given all the amazing tools and models we have, why is the adoption of Analytics so low? From a Microsoft perspective, Data Mining has been baked into SQL Server since 2005 – and, due to negligible uptake, has hardly changed since. Now I know from my colleagues in Analytics – and from the fact that R continues to grow at a great rate – that it’s not a dead field. Far from it. But it’s not quite at the front of everyone’s minds either.

I think the challenges are human rather than technical. Understanding Analytics often means pushing the mind to the limits of what our poor grey lumps of brain were designed to do. We are rigged to make snap decisions with limited information to aid our survival, not contemplate the likelihood of that wolf being hungry through careful modelling, deep thought and … ouch, why is there a wolf biting my leg?

A great example of this showed up in my Facebook feed recently:

Relative Risk – Soda will not kill you, read below for details

Source: these guys, who I totally don’t endorse as they might be hippies


Well, er – let’s not rush. As with all internet-circulated health information, the facts are dubiously presented with no link to the source. So first of all, let’s remedy that – this is the study in question:

Soft Drink and Juice Consumption and Risk of Pancreatic Cancer: The Singapore Chinese Health Study

Cancer Epidemiology, Biomarkers & Prevention, February 2010

Hurrah for open access journals. Reading through the study, the kernel of truth is there – it found a statistically significant effect: those who drank 2 or more sodas a week had a relative risk of pancreatic cancer 85% higher than those who drank none. I’m not going to scoff at that, 85% is a big uptick in risk. But it is Relative Risk – and this is where the above image is misleading.

At face value I would take the 85% figure to mean that if I drink 2 or more cans of soda a week, I have an 85% chance of getting pancreatic cancer, i.e. the Absolute Risk. If this were the case I would ban soda from my house immediately.

However, dig into the maths and, for the population studied, the actual Absolute Risk of developing pancreatic cancer if you drink no soda is about 1/4500. This makes it a pretty unusual cause of death compared to the big killers like diabetes, which is a more likely consequence of drinking excess soda. For those in the study who did drink 2 or more sodas a week, the risk jumped to about 1/2500. Which is still pretty remote. It also makes for a lousy headline. Much better to say the risk has increased by 85% without stating that the number refers to Relative Risk and that the Absolute Risk is small. Not to mention that the study admits that its findings are far from conclusive.
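For anyone who wants to check the arithmetic, here is a minimal Python sketch using the rough figures above (about 1/4500 and about 1/2500). These are the post's back-of-envelope estimates, not numbers quoted from the study, and the variable names are just for illustration.

```python
from fractions import Fraction

# Rough absolute risks from the post (back-of-envelope estimates,
# not figures quoted directly from the study)
baseline_risk = Fraction(1, 4500)  # no soda
exposed_risk = Fraction(1, 2500)   # 2 or more sodas a week

# Relative risk: how many times bigger the exposed risk is
soda_relative_risk = exposed_risk / baseline_risk

# Relative increase: the scary headline number
soda_relative_increase = soda_relative_risk - 1

# Absolute increase: the extra risk you actually take on
soda_absolute_increase = exposed_risk - baseline_risk

print(f"Relative risk:     {float(soda_relative_risk):.2f}")
print(f"Relative increase: {float(soda_relative_increase):.0%}")
print(f"Absolute increase: 1 in {1 / soda_absolute_increase}")
```

With these rounded inputs the ratio comes out at 1.8, i.e. roughly an 80% relative increase, which is in the same ballpark as the 85% headline. The absolute increase is about 1 in 5,625, which is the number the headline leaves out.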

So let’s revisit our risk types

Absolute Risk and Relative Risk are two very different things.

Absolute Risk is the chance of something happening to you under a given set of conditions. So, for example, crossing a city street with your eyes closed may have an Absolute Risk of 10% in terms of being hit by a vehicle.

Relative Risk compares the Absolute Risk under one set of conditions with the Absolute Risk under another. If it’s a highway, that risk of being hit by a vehicle may jump to 70%. So the Relative Risk of crossing a highway instead of a city street is 7: the risk is 700% of what it was, or 600% higher. It doesn’t mean you have a 700% chance of getting hit by a vehicle, because – well, a 700% chance of something happening makes no sense.
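The street-crossing example can be written down as a couple of tiny Python functions. This is only a sketch: the function names are made up for illustration, and the 10% and 70% figures are the invented ones from the example, not real traffic statistics.

```python
from fractions import Fraction

def relative_risk(exposed, baseline):
    """Ratio of the absolute risk under the new conditions to the baseline."""
    if not (0 < baseline <= 1 and 0 <= exposed <= 1):
        raise ValueError("absolute risks are probabilities between 0 and 1")
    return exposed / baseline

def percent_higher(exposed, baseline):
    """How much higher the exposed risk is, as a percentage of the baseline."""
    return (relative_risk(exposed, baseline) - 1) * 100

# Invented figures from the street-crossing example
street = Fraction(10, 100)   # 10% absolute risk on a city street
highway = Fraction(70, 100)  # 70% absolute risk on a highway

crossing_rr = relative_risk(highway, street)
crossing_pct = percent_higher(highway, street)

print(f"Relative risk: {crossing_rr}")
print(f"Risk is {crossing_pct}% higher on the highway")
```

Note that a ratio of 7 corresponds to a 600% increase over the baseline, and that the absolute risk on the highway is still 70%. The relative number can be as large as you like; the absolute one can never go past 100%.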

What does this have to do with how our brains are wired for Analytics?

It explains why the above image is simultaneously accurate and misleading. The snap decision we make is Soda – Cancer – Big Risk number – Soda Bad. The deeper analysis takes a bit longer, by which point most of us have lost interest.

Analytics struggles to penetrate the human way of working because it doesn’t appeal to our way of thinking, and it takes work to understand. So the message here is: if you are in Analytics and not being successful, it may not be because your models aren’t brilliant (I’m sure they are) – but because you cannot communicate how they work – and their value – in a way most people’s grey lumpy bits can grasp.


Disclaimer: I may have got some of the maths a bit wrong, particularly around the Absolute Risk of getting Pancreatic cancer, as I only spent 5 minutes trying to work it all out. This post does not constitute medical advice. If you take medical advice from Facebook, Twitter, Blogs or any other form of social media that has never been to Medical School, see a Doctor.
