Tags:

Statistics

I'm of the opinion that statistics is one of the things that we as humans have the most trouble with. The problem isn't that it's hard; there are a lot of things that are harder than statistics. The real problem is that humans seem to have a gut instinct for stats, but that gut instinct is way off more often than not.

This leads to the tricky situation where a random person will attempt to interpret a statistic, or generate one, and feel like they're probably pretty close. Sometimes even someone who knows the math will second-guess it because it just doesn't feel right.

I'm going to attempt, in this article, to cover a collection of cases where I've seen people often have the wrong idea.

The Birthday Paradox

The Birthday Paradox is one of the simplest and easiest examples of how wrong a person's gut instinct can be. It goes something like this: How many people need to be in a group to have a 50% chance that two or more of them share a birthday?

Most people, when presented with this, go through a thought process like the following: Alright, the typical year has 365 days. A person is equally likely to be born on any of those days. We want a 50% chance of a collision, so I'd guess (365/2), which is about 182.

Not bad reasoning.

The real answer is 23. I'll explain why that is after covering a little bit of the basics of probability.

Probability Basics

A probability of 3 in 5 means that, if you do the experiment a huge number of times, then about 3/5 of those attempts should produce the given outcome.

That's all. If you do something 5 times and don't get 3 of the given outcome, then that doesn't necessarily mean the probability is wrong.

If you do something 50 times and never get a success, that still isn't necessarily wrong. It does provide evidence that perhaps it's not the most likely probability, but that's a whole other topic.

There's actually no way to prove from a finite number of attempts that a given probability is wrong; you can only gather evidence for or against it.

But, in general, you should assume that in the long run the results settle toward the probability: with a 3 in 5 chance, 50 attempts should give close to 30 successes, and 5000 attempts should give a proportion even closer to 3/5, or about 3000 successes.
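A quick way to get a feel for this long-run behaviour is to simulate it. The following is a minimal sketch, not a rigorous experiment: the function name success_fraction and the fixed seed are my own choices for illustration, and it assumes each attempt is independent with a 3 in 5 chance of success.

```python
import random


def success_fraction(p, trials, seed=0):
    """Run `trials` independent attempts, each succeeding with probability p,
    and return the observed fraction of successes."""
    rng = random.Random(seed)  # seeded only so that reruns give the same numbers
    successes = sum(1 for _ in range(trials) if rng.random() < p)
    return successes / trials


if __name__ == "__main__":
    # Small runs can land far from 0.6; large runs tend to settle near it.
    for trials in (5, 50, 5000, 500000):
        print(trials, success_fraction(3 / 5, trials))
```

Changing the seed changes the individual numbers, but the larger runs should stay close to 0.6 while the 5-attempt run can easily miss it.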

Also, if something has a probability of 1 in 1 million, that doesn't mean that it will only happen on the 1 millionth attempt. Something can have a probability of 1 in 1 million and still happen on the first attempt. It can happen on the next attempt too, then not for another 1 999 998 attempts, and be exactly as predicted by the math.

In this representation, probability is computed as (the number of ways the interesting outcome can occur) / (the total number of outcomes), and since there can't be more interesting outcomes than total outcomes, this number ranges from 0 to 1. 0 means impossible, and 1 means certain, but without rounding, neither of those values actually comes up in most cases. Often the best you'll get is 0.000000001, which is very unlikely but not impossible, or 0.999999999, which is very likely but not certain.

Two independent things both happening is represented by multiplication, whereas either one of two mutually exclusive things happening is represented by addition.

Take, for example, rolls of a fair die: Each side of the die has a 1 in 6 chance. So, the probability of rolling either a 1 or a 2 is (1/6 + 1/6 = 2/6). This makes sense. The probability of rolling a 1, followed by a 2 is (1/6 * 1/6 = 1/36).

If you want the probability that the opposite of something happens, you just need to subtract it from 1.

For example, the probability that two dice each come up 1 is (1/36). The probability that doesn't happen is (1 - (1/36) = 35/36).
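The addition, multiplication, and "subtract from 1" rules above can be checked by simply listing every outcome and counting. This is a small sketch under the assumption of fair six-sided dice; the helper name probability is mine, not a standard library function.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # a fair six-sided die


def probability(event, rolls):
    """Exact probability of `event` by counting equally likely outcomes.

    `event` receives a tuple with one entry per roll, in order.
    """
    outcomes = list(product(FACES, repeat=rolls))
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


# Either a 1 or a 2 on one roll: 1/6 + 1/6
either_1_or_2 = probability(lambda o: o[0] in (1, 2), rolls=1)

# A 1 followed by a 2: 1/6 * 1/6
one_then_two = probability(lambda o: o == (1, 2), rolls=2)

# Two dice both showing 1, and the opposite of that
double_one = probability(lambda o: o == (1, 1), rolls=2)
not_double_one = 1 - double_one

if __name__ == "__main__":
    print(either_1_or_2, one_then_two, double_one, not_double_one)
```

Fraction keeps the results exact and reduces them, so 2/6 is reported as 1/3.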

Be aware, though, that odds and probability represent the same thing but work differently. If something has a 3 in 5 chance, that represents a probability of (3/5): 3 successes for every 5 attempts.

If, though, something has 3 to 5 odds, that represents 3 successes vs 5 failures. That means there are actually 8 outcomes (3 success + 5 fail), which represents a probability of 3/8.

A 3 in 5 probability is the same as 3 to 2 odds.
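Converting between the two is mechanical, so here is a short sketch of it. The function names are hypothetical ones chosen for this example, and odds are written as "successes to failures".

```python
from fractions import Fraction


def odds_to_probability(successes, failures):
    """'successes to failures' odds -> probability of success."""
    return Fraction(successes, successes + failures)


def probability_to_odds(p):
    """Probability of success -> (successes, failures) in lowest terms."""
    p = Fraction(p)
    return (p.numerator, p.denominator - p.numerator)


if __name__ == "__main__":
    print(odds_to_probability(3, 5))            # 3 to 5 odds
    print(probability_to_odds(Fraction(3, 5)))  # 3 in 5 probability
```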

The Birthday Paradox Revisited

So, now that we've got the basics of probability, let's see if we can work out why the answer to the birthday paradox is what it is.

First off, assumptions. I'm assuming that people are born with an equal probability on any day of the year. That's not quite true in practice, since births cluster in certain parts of the year, but that clustering would make it more likely that people share a birthday, not less, so the assumption is acceptable.

Next, calculating the probability that a group of people all have unique birthdays is easier than computing the probability that they have 1 or more collisions. Luckily, since "everyone has a different birthday" and "at least two people share a birthday" are opposite outcomes, we can subtract that probability from 1 and get the value we actually want.

So, the probability of the first person having a unique birthday is (365/365 = 1). That makes sense, since there's only one of them.

The second person has only 364 days to choose from (since it has to be different from the first), which leaves a probability of (364/365).

The third person has (363/365).

So, to compute the probability that three people have unique birthdays we have (365 * 364 * 363) / (365 * 365 * 365), which is about 0.992. That's pretty likely.

The probability that one or more of them share a birthday is (1 - 0.992 = 0.008).

That probability rises quickly, though, as we add more people.

Number of People    Probability of Sharing a Birthday
1                   0.000
2                   0.003
3                   0.008
4                   0.016
5                   0.027
6                   0.040
7                   0.056
8                   0.074
9                   0.095
10                  0.117
11                  0.141
12                  0.167
13                  0.194
14                  0.223
15                  0.253
16                  0.284
17                  0.315
18                  0.347
19                  0.379
20                  0.411
21                  0.444
22                  0.476
23                  0.507

So, we can see that by 15 people we've got approximately a 25% chance that there will be a shared birthday, and by 23 people we've reached 50%.
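The whole calculation fits in a few lines of code. This is a sketch under the same assumptions as above (365 equally likely birthdays, no leap years); the function names are mine.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday,
    assuming every day of the year is equally likely."""
    all_unique = 1.0
    for already_taken in range(people):
        # Each new person must avoid every birthday already taken.
        all_unique *= (days - already_taken) / days
    return 1.0 - all_unique


def smallest_group(target=0.5, days=365):
    """Smallest group size whose shared-birthday probability reaches `target`."""
    people = 1
    while shared_birthday_probability(people, days) < target:
        people += 1
    return people


if __name__ == "__main__":
    for people in (3, 15, 23):
        print(people, round(shared_birthday_probability(people), 3))
    print("first group to reach 50%:", smallest_group())
```

The loop multiplies 365/365, 364/365, 363/365 and so on, exactly as in the three-person example, and subtracts the result from 1 at the end.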

The Weather

The weather is probably the place where people clash with probability the most in their day-to-day lives.

It seems like most people take "70% chance of rain" as "It's going to rain". Anything above 60, really, is considered a "yes", and people become irate when it doesn't rain as predicted.

Unfortunately, that's not what it means. What it does mean is that there's a 7 in 10 chance that it will rain on that day.

Even if we ignore what I said in the section on basics, that means that if a forecaster predicts a 70% chance of rain for 10 days in a month, and 3 of those days are sunny, then the forecaster was exactly right. In fact, if it rained on every day with a predicted 70% chance of rain, the forecasts would actually be wrong.
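To put rough numbers on the 10-day example, here is a sketch that treats each forecast day as an independent 70% chance of rain. That independence is my simplifying assumption (real weather on consecutive days is related), and the function name is made up for this example.

```python
from math import comb


def rainy_days_probability(rainy, days, p):
    """Probability of exactly `rainy` wet days out of `days`,
    if each day independently rains with probability p."""
    return comb(days, rainy) * p**rainy * (1 - p) ** (days - rainy)


if __name__ == "__main__":
    # 7 wet days and 3 sunny ones, versus rain on all 10 days
    print(round(rainy_days_probability(7, 10, 0.7), 3))
    print(round(rainy_days_probability(10, 10, 0.7), 3))
```

Under those assumptions, 7 wet days out of 10 is the single most likely result, while rain on all 10 days should be rare if the 70% figure is accurate.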

The same obviously goes for "30% chance of rain", which doesn't mean "It will not rain."

The Lottery

In this case I'm going to talk about raffles, because it's easier to reason about things that exactly one person always wins.

Depending on the perspective you take in the raffle, the outcomes look very different. On one hand, if there are one million raffle tickets sold, then each ticket has a one in one million chance of being chosen. That's considered a small probability. But it's certain that one of them will be the winner, and that winner always had a one in one million chance.

The person who won likely wasn't expecting to be the one to win, and shouldn't have been. Like the weather, when people hear that something has a one in one million chance of happening, they interpret that as "impossible". So, when the "1" comes up, and they get the unlikely result, they see something they thought impossible come to be, and assume something magical helped them out.

From the system's perspective, though, someone had to win, and each person was equally likely. The fact that 1 person won, and 999 999 people did not doesn't seem weird or magical to it. That's the only way it could have gone.

It's difficult to reconcile these two views, so I won't try. They're both true, and one just needs to think about things from both sides before jumping to any conclusions.

Implicit Assumptions

Let's say you're flipping a coin 4 times. You flip it the first 3 times and get Head, Head, Head. At this point people tend to feel like the next one has to be a Tail, since 4 Heads seems much less likely.

There is some basis for this feeling. The probability of getting four Heads is (1/16), but the probability of getting 3 Heads and a Tail is (1/4). That's four times as likely! It would seem, then, like the odds of getting a Tail on the next toss are better than the odds of getting a Head.

There's a hidden assumption in here, though. We've already got 3 Heads. The reason that 3 Heads and a Tail is four times as likely is because there are 4 equally likely ways that can happen: THHH + HTHH + HHTH + HHHT = (1/16) + (1/16) + (1/16) + (1/16) = 4/16 = 1/4.

Obviously, though, only one of those is applicable to our current situation. The first three flips weren't THH, HTH, or HHT; they were HHH. There's only one outcome left that has a Tail in it, HHHT, and its probability is 1/16, the same as HHHH. With both continuations equally likely, the next flip is still an even chance of a Head or a Tail.
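Listing all 16 sequences makes the hidden assumption visible. This is a small sketch assuming a fair coin; the helper probability and its `given` parameter are names I've chosen for the illustration.

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely results of four fair coin flips, e.g. "HHTH"
SEQUENCES = ["".join(flips) for flips in product("HT", repeat=4)]


def probability(event, given=lambda s: True):
    """Probability of `event`, counting only sequences where `given` holds."""
    possible = [s for s in SEQUENCES if given(s)]
    matching = [s for s in possible if event(s)]
    return Fraction(len(matching), len(possible))


four_heads = probability(lambda s: s == "HHHH")
three_heads_one_tail = probability(lambda s: s.count("H") == 3)

# Once HHH has already happened, only HHHH and HHHT are still possible.
tail_after_three_heads = probability(
    lambda s: s[3] == "T",
    given=lambda s: s.startswith("HHH"),
)

if __name__ == "__main__":
    print(four_heads, three_heads_one_tail, tail_after_three_heads)
```

The first two values answer questions asked before any coin is flipped; the third answers the question we're actually facing after three Heads.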

Miracles and Coincidences

You're walking down the street and before crossing the road you notice something on the ground and pick it up. Just then a car flies past and you think "Wow! If I hadn't bent down to collect this thing, I'd have been hit by a car and died" and it's a miracle.

Let's look at the potential outcomes, though, and their reactions.

If you're walking down the road and you don't get hit by a car you consider this a normal day. Nothing weird or magical occurred here, and this day is mostly forgettable.

Let's say you're walking down the road and you get hit by a car and die. In this case no one uses the word miracle, it's an accident. Whatever you did that day is insignificant, and in many cases no one even knows what you did.

If you're walking down the road and get hit by a car, but are only injured, then this is an awful day. It's also never really called a miracle, but it's possible for you to maybe draw a line from your activities before you were hit and the actual accident.

If you're walking down the road and you almost get hit by a car, but are narrowly missed, then it's considered a miracle.

Like the lottery, these are all of the choices. From moment to moment one of these has to be happening, and most of the time you're not getting hit by cars.

When someone gets hit by a car, though, they tend to feel like something unlikely has occurred, but they often fail to consider every other time they walked down the street and didn't get hit. And that's just them. What about every other pedestrian that day who didn't get hit by a car? Did each of them consider that day to have been a miracle because they managed to walk from one place to another without being killed? Likely not, since that's expected.

Looking at it as three outcomes: Nothing, Miracle, Accident; where no one considers Nothing, there are only Miracles and Accidents. And, really, Accidents are just miracles that aren't deemed positive.

The other example that comes up is things like "Wow, I was just thinking of you when you called me! WEIRD!" There are two potential outcomes here.

If you're thinking of someone and they call, then it's super coincidental. So much so that perhaps magic was involved.

If you're not thinking of someone when they call, then it's just a phone call. There are lots of those every day, the event is not significant, and an hour from now you may not even remember that you took this call at all.

The real issue is that probability is based on numbers. If there's a 3/5 chance of something happening and you do 5 trials, you expect about 3 successes. If you do 50 trials, you expect about 30.

In these cases, though, people seem to grossly underestimate the number of trials. You may have answered 100 phone calls in a month, but if it only feels like you've taken 3, then a coincidence during one of them seems to happen far more often than it really does.

Now, like the lottery, I've been mostly speaking about the system here. Let's say that 1 person is hit by a car every 3 days in some area. When you get hit, that's not unlikely from the world's perspective. If, though, someone were to ask "Why me?", then that's potentially a valid question. Even if everyone who was hit thought that, the people who weren't hit rarely ask themselves "Why not me?" They just assume it's something that happens to other people.
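To see how much the number of trials matters, here is a rough sketch. The 2% chance that you happen to be thinking of the caller on any given call is a number I made up purely for illustration, and the calls are assumed to be independent of each other.

```python
def at_least_once(p, trials):
    """Chance that an event with per-trial probability p happens
    at least once in `trials` independent trials."""
    return 1 - (1 - p) ** trials


if __name__ == "__main__":
    p_thinking_of_caller = 0.02  # hypothetical value, not a measured one
    # The 3 calls you feel like you took versus the 100 you actually took
    for calls in (3, 100):
        print(calls, round(at_least_once(p_thinking_of_caller, calls), 3))
```

With that made-up 2% figure, a coincidence over 3 calls would be surprising, but over 100 calls it is close to expected.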

If I may, I'd also like to apply this to prayer. Let's say a person becomes ill, and people pray that they will recover. There are now two options: they recover, in which case the prayer is deemed successful; or they don't recover.

I'm not saying that prayer is ineffective, but as a skeptic on the outside, I see it as a bias. Either the prayer was critical, or it was just their time.

It's even easier on longer term wishes. If one wishes every day for their entire life that they win the lottery, then any day they don't win is a normal day, and the day they do win, it's all because they prayed for it every day. The days they prayed to win and didn't just fall away, and aren't significant in the story of their life.

Conclusions

My intention here isn't to disprove miracles, or claim that people should be happy when it rains during their sunny plans.

In the end, it comes down to random occurrences, but what actually guides the outcome is up to you to decide. I happen to believe that the outcomes are due to physical processes which have no concept of "Our interests", but one could easily also believe that there is an interested party out there guiding the outcomes.

I can't prove that there isn't, and for most of what I've said it doesn't matter whether there is or not. I'm more just collecting a few of the things that I've heard people say, or claim, that have seemed, to me, to have been potentially rooted in a misunderstanding of statistics.

I'm sorry if I've enraged anyone.

Originally Published:
2012-11-28T14:53:24Z
Last Updated:
2012-11-28T19:48:47Z