Standard Deviation and z-score in plain English

esoito · Sep 01, 02:15 AM 2010

Congratulations, Bayes!

And a big THANK YOU for this EXCELLENT explanation.

I'm actually starting to understand SD at last.

You certainly have a gift for rendering difficult matter into clear, concise English.

And many thanks, too, to mr.ore for his valuable comments, examples and explanations.

We are so lucky to have such expert and generous members.

Bayes · Sep 17, 08:04 AM 2010

Thanks esoito. Ok, moving on to the z-score.

The z-score

In the first post of this thread, I gave the 68 - 95 - 99.7 rule, which says that the number of reds in a sequence of fixed length (say 100 spins) conforms to the rule, i.e. in 68% of these 100-spin sequences you would get between 45 and 55 reds (1 standard deviation), in 95% of them between 40 and 60 reds (2 standard deviations), and in 99.7% of them between 35 and 65 reds (3 standard deviations). But what if you have a sequence of a different length? Or you're not interested in the number of reds, but maybe the number of times a particular dozen or street hits? Also, suppose you want to know the "interval" of hits for a particular % (not just one of 68%, 95% or 99.7%)?

The z-score is a measure of the dispersion (how "spread out" the hits are) which is defined in terms of the standard deviation. So the z-score tells you how many standard deviations away from the average a particular measurement is. It turns out there's a simple formula for calculating the standard deviation for the sort of outcomes we're interested in, so for example, instead of having to record the numbers of reds in each of your 1000 sessions, you can use this:

s = √(np(1 − p))

where s is the standard deviation, n is the number of spins in the sequence, and p is the probability of the outcome you're interested in.

So to find the SD of a sequence of 100 spins, where red is the outcome of interest, we have:

n = 100
p = 0.5 (I've ignored the zero just to keep the arithmetic simple).
(1 − p) also equals 0.5 in this case.

So s = √(100 × 0.5 × 0.5)
= √25
= 5
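As a quick check of the arithmetic above, here is a minimal Python sketch. The function name binomial_sd is just an illustrative choice, and the second call (which includes the zero on a single-zero wheel) is an extra that the example above deliberately leaves out.

```python
import math

def binomial_sd(n, p):
    """Standard deviation of the number of hits in n spins,
    where each spin hits with probability p."""
    return math.sqrt(n * p * (1 - p))

# 100 spins on red, ignoring the zero as in the example (p = 0.5)
print(binomial_sd(100, 0.5))  # 5.0

# Same 100 spins on a single-zero wheel (p = 18/37); slightly below 5
print(binomial_sd(100, 18 / 37))
```

Including the zero barely changes the result, which is why ignoring it keeps the arithmetic simple without distorting the picture.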

The z-score is another simple formula, which uses the standard deviation and also the AVERAGE number of hits you would EXPECT to get in your sequence (you have to calculate this) together with the measurement you're interested in (or what actually occurred):

z = (X − expected no. of hits)/s

In other words, z is the difference between what you ACTUALLY got (X) and the AVERAGE (what you would expect to get in the long term), divided by the standard deviation.

The 'expected number of hits' is just the probability of a hit, multiplied by the number of spins in the sequence.

For example, in 100 spins you would expect to get 50 reds, on average, i.e.:

Average = n × p

Where n = 100 and p = 0.5

So, Average = 100 × 0.5 = 50 hits.

So the formula for the z-score is:

z = (X − np)/s

To make sense of this formula, notice what happens when X = np. What does this mean? It means that the actual outcome you got is the same as the expected outcome. So in 100 spins, if the actual number of reds was 50, then this tells you that (X − np) is zero, so the z-score is also zero. Remember what the z-score tells you: it is a measure of the dispersion, i.e. how far the outcome is from what you would expect on average. In this case, the outcome is the SAME as the average, so the z-score reflects that. If the absolute value of the z-score is large, it means the dispersion is also large.

What do I mean by 'absolute value'?

To be continued....

Bayes · Sep 24, 11:56 AM 2010

If your actual measurement (X) is more than what's expected (np) the z-score will be positive, and if it's less it will be negative. If in 100 spins you get 35 reds, then:

z = (35 - 50)/5 = -3.00

And if you get 65 reds it will be:

z = (65 - 50)/5 = +3.00

The absolute value disregards the "sign" of the number (whether it's positive or negative) and is concerned only with the "distance" from zero - if you imagine a number line with zero in the middle with the positive numbers increasing to the right and the negative numbers increasing to the left, the absolute value only takes account of how far the number is from the zero.

If the z-score is negative, it means worse than average "performance" for the number or group of numbers in question. If it's positive, this indicates a better than average showing.
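The sign and absolute value can be seen side by side in a short Python sketch. The helper name z_score is illustrative, and p = 0.5 again ignores the zero, as in the red examples above.

```python
import math

def z_score(hits, n, p):
    """How many standard deviations the actual hit count
    lies above (+) or below (-) the expected count n*p."""
    expected = n * p
    sd = math.sqrt(n * p * (1 - p))
    return (hits - expected) / sd

# 35, 50 and 65 reds in 100 spins
for reds in (35, 50, 65):
    z = z_score(reds, 100, 0.5)
    print(f"{reds} reds: z = {z:+.2f}, absolute value = {abs(z):.2f}")
```

35 and 65 reds give z-scores of -3.00 and +3.00, which differ in sign but share the same absolute value of 3.00, while 50 reds gives zero.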

A couple of step-by-step examples.

Example 1

What is the z-score of a street which has hit twice in the last 75 spins?

First thing is to identify what numbers go into the formula.

z = (X - np)/s, where s = √(np(1 − p))

What is n? The number of spins over which you want to measure the z-score, which is 75.
What is p? This is the probability of a street hitting. Since a street consists of 3 numbers, and there are 37 numbers in total (all of which are equally likely), p = 3/37 = 0.081081.
So (1 − p) = (1 − 0.081081) = 0.918919

Therefore the standard deviation, s = √(75 × 0.081081 × 0.918919) = 2.363898

What is X? this is the number of times the street has hit in the last 75 spins, which is 2.

So now we're ready to plug the numbers into the formula:

z = (2 − 75 × 0.081081)/2.363898 = -1.726

The negative number tells us that this particular street has hit well below expectation over the last 75 spins. Remember the 68-95-99.7 rule, which says that in a sequence of 75 spins, a z-score between -1 and +1 will occur in 68% of the sessions, a z-score between -2 and +2 in 95% of the sessions, and a z-score between -3 and +3 in 99.7% of the sessions. So -1.726 means that this result is outside the 68% interval but inside the 95% interval (at roughly the 92% mark under the normal approximation), meaning that a deviation this large, in either direction, occurs in only about 8% of sessions.
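To put a number on where -1.726 falls between the 68% and 95% intervals, the sketch below uses the normal approximation through Python's math.erf. The function names are illustrative, and the normal curve is only an approximation to the true spin counts, so treat the percentage as a rough guide.

```python
import math

def z_score(hits, n, p):
    """z-score of an observed hit count over n spins."""
    return (hits - n * p) / math.sqrt(n * p * (1 - p))

def coverage(z):
    """Share of a normal distribution lying within +/-|z|
    standard deviations of the mean."""
    return math.erf(abs(z) / math.sqrt(2))

# Street (3 numbers, single-zero wheel) hitting twice in 75 spins
z = z_score(2, 75, 3 / 37)
print(round(z, 3))  # -1.726

# Fraction of sessions expected to fall inside +/-1.726 SD
print(f"{coverage(z):.1%}")

# The familiar rule, for comparison
for k in (1, 2, 3):
    print(k, f"{coverage(k):.1%}")
```

The same coverage function reproduces the 68-95-99.7 figures for z = 1, 2 and 3, so it can be used for any in-between z-score.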

Example 2

You notice that a dozen has hit 13 times in the last 18 spins, what is its z-score?

First find s, the standard deviation.

s = √(np(1 − p))

What is n? the number of spins over which you want to measure the z-score, which is 18.
What is p? this is the probability of a dozen hitting. Since a dozen consists of 12 numbers, the probability, p is 12/37 = 0.324, so (1 − p) = (1 − 0.324) = 0.676.

Therefore, the standard deviation, s = √(18 × 0.324 × 0.676) = 1.986

What is X? this is the number of times the dozen has hit in the last 18 spins, which is 13.

Plug the numbers into the formula:

z = (13 - 18 × 0.324)/1.986 = +3.609

Since the z-score is positive it means that the dozen is hitting above expectation in this sequence of 18 spins. In fact, because the score is above +3.00, it indicates a rare event which occurs in less than 0.3% of all 18 spin sequences (because +3.609 lies outside the +3.00, and +3.00 is the outside "limit" for which 99.7% of results will occur).
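With only 18 spins the normal curve is a coarse fit, so as a cross-check the sketch below also computes the exact binomial chance of a dozen hitting 13 or more times. This exact calculation is an addition for illustration, not part of the method described in this thread, and the function name binomial_tail is made up for the example.

```python
import math

def binomial_tail(n, p, at_least):
    """Exact probability of `at_least` or more hits in n
    independent spins, each hitting with probability p."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(at_least, n + 1)
    )

p = 12 / 37  # dozen on a single-zero wheel, unrounded

# z-score with unrounded p (about +3.606; the post's rounded p gives +3.609)
z = (13 - 18 * p) / math.sqrt(18 * p * (1 - p))
print(f"z = {z:+.3f}")

# Exact chance that this dozen hits 13 or more times in 18 spins
tail = binomial_tail(18, p, 13)
print(f"exact chance: {tail:.6f} (about 1 in {1 / tail:.0f})")
```

The exact figure comes out well under 0.3%, in line with the conclusion above that this is a rare event.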

Now here's one for you to try:

You have a system which bets on sleepers - the plan is to tick off all numbers until there are only 4 left unhit and then start betting on them. Eventually, after 87 spins, you have your 4 numbers remaining. What is the z-score?

Bayes · Sep 27, 03:01 AM 2010

All members go to the back of the class for not doing their homework.

Seriously though, is there too much maths? I know it's not a popular subject, I tried to keep it to a minimum but sometimes you can't avoid formulas. I read somewhere that for every equation in a book, the sales drop by half.

If you don't understand something, let me know and I'll try to make it clearer, there are no st*pid questions!

Maybe you're just not interested, and are thinking, "what's the point of all this, just give me a winning system!". It's a fair point, but IMHO, an understanding of this stuff can generate ideas, and can save a lot of manual testing and experiment. It can also help you understand why something won't work.

I'll be covering more applications in future posts.

mr.ore · Oct 10, 05:02 AM 2010

z = (0-87*4/37)/sqrt(87*4/37*(1-4/37)) = -3.24737656354395485351

Bayes · Oct 10, 05:58 AM 2010

Z = (0-87*4/37)/sqrt(87*4/37*(1-4/37)) = -3.24737656354395485351 ✓

Thanks mr.ore.

This brings me to a specific application of the formula which can be used to find the longest losing runs. Notice that in that last problem X was zero (corresponding to no wins). You can set X = 0 in the formula and do a little algebra to find another formula:

z = (X - np)/s, where s = √(np(1 − p))

i.e. z = (X - np)/√(np(1 − p))

Now set X = 0

z = -np/√(np(1 − p))
z² = n²p²/(np(1 − p))
z² = np/(1 − p)

n = z²(1 − p)/p
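One way to sanity-check the algebra is a round trip: take the sleeper problem (4 numbers unhit after 87 spins), compute z with X = 0, then feed that z back into n = z²(1 − p)/p and confirm it returns 87. A minimal Python sketch, with illustrative function names:

```python
import math

def z_for_no_hits(n, p):
    """z-score when the outcome has not hit at all (X = 0) in n spins."""
    return -n * p / math.sqrt(n * p * (1 - p))

def spins_for_z(z, p):
    """Rearranged formula: length of a hitless run that
    corresponds to a given z-score."""
    return z * z * (1 - p) / p

p = 4 / 37  # four sleeping numbers on a single-zero wheel
z = z_for_no_hits(87, p)
print(round(z, 3))               # -3.247
print(round(spins_for_z(z, p)))  # 87
```

Because z is squared, the sign of z does not matter when going from z back to n.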

We now have a formula for the longest losing run in terms of the z-score and the probability, p.
But what is a realistic number for z to use in the formula? If we take z = 3.00 that will cover 99.7% of cases.

Example 1

What is the longest losing run for a dozen, assuming z = 3 (3 standard deviations from the mean)?

Here, z = 3 so z² = 9, and p = 12/37 = 0.324, so (1 − p) = 0.676 (rounded up).

So n = (9 × 0.676)/0.324 = 18.78, or 19 rounded up to the nearest whole number.

But hang on. That doesn't seem right. We know from published stats that a dozen can sleep for 30 spins, sometimes more (I think the record is something like 35).

Don't forget that ±3 standard deviations from the mean (a z-score of ±3) still leaves 0.3% of sequences which will exceed it; it by no means represents a "cap" on how long a losing run can be. In fact, there's no way of knowing how long it could be: there is no "law" of probability which says when a losing run must end, and the maths says that, theoretically, a losing run can be arbitrarily long. However, practically speaking, we can fairly safely assume that a sequence (assuming a random wheel) is extremely unlikely to exceed 5 standard deviations (a z-score of ±5).
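To see why 19 spins is nowhere near a cap, the sketch below computes the run length for z = 3, 4 and 5 and, next to it, the exact chance that a given dozen misses that many spins in a row starting from any particular spin. The exact chance (1 − p)^n is a simple extra for illustration, and the function names are made up for the example.

```python
import math

def longest_losing_run(z, p):
    """n = z^2 (1 - p) / p, the hitless run matching a z-score of -z."""
    return z * z * (1 - p) / p

def prob_no_hit(n, p):
    """Exact chance of n consecutive misses from a given starting spin."""
    return (1 - p) ** n

p = 12 / 37  # dozen, single-zero wheel (unrounded p gives 18.75 rather than 18.78)
for z in (3, 4, 5):
    n = math.ceil(longest_losing_run(z, p))
    chance = prob_no_hit(n, p)
    print(f"z = {z}: {n} spins, about 1 chance in {1 / chance:,.0f}")
```

Over thousands of recorded spins there are thousands of possible starting points, so a run that is rare from any one starting spin still shows up from time to time.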

More later...

Bayes · Nov 02, 08:50 AM 2010

Here is a sample of some "maximum sleeps":

460 - 540 spins with a straight up not showing. (maths expectancy is a hit once every 37 spins)
309 spins with a split not showing. (maths expectancy is a hit once every 18.5 spins or so)
178 spins with a street not showing. (maths expectancy is a hit once every 12.3 spins or so)
155 spins with a corner(square) not showing. (maths expectancy is a hit once every 9.25 spins or so)
93 spins with a line (double-street) not showing... (maths expectancy is a hit every 6.2 spins or so)
36-40 spins with a dozen/column not showing.(maths expectancy is a hit once every 3 spins or so)
20- 36 spins with any even money chance not showing. (expectancy is a hit once every two spins)
10 numbers may sleep for about 45-55 spins

We can re-arrange the formula n = z²(1 − p)/p to find z in terms of n, then use this together with the stats to find the highest z-score which we can reasonably expect. We can then use z²(1 − p)/p with other groups of numbers to find out what the maximum sleep is likely to be.

So, after a little algebra, we have:

z = −√(np/(1 − p))

Take the first stat in the above list: 460 - 540 spins with a straight up not showing

Let's call it 500 spins. Then we can put the numbers into the formula:

z = −√(500 × 0.027 / (1 − 0.027)) = -3.725

Now if we do the same for the other stats, and take the average, this should give us a good indication of which value of z to use to calculate the maximum sleep for any set of numbers.
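The same calculation for every stat in the list can be sketched in Python as follows. Several things here are assumptions made for the example: ranges are replaced by their midpoints (460 - 540 becomes 500, 36-40 becomes 38, 20-36 becomes 28, 45-55 becomes 50), the wheel is single-zero, and p is left unrounded, so the straight-up result comes out at about -3.73 rather than -3.725.

```python
import math

POCKETS = 37  # single-zero wheel (assumption)

# (bet, numbers covered, recorded sleep in spins; midpoints for ranges)
RECORDED_SLEEPS = [
    ("straight up", 1, 500),
    ("split", 2, 309),
    ("street", 3, 178),
    ("corner", 4, 155),
    ("line", 6, 93),
    ("dozen/column", 12, 38),
    ("even chance", 18, 28),
    ("10 numbers", 10, 50),
]

def z_for_sleep(spins, numbers_covered, pockets=POCKETS):
    """z = -sqrt(np / (1 - p)) for a bet that has not hit in `spins` spins."""
    p = numbers_covered / pockets
    return -math.sqrt(spins * p / (1 - p))

scores = [z_for_sleep(spins, k) for _, k, spins in RECORDED_SLEEPS]
for (bet, k, spins), z in zip(RECORDED_SLEEPS, scores):
    print(f"{bet:<13} {spins:>4} spins  z = {z:.2f}")
print(f"average z = {sum(scores) / len(scores):.2f}")
```

The average depends on the midpoints chosen and on how reliable the recorded figures are, so it is only a rough guide to which z to plug into the formula.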

To be continued...

Fripper · Nov 02, 10:00 AM 2010

Hi Bayes, and thanks for all these stats and the effort.

I have a question though:
We know that an even chance can sleep up to 40 spins. Now I wonder how long half the wheel can sleep. Do you have any statistics on this, or is it the same as all the other even chance bets?

I mean numbers like 32, 15, 19 ... and so on up to number 10.
How long can this sector sleep? Is it the same?

I'm a little curious

Bayes · Nov 03, 05:59 AM 2010

Quote from: Fripper on Nov 02, 10:00 AM 2010
I have a question though:
We know that an even chance can sleep up to 40 spins. Now I wonder how long half the wheel can sleep. Do you have any statistics on this, or is it the same as all the other even chance bets?

Hi Fripper,

I don't have any stats from live wheels on this, but assuming there is no bias or dealer influence, then any 18 numbers will have the same distribution, so the fact that they are adjacent on the wheel shouldn't make any difference - you should find the same deviations.

MrJ · Nov 03, 02:47 PM 2010

This is still one of my favorite threads on any board. Good job Bayes!

Ken

Bayes · Nov 05, 06:47 AM 2010

Thanks Ken. I was thinking that maybe I'd overdone the maths, because it's supposed to be "in plain English". But you can still use the results even if you don't follow all the equations.

I still have quite a few posts to add in this thread, it's a big subject!

Bayes · Jan 11, 07:21 AM 2011

So, what is the longest losing run we can realistically expect? Based on the above stats and others, we can be pretty sure that it won't get any worse than a z-score of -5.0.

Remember, z-scores between -3.0 and +3.0 cover 99.7% of cases; a z-score of -3.0 or worse, taken on its own, is 1 chance in 741 (see this site to calculate the chance of a particular z-score occurring).

So we can plug z = -5.0 into the formula for the longest losing run, which gives:

n = 25(1 − p)/p - Formula for the longest losing run.

The only thing you need to supply is the value of the probability p.

Remember, p is the quantity of numbers you're betting on, divided by 36, 37 or 38 (no-zero, single-zero or double-zero wheels respectively).

For example, the longest losing run for a street (3 numbers):

p = 3/37 = 0.0811 (single-zero, to 4 decimal places)

n = 25 × (1 − 0.0811) / 0.0811 = 283

This is about 100 spins more than the maximum of 178 spins given in the stats above, but remember this is the worst case scenario, and records are being broken all the time.

Actually, that number (178 spins) is very close to a z-score of -4.0 (181 spins).

This is the master formula:

n = z²(1 − p)/p -- Longest losing run (general case)

For those not sure, z² means z × z (z multiplied by itself once).

For a z of -3.0, z² = 3² = 9 (1 chance in 741 of this)
For a z of -4.0, z² = 4² = 16 (1 chance in 31,574 of this)
For a z of -5.0 z² = 5² = 25 (1 chance in 3,486,914 of this)
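The master formula and the "1 chance in ..." figures can be reproduced with a short Python sketch. It assumes a single-zero wheel by default and uses the one-sided tail of the normal curve (via math.erfc) for the chances; the function names are illustrative, and the printed chances may differ slightly from the figures quoted above depending on the precision of the calculator used.

```python
import math

def longest_losing_run(z, numbers_covered, pockets=37):
    """n = z^2 (1 - p) / p, with p = numbers covered / pockets."""
    p = numbers_covered / pockets
    return z * z * (1 - p) / p

def one_in(z):
    """Roughly '1 chance in N' of a z-score at or below -|z|
    (one-sided normal tail)."""
    tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return 1 / tail

# Street (3 numbers) at z = -3, -4 and -5
for z in (3, 4, 5):
    run = longest_losing_run(z, 3)
    print(f"z = -{z}: {run:.0f} spins, about 1 chance in {one_in(z):,.0f}")
```

Changing numbers_covered (for example 12 for a dozen, 18 for an even chance) or pockets (36 or 38) gives the corresponding figures for other bets and wheels.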

Toby · Feb 28, 01:39 PM 2011

Hi Bayes, it happens that you begin to collect trials and, when you have about 2000, you may have a 6-number sector at +3 SD. The same sector has 4 SD after 4000 trials, keeping the same edge.

You could have a 6-number-sector after 20k with 5sd and an edge of 3% or less.

Is there a way to compare what is better during the wheel clocking?

Example, +4sd on 10k trials or +4sd on 5000 trials? what is better and why?

Toby · May 29, 07:01 PM 2011

There is some confusion when you measure 1, 2, 3 or more SDs.

Some players believe that, having past data, they can look for -3 SD there and try to exploit the advantage.

The way to do it is to pick a sector, dozen or so at random, such as neighbor numbers, ECs or double streets.

It is easier to find -3 SD in any group in 200 trials than to decide beforehand what group to check for a drawdown.

I guess we would need more SDs to pick from any of a group from past spins.

Best Regards

iggiv · May 29, 07:20 PM 2011

thanx Bayes. long time no see. u OK?
