Monday, April 23, 2012

04-23-2012

Remember me?
The Standard Normal Distribution (Z Distribution) [video]:

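  • The area under the curve is always equal to one (1.0), because the total probability for any event is 1.0 (as you may recall, a probability of 1 means that the event always happens, i.e. it has a 100% probability, which is our maximum).
  • The distribution is always centered on the mean (µ), which sits at a Z-score of 0 (zero standard deviations from the mean); the standard normal distribution itself has a mean of 0 and a standard deviation of 1. 50% of the data lies above the mean and 50% of the data lies below the mean.
  • To find Z scores: Z = (x - µ) / σ, where x is the raw value, µ is the population mean, and σ is the population standard deviation.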
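
As a quick illustration of standardizing a value, here is a minimal Python sketch. The function name `z_score` and the numbers (a raw score of 85, a mean of 70, a standard deviation of 10) are hypothetical and chosen only for the example; they do not come from this post.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical example values, not from the post:
# a raw score of 85 in a population with mean 70 and standard deviation 10.
z = z_score(85, 70, 10)
print(z)  # 1.5
```

A positive result means the value is above the mean, a negative result means it is below.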


Z Table (pdf) - Gives you area under the curve for the specific standard deviation you are interested in.
Normal distribution (java applet) - Helps you visualize what you are trying to find


What is the area to the left of Z-score value 0.57?
First we find the 0.5 row under the Z column on the table, then move across that row to the 0.07 column. This gives us the area under the curve to the left of 0.57, which is 0.7157.
p(Z<0.57) = 0.7157
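
If you don't have a Z table handy, the same lookup can be sketched in Python. This assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are just for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# cdf(z) is the area under the curve to the LEFT of z,
# which is the same quantity a Z table reports.
area_left = standard_normal.cdf(0.57)
print(round(area_left, 4))  # 0.7157
```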


Suppose we want to know the area to the right of the Z-score 0.57. How would we do that, given that the Z-Table only provides the area to the LEFT of the Z score? We use our knowledge that the entire curve accounts for a total probability of 1. If we remove the section to the left (which we can find easily from the Z-table), we are left with the area to the right!
p(Z>0.57) = 1 - 0.7157
p(Z>0.57) = 0.2843

Alternatively, the area to the right of a positive value is the same as the area to the left of the corresponding negative value, due to the symmetric nature of the distribution.
p(Z>0.57) = p(Z<-0.57) = 0.2843
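
Both routes to the right-hand area can be checked side by side. This is a small sketch, again assuming Python 3.8+ and `statistics.NormalDist`; the variable names are my own.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # defaults to mean 0, standard deviation 1

# Route 1: total area (1) minus the area to the left of 0.57.
right_by_complement = 1 - standard_normal.cdf(0.57)

# Route 2: by symmetry, the area to the left of -0.57.
right_by_symmetry = standard_normal.cdf(-0.57)

print(round(right_by_complement, 4))  # 0.2843
print(round(right_by_symmetry, 4))    # 0.2843
```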


Remember the Empirical Rule? You know, about 68% of the data falling within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three? Well, let's see how accurate that rule of thumb is.

So what is the probability that Z is greater than negative one standard deviation and less than one standard deviation? p(-1 < Z < 1) ?
To find the area between two points it's best to approach it as two separate problems, so let's find the highest value first.

Let's draw a picture of what we're trying to find so we can visualize the problem; this java applet should help you on your way.

Looking at the Z table, what's the probability that Z is less than positive one? p (Z < 1.00) = 0.8413
Looking at the Z table, what's the probability that Z is less than negative one? p(Z < -1.00) = 0.1587

Since we are interested in the area between -1 and 1, we don't really want the area to the left of -1 as we have found, so we can just subtract it from the area to the left of positive 1 and we will have our answer.
0.8413 - 0.1587 = .6826
p(-1 < Z < 1) = .6826, or 68.26% of the data lies within 1 standard deviation of the mean
Pretty close to the Empirical Rule's estimate of 68% of the data falling within 1 standard deviation.

p(-2 < Z < 2) ?
p(Z<2.00) = 0.9772
p(Z<-2.00) = 0.0228

p(-2 < Z < 2)  = 0.9772 - 0.0228
p(-2 < Z < 2)  = 0.9544

Again, pretty close to the Empirical Rule's estimate of 95%

p(-3 < Z < 3) ?

p(Z<3.00) = 0.9987
p(Z<-3.00) = 0.0013

p(-3 < Z < 3)  = 0.9987 - 0.0013
p(-3 < Z < 3)  = 0.9974
Also close to the Empirical Rule's estimate of 99.7%
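
The three checks above can also be run in one go. The sketch below assumes Python 3.8+ (`statistics.NormalDist`); `area_within` is a hypothetical helper name. Because it does not round each lookup to four decimals the way a printed Z table does, its results differ from the table-based answers in the last digit.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def area_within(k: float) -> float:
    """Area between -k and +k: area left of +k minus area left of -k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```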

But what if we wanted to know what Z scores correspond with exactly 68%?
Well, we know the entire area under the curve is 100% or 1.0, and we want to capture the middle 68%.
1 - 0.68 = 0.32, so 32% of the data is unaccounted for. This is not all at one end; it is split evenly between the two tails because the distribution is symmetric. So each tail holds 0.32/2 = 0.16.
The area to the left of the upper Z score is therefore 1 - 0.16 = 0.8400

Now we go to the Z-Table and look for the value closest to 0.8400. After some searching we find the values 0.8365 (located at 0.98), 0.8389 (located at 0.99), and 0.8413 (located at 1.00). Our target of 0.8400 falls between the last two, and 0.8389 is slightly closer (0.0011 away, versus 0.0013 for 0.8413), so we will go with 0.8389. So if we're interested in the middle 68%, we're looking at approximately p(-0.99 < Z < 0.99)
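
Going from an area back to a Z score is the reverse of a table lookup, and it can be cross-checked by inverting the CDF directly. This is a minimal sketch, assuming Python 3.8+ and the `inv_cdf` method of `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

middle = 0.68                  # the central area we want to capture
tail = (1 - middle) / 2        # area left over in EACH tail (about 0.16)
upper_area = 1 - tail          # area to the left of the upper Z score (about 0.84)

# inv_cdf turns an area to the left back into a Z score.
z = standard_normal.inv_cdf(upper_area)
print(round(z, 2))  # 0.99
```

The unrounded answer lies a little above 0.99, which matches the table search: 0.8400 sits between the entries for 0.99 and 1.00, closer to the one for 0.99.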

1 comment:

  1. Very descriptive and knowledgeable blog. In statistics, a standard score indicates how many standard deviations an observation or datum is above or below the mean. It is a dimensionless quantity derived by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. This conversion process is called standardizing or normalizing.
    Standard scores are also called z-values, z-scores, normal scores, and standardized variables; the use of "Z" is because the normal distribution is also known as the "Z distribution". I hope you will like this information.
