Wednesday, May 9, 2012

05-09-2012


Central Limit Theorem (CLT) for Means (x-bar) - Describes how the sample mean x-bar behaves over many repeated random samples of a given size n (its sampling distribution).

The Central Limit Theorem states that:
1) Shape: The sampling distribution of x-bar is normal or approximately normal
2) Center: μ(subscripted x-bar) = μ, meaning the sampling distribution of x-bar is centered at the population mean.
3) Spread: σ(subscripted x-bar) = σ/√n (the population standard deviation divided by the square root of the sample size)


Note: We can only use the CLT when n ≥ 30 OR the population is normally distributed
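To see the three CLT claims in action, here is a small simulation sketch. It assumes a right-skewed exponential population with mean 1 and standard deviation 1. That population is a stand-in chosen for illustration, not course data, and `simulate_sample_means` is a hypothetical helper name. With n = 36, the simulated x-bars should average close to μ = 1, and their standard deviation should be close to σ/√n = 1/6.

```python
import random
import statistics

def simulate_sample_means(n, num_samples=10_000, mean=1.0, seed=0):
    """Draw num_samples samples of size n from an exponential population
    (assumed, right skewed) and return the list of sample means."""
    rng = random.Random(seed)
    rate = 1.0 / mean  # exponential with this rate has mean = sd = `mean`
    return [
        statistics.fmean(rng.expovariate(rate) for _draw in range(n))
        for _sample in range(num_samples)
    ]

n = 36
means = simulate_sample_means(n)
print(round(statistics.fmean(means), 3))  # should be close to mu = 1.0
print(round(statistics.stdev(means), 3))  # should be close to 1/sqrt(36) ≈ 0.167
```

Plotting a histogram of `means` should also show much less skew than the population itself. That is the "Shape" part of the theorem.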

Example 7.12 (p. 357)
μ = 12,485
σ = 21,973
n = 10
Notice that the standard deviation is enormous. Looking at the provided histogram, we observe that the population distribution is right skewed, so it is NOT normally distributed. Since n = 10 is also less than 30, the CLT does not apply, and using it anyway would give inaccurate results. Therefore we do not continue, because we do not have enough information.

Now let's suppose our sample size was 36 as opposed to 10.
Now that n ≥ 30, we can use the CLT.

So, using what we know about the CLT, we can conclude three things:
1) Shape: approximately normal
2) Center: centered at μ = 12,485
3) Spread: σ/√n = 21,973/√36 = 21,973/6 ≈ 3662.17
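The spread calculation above takes only a few lines of Python. Here is a sketch, with `standard_error_mean` as a hypothetical helper name:

```python
import math

def standard_error_mean(sigma, n):
    """Standard deviation of the sampling distribution of x-bar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

se = standard_error_mean(21973, 36)
print(round(se, 2))  # 3662.17
```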

Suppose we are interested in the probability that the mean is greater than 17,000.
P(x-bar > 17,000)
In order to solve this problem, we are going to use the Z-score formula
z = (observed - expected) / standard deviation
The only difference in using the z score formula in conjunction with the CLT is that the standard deviation we plug in is the one we have found using σ/√ n.

So for this problem:
(17,000 - 12,485) / 3662.17 = 4515 / 3662.17 ≈ 1.23
Looking up z = 1.23 in the Z-table, we find 0.8907, which is the area to the left of z.
However, the question asks for the probability of a sample mean greater than 17,000, so we compute 1 - 0.8907 to get the area to the right. The probability of a sample mean above 17,000 is 0.1093. Interpretation: there is a 10.93% chance that a random sample of 36 cities will have a mean greater than 17,000 small businesses.
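You can check this calculation with a short sketch that gets the normal area from the error function instead of a printed Z-table. `normal_cdf` is a hypothetical helper, not a library function. The code does not round z to 1.23 first, so its upper-tail area comes out near 0.1088, slightly below the table-based 0.1093.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z), via math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 12485, 21973, 36
se = sigma / math.sqrt(n)    # CLT spread for x-bar
z = (17000 - mu) / se        # (observed - expected) / standard deviation
p_upper = 1 - normal_cdf(z)  # area to the right of z
print(round(z, 2), round(p_upper, 4))
```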



Central Limit Theorem for Proportions (p-hat)


For sample proportions, the Central Limit Theorem states:
1) Shape: The sampling distribution of p-hat is normal or approximately normal
2) Center: μ(subscripted p-hat) = p, meaning the sampling distribution of p-hat is centered at the population proportion.
3) Spread: σ(subscripted p-hat) = √[p(1-p)/n]


Note: We can only use the CLT for proportions when np ≥ 5 AND n(1-p) ≥ 5.
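The three facts and the condition check fit into one small function. The sketch below uses a hypothetical helper name, `phat_sampling_distribution`. It refuses to return a normal approximation when np ≥ 5 and n(1-p) ≥ 5 do not both hold.

```python
import math

def phat_sampling_distribution(p, n):
    """Return (center, spread) of the sampling distribution of p-hat.

    Raises ValueError if np >= 5 and n(1-p) >= 5 do not both hold,
    because the normal approximation is not justified then.
    """
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError(f"conditions not met: np={n * p}, n(1-p)={n * (1 - p)}")
    return p, math.sqrt(p * (1 - p) / n)

center, spread = phat_sampling_distribution(0.40, 25)
print(center, round(spread, 4))  # 0.4 0.098
```

For example, `phat_sampling_distribution(0.1, 20)` raises an error because np = 2 is less than 5.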



In-class example with the Reese's Pieces applet.

We are interested in the proportion of candies that are orange.
Setting π = 0.40, the population proportion (this makes the simulation machine produce 40% orange candies)
n = 25
Then we drew a sample; Brandon got a sample proportion of 0.36.
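You can mimic the applet with a short simulation sketch. It assumes each candy is independently orange with probability π = 0.40. How the applet works internally is not known here, and `simulate_phats` is a hypothetical name. Repeating the n = 25 draw many times shows the long-run center and spread of p-hat. Any single draw, like Brandon's 0.36, is just one value from this distribution.

```python
import random
import statistics

def simulate_phats(p=0.40, n=25, num_samples=10_000, seed=1):
    """Simulate num_samples bags of n candies, each orange with probability p,
    and return the proportion of orange candies in each bag."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _candy in range(n)) / n
        for _bag in range(num_samples)
    ]

phats = simulate_phats()
print(round(statistics.fmean(phats), 3))  # should be close to 0.40
print(round(statistics.stdev(phats), 3))  # should be close to sqrt(0.4*0.6/25) ≈ 0.098
```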


Applying what the Central Limit Theorem tells us about sampling variability:
1) Shape: normal or approximately normal
2) Center: μ(subscripted p-hat) = 0.4
3) Spread: σ(subscripted p-hat) = √[0.4(1-0.4)/25] = √0.0096 ≈ 0.0980 (rounded to 0.098 below)

Checking the conditions: np = 25(0.4) = 10, and 10 ≥ 5, so we're good there.
n(1-p) = 25(1-0.4) = 15, and 15 ≥ 5. Both conditions have been met.

What's the probability of observing a bag of Reese's Pieces with 24% or fewer orange candies?
Z score time!
(0.24 - 0.4) / 0.098 ≈ -1.63
Finding -1.63 in the Z-table, we find that the area to the left is 0.0516.
Interpretation: 5.16% chance of finding a package with 24% or fewer orange candies.
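You can reproduce the lower-tail area with a sketch that computes the normal CDF from `math.erf`. `normal_cdf` is a hypothetical helper name. The exact area comes out at about 0.051. Table lookups round z to two decimals, so the last digit can differ slightly from the table value.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p, n = 0.40, 25
se = math.sqrt(p * (1 - p) / n)  # spread of p-hat, about 0.098
z = (0.24 - p) / se              # about -1.63
p_lower = normal_cdf(z)          # area to the left: P(p-hat <= 0.24)
print(round(z, 2), round(p_lower, 4))
```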

What's the probability of observing a bag of Reese's Pieces with 60% or greater orange candies?
(0.6 - 0.4) / 0.098 ≈ 2.04
Finding 2.04 in the Z-table, we find 0.9793. However, 0.9793 is the area to the left, and we're interested in the area to the right, so we compute 1 - 0.9793 and find the probability to be 0.0207.
Interpretation: 2.07% chance of finding a package of Reese's Pieces with 60% or more orange candies.
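Here is the same upper-tail calculation as a sketch, again using a hypothetical `normal_cdf` helper built on `math.erf`. Subtracting from 1 gives the area to the right. The result is about 0.0206, which agrees with the table value up to rounding of z.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p, n = 0.40, 25
se = math.sqrt(p * (1 - p) / n)  # spread of p-hat, about 0.098
z = (0.60 - p) / se              # about 2.04
p_upper = 1 - normal_cdf(z)      # area to the right: P(p-hat >= 0.60)
print(round(z, 2), round(p_upper, 4))
```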



As the class took increasingly larger samples, the variability tightened and the sample proportions came closer to the true proportion of 0.40. Observe the trend in the table below:


Sample size (n)   Lowest p-hat in class   Highest p-hat in class
25                0.24                    0.60
50                0.22                    0.50
75                0.29                    0.53
100               0.30                    0.52
500               0.36                    0.43


The larger the sample size, the closer the sample proportions are to the true proportion of 0.40.
Because the spread √[p(1-p)/n] shrinks with the square root of n, cutting the standard deviation in half requires quadrupling the sample size.
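The square-root relationship is easy to check numerically. This sketch computes the spread of p-hat for each sample size in the class table, assuming p = 0.40. It then shows that going from n = 25 to n = 100 (four times as large) halves the spread. `se_phat` is a hypothetical helper name.

```python
import math

def se_phat(p, n):
    """Spread of the sampling distribution of p-hat: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.40
for n in (25, 50, 75, 100, 500):
    print(n, round(se_phat(p, n), 4))

# Quadrupling n divides the standard deviation by sqrt(4) = 2.
ratio = se_phat(p, 100) / se_phat(p, 25)
print(round(ratio, 6))  # 0.5
```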
