Wednesday, April 11, 2012

04-11-2012

Section 5.3, page 234
Sampling with replacement - Each trial is an independent event. (e.g. pulling names from a hat and returning the names to the hat after they've been pulled)

Sampling without replacement - Each trial is a dependent event; the odds of an event happening change with each trial (e.g. if you pull names from a hat and don't put them back, your name will eventually come up as you keep pulling, so the odds that your name is drawn next increase with each trial).
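The hat example can be sketched in a few lines of Python. This is only an illustration: the hat size of 10 names and the function name `chance_on_draw` are assumptions, not part of the class notes. It shows the chance that your name is pulled on a given draw, assuming it has not been pulled yet.

```python
from fractions import Fraction

def chance_on_draw(n_names, draw, replace):
    """Chance that one particular name is pulled on a given draw (1-based),
    assuming it was not pulled on any earlier draw."""
    if replace:
        # Names go back in the hat, so every draw looks the same (independent).
        return Fraction(1, n_names)
    # Without replacement, the earlier draws removed (draw - 1) other names.
    return Fraction(1, n_names - (draw - 1))

# Hypothetical hat with 10 names
for draw in (1, 2, 3):
    with_repl = chance_on_draw(10, draw, replace=True)
    without_repl = chance_on_draw(10, draw, replace=False)
    print(f"draw {draw}: with replacement {with_repl}, without replacement {without_repl}")
```

With replacement the chance stays at 1/10 on every draw; without replacement it rises to 1/9, then 1/8, and so on.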



ELISA test of HIV example
The ELISA test reports a positive result 99.6% of the time if the blood has HIV. It therefore reports a false negative (meaning the test says you don't have HIV, but you do) 0.4% of the time (1 - 0.996 = 0.004).

If the blood has no HIV, the test reports a negative result 98% of the time, so the false positive rate (meaning the test says you have HIV, but you don't) is 2% (1 - 0.98 = 0.02).

Suppose the prevalence of HIV is 0.5% and we collect blood samples from 100,000 randomly selected people.

How many people will have HIV?
If 0.5% of the population has HIV and we have 100,000 people, we simply multiply the population by the percentage: (0.005) * (100000) = 500.
500 people will have HIV


How many people do not have HIV?
100000 - 500 = 99500.
99500 will not have HIV



Population
HIV Positive: 500
HIV Negative: 99500




How many of the 500 HIV positive people will the test detect?
The test accurately reports positive 99.6% of the time if the blood has HIV, so (0.996) * (500) = 498
498 HIV positive people will be detected by the test

How many of the 500 HIV positive people will the test miss?
The test erroneously reports negative (false negative) 0.4% of the time if the blood has HIV, so (0.004) * (500) = 2
Alternatively, you could recognize that the false negatives are the complement of the correctly detected group, so 500 - 498 = 2.
2 HIV positive people will be erroneously reported as HIV negative (missed by the test).

How many of the 99500 HIV negative people will the test correctly report as negative?
The test accurately reports a negative result 98% of the time if the blood lacks HIV, so (0.98) * (99500) = 97510
97510 HIV negative people will be correctly reported as HIV negative by the test

How many of the 99500 HIV negative people will the test get wrong?
The test erroneously reports positive (false positive) 2% of the time if the blood lacks HIV, so (0.02) * (99500) = 1990
Alternatively, you could recognize that the false positives are the complement of the correctly reported negative group, so 99500 - 97510 = 1990
1990 HIV negative people will be erroneously reported as HIV positive.

Using this data to create a cross-tabulation table...


                     Actually Positive (+)   Actually Negative (-)   Total
Test Positive (+)    498                     1990                    2488
Test Negative (-)    2                       97510                   97512
Total                500                     99500                   100000
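The counts in the table can be reproduced with a short Python sketch. The variable names (`sensitivity`, `specificity`, and so on) are my own labels for the rates given above, not terms from the class notes.

```python
# Rates and sample size taken from the ELISA example above.
population = 100_000
prevalence = 0.005    # 0.5% of people have HIV
sensitivity = 0.996   # positive result when blood has HIV
specificity = 0.98    # negative result when blood has no HIV

has_hiv = round(population * prevalence)   # 500
no_hiv = population - has_hiv              # 99500

true_pos = round(has_hiv * sensitivity)    # 498 detected
false_neg = has_hiv - true_pos             # 2 missed
true_neg = round(no_hiv * specificity)     # 97510 correctly negative
false_pos = no_hiv - true_neg              # 1990 wrongly positive

# Print the cross-tabulation: rows are test results, columns are the truth.
print(f"{'':15}{'HIV+':>8}{'HIV-':>8}{'Total':>8}")
print(f"{'Test positive':15}{true_pos:>8}{false_pos:>8}{true_pos + false_pos:>8}")
print(f"{'Test negative':15}{false_neg:>8}{true_neg:>8}{false_neg + true_neg:>8}")
print(f"{'Total':15}{has_hiv:>8}{no_hiv:>8}{population:>8}")
```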

Proportion of people that are actually HIV positive given ELISA reported negative?
p(HIV+ | ELISA-) = 2 / 97512
p(HIV+ | ELISA-) = 0.00002051
Of the samples that ELISA reports as negative, 0.002051% are actually HIV positive

Proportion of people that are HIV positive given ELISA reported positive?
p(HIV+ | ELISA+) = 498 / 2488
p(HIV+ | ELISA+) = 0.200160772
Only about 20% of the samples that ELISA reports as positive are actually HIV positive

Proportion of people that are HIV negative given ELISA reported positive?
p(HIV- | ELISA+) = 1990 / 2488
p(HIV- | ELISA+) = 0.799839228
Nearly 80% of the samples that ELISA reports as positive are actually HIV negative
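As a quick check, the three conditional proportions can be computed directly from the table counts. This is a minimal sketch; the variable names are my own, and the counts are the ones worked out above.

```python
# Counts from the cross-tabulation table.
true_pos, false_pos = 498, 1990    # ELISA positive row
false_neg, true_neg = 2, 97510     # ELISA negative row

test_pos = true_pos + false_pos    # 2488 positive results in total
test_neg = false_neg + true_neg    # 97512 negative results in total

# Each conditional proportion divides by the total of the "given" row.
p_hiv_given_neg = false_neg / test_neg      # p(HIV+ | ELISA-)
p_hiv_given_pos = true_pos / test_pos       # p(HIV+ | ELISA+)
p_no_hiv_given_pos = false_pos / test_pos   # p(HIV- | ELISA+)

print(f"p(HIV+ | ELISA-) = {p_hiv_given_neg:.8f}")
print(f"p(HIV+ | ELISA+) = {p_hiv_given_pos:.4f}")
print(f"p(HIV- | ELISA+) = {p_no_hiv_given_pos:.4f}")
```

Note that the denominator is always the number of people with that test result, not the number of people with that true status.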


Why would ELISA be designed so that it reports HIV positive when the sample is actually HIV negative (false positive) more frequently than it reports HIV negative when the sample truly is HIV positive (false negative)?
A person who receives a false negative from ELISA could unknowingly spread the infection further, given the epidemic nature of the illness. Thus, ELISA was designed to keep the false negative rate as low as possible.

6.1)

Discrete Random Variable - A countable number of possible outcomes. If you chose one student from the class, there would be 40 possible outcomes.

Continuous Random Variable - An infinite number of possible outcomes. If the class ran a mile, each finishing time could take any of infinitely many values.

If you would like to augment your knowledge of this subject, please watch this video.

6.2)
Binomial Distribution or Binomial "pattern of variability"
How do you determine if it's a Binomial Distribution?

  1. Two possible outcomes (event happens or it doesn't)
  2. Fixed number of trials/attempts (I will play $1 on the slot machine, as opposed to playing until I win or run out of money)
  3. Each outcome is independent (the outcome is not contingent upon the previous outcome, for example if you were to flip a coin- the coin is not going to "remember" to land on heads the second flip because it landed on heads the first flip)
  4. Probability remains constant (success/failure rate remains the same from one trial to the next; odds do not change as you continue)
If you are unclear on any of this, please watch this video.


Is counting the heads in three coin flips a binomial distribution?
1) 2 possible outcomes? Yes (heads or not heads)
2) Fixed number of trials? Yes (3)
3) Outcomes independent? Yes
4) Probability remains constant? Yes
All 4 conditions are met, so this qualifies as a binomial distribution.

34% of burglars enter through the front door. Is a study of 36 burglaries a binomial distribution?

1) 2 possible outcomes? Yes (front door entry or not)
2) Fixed number of trials? Yes (36)
3) Outcomes independent? Yes (36 different burglaries, they have nothing to do with one another)
4) Probability remains constant? Yes (34%)
All 4 conditions are met, so this qualifies as a binomial distribution.

Formula for Binomial distribution by hand p. 277
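The textbook's formula is not reproduced in these notes. As a stand-in, here is the standard binomial formula, p(X = k) = C(n, k) * p^k * (1 - p)^(n - k), as a small Python sketch. The function name `binomial_pmf` is my own choice.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    # comb(n, k) counts the ways to place the k successes among n trials.
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Coin example: exactly 2 heads in 3 flips of a fair coin.
print(binomial_pmf(2, 3, 0.5))  # 0.375

# Burglary example: chance that exactly 12 of 36 burglaries used the front door.
print(binomial_pmf(12, 36, 0.34))
```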


Binomial Distribution in Minitab

In this example we will use a binomial distribution to randomly generate the number of girls among your four hypothetical children.

Calc> Random Data> Binomial Distribution

1000 rows (one row per simulated family), storing in C1, 4 trials (children per family), probability of event (.51)

Select OK> Number of girls in each family is now in each cell.

Stat> Tables> Tally Individual Variables>

Select C1, include "Counts" and "Percents"

Select OK>

Interpreting results:
p(0 females) = 5.2%
p(1 female) = 24.0%
p(2 females) = 38.1%
p(3 females) = 23.7%
p(4 females) = 9.0%

p(3 of 1 gender) = p(3B & 1G) + p(3G & 1B) = p(1 female) + p(3 females) = 24.0 + 23.7 = 47.7%
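For readers without Minitab, the same simulation can be sketched in Python. This is an assumed equivalent of the menu steps above, not Minitab's own method; the function name and the fixed seed are my choices, and the percentages will differ somewhat from the Minitab run because the random draws differ.

```python
import random
from collections import Counter

def simulate_families(n_families, n_children, p_girl, seed=0):
    """Tally how many simulated families have 0, 1, 2, ... girls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    girls_per_family = [
        # Each child is a girl with probability p_girl, independently.
        sum(rng.random() < p_girl for _ in range(n_children))
        for _ in range(n_families)
    ]
    return Counter(girls_per_family)

tally = simulate_families(n_families=1000, n_children=4, p_girl=0.51)

# Same output as "Tally Individual Variables" with Counts and Percents.
for girls in range(5):
    count = tally[girls]
    print(f"p({girls} females) = {count} of 1000 = {100 * count / 1000:.1f}%")
```

Changing `n_children` to 10 gives the larger-family version of the same tally.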


Suppose you wanted to grow your happy hypothetical family from 4 children to 10 children. What happens to the probability of each number of females?

Calc> Random Data> Binomial Distribution> 1000 rows (one row per simulated family), storing in C2, 10 trials (children per family), probability of event (.51)

Resulting data is stored in C2>

Stat> Tables> Tally Individual Variables> Select C2, include "Counts" and "Percents"> Select OK>


Interpreting results:
p(0 females) = 0%
p(1 female) = 0.7%
p(2 females) = 3.6%
p(3 females) = 10.2%
p(4 females) = 17.6%
p(5 females) = 26.9%
p(6 females) = 22.3%
p(7 females) = 12.8%
p(8 females) = 5.3%
p(9 females) = 0.6%
p(10 females) = 0%


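What have we learned? The more trials (children) you have, the more the probability is spread across the possible outcomes, so the probability of any one outcome decreases. For example, an even split was 38.1% with 4 children (2 females) but only 26.9% with 10 children (5 females). If you wanted the best odds of an even split between the genders, you're best off stopping at 2 children; continuing to have kids will not increase the odds!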
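A quick check of this conclusion, assuming it refers to an exactly even boy/girl split, can be sketched with the standard binomial formula. The function names are my own, and the second function is included only for contrast, since the chance of at least one child of each gender behaves differently.

```python
from math import comb

def p_even_split(n_children, p_girl=0.51):
    """Exact chance that exactly half of the children are girls."""
    half = n_children // 2
    return comb(n_children, half) * p_girl**half * (1 - p_girl)**(n_children - half)

def p_at_least_one_of_each(n_children, p_girl=0.51):
    """Exact chance the family is not all girls and not all boys."""
    return 1 - p_girl**n_children - (1 - p_girl)**n_children

for n in (2, 4, 10):
    print(n, round(p_even_split(n), 4), round(p_at_least_one_of_each(n), 4))
```

The even-split chance falls from about 50% with 2 children to about 37% with 4 and about 25% with 10, while the chance of having at least one of each gender rises as the family grows.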
