Monday, February 13, 2012

02-13-2012


Measures of center:
Mean (the arithmetic mean or average) - Add all observations up and divide by the number of observations. Good for quantitative variables. Easily influenced by outliers.
    • If the mean is with respect to a population we use: μ (mu)
    • If the mean is with respect to a sample we use: x̄ (x bar)
x̄ = (Σ xi) / n, where the sum (sigma) runs over the observations xi from i = 1 to n (the number of observations), and the total is divided by n. As complex as this expression looks, for a sample of 5 observations it's really shorthand for: (observation1 + observation2 + observation3 + observation4 + observation5) / 5. Why 5? Because in this example there are 5 observations, so n = 5.
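As a quick check on the formula, here is a minimal Python sketch of the sample mean. The five numbers are made up for illustration and are not from the class data set.

```python
# Hypothetical sample of n = 5 observations (illustrative values only).
observations = [4, 8, 15, 16, 23]

n = len(observations)           # number of observations
x_bar = sum(observations) / n   # add everything up, then divide by n

print(f"n = {n}, x bar = {x_bar}")
```

With a different number of observations, n changes and so does the divisor; nothing else about the calculation is different.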
Median (middle) - Value exactly in the middle of the data after the data is put in order (ascending or descending). Good for quantitative variables.
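Calculating median for an odd number of observations: the median is the observation at position (n+1)/2 in the ordered data
Calculating median for an even number of observations: (sum of the middle 2 observations)/2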
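The odd and even rules can be sketched in Python as follows. The function name `median` and the sample values are my own choices for illustration, not something from class.

```python
def median(values):
    """Return the middle value of the data after sorting it."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of no observations is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n+1)/2 counting from 1 is index n//2 counting from 0.
        return ordered[mid]
    # Even n: average the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))     # odd number of observations
print(median([7, 1, 3, 5]))  # even number of observations
```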

Mode - Most frequent number/observation. Good for quantitative and categorical variables, but usually used for categorical variables.
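Since the mode works for categorical data too, a small Python sketch may help. The helper name `modes` and the example values are hypothetical; it returns a list because more than one value can be tied for most frequent.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes(["red", "blue", "red", "green"]))  # categorical data
print(modes([1, 1, 2, 2, 3]))                  # two values tied
```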

If you are unclear on any of this please watch this video

Minitab exercises:

File from Blackboard > Course Documents > Golf > "PGA Earnings 09.mtw"

Stat > Basic Stats

Stat > Basic Stats > Display Descriptive Statistics

Select "Earnings"

Click "OK" which will yield these descriptive statistics:

Please make note of the Mean and Median, then make a histogram (if you forgot how, check 02-06-2012) to check for outliers.

Notice that outlier? Let's see what our Mean and Median would look like without Tiger Woods. Replace his "Earnings" with an asterisk (*); this tells Minitab to skip this observation.

Run the Descriptive Statistics again, make a note of Mean and Median.

What would happen to the Mean and Median if we made the outlier even more drastic? Add an extra zero to Tiger Woods' earnings.


Run the Descriptive Statistics one last time, note the Mean and Median.
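The three runs above can be mimicked outside Minitab with a rough Python sketch. The earnings below are invented numbers (in thousands of dollars), not the values in "PGA Earnings 09.mtw", and `None` stands in for Minitab's asterisk as my own convention for a skipped observation.

```python
from statistics import mean, median


def describe(earnings):
    """Return (mean, median), skipping missing observations marked None."""
    present = [x for x in earnings if x is not None]
    return mean(present), median(present)


# Hypothetical earnings; the last value plays the role of the outlier.
original = [100, 120, 150, 180, 200, 1000]
without_outlier = original[:-1] + [None]          # outlier replaced by "missing"
bigger_outlier = original[:-1] + [1000 * 10]      # extra zero on the outlier

for label, data in [
    ("original", original),
    ("outlier removed", without_outlier),
    ("outlier x10", bigger_outlier),
]:
    m, med = describe(data)
    print(f"{label}: mean = {m:.2f}, median = {med}")
```

In this made-up data the mean moves a lot between runs, while the median is the same with the original outlier and with the exaggerated one.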

What have we learned?
"The mean is more susceptible to outliers because everyone is included in the mean value"
If the data is skewed: use the median.
If the data is symmetric: use the mean.

However, we will always use both (mean and median).
