Wednesday, February 22, 2012

02-22-2012

3.1 #7B) Put the observations in order (from least to greatest), then find the middle. If the number of observations is even, take the two middle observations, add them, and divide by two to get the median. If it's odd, add one to the number of observations and divide by two; that gives the position of the median observation.
7C) 15 and 5 are the modes (the data set is bimodal).
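The median procedure from 7B can be sketched in Python as below. `median` here is a hypothetical helper written for illustration, not something from the course materials; Python's built-in `statistics.median` does the same job.

```python
def median(values):
    """Median: sort least to greatest, then take the middle observation."""
    data = sorted(values)
    n = len(data)
    mid = n // 2  # 0-based index of the middle (or upper-middle) value
    if n % 2 == 1:
        # Odd n: the median is observation number (n + 1) / 2,
        # which is 0-based index n // 2.
        return data[mid]
    # Even n: add the two middle observations and divide by two.
    return (data[mid - 1] + data[mid]) / 2


print(median([7, 1, 5]))     # odd n: 2nd of 3 sorted values -> 5
print(median([7, 1, 5, 3]))  # even n: (3 + 5) / 2 -> 4.0
```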

#25) Mean, median and mode are all halved
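A quick numeric check of #25, assuming the exercise halves every observation. The data set is made up for illustration (the exercise's own numbers aren't in these notes); `mean`, `median` and `mode` come from Python's standard `statistics` module.

```python
import math
from statistics import mean, median, mode

data = [4, 8, 8, 10, 14]          # hypothetical data set
halved = [x / 2 for x in data]    # every observation divided by 2

for stat in (mean, median, mode):
    before, after = stat(data), stat(halved)
    print(f"{stat.__name__}: {before} -> {after}")
    # Each measure of center of the halved data is half the original.
    assert math.isclose(after, before / 2)
```

The same reasoning applies to any rescaling, such as dividing earnings by 1,000,000.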

3.2 #26) We don't need the variance. Range = highest value - lowest value. Standard deviation: pull up the data sets from the publisher's site or CD-ROM and use Minitab (Stat > Basic Statistics > Display Descriptive Statistics).

Standard Deviation - roughly the typical distance of an observation from the mean. It tells you what's normal, acceptable variability (and so sets the limit for how far from the mean is too far).
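Minitab does the arithmetic for us, but the formulas for range and standard deviation are short enough to write out. This sketch assumes the sample standard deviation (dividing by n - 1), which, as far as I know, is what Minitab's StDev column reports. The function names and data are hypothetical, and the result is cross-checked against Python's `statistics.stdev`.

```python
import math
import statistics


def data_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)


def sample_std_dev(values):
    """Sample standard deviation: sqrt(sum of (x - x_bar)^2 / (n - 1))."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data set
print(data_range(sample))            # 9 - 2 = 7
print(round(sample_std_dev(sample), 3))
assert math.isclose(sample_std_dev(sample), statistics.stdev(sample))
```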

3.6 Robust Measures
Range is found by subtracting the lowest value from the highest value, so it includes any outliers (on either end). It only measures the distance from bottom to top and tells you nothing about what's in between.

Interquartile Range (IQR) - the range of the middle 50% of the data. It's a much better "range" to report because it isn't affected by outliers.
IQR = Upper Quartile (Q3) - Lower Quartile (Q1).

What's a quartile? It's basically the median of each half of the data. The median splits the data into two halves; finding the middle of each half splits it into quarters. Q1 is the median of the lower half and Q3 is the median of the upper half. You use the same method to find quartiles that you used to find the median.
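Here is the "median of each half" idea as a Python sketch. One assumption: when the number of observations is odd, the median itself is left out of both halves (a common textbook convention; others exist, and software such as Minitab may give slightly different quartiles). The helper names are hypothetical. The last lines show why the IQR is the better "range" when an outlier shows up.

```python
def median(values):
    data = sorted(values)
    mid = len(data) // 2
    if len(data) % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]                   # observations below the median
    upper = data[half + len(data) % 2:]   # skip the median itself when n is odd
    return median(lower), median(upper)


def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1


sf = [58, 59, 61, 62, 64, 65, 65, 68, 68, 69, 70, 71]  # San Francisco temps
print(quartiles(sf), iqr(sf))             # (61.5, 68.5) 7.0

# Hypothetical outlier: replace the 71 with 110.
with_outlier = sf[:-1] + [110]
print(max(with_outlier) - min(with_outlier), iqr(with_outlier))  # range 52, IQR still 7.0
```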

To get an even better picture of our data set, we will use the Five Number Summary (FNS).
What Five Numbers? (from left to right) Minimum, Q1, Median (Q2), Q3, Maximum.
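The five-number summary just strings those pieces together. A Python sketch, using the same median-of-halves quartile convention (an assumption; Minitab's quartiles can differ slightly); `five_number_summary` is a hypothetical name.

```python
def median(values):
    data = sorted(values)
    mid = len(data) // 2
    return data[mid] if len(data) % 2 else (data[mid - 1] + data[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])                  # median of the lower half
    q3 = median(data[half + len(data) % 2:])  # median of the upper half
    return data[0], q1, median(data), q3, data[-1]


raleigh = [49, 52, 53, 60, 61, 70, 71, 78, 80, 84, 86, 88]
print(five_number_summary(raleigh))   # (49, 56.5, 70.5, 82.0, 88)
```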


What is a distribution? The pattern of variability: how the data change from one observation to another.

1.5IQR rule - a method of identifying outliers with the IQR. First multiply 1.5 x IQR.
Add that value to Q3 (Q3 + 1.5IQR); any observation greater than this number is considered an outlier.
Subtract that value from Q1 (Q1 - 1.5IQR); any observation less than this number is considered an outlier.
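The 1.5IQR rule as a small Python sketch. It takes Q1 and Q3 as inputs (found by hand or from Minitab) rather than computing them; the function names and the 110 reading are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) = (Q1 - 1.5IQR, Q3 + 1.5IQR)."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step


def find_outliers(values, q1, q3):
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]


# San Francisco: Q1 = 61.5, Q3 = 68.5, so IQR = 7 and 1.5IQR = 10.5.
print(outlier_fences(61.5, 68.5))   # (51.0, 79.0)

temps = [58, 59, 61, 62, 64, 65, 65, 68, 68, 69, 70, 110]  # hypothetical: last value changed to 110
print(find_outliers(temps, 61.5, 68.5))   # [110]
```

Changing the 71 to 110 doesn't move Q1 or Q3, which is why the same quartiles can be reused here.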

Exercise comparing average monthly high temperatures for San Francisco, CA, and Raleigh, NC.


San Francisco:
58, 59, 61, 62, 64, 65, 65, 68, 68, 69, 70, 71
  1. What is the mean (x-bar) temperature? 65
  2. What is the median temperature? 65
  3. What is the minimum temperature? 58
  4. What is the maximum temperature? 71
  5. What is the temperature range? 13
  6. What is the temperature of the first quartile (Q1)? 61.5
  7. What is the temperature of the third quartile (Q3)? 68.5
  8. What is the interquartile range (IQR)? 7
Interpretation of the IQR: The middle 50% of the monthly highs span 7 degrees.

Raleigh:
49, 52, 53, 60, 61, 70, 71, 78, 80, 84, 86, 88
  1. What is the mean (x-bar) temperature? 69.3
  2. What is the median temperature? 70.5
  3. What is the minimum temperature? 49
  4. What is the maximum temperature? 88
  5. What is the temperature range? 39
  6. What is the temperature of the first quartile (Q1)? 56.5
  7. What is the temperature of the third quartile (Q3)? 82
  8. What is the interquartile range (IQR)? 25.5

Interpretation of the IQR: The middle 50% of the monthly highs span 25.5 degrees.

Given what you now know about the climate of these two cities, which city would you expect has a larger standard deviation? Which city would you expect has a smaller standard deviation? Raleigh should have the larger standard deviation and San Francisco the smaller, because San Francisco's temperatures vary much less overall (range 13 vs. 39 degrees, IQR 7 vs. 25.5 degrees).
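A quick check of the hand answers and of the standard deviation prediction, using Python's `statistics` module (`mean`, `median`, `stdev`; `stdev` is the sample standard deviation). The `quartiles` helper is a hypothetical sketch of the median-of-halves method used in the exercise.

```python
from statistics import mean, median, stdev

temps = {
    "San Francisco": [58, 59, 61, 62, 64, 65, 65, 68, 68, 69, 70, 71],
    "Raleigh": [49, 52, 53, 60, 61, 70, 71, 78, 80, 84, 86, 88],
}


def quartiles(values):
    """(Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[half + len(data) % 2:])


for city, values in temps.items():
    q1, q3 = quartiles(values)
    print(f"{city}: mean={mean(values):.1f}, median={median(values)}, "
          f"range={max(values) - min(values)}, Q1={q1}, Q3={q3}, "
          f"IQR={q3 - q1}, SD={stdev(values):.2f}")
```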
Minitab exercises
Getting the Five Number Summary from Minitab


Get the temperature data set from Blackboard > Course Documents > "Temps"
Stat > Basic Statistics > Display Descriptive Statistics

Select the variables of interest (both "San Francisco" and "Raleigh").

Then select "OK" (the highlighted output contains the five-number summary).
Interpreting the data: "The middle 50% of San Francisco's monthly average highs fall between roughly 61 and 69 degrees."
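Minitab's Q1 and Q3 for San Francisco need not match the hand answers (61.5 and 68.5). My understanding, which is an assumption worth checking against the Minitab output, is that Minitab interpolates at positions (n + 1)/4 and 3(n + 1)/4 of the sorted data. Python's `statistics.quantiles` with `method="exclusive"` uses that same (n + 1)p rule, so it makes a convenient cross-check; rounded, its results are the 61 and 69 in the interpretation.

```python
from statistics import quantiles

sf = [58, 59, 61, 62, 64, 65, 65, 68, 68, 69, 70, 71]

# With n = 12, Q1 sits at position 13/4 = 3.25 and Q3 at 39/4 = 9.75:
# Q1 = 61 + 0.25 * (62 - 61) and Q3 = 68 + 0.75 * (69 - 68).
q1, q2, q3 = quantiles(sf, n=4, method="exclusive")
print(q1, q2, q3)   # 61.25 65.0 68.75
```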



How to construct a Boxplot in Minitab
Graph > Boxplot

Select "Multiple Y's - Simple"

Select the variables of interest (both "San Francisco" and "Raleigh").

Select "OK"
Interpretation: the lines (whiskers) connect the minimum to Q1 and Q3 to the maximum. The box spans the IQR (Q1 to Q3) and holds the middle 50% of the data; the line inside the box marks the median.

Constructing a Boxplot in Minitab pt. II
Blackboard > Course Documents > Golf PGA '09

Create a new variable to make the data easier to manage.
Calc > Calculator

Store the result in a new column so you don't overwrite any data; the expression is Earnings divided by 1,000,000.

Select "OK" (output is highlighted)

Stat > Basic Statistics > Display Descriptive Statistics

Select the new column (e.g. "earnings in millions")

Select "OK"

Graph > Boxplot

Select "One Y - Simple"

Select variable of interest (e.g. "earnings in millions"), then select "Scale"

After you've selected "Scale", check "Transpose value and category scales"

Select "OK" to produce the boxplot
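Notice that this boxplot is horizontal? That's the result of "Transpose value and category scales". The graph is right-skewed (the long tail stretches toward the high earnings).
Note: Minitab produces modified boxplots: each whisker extends only to the most extreme observation that still lies within the 1.5IQR limits (Q1 - 1.5IQR and Q3 + 1.5IQR), not necessarily to the minimum or maximum. Observations beyond those limits are plotted individually as asterisks and indicate outliers.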
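The modified-boxplot rule can be sketched as follows: compute the 1.5IQR fences, draw each whisker to the most extreme observation inside them, and flag everything beyond as an outlier. The data are hypothetical (the Golf PGA '09 file isn't reproduced here), Q1 and Q3 are passed in, and `modified_boxplot` is an illustrative name, not a Minitab function.

```python
def modified_boxplot(values, q1, q3):
    """Return (lower whisker end, upper whisker end, outliers)."""
    step = 1.5 * (q3 - q1)
    low_fence, high_fence = q1 - step, q3 + step
    inside = [x for x in values if low_fence <= x <= high_fence]
    outliers = sorted(x for x in values if x < low_fence or x > high_fence)
    # Whiskers stop at the most extreme observations that are NOT outliers.
    return min(inside), max(inside), outliers


# Hypothetical right-skewed data; Q1 = 2 and Q3 = 5.5 (medians of the halves).
earnings = [1, 2, 2, 3, 3, 4, 5, 6, 15]
print(modified_boxplot(earnings, 2, 5.5))   # (1, 6, [15])
```

Here the upper fence is 5.5 + 1.5(3.5) = 10.75, so the whisker stops at 6 and the 15 would be drawn as an asterisk.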


