Math: Statistics

Mean, Median, Mode

1) mean is the average: mean = (sum of N entries)/N

2) often, in questions about means, it is helpful to think in terms of sums: sum of entries = N*(mean)

3) the median is the ‘middle’ of an ordered list. If two numbers are in the ‘middle’, then we average these two numbers

4) the mode, the most frequently appearing number, is less important. Some lists have a mode, some have more than one, and many have no mode at all

Changing the highest or lowest number on a list (so that it remains the highest or lowest) does not change the median, but it does change the mean
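The points above can be checked with a short Python sketch. The list `scores` is a made-up example, not data from these notes, and the standard-library `statistics` module is just one convenient way to do the arithmetic.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 5, 7, 10]      # hypothetical ordered list

total = sum(scores)               # 30
avg = mean(scores)                # 30 / 6 = 5
mid = median(scores)              # two middle entries are 3 and 5, so (3 + 5) / 2 = 4
modes = multimode(scores)         # 3 appears most often

# Sum form of the mean: sum of entries = N * mean
assert total == len(scores) * avg

# Raise the highest entry from 10 to 100: the median stays put, the mean moves.
changed = scores[:-1] + [100]
changed_mean = mean(changed)      # 120 / 6 = 20
changed_median = median(changed)  # still 4

print(avg, mid, modes)
print(changed_mean, changed_median)
```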

More on Mean and Median

Numbers far away from the center of the list are called ‘outliers’; they heavily influence the mean but not the median

1) if all the numbers on a list are evenly spaced, or if the list is symmetrically distributed, then mean = median

2) outliers pull the mean away from the median

3) we often can compare mean & median - or infer which one got bigger or smaller - without a calculation, purely by observing the direction of outliers
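A minimal sketch of these three points, using made-up lists (the names and values are assumptions chosen for illustration):

```python
from statistics import mean, median

evenly_spaced = [10, 20, 30, 40, 50]    # constant gap of 10
symmetric = [1, 4, 5, 6, 9]             # mirror image around 5
high_outlier = [10, 20, 30, 40, 500]    # one entry far above the rest
low_outlier = [-400, 20, 30, 40, 50]    # one entry far below the rest

for name, values in [
    ("evenly spaced", evenly_spaced),
    ("symmetric", symmetric),
    ("high outlier", high_outlier),
    ("low outlier", low_outlier),
]:
    print(name, "mean =", mean(values), "median =", median(values))
```

In the first two lists the mean equals the median. In the last two the median is still 30, while the mean is dragged in the direction of the outlier, which is the comparison that can be made by inspection alone.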

Weighted Averages 1

1) one way to approach weighted averages is to find sums

2) we can also find the proportion of each group, and multiply each group average by that group’s proportion

Average of whole = A_1p_1 + A_2p_2 + A_3p_3
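Both approaches can be sketched in Python. The three groups below (sizes and averages) are hypothetical, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical groups: (number of members, group average)
groups = [(10, 80), (30, 70), (60, 90)]

total_count = sum(n for n, _ in groups)                # 100

# Approach 1: find sums. Each group's sum is N * (group average).
total_sum = sum(n * avg for n, avg in groups)          # 800 + 2100 + 5400 = 8300
average_by_sums = Fraction(total_sum, total_count)     # 83

# Approach 2: proportions. Average of whole = A_1*p_1 + A_2*p_2 + A_3*p_3
average_by_proportions = sum(
    avg * Fraction(n, total_count) for n, avg in groups
)                                                      # 8 + 21 + 54 = 83

print(average_by_sums, average_by_proportions)
```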

Weighted Averages 2

If there are only two groups, the distances from the two group averages to the total average are in a ratio that is the reciprocal of the ratio of the proportions.
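As an illustration, assume two hypothetical groups: group A with average 70 making up 1/4 of the whole, and group B with average 90 making up 3/4. The sketch below checks that the distance ratio is the reciprocal of the proportion ratio.

```python
from fractions import Fraction

avg_a, share_a = 70, Fraction(1, 4)   # hypothetical group A
avg_b, share_b = 90, Fraction(3, 4)   # hypothetical group B

total_avg = avg_a * share_a + avg_b * share_b   # 17.5 + 67.5 = 85

dist_a = abs(total_avg - avg_a)       # 15
dist_b = abs(avg_b - total_avg)       # 5

distance_ratio = dist_a / dist_b      # 15 : 5 = 3
proportion_ratio = share_a / share_b  # (1/4) : (3/4) = 1/3

print(total_avg, distance_ratio, proportion_ratio)
```

The larger group sits closer to the total average: B has three times the weight of A, so the total average is three times closer to B's average than to A's.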

Range and Standard Deviation

1) measures of spread tell us how far apart numbers on a list are from each other

2) range = max - min

3) if all the numbers on a list are identical, then the SD = 0

4) if all the numbers on a list are the same distance from the mean, SD = that distance

5) lots of points close to the mean => small SD; lots of points far from the mean => large SD

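6) adding or subtracting a constant K to every number on the list does not change the SD

7) multiplying every number on the list by K multiplies the SD by |K|: (list)*K -> |K|*(SD)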
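These properties can be checked numerically. The sketch below assumes the population SD (`statistics.pstdev`) is the one meant here, and the list `data` is a made-up example with mean 5 and SD 2.

```python
from statistics import pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]         # hypothetical list: mean 5, SD 2

value_range = max(data) - min(data)      # range = max - min = 7

sd = pstdev(data)                                # 2
sd_identical = pstdev([7, 7, 7, 7])              # all entries identical: 0
sd_equidistant = pstdev([3, 3, 7, 7])            # every entry is 2 from the mean 5: 2
sd_shifted = pstdev([x + 10 for x in data])      # adding K = 10 leaves the SD at 2
sd_scaled = pstdev([3 * x for x in data])        # multiplying by K = 3 gives 3 * 2 = 6
sd_negated = pstdev([-3 * x for x in data])      # K = -3 also gives 6, since an SD is never negative

print(value_range, sd, sd_identical, sd_equidistant, sd_shifted, sd_scaled, sd_negated)
```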

More Standard Deviation

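1) the effect on the SD of including a new pair of numbers in a set

New numbers close to the mean decrease the SD, and the closer they are to the mean, the more the SD decreases. New numbers far from the mean increase the SD.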

2) the SD can be used as a unit to indicate the position of an individual in a large population

If an individual is 10 units above the mean, and the SD = 2, then this individual is 5 SDs above the mean
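A short sketch of both ideas, under the assumption that SD means the population SD. The list `base`, the added pairs, and the helper `sd_units` are hypothetical names chosen for illustration; each added pair is symmetric about the mean, so the mean itself stays at 5.

```python
from statistics import pstdev

base = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical list: mean 5, SD 2

sd_base = pstdev(base)
sd_pair_at_mean = pstdev(base + [5, 5])  # pair exactly at the mean
sd_pair_near = pstdev(base + [4, 6])     # pair 1 unit from the mean
sd_pair_far = pstdev(base + [1, 9])      # pair 4 units from the mean

print(sd_base, sd_pair_at_mean, sd_pair_near, sd_pair_far)


def sd_units(value, mean, sd):
    """Position of a value, measured in SDs above (+) or below (-) the mean."""
    return (value - mean) / sd


# 10 units above a mean of 50 with SD = 2 is 5 SDs above the mean.
position = sd_units(60, mean=50, sd=2)
print(position)
```

The pair at the mean lowers the SD the most, the pair one unit away lowers it less, and the pair four units away (farther than one SD) raises it.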

Normal Distribution (a.k.a. ‘the bell curve’)

1) Histograms visually display data; the heights of the bars represent the ‘frequency’.

2) for population-size sets, histograms become smooth distributions

3) you need to know the normal distribution (a.k.a. the ‘bell curve’)

4) on any normal distribution, about 34% of the population is between M and (M+S), and about 13.5% is between (M+S) and (M+2S), where M is the mean and S is the SD
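The 34% and 13.5% figures are rounded values, and they can be reproduced with the standard library's `statistics.NormalDist`. The helper `share_between` and the two example distributions are assumptions made for this sketch.

```python
from statistics import NormalDist


def share_between(dist, low_sd, high_sd):
    """Fraction of the population between M + low_sd*S and M + high_sd*S."""
    low = dist.mean + low_sd * dist.stdev
    high = dist.mean + high_sd * dist.stdev
    return dist.cdf(high) - dist.cdf(low)


# Two hypothetical normal distributions with different M and S.
standard = NormalDist(mu=0, sigma=1)
test_scores = NormalDist(mu=100, sigma=15)

for dist in (standard, test_scores):
    first_band = share_between(dist, 0, 1)    # between M and M+S, roughly 0.34
    second_band = share_between(dist, 1, 2)   # between M+S and M+2S, roughly 0.135
    print(round(first_band, 3), round(second_band, 3))
```

Both distributions give the same two shares, which is why the percentages apply to any normal distribution regardless of its mean and SD.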

Quartile and Boxplots

Distinguish between mean & median

1) the quartiles (Q_1, median, Q_3) divide the whole list into four lists of equal size

2) Q_1 is the median of the ‘lower list’ (the numbers below the median). Q_3 is the median of the ‘upper list’ (the numbers above the median)

3) the max & min and three quartile numbers constitute the ‘five number summary’, and these determine the five vertical lines on a boxplot

Note: there is never any guarantee that the median will be the average of the max & the min
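A minimal sketch of the five-number summary, following the ‘median of each half’ description above. One assumption is made loudly here: when the list has an odd number of entries, the median itself is left out of both halves. Other conventions exist and can give slightly different quartiles. The function name and the data are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumption: for an odd-length list the middle entry belongs to neither half.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


data = [9, 1, 40, 5, 2, 20, 6, 4, 8]     # hypothetical list of 9 entries
summary = five_number_summary(data)       # ordered: 1 2 4 5 | 6 | 8 9 20 40
print(summary)

# The median (6) is nowhere near the average of the max and the min.
midrange = (summary[0] + summary[4]) / 2  # (1 + 40) / 2 = 20.5
print(midrange)
```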

More on Boxplots

1) the box of the boxplot is the middle 50% of the population

2) the size of this middle 50% is called the interquartile range, IQR, and equals Q_3 - Q_1

3) the ‘percent’ interpretations of the median and the quartiles work only if the data set is the size of a real population

Percentiles

Notice:

1) percentiles only make sense for LARGE distributions

2) the percentile of the lowest score is the 0th percentile

3) the percentile of the highest score is the 99th percentile. (There is no such thing as a ‘100th percentile’!!)

Halfway between two percentiles is NOT the same as halfway between the two corresponding scores
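To see why, assume a hypothetical large population of scores that is normally distributed with mean 500 and SD 100 (these numbers are invented for the sketch). The score at the 70th percentile, halfway between the 50th and 90th percentiles, is compared with the score halfway between the 50th-percentile and 90th-percentile scores.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)    # hypothetical score distribution

score_50th = scores.inv_cdf(0.50)         # the mean, 500
score_90th = scores.inv_cdf(0.90)
score_70th = scores.inv_cdf(0.70)         # halfway between the percentiles

halfway_score = (score_50th + score_90th) / 2    # halfway between the scores
percentile_of_halfway = scores.cdf(halfway_score)

print(round(score_70th, 1), round(halfway_score, 1))
print(round(percentile_of_halfway, 3))
```

Under these assumptions the 70th-percentile score falls below the halfway score, and the halfway score sits above the 70th percentile, because scores are packed more densely near the mean than in the tail.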