Below are my notes from when I originally took the Udacity course Introduction to Statistics. At that time, I only took notes in Microsoft Word, so there is a real lack of gifs. I'd go through and add some, but this is going to be a HUGE post and WordPress is really struggling to function. Honestly, I'm only adding this post for my own reference and to save my notes, so if I were you I'd go watch some Schitt's Creek instead or take the class yourself! 🙂

Unit 1: Introduction
Type A: 80 Friends
Type B: 20 Friends
Expected / Average # of Friends
If there's a 50% chance of your friend being either type: (1/2) * 80 + (1/2) * 20 = 50
The chance of linking to a Type A friend is 0.8 and the chance of linking to a Type B friend is 0.2
In expectation, how many friends does your friend have?
(0.8 * 80) + (0.2 * 20) = 68
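A quick sketch of that expected-value calculation in Python (the 0.8/0.2 weights are the link probabilities from the note above):
friends = {'A': 80, 'B': 20}
link_prob = {'A': 0.8, 'B': 0.2}
expected = sum(link_prob[t] * friends[t] for t in friends)
print(expected)   # 68.0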
Unit 2: Scatterplots
Most important thing a statistician does is look at data
Previous House Sales in Neighborhood:
Size (ft²) | Cost ($)
1400 | 112000
2400 | 192000
1800 | 144000
1900 | 152000
1300 | 104000
1100 | 88000
How much would you pay for 2100 ft²?
2100 is halfway between 1800 and 2400, so take the mean of those two prices: (144000 + 192000) / 2 = 168000
The price per square foot turns out to be constant at $80/ft² (e.g. 112000 / 1400 = 80), so 2100 * 80 = 168000
Scatterplots: good way to eyeball relationships between variables
The relationship does not have to have a fixed price/ft² in order to be linear
Price = 30 * size + $2000
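A minimal matplotlib sketch of that scatterplot with the constant price-per-square-foot line (matplotlib isn't what the course used; it has its own plotting helpers, but something like this would draw it):
import matplotlib.pyplot as plt

sizes = [1400, 2400, 1800, 1900, 1300, 1100]
costs = [112000, 192000, 144000, 152000, 104000, 88000]

plt.scatter(sizes, costs)              # eyeball the relationship
xs = sorted(sizes)
plt.plot(xs, [80 * x for x in xs])     # the constant $80/ft^2 line
plt.xlabel('Size (ft^2)')
plt.ylabel('Cost ($)')
plt.show()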
Unit 3: Bar Charts
Noise
Bar Charts 2D
Histograms 1D (the y-axis becomes the 'count' of the data)
A histogram is a count of each "bucket" (a grouping of the data)
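A tiny sketch of the counting a histogram does (the data and bucket size here are made up):
data = [1, 3, 4, 4, 7, 8, 8, 9]
bucket_size = 3
counts = {}
for x in data:
    bucket = (x // bucket_size) * bucket_size   # start of the bucket: 0, 3, 6, ...
    counts[bucket] = counts.get(bucket, 0) + 1
print(counts)   # e.g. {0: 1, 3: 3, 6: 3, 9: 1}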
Unit 4: Pie Charts
Used to visualize relative data
Wonderful for comparing
Unit 5: Programming Charts
barchart(Height,Weight)
(height is x-axis, weight is y-axis)
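barchart here is the course environment's built-in helper; a rough equivalent with matplotlib might look like this (the height/weight values are made up for illustration):
import matplotlib.pyplot as plt

heights = [150, 160, 170, 180]   # hypothetical data
weights = [50, 60, 70, 80]

plt.bar(heights, weights, width=5)   # height on the x-axis, weight on the y-axis
plt.xlabel('Height')
plt.ylabel('Weight')
plt.show()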
Unit 7: Admissions Case Study
Did the admissions policy discriminate by gender? (UC Berkeley)
The admission numbers are different for each gender, which looks like gender bias
Who is being favored?
Males have a higher admission rate overall
Each major individually shows that females have a higher admission rate (Simpson's paradox)
Statistics is: deep and often manipulated (be skeptical)
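A tiny numeric sketch of how that reversal can happen (the numbers are made up, not Berkeley's actual data):
# (applicants, admitted) per gender for two hypothetical majors
major_a = {'male': (80, 48), 'female': (20, 15)}   # 60% vs 75% admitted
major_b = {'male': (20, 4),  'female': (80, 20)}   # 20% vs 25% admitted

for gender in ['male', 'female']:
    applied = major_a[gender][0] + major_b[gender][0]
    admitted = major_a[gender][1] + major_b[gender][1]
    print('%s overall: %.0f%%' % (gender, 100.0 * admitted / applied))
# male overall: 52%, female overall: 35%, even though females are admitted
# at a higher rate within each individual major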
Unit 8: Probability
Probability is the opposite of statistics: in probability we know the causes and predict the data; in statistics we see the data and infer the causes
P(A) = 1 – P(¬A) (¬ means "not")
The probability of A equals 1 minus the probability of not-A
P(H,H) where P(H) = 0.5
P(H,H) = 0.25
Truth Table:
P(DOUBLES) when throwing two fair dice
6 sides per die, so 6 * 6 = 36 possible combinations; only 6 of them are doubles, SO 6/36 = 1/6 ≈ 0.167
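A quick sketch that enumerates the 36 outcomes and counts the doubles:
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]   # all 36 rolls
doubles = [roll for roll in outcomes if roll[0] == roll[1]]     # 6 of them
print(len(doubles) / float(len(outcomes)))                      # 0.1666...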
Summary
-Probability of event P()
-Probability of opposite event 1-P
-Probability of composite event P*P…*P (independence)
Unit 9: Conditional Probability
Dependent events
Think of two coin-flip-like events where the first is whether someone is smart or dumb and the second is whether they become a professor at Stanford; the second depends on the first
P(POSITIVE | CANCER)
P(NEGATIVE | CANCER)
This is conditional probability: the probability of the thing on the left, GIVEN that we assume the thing on the right is actually the case (the bar in the middle divides the two sides)
P(c)
P(P|c)
P(P|¬c) (¬c means "not c", i.e. no cancer)
P(P) = P(P|c)*P(c) + P(P|¬c)*P(¬c) (Total Probability)
Summary
P(TEST | DISEASE)
P(TEST) = [P(TEST | DISEASE) * P(DISEASE)] + [P(TEST|¬DISEASE) * P(¬DISEASE)]
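Plugging in the cancer-test numbers used in the programming unit below (P(C) = 0.1, P(Pos|C) = 0.9, P(Neg|¬C) = 0.8): P(Pos) = 0.9 * 0.1 + 0.2 * 0.9 = 0.27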
Unit 10: Bayes Rule
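For reference, Bayes Rule (this is what the code in the next unit implements):
P(A|B) = P(B|A) * P(A) / P(B)
Applied to the cancer test: P(C|Pos) = P(Pos|C) * P(C) / P(Pos) = (0.9 * 0.1) / 0.27 ≈ 0.33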
Unit 11: Programming Probabilities
Code for 3 coin flips containing exactly 1 heads (it does 1 – P to get the probability of a 'tails', we plug the probability of 'heads' into the function itself, and we multiply by 3 because we KNOW that only 3 cases have exactly 1 heads)
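The snippet itself isn't in my notes, but a sketch of what that description amounts to:
# Probability of exactly one heads in three flips of a coin with P(heads) = p
def f(p):
    return 3 * p * (1 - p) ** 2   # the 3 arrangements are HTT, THT, TTH
print(f(0.5))   # 0.375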
EXAMPLE
#Return the probability of flipping one head each from two coins
#One coin has a probability of heads of p1 and the other of p2
def f(p1,p2):
    return p1 * p2
print f(0.5,0.8)
ABOVE EXAMPLE WRITTEN IN PYTHON
#Two coins have probabilities of heads of p1 and p2
#The probability of selecting the first coin is p0
#Return the probability of a flip landing on heads
def f(p0,p1,p2):
    return (p0 * p1) + ((1-p0) * p2)
print f(0.3,0.5,0.9)
ANOTHER EXAMPLE
#Calculate the probability of a positive result given that
#p0=P(C)
#p1=P(Positive|C)
#p2=P(Negative|Not C)
def f(p0,p1,p2):
    return (p0 * p1) + (1-p0) * (1-p2)
print f(0.1,0.9,0.8)
ALL OF BAYES RULE
#Return the probability of A conditioned on B given that
#P(A)=p0, P(B|A)=p1, and P(Not B|Not A)=p2
def f(p0,p1,p2):
    return (p0 * p1) / ((p0 * p1) + (1-p0) * (1-p2))
print f(0.1,0.9,0.8)
BAYES RULE 2
#Return the probability of A conditioned on Not B given that
#P(A)=p0, P(B|A)=p1, and P(Not B|Not A)=p2
def f(p0,p1,p2):
    return (p0 * (1-p1))/((p0 * (1-p1)) + ((1-p0) * p2))
print f(0.1,0.9,0.8)
Unit 11A: Probability Distributions
Continuous probability distributions: every individual outcome has probability 0
Density Probability for Continuous Spaces
f(x) is density of x, p(x) is probability (two different things)
Density is whatever you multiply the width of the interval by in order to make the total equal 1
So if the outcome lies between 3 and 3.5, the width is 0.5; multiply by 2 to get 1, so the density = 2 (density can be larger than 1)
Density can be 0; it does not have to be positive, but it must be non-negative
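A quick sanity check of that density example:
width = 3.5 - 3.0
density = 1.0 / width    # uniform density on [3, 3.5]
print(density)           # 2.0
print(density * width)   # 1.0, the total probability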
Unit 12: Correlation vs Causation
Should you stay at home? No
Hospitals do not cause sick people to die. The correlation data makes it seem like you're more likely to die in a hospital, but that does not mean being there increases your chances.
Chances of dying in a hospital are 40 times larger than at home (a correlation statement)
Being in a hospital increases your probability of dying by a factor of 40 (a causal statement)
If the confounding variable is omitted, misleading correlations can appear (when the "sick" variable is taken into account, you see a negative correlation; when it is omitted, the correlation is positive)
Reverse Causation: size of fire causes # of fire fighters, OR # of fire fighters causes size of fire
Problem Set 2:
Top: 4 / 16 = .25 (4 instances of 1 H, 16 possible combos)
Bottom: 1 / 16 (only 1 instance of H as first)
Use Bayes Rule below:
Unit 13: Estimation
The maximum likelihood estimate is the value where you get the greatest probability of the observed data (the likelihood starts going back down when p > 2/3, so 2/3 is the maximum likelihood estimator)
Laplacian estimator = add one fake data point for each possible outcome
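A minimal sketch of that Laplacian estimate (the 2-heads-in-3-flips data is made up; k is the number of possible outcomes):
# Laplacian estimator: add one fake observation per possible outcome
def laplace_estimate(count, n, k):
    return (count + 1.0) / (n + k)
print(laplace_estimate(2, 3, 2))   # 0.6, versus the maximum likelihood estimate of 2/3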
Unit 14: Averages
Mean = sum / count
Mode = most repeated
Median = middle number (if there are 2 middle numbers, take their mean)
Unit 15: Variance and Standard Deviation
Variance is the sum of the squared deviations (Xi – mean)², normalized (multiplied by 1/n, where n is the count)
It is the measure of how far the data is spread from the mean
Variance = Average quadratic deviation from the mean
Standard deviation = square root of variance
σ is sigma; σ² is the variance; the square root of σ² is the standard deviation, or just σ
N is count of all numbers
Sum of Xi is the sum of all the numbers
Sum of Xi^2 is the sum of each number squared and then added together
Standard Score = (x – m) / sigma
X = score given
M = mean
Sigma = std dev
Unit 17: Programming Estimators
MEAN:
#Complete the mean function to make it return the mean of a list of numbers
data1=[49., 66, 24, 98, 37, 64, 98, 27, 56, 93, 68, 78, 22, 25, 11]
def mean(data):
    return sum(data) / len(data)
print mean(data1)
MEDIAN:
#Complete the median function to make it return the median of a list of numbers
data1=[1,2,5,10,-20]
def median(data):
    sdata = sorted(data)
    mnum = len(data) / 2
    return sdata[mnum]
print median(data1)
MODE:
#Complete the mode function to make it return the mode of a list of numbers
data1=[1,2,5,10,-20,5,5]
def mode(data):
    mcount = 0
    for i in range(len(data)):
        icount = data.count(data[i])
        if icount >= mcount:
            mode = data[i]
            mcount = icount
    return mode
print mode(data1)
VARIANCE:
#Complete the variance function to make it return the variance of a list of numbers
data3=[13.04, 1.32, 22.65, 17.44, 29.54, 23.22, 17.65, 10.12, 26.73, 16.43]
def mean(data):
    return sum(data)/len(data)
def variance(data):
    #ndata = []
    variance = 0
    mu = mean(data)
    for i in range(len(data)):
        variance = variance + (data[i] - mu)**2
    return variance/len(data)
print variance(data3)
Other way to calculate the variance
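The "other way" is presumably the computational formula from the earlier unit, σ² = ΣXi²/N – (ΣXi/N)²; a sketch:
data3 = [13.04, 1.32, 22.65, 17.44, 29.54, 23.22, 17.65, 10.12, 26.73, 16.43]
def variance2(data):
    n = float(len(data))
    return sum(x**2 for x in data) / n - (sum(data) / n)**2
print(variance2(data3))   # same result as the loop version above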
STANDARD DEVIATION
#Complete the stddev function to make it return the standard deviation
#of a list of numbers
from math import sqrt
data3=[13.04, 1.32, 22.65, 17.44, 29.54, 23.22, 17.65, 10.12, 26.73, 16.43]
def mean(data):
    return sum(data)/len(data)
def variance(data):
    mu=mean(data)
    return mean([(x-mu)**2 for x in data])
def stddev(data):
    return sqrt(variance(data))
print stddev(data3)
Problem Set 3: Estimators
When all the data values are increased by a ratio, the mean and std dev increase by the same ratio, the variance (the square of the std dev) increases by the square of that ratio, and the standard score stays the same
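A quick check of that scaling behavior (the data and the factor of 3 are made up):
data = [2.0, 4.0, 6.0, 8.0]
scaled = [3 * x for x in data]   # multiply everything by a ratio of 3

def mean(d):
    return sum(d) / len(d)
def var(d):
    return mean([(x - mean(d))**2 for x in d])

print(mean(scaled) / mean(data))   # 3.0, the mean scales by the ratio
print(var(scaled) / var(data))     # 9.0, the variance scales by the ratio squared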
Std dev is a measure of spread; the adult data is clearly more spread out, so adults must have the bigger std dev
#In class you wrote a function mean that computed the mean of a set of numbers
#Consider a case where you have already computed the mean of a set of data and
#get a single additional number. Given the number of observations in the
#existing data, the old mean and the new value, complete the function to return
#the correct mean
from __future__ import division
def mean(oldmean,n,x):
    return ((oldmean * n) + x) / (n + 1)
currentmean=10
currentcount=5
new=4
print mean(currentmean,currentcount,new) #Should print 9
LIKELIHOOD:
#Compute the likelihood of observing a sequence of die rolls
#Likelihood is the probability of getting the specific set of rolls
#in the given order
#Given a multi-sided die whose labels and probabilities are
#given by a Python dictionary called dist and a sequence (list, tuple, string)
#of rolls called data, complete the function likelihood
#Note that an element of a dictionary can be retrieved by dist[key] where
#key is one of the dictionary's keys (e.g. 'A', 'Good').
def likelihood(dist,data):
    likelihood = 1
    for i in range(len(data)):
        likelihood = likelihood * dist[data[i]]
    return likelihood
tests= [(({'A':0.2,'B':0.2,'C':0.2,'D':0.2,'E':0.2},'ABCEDDECAB'), 1.024e-07),
        (({'Good':0.6,'Bad':0.2,'Indifferent':0.2},['Good','Bad','Indifferent','Good','Good','Bad']), 0.001728),
        (({'Z':0.6,'X':0.333,'Y':0.067},'ZXYYZXYXYZY'), 1.07686302456e-08),
        (({'Z':0.6,'X':0.233,'Y':0.067,'W':0.1},'WXYZYZZZZW'), 8.133206112e-07)]
for t,l in tests:
    if abs(likelihood(*t)/l-1)<0.01: print 'Correct'
    else: print 'Incorrect'
Unit 17: Outliers
Quartiles
Unit 18: Binomial Distribution
To calculate # of outcomes:
10 coins, 5 heads
Numerator: multiply down from the number of coins (10 * 9 * 8 * 7 * 6); since there are 10 coins and we only want 5, we keep multiplying until we have the 5 factors we need
Denominator: the number of ways to arrange the 5 desired heads (5 * 4 * 3 * 2 * 1); start at 5 and subtract 1 each time (this compensates for how many times we 'overcount' when multiplying the top)
(10 * 9 * 8 * 7 * 6) = 30,240
(5 * 4 * 3 * 2 * 1) = 120
30,240 / 120 = 252
The (n-k)! cancels out the rest of the numerator’s factorial that would have continued down to 1
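A quick sketch of that count in code:
from math import factorial
def choose(n, k):
    # number of ways to place k heads among n coins: n! / ((n - k)! * k!)
    return factorial(n) // (factorial(n - k) * factorial(k))
print(choose(10, 5))   # 252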
Probability | Outcome | # of arrangements
0.032 | 1 HEADS | 3
0.128 | 2 HEADS | 3
0.512 | 3 HEADS | 1
0.008 | 3 TAILS | 1
0.68 | TOTAL |
(3 * 0.032) = 0.096
n! / ((n – k)! * k!)
5! / ((5 – 4)! * 4!) = 5
THHHH: 0.2 * (0.8 ^ 4) = 0.08192; times the 5 possibilities (see above) = 0.4096
5! / ((5 – 3)! * 3!), or (5*4*3) / (3*2*1), = 10
TTHHH: (0.2 ^ 2) * (0.8 ^ 3) = 0.02048; times the 10 possibilities = 0.2048
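A sketch that reproduces those two results with the general binomial formula (p = 0.8 is the probability of heads):
from math import factorial
def binomial(n, k, p):
    # P(exactly k heads in n flips) = C(n, k) * p^k * (1 - p)^(n - k)
    c = factorial(n) // (factorial(n - k) * factorial(k))
    return c * p**k * (1 - p)**(n - k)
print(binomial(5, 4, 0.8))   # 0.4096
print(binomial(5, 3, 0.8))   # 0.2048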
Unit 19A: Central Limit Theorem