Below are my notes from when I originally took the Udacity course Introduction to Statistics. At that time, I only took notes in Microsoft Word, so there was a real lack of gifs. I’d go through and add some, but this is going to be a HUGE post and WordPress is really struggling to function. Honestly, I’m only adding this post for my own reference and to save my notes, so if I were you I’d go watch some Schitt’s Creek instead or take the class yourself! 🙂

Unit 1: Introduction

Type A: 80 Friends

Type B: 20 Friends

Expected / Average # of Friends

50% chance of being either type, so: (1/2) * 80 + (1/2) * 20 = 50

Chance of linking to Type A is 0.8 and chance of linking to Type B is 0.2

In Expectation, how many friends does your friend have?

(0.8 * 80) + (0.2 * 20) = 68
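The two expectations above can be checked with a quick sketch (the weights and friend counts are the ones from the notes; the function name is mine):

```python
# Expected value: sum of (probability * friend count) over the two types.
def expected_friends(weights_and_counts):
    return sum(w * c for w, c in weights_and_counts)

# Picking a person uniformly: 50% Type A (80 friends), 50% Type B (20 friends).
print(expected_friends([(0.5, 80), (0.5, 20)]))  # 50.0
# Following a friendship link: 0.8 chance it leads to Type A, 0.2 to Type B.
print(expected_friends([(0.8, 80), (0.2, 20)]))  # 68.0
```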

Unit 2: Scatterplots

Most important thing a statistician does is look at data

Previous House Sales in Neighborhood:

Size (ft²) | Cost ($)
1400 | 112000
2400 | 192000
1800 | 144000
1900 | 152000
1300 | 104000
1100 | 88000

How much would you pay for 2100 ft²?

Get the mean of the 1800 ft² and 2400 ft² sales (those sizes average to 2100 ft²): (144,000 + 192,000) / 2 = 168,000

$80/ft²: found that the price per square foot was constant

Scatterplots: good way to eyeball relationships between variables

Does not have to have a fixed price/ft2 in order to be linear

Price = 30 * size + $2000
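Both pricing models can be sketched in a couple of lines (the function names are mine; the numbers come from the notes above):

```python
# Constant price per square foot, as found in the sales table ($80/ft²).
def price_constant_rate(size):
    return 80 * size

# Linear but NOT a fixed price/ft²: there is a $2000 offset.
def price_linear(size):
    return 30 * size + 2000

print(price_constant_rate(2100))  # 168000
print(price_linear(2100))         # 65000
```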

Unit 3: Bar Charts

Noise

Bar Charts 2D

Histograms 1D (Y axis becomes ‘count’ of that data)

Histogram is a count of each “bucket” (grouping the data)
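Bucketing can be sketched with a `Counter` (the bucket width of 5 and the sample data are made up for illustration):

```python
from collections import Counter

def histogram(data, width):
    # Map each value to the lower edge of its bucket, then count per bucket.
    return Counter((x // width) * width for x in data)

# Values 1, 2, 2 fall in bucket 0; values 5, 7, 8, 8, 9 fall in bucket 5.
print(histogram([1, 2, 2, 5, 7, 8, 8, 9], 5))  # Counter({5: 5, 0: 3})
```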

Unit 4: Pie Charts

Used to visualize relative (share-of-whole) data

Wonderful for comparing

Unit 5: Programming Charts

barchart(Height,Weight)

(height is x-axis, weight is y-axis)

Unit 7: Admissions Case Study

Did the admissions policy have gender discrimination? (UC Berkeley)

Gender bias because the numbers are different for each gender

Who is being favored?

Males have a higher admission rate overall

Majors individually show that females have a higher admission rate (Simpson’s paradox: the trend reverses when the data is grouped by major)

Statistics is: deep and often manipulated (be skeptical)

Unit 8: Probability

Probability is the inverse of statistics: probability reasons from known causes to data, statistics reasons from data back to causes

P(A) = 1 - P(¬A)

Probability of A equals 1 minus the Probability of Not-A

P(H,H) where P(H) = 0.5

P(H,H) = 0.5 * 0.5 = 0.25

Truth Table:

P(DOUBLES) when throwing a pair of fair dice

6 sides on each die, so 6 * 6 = 36 possible combinations; only 6 of them are doubles, so 6/36 = 1/6 ≈ 0.167
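The 6/36 count can be verified by brute-force enumeration (a sketch; variable names are mine):

```python
from fractions import Fraction

# Enumerate all 36 ordered rolls of two fair dice and count the doubles.
rolls = [(a, b) for a in range(1, 7) for b in range(1, 7)]
doubles = [r for r in rolls if r[0] == r[1]]
p_double = Fraction(len(doubles), len(rolls))
print(p_double)  # 1/6
```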

Summary

-Probability of an event: P(A)

-Probability of the opposite event: 1 - P(A)

-Probability of a composite event (with independence): P * P * … * P

Unit 9: Conditional Probability

Dependent things

Like two coin flips that depend on each other: the first is whether someone is smart or dumb, the second is whether they become a professor at Stanford

P(POSITIVE | CANCER)

P(NEGATIVE | CANCER)

This is conditional probability: the probability of the thing on the left, GIVEN that we assume the thing on the right is actually the case (the bar in the middle divides the two sides)

P(c)

P(P|c)

P(P|¬c) (¬ is the symbol for “not”)

P(P) = P(P|c) * P(c) + P(P|¬c) * P(¬c) (Total Probability)

Summary

P(TEST | DISEASE)

P(TEST) = [P(TEST | DISEASE) * P(DISEASE)] + [P(TEST | ¬DISEASE) * P(¬DISEASE)]

Unit 10: Bayes Rule
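The unit’s formula, P(A|B) = P(B|A) * P(A) / P(B), can be sketched as a function (the denominator uses total probability, the same structure the Unit 11 exercises implement; parameter names are mine):

```python
def bayes(p_a, p_b_given_a, p_b_given_not_a):
    # P(B) by total probability, then P(A|B) = P(B|A) * P(A) / P(B).
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Cancer test example: P(C)=0.1, P(Pos|C)=0.9, P(Pos|¬C)=0.2.
print(bayes(0.1, 0.9, 0.2))  # ≈ 0.333
```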

Unit 11: Programming Probabilities

Code for 3 coin flips containing exactly 1 heads: it uses 1 - P to get the probability of a ‘tails’, plugs the probability of ‘heads’ into the function itself, and multiplies by 3 because we KNOW that exactly 3 of the 8 sequences have exactly 1 heads
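That code isn’t in my notes, but a sketch matching the description would be:

```python
# Probability of exactly one heads in three flips of a coin with P(heads) = p.
# Three sequences (HTT, THT, TTH) each have probability p * (1-p)**2.
def one_head_of_three(p):
    return 3 * p * (1 - p) ** 2

print(one_head_of_three(0.5))  # 0.375
```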

EXAMPLE

#Return the probability of flipping one head each from two coins
#One coin has a probability of heads of p1 and the other of p2
def f(p1,p2):
    return p1 * p2

print(f(0.5,0.8))

ABOVE EXAMPLE WRITTEN IN PYTHON

#Two coins have probabilities of heads of p1 and p2
#The probability of selecting the first coin is p0
#Return the probability of a flip landing on heads
def f(p0,p1,p2):
    return (p0 * p1) + ((1-p0) * p2)

print(f(0.3,0.5,0.9))

ANOTHER EXAMPLE

#Calculate the probability of a positive result given that
#p0=P(C)
#p1=P(Positive|C)
#p2=P(Negative|Not C)
def f(p0,p1,p2):
    return (p0 * p1) + (1-p0) * (1-p2)

print(f(0.1,0.9,0.8))

ALL OF BAYES RULE

#Return the probability of A conditioned on B given that
#P(A)=p0, P(B|A)=p1, and P(Not B|Not A)=p2
def f(p0,p1,p2):
    return (p0 * p1) / ((p0 * p1) + (1-p0) * (1-p2))

print(f(0.1,0.9,0.8))

BAYES RULE 2

#Return the probability of A conditioned on Not B given that
#P(A)=p0, P(B|A)=p1, and P(Not B|Not A)=p2
def f(p0,p1,p2):
    return (p0 * (1-p1)) / ((p0 * (1-p1)) + ((1-p0) * p2))

print(f(0.1,0.9,0.8))

Unit 11A: Probability Distributions

Continuous probability distributions: every single outcome has probability 0

Density Probability for Continuous Spaces

f(x) is density of x, p(x) is probability (two different things)

Density is whatever you multiply the width of the interval by in order to make the total probability equal 1

So between 3 and 3.5 the width is 0.5; multiply by 2 to get 1, so density = 2 (density can be larger than 1)

Density can be 0; it must be non-negative but does not have to be positive

Unit 12: Correlation vs Causation

Should you stay at home? No

Hospitals do not cause sick people to die. The correlation data makes it seem like you’re more likely to die there, but that does not mean being there increases your chance.

“Chances of dying in a hospital are 40 times larger than at home” is a correlation

“Being in a hospital increases your probability of dying by a factor of 40” is a causal statement

If a confounding variable is omitted, misleading correlations appear (when the “sick” variable is taken into account, there is a negative correlation, but when it is omitted the correlation is positive)

Reverse Causation: size of fire causes # of fire fighters, OR # of fire fighters causes size of fire

Problem Set 2:

Top: 4 / 16 = .25 (4 instances of 1 H, 16 possible combos)

Bottom: 1 / 16 (only 1 instance of H as first)

Use Bayes Rule below:

Unit 13: Estimation

The maximum likelihood estimate is the point where the data’s probability is greatest (it starts going back down when p > 2/3, so 2/3 is the maximum likelihood estimator)

Laplacian estimator: add one fake data point for each possible outcome
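A sketch of both estimators for coin flips (function names are mine; 2 heads in 3 flips gives the p = 2/3 from above):

```python
def max_likelihood(heads, flips):
    # The ML estimate of P(heads) is just the empirical frequency.
    return heads / flips

def laplace(heads, flips, outcomes=2):
    # Add one fake observation per possible outcome (here: heads, tails).
    return (heads + 1) / (flips + outcomes)

print(max_likelihood(2, 3))  # 0.666...
print(laplace(2, 3))         # 0.6
```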

Unit 14: Averages

Mean = sum / count

Mode = most repeated

Median = middle number (if 2, take the mean)

Unit 15: Variance and Standard Deviation

Variance is the sum of (Xi - mean)², normalized (multiplied by 1/n, where n is the count)

It is the measure of how far the data is spread from the mean

Variance = Average quadratic deviation from the mean

Standard deviation = square root of variance

σ is sigma; σ² is the variance; the square root of σ², which is just σ, is the standard deviation

N is count of all numbers

Sum of Xi is the sum of all the numbers

Sum of Xi^2 is the sum of each number squared and then added together

Standard Score = (x - μ) / σ

x = score given

μ = mean

σ = std dev
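The formula above in code (the IQ-style numbers in the example are made up for illustration):

```python
def standard_score(x, mu, sigma):
    # How many standard deviations x lies from the mean.
    return (x - mu) / sigma

# A score of 130 with mean 100 and std dev 15 is 2 standard deviations up.
print(standard_score(130, 100, 15))  # 2.0
```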

Unit 17: Programming Estimators

MEAN:

#Complete the mean function to make it return the mean of a list of numbers
data1=[49., 66, 24, 98, 37, 64, 98, 27, 56, 93, 68, 78, 22, 25, 11]
def mean(data):
    return sum(data) / len(data)

print(mean(data1))

MEDIAN:

#Complete the median function to make it return the median of a list of numbers
data1=[1,2,5,10,-20]
def median(data):
    sdata = sorted(data)
    n = len(sdata)
    if n % 2 == 1:
        return sdata[n // 2]
    #Even count: take the mean of the two middle values
    return (sdata[n // 2 - 1] + sdata[n // 2]) / 2

print(median(data1))

MODE:

#Complete the mode function to make it return the mode of a list of numbers
data1=[1,2,5,10,-20,5,5]
def mode(data):
    mcount = 0
    for x in data:
        icount = data.count(x)
        if icount >= mcount:
            mode = x
            mcount = icount
    return mode

print(mode(data1))

VARIANCE:

#Complete the variance function to make it return the variance of a list of numbers
data3=[13.04, 1.32, 22.65, 17.44, 29.54, 23.22, 17.65, 10.12, 26.73, 16.43]
def mean(data):
    return sum(data)/len(data)

def variance(data):
    total = 0
    mu = mean(data)
    for x in data:
        total = total + (x - mu)**2
    return total/len(data)

print(variance(data3))

Another way to calculate the variance is with a list comprehension, as in the stddev code below

STANDARD DEVIATION

#Complete the stddev function to make it return the standard deviation
#of a list of numbers
from math import sqrt
data3=[13.04, 1.32, 22.65, 17.44, 29.54, 23.22, 17.65, 10.12, 26.73, 16.43]
def mean(data):
    return sum(data)/len(data)

def variance(data):
    mu=mean(data)
    return mean([(x-mu)**2 for x in data])

def stddev(data):
    return sqrt(variance(data))

print(stddev(data3))

Problem Set 3: Estimators

When scaling the data by a ratio, the mean and std dev scale by the same ratio, but the variance scales by the square of that ratio (variance is the square of the std dev). The standard score stays the same.

As a measure of spread, it’s clear that the adult data is more spread out, so adults must have the bigger std dev
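These scaling facts can be checked numerically (the data and the scale factor of 3 are made up for illustration):

```python
def mean(data):
    return sum(data) / len(data)

def variance(data):
    mu = mean(data)
    return mean([(x - mu)**2 for x in data])

data = [1., 2., 4., 5.]
scaled = [3 * x for x in data]

print(mean(scaled) / mean(data))          # 3.0 (same ratio)
print(variance(scaled) / variance(data))  # 9.0 (ratio squared)
```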

#In class you wrote a function mean that computed the mean of a set of numbers
#Consider a case where you have already computed the mean of a set of data and
#get a single additional number. Given the number of observations in the
#existing data, the old mean and the new value, complete the function to return
#the correct mean
from __future__ import division
def mean(oldmean,n,x):
    return ((oldmean * n) + x) / (n + 1)

currentmean=10
currentcount=5
new=4
print(mean(currentmean,currentcount,new)) #Should print 9

LIKELIHOOD:

#Compute the likelihood of observing a sequence of die rolls
#Likelihood is the probability of getting the specific set of rolls
#in the given order
#Given a multi-sided die whose labels and probabilities are
#given by a Python dictionary called dist and a sequence (list, tuple, string)
#of rolls called data, complete the function likelihood
#Note that an element of a dictionary can be retrieved by dist[key] where
#key is one of the dictionary's keys (e.g. 'A', 'Good').
def likelihood(dist,data):
    likelihood = 1
    for i in range(len(data)):
        likelihood = likelihood * dist[data[i]]
    return likelihood

tests = [(({'A':0.2,'B':0.2,'C':0.2,'D':0.2,'E':0.2},'ABCEDDECAB'), 1.024e-07),
         (({'Good':0.6,'Bad':0.2,'Indifferent':0.2},['Good','Bad','Indifferent','Good','Good','Bad']), 0.001728),
         (({'Z':0.6,'X':0.333,'Y':0.067},'ZXYYZXYXYZY'), 1.07686302456e-08),
         (({'Z':0.6,'X':0.233,'Y':0.067,'W':0.1},'WXYZYZZZZW'), 8.133206112e-07)]

for t,l in tests:
    if abs(likelihood(*t)/l-1) < 0.01: print('Correct')
    else: print('Incorrect')

Unit 17: Outliers

Quartiles
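Quartiles split the sorted data into four parts and are used to trim outliers. A minimal sketch (this simple index-based quartile convention is my assumption; real libraries interpolate differently):

```python
def quartiles(data):
    # Lower quartile, median, upper quartile by simple index positions.
    s = sorted(data)
    n = len(s)
    return s[n // 4], s[n // 2], s[(3 * n) // 4]

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]  # 1000 is an outlier
q1, med, q3 = quartiles(data)
print(q1, med, q3)  # 3 6 9 — the outlier does not move the quartiles
```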

Unit 18: Binomial Distribution

To calculate # of outcomes:

10 coins, 5 heads

Numerator: multiply counting down from the number of coins (10 * 9 * 8 * 7 * 6); since there are 10 coins and we are choosing 5, we multiply until we have 5 factors

Denominator: the number of ways to order the chosen outcomes (5 * 4 * 3 * 2 * 1); since we want 5 heads, we start at 5 and subtract 1 each time (this compensates for the amount of overcounting in the numerator)

(10 * 9 * 8 * 7 * 6) = 30,240

(5 * 4 * 3 * 2 * 1) = 120

30,240 / 120 = 252

The (n-k)! cancels out the rest of the numerator’s factorial that would have continued down to 1

For 3 flips with P(HEADS) = 0.8, each line is the probability of one specific sequence, with the number of such sequences in parentheses:

.032 → 1 HEADS (3 sequences)

.128 → 2 HEADS (3 sequences)

.512 → 3 HEADS (1 sequence)

.008 → 3 TAILS (1 sequence)

.68 total of the four single-sequence probabilities

(3 * .032) = 0.096 = P(exactly 1 HEADS)

n! / ((n - k)! * k!)

5! / ((5 - 4)! * 4!) = 5

THHHH: 0.2 * (0.8 ^ 4) = .08192; times 5 possible orderings (see above) = .4096

5! / ((5 - 3)! * 3!), i.e. (5 * 4 * 3) / (3 * 2 * 1) = 10

TTHHH: (.2 ^ 2) * (.8 ^ 3) = .02048; times 10 possible orderings = .2048
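The counting and the sequence probabilities combine into the binomial formula, P(k heads in n flips) = [n! / ((n-k)! * k!)] * p^k * (1-p)^(n-k). A sketch checking the numbers above:

```python
from math import factorial

def n_choose_k(n, k):
    # n! / ((n - k)! * k!)
    return factorial(n) // (factorial(n - k) * factorial(k))

def binomial(n, k, p):
    # Probability of exactly k heads in n flips with P(heads) = p.
    return n_choose_k(n, k) * p**k * (1 - p)**(n - k)

print(n_choose_k(10, 5))    # 252
print(binomial(5, 4, 0.8))  # ≈ .4096 (the THHHH example)
print(binomial(5, 3, 0.8))  # ≈ .2048 (the TTHHH example)
```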

Unit 19A: Central Limit Theorem