Python, Udacity Class

Udacity Classes 4: Intro to Computer Science

Below are my notes from when I originally took the Udacity course Introduction to Computer Science. At that time, I only took notes in Microsoft Word, so there was a real lack of gifs. I’d go through and add some, but this is going to be a HUGE post and WordPress is really struggling to function. Honestly, I’m only adding this post for my own reference and to save my notes, so if I were you I’d go watch some Schitt’s Creek instead or take the class yourself! 🙂

Lists and Strings

How to define a procedure using lists:

days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
def how_many_days(month):
    return days_in_month[month - 1]
print how_many_days(9)

To index a nested list, use two indexes in a row:
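For example (a small sketch of my own, with made-up names):

beatles = [['John', 1940], ['Paul', 1942], ['George', 1943], ['Ringo', 1940]]
print(beatles[3][0])   # first index picks element 3, the second picks inside it -> 'Ringo'
print(beatles[3][1])   # 1940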

Mutation changes the value of a list after it is created. Strings look like the value is being changed, but instead a new string is created; the program then refers to the new string and stops referring to the old one.

Lists are different than strings, because you cannot mutate a string, but any other variable that shares the same list will see changes to the original list

q refers to the same object as p, so the list was altered (aliasing is when two names refer to the same object)



Numbers are not mutable: without a “return” statement in inc(), all it does is point “n” at a new number and leaves “a” pointing at the original (a list is an object, so a variable affects the object itself rather than creating a new assignment)
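A small sketch of the aliasing behavior described above (variable names are my own):

p = [1, 2, 3]
q = p            # q is an alias: both names refer to the same list object
q[0] = 99
print(p)         # [99, 2, 3] -- mutating through q altered p's list too

n = 5
m = n            # numbers are immutable: rebinding m leaves n untouched
m = m + 1
print(n)         # 5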

Append: mutates a list to add another element on the end

+ is like concatenation for strings: it creates a new list combining two others

len works on strings, lists, or any object that is a collection of things: it outputs the number of elements
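A quick sketch of all three operations (example values are my own):

p = [1, 2]
q = [3, 4]
p.append(5)            # mutates p in place -> [1, 2, 5]
r = p + q              # creates a brand-new list -> [1, 2, 5, 3, 4]
print(len(r))          # 5
print(len('udacity'))  # 7 -- len works on strings too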

How Computers Store Data:

  1. Need something to be in a “state”
  2. Need something to read that “state”

Light bulb has 2 states: on and off (can store 1 bit of data)

Need something to read that state (like an eye)

Data stored in the processor (a register is the fastest memory, built right into the processor) is like that; cheaper but slower ways of storing data are also available:

Bucket that’s full and a bucket that’s empty

Can weigh it or look at it, but there can be leakage or evaporation. Eventually the water will evaporate and you will not be able to tell if it is a 0 or a 1, so you need a threshold to distinguish between 0 and 1. When doing this with electrons, you do it with a capacitor (DRAM)

Python: exponentiation using ** (instead of ^)

Gigabyte: 2**30 bytes


One Nanodollar – need 1 billion of these to equal $1


DRAM cost per byte figured out by using $10 for 2 GB
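Working that out (my own arithmetic, from the $10-for-2GB figure above):

price = 10.0            # dollars for 2 GB of DRAM
nbytes = 2 * 2**30      # 2 gigabytes in bytes
print(price / nbytes)   # ~4.66e-09 dollars, i.e. about 4.7 nanodollars per byte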

Speed of light: 300,000 km/s


Executes until the counter is greater than the length of object p



Example of IF and FOR


Index method: does the same thing as find_element above, but raises an error instead of returning -1 if the element doesn’t exist

in and not in operators: True/False depending on whether the value is in the list or not


Takes two lists as inputs and appends only the elements that are not duplicates


Pop removes and returns the last element in a list
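A quick sketch of these list operations (example values are my own):

p = ['a', 'b', 'c']
print(p.index('b'))   # 1; raises an error if the element doesn't exist
print('c' in p)       # True
print('z' not in p)   # True
last = p.pop()        # removes and returns 'c'; p is now ['a', 'b']
print(last)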



Instead of printing each URL, just want to create a list (append)

Make an empty list with [] (a pair of empty square brackets)

Make an empty string with '' (two quotes with nothing between)

Need 2 new variables: pages left to crawl, and pages already crawled






Unit 4

Goal of an index: a mapping between a keyword and where that keyword is found (like a book index, which shows pages)

In a search engine, a keyword will map to a list of web pages where that keyword appears (a list of URLs)

Choice of data structure is important: the proper structure makes writing the rest of the code easier (and the wrong one can sometimes make it impossible)


Split: gives a list of the words in a string

Triple quote: lets a string span multiple lines
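A small example of both (my own):

quote = """Triple-quoted strings
can span multiple lines."""
print(quote.split())   # ['Triple-quoted', 'strings', 'can', 'span', 'multiple', 'lines.']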




get_page uses the Python library urllib

A try block is an exception handler (instead of a lot of ifs, it will return the error case under except)
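A minimal sketch of what get_page looks like, assuming the Python 2 urllib the class used:

import urllib

def get_page(url):
    try:
        return urllib.urlopen(url).read()   # fetch the page source
    except:
        return ''   # any error (bad url, network down) just yields an empty page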

Network: at least 3 entities, with at least one instance where two are not directly connected (networks are over 3000 years old)

Iliad network:

Ancient Greeks would light fires on islands to signal 600 km across the Greek islands

Need a way to convert the message you want to send into a signal, a way to route (to know whether or not to pass the message on), and a way to decide who gets to use the network

Measuring Networks: Latency and Bandwidth

Latency: the time it takes a message to get from source to destination

Bandwidth: the amount of data that can be sent through the channel at one time (the rate you can send information, in bits per second, e.g. Mbps)
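Putting the two together: delivery time is roughly the latency plus the transmission time. A sketch with made-up numbers:

latency = 0.05                 # seconds of one-way delay (made-up value)
bandwidth = 10 * 10**6         # 10 Mbps, in bits per second (made-up value)
message_bits = 3 * 8 * 10**6   # a 3-megabyte message, in bits

print(latency + message_bits / float(bandwidth))   # 0.05 + 2.4 = 2.45 seconds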


Every bit doubles the number of things you can distinguish: 4 bits = 2^4 = 16 possibilities

traceroute shows each hop on the way to a server (and the ms to reach each), showing how packets go from where you are to where you want to go




There is no guarantee that a packet gets where it needs to go; it can be dropped. The network just does its best to get the message across

HTTP Protocol

def split_string(source, splitlist):
    output = []
    atsplit = True
    for char in source:
        if char in splitlist:
            atsplit = True
        else:
            if atsplit:
                output.append(char)   # start a new word
                atsplit = False
            else:
                output[-1] = output[-1] + char   # output[-1] is the last string in the output list
    return output

Homework 6

# The current index includes a url in the list of urls
# for a keyword multiple times if the keyword appears
# on that page more than once.
# It might be better to only include the same url
# once in the url list for a keyword, even if it appears
# many times.
# Modify add_to_index so that a given url is only
# included once in the url list for a keyword,
# no matter how many times that keyword appears.

# index = crawl_web("http://www.udacity.com/cs101x/index.html")
# print lookup(index, "is") => ['http://www.udacity.com/cs101x/index.html']

def add_to_index(index, keyword, url):
    for entry in index:
        if entry[0] == keyword:
            if url not in entry[1]:
                entry[1].append(url)
            return
    # not found, add new keyword to index
    index.append([keyword, [url]])

def get_page(url):
    try:
        if url == "http://www.udacity.com/cs101x/index.html":
            return '<html> <body> This is a test page for learning to crawl! <p> It is a good idea to <a href="http://www.udacity.com/cs101x/crawling.html">learn to crawl</a> before you try to <a href="http://www.udacity.com/cs101x/walking.html">walk</a> or <a href="http://www.udacity.com/cs101x/flying.html">fly</a>. </p> </body> </html>'
        elif url == "http://www.udacity.com/cs101x/crawling.html":
            return '<html> <body> I have not learned to crawl yet, but I am quite good at <a href="http://www.udacity.com/cs101x/kicking.html">kicking</a>. </body> </html>'
        elif url == "http://www.udacity.com/cs101x/walking.html":
            return '<html> <body> I cant get enough <a href="http://www.udacity.com/cs101x/index.html">crawling</a>! </body> </html>'
        elif url == "http://www.udacity.com/cs101x/flying.html":
            return '<html> <body> The magic words are Squeamish Ossifrage! </body> </html>'
    except:
        return ""
    return ""

def union(a, b):
    for e in b:
        if e not in a:
            a.append(e)

def get_next_target(page):
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

def crawl_web(seed):
    tocrawl = [seed]
    crawled = []
    index = []
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page)
            add_page_to_index(index, page, content)
            union(tocrawl, get_all_links(content))
            crawled.append(page)
    return index

def add_page_to_index(index, url, content):
    words = content.split()
    for word in words:
        add_to_index(index, word, url)

def lookup(index, keyword):
    for entry in index:
        if entry[0] == keyword:
            return entry[1]
    return None

# 2 Gold Stars
# One way search engines rank pages
# is to count the number of times a
# searcher clicks on a returned link.
# This indicates that the person doing
# the query thought this was a useful
# link for the query, so it should be
# higher in the rankings next time.
# (In Unit 6, we will look at a different
# way of ranking pages that does not depend
# on user clicks.)
# Modify the index such that for each url in a
# list for a keyword, there is also a number
# that counts the number of times a user
# clicks on that link for this keyword.
# The result of lookup(index,keyword) should
# now be a list of url entries, where each url
# entry is a list of a url and a number
# indicating the number of times that url
# was clicked for this query keyword.
# You should define a new procedure to simulate
# user clicks for a given link:
# record_user_click(index,word,url)
# that modifies the entry in the index for
# the input word by increasing the count associated
# with the url by 1.
# You also will have to modify add_to_index
# in order to correctly create the new data
# structure.

# Here is an example showing a sequence of interactions:
# index = crawl_web('http://www.udacity.com/cs101x/index.html')
# print lookup(index, 'good') => [['http://www.udacity.com/cs101x/index.html', 0], ['http://www.udacity.com/cs101x/crawling.html', 0]]
# record_user_click(index, 'good', 'http://www.udacity.com/cs101x/crawling.html')
# print lookup(index, 'good') => [['http://www.udacity.com/cs101x/index.html', 0], ['http://www.udacity.com/cs101x/crawling.html', 1]]

def record_user_click(index, keyword, url):
    urls = lookup(index, keyword)   # retrieve the url entries for this keyword
    if urls:
        for entry in urls:
            if entry[0] == url:
                entry[1] = entry[1] + 1

def add_to_index(index, keyword, url):
    for entry in index:
        if entry[0] == keyword:
            entry[1].append([url, 0])
            return
    # not found, add new keyword to index
    # (each url entry is now [url, click_count])
    index.append([keyword, [[url, 0]]])

def get_page(url):
    try:
        if url == "http://www.udacity.com/cs101x/index.html":
            return '<html> <body> This is a test page for learning to crawl! <p> It is a good idea to <a href="http://www.udacity.com/cs101x/crawling.html">learn to crawl</a> before you try to <a href="http://www.udacity.com/cs101x/walking.html">walk</a> or <a href="http://www.udacity.com/cs101x/flying.html">fly</a>. </p> </body> </html>'
        elif url == "http://www.udacity.com/cs101x/crawling.html":
            return '<html> <body> I have not learned to crawl yet, but I am quite good at <a href="http://www.udacity.com/cs101x/kicking.html">kicking</a>. </body> </html>'
        elif url == "http://www.udacity.com/cs101x/walking.html":
            return '<html> <body> I cant get enough <a href="http://www.udacity.com/cs101x/index.html">crawling</a>! </body> </html>'
        elif url == "http://www.udacity.com/cs101x/flying.html":
            return '<html> <body> The magic words are Squeamish Ossifrage! </body> </html>'
    except:
        return ""
    return ""

def union(a, b):
    for e in b:
        if e not in a:
            a.append(e)

def get_next_target(page):
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

def crawl_web(seed):
    tocrawl = [seed]
    crawled = []
    index = []
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page)
            add_page_to_index(index, page, content)
            union(tocrawl, get_all_links(content))
            crawled.append(page)
    return index

def add_page_to_index(index, url, content):
    words = content.split()
    for word in words:
        add_to_index(index, word, url)

def lookup(index, keyword):
    for entry in index:
        if entry[0] == keyword:
            return entry[1]
    return None

def crawl_web(seed, max_depth):
    tocrawl = [[seed, 0]]   # add a depth counter: a list of [url, depth] pairs
    crawled = []
    while tocrawl:
        page, depth = tocrawl.pop()   # shortcut to assign 2 variables at one time
        if page not in crawled and depth <= max_depth:   # depth check
            for link in get_all_links(get_page(page)):
                tocrawl.append([link, depth + 1])
            crawled.append(page)
    return crawled

def check_sudoku(p):
    n = len(p)    # extract size of grid
    digit = 1     # start with 1
    while digit <= n:   # go through each digit
        i = 0
        while i < n:    # go through each row and column
            row_count = 0
            col_count = 0
            j = 0
            while j < n:   # for each entry in ith row/column
                if p[i][j] == digit:   # check row count
                    row_count = row_count + 1
                if p[j][i] == digit:   # check col count
                    col_count = col_count + 1
                j = j + 1
            if row_count != 1 or col_count != 1:
                return False
            i = i + 1   # next row/column
        digit = digit + 1   # next digit
    return True   # nothing was wrong!
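A quick usage sketch; the 4x4 grids are my own test cases, not from the course:

valid = [[1, 2, 3, 4],
         [2, 3, 4, 1],
         [3, 4, 1, 2],
         [4, 1, 2, 3]]

invalid = [[1, 2, 3, 4],
           [2, 3, 4, 1],
           [4, 3, 1, 2],
           [3, 4, 2, 1]]

print(check_sudoku(valid))    # True: every row and column has 1-4 exactly once
print(check_sudoku(invalid))  # False: column 1 contains 3 twice and no 1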

Unit 5

Algorithm Analysis

Algorithm: a well-defined sequence of steps that can be executed mechanically, guaranteed to always finish and produce the correct result

Cost: depends on the input (n); for some inputs one algorithm might be better, while for other inputs another algorithm is better

Need to be able to predict cost without trying every possible input (most commonly, the size of the input is what affects cost)

Costs: time (how long it takes to finish, or the cost of renting a computer and processor) and memory

Time is most important to look at (usually)

Focus on measuring how time scales with input size, instead of absolute time:

  1. Can predict how long it will take for a program to execute before running it
  2. We want to know how the time will change as computers get faster (Moore’s Law, amount of computer power for cost approximately doubles every 18 months)
  3. We want to understand fundamental properties of our algorithms, not things specific to a particular input or machine

Timing code:

import time

def time_execution(code):
    start = time.clock()            # the course used time.clock(); newer Pythons use time.process_time()
    result = eval(code)             # pass in any string and evaluate it as Python code
    run_time = time.clock() - start
    return result, run_time

def spin_loop(n):
    i = 0
    while i < n:
        i = i + 1

time_execution('spin_loop(100000)') will show result (index [0]) and run_time (index [1])

time_execution('spin_loop(100000)')[1] will show only the run_time of the result

Expected to take about 50 seconds to run spin_loop through 50 billion iterations

Running time is linear in the magnitude of n O(n)

SEARCHES LARGE INDEX AND APPENDS IF FOUND/NOT FOUND:

def add_to_index(index, keyword, url):
    for entry in index:
        if entry[0] == keyword:
            entry[1].append(url)
            return
    index.append([keyword, [url]])

def lookup(index, keyword):
    for entry in index:
        if entry[0] == keyword:
            return entry[1]
    return None

MAKES RANDOM LARGE INDEX:

def make_string(p):
    s = ''
    for e in p:
        s = s + e
    return s

def make_big_index(size):
    index = []
    letters = ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a']
    while len(index) < size:
        word = make_string(letters)
        add_to_index(index, word, 'fake')
        for i in range(len(letters) - 1, 0, -1):   # step through backwards, in steps of -1
            # the range goes from len(letters) - 1 down to 1
            if letters[i] < 'z':
                letters[i] = chr(ord(letters[i]) + 1)   # increase the letter by 1
                break
            else:
                letters[i] = 'a'
    return index

Can linearly search an index of 10,000,000 entries in about one second

Worst-case execution time is what you base the timing on

If you keep the index in sorted order, then lookups don’t have to be linear

Create a hash table to do lookups by making a list per alphabet letter, each containing the keywords that start with that letter; this would reduce run time by a factor of 26 (one bucket for each letter of the alphabet). Better still would be a scheme that breaks up the keywords more evenly

The hash table built into Python is the dictionary

A hash function outputs the number of the bucket that contains the keyword you are looking for

With b buckets, bucket numbers run from 0 to b - 1 (since indexing starts at 0)

ord(<one-letter string>) and chr(<number>) are inverses

Modulus Operator: % (takes a number and maps it to the range based on its remainder)

So 14 % 12 has a remainder of 2, which is where it maps (the example shown was a clock with 12 steps). Basically it is the first number divided by the second number, and the output of mod is the remainder. If the first number is smaller than the second number, the result is always just the first number (it does not continue into decimals like normal long division)
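A few quick checks of ord/chr and the modulus operator (examples are my own):

print(ord('a'))   # 97
print(chr(97))    # 'a' -- chr and ord are inverses
print(14 % 12)    # 2, like 14 o'clock on a 12-hour clock
print(5 % 12)     # 5 -- smaller than the divisor, so unchanged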




Making a better hash function:

2 inputs: keyword, # of buckets

Output: number (corresponding to bucket to add keyword to, between 0 and buckets-1)

Evenly distributed between all buckets

s = "abcd"
n = 0
for c in s:
    n = n + ord(c)   # sum the character codes
return n % b         # map the sum into one of the b buckets (0 to b - 1)



def hash_string(keyword, buckets):
    h = 0
    for c in keyword:
        h = (h + ord(c)) % buckets   # could take the mod once at the end, but this keeps h small
    return h

Time to lookup only depends on number of keywords per bucket (k/b):


[[[keyword, [url, ...]], ...], ...]   <- the whole table: a list of buckets
[[keyword, [url, ...]], ...]          <- one bucket: a list of keyword entries
[keyword, [url, ...]]                 <- one keyword entry with its url list

Creating an Empty Hash Table

For making an index, you just make an empty list

index = []

For a hash table you need to start with the full set of empty buckets (because you want to be able to do lookups right away)

def make_hashtable(nbuckets):
    i = 0
    table = []
    while i < nbuckets:
        table.append([])   # adds an empty list (bucket) to the table
        i = i + 1
    return table

A better way, using a for loop:

for unused in range(0, nbuckets):   # named 'unused' because the variable isn't used in the body; range just controls the repetition
    table.append([])

Finding the Right Bucket

return htable[hash_string(key, len(htable))]   # returns the bucket where the keyword belongs ("bucket" is just the name for the inner list that holds the keyword entries)

bucket = hashtable_get_bucket(htable, key)   # bucket is the list in the hash table
bucket.append([key, value])

# Define a procedure,
# hashtable_lookup(htable,key)
# that takes two inputs, a hashtable
# and a key (string),
# and outputs the value associated
# with that key.

def hashtable_lookup(htable, key):
    bucket = hashtable_get_bucket(htable, key)
    for entry in bucket:
        if entry[0] == key:
            return entry[1]
    return None

def hashtable_add(htable, key, value):
    bucket = hashtable_get_bucket(htable, key)
    bucket.append([key, value])

def hashtable_get_bucket(htable, keyword):
    return htable[hash_string(keyword, len(htable))]

def hash_string(keyword, buckets):
    out = 0
    for s in keyword:
        out = (out + ord(s)) % buckets
    return out

def make_hashtable(nbuckets):
    table = []
    for unused in range(0, nbuckets):
        table.append([])
    return table

def hashtable_update(htable, key, value):
    bucket = hashtable_get_bucket(htable, key)
    for entry in bucket:
        if entry[0] == key:
            entry[1] = value
            return
    bucket.append([key, value])

Dictionary vs String vs List

Can use "in" to see if a keyword is in a dictionary

elements = {'hydrogen': 1, 'helium': 2, 'carbon': 6}

'lithium' in elements   # evaluates to False

elements['nitrogen'] = 8   # adds nitrogen
elements['nitrogen'] = 7   # modifies it to 7 (dictionaries are mutable)

The values in a dictionary can even be another dictionary:

elements = {}
elements['H'] = {'name': 'Hydrogen', 'number': 1, 'weight': 1.00794}
print elements['H']['weight']

Modifying our Search Engine:

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

def crawl_web(seed):
    tocrawl = [seed]
    crawled = []
    index = {}   # changed from []
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page)
            add_page_to_index(index, page, content)
            union(tocrawl, get_all_links(content))
            crawled.append(page)
    return index

def add_page_to_index(index, url, content):
    words = content.split()
    for word in words:
        add_to_index(index, word, url)

def add_to_index(index, keyword, url):
    if keyword in index:
        index[keyword].append(url)
    else:
        index[keyword] = [url]

def lookup(index, keyword):
    if keyword in index:
        return index[keyword]
    return None

REMOVED OLD CODE from lookup:

for entry in index:
    if entry[0] == keyword:
        return entry[1]
return None

REMOVED OLD CODE from add_to_index:

for entry in index:
    if entry[0] == keyword:
        entry[1].append(url)
        return
# not found, add new keyword to index
index.append([keyword, [url]])

Dictionary Courses Structure:

# { <hexamester>: { <class>: { <property>: <value>, ... },
#                   ... },
#   ... }

EXAMPLE OF LOOPING THROUGH MULTIPLE LEVELS OF DICTIONARY

def involved(courses, person):
    output = {}
    for hexamester in courses:
        for course in courses[hexamester]:
            for key in courses[hexamester][course]:
                if courses[hexamester][course][key] == person:
                    if hexamester in output:
                        output[hexamester].append(course)
                    else:
                        output[hexamester] = [course]
    return output

bucket.append([key, hashtable_loop(htable, key, value, function)])

Refactoring: restructuring code so there is a single point of control over each behavior

Passing index/buckets in for loops:

def bucket_find(bucket, key):
    for entry in bucket:
        if entry[0] == key:
            return entry   # returns a reference to the entry itself, so the caller can mutate it
    return None

entry = bucket_find(bucket, key)
if entry:
    entry[1] = value

Memoization: store the results of computations to save computing time

def cached_execution(cache, code):
    if code not in cache:
        cache[code] = eval(code)   # only evaluate on the first request
    return cache[code]
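A usage sketch of the cache (the '2 ** 10' string is my own example):

cache = {}
print(cached_execution(cache, '2 ** 10'))   # evaluates the string the first time -> 1024
print(cached_execution(cache, '2 ** 10'))   # second call returns the stored 1024 without re-evaluating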

Recursive Definitions:


No such thing as the longest word—just something that has meaning that speakers of the language could understand

Recursive definition: has a “Base Case” (stopping case), and “Recursive Case”

  1. Base Case: a starting point, not defined in terms of itself

    1. Smallest input, already know the answer
  2. Recursive Case: defined in terms of “smaller” version of itself

Ancestor: a parent (base case)

Ancestor: a parent of an ancestor (recursive case)

factorial(n) = n * (n-1) * (n-2) * ... * 1

Base Case: factorial(0) = 1

Recursive Case: n * factorial(n-1)

def factorial(n):
    if n == 0:
        return 1   # base case
    else:
        return n * factorial(n - 1)

Palindromes:

The base case with numbers is usually something like 0 or 1

The base case with strings could be '': the empty string is a palindrome, so return True

Recursive case: test if the first and last characters in the string are equal

If they don't match, it's False

If they do match, check the rest of the string

def is_palindrome(s):
    if s == '':
        return True
    else:
        if s[0] == s[-1]:
            return is_palindrome(s[1:-1])
        else:
            return False

Recursive vs Iterative:

Recursive calls are expensive, but easier to understand/get right (in palindrome it makes a new string each time)

If care about performance, then better off finding an iterative approach to lessen cost

Liber Abaci: Leonardo da Pisa (Fibonacci, 1202), the “book of calculation”

Replaced Roman numerals with Arabic numerals using decimal notation

Fibonacci, previous two numbers added together to create next number

Two base cases:

fibonacci(0) = 0
fibonacci(1) = 1

Recursive case, for n > 1: fibonacci(n) = fibonacci(n-1) + fibonacci(n-2)

def fibonacci(n):
    if n < 1:
        return 0
    if n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

Recursive breakdown:

def recursive(n):
    if basecase:
        return basecaseanswer
    return recursive(n - 1)   # or whatever the recursive formula is

Instead of using recursive, use a while loop (anything you can define recursively, you can define iteratively)

def fibonacci(n):
    current = 0
    after = 1
    for i in range(0, n):
        current, after = after, current + after
    return current

# checking to see how many months it will take to have a greater mass of rabbits than the mass of the earth
mass_of_earth = 5.9722 * 10**24   # kilograms
mass_of_rabbit = 2                # 2 kilograms per rabbit

n = 1
while fibonacci(n) * mass_of_rabbit < mass_of_earth:
    n = n + 1
print n, fibonacci(n)

It would take about 10 years for the rabbits to take over the earth


We learned to make a crawler, then learned how to index those URLs by keyword so the crawler doesn’t have to run on every search; now we need to rank the URLs to get them in the right order

Close, but need to add in friends popularity score

That is not a recursive definition (above), but a circular definition (no base case!)

Never going to get an answer (will never know how popular Charlie is)

Can we give a base case? Need to invent one

popularity('Alice') = 1 (the invented base case)

popularity(p) = SUM of popularity(f) over every f in friends(p)

def popularity(p):
    if p == 'Alice':
        return 1
    score = 0
    for f in friends(p):
        score = score + popularity(f)
    return score

This (above) still will not work, because it will still bounce back and forth and make a circular reference if two people are linked to each other

Relaxation Algorithm:

start with a guess
while not done:
    make the guess better

popularity(t, p) = the popularity score of person p at time step t

popularity(0, p) = 1 (base case)

popularity(t, p) = SUM of popularity(t-1, f) over every f in friends(p), for t > 0

def popularity(t, p):
    if t == 0:
        return 1
    else:
        score = 0
        for f in friends(p):
            score = score + popularity(t - 1, f)
        return score

Is this a good recursive definition? Yes, because it will always return a result (base case will be achieved)

rank(0, url) = 1

rank(t, url) = SUM of rank(t-1, p) / (number of outlinks from p), over every page p in inlinks[url]

Sum up the rank of every page p that links to a page, dividing each by p’s number of outlinks (since a page that is just full of links is probably not as important)

d = damping constant, 0.8 (determines how frequently the random web surfer follows a link versus starting over at a random page); the typical value is less than 1

n = number of pages

Start by initializing every rank to 1/n; then for each time step, for each page that links to us, take its rank divided by its number of outlinks, multiply by the damping factor d, and add in the (1-d)/n contribution of a random surfer starting over

The PageRank algorithm described here is what launched Google; it was made by cofounder Larry Page (so the algorithm was called PageRank)

It was made because AltaVista used to rank pages based on the number of keywords on a page, so a page that said the keyword “restaurant” 32 times ranked higher than one that said “restaurant” twice, even if the site with the keyword twice had more traffic

Implementing URank (PageRank™ is the property of Google)

crawl_web(seed) now returns index, graph (where the graph records, for each url, the list of pages it links to)

def crawl_web(seed):
    tocrawl = [seed]
    crawled = []
    index = {}
    graph = {}
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page)
            add_page_to_index(index, page, content)
            outlinks = get_all_links(content)
            graph[page] = outlinks   # no need for [] around it; the output is already a list
            union(tocrawl, get_all_links(content))
            crawled.append(page)
    return index, graph

Computing the Page Ranks:

Use graph from crawl_web

compute_ranks(graph) returns a dictionary of {url: rank}

lookup_best: pass in the index, a keyword, and the dictionary of ranked pages; it returns the best page

d = 0.8 (damping constant)

rank(0, url) = 1/npages

newranks[url] = (1 - d)/npages + SUM of d * ranks[p] / (number of outlinks from p), over every page p that links to url

ranks = the ranks at time t-1; newranks = the ranks at time t

def compute_ranks(graph):
    d = 0.8   # damping factor
    numloops = 10
    ranks = {}
    npages = len(graph)
    for page in graph:
        ranks[page] = 1.0 / npages
    for i in range(0, numloops):
        newranks = {}
        for page in graph:
            newrank = (1 - d) / npages
            for node in graph:
                if page in graph[node]:
                    newrank = newrank + d * (ranks[node] / len(graph[node]))
            newranks[page] = newrank
        ranks = newranks
    return ranks

def hexes_to_udaciousness(n, spread, target):
    if n >= target:
        return 0
    else:
        return 1 + hexes_to_udaciousness(n + (n * spread), spread, target)

Example: deep_count([1, [2, 3], 4])

1 counts as one element; [2, 3] is a list, so add 1 for the list itself plus its own deep count, and keep running; 4 counts as one. Total: 5

def deep_count(p):
    sum = 0
    for e in p:
        if is_list(e):
            sum = sum + 1 + deep_count(e)   # count the nested list itself, plus everything inside it
        else:
            sum = sum + 1
    return sum
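deep_count relies on an is_list helper these notes don’t show; a plausible one-line definition plus a usage check:

def is_list(p):
    return isinstance(p, list)   # assumed helper: True only for actual lists

print(deep_count([1, [2, 3], 4]))   # 5: three top-level items, plus 1 for the inner list, plus its 2 elements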

HOMEWORK 6 PROBLEM 5:

# Feeling Lucky
# In Unit 6, we implemented a page ranking algorithm, but didn't finish the final
# step of using it to improve our search results. For this question, you will use
# the page rankings to produce the best output for a given query.
# Define a procedure, lucky_search, that takes as input an index, a ranks
# dictionary (the result of compute_ranks), and a keyword, and returns the one
# URL most likely to be the best site for that keyword. If the keyword does not
# appear in the index, lucky_search should return None.

def lucky_search(index, ranks, keyword):
    if keyword not in index:
        return None
    result = None   # initialized so there is always something to return
    toprank = 0
    for e in index[keyword]:
        if ranks[e] > toprank:
            toprank = ranks[e]
            result = e
    return result
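A usage sketch against the test cache below (the keywords are my own picks):

index, graph = crawl_web('http://udacity.com/cs101x/urank/index.html')
ranks = compute_ranks(graph)
print(lucky_search(index, ranks, 'Hummus'))        # the highest-ranked page mentioning 'Hummus'
print(lucky_search(index, ranks, 'babaganoush'))   # None -- keyword not in the index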

cache = {
   'http://udacity.com/cs101x/urank/index.html': """<html>
<body>
<h1>Dave's Cooking Algorithms</h1>
<p>
Here are my favorite recipies:
<ul>
<li> <a href="http://udacity.com/cs101x/urank/hummus.html">Hummus Recipe</a>
<li> <a href="http://udacity.com/cs101x/urank/arsenic.html">World's Best Hummus</a>
<li> <a href="http://udacity.com/cs101x/urank/kathleen.html">Kathleen's Hummus Recipe</a>
</ul>
For more expert opinions, check out the
<a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>
and <a href="http://udacity.com/cs101x/urank/zinc.html">Zinc Chef</a>.
</body>
</html>
""",
   'http://udacity.com/cs101x/urank/zinc.html': """<html>
<body>
<h1>The Zinc Chef</h1>
<p>
I learned everything I know from
<a href="http://udacity.com/cs101x/urank/nickel.html">the Nickel Chef</a>.
</p>
<p>
For great hummus, try
<a href="http://udacity.com/cs101x/urank/arsenic.html">this recipe</a>.
</body>
</html>
""",
   'http://udacity.com/cs101x/urank/nickel.html': """<html>
<body>
<h1>The Nickel Chef</h1>
<p>
This is the
<a href="http://udacity.com/cs101x/urank/kathleen.html">
best Hummus recipe!
</a>
</body>
</html>
""",
   'http://udacity.com/cs101x/urank/kathleen.html': """<html>
<body>
<h1>
Kathleen's Hummus Recipe
</h1>
<p>
<ol>
<li> Open a can of garbonzo beans.
<li> Crush them in a blender.
<li> Add 3 tablesppons of tahini sauce.
<li> Squeeze in one lemon.
<li> Add salt, pepper, and buttercream frosting to taste.
</ol>
</body>
</html>
""",
   'http://udacity.com/cs101x/urank/arsenic.html': """<html>
<body>
<h1>
The Arsenic Chef's World Famous Hummus Recipe
</h1>
<p>
<ol>
<li> Kidnap the <a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>.
<li> Force her to make hummus for you.
</ol>
</body>
</html>
""",
   'http://udacity.com/cs101x/urank/hummus.html': """<html>
<body>
<h1>
Hummus Recipe
</h1>
<p>
<ol>
<li> Go to the store and buy a container of hummus.
<li> Open it.
</ol>
</body>
</html>
""",
}

def get_page(url):
    if url in cache:
        return cache[url]
    return ""

def get_next_target(page):
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

def union(a, b):
    for e in b:
        if e not in a:
            a.append(e)

def add_page_to_index(index, url, content):
    words = content.split()
    for word in words:
        add_to_index(index, word, url)

def add_to_index(index, keyword, url):
    if keyword in index:
        index[keyword].append(url)
    else:
        index[keyword] = [url]

def lookup(index, keyword):
    if keyword in index:
        return index[keyword]
    else:
        return None

def crawl_web(seed):   # returns index, graph of outlinks
    tocrawl = [seed]
    crawled = []
    graph = {}   # <url>: [list of pages it links to]
    index = {}
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page)
            add_page_to_index(index, page, content)
            outlinks = get_all_links(content)
            graph[page] = outlinks
            union(tocrawl, outlinks)
            crawled.append(page)
    return index, graph

def compute_ranks(graph):
    d = 0.8   # damping factor
    numloops = 10
    ranks = {}
    npages = len(graph)
    for page in graph:
        ranks[page] = 1.0 / npages
    for i in range(0, numloops):
        newranks = {}
        for page in graph:
            newrank = (1 - d) / npages
            for node in graph:
                if page in graph[node]:
                    newrank = newrank + d * (ranks[node] / len(graph[node]))
            newranks[page] = newrank
        ranks = newranks
    return ranks

def ancestors(genealogy, person):
    ancestorlist = []
    for e in genealogy:
        if e == person or e in ancestorlist:
            for p in genealogy[e]:
                if p not in ancestorlist:
                    ancestorlist.append(p)
    return ancestorlist

Unit 7

Themes:

Procedural abstraction: write one procedure and use it many times (hide the details; you don’t need to know the exact code, just what the input/output is)

Universality: can make any program with the three things above

Recursive Definitions: need something to cause recursion, and a base case

Breakdown of themes in each Units:


What is Computer Science?

Engineering:

Use abstraction to hide details and overcome the limits of human minds (how much you can keep in your mind at one time)

Similar to engineering in that we make things while working under the constraints of human minds (not the laws of physics)

Science:

Biology in nature is really all about computation (DNA)

Liberal Art:

The Seven Liberal Arts:

3 about language (the trivium: grammar, logic, rhetoric) and 4 about number (the quadrivium: arithmetic, geometry, music, astronomy)

