Below are my notes from when I originally took the Udacity course Introduction to Computer Science. At that time, I only took notes in Microsoft Word, so there is a real lack of gifs. I'd go through and add some, but this is going to be a HUGE post and WordPress is really struggling to function. Honestly, I'm only adding this post for my own reference and to save my notes, so if I were you I'd go watch some Schitt's Creek instead, or take the class yourself! 🙂

Lists and Strings

How to define a procedure using lists:
days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
def how_many_days(month):
    return days_in_month[month - 1]

print how_many_days(9)

To index a nested list, use two indexes in a row:
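For example, a quick sketch (values assumed):

grid = [[1, 2, 3], [4, 5, 6]]
print grid[1][2]   # 6 -- the first index selects the inner list, the second selects within it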

Mutation can change the value of a list after it is created. Strings look like their value is being changed, but instead a new string is created; the program then refers to the new one and stops referring to the old one.

Lists are different from strings, because you cannot mutate a string, while any other variable that shares the same list will see changes to the original list.
q refers to the same object as p, so the list was altered (aliasing is when two names refer to the same object).
Numbers are not mutable: without a "return" statement in inc(), all it does is point "n" at a new number and leaves "a" pointing at the original. (A list is an object, so mutating it through one variable affects the object itself and does not create a new assignment.)
append: mutates a list to add another element on the end
+ is like concatenation for strings: it creates a new list combining two others
len works on strings, lists, or any object that is a collection of things: it outputs the number of elements
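A minimal sketch of aliasing and the operations above (values assumed):

p = [1, 2, 3]
q = p           # q is an alias: both names refer to the same list object
q.append(4)     # mutates the shared list
print p         # [1, 2, 3, 4] -- p sees the change
print p + [5]   # [1, 2, 3, 4, 5] -- + builds a brand new list
print len(p)    # 4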
How Computers Store Data:
- Need something to be in a "state"
- Need something to read that "state"
A light bulb has 2 states, on and off, so it can store 1 bit of data.
You also need something to read that state (like an eye).
Data stored in the processor works like that (a register is the fastest memory, built right into the processor); since that is expensive, cheaper ways of storing data are available:
A bucket that's full and a bucket that's empty.
You can weigh it or look at it, but there can be leakage or evaporation. Eventually the water will evaporate and you will not be able to tell if it's a 0 or a 1, so you need a threshold to distinguish between 0 and 1. When you do this with electrons, you use a capacitor (DRAM).
Python: exponentiation uses ** (instead of ^)
Gigabyte: 2**30 bytes
One nanodollar: you need 1 billion of these to equal $1
DRAM cost per bit can be figured out from roughly $10 for 2 GB
Speed of light: 300,000 km/s
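A rough sketch of that cost arithmetic (the $10-for-2-GB figure above is the assumption):

dollars = 10.0
nbytes = 2 * 2**30                 # 2 GB
cost_per_byte = dollars / nbytes
print cost_per_byte * 10**9        # ~4.7 nanodollars per byte
print cost_per_byte / 8 * 10**9    # ~0.6 nanodollars per bit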
A while loop like this executes until the index reaches the length of object p.
Example of IF and FOR.
The index method does the same thing as find_element above, but raises an error instead of returning -1 if the element doesn't exist.
The in and not in operators: True/False depending on whether the element is / is not in the list.
A union procedure takes two lists as inputs and appends only those elements that are not duplicates.
pop removes and returns the last element in a list (see the sketch below).
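A quick sketch of these list operations (values assumed):

p = ['a', 'b', 'c']
print p.index('b')   # 1; raises an error if the element is missing
print 'a' in p       # True
print 'z' not in p   # True
print p.pop()        # 'c' -- removed and returned
print p              # ['a', 'b']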
Instead of printing each URL, we just want to build a list (using append).
Make an empty list with [] (two empty square brackets).
Make an empty string with '' (two quotes).
The crawler needs 2 new variables: pages left to crawl, and pages already crawled.
Unit 4
Goal of an index: a mapping between a keyword and where that keyword is found (like a book index, which shows pages).
In a search engine, a keyword will map to a list of web pages where that keyword appears (a list of urls).
The choice of data structure is important: the proper structure makes writing the rest of the code easier (and sometimes the wrong one can make it impossible).
split gives a list of the words in a string.
Triple quotes: let you divide a string over multiple lines.
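A quick sketch of both:

quote = """Triple-quoted strings
can span multiple lines."""
print quote.split()
# ['Triple-quoted', 'strings', 'can', 'span', 'multiple', 'lines.']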
get_page uses the Python library urllib.
A try block is an exception handler: instead of a lot of ifs, it will return the error under except.
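A minimal sketch of what that looks like (Python 2's urllib, with any error swallowed to an empty string):

import urllib

def get_page(url):
    try:
        return urllib.urlopen(url).read()
    except:
        return ''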
Network: at least 3 entities with at least one instance where two are not directly connected (the idea is over 3000 years old).
Iliad network:
The ancient Greeks would light fires on islands to signal 600 km across the Greek islands.
A network needs a way to convert the message you want to send into a signal, a way to route (to know whether or not to send on a message), and a way to decide who gets to use the network.
Measuring Networks: Latency and Bandwidth
Latency: the time it takes a message to get from source to destination.
Bandwidth: the amount of data that can be sent through the channel at one time (the rate you can send information, in bits per second, e.g. Mbps).
Every bit doubles the number of things you can distinguish: 4 bits = 2**4 = 16.
traceroute shows each hop to get to a server (and the ms to reach each), showing how packets go from where you are to where you want to go.
There is no guarantee that a packet gets where it needs to go; it can be dropped. The network just does its best to get the message across.
HTTP Protocol
def split_string(source, splitlist):
    output = []
    atsplit = True
    for char in source:
        if char in splitlist:
            atsplit = True
        else:
            if atsplit:
                output.append(char)
                atsplit = False
            else:
                output[-1] = output[-1] + char   # output[-1] is the last element in the output list (the word being built)
    return output
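A usage sketch (input string assumed), splitting on spaces and periods:

print split_string("After the flood ... all the colors came out.", " .")
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']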
Homework 6
#The current index includes a url in the list of urls
#for a keyword multiple times if the keyword appears
#on that page more than once.
#It might be better to only include the same url
#once in the url list for a keyword, even if it appears
#many times.
#Modify add_to_index so that a given url is only
#included once in the url list for a keyword,
#no matter how many times that keyword appears.
#index = crawl_web("http://www.udacity.com/cs101x/index.html")
#print lookup(index, "is") => ['http://www.udacity.com/cs101x/index.html']
def add_to_index(index, keyword, url):
    for entry in index:
        if entry[0] == keyword:
            if url not in entry[1]:
                entry[1].append(url)
            return
    # not found, add new keyword to index
    index.append([keyword, [url]])

def get_page(url):
    try:
        if url == "http://www.udacity.com/cs101x/index.html":
            return '<html> <body> This is a test page for learning to crawl! <p> It is a good idea to <a href="http://www.udacity.com/cs101x/crawling.html">learn to crawl</a> before you try to <a href="http://www.udacity.com/cs101x/walking.html">walk</a> or <a href="http://www.udacity.com/cs101x/flying.html">fly</a>. </p> </body> </html> '
        elif url == "http://www.udacity.com/cs101x/crawling.html":
            return '<html> <body> I have not learned to crawl yet, but I am quite good at <a href="http://www.udacity.com/cs101x/kicking.html">kicking</a>. </body> </html>'
        elif url == "http://www.udacity.com/cs101x/walking.html":
            return '<html> <body> I cant get enough <a href="http://www.udacity.com/cs101x/index.html">crawling</a>! </body> </html>'
        elif url == "http://www.udacity.com/cs101x/flying.html":
            return '<html> <body> The magic words are Squeamish Ossifrage! </body> </html>'
    except:
        return ""
    return ""

def union(a, b):
    for e in b:
        if e not in a:
            a.append(e)

def get_next_target(page):
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

def crawl_web(seed):
    tocrawl = [seed]
    crawled = []
    index = []
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page)
            add_page_to_index(index, page, content)
            union(tocrawl, get_all_links(content))
            crawled.append(page)
    return index

def add_page_to_index(index, url, content):
    words = content.split()
    for word in words:
        add_to_index(index, word, url)

def lookup(index, keyword):
    for entry in index:
        if entry[0] == keyword:
            return entry[1]
    return None
#2 Gold Stars
#One way search engines rank pages
#is to count the number of times a
#searcher clicks on a returned link.
#This indicates that the person doing
#the query thought this was a useful
#link for the query, so it should be
#higher in the rankings next time.
#(In Unit 6, we will look at a different
#way of ranking pages that does not depend
#on user clicks.)
#Modify the index such that for each url in a
#list for a keyword, there is also a number
#that counts the number of times a user
#clicks on that link for this keyword.
#The result of lookup(index,keyword) should
#now be a list of url entries, where each url
#entry is a list of a url and a number
#indicating the number of times that url
#was clicked for this query keyword.
#You should define a new procedure to simulate
#user clicks for a given link:
#record_user_click(index,word,url)
#that modifies the entry in the index for
#the input word by increasing the count associated
#with the url by 1.
#You also will have to modify add_to_index
#in order to correctly create the new data
#structure.
#Here is an example showing a sequence of interactions:
#index = crawl_web('http://www.udacity.com/cs101x/index.html')
#print lookup(index, 'good') => [['http://www.udacity.com/cs101x/index.html', 0], ['http://www.udacity.com/cs101x/crawling.html', 0]]
#record_user_click(index, 'good', 'http://www.udacity.com/cs101x/crawling.html')
#print lookup(index, 'good') => [['http://www.udacity.com/cs101x/index.html', 0], ['http://www.udacity.com/cs101x/crawling.html', 1]]
def record_user_click(index, keyword, url):
    urls = lookup(index, keyword)   # retrieve the url entries for this keyword
    if urls:
        for entry in urls:
            if entry[0] == url:
                entry[1] = entry[1] + 1

def add_to_index(index, keyword, url):
    for entry in index:
        if entry[0] == keyword:
            entry[1].append([url, 0])
            return
    # not found, add new keyword to index (each url entry is a [url, click count] pair)
    index.append([keyword, [[url, 0]]])

def get_page(url):
    try:
        if url == "http://www.udacity.com/cs101x/index.html":
            return '<html> <body> This is a test page for learning to crawl! <p> It is a good idea to <a href="http://www.udacity.com/cs101x/crawling.html">learn to crawl</a> before you try to <a href="http://www.udacity.com/cs101x/walking.html">walk</a> or <a href="http://www.udacity.com/cs101x/flying.html">fly</a>. </p> </body> </html> '
        elif url == "http://www.udacity.com/cs101x/crawling.html":
            return '<html> <body> I have not learned to crawl yet, but I am quite good at <a href="http://www.udacity.com/cs101x/kicking.html">kicking</a>. </body> </html>'
        elif url == "http://www.udacity.com/cs101x/walking.html":
            return '<html> <body> I cant get enough <a href="http://www.udacity.com/cs101x/index.html">crawling</a>! </body> </html>'
        elif url == "http://www.udacity.com/cs101x/flying.html":
            return '<html> <body> The magic words are Squeamish Ossifrage! </body> </html>'
    except:
        return ""
    return ""

def union(a, b):
    for e in b:
        if e not in a:
            a.append(e)

def get_next_target(page):
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

def crawl_web(seed):
    tocrawl = [seed]
    crawled = []
    index = []
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page)
            add_page_to_index(index, page, content)
            union(tocrawl, get_all_links(content))
            crawled.append(page)
    return index

def add_page_to_index(index, url, content):
    words = content.split()
    for word in words:
        add_to_index(index, word, url)

def lookup(index, keyword):
    for entry in index:
        if entry[0] == keyword:
            return entry[1]
    return None
def crawl_web(seed, max_depth):
    tocrawl = [[seed, 0]]   # adding in a depth counter, so it's a list of [url, depth] pairs
    crawled = []
    while tocrawl:
        page, depth = tocrawl.pop()   # shortcut to assign 2 variables at one time
        if page not in crawled and depth <= max_depth:   # depth check
            for link in get_all_links(get_page(page)):
                tocrawl.append([link, depth + 1])
            crawled.append(page)
    return crawled
def check_sudoku(p):
    n = len(p)              # extract size of grid
    digit = 1               # start with 1
    while digit <= n:       # go through each digit
        i = 0
        while i < n:        # go through each row and column
            row_count = 0
            col_count = 0
            j = 0
            while j < n:                # for each entry in ith row/column
                if p[i][j] == digit:    # check row count
                    row_count = row_count + 1
                if p[j][i] == digit:    # check column count
                    col_count = col_count + 1
                j = j + 1
            if row_count != 1 or col_count != 1:
                return False
            i = i + 1                   # next row/column
        digit = digit + 1               # next digit
    return True                         # nothing was wrong!
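A quick check on a valid 4x4 grid (grid values assumed):

valid = [[1, 2, 3, 4],
         [2, 3, 4, 1],
         [3, 4, 1, 2],
         [4, 1, 2, 3]]
print check_sudoku(valid)   # True: every digit appears exactly once per row and column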
Unit 5
Algorithm Analysis
Algorithm: a well-defined sequence of steps that can be executed mechanically, guaranteed to always finish and produce the correct result
Cost depends on the input (n); for some inputs one algorithm may be better, while for other inputs another algorithm is better.
We need to be able to predict cost without trying every possible input (most commonly it is the size of the input that affects cost).
Costs: time (how long it takes to finish, or the cost of renting computer and processor) and memory.
Time is usually the most important to look at.
Focus on measuring how time scales with input size, instead of absolute time:
- Can predict how long it will take for a program to execute before running it
- We want to know how the time will change as computers get faster (Moore's Law: the amount of computing power per cost approximately doubles every 18 months)
- We want to understand fundamental properties of our algorithms, not things specific to a particular input or machine
Timing code:

import time

def time_execution(code):
    start = time.clock()
    result = eval(code)   # pass in any string and evaluate it as Python code
    run_time = time.clock() - start
    return result, run_time

def spin_loop(n):
    i = 0
    while i < n:
        i = i + 1

time_execution('spin_loop(100000)')      # shows (result, run_time): result is index [0], run_time is index [1]
time_execution('spin_loop(100000)')[1]   # shows only the run_time
Looping through 50 billion iterations of spin_loop would be expected to take about 50 seconds.
Running time is linear in the magnitude of n: O(n).
SEARCHES LARGE INDEX AND APPENDS IF FOUND/NOT FOUND:

def add_to_index(index, keyword, url):
    for entry in index:
        if entry[0] == keyword:
            entry[1].append(url)
            return
    index.append([keyword, [url]])

def lookup(index, keyword):
    for entry in index:
        if entry[0] == keyword:
            return entry[1]
    return None
MAKES RANDOM LARGE INDEX:

def make_string(p):
    s = ''
    for e in p:
        s = s + e
    return s

def make_big_index(size):
    index = []
    letters = ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a']
    while len(index) < size:
        word = make_string(letters)
        add_to_index(index, word, 'fake')
        # step through the letters backwards: range goes from len(letters) - 1 down to 1
        for i in range(len(letters) - 1, 0, -1):
            if letters[i] < 'z':
                letters[i] = chr(ord(letters[i]) + 1)   # increase the letter by one
                break
            else:
                letters[i] = 'a'
    return index
Can linearly search an index of 10,000,000 entries in about one second.
Worst-case execution time is what you base the timing on.
If you keep the index in sorted order, then the search doesn't have to be linear.
Create a hash table to do lookups by making lists for each letter of the alphabet that contain the keywords starting with that letter; this would reduce run time by a factor of 26 (one list for each letter of the alphabet). Better still is to figure out a smarter way to break up the keywords.
The hash table built in to Python: the dictionary.
A hash function outputs the number of the bucket that contains the keyword you are looking for.
Buckets run from 0 to b - 1 (since indexing starts at 0, the highest of the b buckets is number b minus 1).
ord(<one-letter string>) and chr(<number>) are inverses.
Modulus operator: % takes a number and maps it into a range based on its remainder.
So 14 % 12 has a remainder of 2, which is where it maps (think of a clock showing the 12 steps). Basically it is the first number divided by the second number, and the output of MOD is the remainder. If the first number is smaller than the second number, the result is always equal to the first number (you stop at the remainder rather than carrying the division on with decimals, as in normal long division).
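For example:

print 14 % 12   # 2 -- like 14 o'clock on a 12-hour clock
print 7 % 12    # 7 -- first number smaller than the second, so unchanged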
Making a better hash function:
2 inputs: keyword, number of buckets
Output: a number (corresponding to the bucket to add the keyword to, between 0 and buckets - 1)
Should be evenly distributed across all the buckets.
The idea as a fragment:

s = "abcd"
n = 0
for c in s:
    n = n + ord(c)   # sum the character codes
n = n % b            # maps the sum into buckets 0 .. b - 1
def hash_string(keyword, buckets):
    h = 0
    for c in keyword:
        h = (h + ord(c)) % buckets   # could take the modulus once at the end, but this keeps h small
    return h
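A quick check of the function (inputs assumed):

print hash_string('a', 12)         # 1 -- ord('a') is 97, and 97 % 12 = 1
print hash_string('udacity', 12)   # 11 -- always between 0 and 11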
Time to look up only depends on the number of keywords per bucket (k/b):
[[[keyword, [url, ...]], ...], ...]   <- buckets (the whole hash table)
[[keyword, [url, ...]], ...]          <- one bucket of keyword entries
[keyword, [url, ...]]                 <- an individual keyword with its url list
Creating an Empty Hash Table
For making an index, you just make an empty list:
index = []
For a hash table you need to start with the set of all empty buckets (because you want to be able to do lookups right away):

def make_hashtable(nbuckets):
    i = 0
    table = []
    while i < nbuckets:
        table.append([])   # adds an empty list (bucket) to table
        i = i + 1
    return table

A better way, using a for loop:

def make_hashtable(nbuckets):
    table = []
    for unused in range(0, nbuckets):   # "unused" because the variable is not actually used in the code, but range needs one
        table.append([])
    return table
Finding the Right Bucket

return htable[hash_string(key, len(htable))]   # returns the bucket where the keyword belongs; a "bucket" is just
                                               # the generic name for the list in the hash table that holds the keyword

bucket = hashtable_get_bucket(htable, key)     # bucket is the list in the hash table
bucket.append([key, value])
#Define a procedure,
#hashtable_lookup(htable,key)
#that takes two inputs, a hashtable
#and a key (string),
#and outputs the value associated
#with that key.
def hashtable_lookup(htable, key):
    bucket = hashtable_get_bucket(htable, key)
    for entry in bucket:
        if entry[0] == key:
            return entry[1]
    return None

def hashtable_add(htable, key, value):
    bucket = hashtable_get_bucket(htable, key)
    bucket.append([key, value])

def hashtable_get_bucket(htable, keyword):
    return htable[hash_string(keyword, len(htable))]

def hash_string(keyword, buckets):
    out = 0
    for s in keyword:
        out = (out + ord(s)) % buckets
    return out

def make_hashtable(nbuckets):
    table = []
    for unused in range(0, nbuckets):
        table.append([])
    return table

def hashtable_update(htable, key, value):
    bucket = hashtable_get_bucket(htable, key)
    for entry in bucket:
        if entry[0] == key:
            entry[1] = value
            return
    bucket.append([key, value])
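A quick usage sketch of these procedures together (keys and values assumed):

table = make_hashtable(5)
hashtable_add(table, 'hydrogen', 1)
hashtable_update(table, 'hydrogen', 1)         # updates in place when the key already exists
print hashtable_lookup(table, 'hydrogen')      # 1
print hashtable_lookup(table, 'unobtainium')   # None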
Dictionary vs String vs List
Can use in to see if a keyword is in a dictionary:

elements = {'hydrogen': 1, 'helium': 2, 'carbon': 6}
'lithium' in elements        # evaluates to False
elements['nitrogen'] = 8     # adds nitrogen
elements['nitrogen'] = 7     # modifies it to 7 (dictionaries are mutable)

Values in a dictionary can even be another dictionary:

elements = {}
elements['H'] = {'name': 'Hydrogen', 'number': 1, 'weight': 1.00794}
print elements['H']['weight']
Modifying our Search Engine:

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

def crawl_web(seed):
    tocrawl = [seed]
    crawled = []
    index = {}   # changed from []
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page)
            add_page_to_index(index, page, content)
            union(tocrawl, get_all_links(content))
            crawled.append(page)
    return index

def add_page_to_index(index, url, content):
    words = content.split()
    for word in words:
        add_to_index(index, word, url)

def add_to_index(index, keyword, url):
    if keyword in index:
        index[keyword].append(url)
    else:
        index[keyword] = [url]

def lookup(index, keyword):
    if keyword in index:
        return index[keyword]
    return None
REMOVED OLD CODE from lookup:

    for entry in index:
        if entry[0] == keyword:
            return entry[1]
    return None

REMOVED OLD CODE from add_to_index:

    for entry in index:
        if entry[0] == keyword:
            entry[1].append(url)
            return
    # not found, add new keyword to index
    index.append([keyword, [url]])
Dictionary Courses Structure:
# { <hexamester>: { <class>: { <property>: <value>, ... },
#                   ... },
#   ... }
EXAMPLE OF LOOPING THROUGH MULTIPLE LEVELS OF A DICTIONARY:

def involved(courses, person):
    output = {}
    for hexamester in courses:
        for course in courses[hexamester]:
            for key in courses[hexamester][course]:
                if courses[hexamester][course][key] == person:
                    if hexamester in output:
                        output[hexamester].append(course)
                    else:
                        output[hexamester] = [course]
    return output
bucket.append([key, hashtable_loop(htable, key, value, function)])
Refactoring: splitting code apart to have a central point of control over it.
Passing index/buckets in for loops:

def bucket_find(bucket, key):
    for entry in bucket:
        if entry[0] == key:
            return entry   # returns a reference to the entry being pointed at
    return None

entry = bucket_find(bucket, key)
if entry:
    entry[1] = value
Memoization: store results of code to save computing time.

def cached_execution(cache, code):
    if code not in cache:
        cache[code] = eval(code)   # only evaluate when we haven't seen this code string before
    return cache[code]
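A usage sketch: the second call returns the stored result instead of re-running eval.

cache = {}
print cached_execution(cache, '2**20')   # computed: 1048576
print cached_execution(cache, '2**20')   # returned from the cache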
Recursive Definitions:
There is no such thing as the longest word; a word is just something that has meaning that speakers of the language can understand.
A recursive definition has a "base case" (stopping case) and a "recursive case".
- Base case: a starting point, not defined in terms of itself
- The smallest input, where you already know the answer
- Recursive case: defined in terms of a "smaller" version of itself
An ancestor is a parent (base case).
An ancestor is a parent of an ancestor (recursive case).
factorial(n) = n * (n-1) * (n-2) ...
Base case: factorial(0) = 1
Recursive case: factorial(n) = n * factorial(n-1)
def factorial(n):
    if n == 0:
        return 1   # base case
    else:
        return n * factorial(n - 1)
Palindromes:
The base case with numbers is usually something like 0 or 1.
The base case with strings can use '': the empty string is a palindrome, so return True.
Recursive case: test if the first and last characters in the string are equal.
If they don't match, the answer is False.
If they do match, you need to check the rest of the string.
def is_palindrome(s):
    if s == '':
        return True
    else:
        if s[0] == s[-1]:
            return is_palindrome(s[1:-1])
        else:
            return False
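For example:

print is_palindrome('level')    # True
print is_palindrome('levels')   # False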
Recursive vs Iterative:
Recursive calls are expensive, but easier to understand and get right (in is_palindrome, a new string is created on each call).
If you care about performance, you are better off finding an iterative approach to lessen the cost.
Liber Abaci ("book of calculation"): Leonardo da Pisa (Fibonacci), 1202.
It replaced Roman numerals with Arabic numerals using decimals.
Fibonacci: the previous two numbers are added together to create the next number.
Two base cases:
fibonacci(0) = 0
fibonacci(1) = 1
Recursive case: fibonacci(n) = fibonacci(n-1) + fibonacci(n-2),
where n > 1
def fibonacci(n):
    if n < 1:
        return 0
    if n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)
Recursive breakdown:

def recursive(n):
    if basecase:
        return basecaseanswer
    return recursive(n - 1)   # or whatever the recursive formula is
Instead of recursion, you can use a loop (anything you can define recursively, you can define iteratively):

def fibonacci(n):
    current = 0
    after = 1
    for i in range(0, n):
        current, after = after, current + after
    return current
#checking to see how many months it will take to have greater mass of rabbits than mass of earth
mass_of_earth = 5.9722 * 10**24   # kilograms
mass_of_rabbit = 2                # 2 kilograms per rabbit
n = 1
while fibonacci(n) * mass_of_rabbit < mass_of_earth:
    n = n + 1
print n, fibonacci(n)

It takes about 10 years for the rabbits to take over the earth.
We learned to make a crawler, then learned how to index those URLs by keyword so the crawler doesn't have to run on every search; now we need to rank urls to get them in the right order.
Close, but we need to add in the friends' popularity scores.
That (above) is not a recursive definition but a circular definition (no base case!).
It is never going to produce an answer (we will never know how popular Charlie is).
Can we give it a base case? We need to invent one:
popularity('Alice') = 1
popularity(p) = sum of popularity(f) for each friend f of p
def popularity(p):
    if p == 'Alice':
        return 1
    score = 0
    for f in friends(p):
        score = score + popularity(f)
    return score
This (above) still will not work, because it will still bounce back and forth in a circular reference whenever two people are linked to each other.
Relaxation Algorithm:
start with a guess
while not done:
    make the guess better
popularity(time step, person) -> score
popularity(0, p) = 1 (BASE CASE)
popularity(t, p) = sum of popularity(t-1, f) for each friend f of p,
for t > 0
def popularity(t, p):
    if t == 0:
        return 1
    else:
        score = 0
        for f in friends(p):
            score = score + popularity(t - 1, f)
        return score
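A tiny sketch of how this runs, with an assumed friends() helper backed by a dictionary:

friend_graph = {'Alice': ['Bob'], 'Bob': ['Alice', 'Charlie'], 'Charlie': ['Bob']}

def friends(p):
    return friend_graph[p]

print popularity(2, 'Bob')   # 2: sums each friend's score from the previous time step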
Is this a good recursive definition? Yes, because it will always return a result (base case will be achieved)
rank(0, url) = 1
rank(t, url) = sum of rank(t-1, p) / outlinks[p], for each page p in inlinks[url]
Sum up the popularity of all pages that link to a page and divide by the number of their outlinks (since a page that is just full of links is probably not as important).
d = damping constant, 0.8 (determines how frequently the random web surfer will follow a link versus start at a new random page); the typical value is less than 1.
n = number of pages
Start by initializing each rank to 1/n; then for as many timesteps as we want, for each link that links to us, take its popularity multiplied by the damping factor, and add in the value of the random web surfer starting over.
The PageRank algorithm described here was what launched Google; it was made by cofounder Larry Page (so the algorithm was called PageRank).
It was made because AltaVista used to rank pages based on the number of keywords on a page, so a page that said the keyword "restaurant" 32 times was ranked higher than one that said "restaurant" twice, even if the site with the keyword twice had more traffic.
Implementing URank (PageRank™ is the property of Google)
crawl_web(seed) -> index, graph (where the graph records the structure of the pages each url links to)

def crawl_web(seed):
    tocrawl = [seed]
    crawled = []
    index = {}
    graph = {}
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page)
            add_page_to_index(index, page, content)
            outlinks = get_all_links(content)
            graph[page] = outlinks   # no need to put [] around outlinks, since it is already a list
            union(tocrawl, outlinks)
            crawled.append(page)
    return index, graph
Computing the Page Ranks:
Use the graph from crawl_web.
Define compute_ranks(graph) and return a dictionary of {url: rank}.
lookup_best: pass in the index, a keyword, and the dictionary of ranked pages, and it returns the best page.
d = 0.8 (damping constant)
rank(0, url) = 1/npages
newranks[url] = (1 - d)/npages + sum of d * rank(t-1, p) / (number of outlinks from p), over pages p that link to url
ranks = ranks at time t-1
newranks = ranks at time t
def compute_ranks(graph):
    d = 0.8   # damping factor
    numloops = 10
    ranks = {}
    npages = len(graph)
    for page in graph:
        ranks[page] = 1.0 / npages
    for i in range(0, numloops):
        newranks = {}
        for page in graph:
            newrank = (1 - d) / npages
            for node in graph:
                if page in graph[node]:
                    newrank = newrank + d * (ranks[node] / len(graph[node]))
            newranks[page] = newrank
        ranks = newranks
    return ranks
def hexes_to_udaciousness(n, spread, target):
    if n >= target:
        return 0
    else:
        return 1 + hexes_to_udaciousness(n + (n * spread), spread, target)
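A quick check (inputs assumed): starting from 1 and growing by 50% per hex, it takes 6 hexes to reach 10.

print hexes_to_udaciousness(1, 0.5, 10)    # 6
print hexes_to_udaciousness(10, 0.5, 10)   # 0 -- already at the target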
Deep count example: for 1, [2, 3], 4 the 1 counts as one element;
[2, 3] is a list, so add its deep count plus 1 for the list itself, and keep running.

def is_list(p):
    return isinstance(p, list)   # assumed helper; the course provided is_list

def deep_count(p):
    sum = 0
    for e in p:
        if is_list(e):
            sum = sum + 1 + deep_count(e)
        else:
            sum = sum + 1
    return sum
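So, for the example above:

print deep_count([1, [2, 3], 4])   # 5: three top-level elements, plus the two inside the inner list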
HOMEWORK 6 PROBLEM 5:
#Feeling Lucky
#In Unit 6, we implemented a page ranking algorithm, but didn't finish the final
#step of using it to improve our search results. For this question, you will use
#the page rankings to produce the best output for a given query.
#Define a procedure, lucky_search, that takes as input an index, a ranks
#dictionary (the result of compute_ranks), and a keyword, and returns the one
#URL most likely to be the best site for that keyword. If the keyword does not
#appear in the index, lucky_search should return None.

def lucky_search(index, ranks, keyword):
    toprank = 0
    if keyword not in index:
        return None
    for e in index[keyword]:
        if ranks[e] > toprank:
            toprank = ranks[e]
            result = e
    return result
cache = {
    'http://udacity.com/cs101x/urank/index.html': """<html>
<body>
<h1>Dave's Cooking Algorithms</h1>
<p>
Here are my favorite recipies:
<ul>
<li> <a href="http://udacity.com/cs101x/urank/hummus.html">Hummus Recipe</a>
<li> <a href="http://udacity.com/cs101x/urank/arsenic.html">World's Best Hummus</a>
<li> <a href="http://udacity.com/cs101x/urank/kathleen.html">Kathleen's Hummus Recipe</a>
</ul>
For more expert opinions, check out the
<a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>
and <a href="http://udacity.com/cs101x/urank/zinc.html">Zinc Chef</a>.
</body>
</html>
""",
    'http://udacity.com/cs101x/urank/zinc.html': """<html>
<body>
<h1>The Zinc Chef</h1>
<p>
I learned everything I know from
<a href="http://udacity.com/cs101x/urank/nickel.html">the Nickel Chef</a>.
</p>
<p>
For great hummus, try
<a href="http://udacity.com/cs101x/urank/arsenic.html">this recipe</a>.
</body>
</html>
""",
    'http://udacity.com/cs101x/urank/nickel.html': """<html>
<body>
<h1>The Nickel Chef</h1>
<p>
This is the
<a href="http://udacity.com/cs101x/urank/kathleen.html">
best Hummus recipe!
</a>
</body>
</html>
""",
    'http://udacity.com/cs101x/urank/kathleen.html': """<html>
<body>
<h1>
Kathleen's Hummus Recipe
</h1>
<p>
<ol>
<li> Open a can of garbonzo beans.
<li> Crush them in a blender.
<li> Add 3 tablesppons of tahini sauce.
<li> Squeeze in one lemon.
<li> Add salt, pepper, and buttercream frosting to taste.
</ol>
</body>
</html>
""",
    'http://udacity.com/cs101x/urank/arsenic.html': """<html>
<body>
<h1>
The Arsenic Chef's World Famous Hummus Recipe
</h1>
<p>
<ol>
<li> Kidnap the <a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>.
<li> Force her to make hummus for you.
</ol>
</body>
</html>
""",
    'http://udacity.com/cs101x/urank/hummus.html': """<html>
<body>
<h1>
Hummus Recipe
</h1>
<p>
<ol>
<li> Go to the store and buy a container of hummus.
<li> Open it.
</ol>
</body>
</html>
""",
}
def get_page(url):
    if url in cache:
        return cache[url]
    return ""

def get_next_target(page):
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

def union(a, b):
    for e in b:
        if e not in a:
            a.append(e)

def add_page_to_index(index, url, content):
    words = content.split()
    for word in words:
        add_to_index(index, word, url)

def add_to_index(index, keyword, url):
    if keyword in index:
        index[keyword].append(url)
    else:
        index[keyword] = [url]

def lookup(index, keyword):
    if keyword in index:
        return index[keyword]
    else:
        return None

def crawl_web(seed):   # returns index, graph of inlinks
    tocrawl = [seed]
    crawled = []
    graph = {}   # <url>, [list of pages it links to]
    index = {}
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page)
            add_page_to_index(index, page, content)
            outlinks = get_all_links(content)
            graph[page] = outlinks
            union(tocrawl, outlinks)
            crawled.append(page)
    return index, graph

def compute_ranks(graph):
    d = 0.8   # damping factor
    numloops = 10
    ranks = {}
    npages = len(graph)
    for page in graph:
        ranks[page] = 1.0 / npages
    for i in range(0, numloops):
        newranks = {}
        for page in graph:
            newrank = (1 - d) / npages
            for node in graph:
                if page in graph[node]:
                    newrank = newrank + d * (ranks[node] / len(graph[node]))
            newranks[page] = newrank
        ranks = newranks
    return ranks
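Putting it together, a usage sketch against the cached pages above (the printed URL is my assumption of the top-ranked result):

index, graph = crawl_web('http://udacity.com/cs101x/urank/index.html')
ranks = compute_ranks(graph)
print lucky_search(index, ranks, 'Hummus')
# http://udacity.com/cs101x/urank/kathleen.html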
def ancestors(genealogy, person):
    ancestorlist = []
    for e in genealogy:
        # collect the parents of person, and of anyone already found to be an ancestor
        if e == person or e in ancestorlist:
            for p in genealogy[e]:
                if p not in ancestorlist:
                    ancestorlist.append(p)
    return ancestorlist
Unit 7
Themes:
Procedural abstraction: write one procedure and use it, hiding the details (you don't need to know what the exact code is, just what the input/output would be).
Universality: you can make any program with the three things above.
Recursive definitions: need something to cause recursion, and a base case.
Breakdown of themes in each unit:
What is Computer Science?
Engineering:
Uses abstraction to hide details, to overcome the limits of human minds (in relation to how much you can keep in your mind at one time).
Similar to engineering in that you make things under constraints, but the constraints are those of human minds (not the laws of physics).
Science:
Biology is really all about computation (DNA) in nature
Liberal Art:
The Seven Liberal Arts:
the trivium (3 about language) and the quadrivium (4 about number)