Python, Udacity Class

Udacity Classes 6: Programming Languages

Below are my notes from when I originally took the Udacity course Programming Lanuages. At that time, I only took notes in Microsoft Word, so there was a real lack of gifs. I’d go through and add some, but this is going to be a HUGE post and WordPress is really struggling to function. Honestly, I’m only adding this post for my own reference and to save my notes, so if I were you I’d go watch some Schitts Creek instead or take the class yourself! 🙂

Programming Languages

Description: This class will give you an introduction to fundamentals of programming languages. In seven weeks, you will build your own simple web browser complete with the ability to parse and understand HTML and JavaScript. You will learn key concepts such as how to specify and process valid strings, sentences and program structures. Then, you will design and build an interpreter – a program that simulates other programs.


String Patterns

Finding and specifying classes of strings using regular expressions


Lexical Analysis

Breaking strings down into important words



Specifying and deconstructing valid sentences



Turning sentences into trees



Simulating programs


Building a Web Browser

Interpreting HTML and JavaScript



Exam testing your knowledge

Unit 1:

Break up strings into meaningful words using Python

Find returns -1 if does not find string

Recursive could be using the base case if returns -1, otherwise it would keep passing in the newly found value as the starting position

[1:] go as far as possible

[0:-1] -1 is the last letter (so do not include)

def myfirst_yoursecond(p,q):

ps = p.find{” “)

qs = q.find(” “)

return p[:ps] == q[qs + 1:]

Can use the split function in python string.split() in order to do this

Regular Expression:

Use these to denote a large number of simple strings

Common for entering structured data (mm/dd/yyyy)

Regular expressions are for STRINGS, so 5 and 05 are NOT the same thing

import re imports python regular expression module

Python regular expressions start with a little “r” before the double quotations

Picking the correct regular expression to use is an exercise in creativity—finding the most elegant is a skill


returns list: [‘1’,’2’,’3’]


returns list: [‘1’,’2’]

Regular Expressions are CASE SENSITIVE

To create a large number of strings to search for have sub-parts to make compound regular expression

+ looks back at previous regular expression and match it once or more

r”[0-1]” matches any and all combinations of 0 and 1

Maximal Munch will consume and match as big a string as it can at once

Finite State Machine:

| is either/or for regular expressions (disjunction)


Regular Expression: characters that are left alone are matched exactly:

r”ab|[0-9]+” will match “ab” OR any number of digits

Epsilon is used to “consume no input”

? “optional” “the previous thing 0 or 1 times”

re.findal(r”-?[0-9]”,”1861-1941 R. Targore”)

returns: [‘1861’,’-1941’]

* is zero or more (+ is one or more)

+ * ? [ ] all mean something special, so you need the Escape Sequences

\ in front of quote/double quote, it will treat it as being part of the string (escape out of using quotes as delimiters)

r”\+\+” finds the string ‘++’

r”[a-z]+-?[a-z]+” matches lowercase words with optional single hyphenated words

however: needs at least 2 letters, so will not accept “a” or “i”

regexp = r”[a-z]+\( *[0-9]+ *\)”

[0-9].[0-9] would be something like 1a1 or 2b4

^ inside [] this means not or set complement –asking for anything that is not whatever is specified

[^ab] will return anything that is not “a” or “b”

Like math in grouping, but opening parenthesis is (?: )

mi+ the + is applying to the i, so would get miiii and not mimi

MIDI: musical instrument digital interface: list of notes

MP3: not like this (at least, does not look like it at first blush)


Maps (associate one thing with another, aka dictionary) and Tuples (immutable list—cannot be changed)

Simulate how Finite State Machine works

# FSM Interpretation

# Define edges and accepting to encode r”q*”. Name your start state 1.

edges = {(1,’q’):1}

accepting = [1]

def fsmsim(string, current, edges, accepting):

if string == “”:

return current in accepting


letter = string[0]

if (current, letter) in edges:

destination = edges[(current, letter)]

remaining_string = string[1:]

return fsmsim(remaining_string, destination, edges, accepting)


return False

print fsmsim(“”,1,edges,accepting)

# >>> True

print fsmsim(“q”,1,edges,accepting)

# >>> True

print fsmsim(“qq”,1,edges,accepting)

# >>> True

print fsmsim(“p”,1,edges,accepting)

# >>> False

Finite State Machines are generous: will allow it to return True if ANY path can lead to accepting (even if path is labeled the same)

2 kinds of FSMs:

Non-deterministic FSM: have choices

Lock-step FSM: no epsilon or ambiguity

Converting from nondeterministic to deterministic: group together and “epsilons” are freebies

Wrap Up/Conclusion for Unit 1:

Homework 1:

def sumnums(sentence):

nums = re.findall(r”[0-9]+”,sentence)

return sum(int(n) for n in nums)

Unit 2

<a href=”link”> (href stands for hypertext reference)

Lexicon Analysis: lexicon (dictionary) to break something down into words

Token: smallest unit of lexical analysis output: words, strings, numbers, punctuation (Not whitespace)

Use regular expressions to specify tokens

def t_RANGLE(token):


return token

def t_STRING(token):

r’”[^”}*”’ ^” is anything except a double-quote

return token

def t_WORD(token):

r'[^ |^<|^>]*’ not a space, < or > CAN ALSO BE WRITTEN:

r’[^ <>\n]+’

Lexical Analyzer (lexer): just a collection of token definitions

Ambiguity: sometimes can be identified as more than one token (number or word), so the first one wins (ordering tokens is of prime importance!)

We don’t just serve hamburgers, we serve people!


Can’t count backward from end of string using negative numbers, so -1 is the first last character (or end of string)

import ply.lex as lex

output of a lexer is a list of tokens


def t_newline(token):


token.lexer.lineno +=1


states = (



Javascript, embed into HTML:

<script type=”text/javascript”>

document.write(“Hello world”); equivalent of print “Hello world” in python


<script type=”text/javascript”>

function factorial(n) {

if (n==0 {

return 1;

} ; <-python uses tabbing to figure out start/end, javascript uses curly-braces

return n * factorial(n-1);




def t_IDENTIFIER(token):

r’[a-zA-Z] [a-zA-Z]*’

def t_NUMBER(token):


token.value = float(token.value)

return token

#python comments

//javascript comments

‘visual basic comments



HTML/JavaScript can be broken up into tokens number, word, string (use regular expressions to specify)

Unit 3

Noam Chomsky 1955: Syntactic Structures: utterances have rules (formal grammars)

Grammar, rewrite rule, non-terminals, terminals

Derivation able to derive “students think” using the rewrite rules

Another derivation teachers write

Recursion in a context-free grammar can allow for an infinite number of utterances

subj subj and subj is a recursive rewrite rule

Syntactical Analysis “Parsing”

Token List Valid in Grammar?

Lexing + Parsing = Expressive Power

Word Rules + Sentence Rules = Creativity!

Parsing turns grammar into structure, lexing turns terminal (number) into something valid

stmt (statement) identifier = exp

Optional Parts of Languages

E/Epsilon means blank string, nothing, space

Grammar >= RegExp (anything can specify with regular expression can be expressed in grammar)

Regular expressions describe regular languages

Grammars describe context-free languages (can have things in between the expression finding)

Grammar allows balanced parenthesis, while regular expression could allow unbalanced



IMPOSSIBLE to capture balanced parenthesis with regular expression

Balanced parenthesis are everywhere in HTML and JavaScript (tags)

Must nest tags appropriately (or not valid HTML)

Parse Trees: nodes and leaf

90% of software is maintenance

Ambiguity: which is using the binoculars?

If you can find 1 example where you can have 2 separate parse trees, effectively entire grammar is ambiguous

Grammars for HTML & JavaScript

htmlelement html



elementtag_open html tag_close

elementtag_open word tag_close



JavaScript is similar to Python


def absval(x):

if x<0:

return 0 – x


return x


function absval(x) {

if x<0 {

return 0 – x;

} else {

return x;




print “hello” + “!”


document.write(“hello” + “!”);


write(“hello” + “!”);


def uptoten(x):

if x<10:

return x


return 10


function uptoten(x){

if x<10 {

return x;

} else {

return 10;



Universal Meaning?

def vs function

meaning stayed the same: we can translate between python and javascript

Universal Grammar? Languages are all Turing – complete

JavaScript Grammar:






Expexp + exp



We’ve seen Expressions and Statements, now we need FUNCTIONS!

Functions = Declare and Call

Majority of JS Grammar

Creating Grammars vs Checking Utterances:

How to get program to decide whether or not string is valid:

Super Slow: Enumerate all valid strings (since there are infinite grammars, may take an infinite number of times)

We’ll try this first:

  1. Instructive
  2. Power tools

Introducing: LAMBDA


def addtwo(x): return x+2



mystery = lambda(x): x+2



pele = mystery



Lamda means “make me a function”

Lamda is popular in paradigm of functional programming, contrast to object oriented programming and imperative programming

List Power

def mysquare(x): return X*X

map(mysquare,[1,2,3,4,5]) = [1,4,9,16,25]


map(lambda(x):x*x,[1,2,3,4,5]) = [1,4,9,16,25] anonymous function (never given a name)

map is also popular in functional programming, transforms each element in a list by doing work to each of its elements

Real Convenience: List Comprehensions

[x*x for x in [1,2,3,4,5]

] will do the same as above

List comprehension defines a new list by transforming an old one

The word “comprehension” in “list comprehension” relates to the set comprehension notation in Mathematics.

Brendan Eich: CTO of Mozilla and designer of JavaScript

How to filter

def odds_only(numbers):

for n in numbers:

if n / 2 == 1:

yield n

t [x for x in [1,2,3,4,5] if x%2 == 1] only returns those that mod to 1 (or are odd)

if tokens[pos] == rule[0]:

yield tokens[0:pos] + rule[1] +tokens[pos+1:]

Unit 4 Personal Notes

Device-drivers with third party software would cause lots of the Microsoft OS bugs

Microsoft Static-driver verifier (SLAM) to check for bugs

Given a string s and a grammar G, is s in the language of G?

Lexing: Lexical analysis that breaks a string down into tokens

Parsing: Syntactical analysis that checks to see if they conform to a context-free grammar

Time flies like an arrow, fruit flies like a banana. ambiguity and hilarity

Brute Force: try all options exhaustively

Programmer virtues:

  1. Laziness: not duplicating work, makes you go to great effort to reduce overall energy expenditures
  2. Impatience: writing something to anticipate (or seem to anticipate) next move because computer is too slow if it just reacts
  3. Hubris: excessive pride: makes you want to write and maintain programs that other people won’t want to say bad things about

Memoization: computer science technique in which we keep a “chart” or “record” of previous computations and computer new values in terms of previous answers

If webpage takes longer than 6 seconds to get back to you, you go somewhere else/buy something else (figured out by study)

Implement a chart as a python dictionary: {}

Chart = {}

chart[5] = 5

chart[6] = 8

if 6 in chart:

print chart[6]

Fibonacci with Memoization:

# Memofibo

# Submit via the interpreter a definition for a memofibo procedure that uses a

# chart. You are going to want the Nth value of the chart to hold the Nth

# fibonacci number if you’ve already computed it and to be empty otherwise.

chart = {}

def memofibo(n):

if n in chart:

return chart[n]

elif n > 2:

chart[n] = memofibo(n-2) + memofibo(n-1)

return chart[n]

return 1

print memofibo(24)

Idea: Use memoization for parsing!

A Parsing State is a rewrite rule from the grammar augmented with 1 red dot on the right-hand side

Chart[n] = all parse states we could be in after seeing t1, t2…tn only

We will use memoization in our parser. The nth entry in the chart will contain all of the parse states we could be in after seeing the first n tokens of the input.

Must add starting position or from position to our parse states

Alternate view: PARSING is the inverse of producing strings

Applying the reductions in reverse order it is parsing, the other way is generating the string

Examples: * symbolizes red dot

chart[i] has x ab * cd from j

for all grammar rules

c pqr

we add

cpqr from i

to chart[i]

-Predicting or Computing the Closure

Three ways to complete parsing chart: Computing the Closure is one (if non-terminal)

Another way: Shifting—consuming the input (if terminal, just shift over it)

Third way: Reduce (what if nothing left?), reduce it to the grammar rule

E E + E

E int

int + int + int

E + int + int (Magic—or reduction)

Jay Earley invented the approach to parsing that we’ve been developing here. This sort of parsing is also commonly used in computational linguistics.

Code this in python using dictionary to group lists, in order to not add duplicates use sets on right side of dictionary

def addtochart(chart, index, state):

if not (state in chart[index]):

chart[index] = [state] + chart[index] how to concat to list

return True


return False

List comprehensions make a glorious re-appearance! In addition, we encode our parsing states as Python tuples.

For more information:

  • Declarative programming has many benefits. One such benefit is that it often allows the compiler or language system to efficiently implement your will for you: you don’t have to worry about implementation details, you just state what you want and it happens.





In python:

grammar = [

(“s”, [“P”]),



Paper: x ab * cd from j


state = (“x”,[“a”,”b”],[“c”,”d”],j)

def closure(grammar, i, x, ab, cd, j):

next_states = [


for rule in grammar

if cd<>[] and rule[0]==cd[0]


return next_states

def shift(tokens,ab,cd,i,j,x)

if cd<>[] and tokens[i]==cd[0]:

return (x,ab+[cd[0]],cd[1:],j)


return None

def reductions(chart,I,x,ab,cd,j):

return [


for jstate in chart[j]

if cd==[] and jstate[2]<>[] and (jstate[2])[0]==x ]

[“word_element”,”Rama’s”), “word_element”,”Journey”)]

Now that able to break down into tokens, parse into trees, next step is to interpret by walking across the tree

Brendan Eich, CTO of Mozilla and creator of JavaScript, talks about uses of memoization at Mozilla — more at run-time than during parsing.

Unit 5

Making web browser:

  1. String of HTML and JavaScript
  2. Lexical analysis: Break it down into tokens/words
  3. Syntactical analysis: Parse those into a tree
  4. Interpret: Walk the tree and understand it

Syntax (form of an utterance) vs Semantics (meaning)

“Colorless green ideas sleep furiously!” – Noam Chomsky

Syntax and semantics are not the same thing

Lexing and parsing deal with the form of an utterance. We now turn our attention to the meaning of an utterance — we turn our attention to semantics.

A well-formed sentence in a natural language can be “meaningless” or “hard to interpret”. Similarly, a syntactically valid program can lead to a run-time error if you try to apply the wrong sort of operation to the wrong sort of thing (e.g., 3 + “hello”).

For more information:

For more information:

One goal of semantic analysis is to notice and rule out bad programs (i.e., programs that will apply the wrong sort of operation to the wrong sort of object). This is often called type checking.

Ruling Out Bad Programs:

Type checking / semantic analysis (looking at source code and make sure it makes sense)

A type is a set of similar objects (like numbers, strings, lists, etc) with associated operations

+ adds numbers, concats strings, and appends lists

len measures number of characters in string and elements in list

A type is a set of similar objects (e.g., number or string or list) with an associated set of valid operations (e.g., addition or length).

For more information:

For more information:

“she made no reply, up her mind, and a dash for the door.”

“she lowered her standards by raising her glass, her courage, her eyes and his hopes.”

Mismatched tags

For more information:


We introduce the particular graphics library we will use in this class to render webpages. It is intentionally simple so that we can focus on core programming language concepts.

For more information:

Use a library of someone else’s graphics library to help render a webpage:


graphics.begintag(string,dictionary) a, href=”google”




Nelson Mandela <b> was elected </b> democratically.

Would turn into:








the graphics library would determine the word wrapping, appearance of bold, etc


# Your function should display HTML according to a given parse tree.

# graphics.warning(msg) displays an error message. Upon encountering mismatched

# tags, use graphics.warning to display the error message: “mismatched tag”.

# To display a tag, use graphics.begintag(tag,args) at the start and

# graphics.endtag() at the end of the tag.

import graphics

def interpret(trees): # Hello, friend

for tree in trees: # Hello,

# (“word-element”,”Hello”)

nodetype=tree[0] # “word-element”

if nodetype == “word-element”:


elif nodetype == “tag-element”:

# <b>Strong text</b>

tagname = tree[1] # b

tagargs = tree[2] # []

subtrees = tree[3] # …Strong Text!…

closetagname = tree[4] # b

if tagname <> closetagname:

graphics.warning(“mistmatched tag”)


graphics.begintag(tagname, tagargs)



# Note that graphics.initialize and finalize will only work surrounding a call

# to interpret

graphics.initialize() # Enables display of output.\


graphics.finalize() # Enables display of output.

Eval (evaluation) is what it’s called to solve arithmetic expressions

We will write a recursive procedure to interpret JavaScript arithmetic expressions. The procedure will walk over the parse tree of the expression. This is sometimes called evaluation.

Java Evaluation Expression

# Quiz: Eval Exp

# Write an eval_exp procedure to interpret JavaScript arithmetic expressions.

# Only handle +, – and numbers for now.

def eval_exp(tree):

# (“number” , “5”)

# (“binop” , … , “+”, … )

nodetype = tree[0]

if nodetype == “number”:

return int(tree[1])

elif nodetype == “binop”:

left_child = tree[1]

operator = tree[2]

right_child = tree[3]

left_value = eval_exp(left_child)

right_value = eval_exp(right_child)

if operator == “x”:

return left_value + right_value

elif operator == “-“:

return left_value – right_value

test_tree1 = (“binop”,(“number”,”5″),”+”,(“number”,”8″))

print eval_exp(test_tree1) == 13

test_tree2 = (“number”,”1776″)

print eval_exp(test_tree2) == 1776

test_tree3 = (“binop”,(“number”,”5″),”+”,(“binop”,(“number”,”7″),”-“,(“number”,”18″)))

print eval_exp(test_tree3) == -6

We need to know the values of variables — the context — to evaluate an expression. The meaning of x+2 depends on the meaning of x.

The state of a program execution is a mapping from variable names to values. Evaluating an expression requires us to know the current state.

To evaluate x+ 2, we’ll keep around a mapping from variable names to values. This mapping is called the state. {“x”:3,”y”:3}


# QUIZ : Variable Lookup

# Adding variable lookup to the interpreter!

def eval_exp(tree, environment):

nodetype = tree[0]

if nodetype == “number”:

return int(tree[1])

elif nodetype == “binop”:

left_value = eval_exp(tree[1], environment)

operator = tree[2]

right_value = eval_exp(tree[3], environment)

if operator == “+”:

return left_value + right_value

elif operator == “-“:

return left_value – right_value

elif nodetype == “identifier”:

variable_name = [tree[1]

return env_lookup(environment,variable_name)

# Here’s some code to simulate env_lookup for now. It’s not quite what we’ll be

# using by the end of the course.

def env_lookup(env,vname):

return env.get(vname,None)

environment = {“x” : 2}

tree = (“binop”, (“identifier”,”x”), “+”, (“number”,”2″))

print eval_exp(tree,environment) == 4

If, While, Return change the flow of control

Python and JavaScript have conditional statements like if — we say that such statements can change the flow of control through the program.

Program elements that can change the flow of control, such as if or while or return, are often called statements. Typically statements contain expressions but not the other way around.

For more information:

Statements are like entire sentences whereas expressions are like nouns

Statements often contain expressions, but not the other way around.


# QUIZ: Evaluating Statements

def eval_stmts(tree, environment):

stmttype = tree[0]

if stmttype == “assign”:

# (“assign”, “x”, (“binop”, …, “+”, …)) <=== x = … + …

variable_name = tree[1]

right_child = tree[2]

new_value = eval_exp(right_child, environment)

env_update(environment, variable_name, new_value)

elif stmttype == “if-then-else”: # if x < 5 then A;B; else C;D;

conditional_exp = tree[1] # x < 5

then_stmts = tree[2] # A;B;

else_stmts = tree[3] # C;D;

# QUIZ: Complete this code

# Assume “eval_stmts(stmts, environment)” exists

If eval_exp(conditional_exp,environment) == True:

eval_stmts(then_stmts, environment)


eval_stmts(else_stmts, environment)

def eval_exp(exp, env):

etype = exp[0]

if etype == “number”:

return float(exp[1])

elif etype == “string”:

return exp[1]

elif etype == “true”:

return True

elif etype == “false”:

return False

elif etype == “not”:

return not(eval_exp(exp[1], env))

def env_update(env, vname, value):

env[vname] = value

environment = {“x” : 2}

tree = (“if-then-else”, (“true”, “true”), (“assign”, “x”, (“number”, “8”)), (“assign”, “x”, “5”))

eval_stmts(tree, environment)

print environment == {“x”:8}


We use the term scope to refer to the portion of a program where a variable has a particular value.

For more information:

  • Hong Kong is a Special Administrative Region of China. It was controlled by the United Kingdom until 1997.
  • The most common dialect spoken in Hong Kong is Cantonese. It is pronounced very differently from the Mandarin dialect but they are written similarly.

Identifiers and Storage:

Because the value of a variable can change, we will use explicit storage locations to track the current values of variables.

Environment is used to describe scope of variable (the box it is in)

There is a special global environment that can hold variable values. Other environments have parent pointers to keep track of nesting or scoping. Environments hold storage locations and map variables to values.

Every environment except the global one has parents

There is a special global environment that can hold variable values. Other environments have parent pointers to keep track of nesting or scoping. Environments hold storage locations and map variables to values.

For more information:

In which nested function declarations are introduced.

For more information:

For more information:

For more information:

Catching Errors:

Modern programming languages use exceptions to notice and handle run-time errors. “try-catch” or “try-except” blocks are syntax for handling such exceptions.

For more information:

  • Joseph Heller wrote Catch-22, a satirical novel. The protagonist pretends to be insane to avoid the dangers of war, but the desire to avoid danger is evidence of his sanity.
  • Definition: catch-22

Frames – Function Calls

# QUIZ : Frames

# Return will throw an excception

# Function Calls: new environments, catch return values

def eval_stmt(tree,environment):

stmttype = tree[0]

if stmttype == “call”: # (“call”, “sqrt”, [(“number”,”2″)])

fname = tree[1] # “sqrt”

args = tree[2] # [ (“number”, “2”) ]

fvalue = env_lookup(fname, environment)

if fvalue[0] == “function”:

# We’ll make a promise to ourselves:

# (“function”, params, body, env)

fparams = fvalue[1] # [“x”]

fbody = fvalue[2]

fenv = fvalue[3]

if len(fparams) <> len(args):

print “ERROR: wrong number of args”


newenv = ( fenv, {} ) # new environment

for I in range(len(args)):

argval = eval_exp(arges[i],env)

(newenv[1])[fparams[i]] = argval # add to env



return None

except Exception as return_value:

return return_value


print “ERROR: call to non-function”

elif stmttype == “return”:

retval = eval_exp(tree[1],environment)

raise Exception(retval)

elif stmttype == “exp”:


def env_lookup(vname,env):

if vname in env[1]:

return (env[1])[vname]

elif env[0] == None:

return None


return env_lookup(vname,env[0])

def env_update(vname,value,env):

if vname in env[1]:

(env[1])[vname] = value

elif not (env[0] == None):


def eval_exp(exp,env):

etype = exp[0]

if etype == “number”:

return float(exp[1])

elif etype == “binop”:

a = eval_exp(exp[1],env)

op = exp[2]

b = eval_exp(exp[3],env)

if op == “*”:

return a*b

elif etype == “identifier”:

vname = exp[1]

value = env_lookup(vname,env)

if value == None:

print “ERROR: unbound variable ” + vname


return value

def eval_stmts(stmts,env):

for stmt in stmts:


sqrt = (“function”,(“x”),((“return”,(“binop”,(“identifier”,”x”),”*”,(“identifier”,”x”))),),{})

environment = (None,{“sqrt”:sqrt})

print eval_stmt((“call”,”sqrt”,[(“number”,”2″)]),environment)

In languages like Python or JavaScript, functions can be values (i.e., evaluating an expression can result in a function). As a result, we must decide how to represent function values. We’ll use tuples that include the function body, parameter names, and the relevant environment.


Real World: language influences thought (somethings easier in 1 language than another)

Computing: languages are equally expressive (somethings easier in 1 language than another)

Interpreting is DEEP: the downside is that simulating a program often involves running it

Infinite Loop:

Want: look at program source, see if it loops forever or if it halts

This is impossible (proven)

Draw crossed swords to resemble contradiction

If assumed true, it becomes false. If false, it becomes true. No possible value for halts that works. This is a circular reference—paradox (self reference).

This sentence is false. paradox

Instead make programs that are correct with very high probability (like 99.9999%), but impossible to be correct 100% of the time

View this as an opportunity to come as close as possible (reach for the stars)

Unit 6

We describe the software architecture of our web browser. Our HTML lexer, parser and interpreter will drive the main process; our JavaScript lexer, parser and interpreter will serve as subroutines.

For more information:

  • A blueprint is a detailed plan documenting a design.

Web Browser Architecture:

  1. Web page is lexed and parsed
  2. HTML interpreter walks AST, calls JavaScript Interpreter
  3. JavaScript code calls write()
  4. JavaScript interpreter stores text from write()
  5. HTML Interpreter calls graphics library
  6. Final image of webpage is created

We change our HTML lexer to recognize embedded JavaScript fragments as single tokens. We’ll pass the contents of those tokens to our JavaScript lexer, parser and interpreter later.

For more information:

Save JavaScript to interpret for later because JavaScript might look like HTML and will be interpreted incorrectly

def t_javascript(token):

r’\script\ type=\”text\/javascript\”\>’

token.lexer.code_start = token.lexer.lexpos


def t_javascript_end(token):


token.value = token.lexer.lexdata[token.lexer.code_start:

token.lexer.lexpos-9] subtracting 9 to remove the ending token

token.type = ‘JAVASCRIPT’

token.lexer.lineno += token.value.count(‘\n’)


return token

We extend our HTML parser to handle our special token representing embedded JavaScript.

For more information:

For more information:

  • Yacc, or yet another compiler compiler, is the prototypical parser generator.

A JavaScript program may contain zero, one or many calls to write(). We will use environments to capture the output of a JavaScript program.

The sort of name collision discussed here is a significant concern in computer security. It would be safer to use a separate variable or channel to communicate such strings back to our interpreter.

Good to have a space because there is no way to create user variable with a space in it, prevents collision and security risk

An example of a webpage with embedded computation is shown.

For more information:

  • The Chinese Room thought experiment was published in 1980 by philosopher John Searle. In it, a computer is constructed that is able to carry on conversations in written Chinese, but intuition may suggest that the computer does not actually having understanding or consciousness. The thought experiment had such a lasting impact that the field of cognitive science has been jokingly described as “the ongoing research program of showing Searle’s Chinese Room Argument to be false.”

A good test case gives us confidence that a program implementation adheres to its specification. In this situation, a good test case reveals a bug.

For more information:

For more information:

We use testing to gain confidence that an implementation (a program) adheres to it specification (the task at hand). If a program accepts an infinite set of inputs, testing alone cannot prove that program’s correctness. Software maintenance (i.e., testing, debugging, refactoring) carries a huge cost.

The implementation adheres to the solution specifications and meets all of the requirements

Maintenance: 90% of cost/time of software

For more information:

Steve Fink of Mozilla talks about testing in the real world. The “range error” example he mentions is related to array bounds checking, a critical language feature for both correctness and security. He argues for simple test cases with obvious control flow, and talks about removing parts of a test case to help localize the fault.

It can be difficult to test nested or anonymous functions. We will add support for them to our JavaScript interpreter.

Remember from previous Units that we store function values as 4-tuples: (“function”, parameters, body, environment).

My old job localizing compiler bugs by shrinking test cases is now almost completely automated by Delta Debugging, Andreas Zeller‘s scientific approach to debugging.

On a personal note, Udacity will be offering a course on debugging by Andreas Zeller in a bit. I highly recommend that you take it — Andreas is a great guy and he explains things clearly.

An optimization improves the performance of a program while retaining its meaning (i.e., without changing the output).

For more information:

“JIT” Just-in-Time Improvement

Optimization Idea: replace input webpage with one that is faster but has the same meaning

For more information:

How to Implement Optimizations:

  1. Think of optimizations
  2. Transform parse tree

Replacing an expensive multiplication with a cheaper addition is an instance of strength reduction.

x/x cannot just equal 1 because of 0/0 (exception), cannot rule out exceptions when doing optimizations

Steve Fink of Mozilla talks about optimization in the real world. Mozilla’s JavaScript interpreter is called SpiderMonkey, and it has gone through a number of different just-in-time (JIT) optimization architectures. He reminds us that you must always be aware of the cost of an optimization. He also notes that it is rare that a software engineer spends the entire day writing new code — looking over old code is very important and time-consuming.

Optimizing Timing:

Change the parse tree before interpreting

Steps of Browser:

Program Text Lexing (using regular expressions to break apart) Tokens Parsing (using context-free grammars to make sure everything is syntactically sound) Tree Optimization (optional) Tree (simpler one, or at least the same one) Interpreting Result (meaning)

In this class we will optionally perform optimization after parsing but before interpreting. Our optimizer takes a parse tree as input and returns a (simpler) parse tree as output.

We desire an optimizer that is recursive. We should optimize the child nodes of a parse tree before optimizing the parent nodes.

# Optimization Phase

def optimize(tree): # Expression trees only

etype = tree[0]

if etype == “binop”:

# Fix this code so that it handles a + ( 5 * 0 )

# recursively! QUIZ!

a = optimize(tree[1])

op = tree[2]

b = optimize(tree[3])

if op == “*” and b == (“number”,”1″):

return a

elif op == “*” and b == (“number”,”0″):

return (“number”,”0″)

elif op == “+” and b == (“number”,”0″):

return a

return tree

return tree

print optimize((“binop”,(“number”,”5″),”*”,(“number”,”1″))) == (“number”,”5″)

print optimize((“binop”,(“number”,”5″),”*”,(“number”,”0″))) == (“number”,”0″)

print optimize((“binop”,(“number”,”5″),”+”,(“number”,”0″))) == (“number”,”5″)

print optimize((“binop”,(“number”,”5″),”+”,(“binop”,(“number”,”5″),”*”,(“number”,”0″)))) == (“number”,”5″)

A fuller answer would replace return tree with return (a,op,b) at the end. The question only asked you to handle a+(5*0), but we would really like to be more general as well.

For more information:

Genetic Programming uses evolutionary algorithms inspired by biological notions to find computer programs that have certain properties. It is possible to use such evolutionary computation to automatically repair defects in programs.

Wrap Up:

  1. Lexing


    1. Regular Expressions
    2. Finite State Machines
  2. Parsing


    1. Context Free Grammars
    2. Dynamic Programming / Parse States
  3. Optimizing


    1. Must retain meaning
  4. Interpreting


    1. Walking the Abstract-Syntax Tree recursively


      1. JavaScript interpreter calls the HTML interpreter, and then that calls the Graphics interpreter
  5. Debugging


    1. Gain confidence

Leave a Reply