Python, Udacity Class

Harvard Online Class: CS50 Introduction to Computer Science

Below are my notes from when I originally took the Harvard Online course CS50 Introduction to Computer Science. At that time, I only took notes in Microsoft Word, so there was a real lack of gifs. I’d go through and add some, but this is going to be a HUGE post and WordPress is really struggling to function. Honestly, I’m only adding this post for my own reference and to save my notes, so if I were you I’d go watch some Schitts Creek instead or take the class yourself! 🙂

5Reminder for hardcoding:

Hardcoding with integers will set them as that data type in C, so either put in one decimal or declare type.

Float: 32 bits

Double: 64 bits

\* GetFloat is from cs50 and not normally available

Other ways to create the same effect *\

#include <cs50.h>

#include <stdio.h>


main(int argoc, char *argv[])


printf(“Give me temp in F: “)

float f = GetFloat();

float c = (5.0/9.0) * (f-32);

printf(“%.1f in F is %.1f in C!\n”,f,c);


./beer1 | more (program to show you one space at a time to show output)

Error checking:

If n<1


Printf(“That makes no sense/n.”);

return 1; (return 0 means it’s error-free unless something else is specified as return value, “1” is now this error-code type,


Function in C (put under main )


chorus(int b)

string s1, s2;

s1 = (b == 1) ? “bool

while (n) so long as we’ve required this number not be negative

chorus(n–); takes n– into account after iteration, –n takes into account before iteration


swamp(int a, int b)


Int tmp;

tmp = a;

a = b;

b = tmp;


Void increment() ; creating a new function, must be declared globally if written later in the cod

OR, can create .h file to store it in

Conditons – using (( )) will remove error messages

&& = and

// = or


Main(int argc, char * argv[])


Printf(“Insert integer.\n”)

Int n = GetInt()

If (n > 0)

printf(“The number is positive.\n”);

else if (n==0)

printf(“The number is 0.\n”);


printf(“The number is negative.\n”);


Switch (n)


case 1: case2: case3:

printf(“You picked a small number.\n”)


case 4: case 5: case 6:

printf(“You picked a bigger number.\n”)



printf(“You picked a different number.\n”)


while (i <= 100)


printf(“\rPercent complete: %d%%”,i);





for (initializations; condition; updates)



fflush is used to force the buffer to execute without waiting for /n (will execute screen-buffer when it sees that, so using /r will need to specify)

Sleep = puts this thread on hold for x seconds (specified at 1)

i++ used in place of i = i + 1 during code (or i–)

%% and // are used in strings to actually show them

\r is a carriage return (go to beginning of same line \r\n is Windows, \n for linux does same

for (int i = 0; i <= 100; i++)


printf(“\rPercent complete: %3d%%”,i);



%3d%% = formats variable “i” above as digit data type, 3 width, and a % at the end

bool was added in by cs50


main(int argc, char *agrv[])

int n;

bool thankful = false;



printf(“Give me a positive number./n”);

if ((n = getint()) > 0)

thankful= true;


while (!thankful);

printf(“Thanks for the positive integer!\n”)


Rot13 – rotating to crypt information by 13 letters (incyphering / crypting)

Use { } to enclose things like FOR statements to keep them executing together


Can use 65 as “A” if defined as character, same thing as INT(“A”) would be 65 (lower case alphabet starts after):

printf(“%c : %d”, (char) i, i)


returnvalue name (variablestopass)

local variables “overshadow” global variables of same name

x = increment(x) this “x” is a copy (int as “a” in function below)


increment(int a)


return a+1;


Example: If statement and other option:

string s1;

If (i== 1)

s1 = “bottle”;


s1 = “bottles”;

Same as:

s1 = (i == 1) ? “bottle” : “bottles”

For above, s1 will get value before or after colon depending on whether or not i==1 is true : false

Float point values will never be “precise”, so move decimal point over to try and avoid the weird float problem that can occur:

Example: 3.41

Float returns: 3.4100000853068847656

Workaround: if using for dollar and change, then multiply by 100, work with value, then divide by 100 to get decimal back in correct place

gcc demo.c – lcs50 -lm –o demo links in a library that has other functions


#include <stdio.h>

#include <math.h>

#include <cs50.h>

#define QUIZZES 7 this will define the number of “spaces” to make for array


main(int argc, char * argv[])


float grades[QUIZZES], sum;

int average, i;

sum = 0

printf(“\nWhat were your quiz scores?\n\n”);

for (i = 0; i < QUIZZES; i++);


printf(“Quiz #%d of %d: \n”,i+1,QUIZZES);

grades[i] = GetInt();

sum += grades[i]


average = (int) round(sum / QUIZZES);

printf(“\nYour Average is: %d\n\n”,average)


Apparently you can access other memory through arrays in the RAM

Segmentation fault (core dumped) overstepped the memory allotted to the file (core dump shows what was going on in the memory when the program failed)

String is just an array that holds each individual character, \o is NULL character to delimit separate strings

if s != NULL


for (i=0; n = strlen(s); i < n; i++) setting n to equal strlen(s) so it doesn’t have to keep checking, spending 32 bits to save time






Flipping from lower to upper case (32 is the difference normally though)

If (s[i] >= ‘a’ && s[i] <= ‘z’)

printgf(“%c”,s[i] – (‘a’ – ‘A’);



man 3 printf (type this in LINUX environment to see manual chapter 3. Or go to website that probably has better instructions anyway)

int main(int argc, char *argv[]);

argc is # of commands typed in for command

argv is a vector that is created when typing in at command line

argv[i][j] 2-dimensional array

when running programs, you ask for memory and don’t give it back

Enigma machine encodes Cryptography

Plaintext encryption ciphertext decryption plaintext

Caesar Cipher: Ci = (pi + k) % 26

Instead of 26, can figure out set “key” to map it/rotate (French, Vigonaire)

26^x, where x is the length of the word used to crypt

Digital Encryption Standard (DES):

Break up input into left and right, perform calculations, move around the bits, and repeat 16 times, total “Keys” become 72 quadrillion

Triple DES: same algorithm ran 3 times

Advanced Encryption Standard is new one.

Asymmetric Cryptography is also used, public and private Key, public Key can only be cracked by private Key

Plaintext encryption (using public key) ciphertext decryption (using private key) plaintext

Basically use all this to create HUUUGE prime numbers and multiply them together to create a larger number

Running Time T(n)

Parallel Processing (Divide and Conquer)

-worst: Big “O”

-best: Omega

-coincidence of both upper and lower bound: Theta

O(1) “constant” always take same amount of time. “Hello, world”

O(log n) “logarithmic” split in middle to reduce run time

O(n) “linear” i++ going through all available options, so running time would be “n”

O(n log n) “supralinear”

O(n^2) “quadratic

O(n^c) “polynomial”

O(c^n) “exponential”

O(n!) “factorial” checks ALL possible solutions…BAD

Linear pseudo-code:

On input n:

For each element i:

If i==n:

Return true.

Return true.

Top of program, set return value, name(declarations) to find functions if located later in program

int sigma(int) note, does NOT need to say what the variables name is though

later in code:


sigma(int m)

{ }

Each time you declare a variable, if inside curly braces {}, will only exist inside



sigma(int m)


// base case

If (m <= 0)

return 0;

// recursive case


return (m + sigma(m-1)); calling itself again, before being able to answer needs to “answer” counting down—similar to magic stack


-Each time function is called, you need to create a section of memory to add to “stack”, then need to “tear it down” after it’s done, so may not always be the best option

-Better for readability (elegant) than looping through to get answer, BUT will run out of memory in the “stack”

-tail-recursive calls (happens at end of code) smart compilers will prevent stacking to occur

int answer = (m(m+1))/2 best solution since you don’t need a function

in C there is NO good way to tell how big the array is (need to “remember” what it is)

-sort elements in order to make log n happen

-to sort by looping through and putting in order is n^2 (since you need to go find 1, then find next smallest, then again, etc)

Instead try:

-(Merge sorting): finding middle value and split in half (n steps to find middle value)

Linear search: n Big O of n or omega of n are running times due to best case being one searching for is in 1, and worst is n

Binary search: widdle down upper and lower bounds of problem (log n)

Needs to be sorted or have some sort of logic to help reduce (COST of massaging data first)

Hacking: use dictionary available with 400,000 words top-bottom left to right alphabetically

Bubble sort: swap if pairs are out of order
worst case is n^2 running time

Don’t need to store variable, just need to keep track of iteration OR just stop when you get through everything without needing to swap using a local variable, find out where swaps are taking place and make upper and lower bound set appropriately

Selection sort: look for smallest each time and move to correct spot (swap using temporary variable)

Best case is always n^2 since you need to go through n times

Merge Sort: Recursive

Sort left half

sort right half

merge sorted halves

38 27 43 3 9 82 10

38 27 43 3 9 82 10

38 27 43 3 9 82 10

38 27 3 43

Merge is not a swap, you use extra memory to create a new array while you are merging them

Compare two lists when merging, incrementing lists only when they are greater than the other integer being pointed at (that’s how it’s merged)

Merge sort: n log n

n times log n (in order to merge you need to touch each one, which is “n”, but you are splitting each time which is log n)

On input of n elements;

If n < 2 return;


Sort left half;

Sort right half;


Heap, Insertion, Quicksort, Radix, Shell – all other types of sorts

Libraries hold these different kinds….

-std=c99 is use most recent C

-Wall will show all errors

-gdb – better debugger that works well with gcc compiler

gdb “mode” will allow debugging, can put in “breaks” at main (whatever line it is at)

use next to move to next line

list to show code before and after line

print will show you what is in variable named after

continue will let run until completion

display will show you code and what it is holding

Passing Variables: used by reference (pointer)

Instead of telling what the function variables ARE, tell them WHERE they are (so instead of ByVal, use ByRef / Target / Address)

*Is a pointer SO

Declaring function: swap(int *a, int *b) saying points to address/location in RAM

Calling function: swap(&x, &y) ampersand gives address of operator, INSTEAD of value

Low-level details like this are hidden in Java and PhP, security is increased, but control is decreased

C and C++ take advantage of functions that can go to ANY address in memory


int tmp;

tmp = *a

*a = *b;

*b = tmp;


Type mismatch error if you forget * because a is obviously an address and NOT an int, but using *a will store the INT found there

So tmp equals what a is pointing at, and where a is pointing at equals what b is pointing at, and finally what b is pointing at equals what was stored in tmp

Pointers just relate to what is stored in RAM (where), declare the TYPE the pointer is pointing TO

So if int i, then int *p if using to point at i

int i, j;

int *p;

p = &i; storing in p the address of i (REMEMBER, NEED &)

*p = 5;

So value at the address p should become 5 (take p, follow the arrow, and put 5 in there)


main(argc, char *argv[])

an array of pointers to char

when you declare an array, you declare an address with a certain length that is defined, SO you must remember where the end of the array is

Array a[5]

*p = &a;

Would be pointing at a[0]

char *s1 (strings do not exist in C, must use pointers to view the string entered)

char *s1 = GetString() returns a pointer to the first byte of the string array entered

Do while s1 != NULL && s2 !=NULL loop, if you exit then message that they are not equal


printf(“You typed the same thing!”)

int strcmp(const char *s1, const char *s2) const is “constant” to make sure you don’t alter what is actually being pointed at (promising not changing variable)

will return 0 if strings are equal, less than 0 if s1 is less than s2, greater than 0 if s1 is greater than s2

! is used to invert what is returned from a function

char *s = GetString();

for(int i = 0, n = strlen(s); i < n; i++)

printf(“%c\n”,*(s+i)); this takes where “s” is pointing at and adding 1 to return the characters in array, same as if an array and not pointer could use s[i]

trick to figure out size of array:

if you declare an array in a function you can use sizeof(array) which returns the size of array in bytes that can be used to figure out length

sizeof(array[0]) gives the size of an individual element

int is 4 bytes each (or 32 bits, 8 0/1 per byte)

Dynamic Memory Allocation:

Stack – grows upward where you use more memory when calling each function, parameters, ect, when you are done using the memory (local variables) they are given up, variables disappear, freed up to use again

Heap – grows downward, allocating more memory dynamically, ask for more memory from stack but must “free” them to give back

//free string

free(s); where “s” is a variable storing a string that asked for dynamic amount of memory

toupper function to change to uppercase

copy a string:

char *s2 = malloc((strlen(s1) + 1) * sizeof(char) ) manually allocate (malloc), +1 to make room for NULL at end

if (s2 == NULL)

return 1; make sure you were handed something or else you will do something bad to memory you were not actually handed

Memory Management:


(All programs)

Initialized data

Global variables/parameters

Uninitialized data


Environment variables

Heap and Stack breakdown:


Allocated string i

Allocated string j
If this collides, core will be dumped
Function local variables
Function parameters (32 bit return address of function that called this)
Main local variables
Main parameters


If you overwrite memory you can overwrite the return address and veer the program in a different direction

Virtual Memory – using harddisk space as if it was RAM

-takes things in RAM and temporarily moves to harddrive if running out of RAM

-“paged to disc” is this process, takes longer to read because it needs to spin to read/use programs pages to disc

char *s2 = malloc((strlen(s1) + 1) * sizeof(s1)) THIS IS JUST GIVING MEMORY, NOT STORING ANYTHING


return 1;

//copy string

int n = strlen(s1);

for (int i = 0; i <= n;i++)

s2[i] = s1[i];

//change copy

If (strlen(s2) > 0)

s2 [0] = toupper(s2[0]); capitalizes first letter of string s2 (which is s2[0])

//Print output

printf(“The original value: %c\n”,s1);

printf(“The new value: %c\n”,s2);

//free memory

free(s1); since GetStr() allocates memory


using #foo.c macro (sharp defined) will copy and paste each time it’s found when compiling, which means it will define same variables, so there is a trick to find out if it’s defined or not

#include <stdbool.h> library that contains functions

enum { false = 0, true, meaningless }; used to put numbers to symbols to keep in order easily

typedef char *string; defining new type, can be simple like this, or more complex data structures

purpose of h file is to just make sure declarations are included and explain what it does, but protects from seeing code

double returns larger float

long returns larger int

return INT_MAX (this is in a library to specify maximum for allowable INT)

sscanf takes string and replaces things, as opposed to printf that does opposite

int n; char c;

if sscanf(line,” %d %c”, &n, &c) == 1) meaning this was ever found, using point in order to let scan change the values, otherwise removing “&” will pass a copy of the variable

sscanf will only assign what is found

equaling “1” is saying that the FIRST input detected (at the pointer to capture everything that was entered) was successfully assigned (I think…look up sscanf if necessary)

isdigit() says whether or not a digit

%lf is double

%f is float

%d is decimal

%c is char

%lld is long long decimal

unsigned int gives you an extra bit, which gives you twice as many INTs that are non-negative

stdin = stuff in the keyboard

stdout = stuff going to screen

fprintf(stdout,”hello, world”) send to screen

fprintf(stderr,”hello, world”) send to error log if you change stderr to log (default is screen)




// growable buffer for char

char *buffer = NULL;

// capacity of buffer

Unsigned int capacity = 0;

//number of chars actually

Unsigned int n = 0;

// character read or EOF

int c;

// iteratively get chars from standard input

while c = fgetc(stdin)) ! = ‘\n’ && c != ‘EOF’ EOF “end of file” is passed when end of string is reached


// grow buffer if necessary

If (n + 1 > capacity)


// determine new capcity

if (capacity == 0)

capacity = CAPACITY; this was set to 128 above

else if (capacity <= (UINT_MAX /2 )) UINT_MAX set to be 4 billion, making sure before you double size that you are able to assign that much

capacity *=2; doubles capacity, fewer times asking the better off






// extend buffer’s capacity

string tem = realloc(buffer, capacity & sizeof(char)) realloc will find other space and extend instead of copying and moving and freeing other space

if temp == NULL)



return NULL;


buffer = temp;


// append current character to buffer

buffer[n++] = c;

} end of while loop section, increases n, appends buffer and returns to top

// return NULL if user provided no input

if (n == 0 && c == EOF)

return NULL;

// minimize buffer

char *minimal = malloc((n+1) * sizeof(char)) making new variable with smaller memory



// terminate string (add end of line thing)

minimal[n] = ‘\0’;

// return string

return minimal;


char buffer[1024]; using an explicit array to give set amount of memory, instead of using a pointer and malloc to allocate the memory later

%d = 32 bits

%c = 8 bits

fgetc (file get character)

Dangerous Functions

gets (doesn’t check buffer size)


strcopy (does not check and make sure destination exists or see if it is long enough, will give segment fault error) strncpy (takes another variable of length, string copy)

strcat strncat (same as strncpy—string catenate)

Generally do not check size of buffers and length of arrays normally in older code

Script kitties google and get code to randomly find IP addresses and try to log on and get username and passwords, trying to get in thousands of times

Main(int argc, char *arg[v])

so you type main David in command line

argc = 5


Pearl script running program with this script called first

$arg = “1234” . “\xb8\xf5\xff\xbf” . “\x94\x87\x04\x08”;

print gotcha shows where in text it is being stored eco if you can crash making seg fault then there is a bug and a potential opportunity to do something bad (“Buffer Overflow” exploit)

Possible that things in memory will not be laid out in the same way, compilers can shift things around somewhat randomly, can also implant canaries and terminate programs if certain variables change, otherwise can be deterministic and make things be set up the same each time and keep the same addresses in memory

Typedef struct


int id;

char *name;

char *house


student; name of data type

// declare class (not java terminology, just literally the student’s class for example)

student class[STUDENTS]; array that makes for each student an id, name and house

class[i].id = GentInt();

class[i].name = GetString();

class[i].house= GetString();

–uses array[i].variable to figure out what part of array holds this information

//save to disk

FILE *fp = fopen(“database”,”w”); *fp (file pointer), file open, database(name of file), w(write)/r(read)/p(append)

if (fp != NULL) check and make sure you can write there

fprintf(fp, “%d\n”),class[id].id); printing to stdout that was defined as fp (pointing at FILE)


file format is deciding the convention of how it will be laid out (81 INTs, 81 INTs) or (id, class, house, id, class, house)

jpeg format starts with a certain pattern of bits, can analyze and find that pattern and can read in until you find the next pattern, might not completely work but is an artform and is certainly a place to start

bitmap is not compressed, from left to right, top to bottom, each bit represents one pixel

jpeg is compressed, multiple pixels shoved into one bit

resize program that will enlarge the pixels by a factor

took pictures, formatted card before removing, put sd card into mac, ran “dd” that allowed to suck the bits off the card and dumped copy into file

need to figure out jpeg signature

using struct to create one big array so you don’t need to make sure each array lines up at the correct [i] in individual arrays

problem with arrays:

need to preallocate memory with arrays (could end up with too little or too much memory) if hardcoded

OR need to use malloc and check for conditions to need to give more memory

If google has an array of ALL websites, would take forever to search if going through each line (even if merge sorting using n log n and then using log n)

Hashtables – take linked lists, rip them apart, and recreate in something better

System calls (functions) that talk to operating system, involves a lot of steps to shuffle around, find memory, and then give it to program and continue

Fragmentation can occur from malloc if other programs are running and from own code that might be using space next to where currently being used

Can link together different sections by including pointer/location at end of each

Typedef struct _node conventional name to allow link to occur


Int n;

Struct _node *next;



So inside of each struct is 2 things, an int called n and a pointer called next (int is 32 bits, pointers are 32 bits)

struct array[5]

array[1].n = 5;

array[0].next = 0328323; symbolizing the location of the previous one

grep = find utility (ctrl+f)

cat votes.txt | grep obama shows rows in file w/ obama

cat votes.txt | grep obama | wc –l counts # of rows in file w/ obama


useful for memory related bugs

C won’t seg fault if you overstep your memory boundaries (for instance, going out of an array’s determine range), but Valgrind will help catch that


Base 16 (base 2 = binary, base 10 = decimal normal counting)

0 1 2 3 4 5 6 7 8 9 A B C D E F

Hex: 1 = 0x01

Binary 1 = 00000001

Hex 15 = 0xF

0xFF is 255 or 11111111 in binary (first “F” is 1111, second “F” is 1111)

Convenient way to represent a byte, straight forward mapping to binary, condenses code


Register (32 bit can only hold that amount at once): 0A0B0C0D

Memory, register stored

Big-endian uses big-end first

a: 0A

a+1: 0B

a+2 0C

a+3: 0D

Little-endian uses big-end first

a: 0D

a+1: 0C

a+2 0B

a+3: 0A

bpm, header is 54 bytes, padding in multiples of 4, 58 is smallest possible that is an image

fseek(fp,2,seek_set) seek_set is special constant that shows start of file, skipping 2 bytes because those are the file type and the next byte holds the file size for bitmap structure

unsigned long 4 bytes, long long is 8 bytes

rewind(fp) rewinds to beginning of file

Bitwise Operators, instead of looking at thing as a “whole”, this goes bit by bit:

& bitwise AND

| bitwise OR


XOR (^) exclusive or, meaning they are different (<>)

Ones complement (~) flipping bits, reversing 0/1

Shift (<< left shift, >> right shift): 0001, left shift, 0010 (pads gaps with 0’s), multiplication/exponential by power of 2

Use shifting to move bits around in a byte to look at bits inside

If (buffer[i] & mask) implies that a match was found between the shifted value and the original

-use of printing without flushing, means you can loop and add to buffer of things to print

-bool is a byte, so can go bit by bit and make changes rather than find out if something is true/false and change entire thing

-A and a in 32’s column are different, otherwise identical (so can make the switch there to flip upper/lowercase)

Do not need temporary variable to swap values of two variables:

Bitwise Operators, can gradually swap them using XOR

Hash Tables

Array: fixed size, pain to grow and waste cpu cycles to create new buffers/swap/free, random-access available by indexing

Lists: linked lists, allowed to grow dynamically, wasteful of memory for having to store Mehta-data (pointers to next spot), give up random-access since you have to go by ‘n’

Both have linear time insertions/deletions/lookups

Hash Tables – linear probing

MOD is used to modify a variable

So try at MOD place then go left-to-right/n/linear to find blank space, could end up at end of list though

-Collision possible of same data? (birthday problem, sharing the same birthday), VERY likely to happen, need to plan for it

Coalesced chaining

-Give extra memory to store “breadcrumb” trail to show where other value end up so it can skip over there and try

Separate Chaining

-let it go where it wants, but maintain a separate linked list, O(n/m) running time (where n is the number of words and m is the size of the array)

Tree that all values hold arrays is a Tries

Used fixed size array that stores something along with metadata of pointers of linked lists

Typedef struct node


Char *name; must malloc things, free

Struct node *next;





Char name[1024]; overcommitting space, quicker, don’t have to malloc

Struct node *next;



Typedef struct student mixing data and Mehta-data, also only need name at the top IF using as a reference OUTSIDE of where being defined


Char *name;

Int id;

Char *house;

Struct node *next;



Typedeft struct


Int id;

Char *name;

Char *house;



typedef struct node


student *student; keeping linked list separate from pointer, points to list

struct node *next;



ptr->student->name :points at student-pointer that points at name value

Hash Table

#define SIZE 6


Hash(int id)


Return(id % SIZE); returns ID num MOD 6


#define SIZE 6


hash(char *s)


If (s==NULL)

return -1; signifies error and impossible location

char c = s[0];

c = toupper(c); puts lower/upper in same playing field

c -= ‘A’; subtracts value of capital A to get value between 0-25

return c % SIZE; mods the integer by SIZE defined above


Problem: going to get clustering since based on first letter of first name (not common Q)

More clever to hash on: birthday, use multiple letters

This will give a better distribution, but obviously alphabetical order/easily identifiable structure is lost


Int tally = 0;

For (int I = 0, n= strlen(s); i<n; i++)

tally += s[i];

// this point in the story put in the rest of function above

In order to insert into the hash table, just insert in the front of the linked list

Children, parent, siblings (same level as other children), leaves (all the way at the bottom)

Binary Search Tree (each only have 2 nodes to point to)

-each node only has 2 (or fewer) children nodes

-parent node is larger than left child and smaller than right child

Start at the root, run time could be log(n) or might end up being just a linked list O(n)

AVL, Red/Black, 2/3 Trees (refine definitons of insertion/deletion, instead rearrange the root to prevent creation of just long linked list) instead of just normal BST

Tries: a tree, each node is an array (generally size of 26) the point being to store a dictionary of words – time is O(k) (where k is the word length)

-root node is array of size 26, each element/node is a pointer to another array, hash on the first letter of a word (M) put in the 13th or array(12) spot, then spell rest of the word in the array it’s being pointed at and put an end character (NULL, DELTA, etc) to show end of word, if it finds a place where characters in the word stop sharing, points to a new place where more letters can be found

(can be a waste of memory because using array with 26 other arrays and so forth, especially if not going to be using words with q’s in them) trade-off: waste space to get constant-time and random-access processing

Heaps: a heap is a binary tree that

  1. It is complete (if you draw the tree from top to bottom and left to right, every node is complete except maybe the most bottom-right)
  2. If each nodes value is greater than or equal to its two children (no distinction between left/right like BST)

Heapify: take something that’s an almost heap, and swap the branches that need to be changed

-need to update child and parent pointers to do this

Heapifying a Binary Tree:

-take smallest tree on bottom-left and heapify, then go up one level and heapify, then move to right, etc

-make parent be switched with largest of the children

-need to consider entire tree at a single time to switch things around, so when you move something down a level, compare to the children, if swap occurs, go down next level and compare to bottom until at the leaves

-since they are complete, the height of a complete binary tree is log(n), so run time is n log(n) (at longest)

-Problem with merge sort because it needs memory to store separate array while rearranging

-heap sort can sort in place by switching around their pointers

-Means that the root is the biggest element, take and put it off to the side, repeat the heapify, take root and put it to the side, repeat….until there are none remaining

n log(n) for set-up, n log(n) for fixing, so run time is n log(n) (though we’d say 2n log(n), but doesn’t use more memory and so doesn’t matter)

-to set up tree, put numbers into array

Right child: 2i +1

Left child: 2i + 2

Can use binary trees to implement an algorithm to compress files Huffman Coding

-have text file (ASCII) and each takes up 1 byte on disc

-can use less than 8 bits to store information since letters share similar structure/redundancies

-count up how many a’s, b’s, c’s, punctuation, etc

-give nodes for each letter, count frequencies that each is used

-iteratively, combine as siblings the two smallest nodes as siblings, parent is sum of frequencies

-then take that parent and add it to the next smallest leaf and create a new parent of the sum

-left-children are 0, right-children are 1, so B would be 0000 /(least frequent), E=1 (most frequent)

-idea being that you spend less bits on representing most frequent characters and more on less frequent ones

What is the internet?

-network of networks

-LAN connected locally

-connected via ISP (internet service providers) to other ISPs

-the infrastructure on top of which services like email, web, etc are run

TCP/IP TCP: transmission control protocol, IP: internet protocol

-IP address is a unique identifier (0 – 255), that is the technological equivalent of a postal address

-Routers on the internet know which computers live where (IP Address, Direction) so that’s where it sends the information, just route data from one place to another

-some websites have multiple IP addresses, have words like to represent numeric IP addresses so people can remember these things

nslookup name server lookup, gives you IP addresses

tracerout row by row of communication between cnn and own server and routers in between

HTTP = hypertext transfer protocol

-can click on links and go to other web pages (language a web service and a web browser speak when talking with each other)

-language/protocol: set of rules 2 computers have agreed upon so one understands the other

Web browser sends this message: “GET /index.html HTTP/1.0” get, index, version of html wants to speak

Web server responds with a sequence of bytes that represent the /index.html file requested

-most in .html, .php is another

html is the markup code that make the aesthetics of the webpage come to life

HTML: application-layer protocol

HTTP: figures out requests/responses of what wanted

TCP/IP language or protocol used by computers to speak to other computers to ask B from A

Source code: HTML, Java, etc that make up page

Top-Level Domains (TLDs) .biz, .tv

DNS Records: Converts name to IP address


mkdir public_html

web page is just a text file filled with stuff (and that stuff is html)

<> tags, any time you open a tag you must close it symmetrically

2 main parts: head and body



<title>My Ugly Webpage</title>



My Ugly Webpage



Standard Error Codes:

403: Forbidden

404: File Not Found

  1. D (directory) L (link)

2 – 4 owner permission: RW is read/write

5 – 7 group permission: R00 (read only)

8 – 10 world: R00 (read only)

chmod 644 index.html

Octal base, 8 possible values 0 – 7 (3 bits)

4 is readability

2 is write-ability

1 is exacutability

So 644 would be 6 for the first group (4+2 = rw0), 4 for group is r, 4 for world is r

-ls –al : shows all permissions for directory and files

Directory needs to be executable

chmod Standards:

PHP files:: 600

Directories: 711 (to access contents), 755 (to see contents)

Everything Else (XHTML, CSS, GIF, JS , JPEG,…): 644

chmod 755 . (The . represents the current directory)



<title>My Ugly Webpage</title>


<body bgcolor=”black”> (attribute added)


<div style=”color: white: font-size: 72pt:”> division of the page and can give it style (like a paragraph)

<h1> header 1 tag

My Ugly Webpage



<img src=”manalrouge.jgp” /> img should end tag in same (need to give jpg chmod permission 644)

<br /> line break tag, / is to start and end tag

My Second Ugly Line




Need to provide browser with a hint of the version of HTML being used at top of HTML file

-for it to be valid, needs to adhere to certain specifications that the world has decided upon

-can let computer check the validity and show where problem is

<element attribute=”value”/>

<element attribute=”value”></element>

Format: Html, head, body (real “magic” happens in body tag)

HTML is not a programming language, it’s a mark-up language. No logical structures, just open and close tags and tells it aesthetically and structurally.

More sophisticated ways to make better web pages

<table border=”1”> 1 is borders, 0 is none

<tr> table row


<td>House</td> table data








Exercise more control over web page appearance than HTML

Font sizes look different in different browsers in HTML, no standardization. CSS is more specific.

<style type=”text/css”>

<!–no one will see this



Background: yellow:

Border: 20px #000000 solid;


–> comment tag, used for “legacy reasons” to let people know the intent of the design, #000000 is the hex code for black

Web browsers render differently with where they put the body, etc cross-browser issues (Chrome, Firefox, Opera, Explorer, etc) allow you to see screenshot of different browsers to see how it looks when loaded on it

Implementing columns (Structural), tutorials and other people’s sites are useful for learning


Anything you fill out that has text boxes, radio buttons, submit, etc

-noob mistake is to make changes to text file opened and think it’d change. Real data exists on server, text file is just copy

-source code for Google is compact, compressed pages to save bytes of data because it saves a lot of money for heavy traffic

-<form> stuffinbetween </form> in google, represents the form itself

<form action=”/search” name=f><table cellpadding=0 cellpsacing=0> etc, etc

Do not use “” around attribute values, means they are using HTML NOT XHTML (to save bytes on “”)

<form action=”/search” name=f> meaning there is a reference to “/search” and what that would do on its current server

<form action= name=f> switch to specify that /search is on google’s server

<img src=”fake.gif” /> uploaded file on server of fake.gif google image

Languages like PHP and databases like MySQL create Web2.0 where it’s driven by the users (not just the blinking webpages like before…for example, hamster dance)

<form action=”/search” name=f> notice that /search is not a. html or webfile, but is instead a program written in PhP, Python, Pearl to run the actual search, name is the name of the variable

much of facebook is written in PhP

when browser and server speak: HTTP

written in: HTML

Firefox has plug-ins that help make webpages (useful for development)

Live HTTP headers plug in, shows the HTTP traffic going back and forth when moving between pages

Anything after ? in a url is a parameter, so variable=value&variable=value (variables set and defined in HTML file)

<input name=”btnG” type=”submit” value=”Google Search” />

“q” in the url for Google is “query”, or whatever you set in the submit field

+ represents a space

Reasonable limit for length of URL around a few hundred characters in case cannot handle buffer

<form action=”/search” name=f method=”get”> default

method = “post” used when you want to submit a lot of data


Text Fields

<input name=”email” type=”text” />

Text Areas

<textarea name=”comments” rows=”24” cols=”80”>

Password Fields

<input name=”password” type=”password” />


<input name=”save” type=”checkbox” />

Radio Buttons

<input name=”gender” type=”radio” value=”F” />

<input name=”gender” type=”radio” value=”M” />

Drop-Down Menus

<select name=”dorm”>

<option value=””></option>

<option value=”Mathews”></option>

<option value=”Weld”></option>


HTML file:


<form action=”register.php” method=”get”> that is the action that will happen when submit occurs. PhP is an interpretive language (instead of being compiled like C, it is ran through an interpreter and runs the source code itself)

Name: <input name=”name” size=”40” />

<br />

Dorm: <input name=”dorm” size=”40” />

<br />

Email: <input name=”email” size=”40” />

<input type=”submit “ value=”Register!” /> this is an input type of “submit” that creates a button called “Register!” and executes the form action “register.php”





Array Functions:

-Can sort, iterate, intersect, splice, reverse, add, subtract, pretty much everything available (easier than C)

-For, while, do while, foreach

Register.php, don’t need to chmod PhP code that way you can protect your intellectual property



Hello, <? print($_GET[“name”]); ?>. You live in <? print($_GET[“dorm”]); ?>. Your email is <a href=”mail to: <? print($_GET[“email”]); ?>.”> <? print($_GET[“email”]); ?></a>



Can shorten to <? …. ?>

<a href=>Link to CNN </a> a hyperlink reference

Super Global Variables:




$_GET will have any parameters user provides

$_POST will have any parameters user provides




CAN INTERMINGLE HTML WITH PHP, since it’s associative you use the variable name you gave to look up the information that is super global and already stored


print(“<pre>”) will print in certain format

print_r “$_GET”; prints recursively


Associative Array (maps not numbers to words, but words to words) or Hash Table

PhP automatically creates this for you instead of having to create it yourself

<? for ($i = 0; $i < 5; $i++)


print(‘<img alt=”” src=”malanrouge.jpg” />);




Will print image malanrouge 5 times with a space in between for source code purposes


-Server is capable of remembering if logged in

-IP addresses are NOT static, webservers need to plant reminders on your hardware called COOKIES to remember who you are

-cookie tiny piece of memory implanted in browser to remind itself later who you are

Cookie: PHP can plant cookie using function called PHPSESSID=reallybigrandomnumberwithcharacters

-webbrowser holds the cookie, webserver remembers ids/settings/etc and corresponds that to that cookie for “quick access” (hand stamp metaphor)

#ffffff = white


<? print_r($_SERVER); ?>


$_SERVER -> ip address, pages requested, tons of stuff in super global variable



//start session


//connect to database

If(($connection = mysql_connect(“localhost”, “malan_real”, “12345”)) === FALSE

Dia(“Could not connect to database”);

//select database

If(mysql_select_db(“malan_real”,$connection) === FALSE)

Dia(“Could not select database”);

//were this not a demo, these would be in some database



//if username and password were submitted, check them

If (isset($_POST[“user”]) && isset($_POST[“pass”])


//if username and password are valid, log user in

If ($_POST[“user”] == USER && $_POST[“pass”] == PASS


//remember that user’s logged in

$_SESSION[“authenticated”] = TRUE;

//redirect user to home page, using abslute path, per


$host = $_SERVER[“HTTP_HOST”];

$path = rtrim(dirname($_SERVER[“PHP_SELF”]), “/\\”);

header(“Location: http://$host$path/home.php”);





<!DOCTYPE html PUBLIC “-//W3C//DTD XHTML 1.0 Transitional/………….”>

<html xmlns=”http:.//”>


<title> Log In</title>



<? If (count($_POST) > 0) echo “INVALID LOGIN”; ?>

<form action=” <? echo $_SERVER[“PHP_SELF”]; ?>“ method=”GET”> getting submitted to itself

Insert all the form that creates username and password


MySQL: Database Server

-Relational Database: store data in them by way of tables (rows and columns)

-examples: Sequel, Oracle, Access

Data types for MySQL: VARCHAR (variable char) = string

-concept of “keys”

-primary key, one of them has to be unique, id number or username, etc

-whatever one will uniquely identify rows in a table

-can also specify others to be unique, but primary key indexes created are the ones to be optimized to help make searches easy (creating best trees/tries/etc for searching)


Select, insert, update, delete

SELECT * FROM users WHERE user=’jharvard’


//start session


//connect to database

If(($connection = mysql_connect(“localhost”, “malan_real”, “12345”)) === FALSE

Die(“Could not connect to database”);

//select database

If(mysql_select_db(“malan_real”,$connection) === FALSE)

Die(“Could not select database”);

//if username and password were submitted, check them

If (isset($_POST[“user”]) && isset($_POST[“pass”])


//prepare SQL

$sql = springf(“SELECT * FROM users WHERE user=’%s’”, mysql_real_escape_string($POST(“user”)));

//execute query

$result = mysql_query($sql);

If($result === FALSE)

die(“Could not query database”);

//check whether we found a row

If(myql_num_rows($result) == 1)


//fetch row

$row = mysql_fetch_assoc($result);

//check password

If ($row[“pass”] == $_POST[“pass”]) checking if the row returned pass field is equal to what the user posted in the pass field defined


//remember that the user’s logged in

$_SESSION[“authenticated”] = TRUE;

//redirect user to home page, using abslute path, per


$host = $_SERVER[“HTTP_HOST”];

$path = rtrim(dirname($_SERVER[“PHP_SELF”]), “/\\”);

header(“Location: http://$host$path/home.php”);






<!DOCTYPE html PUBLIC “-//W3C//DTD XHTML 1.0 Transitional/………….”>

<html xmlns=”http:.//”>


<title> Log In</title>



<? If (count($_POST) > 0) echo “INVALID LOGIN”; ?>

<form action=” <? echo $_SERVER[“PHP_SELF”]; ?>“ method=”GET”> getting submitted to itself

Insert all the form that creates username and password


MySQL Functions


SELECT Employee.Name, Orders.Product

FROM Employees, Orders

JOIN Employees.Employee_Id = Orders.Employee_Id

Race Conditions:

Operation need to be atomic, meaning 2 processes need to happen at the same time or not at all

-could potentially give money from ATM and then not update database or vice versa

Atomic Queries:

INERT INTO tab le(a,b,c) VALUES (1,2,3)


UPDATE table SET c=c+1 WHERE a=1

MySQL support Transactions (InnoDB)

-write SQL statements and all that you want

-only once you write “commit” command, all happens together or not at all

-db is the engine you need if you want atomic sequences to occur

Stock Finance:

Executed an HTTP request to and queried the information for the stock price and returned the value into own website


$dictionary(strlower($word))=TRUE checking if word is in dictionary

$ = variable in PHP, otherwise very similar to C

-higher level language, so it automatically makes hash tables and allocates the memory, etc

At start of PHP file:

#!/usr/local/bin/php shebang (sharp + bang) hint to OS to what interpreter it should use to interpret the file, since it may not be compiled


RSS: (RDF SS or Resource Description Framework Sit Summary, often dubbed:Really Simple Syndication)

file format that is XML (extensible markup language)

text file that contains information you might want to syndicate

integrate other websites into own (like using NY Times info)

Querying Google Maps for News Article at given location:

  1. User goes to specific zip code
  2. Back end PHP uses zip code and converts to coordinates from SEQUEL database
  3. Uses that to query MSNBC news and get back RSS feed of news for that place
  4. JavaScript to put links on screen


Interpretive language, not executed by servers but by clients

Embed in files on server, shipped raw to user’s browser that needs to support

To embed in webpage, put this in head tag in HTML (called script tag)

<script >

// <![CDATA[

// ]]>


OR can store in separate file and just reference that file in the head tag (seen below), keeps it cleaner

<script src=”file.js” type=”text/javascript”> </script>

break, conts, continue, do while, for, for in, for each in, function, if else, return, switch, throw, try catch, var, while, with


focus() set focus like Access

<script type=”text/javascript”>

// <?[CDATA[

//get fields (set variables)

var username= document.getElementById(“username”);

var password= document.getElementById(“password”);

// put curser in username field if empty

If (username.value == “”) document is the webpage, forms is getting access to the form, login is the field and username is in there and set the value, very similar to access/vb



username.value = username.value; aesthetics reason to move cursor to right of already typed text since focus sets it to the beginning


//else put cursor in password field




password.value = password.value;


// ]]>

Define a function in JavaScript (put this in the header portion of HTML)

Does error-checking client-side

<script type=”text/javascript”>

// <?[CDATA[

function validate() checks if login fields are not blank, correct, etc


if ( =””)


alert(“Email is blank”);

return false;


else if (document.forms.registration.password.value =””)


alert(“Password is blank.”);

return false;


Return true;



Can mimic the behavior of browsers by sending HTTP requests

-doing error checking on client-side means that they can steal the source code, put it into text file, and change the process.php part and put in browser address in front

-then open that version and can run it

-will want to add additional error-checking in php file since that is not released freely

Eventhandlers in xhtml:

<form action=”process.php” method=”get” name=”registration” onsubmit =“return(validate(this);”> when submit portions runs, will do whatever is in “” meaning function validate will be ran (false will prevent anything from happening, true will continue to process)

By using “this” (which SHOULD NOT BE INCLUDED IN CODE BUT IS IMPLIED) you do not need to specify WHERE you are, because “THIS” is the location you are at (document.form.registration)

function validate(f) notice that document.form.registration has been called “f” and is used below


if ( =””)


alert(“Email is blank”);

return false;


else if (f.password.value =””)


alert(“Password is blank.”);

return false;


Else if (f.password1.value != f.password2.value) != is the same as <>


alert(“Make sure passwords match.”);

return false;


Else if (!f.agreement.checked) bang checked, meaning NOT checked


alert(“You must agree to our terms and conditions.”);

return false;


Return true;



Java is an object-oriented language like VB to group together like-objects (functions, forms, properties/variables – like disabled/enabled)

<br /> space in program

I agree to the terms and conditions: text

<input name=”agreement” onclick=”toggle();” type=”checkbox” /> terms/conds checkbox

<br /><br />

<input disabled=”disabled” name=”button” type=”submit” value=”Submit” /> submit button (disabled but probably enabled from toggle() function DEFINED IN HEAD TAG of page on checkbox, defaults to disabled)




Regular Expressions (validation of user data)\.edu$/) regular expression passed

. = any character

+ = one or more of whatever

@ = @ symbol

. = any character

+ – one or more of whatever

\. = a literal .

edu = literal “edu”

$ =syntax to make sure END of string matches that exactly

^ = at beginning means start matching at front of string

All of this is sandwiched between /^whatever$/ = this means entire string MUST match pattern exactly

Global Objects

Array, Boolean, Date, Function, Math, Number, Object, RegExp, String

-associated with each object is a bunch of functions

-don’t implement personal functions, use other functions made (like sort)


var obj=new Object();

var obj = {};

obj.key = value;

obj[“key”] = value;

var obj = { key: value }; {key : value}; {key : value};


var a = new Array();

var a = []; same as above

a[0] = “foo”

a[1] = “bar”

Vectors is an array that automatically resizes itself

a[a.length] = “foo”; grows array implicitly

Event Handlers:

onblur, onclick, onchange, onfocus, onkeydown, onkeyup, onload, onmousedown, onmouseout, onmouseover, onmouseup, onresize, onselect, onsubmit

AJAX used to instead reload new webpage, makes better user experience (like being able to drag/drop, or FB photos sliding), send fewer bits across

Ride coattails of Java success when Netscape launched this by calling it JavaScript and is completely different

Support for Objects is helpful, object in JavaScript allows you to associate a key with a value (like Associative Array from PHP/hash table)

CSS Properties (cascading style sheet properties, aesthetics of sheet)



-can be manipulated using JavaScript, CSS is like font name/font size/ font color

Attached this to the div element, need to put timeout/sleep functions by starting a thread

<div name =”blink()”


Function blinker()


var blinks = document.getElementsByName(“blink”); that way if multiple things have same name, will carry over all array

for(var i = 0; I < blinks.length; i++);


If (blinks[i].style.visibility == “visible”)

blinks[i].style.visibility = “hidden”;


blinks[i].style.visibility = “visible”;



SQL Injection Attacks

-put yourself at risk for people corrupting/deleting data

-want to hack into your server

mysql_queryl(sprintf(* SELECT uid FROM users

WHERE username=’%s’ AND password=’%s’ “,

$_POST[“username”], $_POST[“password”]));

Try to inject SQL into username and password fields malicious things (like update/delete)

Can put in 12345’ OR ‘1’ = ‘1’ into password field to get logged in

SHOULD BE (use this to check what users type in ALWAYS):

mysql_queryl(sprintf(* SELECT uid FROM users

WHERE username=’%s’ AND password=’%s’ “,



DOM – Document Object Model (how HTML looks in memory)

Simplest web page you could make:


“-//W3C/DTD XHTML 1.0 Transitional//EN”>

<html xmlns=””>


<title>My title</title>



<a href=””>My link</a>

<h1>My head</h1>



DOM: hierarchy based on html code from above

This DOM tree is built up in memory based on how a web browser sets, but now these are to be manipulated AFTER being built while in memory for client/user-side manipulation of webpage (AJAX)

Asynchronous JavaScript XML

So you don’t have to reload entire page, just grab additional content when needed

Web-based software and client-side software more interactive

Made possible by XMLHttpRequest object

LEFT: CLIENT, every second JavaScript function (1) is called, creates an XMLHttpRequest of HTTP parameters (2), sends HTTP Request (3), Server does stuff/queries/gets data and sends it back to user (4), client-side takes data and integrates it into the page by JavaScript calls (5)


XMLHttpRequest Object is standardized:

Methods: (like a library of functions, but in object-oriented programming functions are methods)

open(method, url)

open(method, url, async)

open(method, url, async, user)

open(method, url, async, user, password)







What this XMLHttpRequest Object looks like in memory

Be rendered first time, but can make subsequent queries without reloading webpage because it’s the webpage itself that is making a new HTTP query every x seconds for just specific information to be updated

Pre-fetch data in order to give seamless transactions for user interface (cost: uses more RAM memory-client’s memory, and costs bandwidth—every byte change costs money so limit number of bytes sent)

Using AJAX increases load on server, since it hits it every second

Properties available to be checked about XMLHttpRequest

<form onsubmit-“quote() return false’”> will run quote() function define in header and return false to prevent user from submitting HTTP request

function quote()


//instatiate XMLHttpRequest object

try new thing for error handling, instead of a bunch of if/then for all instances


xhr = new XMLHttpRequest();


catch (e) catches error and tries to create object in another way


xhr = new ActiveXObject(“Microsoft.XMLHTTP”);


//handle old browsers

if (xhr == NULL)


alert(“Ajax not supported by your browser!”);



//construct URL

var url = “quote1.php?symbol=” + document.getElementbyID(“symbol”).value id=”symbol” for user input field in html code

//get quote

xhr.onreadystatechange = handler; invoke this whenever you get a response“GET”,url,true); want to open a request to that url and true is to make it asynchronous (return right away and do function when ready)

xhr.send(null); send the response nothing else to say


function handler()


//only handle loaded requests

if (xhr=readstate == 4) 4 means readystate is loaded (see picture below)


if (xhr.status == 200) 200 meaning OK



alert(“Error with Ajax call!”);





//get quote

$handle = @fopen(’)&f-e1l1,”r”) whatever the url for downloading csv stock numbers, can use fopen in php to open urls and download the stock quote and read the value from CSV by pretending to be a browser

If ($handle !== FALSE)


$data = fgetcsv($handle); fgetcsv puts all info into array, CSV, comma separated variables

If ($data !==FALSE && $data[0] == “N/A” ) check to make sure it exists and first column is NA

print($data[1]); print first value in array




(+ for javascript and java for cat, . for php)

DIV element says here comes a division of the page, gives an automatic line break (block element, verses in-line element)

<span id=”price”> allows GetElementById

Document,getelementbyId(“price”).innertHTML= xhr.response (refers to any XHTML embedded in between close and open tag, so make that equal to the response given from xhr)

AJAX sends request but doesn’t update page until gets data returned, so it doesn’t look like anything is happened to user, which is why there are progress bars, gif is a rectangular image of a slideshow and the illusion of animation

//pretend server is slow


!== FALSE php does not have strong data typing, this is how you check values AND data types are identical (===) or not (!==)

Googe Maps API -> Applications Programming Interface, go to Demos for mash-ups

-give you Developers key

Python, PHP, Ruby –> not as high performing as C, but doesn’t matter

Write scripts in php to grab data from CSV and massage


  1. IP takes information you send and creates a packet and labels it with an appropriate name and type (senders address, receivers address) and gives address for Proxy Server if going to Internet
  2. Launched to LAN, (IP packets, Novel packets, Apple packets), local router reads address and puts in on appropriate path (head to Router Switch if heading for corporate internet)
  3. Router Switch is digital internet pinball wizard, routing them in best way to get to where they need to go (send to firewall)
  4. Network interface picks up packets to be sent to next level (could be sent to proxy, used as a middleman to lessen load on internet connection for server and for security reasons)
    1. Proxy opens packet and looks for web address (url) and if acceptable, packet is sent on to the firewall (corporate/manager guidelines stop the packet from being delivered if url is unacceptable)
  5. Firewall prevents nasty things from internet coming to intranet or sensitive information from being sent out to the internet, if approved it puts onto bandwidth which is the path it takes to get to server—telephone line, satellite, wireless, transatlantic cable

TCP/IP Packet – transmission control packet (TCP) and internet protocol (IP) work together to deliver data requests

-IP is the packet of information consisting of header (which has packet’s destination and optionally the routers to use) and the body (the data IP is transmitting). Need to break up data into multiple IP packets

-TCP detects problems of requests retransmission of lost packets, deletes duplicated packets, rearranges packets delivered out of order and passes them back to the application that requested them

ICMP Ping Packet

UDP Packet

The Router

The Router Switch – digital internet pinball wizard, routing them along the way

Network interface picks up packets to be sent to proxy, used as a middleman to lessen load on internet connection for server

Proxy opens packet and looks for web address (url) and if acceptable, packet is sent on through the internet (corporate/manager guidelines stop the packet from being delivered)

Corporate Firewall prevents nasty things from internet coming to intranet or from being sent out to the internet, puts onto bandwidth which is the road it takes to get to server—telephone line, satellite, wireless, transactlantic cable (IP sends replacement packet if does not get response that it was delivered)

Ping of Death – special version of normal request ping to mess up unsuspecting hosts

preg_split (match regular expression described and split them into an array) “/\s+/” is looking

implode(“,” $symbol) put it back together

asynchronous javascript and xml (AJAX)

XML you get control of tags (instead of XHTML and php that have set tags)

.htmlentities -> rewrites special characters (& becomes &amp; in order to prevent parsing errors)

php creates httprequest and is returned an xml and web browser formatted it on screen by using AJAX function to read it

onsubmit=”quote(this.symbols.value).php” runs function quote(symbols) passing in this form’s symbol value

xhtml = “” set as string

request.responseXML.getElementsByTagName(“quote”); looks at XML file and grabs every tag named “quote”

for (var i=0; i<quotes.length;i++)


var quote = quotes[i];

xhtml +=”<b>”; += is to append to string

xhtml += quote.getAttribute(“name”)

xhtml += “ (“ + quote.attribute(“symbol”) + “)”

xhtml += “</b>”

xhtml += “<br /><br />”;

xhtml += “Price: $” + quote.getElementsByTagName(“price”)[0].firstChild.nodeValue; creates and array of the children tags “price” under current “quote”, since there is only one node go to [0] spot of node and return first its first Child’s value

xhtml += “<br />”

xhtml += “High: ” + quote.getElementsByTagName(“high”)[0].firstChild.nodeValue; creates and array of the children tags “price” under current “quote”, since there is only one node go to [0] spot of node and return first its first Child’s value

xhtml += “<br />”

xhtml += “Low: ” + quote.getElementsByTagName(“low”)[0].firstChild.nodeValue; creates and array of the children tags “price” under current “quote”, since there is only one node go to [0] spot of node and return first its first Child’s value

xhtml += “<br />”



<div id=”quotes”></div> putting in there the dynamically created string of xthml

//output response to debugging window

document.getElementById(“debugging”).value = request.responseText;

request.onreadystatechange = function(); { replaces “handler” function (LAMBDA FUNCTION)

put stuff that you’d normally put in “handler” function (can access local variables inside code so you don’t need to use global variables)


Google APIs written in JavaScript

Look through documentation, lots of methods already available w/ google maps:

map.addControl(new GMapTypeControl());


//Add marker

var marker = new GMarker(point);


//associate info window with marker

GEvent.addListener(marker,”click”, function() { will call function when user does something of interest—this is LAMDBA function so you can write following and will only be evoked when necessary

//prepare XHTML

var xhtml = “<b>Science Center</b>”;

xhtml += “<br /><br />”

xhtml += “<a href=’’ target=’_blank’>”;

xhtml += “Harvard Science Center”; link name

xhtml += “</a>”

//open info window

map.openinfoWindowHtml(point, xhtml);

Google Maps make points

//instantiate map

Var map = new GMap2(document.getElemeyById(“map”);

// prepare point

var point = new GLatLng(37.4419, -122.1419); Google’s lat/long coordinates

// center map on Science Center

Map.setCenter(point = 13);

// prepare Ajax call

var request = GXMLHttp.create(); figures out how to create request based on browser type, otherwise need to use try/catch to determine how to code, GXmlHttp is a library for AJAX requests provided by google to handle this for you

Debugger for Firefox (great for developing)

HTTP/JavaScript interpreter, compiler…assumptions of what must be happening

Underneath the Hood (Software):

  1. Pre-processing: look for sharp symbols
  2. Compiling: taking source code and outputting Assembly language
  3. Assembly code “assembled” into 0/1s
  4. Linking compares 0/1s to other 0/1s from other functions/libraries called
  5. Executing puts together 0/1s into one executable file that have been linked and run those bits

Shared/Dynamically linked libraries (dll) are already compiled since they are commonly linked and it would be wasteful to repeatedly break it down

Register: tiniest unit of memory that exists in a CPU (32), only on registers that things can happen

X + Y = Z, x is stored in a register (plus whatever function is to be performed like the addition function), y is stored in another, z is the sum and stored in a third register

Registers: the tiniest pieces of memory where calculations actually take place (where computer does its thing)

Program counter: computer equivalent of the variable i (keeps track of where currently executing since it hops around)

Memory (RAM): storage for program code and data

Control Unit/ Datapath: lower level detail, it combined with datapath allows arithmetic to occur

I/O Devices:prints and monitors that actually talk to the user

x86 Instruction Set is the Assembly Language

.file “hello.c”

.section .rodata read only data


.string “hello, world\n”


.globl main

.type main, @function


pushl %edp

movl %esp, %ebp

subl $8, %esp

andl $-16, %esp

movl $0, %eax

ADD = Arithmetic Addition

Usage = ADD

Modifies Flags: AF CED OF PF SF

286/386/486 old processors (green screen). Takes two clock cycles to add one register to another on 286, 1 on 486 (reg,reg)

Clock rate is the frequency of the oscillator crystal, and shows the number of cycles that can happen in a second (so 1.5 GHz is giga-hertz, or 1.5 billion cycles a second), the chart above shows the number of cycles necessary to computer the ADD function

Life After 50 (search for materials pertaining to below):

  1. Intro to CS II
  2. Privacy and Technology
  3. Data Structures and Algorithms
  4. Visualization – use of computers and technology to present in sociologically compelling ways
  5. Systems Programming and Machine Organization – close to hardware/hacking/cryptology
  6. gdb -> download this and use it if using C
  7. Introduction to Formal Systems and Computing – theory of computation, what cannot be solved by computing due to fundamental limitations
  8. Computing hardware – assembly language, compilation

Panel discussing using MySQL with high volume of users and challenges

Questions: Number of MySQL servers used (database servers), number of DBA (database administrator, tells developers which servers to insert data into and grab data from), number of memcached servers, operating system, number of web servers

# of mySQL servers

MySQL (1 master servers, 3 slaves), all writes to master, all reads to slaves. 1/10th of DBA, 2 memchached servers, Fedora (same version of Linux used in Cloud), 2 web servers

Sun (Sun makes computer hardware and software) 4 servers, 1.5 (DBA), 8 memcached servers, OpenSolaris (own Sun’s), 160+ web servers

Flickr 166 servers, 0 (DBA, normally 1 guy just quit), 14 memcached servers, Linux (some type), 244 web servers

Fotlog 37 servers (1 DBA), 40 memcached servers, Solaris 10 (also by Sun), 70 web servers

Wikipedia N/A, everyone wears different hats, 79 memcached servers, Fedora/Ubuntu (Ubuntu is Linux), 70 + 200 web servers

Facebook 900 masters, 900 slaves (30,000 databases) 2 DBA, 805 memcached servers, Fedora/Redhat Enterprise Linux (pay for Redhat and get support), 1200 web servers

Youtube “I can’t say”, 3 DBA, “I can’t say”, SuSE (another flavor of Linux), “I can’t say”

memcached server software to cache output of php files, using database plus code to output xhtml will keep to prevent from regenerating same thing and instead use local copy from memcacheD

Can have clusters of servers, web servers serve up static content or turn around to database and ask for data for user

How to make start-up

  1. Buy domain name at registrar (GoDaddy $9.99)
  2. Need to tell them your DNS: Domain Name System (the ip address of your web host)
    1. DNS: Servers that translate IP address into domain name and vice versa
    2. Type in address that belongs to your web host (Top-Level Domains TLDs):
      1. .biz
      2. .com
      3. .edu
      4. .gov
      5. .info
      6. .int
      7. .mil
      8. .name
      9. .net
      10. .org
      11. .aero
      12. .asia
      13. .cat
      14. .coop
      15. .jobs
      16. .mobi
      17. .museum
      18. .pro
      19. .tel
      20. .travel
      21. .us
    3. DNS also has other information
      1. Informs who mail servers are (google for more info)

    1. Hierarchy to DNS as well

    1. Web Hosting providers: DreamHost, ServInt
  1. Start uploading FTP, sFTP and make web page
  2. Can use tools to benchmark your servers
    1. Take software to simulate requests to pretend to be browser to figure out how many hits per second that could be handled (PHP to simulate httpRequests to get information)
    2. Business server for $5,000 could get decent server to handle 2 to 3 thousand hits per second
  3. Will hit bottlenecks:
    1. Bandwidth, if running out of your home you are limited by your speed (if your ISP even allows it), limit on connection, Web Host can specify how many bits per second you can spit out, or cap at a total per month (MBs or GBs)
    2. CPU limits (Linux TOP reports what is going on in server), how much CPU/RAM/disc is being consumed has limitations.
    3. Easiest solution is to throw money at problem, but will pay a premium (2 times costs 4 times as much)
    4. “slashdotting” web site will get pounded with utilization that was not expected
  4. Vertical Scaling – buying better and better – problem is you max out and still don’t get desired results
    1. CPU
      1. Core, L2 Cache,
    2. Disk
      1. PATA, SATA, SAS
      2. RAID
    3. RAM
  5. Horizontal Scaling – buying/renting a lot of same cheaper things (redundancy) but it’s good because if one fails you have others to fall back on BEST METHOD
    1. Data Centers
      1. Lots of air conditioning, hardware, just holds all the hardware for servers linked together

How do you make use hardware when you have multiple servers and CPUs (core processing units), some PCs have multiple cores (like dual-core processors)

-use virtual machines to link these together (Virtualization)

What can you do to improve the performance of your own code?

  1. Use conservative calls to “expensive” functions like malloc because it needs to make a call to the operating system
  2. Use better hash tables, data structures and algorithms (worrying a little less about hardware actually executing it)
  3. Tightening up loops, moving around code, make sure unnecessary calls are not made
  4. Bitwise operators to get at lower level bits instead of using 32-bit characters/ints

PHP Accelerating:

PHP is slower because it’s interpreted, not compiled like C. Analyzes top to bottom, left to right, and then translates.

Rather than make server do this over and over (interpreted “Just In Time”/compiled not by you, but by web server or php interpreter) cache the results –Opcode Caching

Code Optimization – can compile using different optimization option settings

One server won’t cut it anymore even with best software and hardware, you need to have multiple servers. Need to present this to rest of the world as one server (instead of telling people to go to different address) by using DNS. Set up two identical web servers, put php files/jpegs/gifs on both, but have different ip addresses. Go into DNS server (web host-however done) and inform that the ip address of is not just but also,, that you can go to via TCP/IP to find this “”

nslookup in linux box to look up name server and ip addresses

could go to to get (or

Problem with Load Balancing at Layer 4 (DNS): one server goes down out of 1,800 means that 1/1800th of users will go to dead server (since DNS is still giving address). So you need to take the ip address off the list, but cache DNS answers for 60 seconds, 4 hours, 3 days, etc (DNS propagation problems), so 1/4th of users might have cached the answer for up to x hours and will continue to go to dead ends. So there must be a better way than just removing it from the list

Better way: Layer 7 is at the http or application layer, higher than 0/1s at the TCP/IP layer

Embed location of server in actual host name of address, so when it’s resolved it will be sent to different server via DNS. All of them could resolve to the same IP address, look at the header and send them to different server, so in own code could send to different places

Sticky Session

Shared Storage:

php uses disc space to cache settings by implanting cookies

-associated with browser by way of cookie to make session-object and stored server-side

-so if you store things at server number 1 and store session file there, and you go to server 2, you need to either remember the info is on server 1 or figure out a way to do this

Multiple ways:

CS50 design:

-make (/scratch) front-end server and use filesharing to expose all other backend servers

-networks are slower than keeping everything locally (performance impact)

-single point of failure at front-end VM

-even if 1 datacenter goes down that has front-end VM, going through Amazon that have multiple datacenters (A, B, C) can move to different geography in 10 minutes

-design decision with calculated risks

Hardware Load Balancers: Really fast pieces of hardware that route to different servers to balance load (companies can spend $10,000-$100,000+ on these)


Serving up HTML pages is faster than interpreting, executing, spitting out pages from php files

fopen and spit out bits

craigslist = .html ending, design decision was to take user’s input and generate once a .html post and if changed they delete the file they overwrite it. Delay for posting ads because it’s a free site and design decision. HTML cache with all ads (are reads more important, or are writes?)

Config file MySQL for receive request for SELECT statement, remember the results for a few minutes. Some of the features are available and free depending on what you use

Open source, facebook uses it, php code is slower,

Speed: avoid database by keeping on disc, avoid disc by keeping in cache (memory)

Databases have different ways of storing data

HEAP (Memory) MySQL table that only exists in RAM (so once you shut down server then it’s gone. Good for temporary databases)

Replication: Master – Slaves

Having to tolerate lots of requests and inserts per second

Or pay twice as much for additional masters to prevent single point of failure

Lag time and consistency issues with master databases not updated yet

Fancier set-up:


Clustering (RAID xor, n+1 copies of data, make n+1th the xor result of everything else, and if you lose any of the n’s you can recover them thanks to bitwise operator xor):

CS51: Intensive Introduction to CS II

Transition from being able to program, to computer science, so you understand all the aspects

Look at things from the right perspective, write elegant code, be understood and easily modified

New kinds of algorithms that cs is engaging to transform, solve problems in seconds. Recognize easy-looking problems that are actually hard

Programs and Efficiency: Goal is simple, readable, extensible code

Formal proof that there is no buffer overrun in Windows: be able to prove that software does what it says it does definitively

Languages (APIs): SQL, Regular Expressions

Learn key concepts like enclosures, objects, patterns in software, etc more important than what language you currently know

Correct language is important, parallelism is important in today’s society to choosing code that does this is important

Not 10 minutes, but might take until the end of the universe

Easy way 1: Try every possibility (exhaustive search), in 7 person network it wouldn’t take long, but if it were larger it would take really long time

Easy way 2: Simply send it to someone who hasn’t heard the news (pick a neighbor), maybe make it smarter by sending to someone with a lot of friends since they are popular

Sometimes you find out that the news can’t be shared. They have no friends.

The solution is to use the “greeting” method by sending to the right neighbor with a little bit of cleverness. But for the other, you can’t do better than the exhaustive search option.

Conversation Problem: Can be done with O(n) time

Notification Problem: No one knows how to do better than exhaustive search, one of the open problems (only hearing news once)

Pick 25 people who have most friends

Exhaustive search: try EVERY combination of 25 and see which has combined total of largest

Greedy algorithm: give to most # of friends

Problem: friends are probably shared/overlap

CS61 Systems Programming

Introduction to Machine Organization (how computers work on the inside), C programming experience or CS50 prereq

How registers work, compilers work, how processors work, how memory works

Understand what affects performance of programs actually running, how it accesses memory and how they interact with each other

Caching and memory performance, makes things running faster, use code that makes good use of cache

Blue screen of death

What was going on in memory of kernel when crashed

Segmentation fault: processor has raised an exception by accessing a bad piece of memory, core dumped means it left a file in the directory called “core” that contained the memory image at the time it crashed, would be able to debug program and find source of problem if you can read it

gdb can read in core files

Want to look at the assembly to understand what went wrong (dig below C code) and see what happened at machine. Use disassemble command to see what processor is doing down inside as result of what was compiled in C down to assembly

Maybe not be able to write assembly, but should be able to read it. Find tutorial on assembly language

mdw shell is executable file produced by compiling C code. If you can read assembly language you could understand what it’s going on

The latin-looking characters are hexadecimal translated into ASCII

This is hexadecimal, not very helpful

gdb disassembler shows this x86 assembly language

That’s where you are able to get the user input for password, put it into a register for comparison

So translate into ASCII at that point to get password: takecs61

Co-inventor of UNIX (OS)

Open source so anyone could see this hack and find it

So since he wrote the c compiler, he hacked it to add the backdoor code into the executable and deleted the original login.c

Source code for c compiler was sitting around, so he would have to make the c compiler when compiling itself insert the backdoor into itself and delete the original c compiler

CS 171: Visualization

Conveying information through visual representations

William Playfair invented the concept of bar, line and pie charts to make sense of data

Not only want to look at information, but want to interact with it – zoom, filter, customize

Tree map: represents companies and shows market change between close of market each day

Visualization: better than looking at rows/columns of data to communicate information, answer questions

Best way to produce mash-ups to show good information

Visualization isn’t telling you the whole story: not telling you the population

Shows by county, but still doesn’t take into account population size of specific county

Shaded according to population density

Make sure conveying right information in appropriate ways

Can uncover patterns when looking at visual of data (Barry Bonds increase in batting average during allegations of steroid usage)

Word tree: size of word larger based on number of times used. Deposition, suggests coached for deposition for repeating words

Wordles: way of representing most used words in speeches (Obama’s acceptance speech)

Obama’s acceptance speech

Program in Java, using Python scraping to implement own web crawlers

Data Models: scraping, web crawling

LOLCats RSS feed for website

Recursion: function calling itself over and over again

Heap is memory you can always access during program, stack is segmented into frames

a.out this is awesome what is in argv[2]: is

a.out this is awesome printf(%c,*argv[2]+2), would give seg fault (accessing memory outside array)

compling: changing source code into object code (assembly then machine language)

preventing memory leaks: free()

state the number 17 in binary: 10001

http: hypertext transfer protocol

CS124: Problem Solving Tools

Checkbook Financial arbitrage (cryptographic stuff to protect checkbook transactions)

Writer interface to pull information off google learn algorithm to calculate shortest paths

How does google find duplicate/near duplicate documents and collapse

How to find optimal strategy for games like RPS

Leave a Reply