Below are my notes from when I originally took the Harvard Online course CS50 Introduction to Computer Science. At that time, I only took notes in Microsoft Word, so there was a real lack of gifs. I’d go through and add some, but this is going to be a HUGE post and WordPress is really struggling to function. Honestly, I’m only adding this post for my own reference and to save my notes, so if I were you I’d go watch some Schitts Creek instead or take the class yourself! 🙂

5Reminder for hardcoding:
Hardcoding with integers will set them as that data type in C, so either put in one decimal or declare type.
Float: 32 bits
Double: 64 bits
\* GetFloat is from cs50 and not normally available
Other ways to create the same effect *\
#include <cs50.h>
#include <stdio.h>
int
main(int argoc, char *argv[])
{
printf(“Give me temp in F: “)
float f = GetFloat();
float c = (5.0/9.0) * (f-32);
printf(“%.1f in F is %.1f in C!\n”,f,c);
}
./beer1 | more (program to show you one space at a time to show output)
Error checking:
If n<1
{
Printf(“That makes no sense/n.”);
return 1; (return 0 means it’s error-free unless something else is specified as return value, “1” is now this error-code type,
}
Function in C (put under main )
Void
chorus(int b)
string s1, s2;
s1 = (b == 1) ? “bool
while (n) so long as we’ve required this number not be negative
chorus(n–); takes n– into account after iteration, –n takes into account before iteration
void
swamp(int a, int b)
{
Int tmp;
tmp = a;
a = b;
b = tmp;
}
Void increment() ; creating a new function, must be declared globally if written later in the cod
OR, can create .h file to store it in
Conditons – using (( )) will remove error messages
&& = and
// = or
Int
Main(int argc, char * argv[])
{
Printf(“Insert integer.\n”)
Int n = GetInt()
If (n > 0)
printf(“The number is positive.\n”);
else if (n==0)
printf(“The number is 0.\n”);
else
printf(“The number is negative.\n”);
}
Switch (n)
{
case 1: case2: case3:
printf(“You picked a small number.\n”)
break;
case 4: case 5: case 6:
printf(“You picked a bigger number.\n”)
break;
default:
printf(“You picked a different number.\n”)
}
while (i <= 100)
{
printf(“\rPercent complete: %d%%”,i);
fflush(stdout);
sleep(1);
i++;
}
for (initializations; condition; updates)
{
}
fflush is used to force the buffer to execute without waiting for /n (will execute screen-buffer when it sees that, so using /r will need to specify)
Sleep = puts this thread on hold for x seconds (specified at 1)
i++ used in place of i = i + 1 during code (or i–)
%% and // are used in strings to actually show them
\r is a carriage return (go to beginning of same line \r\n is Windows, \n for linux does same
for (int i = 0; i <= 100; i++)
{
printf(“\rPercent complete: %3d%%”,i);
sleep(1);
}
%3d%% = formats variable “i” above as digit data type, 3 width, and a % at the end
bool was added in by cs50
int
main(int argc, char *agrv[])
int n;
bool thankful = false;
do
{
printf(“Give me a positive number./n”);
if ((n = getint()) > 0)
thankful= true;
}
while (!thankful);
printf(“Thanks for the positive integer!\n”)
}
Rot13 – rotating to crypt information by 13 letters (incyphering / crypting)
Use { } to enclose things like FOR statements to keep them executing together
ASCIITABLE.COM
Can use 65 as “A” if defined as character, same thing as INT(“A”) would be 65 (lower case alphabet starts after):
printf(“%c : %d”, (char) i, i)
Declaration:
returnvalue name (variablestopass)
local variables “overshadow” global variables of same name
x = increment(x) this “x” is a copy (int as “a” in function below)
int
increment(int a)
{
return a+1;
}
Example: If statement and other option:
string s1;
If (i== 1)
s1 = “bottle”;
Else
s1 = “bottles”;
Same as:
s1 = (i == 1) ? “bottle” : “bottles”
For above, s1 will get value before or after colon depending on whether or not i==1 is true : false
Float point values will never be “precise”, so move decimal point over to try and avoid the weird float problem that can occur:
Example: 3.41
Float returns: 3.4100000853068847656
Workaround: if using for dollar and change, then multiply by 100, work with value, then divide by 100 to get decimal back in correct place
gcc demo.c – lcs50 -lm –o demo links in a library that has other functions
EXAMPLE OF ARRAY IN C
#include <stdio.h>
#include <math.h>
#include <cs50.h>
#define QUIZZES 7 this will define the number of “spaces” to make for array
int
main(int argc, char * argv[])
{
float grades[QUIZZES], sum;
int average, i;
sum = 0
printf(“\nWhat were your quiz scores?\n\n”);
for (i = 0; i < QUIZZES; i++);
{
printf(“Quiz #%d of %d: \n”,i+1,QUIZZES);
grades[i] = GetInt();
sum += grades[i]
}
average = (int) round(sum / QUIZZES);
printf(“\nYour Average is: %d\n\n”,average)
}
Apparently you can access other memory through arrays in the RAM
Segmentation fault (core dumped) overstepped the memory allotted to the file (core dump shows what was going on in the memory when the program failed)
String is just an array that holds each individual character, \o is NULL character to delimit separate strings
if s != NULL
{
for (i=0; n = strlen(s); i < n; i++) setting n to equal strlen(s) so it doesn’t have to keep checking, spending 32 bits to save time
{
c=s[i];
printf(“%c\n”,c);
}
}
Flipping from lower to upper case (32 is the difference normally though)
If (s[i] >= ‘a’ && s[i] <= ‘z’)
printgf(“%c”,s[i] – (‘a’ – ‘A’);
else
printf(“%c”,s[i]);
man 3 printf (type this in LINUX environment to see manual chapter 3. Or go to website that probably has better instructions anyway)
int main(int argc, char *argv[]);
argc is # of commands typed in for command
argv is a vector that is created when typing in at command line
argv[i][j] 2-dimensional array
when running programs, you ask for memory and don’t give it back
Enigma machine encodes Cryptography
Plaintext encryption ciphertext decryption plaintext
Caesar Cipher: Ci = (pi + k) % 26
Instead of 26, can figure out set “key” to map it/rotate (French, Vigonaire)
26^x, where x is the length of the word used to crypt
Digital Encryption Standard (DES):
Break up input into left and right, perform calculations, move around the bits, and repeat 16 times, total “Keys” become 72 quadrillion
Triple DES: same algorithm ran 3 times
Advanced Encryption Standard is new one.
Asymmetric Cryptography is also used, public and private Key, public Key can only be cracked by private Key
Plaintext encryption (using public key) ciphertext decryption (using private key) plaintext
Basically use all this to create HUUUGE prime numbers and multiply them together to create a larger number
Running Time T(n)
Parallel Processing (Divide and Conquer)
-worst: Big “O”
-best: Omega
-coincidence of both upper and lower bound: Theta
O(1) “constant” always take same amount of time. “Hello, world”
O(log n) “logarithmic” split in middle to reduce run time
O(n) “linear” i++ going through all available options, so running time would be “n”
O(n log n) “supralinear”
O(n^2) “quadratic
O(n^c) “polynomial”
O(c^n) “exponential”
O(n!) “factorial” checks ALL possible solutions…BAD
Linear pseudo-code:
On input n:
For each element i:
If i==n:
Return true.
Return true.
Top of program, set return value, name(declarations) to find functions if located later in program
int sigma(int) note, does NOT need to say what the variables name is though
later in code:
int
sigma(int m)
{ }
Each time you declare a variable, if inside curly braces {}, will only exist inside
Recursion:
int
sigma(int m)
{
// base case
If (m <= 0)
return 0;
// recursive case
Else
return (m + sigma(m-1)); calling itself again, before being able to answer needs to “answer” counting down—similar to magic stack
}
-Each time function is called, you need to create a section of memory to add to “stack”, then need to “tear it down” after it’s done, so may not always be the best option
-Better for readability (elegant) than looping through to get answer, BUT will run out of memory in the “stack”
-tail-recursive calls (happens at end of code) smart compilers will prevent stacking to occur
int answer = (m(m+1))/2 best solution since you don’t need a function
in C there is NO good way to tell how big the array is (need to “remember” what it is)
-sort elements in order to make log n happen
-to sort by looping through and putting in order is n^2 (since you need to go find 1, then find next smallest, then again, etc)
Instead try:
-(Merge sorting): finding middle value and split in half (n steps to find middle value)
Linear search: n Big O of n or omega of n are running times due to best case being one searching for is in 1, and worst is n
Binary search: widdle down upper and lower bounds of problem (log n)
Needs to be sorted or have some sort of logic to help reduce (COST of massaging data first)
Hacking: use dictionary available with 400,000 words top-bottom left to right alphabetically
Bubble sort: swap if pairs are out of order
worst case is n^2 running time
Don’t need to store variable, just need to keep track of iteration OR just stop when you get through everything without needing to swap using a local variable, find out where swaps are taking place and make upper and lower bound set appropriately
Selection sort: look for smallest each time and move to correct spot (swap using temporary variable)
Best case is always n^2 since you need to go through n times
Merge Sort: Recursive
Sort left half
sort right half
merge sorted halves
38 27 43 3 9 82 10
38 27 43 3 9 82 10
38 27 43 3 9 82 10
38 27 3 43
Merge is not a swap, you use extra memory to create a new array while you are merging them
Compare two lists when merging, incrementing lists only when they are greater than the other integer being pointed at (that’s how it’s merged)
Merge sort: n log n
n times log n (in order to merge you need to touch each one, which is “n”, but you are splitting each time which is log n)
On input of n elements;
If n < 2 return;
Else
Sort left half;
Sort right half;
Merge;
Heap, Insertion, Quicksort, Radix, Shell – all other types of sorts
Libraries hold these different kinds….
-std=c99 is use most recent C
-Wall will show all errors
-gdb – better debugger that works well with gcc compiler
gdb “mode” will allow debugging, can put in “breaks” at main (whatever line it is at)
use next to move to next line
list to show code before and after line
print will show you what is in variable named after
continue will let run until completion
display will show you code and what it is holding
Passing Variables: used by reference (pointer)
Instead of telling what the function variables ARE, tell them WHERE they are (so instead of ByVal, use ByRef / Target / Address)
*Is a pointer SO
Declaring function: swap(int *a, int *b) saying points to address/location in RAM
Calling function: swap(&x, &y) ampersand gives address of operator, INSTEAD of value
Low-level details like this are hidden in Java and PhP, security is increased, but control is decreased
C and C++ take advantage of functions that can go to ANY address in memory
{
int tmp;
tmp = *a
*a = *b;
*b = tmp;
}
Type mismatch error if you forget * because a is obviously an address and NOT an int, but using *a will store the INT found there
So tmp equals what a is pointing at, and where a is pointing at equals what b is pointing at, and finally what b is pointing at equals what was stored in tmp
Pointers just relate to what is stored in RAM (where), declare the TYPE the pointer is pointing TO
So if int i, then int *p if using to point at i
int i, j;
int *p;
p = &i; storing in p the address of i (REMEMBER, NEED &)
*p = 5;
So value at the address p should become 5 (take p, follow the arrow, and put 5 in there)
int
main(argc, char *argv[])
an array of pointers to char
when you declare an array, you declare an address with a certain length that is defined, SO you must remember where the end of the array is
Array a[5]
*p = &a;
Would be pointing at a[0]
char *s1 (strings do not exist in C, must use pointers to view the string entered)
char *s1 = GetString() returns a pointer to the first byte of the string array entered
Do while s1 != NULL && s2 !=NULL loop, if you exit then message that they are not equal
If(!strcmp(s1,s2))
printf(“You typed the same thing!”)
int strcmp(const char *s1, const char *s2) const is “constant” to make sure you don’t alter what is actually being pointed at (promising not changing variable)
will return 0 if strings are equal, less than 0 if s1 is less than s2, greater than 0 if s1 is greater than s2
! is used to invert what is returned from a function
char *s = GetString();
for(int i = 0, n = strlen(s); i < n; i++)
printf(“%c\n”,*(s+i)); this takes where “s” is pointing at and adding 1 to return the characters in array, same as if an array and not pointer could use s[i]
trick to figure out size of array:
if you declare an array in a function you can use sizeof(array) which returns the size of array in bytes that can be used to figure out length
sizeof(array[0]) gives the size of an individual element
int is 4 bytes each (or 32 bits, 8 0/1 per byte)
Dynamic Memory Allocation:
Stack – grows upward where you use more memory when calling each function, parameters, ect, when you are done using the memory (local variables) they are given up, variables disappear, freed up to use again
Heap – grows downward, allocating more memory dynamically, ask for more memory from stack but must “free” them to give back
//free string
free(s); where “s” is a variable storing a string that asked for dynamic amount of memory
toupper function to change to uppercase
copy a string:
char *s2 = malloc((strlen(s1) + 1) * sizeof(char) ) manually allocate (malloc), +1 to make room for NULL at end
if (s2 == NULL)
return 1; make sure you were handed something or else you will do something bad to memory you were not actually handed
Memory Management:
Text
(All programs) |
Initialized data
Global variables/parameters |
Uninitialized data |
Heap
Stack |
Environment variables |
Heap and Stack breakdown:
Heap
Allocated string i |
Allocated string j |
If this collides, core will be dumped |
Function local variables |
Function parameters (32 bit return address of function that called this) |
Main local variables |
Main parameters
Stack |
If you overwrite memory you can overwrite the return address and veer the program in a different direction
Virtual Memory – using harddisk space as if it was RAM
-takes things in RAM and temporarily moves to harddrive if running out of RAM
-“paged to disc” is this process, takes longer to read because it needs to spin to read/use programs pages to disc
char *s2 = malloc((strlen(s1) + 1) * sizeof(s1)) THIS IS JUST GIVING MEMORY, NOT STORING ANYTHING
if (s2 == NULL) THIS MAKES SURE SOMETHING WAS PASSED, OR IF ERRORED, PROCEEDING WOULD ERROR S[0]
return 1;
//copy string
int n = strlen(s1);
for (int i = 0; i <= n;i++)
s2[i] = s1[i];
//change copy
If (strlen(s2) > 0)
s2 [0] = toupper(s2[0]); capitalizes first letter of string s2 (which is s2[0])
//Print output
printf(“The original value: %c\n”,s1);
printf(“The new value: %c\n”,s2);
//free memory
free(s1); since GetStr() allocates memory
free(s2);
using #foo.c macro (sharp defined) will copy and paste each time it’s found when compiling, which means it will define same variables, so there is a trick to find out if it’s defined or not
#include <stdbool.h> library that contains functions
enum { false = 0, true, meaningless }; used to put numbers to symbols to keep in order easily
typedef char *string; defining new type, can be simple like this, or more complex data structures
purpose of h file is to just make sure declarations are included and explain what it does, but protects from seeing code
double returns larger float
long returns larger int
return INT_MAX (this is in a library to specify maximum for allowable INT)
sscanf takes string and replaces things, as opposed to printf that does opposite
int n; char c;
if sscanf(line,” %d %c”, &n, &c) == 1) meaning this was ever found, using point in order to let scan change the values, otherwise removing “&” will pass a copy of the variable
sscanf will only assign what is found
equaling “1” is saying that the FIRST input detected (at the pointer to capture everything that was entered) was successfully assigned (I think…look up sscanf if necessary)
isdigit() says whether or not a digit
%lf is double
%f is float
%d is decimal
%c is char
%lld is long long decimal
unsigned int gives you an extra bit, which gives you twice as many INTs that are non-negative
stdin = stuff in the keyboard
stdout = stuff going to screen
fprintf(stdout,”hello, world”) send to screen
fprintf(stderr,”hello, world”) send to error log if you change stderr to log (default is screen)
GETSTRING FUNCTION, DYNAMIC MEMORY ALLOCATION
GetString()
{
// growable buffer for char
char *buffer = NULL;
// capacity of buffer
Unsigned int capacity = 0;
//number of chars actually
Unsigned int n = 0;
// character read or EOF
int c;
// iteratively get chars from standard input
while c = fgetc(stdin)) ! = ‘\n’ && c != ‘EOF’ EOF “end of file” is passed when end of string is reached
{
// grow buffer if necessary
If (n + 1 > capacity)
{
// determine new capcity
if (capacity == 0)
capacity = CAPACITY; this was set to 128 above
else if (capacity <= (UINT_MAX /2 )) UINT_MAX set to be 4 billion, making sure before you double size that you are able to assign that much
capacity *=2; doubles capacity, fewer times asking the better off
else
{
free(buffer);
return(NULL);
}
// extend buffer’s capacity
string tem = realloc(buffer, capacity & sizeof(char)) realloc will find other space and extend instead of copying and moving and freeing other space
if temp == NULL)
{
free(buffer);
return NULL;
}
buffer = temp;
}
// append current character to buffer
buffer[n++] = c;
} end of while loop section, increases n, appends buffer and returns to top
// return NULL if user provided no input
if (n == 0 && c == EOF)
return NULL;
// minimize buffer
char *minimal = malloc((n+1) * sizeof(char)) making new variable with smaller memory
strncpy(minimal,buffer,n);
free(buffer);
// terminate string (add end of line thing)
minimal[n] = ‘\0’;
// return string
return minimal;
}
char buffer[1024]; using an explicit array to give set amount of memory, instead of using a pointer and malloc to allocate the memory later
%d = 32 bits
%c = 8 bits
fgetc (file get character)
Dangerous Functions
gets (doesn’t check buffer size)
scanf
strcopy (does not check and make sure destination exists or see if it is long enough, will give segment fault error) strncpy (takes another variable of length, string copy)
strcat strncat (same as strncpy—string catenate)
Generally do not check size of buffers and length of arrays normally in older code
Script kitties google and get code to randomly find IP addresses and try to log on and get username and passwords, trying to get in thousands of times
Main(int argc, char *arg[v])
so you type main David in command line
argc = 5
argv[D,a,v,i,d]
Pearl script running program with this script called first
$arg = “1234” . “\xb8\xf5\xff\xbf” . “\x94\x87\x04\x08”;
print gotcha shows where in text it is being stored
hack.pl eco if you can crash making seg fault then there is a bug and a potential opportunity to do something bad (“Buffer Overflow” exploit)
Possible that things in memory will not be laid out in the same way, compilers can shift things around somewhat randomly, can also implant canaries and terminate programs if certain variables change, otherwise can be deterministic and make things be set up the same each time and keep the same addresses in memory
Typedef struct
{
int id;
char *name;
char *house
}
student; name of data type
// declare class (not java terminology, just literally the student’s class for example)
student class[STUDENTS]; array that makes for each student an id, name and house
class[i].id = GentInt();
class[i].name = GetString();
class[i].house= GetString();
–uses array[i].variable to figure out what part of array holds this information
//save to disk
FILE *fp = fopen(“database”,”w”); *fp (file pointer), file open, database(name of file), w(write)/r(read)/p(append)
if (fp != NULL) check and make sure you can write there
fprintf(fp, “%d\n”),class[id].id); printing to stdout that was defined as fp (pointing at FILE)
fclose(fp);
file format is deciding the convention of how it will be laid out (81 INTs, 81 INTs) or (id, class, house, id, class, house)
jpeg format starts with a certain pattern of bits, can analyze and find that pattern and can read in until you find the next pattern, might not completely work but is an artform and is certainly a place to start
bitmap is not compressed, from left to right, top to bottom, each bit represents one pixel
jpeg is compressed, multiple pixels shoved into one bit
resize program that will enlarge the pixels by a factor
took pictures, formatted card before removing, put sd card into mac, ran “dd” that allowed to suck the bits off the card and dumped copy into file
need to figure out jpeg signature
using struct to create one big array so you don’t need to make sure each array lines up at the correct [i] in individual arrays
problem with arrays:
need to preallocate memory with arrays (could end up with too little or too much memory) if hardcoded
OR need to use malloc and check for conditions to need to give more memory
If google has an array of ALL websites, would take forever to search if going through each line (even if merge sorting using n log n and then using log n)
Hashtables – take linked lists, rip them apart, and recreate in something better
System calls (functions) that talk to operating system, involves a lot of steps to shuffle around, find memory, and then give it to program and continue
Fragmentation can occur from malloc if other programs are running and from own code that might be using space next to where currently being used
Can link together different sections by including pointer/location at end of each
Typedef struct _node conventional name to allow link to occur
{
Int n;
Struct _node *next;
}
Node;
So inside of each struct is 2 things, an int called n and a pointer called next (int is 32 bits, pointers are 32 bits)
struct array[5]
array[1].n = 5;
array[0].next = 0328323; symbolizing the location of the previous one
grep = find utility (ctrl+f)
cat votes.txt | grep obama shows rows in file w/ obama
cat votes.txt | grep obama | wc –l counts # of rows in file w/ obama
Valgrind
http://valgrind.org/docs/manual/quick-start.html
useful for memory related bugs
C won’t seg fault if you overstep your memory boundaries (for instance, going out of an array’s determine range), but Valgrind will help catch that
Hexadecimal
Base 16 (base 2 = binary, base 10 = decimal normal counting)
0 1 2 3 4 5 6 7 8 9 A B C D E F
Hex: 1 = 0x01
Binary 1 = 00000001
Hex 15 = 0xF
0xFF is 255 or 11111111 in binary (first “F” is 1111, second “F” is 1111)
Convenient way to represent a byte, straight forward mapping to binary, condenses code
Endianness
Register (32 bit can only hold that amount at once): 0A0B0C0D
Memory, register stored
Big-endian uses big-end first
a: 0A
a+1: 0B
a+2 0C
a+3: 0D
Little-endian uses big-end first
a: 0D
a+1: 0C
a+2 0B
a+3: 0A
bpm, header is 54 bytes, padding in multiples of 4, 58 is smallest possible that is an image
fseek(fp,2,seek_set) seek_set is special constant that shows start of file, skipping 2 bytes because those are the file type and the next byte holds the file size for bitmap structure
unsigned long 4 bytes, long long is 8 bytes
rewind(fp) rewinds to beginning of file
Bitwise Operators, instead of looking at thing as a “whole”, this goes bit by bit:
& bitwise AND
| bitwise OR
Etc…
XOR (^) exclusive or, meaning they are different (<>)
Ones complement (~) flipping bits, reversing 0/1
Shift (<< left shift, >> right shift): 0001, left shift, 0010 (pads gaps with 0’s), multiplication/exponential by power of 2
Use shifting to move bits around in a byte to look at bits inside
If (buffer[i] & mask) implies that a match was found between the shifted value and the original
-use of printing without flushing, means you can loop and add to buffer of things to print
-bool is a byte, so can go bit by bit and make changes rather than find out if something is true/false and change entire thing
-A and a in 32’s column are different, otherwise identical (so can make the switch there to flip upper/lowercase)
Do not need temporary variable to swap values of two variables:
Bitwise Operators, can gradually swap them using XOR
Hash Tables
Array: fixed size, pain to grow and waste cpu cycles to create new buffers/swap/free, random-access available by indexing
Lists: linked lists, allowed to grow dynamically, wasteful of memory for having to store Mehta-data (pointers to next spot), give up random-access since you have to go by ‘n’
Both have linear time insertions/deletions/lookups
Hash Tables – linear probing
MOD is used to modify a variable
So try at MOD place then go left-to-right/n/linear to find blank space, could end up at end of list though
-Collision possible of same data? (birthday problem, sharing the same birthday), VERY likely to happen, need to plan for it
Coalesced chaining
-Give extra memory to store “breadcrumb” trail to show where other value end up so it can skip over there and try
Separate Chaining
-let it go where it wants, but maintain a separate linked list, O(n/m) running time (where n is the number of words and m is the size of the array)
Tree that all values hold arrays is a Tries
Used fixed size array that stores something along with metadata of pointers of linked lists
Typedef struct node
{
Char *name; must malloc things, free
Struct node *next;
}
Node;
OR
{
Char name[1024]; overcommitting space, quicker, don’t have to malloc
Struct node *next;
}
Node;
Typedef struct student mixing data and Mehta-data, also only need name at the top IF using as a reference OUTSIDE of where being defined
{
Char *name;
Int id;
Char *house;
Struct node *next;
{
Node:
Typedeft struct
{
Int id;
Char *name;
Char *house;
}
Student;
typedef struct node
{
student *student; keeping linked list separate from pointer, points to list
struct node *next;
}
node;
ptr->student->name :points at student-pointer that points at name value
Hash Table
#define SIZE 6
Int
Hash(int id)
{
Return(id % SIZE); returns ID num MOD 6
}
#define SIZE 6
Int
hash(char *s)
{
If (s==NULL)
return -1; signifies error and impossible location
char c = s[0];
c = toupper(c); puts lower/upper in same playing field
c -= ‘A’; subtracts value of capital A to get value between 0-25
return c % SIZE; mods the integer by SIZE defined above
}
Problem: going to get clustering since based on first letter of first name (not common Q)
More clever to hash on: birthday, use multiple letters
This will give a better distribution, but obviously alphabetical order/easily identifiable structure is lost
{
Int tally = 0;
For (int I = 0, n= strlen(s); i<n; i++)
tally += s[i];
// this point in the story put in the rest of function above
In order to insert into the hash table, just insert in the front of the linked list
Children, parent, siblings (same level as other children), leaves (all the way at the bottom)
Binary Search Tree (each only have 2 nodes to point to)
-each node only has 2 (or fewer) children nodes
-parent node is larger than left child and smaller than right child
Start at the root, run time could be log(n) or might end up being just a linked list O(n)
AVL, Red/Black, 2/3 Trees (refine definitons of insertion/deletion, instead rearrange the root to prevent creation of just long linked list) instead of just normal BST
Tries: a tree, each node is an array (generally size of 26) the point being to store a dictionary of words – time is O(k) (where k is the word length)
-root node is array of size 26, each element/node is a pointer to another array, hash on the first letter of a word (M) put in the 13th or array(12) spot, then spell rest of the word in the array it’s being pointed at and put an end character (NULL, DELTA, etc) to show end of word, if it finds a place where characters in the word stop sharing, points to a new place where more letters can be found
(can be a waste of memory because using array with 26 other arrays and so forth, especially if not going to be using words with q’s in them) trade-off: waste space to get constant-time and random-access processing
Heaps: a heap is a binary tree that
- It is complete (if you draw the tree from top to bottom and left to right, every node is complete except maybe the most bottom-right)
- If each nodes value is greater than or equal to its two children (no distinction between left/right like BST)
Heapify: take something that’s an almost heap, and swap the branches that need to be changed
-need to update child and parent pointers to do this
Heapifying a Binary Tree:
-take smallest tree on bottom-left and heapify, then go up one level and heapify, then move to right, etc
-make parent be switched with largest of the children
-need to consider entire tree at a single time to switch things around, so when you move something down a level, compare to the children, if swap occurs, go down next level and compare to bottom until at the leaves
-since they are complete, the height of a complete binary tree is log(n), so run time is n log(n) (at longest)
-Problem with merge sort because it needs memory to store separate array while rearranging
-heap sort can sort in place by switching around their pointers
-Means that the root is the biggest element, take and put it off to the side, repeat the heapify, take root and put it to the side, repeat….until there are none remaining
n log(n) for set-up, n log(n) for fixing, so run time is n log(n) (though we’d say 2n log(n), but doesn’t use more memory and so doesn’t matter)
-to set up tree, put numbers into array
Right child: 2i +1
Left child: 2i + 2
Can use binary trees to implement an algorithm to compress files Huffman Coding
-have text file (ASCII) and each takes up 1 byte on disc
-can use less than 8 bits to store information since letters share similar structure/redundancies
-count up how many a’s, b’s, c’s, punctuation, etc
-give nodes for each letter, count frequencies that each is used
-iteratively, combine as siblings the two smallest nodes as siblings, parent is sum of frequencies
-then take that parent and add it to the next smallest leaf and create a new parent of the sum
-left-children are 0, right-children are 1, so B would be 0000 /(least frequent), E=1 (most frequent)
-idea being that you spend less bits on representing most frequent characters and more on less frequent ones
What is the internet?
-network of networks
-LAN connected locally
-connected via ISP (internet service providers) to other ISPs
-the infrastructure on top of which services like email, web, etc are run
TCP/IP TCP: transmission control protocol, IP: internet protocol
-IP address is a unique identifier 1.2.3.4 (0 – 255), that is the technological equivalent of a postal address
-Routers on the internet know which computers live where (IP Address, Direction) so that’s where it sends the information, just route data from one place to another
-some websites have multiple IP addresses, have words like cnn.com to represent numeric IP addresses so people can remember these things
nslookup cnn.com name server lookup, gives you IP addresses
tracerout cnn.com row by row of communication between cnn and own server and routers in between
HTTP = hypertext transfer protocol
-can click on links and go to other web pages (language a web service and a web browser speak when talking with each other)
-language/protocol: set of rules 2 computers have agreed upon so one understands the other
Web browser sends this message: “GET /index.html HTTP/1.0” get, index, version of html wants to speak
Web server responds with a sequence of bytes that represent the /index.html file requested
-most in .html, .php is another
html is the markup code that make the aesthetics of the webpage come to life
HTML: application-layer protocol
HTTP: figures out requests/responses of what wanted
TCP/IP language or protocol used by computers to speak to other computers to ask B from A
Source code: HTML, Java, etc that make up page
Top-Level Domains (TLDs) .biz, .tv
DNS Records: Converts name to IP address
XHTML
mkdir public_html
web page is just a text file filled with stuff (and that stuff is html)
<> tags, any time you open a tag you must close it symmetrically
2 main parts: head and body
<html>
<head>
<title>My Ugly Webpage</title>
</head>
<body>
My Ugly Webpage
</body>
</html>
Standard Error Codes:
403: Forbidden
404: File Not Found
- D (directory) L (link)
2 – 4 owner permission: RW is read/write
5 – 7 group permission: R00 (read only)
8 – 10 world: R00 (read only)
chmod 644 index.html
Octal base, 8 possible values 0 – 7 (3 bits)
4 is readability
2 is write-ability
1 is exacutability
So 644 would be 6 for the first group (4+2 = rw0), 4 for group is r, 4 for world is r
-ls –al : shows all permissions for directory and files
Directory needs to be executable
chmod Standards:
PHP files:: 600
Directories: 711 (to access contents), 755 (to see contents)
Everything Else (XHTML, CSS, GIF, JS , JPEG,…): 644
chmod 755 . (The . represents the current directory)
<html>
<head>
<title>My Ugly Webpage</title>
</head>
<body bgcolor=”black”> (attribute added)
<center>
<div style=”color: white: font-size: 72pt:”> division of the page and can give it style (like a paragraph)
<h1> header 1 tag
My Ugly Webpage
</h1>
</div>
<img src=”manalrouge.jgp” /> img should end tag in same (need to give jpg chmod permission 644)
<br /> line break tag, / is to start and end tag
My Second Ugly Line
</center>
</body>
</html>
Need to provide browser with a hint of the version of HTML being used at top of HTML file
-for it to be valid, needs to adhere to certain specifications that the world has decided upon
-can let computer check the validity and show where problem is
<element attribute=”value”/>
<element attribute=”value”></element>
Format: Html, head, body (real “magic” happens in body tag)
HTML is not a programming language, it’s a mark-up language. No logical structures, just open and close tags and tells it aesthetically and structurally.
More sophisticated ways to make better web pages
<table border=”1”> 1 is borders, 0 is none
<tr> table row
<td>Student</td>
<td>House</td> table data
</tr>
<tr>
<td>David</td>
<td>Mather</td>
</tr>
</table>
CSS
Exercise more control over web page appearance than HTML
Font sizes look different in different browsers in HTML, no standardization. CSS is more specific.
<style type=”text/css”>
<!–no one will see this
Body
{
Background: yellow:
Border: 20px #000000 solid;
}
–> comment tag, used for “legacy reasons” to let people know the intent of the design, #000000 is the hex code for black
Web browsers render differently with where they put the body, etc cross-browser issues (Chrome, Firefox, Opera, Explorer, etc)
http://browsershots.org allow you to see screenshot of different browsers to see how it looks when loaded on it
Implementing columns (Structural), tutorials and other people’s sites are useful for learning
Form:
Anything you fill out that has text boxes, radio buttons, submit, etc
-noob mistake is to make changes to text file opened and think it’d change. Real data exists on server, text file is just copy
-source code for Google is compact, compressed pages to save bytes of data because it saves a lot of money for heavy traffic
-<form> stuffinbetween </form> in google, represents the form itself
<form action=”/search” name=f><table cellpadding=0 cellpsacing=0> etc, etc
Do not use “” around attribute values, means they are using HTML NOT XHTML (to save bytes on “”)
<form action=”/search” name=f> meaning there is a reference to “/search” and what that would do on its current server
<form action=http://google.com/search name=f> switch to specify that /search is on google’s server
<img src=”fake.gif” /> uploaded file on server of fake.gif google image
Languages like PHP and databases like MySQL create Web2.0 where it’s driven by the users (not just the blinking webpages like before…for example, hamster dance)
<form action=”/search” name=f> notice that /search is not a. html or webfile, but is instead a program written in PhP, Python, Pearl to run the actual search, name is the name of the variable
much of facebook is written in PhP
when browser and server speak: HTTP
written in: HTML
Firefox has plug-ins that help make webpages (useful for development)
Live HTTP headers plug in, shows the HTTP traffic going back and forth when moving between pages
Anything after ? in a url is a parameter, so variable=value&variable=value (variables set and defined in HTML file)
<input name=”btnG” type=”submit” value=”Google Search” />
“q” in the url for Google is “query”, or whatever you set in the submit field
+ represents a space
Reasonable limit for length of URL around a few hundred characters in case cannot handle buffer
<form action=”/search” name=f method=”get”> default
method = “post” used when you want to submit a lot of data
Forms:
Text Fields
<input name=”email” type=”text” />
Text Areas
<textarea name=”comments” rows=”24” cols=”80”>
Password Fields
<input name=”password” type=”password” />
Checkboxes
<input name=”save” type=”checkbox” />
Radio Buttons
<input name=”gender” type=”radio” value=”F” />
<input name=”gender” type=”radio” value=”M” />
Drop-Down Menus
<select name=”dorm”>
<option value=””></option>
<option value=”Mathews”></option>
<option value=”Weld”></option>
</select>
HTML file:
<body>
<form action=”register.php” method=”get”> that is the action that will happen when submit occurs. PhP is an interpretive language (instead of being compiled like C, it is ran through an interpreter and runs the source code itself)
Name: <input name=”name” size=”40” />
<br />
Dorm: <input name=”dorm” size=”40” />
<br />
Email: <input name=”email” size=”40” />
<input type=”submit “ value=”Register!” /> this is an input type of “submit” that creates a button called “Register!” and executes the form action “register.php”
</form>
</body>
PHP
Manual: http://us3.php.net/manual/en/
Array Functions: http://us2.php.net/array
-Can sort, iterate, intersect, splice, reverse, add, subtract, pretty much everything available (easier than C)
-For, while, do while, foreach
Register.php, don’t need to chmod PhP code that way you can protect your intellectual property
<html>
<body>
Hello, <? print($_GET[“name”]); ?>. You live in <? print($_GET[“dorm”]); ?>. Your email is <a href=”mail to: <? print($_GET[“email”]); ?>.”> <? print($_GET[“email”]); ?></a>
</body>
</html>
Can shorten to <? …. ?>
<a href=http://cnn.com/>Link to CNN </a> a hyperlink reference
Super Global Variables:
$_COOKIE
$_EVN
$_FILES
$_GET will have any parameters user provides
$_POST will have any parameters user provides
$_REQUEST
$_SERVER
$_SESSION
CAN INTERMINGLE HTML WITH PHP, since it’s associative you use the variable name you gave to look up the information that is super global and already stored
<?php
print(“<pre>”) will print in certain format
print_r “$_GET”; prints recursively
?>
Associative Array (maps not numbers to words, but words to words) or Hash Table
PhP automatically creates this for you instead of having to create it yourself
<? for ($i = 0; $i < 5; $i++)
{
print(‘<img alt=”” src=”malanrouge.jpg” />);
print(‘/n”);
}
?>
Will print image malanrouge 5 times with a space in between for source code purposes
LOG IN:
-Server is capable of remembering if logged in
-IP addresses are NOT static, webservers need to plant reminders on your hardware called COOKIES to remember who you are
-cookie tiny piece of memory implanted in browser to remind itself later who you are
Cookie: PHP can plant cookie using function called PHPSESSID=reallybigrandomnumberwithcharacters
-webbrowser holds the cookie, webserver remembers ids/settings/etc and corresponds that to that cookie for “quick access” (hand stamp metaphor)
#ffffff = white
<pre>
<? print_r($_SERVER); ?>
</pre>
$_SERVER -> ip address, pages requested, tons of stuff in super global variable
EXAMPLE OF LOGIN
<?
//start session
Session_start();
//connect to database
If(($connection = mysql_connect(“localhost”, “malan_real”, “12345”)) === FALSE
Dia(“Could not connect to database”);
//select database
If(mysql_select_db(“malan_real”,$connection) === FALSE)
Dia(“Could not select database”);
//were this not a demo, these would be in some database
Define(“USER”,”jharvard”);
Define(“PASS”,”crimson”);
//if username and password were submitted, check them
If (isset($_POST[“user”]) && isset($_POST[“pass”])
{
//if username and password are valid, log user in
If ($_POST[“user”] == USER && $_POST[“pass”] == PASS
{
//remember that user’s logged in
$_SESSION[“authenticated”] = TRUE;
//redirect user to home page, using abslute path, per
// http://us2.php.net/manual/en/function.header.php
$host = $_SERVER[“HTTP_HOST”];
$path = rtrim(dirname($_SERVER[“PHP_SELF”]), “/\\”);
header(“Location: http://$host$path/home.php”);
exit;
}
}
?>
<!DOCTYPE html PUBLIC “-//W3C//DTD XHTML 1.0 Transitional/………….”>
<html xmlns=”http:.//www.w3.org/1999/xhtml”>
<head>
<title> Log In</title>
</head>
<body>
<? If (count($_POST) > 0) echo “INVALID LOGIN”; ?>
<form action=” <? echo $_SERVER[“PHP_SELF”]; ?>“ method=”GET”> getting submitted to itself
Insert all the form that creates username and password
</body>
MySQL: Database Server
-Relational Database: store data in them by way of tables (rows and columns)
-examples: Sequel, Oracle, Access
Data types for MySQL: VARCHAR (variable char) = string
-concept of “keys”
-primary key, one of them has to be unique, id number or username, etc
-whatever one will uniquely identify rows in a table
-can also specify others to be unique, but primary key indexes created are the ones to be optimized to help make searches easy (creating best trees/tries/etc for searching)
SQL
Select, insert, update, delete
SELECT * FROM users WHERE user=’jharvard’
<?
//start session
Session_start();
//connect to database
If(($connection = mysql_connect(“localhost”, “malan_real”, “12345”)) === FALSE
Die(“Could not connect to database”);
//select database
If(mysql_select_db(“malan_real”,$connection) === FALSE)
Die(“Could not select database”);
//if username and password were submitted, check them
If (isset($_POST[“user”]) && isset($_POST[“pass”])
{
//prepare SQL
$sql = springf(“SELECT * FROM users WHERE user=’%s’”, mysql_real_escape_string($POST(“user”)));
//execute query
$result = mysql_query($sql);
If($result === FALSE)
die(“Could not query database”);
//check whether we found a row
If(myql_num_rows($result) == 1)
{
//fetch row
$row = mysql_fetch_assoc($result);
//check password
If ($row[“pass”] == $_POST[“pass”]) checking if the row returned pass field is equal to what the user posted in the pass field defined
{
//remember that the user’s logged in
$_SESSION[“authenticated”] = TRUE;
//redirect user to home page, using abslute path, per
// http://us2.php.net/manual/en/function.header.php
$host = $_SERVER[“HTTP_HOST”];
$path = rtrim(dirname($_SERVER[“PHP_SELF”]), “/\\”);
header(“Location: http://$host$path/home.php”);
exit;
}
}
}
?>
<!DOCTYPE html PUBLIC “-//W3C//DTD XHTML 1.0 Transitional/………….”>
<html xmlns=”http:.//www.w3.org/1999/xhtml”>
<head>
<title> Log In</title>
</head>
<body>
<? If (count($_POST) > 0) echo “INVALID LOGIN”; ?>
<form action=” <? echo $_SERVER[“PHP_SELF”]; ?>“ method=”GET”> getting submitted to itself
Insert all the form that creates username and password
</body>
MySQL Functions
http://dev.mysql.com/doc/refman/5.0/en/functions.html
JOIN
SELECT Employee.Name, Orders.Product
FROM Employees, Orders
JOIN Employees.Employee_Id = Orders.Employee_Id
Race Conditions:
Operation need to be atomic, meaning 2 processes need to happen at the same time or not at all
-could potentially give money from ATM and then not update database or vice versa
Atomic Queries:
INERT INTO tab le(a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
UPDATE table SET c=c+1 WHERE a=1
MySQL support Transactions (InnoDB)
-write SQL statements and all that you want
-only once you write “commit” command, all happens together or not at all
-db is the engine you need if you want atomic sequences to occur
Stock Finance:
Executed an HTTP request to finance.yahoo.com and queried the information for the stock price and returned the value into own website
Misspellings
$dictionary(strlower($word))=TRUE checking if word is in dictionary
$ = variable in PHP, otherwise very similar to C
-higher level language, so it automatically makes hash tables and allocates the memory, etc
At start of PHP file:
#!/usr/local/bin/php shebang (sharp + bang) hint to OS to what interpreter it should use to interpret the file, since it may not be compiled
<?
RSS: (RDF SS or Resource Description Framework Sit Summary, often dubbed:Really Simple Syndication)
http://cyber.law.harvard.edu/rss/rss.html
file format that is XML (extensible markup language)
text file that contains information you might want to syndicate
integrate other websites into own (like using NY Times info)
Querying Google Maps for News Article at given location:
- User goes to specific zip code
- Back end PHP uses zip code and converts to coordinates from SEQUEL database
- Uses that to query MSNBC news and get back RSS feed of news for that place
- JavaScript to put links on screen
JavaScript:
Interpretive language, not executed by servers but by clients
Embed in files on server, shipped raw to user’s browser that needs to support
To embed in webpage, put this in head tag in HTML (called script tag)
<script >
// <![CDATA[
…
// ]]>
</script>
OR can store in separate file and just reference that file in the head tag (seen below), keeps it cleaner
<script src=”file.js” type=”text/javascript”> </script>
break, conts, continue, do while, for, for in, for each in, function, if else, return, switch, throw, try catch, var, while, with
JAVASCRIPT TO PUT FOCUS ON USERNAME OR PASSWORD FOR LOGIN
focus() set focus like Access
<script type=”text/javascript”>
// <?[CDATA[
//get fields (set variables)
var username= document.getElementById(“username”);
var password= document.getElementById(“password”);
// put curser in username field if empty
If (username.value == “”) document is the webpage, forms is getting access to the form, login is the field and username is in there and set the value, very similar to access/vb
{
username.focus();
username.value = username.value; aesthetics reason to move cursor to right of already typed text since focus sets it to the beginning
}
//else put cursor in password field
Else
{
password.focus();
password.value = password.value;
}
// ]]>
Define a function in JavaScript (put this in the header portion of HTML)
Does error-checking client-side
<script type=”text/javascript”>
// <?[CDATA[
function validate() checks if login fields are not blank, correct, etc
{
if (document.forms.registration.email.value =””)
{
alert(“Email is blank”);
return false;
}
else if (document.forms.registration.password.value =””)
{
alert(“Password is blank.”);
return false;
}
Return true;
}
]]>
Can mimic the behavior of browsers by sending HTTP requests
-doing error checking on client-side means that they can steal the source code, put it into text file, and change the process.php part and put in browser address in front
-then open that version and can run it
-will want to add additional error-checking in php file since that is not released freely
Eventhandlers in xhtml:
<form action=”process.php” method=”get” name=”registration” onsubmit =“return(validate(this);”> when submit portions runs, will do whatever is in “” meaning function validate will be ran (false will prevent anything from happening, true will continue to process)
By using “this” (which SHOULD NOT BE INCLUDED IN CODE BUT IS IMPLIED) you do not need to specify WHERE you are, because “THIS” is the location you are at (document.form.registration)
function validate(f) notice that document.form.registration has been called “f” and is used below
{
if (f.email.value =””)
{
alert(“Email is blank”);
return false;
}
else if (f.password.value =””)
{
alert(“Password is blank.”);
return false;
}
Else if (f.password1.value != f.password2.value) != is the same as <>
{
alert(“Make sure passwords match.”);
return false;
}
Else if (!f.agreement.checked) bang checked, meaning NOT checked
{
alert(“You must agree to our terms and conditions.”);
return false;
}
Return true;
}
]]>
Java is an object-oriented language like VB to group together like-objects (functions, forms, properties/variables – like disabled/enabled)
<br /> space in program
I agree to the terms and conditions: text
<input name=”agreement” onclick=”toggle();” type=”checkbox” /> terms/conds checkbox
<br /><br />
<input disabled=”disabled” name=”button” type=”submit” value=”Submit” /> submit button (disabled but probably enabled from toggle() function DEFINED IN HEAD TAG of page on checkbox, defaults to disabled)
</form>
</body>
</html>
Regular Expressions (validation of user data)
document.forms.registration.email.value.match(/.+@.+\.edu$/) regular expression passed
. = any character
+ = one or more of whatever
@ = @ symbol
. = any character
+ – one or more of whatever
\. = a literal .
edu = literal “edu”
$ =syntax to make sure END of string matches that exactly
^ = at beginning means start matching at front of string
All of this is sandwiched between /^whatever$/ = this means entire string MUST match pattern exactly
Global Objects
Array, Boolean, Date, Function, Math, Number, Object, RegExp, String
-associated with each object is a bunch of functions
http://developer.mozilla.org/core_javascript_1.5_reference@global_objects
-don’t implement personal functions, use other functions made (like sort)
Objects:
var obj=new Object();
var obj = {};
obj.key = value;
obj[“key”] = value;
var obj = { key: value }; {key : value}; {key : value};
Arrays:
var a = new Array();
var a = []; same as above
a[0] = “foo”
a[1] = “bar”
Vectors is an array that automatically resizes itself
a[a.length] = “foo”; grows array implicitly
Event Handlers:
onblur, onclick, onchange, onfocus, onkeydown, onkeyup, onload, onmousedown, onmouseout, onmouseover, onmouseup, onresize, onselect, onsubmit
http://www.w3schools.com/jsref/jsref_events.asp
AJAX used to instead reload new webpage, makes better user experience (like being able to drag/drop, or FB photos sliding), send fewer bits across
Ride coattails of Java success when Netscape launched this by calling it JavaScript and is completely different
Support for Objects is helpful, object in JavaScript allows you to associate a key with a value (like Associative Array from PHP/hash table)
CSS Properties (cascading style sheet properties, aesthetics of sheet)
http://codepunk.hardwar.org.uk/css2js.htm
className
style
-can be manipulated using JavaScript, CSS is like font name/font size/ font color
Attached this to the div element, need to put timeout/sleep functions by starting a thread
<div name =”blink()”
Blink
Function blinker()
{
var blinks = document.getElementsByName(“blink”); that way if multiple things have same name, will carry over all array
for(var i = 0; I < blinks.length; i++);
{
If (blinks[i].style.visibility == “visible”)
blinks[i].style.visibility = “hidden”;
else
blinks[i].style.visibility = “visible”;
}
}
SQL Injection Attacks
-put yourself at risk for people corrupting/deleting data
-want to hack into your server
mysql_queryl(sprintf(* SELECT uid FROM users
WHERE username=’%s’ AND password=’%s’ “,
$_POST[“username”], $_POST[“password”]));
Try to inject SQL into username and password fields malicious things (like update/delete)
Can put in 12345’ OR ‘1’ = ‘1’ into password field to get logged in
SHOULD BE (use this to check what users type in ALWAYS):
mysql_queryl(sprintf(* SELECT uid FROM users
WHERE username=’%s’ AND password=’%s’ “,
mysql_real_escape_string($_POST[“username”]),
mysql_real_escape_string($_POST[“password”])));
DOM – Document Object Model (how HTML looks in memory)
Simplest web page you could make:
<!DOCTYPE html PUBLIC
“-//W3C/DTD XHTML 1.0 Transitional//EN”
http://www.w3.org/TR/xhtml1/DTD/html1-tranditional.dtd>
<html xmlns=”http:www.w3.org/1999/xhtml”>
<head>
<title>My title</title>
</head>
<body>
<a href=””>My link</a>
<h1>My head</h1>
</body>
</html>
DOM: hierarchy based on html code from above
This DOM tree is built up in memory based on how a web browser sets, but now these are to be manipulated AFTER being built while in memory for client/user-side manipulation of webpage (AJAX)
Asynchronous JavaScript XML
So you don’t have to reload entire page, just grab additional content when needed
Web-based software and client-side software more interactive
Made possible by XMLHttpRequest object
LEFT: CLIENT, every second JavaScript function (1) is called, creates an XMLHttpRequest of HTTP parameters (2), sends HTTP Request (3), Server does stuff/queries/gets data and sends it back to user (4), client-side takes data and integrates it into the page by JavaScript calls (5)
RIGHT: WEB SERVER (LIKE GOOGLE)
XMLHttpRequest Object is standardized:
Methods: (like a library of functions, but in object-oriented programming functions are methods)
open(method, url)
open(method, url, async)
open(method, url, async, user)
open(method, url, async, user, password)
send()
send(data)
abort()
getALLResponseHeaders()
getResponseHeader(header)
setRequestHeader(header,value)
What this XMLHttpRequest Object looks like in memory
Be rendered first time, but can make subsequent queries without reloading webpage because it’s the webpage itself that is making a new HTTP query every x seconds for just specific information to be updated
Pre-fetch data in order to give seamless transactions for user interface (cost: uses more RAM memory-client’s memory, and costs bandwidth—every byte change costs money so limit number of bytes sent)
Using AJAX increases load on server, since it hits it every second
Properties available to be checked about XMLHttpRequest
<form onsubmit-“quote() return false’”> will run quote() function define in header and return false to prevent user from submitting HTTP request
function quote()
{
//instatiate XMLHttpRequest object
try new thing for error handling, instead of a bunch of if/then for all instances
{
xhr = new XMLHttpRequest();
}
catch (e) catches error and tries to create object in another way
{
xhr = new ActiveXObject(“Microsoft.XMLHTTP”);
}
//handle old browsers
if (xhr == NULL)
{
alert(“Ajax not supported by your browser!”);
return;
}
//construct URL
var url = “quote1.php?symbol=” + document.getElementbyID(“symbol”).value id=”symbol” for user input field in html code
//get quote
xhr.onreadystatechange = handler; invoke this whenever you get a response
xhr.open(“GET”,url,true); want to open a request to that url and true is to make it asynchronous (return right away and do function when ready)
xhr.send(null); send the response nothing else to say
}
function handler()
{
//only handle loaded requests
if (xhr=readstate == 4) 4 means readystate is loaded (see picture below)
{
if (xhr.status == 200) 200 meaning OK
alert(xhr.responsetext);
else
alert(“Error with Ajax call!”);
}
}
quote1.php
<?
//get quote
$handle = @fopen(http://download.finance.yahoo.com/d/quote.csv?s=o)’)&f-e1l1,”r”) whatever the url for downloading csv stock numbers, can use fopen in php to open urls and download the stock quote and read the value from CSV by pretending to be a browser
If ($handle !== FALSE)
{
$data = fgetcsv($handle); fgetcsv puts all info into array, CSV, comma separated variables
If ($data !==FALSE && $data[0] == “N/A” ) check to make sure it exists and first column is NA
print($data[1]); print first value in array
fclose($handle);
}
?>
(+ for javascript and java for cat, . for php)
DIV element says here comes a division of the page, gives an automatic line break (block element, verses in-line element)
<span id=”price”> allows GetElementById
Document,getelementbyId(“price”).innertHTML= xhr.response (refers to any XHTML embedded in between close and open tag, so make that equal to the response given from xhr)
AJAX sends request but doesn’t update page until gets data returned, so it doesn’t look like anything is happened to user, which is why there are progress bars, gif is a rectangular image of a slideshow and the illusion of animation
//pretend server is slow
sleep(5);
!== FALSE php does not have strong data typing, this is how you check values AND data types are identical (===) or not (!==)
Googe Maps API -> Applications Programming Interface, go to Demos for mash-ups
-give you Developers key
Python, PHP, Ruby –> not as high performing as C, but doesn’t matter
Write scripts in php to grab data from CSV and massage
INTERNET:
- IP takes information you send and creates a packet and labels it with an appropriate name and type (senders address, receivers address) and gives address for Proxy Server if going to Internet
- Launched to LAN, (IP packets, Novel packets, Apple packets), local router reads address and puts in on appropriate path (head to Router Switch if heading for corporate internet)
- Router Switch is digital internet pinball wizard, routing them in best way to get to where they need to go (send to firewall)
- Network interface picks up packets to be sent to next level (could be sent to proxy, used as a middleman to lessen load on internet connection for server and for security reasons)
- Proxy opens packet and looks for web address (url) and if acceptable, packet is sent on to the firewall (corporate/manager guidelines stop the packet from being delivered if url is unacceptable)
- Firewall prevents nasty things from internet coming to intranet or sensitive information from being sent out to the internet, if approved it puts onto bandwidth which is the path it takes to get to server—telephone line, satellite, wireless, transatlantic cable
TCP/IP Packet – transmission control packet (TCP) and internet protocol (IP) work together to deliver data requests
-IP is the packet of information consisting of header (which has packet’s destination and optionally the routers to use) and the body (the data IP is transmitting). Need to break up data into multiple IP packets
-TCP detects problems of requests retransmission of lost packets, deletes duplicated packets, rearranges packets delivered out of order and passes them back to the application that requested them
ICMP Ping Packet
UDP Packet
The Router
The Router Switch – digital internet pinball wizard, routing them along the way
Network interface picks up packets to be sent to proxy, used as a middleman to lessen load on internet connection for server
Proxy opens packet and looks for web address (url) and if acceptable, packet is sent on through the internet (corporate/manager guidelines stop the packet from being delivered)
Corporate Firewall prevents nasty things from internet coming to intranet or from being sent out to the internet, puts onto bandwidth which is the road it takes to get to server—telephone line, satellite, wireless, transactlantic cable (IP sends replacement packet if does not get response that it was delivered)
Ping of Death – special version of normal request ping to mess up unsuspecting hosts
preg_split (match regular expression described and split them into an array) “/\s+/” is looking
implode(“,” $symbol) put it back together
asynchronous javascript and xml (AJAX)
XML you get control of tags (instead of XHTML and php that have set tags)
.htmlentities -> rewrites special characters (& becomes & in order to prevent parsing errors)
php creates httprequest and is returned an xml and web browser formatted it on screen by using AJAX function to read it
onsubmit=”quote(this.symbols.value).php” runs function quote(symbols) passing in this form’s symbol value
xhtml = “” set as string
request.responseXML.getElementsByTagName(“quote”); looks at XML file and grabs every tag named “quote”
for (var i=0; i<quotes.length;i++)
{
var quote = quotes[i];
xhtml +=”<b>”; += is to append to string
xhtml += quote.getAttribute(“name”)
xhtml += “ (“ + quote.attribute(“symbol”) + “)”
xhtml += “</b>”
xhtml += “<br /><br />”;
xhtml += “Price: $” + quote.getElementsByTagName(“price”)[0].firstChild.nodeValue; creates and array of the children tags “price” under current “quote”, since there is only one node go to [0] spot of node and return first its first Child’s value
xhtml += “<br />”
xhtml += “High: ” + quote.getElementsByTagName(“high”)[0].firstChild.nodeValue; creates and array of the children tags “price” under current “quote”, since there is only one node go to [0] spot of node and return first its first Child’s value
xhtml += “<br />”
xhtml += “Low: ” + quote.getElementsByTagName(“low”)[0].firstChild.nodeValue; creates and array of the children tags “price” under current “quote”, since there is only one node go to [0] spot of node and return first its first Child’s value
xhtml += “<br />”
}
document.getElementById(“quotes”).innerHTML
<div id=”quotes”></div> putting in there the dynamically created string of xthml
//output response to debugging window
document.getElementById(“debugging”).value = request.responseText;
request.onreadystatechange = function(); { replaces “handler” function (LAMBDA FUNCTION)
put stuff that you’d normally put in “handler” function (can access local variables inside code so you don’t need to use global variables)
}
Google APIs written in JavaScript
Look through documentation, lots of methods already available w/ google maps:
map.addControl(new GMapTypeControl());
map.enablescroolWheelZoom();
//Add marker
var marker = new GMarker(point);
map.addoverlay(marker);
//associate info window with marker
GEvent.addListener(marker,”click”, function() { will call function when user does something of interest—this is LAMDBA function so you can write following and will only be evoked when necessary
//prepare XHTML
var xhtml = “<b>Science Center</b>”;
xhtml += “<br /><br />”
xhtml += “<a href=’http://en.wikipedia.org/wiki/Havard_Science_Center’ target=’_blank’>”;
xhtml += “Harvard Science Center”; link name
xhtml += “</a>”
//open info window
map.openinfoWindowHtml(point, xhtml);
Google Maps make points
//instantiate map
Var map = new GMap2(document.getElemeyById(“map”);
// prepare point
var point = new GLatLng(37.4419, -122.1419); Google’s lat/long coordinates
// center map on Science Center
Map.setCenter(point = 13);
// prepare Ajax call
var request = GXMLHttp.create(); figures out how to create request based on browser type, otherwise need to use try/catch to determine how to code, GXmlHttp is a library for AJAX requests provided by google to handle this for you
Debugger for Firefox (great for developing)
HTTP/JavaScript interpreter, compiler…assumptions of what must be happening
Underneath the Hood (Software):
- Pre-processing: look for sharp symbols
- Compiling: taking source code and outputting Assembly language
- Assembly code “assembled” into 0/1s
- Linking compares 0/1s to other 0/1s from other functions/libraries called
- Executing puts together 0/1s into one executable file that have been linked and run those bits
Shared/Dynamically linked libraries (dll) are already compiled since they are commonly linked and it would be wasteful to repeatedly break it down
Register: tiniest unit of memory that exists in a CPU (32), only on registers that things can happen
X + Y = Z, x is stored in a register (plus whatever function is to be performed like the addition function), y is stored in another, z is the sum and stored in a third register
Registers: the tiniest pieces of memory where calculations actually take place (where computer does its thing)
Program counter: computer equivalent of the variable i (keeps track of where currently executing since it hops around)
Memory (RAM): storage for program code and data
Control Unit/ Datapath: lower level detail, it combined with datapath allows arithmetic to occur
I/O Devices:prints and monitors that actually talk to the user
x86 Instruction Set is the Assembly Language
.file “hello.c”
.section .rodata read only data
.LC0
.string “hello, world\n”
.text
.globl main
.type main, @function
main:
pushl %edp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
ADD = Arithmetic Addition
Usage = ADD
Modifies Flags: AF CED OF PF SF
286/386/486 old processors (green screen). Takes two clock cycles to add one register to another on 286, 1 on 486 (reg,reg)
Clock rate is the frequency of the oscillator crystal, and shows the number of cycles that can happen in a second (so 1.5 GHz is giga-hertz, or 1.5 billion cycles a second), the chart above shows the number of cycles necessary to computer the ADD function
Life After 50 (search for materials pertaining to below):
- Intro to CS II
- Privacy and Technology
- Data Structures and Algorithms
- Visualization – use of computers and technology to present in sociologically compelling ways
- Systems Programming and Machine Organization – close to hardware/hacking/cryptology
- gdb -> download this and use it if using C
- Introduction to Formal Systems and Computing – theory of computation, what cannot be solved by computing due to fundamental limitations
- Computing hardware – assembly language, compilation
Panel discussing using MySQL with high volume of users and challenges
Questions: Number of MySQL servers used (database servers), number of DBA (database administrator, tells developers which servers to insert data into and grab data from), number of memcached servers, operating system, number of web servers
# of mySQL servers
MySQL (1 master servers, 3 slaves), all writes to master, all reads to slaves. 1/10th of DBA, 2 memchached servers, Fedora (same version of Linux used in Cloud), 2 web servers
Sun (Sun makes computer hardware and software) 4 servers, 1.5 (DBA), 8 memcached servers, OpenSolaris (own Sun’s), 160+ web servers
Flickr 166 servers, 0 (DBA, normally 1 guy just quit), 14 memcached servers, Linux (some type), 244 web servers
Fotlog 37 servers (1 DBA), 40 memcached servers, Solaris 10 (also by Sun), 70 web servers
Wikipedia N/A, everyone wears different hats, 79 memcached servers, Fedora/Ubuntu (Ubuntu is Linux), 70 + 200 web servers
Facebook 900 masters, 900 slaves (30,000 databases) 2 DBA, 805 memcached servers, Fedora/Redhat Enterprise Linux (pay for Redhat and get support), 1200 web servers
Youtube “I can’t say”, 3 DBA, “I can’t say”, SuSE (another flavor of Linux), “I can’t say”
memcached server software to cache output of php files, using database plus code to output xhtml will keep to prevent from regenerating same thing and instead use local copy from memcacheD
Can have clusters of servers, web servers serve up static content or turn around to database and ask for data for user
How to make start-up
- Buy domain name at registrar (GoDaddy $9.99)
- Need to tell them your DNS: Domain Name System (the ip address of your web host)
- DNS: Servers that translate IP address into domain name and vice versa
- Type in address that belongs to your web host (Top-Level Domains TLDs):
- .biz
- .com
- .edu
- .gov
- .info
- .int
- .mil
- .name
- .net
- .org
- .aero
- .asia
- .cat
- .coop
- .jobs
- .mobi
- .museum
- .pro
- .tel
- .travel
- .us
- …
- DNS also has other information
- Informs who mail servers are (google for more info)
-
- Hierarchy to DNS as well
-
- Web Hosting providers: DreamHost, ServInt
- Start uploading FTP, sFTP and make web page
- Can use tools to benchmark your servers
- Take software to simulate requests to pretend to be browser to figure out how many hits per second that could be handled (PHP to simulate httpRequests to get information)
- Business server for $5,000 could get decent server to handle 2 to 3 thousand hits per second
- Will hit bottlenecks:
- Bandwidth, if running out of your home you are limited by your speed (if your ISP even allows it), limit on connection, Web Host can specify how many bits per second you can spit out, or cap at a total per month (MBs or GBs)
- CPU limits (Linux TOP reports what is going on in server), how much CPU/RAM/disc is being consumed has limitations.
- Easiest solution is to throw money at problem, but will pay a premium (2 times costs 4 times as much)
- “slashdotting” web site will get pounded with utilization that was not expected
- Vertical Scaling – buying better and better – problem is you max out and still don’t get desired results
- CPU
- Core, L2 Cache,
- Disk
- PATA, SATA, SAS
- RAID
- RAM
- CPU
- Horizontal Scaling – buying/renting a lot of same cheaper things (redundancy) but it’s good because if one fails you have others to fall back on BEST METHOD
- Data Centers
- Lots of air conditioning, hardware, just holds all the hardware for servers linked together
- Data Centers
How do you make use hardware when you have multiple servers and CPUs (core processing units), some PCs have multiple cores (like dual-core processors)
-use virtual machines to link these together (Virtualization)
What can you do to improve the performance of your own code?
- Use conservative calls to “expensive” functions like malloc because it needs to make a call to the operating system
- Use better hash tables, data structures and algorithms (worrying a little less about hardware actually executing it)
- Tightening up loops, moving around code, make sure unnecessary calls are not made
- Bitwise operators to get at lower level bits instead of using 32-bit characters/ints
PHP Accelerating:
PHP is slower because it’s interpreted, not compiled like C. Analyzes top to bottom, left to right, and then translates.
Rather than make server do this over and over (interpreted “Just In Time”/compiled not by you, but by web server or php interpreter) cache the results –Opcode Caching
Code Optimization – can compile using different optimization option settings
One server won’t cut it anymore even with best software and hardware, you need to have multiple servers. Need to present this to rest of the world as one server (instead of telling people to go to different address) by using DNS. Set up two identical web servers, put php files/jpegs/gifs on both, but have different ip addresses. Go into DNS server (web host-however done) and inform that the ip address of CNN.com is not just 1.2.3.4 but also 1.2.3.5, 1.2.3.6, that you can go to via TCP/IP to find this “CNN.com”
nslookup cnn.com in linux box to look up name server and ip addresses
could go to http://157.166.226.25/ to get cnn.com (or http://157.166.226.25/index.html
Problem with Load Balancing at Layer 4 (DNS): one server goes down out of 1,800 means that 1/1800th of users will go to dead server (since DNS is still giving address). So you need to take the ip address off the list, but cache DNS answers for 60 seconds, 4 hours, 3 days, etc (DNS propagation problems), so 1/4th of users might have cached the answer for up to x hours and will continue to go to dead ends. So there must be a better way than just removing it from the list
Better way: Layer 7 is at the http or application layer, higher than 0/1s at the TCP/IP layer
Embed location of server in actual host name of address, so when it’s resolved it will be sent to different server via DNS. All of them could resolve to the same IP address, look at the header and send them to different server, so in own code could send to different places
Sticky Session
Shared Storage:
php uses disc space to cache settings by implanting cookies
-associated with browser by way of cookie to make session-object and stored server-side
-so if you store things at server number 1 and store session file there, and you go to server 2, you need to either remember the info is on server 1 or figure out a way to do this
Multiple ways:
CS50 design:
-make (/scratch) front-end server and use filesharing to expose all other backend servers
-networks are slower than keeping everything locally (performance impact)
-single point of failure at front-end VM
-even if 1 datacenter goes down that has front-end VM, going through Amazon that have multiple datacenters (A, B, C) can move to different geography in 10 minutes
-design decision with calculated risks
Hardware Load Balancers: Really fast pieces of hardware that route to different servers to balance load (companies can spend $10,000-$100,000+ on these)
Caching:
Serving up HTML pages is faster than interpreting, executing, spitting out pages from php files
fopen and spit out bits
craigslist = .html ending, design decision was to take user’s input and generate once a .html post and if changed they delete the file they overwrite it. Delay for posting ads because it’s a free site and design decision. HTML cache with all ads (are reads more important, or are writes?)
Config file MySQL for receive request for SELECT statement, remember the results for a few minutes. Some of the features are available and free depending on what you use
Open source, facebook uses it, php code is slower,
Speed: avoid database by keeping on disc, avoid disc by keeping in cache (memory)
Databases have different ways of storing data
HEAP (Memory) MySQL table that only exists in RAM (so once you shut down server then it’s gone. Good for temporary databases)
Replication: Master – Slaves
Having to tolerate lots of requests and inserts per second
Or pay twice as much for additional masters to prevent single point of failure
Lag time and consistency issues with master databases not updated yet
Fancier set-up:
Partitioning:
Clustering (RAID xor, n+1 copies of data, make n+1th the xor result of everything else, and if you lose any of the n’s you can recover them thanks to bitwise operator xor):
CS51: Intensive Introduction to CS II
Transition from being able to program, to computer science, so you understand all the aspects
Look at things from the right perspective, write elegant code, be understood and easily modified
New kinds of algorithms that cs is engaging to transform, solve problems in seconds. Recognize easy-looking problems that are actually hard
Programs and Efficiency: Goal is simple, readable, extensible code
Formal proof that there is no buffer overrun in Windows: be able to prove that software does what it says it does definitively
Languages (APIs): SQL, Regular Expressions
Learn key concepts like enclosures, objects, patterns in software, etc more important than what language you currently know
Correct language is important, parallelism is important in today’s society to choosing code that does this is important
Not 10 minutes, but might take until the end of the universe
Easy way 1: Try every possibility (exhaustive search), in 7 person network it wouldn’t take long, but if it were larger it would take really long time
Easy way 2: Simply send it to someone who hasn’t heard the news (pick a neighbor), maybe make it smarter by sending to someone with a lot of friends since they are popular
Sometimes you find out that the news can’t be shared. They have no friends.
The solution is to use the “greeting” method by sending to the right neighbor with a little bit of cleverness. But for the other, you can’t do better than the exhaustive search option.
Conversation Problem: Can be done with O(n) time
Notification Problem: No one knows how to do better than exhaustive search, one of the open problems (only hearing news once)
Pick 25 people who have most friends
Exhaustive search: try EVERY combination of 25 and see which has combined total of largest
Greedy algorithm: give to most # of friends
Problem: friends are probably shared/overlap
CS61 Systems Programming
Introduction to Machine Organization (how computers work on the inside), C programming experience or CS50 prereq
How registers work, compilers work, how processors work, how memory works
Understand what affects performance of programs actually running, how it accesses memory and how they interact with each other
Caching and memory performance, makes things running faster, use code that makes good use of cache
Blue screen of death
What was going on in memory of kernel when crashed
Segmentation fault: processor has raised an exception by accessing a bad piece of memory, core dumped means it left a file in the directory called “core” that contained the memory image at the time it crashed, would be able to debug program and find source of problem if you can read it
gdb can read in core files
Want to look at the assembly to understand what went wrong (dig below C code) and see what happened at machine. Use disassemble command to see what processor is doing down inside as result of what was compiled in C down to assembly
Maybe not be able to write assembly, but should be able to read it. Find tutorial on assembly language
mdw shell is executable file produced by compiling C code. If you can read assembly language you could understand what it’s going on
The latin-looking characters are hexadecimal translated into ASCII
This is hexadecimal, not very helpful
gdb disassembler shows this x86 assembly language
That’s where you are able to get the user input for password, put it into a register for comparison
So translate into ASCII at that point to get password: takecs61
Co-inventor of UNIX (OS)
Open source so anyone could see this hack and find it
So since he wrote the c compiler, he hacked it to add the backdoor code into the executable and deleted the original login.c
Source code for c compiler was sitting around, so he would have to make the c compiler when compiling itself insert the backdoor into itself and delete the original c compiler
CS 171: Visualization
Conveying information through visual representations
William Playfair invented the concept of bar, line and pie charts to make sense of data
Not only want to look at information, but want to interact with it – zoom, filter, customize
Tree map: represents companies and shows market change between close of market each day
Visualization: better than looking at rows/columns of data to communicate information, answer questions
Best way to produce mash-ups to show good information
Visualization isn’t telling you the whole story: not telling you the population
Shows by county, but still doesn’t take into account population size of specific county
Shaded according to population density
Make sure conveying right information in appropriate ways
Can uncover patterns when looking at visual of data (Barry Bonds increase in batting average during allegations of steroid usage)
Word tree: size of word larger based on number of times used. Deposition, suggests coached for deposition for repeating words
Wordles: way of representing most used words in speeches (Obama’s acceptance speech)
Obama’s acceptance speech
Program in Java, using Python scraping to implement own web crawlers
Data Models: scraping, web crawling
cs171.org
LOLCats RSS feed for website
Recursion: function calling itself over and over again
Heap is memory you can always access during program, stack is segmented into frames
a.out this is awesome what is in argv[2]: is
a.out this is awesome printf(%c,*argv[2]+2), would give seg fault (accessing memory outside array)
compling: changing source code into object code (assembly then machine language)
preventing memory leaks: free()
malalalala.com
state the number 17 in binary: 10001
http: hypertext transfer protocol
CS124: Problem Solving Tools
Checkbook Financial arbitrage (cryptographic stuff to protect checkbook transactions)
Writer interface to pull information off google learn algorithm to calculate shortest paths
How does google find duplicate/near duplicate documents and collapse
How to find optimal strategy for games like RPS