Let’s start with recapping the Udacity and other classes I’ve taken throughout my self paced learning journey. I’m going through a tough time and I need to remind myself of my wins.
- SAS Programming
- Introduction to Computer Science (blog notes, Udacity course)
- Design of Computer Programs (blog notes, Udacity course)
- Programming Languages (blog notes, Udacity course)
- This is my favorite because it is taught by the SUPER INTELLIGENT and totally dreamy Westley Weimer)
- And OMG he teaches at University of Michigan in Ann Arbor!!! That is literally only a few hours away from me!)
- Introduction to Statistics – Partially Complete (blog notes, Udacity course)
- Applied Cryptography – Partially Complete (blog notes, Udacity course)
- Version Control with Git (blog notes, Udacity course)
- Shell Workshop (blog notes, Udacity course)
- Intro to HTML and CSS
- Intro to Relational Databases (covered in this blog and next)
Next up I want to revisit and dust off my knowledge with relational databases by taking Udacity‘s Intro to Relational Databases. About 12 years ago I took beginner and advanced courses in SAS programming, which is a statistical programming language that was heavy on databases and SQL. That said, it’s always a good idea to revisit training, so I’m definitely exploring this opportunity to refresh my memory on good ol’ SQL (aka Structured Query Language)!
Right off the bat the course gave a great overview of what it’ll cover. It’ll start with relational database information, SQL, and then how to use python to fetch/write to the database on a web app!!! Below are a few screen shots from the video that explain what a database is, and why it is preferred over a flat file like text, XML, JSON.
Mainly, the databases are faster and allow for multiple users to edit the data simultaneously without overwriting each other’s changes. They are have flexible query languages with options to join and aggregate data, and allow constraints to protect the data itself. All of the databases store the data in table format.
The next part of the lesson reviewed what a table is, data types, how tables function, what a query does, unique keys, and how to join tables. Since I’ve been working with datasets for about 10 years now in my job along with the SAS Advanced Programming course, I don’t feel a strong need to document this information. If you’re bummed, then go take the class yourself. 🙂
Next lesson we finally get into using SQL in python on a web page to fetch data. NEAT! First up, some clarifications on data types and their name equivalents in python:
Also important to remember: text and dates must be surrounded by single quotes (‘text here’), otherwise the date will be read as and integer. For more on data types in SQL, see this reference guide.
- Text and string types
- text — a string of any length, like Python str or unicode types.
- char(n) — a string of exactly n characters.
- varchar(n) — a string of up to n characters.
- Numeric types
- integer — an integer value, like Python int.
- real — a floating-point value, like Python float. Accurate up to six decimal places.
- double precision — a higher-precision floating-point value. Accurate up to 15 decimal places.
- decimal — an exact decimal value.
- Date and time types
- date — a calendar date; including year, month, and day.
- time — a time of day.
- timestamp — a date and time together.
The course is starting to get into the syntax of the SQL statement, and while I’m very familiar it’s a good thing to brush up on the lingo the cool kids use.
It’s also a great time to be reminded of the proper syntax of SQL in python, since each language utilizes SQL in different ways. For instance, != is the proper not equal sign in python and <> is not used like it is in VBA. Also important to note is there are multiple ways to write a where statements (the one I used in the quiz was the bottom one listed in the picture below).
Showing a list of tables is also different for each database program being utilized, which makes it hard to identify all the information in the database itself.
Most modern languages now have the ability to query introspective data, but SQL was made a long time ago and does not have a standard way to do this.
There are also SELECT clauses such as Limit/Offset, Order By, and Group By. Below are the notes and problem set for reference on how these work to solve problems.
So the next question is why use SQL when you can leverage Python operations to get to the same results? Below is a picture mapping the different SQL clauses to Python code that could be used instead. The biggest advantages for doing this in the database are speed that the operations can be carried out, and less space utilized in memory and on the network.
Finally we put all of this into practice by making a query to count all the species in our tables. The count(*) as num to identify a count column is a great refresher for me since I’m definitely fuzzy on these details.
select count(*) as num, species from animals group by species order by num desc
To add rows to a table, the SQL statement would be INSERT instead of SELECT. Again, the syntax is something I’ve forgotten from not using SQL much these days, so these little reminders are greatly needed.
insert into animals values ('Gary', 'opossum', '2020-02-02')
Next up a simple join statement to filter by a column found in another table. Because the column names as the same in both tables, the table name qualifier before the column name is required to distinguish between the two data sources.
Select animals.name from animals join diet on animals.species = diet.species where food = 'fish';
Next was an important distinction between the “where” and the “having” clause. The “where” clause will only restrict the source table, but “having” will restrict the results table (if new columns have been added to the dataset then use “having” to restrict).
select food, count(animals.name) as num from diet join animals on diet.species = animals.species group by food having num = 1 select food, count(animals.name) as num from diet, animals where diet.species = animals.species group by food having num = 1
Even more practice with joins:
select ordernames.name, count(*) as num from animals, taxonomy, ordernames where animals.species = taxonomy.name and taxonomy.t_order = ordernames.t_order group by ordernames.name order by num desc select ordernames.name, count(*) as num from (animals join taxonomy on animals.species = taxonomy.name) as ani_tax join ordernames on ani_tax.t_order = ordernames.t_order group by ordernames.name order by num desc
SQL and other languages are one of those things where you really get the syntax down by using it regularly. So right now I’m more concerned with reminding myself of the concepts and names than I am with nailing the syntax.
The very last part of the session is to install a Linux VM (virtual machine) to run an SQL database server along with a web app that uses it. SO EFFING COOL!!! This is exactly why these intro classes are great, these little tips and tricks to add to my arsenal will greatly expand my use of the knowledge I have gained from my hobby learning over the years.
To set up the server, follow these instructions or see below for my super lazy abridged version:
- Installed Oracle’s Virtual Box (the VM software)
- Installed Vagrant (configuration and file sharing for the VM)
- Download the Vagrant configuration file
- There are several ways to do this, but I opted to use my GitHub Desktop to clone the repository
- Go into the ‘vagrant’ folder of the downloaded file and right-click and select “Git Bash here’ (this option is available because I installed Git during the Udacity class Version Control with Git (blog notes, Udacity course)
- Typed in the following command to begin vagrant installation: vagrant up
- if you get the error “vagrant command not found” then go back to step 2 and make sure vagrant is actually installed and the system has been restarted.
- If you still have a problem, you’ll need to add a new variable to environment variable path so bash to find the vagrant command
- After vagrant finished installing the Linux OS, I logged into it by running the command ‘vagrant ssh’
- To exit the server type exit or Ctrl-D into the shell
- To log back in type vagrant ssh
And with that we’re done with Lesson 2 and half way through the course! HUZZAH! Next up is using Python on our new Linux server to provided data to a web app, but we’ll continue that next time since this post is already waaaaaaaaaaaaaay too long. I’ll end with a picture of my cat Madam Crookshanks (smoooooshy) being held by my incredibly beautiful wife! <3