SQL Discussion

1. Introduction

Let’s begin by loading in our packages and our connection to run SQL

library(rmdformats)
library(RSQLite)
library(DBI)
library(knitr)
db <- dbConnect(RSQLite::SQLite(), ":memory:")

In SQL, data is organized into tables. A table has a fixed number of named columns. A row of the table represents a single data record and has one value for each column. For example, we have a table named records that stores information about the employees at a small company. Each of the eight rows represents an employee.

CREATE TABLE records AS
SELECT    "Ben Bitdiddle" AS Name, "Computer" AS Division, "Wizard" AS Title, 60000 AS Salary, "Oliver Warbucks" AS Supervisor UNION
SELECT "Alyssa P Hacker",          "Computer",             "Programmer",      40000,             "Ben Bitdiddle"               UNION
SELECT       "Cy D Fect",          "Computer",             "Programmer",      35000,             "Ben Bitdiddle"               UNION
SELECT   "Lem E Tweakit",          "Computer",             "Technician",      25000,             "Ben Bitdiddle"               UNION
SELECT  "Louis Reasoner",          "Computer",     "Programmer Trainee",      30000,           "Alyssa P Hacker"               UNION
SELECT "Oliver Warbucks",    "Administration",             "Big Wheel" ,     150000,           "Oliver Warbucks"               UNION 
SELECT    "Eben Scrooge",        "Accounting",       "Chief Accountant",      75000,           "Oliver Warbucks"               UNION
SELECT "Robert Cratchet",        "Accounting",              "Scrivener",      18000,               "Eben Scroge";
SELECT * FROM records
8 records
Name Division Title Salary Supervisor
Alyssa P Hacker Computer Programmer 40000 Ben Bitdiddle
Ben Bitdiddle Computer Wizard 60000 Oliver Warbucks
Cy D Fect Computer Programmer 35000 Ben Bitdiddle
Eben Scrooge Accounting Chief Accountant 75000 Oliver Warbucks
Lem E Tweakit Computer Technician 25000 Ben Bitdiddle
Louis Reasoner Computer Programmer Trainee 30000 Alyssa P Hacker
Oliver Warbucks Administration Big Wheel 150000 Oliver Warbucks
Robert Cratchet Accounting Scrivener 18000 Eben Scroge

2. Creating Tables

We can use a SELECT statement to create tables. The following statement creates a table with a single row, with columns named “first” and “last”:

SELECT "Ben" AS first, "Bitdiddle" AS last UNION 
SELECT "Louis",         "Reasoner";
2 records
first last
Ben Bitdiddle
Louis Reasoner

We can choose which columns to show in the first part of the SELECT, we can filter out rows using a WHERE clause, and sort the resulting rows with an ORDER BY clause.

For instance, the following statement lists all information about employees with the “Programmer” title:

SELECT * FROM records WHERE title = "Programmer"
2 records
Name Division Title Salary Supervisor
Alyssa P Hacker Computer Programmer 40000 Ben Bitdiddle
Cy D Fect Computer Programmer 35000 Ben Bitdiddle

The following statement lists the names and salaries of each employee under the accounting division, sorted in descending order by their salaries

SELECT Name, Salary FROM records WHERE Division = "Accounting" ORDER BY Salary DESC;
2 records
Name Salary
Eben Scrooge 75000
Robert Cratchet 18000

An alternative option for listing the names and salaries of each employee under the accounting division in descending order by their salaries is the following:

SELECT Name, Salary FROM records WHERE Division = "Accounting" ORDER BY -Salary;
2 records
Name Salary
Eben Scrooge 75000
Robert Cratchet 18000

Note that all valid SQL statements must be terminated by a semicolon (;). Additionally, you can split up your statement over many lines and add as much whitespace as you want, much like Scheme. But keep in mind that having consistent indentation and line breaking does make your code more readable to others (and your future self!)

2.1) Write a query that outputs the names of employees that Oliver Warbucks directly supervises.

SELECT Name FROM records WHERE Supervisor = "Oliver Warbucks"
3 records
Name
Ben Bitdiddle
Eben Scrooge
Oliver Warbucks

2.2) Write a query that outputs all information about employees that supervise themselves

SELECT * FROM records WHERE Name = Supervisor
1 records
Name Division Title Salary Supervisor
Oliver Warbucks Administration Big Wheel 150000 Oliver Warbucks

2.3) Write a query that outputs the names of all employees with salary greater than 50,000 in alphabetical order.

SELECT Name FROM records WHERE Salary > 50000 ORDER BY Name
3 records
Name
Ben Bitdiddle
Eben Scrooge
Oliver Warbucks

3. Joins

Suppose we have another table meetings which records the divisional meetings

CREATE TABLE meetings AS
SELECT "Accounting" AS Division, "Monday" AS Day, "9am" AS Time UNION 
SELECT "Computer"              , "Wednesday"    , "4pm"         UNION
SELECT "Administration"        , "Monday"       , "11am"        UNION
SELECT "Administration"        , "Wednesday"    , "4pm";
SELECT * FROM meetings
4 records
Division Day Time
Accounting Monday 9am
Administration Monday 11am
Administration Wednesday 4pm
Computer Wednesday 4pm

Data are combined by joining multiple tables together into one, a fundamental operation in database systems. There are many methods of joining, all closely related, but we will focus on just one method (the inner join) in this class.

When tables are joined, the resulting table contains a new row for each combination of rows in the input tables. If two tables are joined and the left table has m rows and the right table has n rows, then the joined table will have mn rows. Joins are expressed in SQL by separating table names by commas in the FROM clause of a SELECT statement.

For example, let’s look at the following

SELECT Name, Day FROM records, meetings LIMIT 12;
Displaying records 1 - 10
Name Day
Alyssa P Hacker Monday
Alyssa P Hacker Monday
Alyssa P Hacker Wednesday
Alyssa P Hacker Wednesday
Ben Bitdiddle Monday
Ben Bitdiddle Monday
Ben Bitdiddle Wednesday
Ben Bitdiddle Wednesday
Cy D Fect Monday
Cy D Fect Monday

The records table has 7 rows and the meetings table has 4 rows therefore the table records, meetings has 7*4 = 28 rows.

Tables may have overlapping column names, and so we need a method for disambiguating column names by table. A table may also be joined with itself, and so we need a method for disambiguating tables. To do so, SQL allows us to give aliases to tables within a FROM clause using the keyword AS and to a column within a particular table using a dot expression. In the example below we find the name and title of Alysa P Hacker’s supervisor.

SELECT b.Name, b.Title FROM records AS a, records AS b
WHERE a.Name = "Alyssa P Hacker" AND a.Supervisor = b.Name;
1 records
Name Title
Ben Bitdiddle Wizard

3.1) Write a query that outputs the meeting days and times of all employees directly supervised by Oliver Warbucks.

SELECT a.Name,b.Day,b.Time FROM records as a, meetings as b
WHERE a.Supervisor = "Oliver Warbucks" AND a.Division = b.Division
4 records
Name Day Time
Ben Bitdiddle Wednesday 4pm
Eben Scrooge Monday 9am
Oliver Warbucks Monday 11am
Oliver Warbucks Wednesday 4pm

3.2) Write a query that outputs the names of employees whose supervisor is in a different division.

SELECT a.Name, a.Supervisor  FROM records AS a, records AS b
WHERE a.Supervisor = b.Name AND a.Division != b.Division;
2 records
Name Supervisor
Ben Bitdiddle Oliver Warbucks
Eben Scrooge Oliver Warbucks

3.3) Write a query that outputs the names of all the pairs of employees that have a meeting at the same time. Make sure that if A|B appears in your output, then B|A does not appear as well (A|A should additionally not appear).

SELECT a.Name, b.Name FROM records AS a, records AS b
WHERE a.Name != b.Name AND a.Division = b.Division LIMIT 11
Displaying records 1 - 10
Name Name
Alyssa P Hacker Ben Bitdiddle
Alyssa P Hacker Cy D Fect
Alyssa P Hacker Lem E Tweakit
Alyssa P Hacker Louis Reasoner
Ben Bitdiddle Alyssa P Hacker
Ben Bitdiddle Cy D Fect
Ben Bitdiddle Lem E Tweakit
Ben Bitdiddle Louis Reasoner
Cy D Fect Alyssa P Hacker
Cy D Fect Ben Bitdiddle

In the output above we do have duplicates. That is both A|B and B|A both appear in the output. To remove these duplicates we can add an additional constraint in the WHERE clause as follows:

SELECT a.Name, b.Name FROM records AS a, records AS b
WHERE a.Name < b.Name AND a.Name != b.Name AND a.Division = b.Division
Displaying records 1 - 10
Name Name
Alyssa P Hacker Ben Bitdiddle
Alyssa P Hacker Cy D Fect
Alyssa P Hacker Lem E Tweakit
Alyssa P Hacker Louis Reasoner
Ben Bitdiddle Cy D Fect
Ben Bitdiddle Lem E Tweakit
Ben Bitdiddle Louis Reasoner
Cy D Fect Lem E Tweakit
Cy D Fect Louis Reasoner
Eben Scrooge Robert Cratchet

4. Aggregation

So far, we have joined and manipulated individual rows using SELECT statements. But we can also perform aggregation operations over multiple rows with the same SELECT statements.

We can use the MAX, MIN, COUNT, and SUM functions to retrieve more information from our initial tables.

If we wanted to find the name and salary of the employee who makes the most money, we might say

SELECT Name, MAX(Salary) FROM records;
1 records
Name MAX(Salary)
Oliver Warbucks 150000

Using the special COUNT(*) syntax, we can count the number of rows in our table to see the number of employees at the company.

SELECT COUNT(*) FROM records
1 records
COUNT(*)
8

These commands can be performed on specific sets of rows in our table by using the GROUP BY [column name] clause. This clause takes all of the rows that have the same value in column name and groups them together.

We can find the minimum salary earned in each division of the company.

SELECT Division, MIN(Salary) FROM records GROUP BY Division
3 records
Division MIN(Salary)
Accounting 18000
Administration 150000
Computer 25000

These groupings can be additionally filtered by the HAVING clause. In contrast to the WHERE clause, which filters out rows, the HAVING clause filters out entire groups.

To find all titles that are held by more than one person, we say

SELECT Title FROM records GROUP BY Title HAVING COUNT(*) > 1;
1 records
Title
Programmer

4.1) Write a query that outputs each supervisor and the sum of salaries of all the employees they supervise.

SELECT Supervisor, SUM(SALARY) FROM records GROUP BY Supervisor
4 records
Supervisor SUM(SALARY)
Alyssa P Hacker 30000
Ben Bitdiddle 100000
Eben Scroge 18000
Oliver Warbucks 285000

4.2) Write a query that outputs the days of the week for which fewer than 5 employees have a meeting. You may assume no department has more than one meeting on a given day.

SELECT a.Division, b.Day, COUNT(a.Division) FROM records AS a, meetings AS b 
WHERE a.Division = b.Division GROUP BY b.Day HAVING COUNT(a.Division) < 5;
1 records
Division Day COUNT(a.Division)
Accounting Monday 3

4.3) Write a query that outputs all divisions for which there is more than one employee, and all pairs of employees within that division that have a combined salary less than 100,000.

SELECT a.Division, a.Name, a.Salary, b.Name, b.Salary, a.Salary + b.Salary FROM records AS a, records AS b 
WHERE a.Name != b.Name AND a.Division = b.Division AND a.Salary + b.Salary < 100000
ORDER BY a.Division
Displaying records 1 - 10
Division Name Salary Name Salary a.Salary + b.Salary
Accounting Eben Scrooge 75000 Robert Cratchet 18000 93000
Accounting Robert Cratchet 18000 Eben Scrooge 75000 93000
Computer Alyssa P Hacker 40000 Cy D Fect 35000 75000
Computer Alyssa P Hacker 40000 Lem E Tweakit 25000 65000
Computer Alyssa P Hacker 40000 Louis Reasoner 30000 70000
Computer Ben Bitdiddle 60000 Cy D Fect 35000 95000
Computer Ben Bitdiddle 60000 Lem E Tweakit 25000 85000
Computer Ben Bitdiddle 60000 Louis Reasoner 30000 90000
Computer Cy D Fect 35000 Alyssa P Hacker 40000 75000
Computer Cy D Fect 35000 Ben Bitdiddle 60000 95000

Just like in 3.3) here we can see that we have duplicates. For example, we have both pairs A|B and B|A. To not include duplicates we can do the following:

SELECT a.Division, a.Name, a.Salary, b.Name, b.Salary, a.Salary + b.Salary FROM records AS a, records AS b 
WHERE a.Name < b.Name AND a.Name != b.Name AND a.Division = b.Division AND a.Salary + b.Salary < 100000
ORDER BY a.Division
Displaying records 1 - 10
Division Name Salary Name Salary a.Salary + b.Salary
Accounting Eben Scrooge 75000 Robert Cratchet 18000 93000
Computer Alyssa P Hacker 40000 Cy D Fect 35000 75000
Computer Alyssa P Hacker 40000 Lem E Tweakit 25000 65000
Computer Alyssa P Hacker 40000 Louis Reasoner 30000 70000
Computer Ben Bitdiddle 60000 Cy D Fect 35000 95000
Computer Ben Bitdiddle 60000 Lem E Tweakit 25000 85000
Computer Ben Bitdiddle 60000 Louis Reasoner 30000 90000
Computer Cy D Fect 35000 Lem E Tweakit 25000 60000
Computer Cy D Fect 35000 Louis Reasoner 30000 65000
Computer Lem E Tweakit 25000 Louis Reasoner 30000 55000

5. Extra Questions

Use the following table called courses for the questions below:

CREATE TABLE courses AS
SELECT "Dan Garcia" AS Professor, "CS 61C" AS Course, "Sp19" AS Semester UNION
SELECT "John Denero"            , "CS 61A"          , "Fa18"             UNION 
SELECT "Dan Garcia"             , "CS 10"           , "Fa18"             UNION
SELECT "Josh Hug"               , "CS 61B"          , "Sp18"             UNION
SELECT "John Denero"            , "CS 61A"          , "Sp18"             UNION
SELECT "John Denero"            , "CS 61A"          , "Fa17"             UNION
SELECT "Paul Hilfinger"         , "CS 61A"          , "Fa17"             UNION  
SELECT "Paul Hilfinger"         , "CS 61A"          , "Sp17"             UNION
SELECT "John Denero"            , "Data 8"          , "Sp17"             UNION
SELECT "Josh Hug"               , "CS 61B"          , "Sp17"             UNION
SELECT "Satish Rao"             , "CS 70"           , "Sp17"             UNION
SELECT "Nicolas Weaver"         , "CS 61C"          , "Sp17"             UNION
SELECT "Gerald Friedland"       , "CS 61C"          , "Sp17";
SELECT * FROM courses
Displaying records 1 - 10
Professor Course Semester
Dan Garcia CS 10 Fa18
Dan Garcia CS 61C Sp19
Gerald Friedland CS 61C Sp17
John Denero CS 61A Fa17
John Denero CS 61A Fa18
John Denero CS 61A Sp18
John Denero Data 8 Sp17
Josh Hug CS 61B Sp17
Josh Hug CS 61B Sp18
Nicolas Weaver CS 61C Sp17

5.1 Create a table called num_taught that contains three columns: professor, the course they taught, and the number of times they taught each course.

Hint For this problem, it may help to GROUP BY multiple columns. Multiple columns and full expressions can appear in the group by clause, and groups will be formed for every unique combination of values that result.

CREATE TABLE num_taught AS
SELECT Professor, Course, COUNT(Course) AS Times FROM courses GROUP BY Professor, Course
SELECT * FROM num_taught
9 records
Professor Course Times
Dan Garcia CS 10 1
Dan Garcia CS 61C 1
Gerald Friedland CS 61C 1
John Denero CS 61A 3
John Denero Data 8 1
Josh Hug CS 61B 2
Nicolas Weaver CS 61C 1
Paul Hilfinger CS 61A 2
Satish Rao CS 70 1

5.2 Write a query that outputs two professors and a course if they have taught that course the same number of times. You may use the num_taught table you created in the previous question.

SELECT a.Professor, b.Professor, a.Course AS Course FROM num_taught AS a, num_taught AS b
WHERE a.Professor < b.Professor AND a.Professor != b.Professor AND a.Course = b.Course AND a.Times = b.Times
3 records
Professor Professor Course
Dan Garcia Gerald Friedland CS 61C
Dan Garcia Nicolas Weaver CS 61C
Gerald Friedland Nicolas Weaver CS 61C

5.3 Write a query that outputs two professors if they co-taught (taught the same course at the same time) the same course more than once.

Let’s take a quick look at the courses table

SELECT * FROM courses ORDER BY Semester
Displaying records 1 - 10
Professor Course Semester
John Denero CS 61A Fa17
Paul Hilfinger CS 61A Fa17
Dan Garcia CS 10 Fa18
John Denero CS 61A Fa18
Gerald Friedland CS 61C Sp17
John Denero Data 8 Sp17
Josh Hug CS 61B Sp17
Nicolas Weaver CS 61C Sp17
Paul Hilfinger CS 61A Sp17
Satish Rao CS 70 Sp17

Therefore we have the following

SELECT * FROM courses AS a, courses as b
WHERE a.Professor < b.Professor AND a.Professor != b.Professor AND a.Course = b.Course AND a.Semester = b.Semester
2 records
Professor Course Semester Professor Course Semester
Gerald Friedland CS 61C Sp17 Nicolas Weaver CS 61C Sp17
John Denero CS 61A Fa17 Paul Hilfinger CS 61A Fa17