MacOS Installation and Configuration

  • Login to MacOS as user with Administrative rights
  • Install: brew install postres
  • Login to PostgreSQL as a admin user: sudo -u user_name sql db_name
  • Note: PostgreSQL and MacOS admin users are different and independent.
  • Creating a db_user: CREATE ROLE user_name WITH LOGIN PASSWORD 'your secret here’;
  • Changing permissions: ALTER ROLE user_name CREATEDB;
  • Connecting to a server running as a regular user
  • pgAdmin > right-click server > Create > Sever
    • Main tab: Name it
    • Connection tab: Specify IP address (e.g. 127.0.0.1) > admin db-user role

Invoke psql command shell

  • psql -d <database> -U <user> -W <password>
  • Note: Add -E to echo queries for learning
Command What it does
\l list databases
\c <database> connect to a database
\dt list tables
\dt *.* list selected tables
\d <table> show table definition
\dv list views
\dn list schemas
\df list functions
\df+ <function> show SQL of function
\h psql help
\? <psql cmd> psql command help
\h <sql> sql command help
\i <script> runs the sql script file
\q quit psql
q quit current task
\x better looking output

Troubleshooting

  • Finding all postgres processes: ps aux | grep postgres
  • Finding process using a port: lsof -i tcp:5432
  • Finding process using a port: netstat -vanp tÂcp | grep 5432
  • Finding all running processes: ps -A

Starting and Stopping the Server on MacOS

  • start and stop the server: su - <admin_user> #
  • create aliases to manage the server storing them in ~/.bash_profile
    • alias pgq='pg_ctl -D /usr/local/var/postgres stop -s -m fast'
    • alias pgs='pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start'
  • Use psql to work with PostgreSQL on the command line or the pgAdmin4 GUI application (available as a homebrew cask)

Creating a tables

  • Using pgAdmin
  • Database: Right-click on Databases > Create > Database
  • Double-click on database > Double click on Schemes > Right-click on tables > Create
  • Columns: Right-click on table > Create > Column > Give it a name > Give it a type
  • Constraint: Right-click on table > Properties > Constraints tab > Add an constraint > Edit icon > Name it > Select Definition tab > Select column
  • Using psql command line
CREATE TABLE animals
(
    species character varying (25),
    vertebrate_class character varying(25),
    appearance character varying(25),
    num_legs int4,
    CONSTRAINT animal_pkey PRIMARY KEY (species)
);

Import data

  • pgadmin: right-click table > Import / Export > Define location, delimiter, header … > OK

  • Check by showing the data

SELECT *
FROM pets;

Statistics

Limit number of results:

SELECT *
FROM staff
LIMIT 10;

Counting and grouping results

SELECT gender, count(*)
FROM staff
GROUP BY gender;

Finding the max and min

SELECT max(salary), min(salary)
FROM staff;

Apply that across all departments

SELECT department, max(salary), min(salary)
FROM staff
GROUP BY department;

Aggregate and round values

SELECT department, round(sum(salary), 0), round(avg(salary), 0), round(var_pop(salary), 0), round(stddev_pop(salary), 0)
FROM staff
GROUP BY department;

Truncating, ceil and rounding

SELECT department, avg(salary), trunc(avg(salary),2 ), ceil(avg(salary)), round(avg(salary), 2)
FROM staff
GROUP BY department;

For more information see: PostrgreSQL Aggregation Functions Documentation

Classification

Use CASE to show pet names and a column to indicate whether the pet's name is long or short (a long name is strictly more than 6 characters long). Filter to select only female pets.

SELECT name ,
    CASE WHEN length(name) > 6
    THEN 'long'
    ELSE 'short' END
FROM pets
WHERE gender = 'female';

Filtering

Find data that meets a condition

SELECT last_name, department, salary
FROM staff
WHERE salary > 100000;

Adding multiple conditions to the WHERE clause

SELECT last_name, department, salary
FROM staff
WHERE department = 'Tools' AND salary > 100000;

Wild card in where clause

SELECT last_name, department, salary
FROM staff
WHERE department
LIKE 'B%';

Combining filters and aggregates

SELECT department, sum(salary)
FROM staff WHERE department
LIKE 'B%'
GROUP BY department;
-- NOTE:  Slow queries result from putting ‘&’ first as scans every row ignoring the index

Find distinct values

SELECT DISTINCT LOWER(department)
FROM staff;

Munging

Concatenating values

SELECT job_title || '-' || department
FROM staff;

… and create a new column

SELECT job_title || '-' || department title_dept
FROM staff;

Trim and length functions

SELECT length(' software engineer '), length(trim(' software engineer '));

Create new boolean field

SELECT job_title, (job_title LIKE '%Assistant%') is_assist
FROM staff;

Extracting substring of length 3

SELECT SUBSTRING('abcdefghijkl'
FROM 6 FOR 3) test_string;

Extracting remainder of a string

SELECT SUBSTRING('abcdefghijkl' FOR 4) test_string;

Replacing text

SELECT OVERLAY(job_title PLACING 'Asst.'
FROM 1 FOR 9)
FROM staff
WHERE job_title
LIKE 'Assistant%';

Regex

Find assistants at specified levels

SELECT job_title
FROM staff
WHERE job_title
SIMILAR TO '%Assistant%(III|IV)';

… now match levels starting with I

SELECT job_title
FROM staff
HERE job_title
SIMILAR TO '%Assistant% I_';

… or job title that start with E, P or S

SELECT job_title
FROM staff
WHERE job_title
SIMILAR TO '[EPS]%';

Sub-Queries

Consider using a view and or a windowing functions for simpler way to implement the same functionality.

Adding a calculated field

SELECT s1.last_name, s1.salary, s1.department,
    (SELECT round(avg(salary))
    FROM staff s2
    WHERE s2.department = s1.department )
FROM staff s1;

Using a sub-query instead of a table name

SELECT s1.department, round(avg(s1.salary))
FROM
    (SELECT department, salary
    FROM staff
    WHERE salary > 100000) s1
GROUP BY department;

Using a sub-query to find max salary

SELECT s1.department, s1.last_name, s1.salary
FROM staff s1
WHERE s1.salary = (SELECT max(s2.salary)
FROM staff s2);

For more information see: PostgreSQL SubQuery Expressions Documentation

Joining Tables

Join two tables using department name

SELECT s.last_name, s.department, cd.company_division
FROM staff
JOIN company_divisions cd
ON s.department = cd.department;
    -- Note:  it is missing rows because not all departments
    -- in the staff table are found in the company_divisions table,
    -- so we need to prefer the division from the staff table.
    -- This is called a left outer join.

Left outer-join gets all rows

SELECT s.last_name, s.department, cd.company_division
FROM staff s
LEFT JOIN company_divisions cd
ON s.department = cd.department;

… find departments without a division

SELECT s.last_name, s.department, cd.company_division
FROM staff s
LEFT JOIN company_divisions cd
ON s.department = cd.department
WHERE cd.company_division IS NULL;
    -- Note:  Turns out it is the Books department

Using Views

Save a long join by creating a view

CREATE VIEW staff_div_reg AS
    SELECT s.*, cd.company_division, cr.company_regions
    FROM staff s
    LEFT JOIN company_divisions cd
    ON s.department = cd.department
    LEFT JOIN company_regions cr
    ON s.region_id = cr.region_id;

Check that all rows are present in the view

SELECT count(*)
FROM staff_div_reg;

Query to be stored as a view

SELECT company_regions, count(*)
FROM staff_div_reg
GROUP BY company_regions
ORDER BY company_regions;

Use view to provide counts of staff in each region

CREATE OR REPLACE VIEW staff_div_reg_country
AS
    SELECT s.*, cd.company_division, cr.company_regions, cr.country
    FROM staff s, company_divisions cd, company_regions cr;

Create a view that joins user and sales tables

CREATE VIEW sales_by_person
AS
    SELECT s.name, s.id, o.amount
    FROM salesperson s
    LEFT JOIN orders o
    ON s.id = o.salesperson_id;
        -- Note: Every time the view is used
        --       its SQL is re-evaluated.

Create a materialized view that is stored on the server

CREATE MATERIALIZED VIEW sales_by_person
AS
    SELECT s.name, s.id, o.amount
    FROM salesperson s
    LEFT JOIN orders o
    ON s.id = o.salesperson_id;
        -- Note: Once created materialized views
        --       can be queried many times but are
        --       not automatically updated.

Updating a materialized view

REFRESH MATERIALIZED VIEW sales_by_person;

Report number and amount of successful sales by name

SELECT name, count(amount), sum(amount)
FROM sales_by_person
GROUP BY name
HAVING count(amount) > 1
ORDER BY name;

For more information see: * PostgreSQL Views and the Rule System Documentation * PostgreSQL Materialized Views Documentation

Temporary Tables

Temporary tables are alternative to Views and Subqueries. Here are two ways to create a temporary table in SQL.

Create and drop a temporary table

-- create a temporary table
CREATE TABLE temp_table AS
(
    SELECT col1, col2, col3
    FROM another_table;
)

-- select the data you need
SELECT tt.col1, tt.col2
FROM temp_table AS tt;

-- delete the temporary table from the database
DROP TABLE temp_table;

True temporary table created using WITH

-- create the temporary table
WITH temp_table AS
(
    SELECT col1, col2, col3
    FROM another_table
);

-- select the data you need
SELECT tt.col1, tt.col2
FROM temp_table AS tt;

Grouping & Totaling

Breakout sales number and amount of sales by name

SELECT s.name, count(o.amount), sum(o.amount)
FROM salesperson s
LEFT JOIN orders o
ON s.id = o.salesperson_id
GROUP BY s.name
HAVING count(o.amount) > 1
ORDER BY s.name;

Breakout the total number of employees by division and region using grouping set sql SELECT company_division, company_regions, count(*) FROM staff_div_reg GROUP BY GROUPING SETS (company_division, company_regions) ORDER BY company_regions, company_division; -- Note: Cells are blank when sub-totaled by another quantity

Create subtotals using rollup

SELECT company_regions, country, count(*)
FROM staff_div_reg_country
GROUP BY ROLLUP(country, company_regions)
ORDER BY country, company_regions;

Create all possible subtotals using cube

SELECT company_division, company_regions, count(*)
FROM staff_div_reg_country
GROUP BY CUBE(company_division, company_regions);

For more information see: PostgreSQL Grouping, Cube and Rollup Documentation

Sorting / Ordering

Find the top N values

SELECT last_name, job_title, salary
FROM staff
ORDER BY salary DESC
FETCH FIRST 10 ROWS ONLY;
    -- Note:  LIMIT get the first N elements, FETCH scans all
    --        the rows and get the highest N values.

Ascending ordering by count

SELECT company_division, count(*)
FROM staff_div_reg_country
GROUP BY company_division
ORDER BY count(*);

Descending order by count

SELECT company_division, count(*)
FROM staff_div_reg_country
GROUP BY company_division
ORDER BY count(*) DESC;

Limit to first 3 rows

SELECT company_division, count(*)
FROM staff_div_reg_country GROUP BY company_division
ORDER BY count(*) DESC
FETCH FIRST 3 ROWS ONLY;

Sorting by date

SELECT facility_id, bed_census_date
FROM beds
WHERE facility_id = '6057'
ORDER BY bed_census_date::timestamp ASC;

Finding the largest totals by sorting an aggregated field

SELECT facility_name, facility_id, bed_census_date, sum(available_residential_beds::integer)
FROM beds
GROUP BY facility_name, facility_id, bed_census_date
ORDER BY sum(available_residential_beds::integer) DESC
FETCH FIRST 10 ROWS ONLY;

For more information see: PostgreSQL Sorting Documentation

Window Functions

Window functions are simpler than subqueries and produce similar results they operate rows adjacent to the current row.

SELECT company_regions, last_name, salary, min(salary)
OVER
    (
        PARTITION BY company_regions
    )
FROM staff_div_reg;

Employees by department sorted by salary

SELECT department, last_name, salary, first_value(salary)
OVER
    (
        PARTITION BY department
        ORDER BY salary DESC
    )
FROM staff;

Employees by department sorted by last_name

SELECT department, last_name, salary, first_value(salary)
OVER
    (
        PARTITION BY department
        ORDER BY last_name
    )
FROM staff;

Ranking employees within a department by salary

SELECT department, last_name, salary, rank()
OVER
(
    PARTITION BY department
    ORDER BY salary DESC
)
FROM staff;

Lag function references previous row

SELECT department, last_name, salary, lag(salary)
OVER
    (
        PARTITION BY department
        ORDER BY salary DESC
    )
FROM staff;

Lead function references the next row

SELECT department, last_name, salary, lead(salary)
OVER
(
    PARTITION BY department
    ORDER BY salary DESC
)
FROM staff;

Calculating salary quartiles using ntile

SELECT department, last_name, salary, ntile(4)
OVER
(
    PARTITION BY department
    ORDER BY salary DESC
)
FROM staff;

For more information see: PostgreSQL Windowing Function Expressions Documentation

Telling Stories with Data

  • Start with a problem
    • Losing customers
    • Product sales decreasing
  • Gather data it will take time, figure in more time when using different data sources to make them all consistently encoded
  • Understand your data using statistics
  • Reformat and check it before attempting to do joins
  • Use views to capture complex SQL statements
  • Use joins
    • Inner joins only return row that both table share
    • Use an outer join if you want all rows from one table even if they are missing from the other table
  • For cross tabulations use cubes, rollups, and grouping sets rather than subqueries
  • Use window functions to work on sets of related rows

Resources