Lesson 4
Complex Queries and Conditional Logic in PostgreSQL
Introduction and Overview

Welcome back! In our previous lessons, we've explored various ways to query and analyze data using PostgreSQL. We began by mastering text-based queries, moved on to understanding subqueries, and learned to create new columns and perform mathematical operations.

In this lesson, we'll dive deeper into complex queries and conditional logic:

  1. Using the CASE statement to categorize data.
  2. Combining multiple conditions in the WHERE clause.
  3. Enhancing queries with subqueries.

These techniques will help you make your data queries more powerful and insightful.

Dataset Review

For convenience, the tables and columns in our Marvel movies dataset are:

Movies Table

Plain text
1 movie_id | movie_name | release_date | phase 2----------+------------+--------------+-------

Movie Details Table

Plain text
1 movie_id | budget_million_usd | box_office_million_usd | imdb_rating | runtime_minutes 2----------+--------------------+------------------------+-------------+-----------------

Characters Table

Plain text
1 character_id | movie_id | character_name | actor | screen_time_minutes 2--------------+----------+----------------+-------+---------------------
Understanding the CASE Statement in PostgreSQL

The CASE statement allows you to perform conditional logic directly within your queries. It's similar to an IF-THEN-ELSE statement in programming languages.

The syntax of a CASE statement is as follows:

SQL
1SELECT 2 column1, 3 column2, 4 CASE 5 WHEN condition1 THEN result1 6 WHEN condition2 THEN result2 7 ... 8 ELSE resultN 9 END AS alias_name 10FROM 11 table_name;
  • CASE: Begins the conditional logic block.
  • WHEN condition1 THEN result1: If condition1 is true, the statement returns result1.
  • WHEN condition2 THEN result2: If condition2 is true, the statement returns result2.
  • ELSE resultN: If none of the conditions are true, the statement returns resultN. This part is optional but recommended for completeness.
  • END: Ends the CASE block.
  • AS alias_name: (Optional) Provides a temporary name for the resulting column.

Let's see a real-world application of the CASE statement. We'll categorize movies based on their box office earnings into three categories: Flop, Hit, and Blockbuster.

Categorizing Movies by Box Office Earnings

We want to categorize movies as either a "Flop", "Hit", or "Blockbuster" based on box office sales. The query is:

SQL
1SELECT 2 movies.movie_name, 3 CASE 4 WHEN movie_details.box_office_million_usd < 400 THEN 'Flop' 5 WHEN movie_details.box_office_million_usd BETWEEN 400 AND 1000 THEN 'Hit' 6 ELSE 'Blockbuster' 7 END AS success_category 8FROM 9 movies 10INNER JOIN 11 movie_details ON movies.movie_id = movie_details.movie_id;

Let's break down this query:

  • We use INNER JOIN to combine data from the movies and movie_details tables based on movie_id.
  • The SELECT statement retrieves movie names.
  • CASE begins the conditional logic block to categorize the movies based on their box office earnings.
    • WHEN movie_details.box_office_million_usd < 400 THEN 'Flop': If the movie's box office earnings are less than 400 million USD, the movie is categorized as 'Flop'.
    • WHEN movie_details.box_office_million_usd BETWEEN 400 AND 1000 THEN 'Hit': If the movie's box office earnings are between 400 and 1000 million USD (inclusive), the movie is categorized as 'Hit'.
    • ELSE 'Blockbuster': If the movie's box office earnings are greater than 1000 million USD, the movie is categorized as 'Blockbuster'.
    • END AS success_category: Ends the CASE block and gives the result column the alias success_category.

The output is:

Plain text
1 movie_name | success_category 2---------------------------------------------+------------------ 3 Iron Man | Hit 4 The Incredible Hulk | Flop 5 Iron Man 2 | Hit 6 Thor | Hit 7 Captain America: The First Avenger | Flop 8 The Avengers | Blockbuster 9 Iron Man 3 | Blockbuster 10 Thor: The Dark World | Hit 11 Captain America: The Winter Soldier | Hit 12 Guardians of the Galaxy | Hit 13 Avengers: Age of Ultron | Blockbuster 14 Ant-Man | Hit 15 Captain America: Civil War | Blockbuster 16 Doctor Strange | Hit 17 Guardians of the Galaxy Vol. 2 | Hit 18 Spider-Man: Homecoming | Hit 19 Thor: Ragnarok | Hit 20 Black Panther | Blockbuster 21 Avengers: Infinity War | Blockbuster 22 Ant-Man and The Wasp | Hit 23 Captain Marvel | Blockbuster 24 Avengers: Endgame | Blockbuster 25 Spider-Man: Far From Home | Blockbuster 26 Black Widow | Flop 27 Shang-Chi and the Legend of the Ten Rings | Hit 28 Eternals | Hit 29 Spider-Man: No Way Home | Blockbuster 30 Doctor Strange in the Multiverse of Madness | Blockbuster 31 Thor: Love and Thunder | Hit 32 Black Panther: Wakanda Forever | Hit 33 Ant-Man and The Wasp: Quantumania | Hit 34 Guardians of the Galaxy Vol. 3 | Hit 35 The Marvels | Flop 36(33 rows)

From the output, we can see that "The Incredible Hulk" is listed as a 'Flop', "Iron Man" is listed as a 'Hit', and "The Avengers" is listed as a 'Blockbuster'.

This example demonstrates how you can use the CASE statement to derive meaningful categories from numerical data.

Putting It All Together: Super Query

Congratulations on reaching the last lesson of the "Intro to SQL with PostgreSQL" course path! Now let's put your PostgreSQL super powers to the test. In this example, we combine every query technique we have learned.

We want to create a table with the following columns:

  • Movie Name: The name of the movie.
  • Success Level: Categorized as 'Flop', 'Hit', or 'Blockbuster' based on the box office earnings.
  • Profit in Millions: The difference between the box office earnings and the budget, representing the movie's profit.
  • Average Screen Time: The rounded average screen time of characters in the movie, displayed in minutes with 'mins' appended.

Only movies with an IMDb rating higher than average should be included, and the output should be sorted by profit in descending order.

Let's take a look at this super query:

SQL
1SELECT 2 movies.movie_name, 3 CASE 4 WHEN movie_details.box_office_million_usd < 400 THEN 'Flop' 5 WHEN movie_details.box_office_million_usd BETWEEN 400 AND 1000 THEN 'Hit' 6 ELSE 'Blockbuster' 7 END AS "Success Level", 8 movie_details.box_office_million_usd - movie_details.budget_million_usd AS "Profit in Millions", 9 CONCAT(ROUND(AVG(characters.screen_time_minutes),0), ' mins') AS "Average Screen Time" 10FROM 11 movies 12INNER JOIN 13 movie_details ON movies.movie_id = movie_details.movie_id 14INNER JOIN 15 characters ON movies.movie_id = characters.movie_id 16WHERE 17 movie_details.imdb_rating > ( 18 SELECT AVG(movie_details.imdb_rating) 19 FROM movie_details) 20GROUP BY movies.movie_name, movie_details.box_office_million_usd, movie_details.budget_million_usd 21ORDER BY "Profit in Millions" DESC;

Let's break this query down line by line:

SELECT Clause:

  • Retrieves the movie_name.
  • Uses a CASE statement to determine the "Success Level" of each movie based on its box_office_million_usd:
    • Flop for earnings less than 400 million USD.
    • Hit for earnings between 400 and 1000 million USD.
    • Blockbuster for earnings greater than 1000 million USD.
  • Computes the "Profit in Millions" by subtracting the budget_million_usd from the box_office_million_usd.
  • Calculates the "Average Screen Time" for characters in the movie by rounding the average screen_time_minutes to the nearest whole number and appending 'mins'.

FROM Clause:

  • Uses the movies table as the base dataset.

INNER JOIN:

  • Combines movies with movie_details using movie_id.
  • Combines movies with characters also using movie_id.

WHERE Clause:

  • Filters the results to include only movies where the imdb_rating is higher than the average IMDb rating of all movies.
  • The subquery (SELECT AVG(movie_details.imdb_rating) FROM movie_details) calculates the average IMDb rating across all movies.

GROUP BY Clause:

  • Groups the results by movie_name, box_office_million_usd, and budget_million_usd to aggregate the average screen time correctly. When using aggregate functions, all non-aggregate columns in the SELECT statement must be included in the GROUP BY clause.

ORDER BY Clause:

  • Orders the results based on "Profit in Millions" in descending order to list the most profitable movies first.

The output is:

Plain text
1 movie_name | Success Level | Profit in Millions | Average Screen Time 2---------------------------------------------+---------------+--------------------+--------------------- 3 Avengers: Endgame | Blockbuster | 2441.8 | 66 mins 4 Spider-Man: No Way Home | Blockbuster | 1735.4 | 56 mins 5 Avengers: Infinity War | Blockbuster | 1727.4 | 67 mins 6 Doctor Strange in the Multiverse of Madness | Blockbuster | 1414.7 | 55 mins 7 The Avengers | Blockbuster | 1299.6 | 51 mins 8 Spider-Man: Far From Home | Blockbuster | 971.9 | 54 mins 9 Captain America: Civil War | Blockbuster | 903.3 | 58 mins 10 Thor: Ragnarok | Hit | 670.8 | 59 mins 11 Guardians of the Galaxy Vol. 2 | Hit | 663.8 | 52 mins 12 Guardians of the Galaxy | Hit | 603.3 | 44 mins 13 Guardians of the Galaxy Vol. 3 | Hit | 595 | 55 mins 14 Captain America: The Winter Soldier | Hit | 544.3 | 48 mins 15 Thor: Love and Thunder | Hit | 514.3 | 56 mins 16 Doctor Strange | Hit | 512.7 | 46 mins 17 Iron Man | Hit | 445.2 | 34 mins 18 Shang-Chi and the Legend of the Ten Rings | Hit | 230.0 | 62 mins 19(16 rows)

The output is a list of movies including their name, success level, profit in millions, and average screen time, filtered to include only those movies with above-average IMDb ratings and sorted by profit in descending order.

Notice that none of the movies are labeled as a flop. Remember that we filtered out all movies with an IMDb rating less than average. From our output, we know that any movie labeled as a flop was excluded from our table. This means that all movies that made less than 400 million dollars in profit have an IMDb rating lower than average.

Summary and Next Steps

Congratulations on reaching the final lesson in the "Intro to SQL with PostgreSQL" course path. You have now mastered the key techniques for dealing with complex queries and conditional logic in PostgreSQL. You've come a long way in unlocking your PostgreSQL Super Powers! Keep exploring and practicing to further enhance your skills.

Enjoy this lesson? Now it's time to practice with Cosmo!
Practice is how you turn knowledge into actual skills.