Lesson 2
Mastering Subqueries for Data Analysis
Introduction and Overview

Welcome back! In this lesson, we will focus on mastering subqueries for data analysis in PostgreSQL. Subqueries, also known as inner queries or nested queries, allow you to perform more complex queries by embedding one query inside another. This technique is particularly useful for comparing aggregated data, filtering results, and more. Let’s dive in!

Dataset Review

For convenience, the tables and columns in our Marvel movies dataset are:

Movies Table

Plain text
1 movie_id | movie_name | release_date | phase 2----------+------------+--------------+-------

Movie Details Table

Plain text
1 movie_id | budget_million_usd | box_office_million_usd | imdb_rating | runtime_minutes 2----------+--------------------+------------------------+-------------+-----------------

Characters Table

Plain text
1 character_id | movie_id | character_name | actor | screen_time_minutes 2--------------+----------+----------------+-------+---------------------
Understanding Joining Tables

Before we explore subqueries, let's briefly review joining tables in SQL. Joining tables is a fundamental operation used to combine data from two or more tables based on a related column.

Consider the following example:

SQL
1SELECT 2 movies.movie_name, 3 movie_details.imdb_rating 4FROM 5 movies 6INNER JOIN 7 movie_details ON movies.movie_id = movie_details.movie_id;

In this example:

  • We select movie_name from the movies table and imdb_rating from the movie_details table
  • We join the movies table with the movie_details table using the movie_id column, which is common to both tables.

The output is:

Plain text
1 movie_name | imdb_rating 2---------------------------------------------+------------- 3 Iron Man | 7.9 4 The Incredible Hulk | 6.7 5 Iron Man 2 | 7.0 6 Thor | 7.0 7 Captain America: The First Avenger | 6.9 8 The Avengers | 8.0 9 Iron Man 3 | 7.2 10 Thor: The Dark World | 6.9 11 Captain America: The Winter Soldier | 7.7 12 Guardians of the Galaxy | 8.0 13 Avengers: Age of Ultron | 7.3 14 Ant-Man | 7.3 15 Captain America: Civil War | 7.8 16 Doctor Strange | 7.5 17 Guardians of the Galaxy Vol. 2 | 7.6 18 Spider-Man: Homecoming | 7.4 19 Thor: Ragnarok | 7.9 20 Black Panther | 7.3 21 Avengers: Infinity War | 8.4 22 Ant-Man and The Wasp | 7.1 23 Captain Marvel | 6.9 24 Avengers: Endgame | 8.4 25 Spider-Man: Far From Home | 7.5 26 Black Widow | 6.8 27 Shang-Chi and the Legend of the Ten Rings | 7.6 28 Eternals | 6.8 29 Spider-Man: No Way Home | 8.4 30 Doctor Strange in the Multiverse of Madness | 7.8 31 Thor: Love and Thunder | 7.5 32 Black Panther: Wakanda Forever | 7.3 33 Ant-Man and The Wasp: Quantumania | 6.2 34 Guardians of the Galaxy Vol. 3 | 8.1 35 The Marvels | 6.1 36(33 rows)

The result table lists movie names and their respective IMDb rating.

Introduction to Subqueries and Syntax

Subqueries are queries nested within another SQL query. They can be used in various places such as SELECT, FROM, WHERE, and others. Subqueries are useful for situations where you need to perform multiple steps to retrieve the desired data.

The syntax of subqueries is:

SQL
1SELECT column1, column2, ... 2FROM table_name1 3WHERE column_name operator (SELECT column1 4 FROM table_name2 5 WHERE condition);

Let's take a look at each part of this query:

Outer Query

  • SELECT column1, column2, ...: Specifies the columns you want to retrieve from table_name1.
  • FROM table_name1: Specifies the main table from which you want to fetch the data.
  • WHERE column_name operator: Begins the WHERE clause that will filter the rows based on a condition involving a subquery.

Subquery (Inner Query)

  • SELECT column1: Specifies the column you want to retrieve in the subquery.
  • FROM table_name2: Specifies the table from which the subquery will fetch the data.
  • WHERE condition: Applies a condition to filter the rows in the subquery.

Operator:

  • The operator in the WHERE clause can be a comparison operator (such as =, <, >, <=, >=, <>) or other operators like IN, EXISTS.
Subquery Example

With this foundation of subquery syntax, let's take a look at an example. One common use of subqueries is to compare values against an average or other aggregate results. Let's form a query to find the names of movies that have an IMDb rating greater than the average IMDb rating of all the movies.

First we need a query to find the average IMDb rating.

SQL
1SELECT 2 AVG(imdb_rating) 3FROM 4 movie_details;

The output of this query is:

Plain text
1 avg 2-------------------- 3 7.4030303030303030 4(1 row)

The average IMDb rating of the movies is approximately 7.4.

We will use the resulting average to filter movies in the outer query.

SQL
1SELECT 2 movie_name, 3 imdb_rating 4FROM 5 movies 6INNER JOIN 7 movie_details ON movies.movie_id = movie_details.movie_id 8WHERE 9 imdb_rating > ( 10 SELECT 11 AVG(imdb_rating) 12 FROM 13 movie_details 14 );

In this final query:

  • We join the movies and movie_details tables using the movie_id column.
  • We use a subquery to calculate the average IMDb rating
  • We filter movies that have an IMDb rating greater than the average.

The output is

Plain text
1 movie_name | imdb_rating 2---------------------------------------------+------------- 3 Iron Man | 7.9 4 The Avengers | 8.0 5 Captain America: The Winter Soldier | 7.7 6 Guardians of the Galaxy | 8.0 7 Captain America: Civil War | 7.8 8 Doctor Strange | 7.5 9 Guardians of the Galaxy Vol. 2 | 7.6 10 Thor: Ragnarok | 7.9 11 Avengers: Infinity War | 8.4 12 Avengers: Endgame | 8.4 13 Spider-Man: Far From Home | 7.5 14 Shang-Chi and the Legend of the Ten Rings | 7.6 15 Spider-Man: No Way Home | 8.4 16 Doctor Strange in the Multiverse of Madness | 7.8 17 Thor: Love and Thunder | 7.5 18 Guardians of the Galaxy Vol. 3 | 8.1 19(16 rows)

This output shows the movie name and IMDb rating for every movie greater than the average IMDb rating (7.4).

This powerful combination of JOIN and subqueries allows us to perform complex data analysis with ease.

Summary and Next Steps

Congratulations! You have successfully learned how to use subqueries in PostgreSQL to achieve more complex data analysis tasks. Here's a quick recap of what we covered:

  • The importance and usage of JOIN operations.
  • The syntax of subqueries
  • A step-by-step breakdown of comparing values against an average using subqueries.

Now it’s time to apply what you've learned with some practice exercises. Happy querying!

Enjoy this lesson? Now it's time to practice with Cosmo!
Practice is how you turn knowledge into actual skills.