Welcome back! In this lesson, we will focus on mastering subqueries for data analysis in PostgreSQL. Subqueries, also known as inner queries or nested queries, allow you to perform more complex queries by embedding one query inside another. This technique is particularly useful for comparing aggregated data, filtering results, and more. Let’s dive in!
For convenience, the tables and columns in our Marvel movies dataset are:
Movies Table
Plain text1 movie_id | movie_name | release_date | phase 2----------+------------+--------------+-------
Movie Details Table
Plain text1 movie_id | budget_million_usd | box_office_million_usd | imdb_rating | runtime_minutes 2----------+--------------------+------------------------+-------------+-----------------
Characters Table
Plain text1 character_id | movie_id | character_name | actor | screen_time_minutes 2--------------+----------+----------------+-------+---------------------
Before we explore subqueries, let's briefly review joining tables in SQL. Joining tables is a fundamental operation used to combine data from two or more tables based on a related column.
Consider the following example:
SQL1SELECT
2 movies.movie_name,
3 movie_details.imdb_rating
4FROM
5 movies
6INNER JOIN
7 movie_details ON movies.movie_id = movie_details.movie_id;
In this example:
- We select
movie_name
from themovies
table andimdb_rating
from themovie_details
table - We join the
movies
table with themovie_details
table using themovie_id
column, which is common to both tables.
The output is:
Plain text1 movie_name | imdb_rating 2---------------------------------------------+------------- 3 Iron Man | 7.9 4 The Incredible Hulk | 6.7 5 Iron Man 2 | 7.0 6 Thor | 7.0 7 Captain America: The First Avenger | 6.9 8 The Avengers | 8.0 9 Iron Man 3 | 7.2 10 Thor: The Dark World | 6.9 11 Captain America: The Winter Soldier | 7.7 12 Guardians of the Galaxy | 8.0 13 Avengers: Age of Ultron | 7.3 14 Ant-Man | 7.3 15 Captain America: Civil War | 7.8 16 Doctor Strange | 7.5 17 Guardians of the Galaxy Vol. 2 | 7.6 18 Spider-Man: Homecoming | 7.4 19 Thor: Ragnarok | 7.9 20 Black Panther | 7.3 21 Avengers: Infinity War | 8.4 22 Ant-Man and The Wasp | 7.1 23 Captain Marvel | 6.9 24 Avengers: Endgame | 8.4 25 Spider-Man: Far From Home | 7.5 26 Black Widow | 6.8 27 Shang-Chi and the Legend of the Ten Rings | 7.6 28 Eternals | 6.8 29 Spider-Man: No Way Home | 8.4 30 Doctor Strange in the Multiverse of Madness | 7.8 31 Thor: Love and Thunder | 7.5 32 Black Panther: Wakanda Forever | 7.3 33 Ant-Man and The Wasp: Quantumania | 6.2 34 Guardians of the Galaxy Vol. 3 | 8.1 35 The Marvels | 6.1 36(33 rows)
The result table lists movie names and their respective IMDb rating.
Subqueries are queries nested within another SQL query. They can be used in various places such as SELECT
, FROM
, WHERE
, and others. Subqueries are useful for situations where you need to perform multiple steps to retrieve the desired data.
The syntax of subqueries is:
SQL1SELECT column1, column2, ... 2FROM table_name1 3WHERE column_name operator (SELECT column1 4 FROM table_name2 5 WHERE condition);
Let's take a look at each part of this query:
Outer Query
SELECT column1, column2, ...
: Specifies the columns you want to retrieve fromtable_name1
.FROM table_name1
: Specifies the main table from which you want to fetch the data.WHERE column_name operator
: Begins theWHERE
clause that will filter the rows based on a condition involving a subquery.
Subquery (Inner Query)
SELECT column1
: Specifies the column you want to retrieve in the subquery.FROM table_name2
: Specifies the table from which the subquery will fetch the data.WHERE condition
: Applies a condition to filter the rows in the subquery.
Operator:
- The
operator
in theWHERE
clause can be a comparison operator (such as=
,<
,>
,<=
,>=
,<>
) or other operators likeIN
,EXISTS
.
With this foundation of subquery syntax, let's take a look at an example. One common use of subqueries is to compare values against an average or other aggregate results. Let's form a query to find the names of movies that have an IMDb rating greater than the average IMDb rating of all the movies.
First we need a query to find the average IMDb rating.
SQL1SELECT 2 AVG(imdb_rating) 3FROM 4 movie_details;
The output of this query is:
Plain text1 avg 2-------------------- 3 7.4030303030303030 4(1 row)
The average IMDb rating of the movies is approximately 7.4.
We will use the resulting average to filter movies in the outer query.
SQL1SELECT
2 movie_name,
3 imdb_rating
4FROM
5 movies
6INNER JOIN
7 movie_details ON movies.movie_id = movie_details.movie_id
8WHERE
9 imdb_rating > (
10 SELECT
11 AVG(imdb_rating)
12 FROM
13 movie_details
14 );
In this final query:
- We join the
movies
andmovie_details
tables using themovie_id
column. - We use a subquery to calculate the average IMDb rating
- We filter movies that have an IMDb rating greater than the average.
The output is
Plain text1 movie_name | imdb_rating 2---------------------------------------------+------------- 3 Iron Man | 7.9 4 The Avengers | 8.0 5 Captain America: The Winter Soldier | 7.7 6 Guardians of the Galaxy | 8.0 7 Captain America: Civil War | 7.8 8 Doctor Strange | 7.5 9 Guardians of the Galaxy Vol. 2 | 7.6 10 Thor: Ragnarok | 7.9 11 Avengers: Infinity War | 8.4 12 Avengers: Endgame | 8.4 13 Spider-Man: Far From Home | 7.5 14 Shang-Chi and the Legend of the Ten Rings | 7.6 15 Spider-Man: No Way Home | 8.4 16 Doctor Strange in the Multiverse of Madness | 7.8 17 Thor: Love and Thunder | 7.5 18 Guardians of the Galaxy Vol. 3 | 8.1 19(16 rows)
This output shows the movie name and IMDb rating for every movie greater than the average IMDb rating (7.4).
This powerful combination of JOIN
and subqueries allows us to perform complex data analysis with ease.
Congratulations! You have successfully learned how to use subqueries in PostgreSQL to achieve more complex data analysis tasks. Here's a quick recap of what we covered:
- The importance and usage of
JOIN
operations. - The syntax of subqueries
- A step-by-step breakdown of comparing values against an average using subqueries.
Now it’s time to apply what you've learned with some practice exercises. Happy querying!