Welcome back! In this lesson, we will focus on mastering subqueries for data analysis in PostgreSQL. Subqueries, also known as inner queries or nested queries, allow you to perform more complex queries by embedding one query inside another. This technique is particularly useful for comparing aggregated data, filtering results, and more. Let’s dive in!
For convenience, the tables and columns in our Marvel movies dataset are:
Movies Table
Movie Details Table
Characters Table
Before we explore subqueries, let's briefly review joining tables in SQL. Joining tables is a fundamental operation used to combine data from two or more tables based on a related column.
Consider the following example:
In this example:
- We select
movie_name
from themovies
table andimdb_rating
from themovie_details
table - We join the
movies
table with themovie_details
table using themovie_id
column, which is common to both tables.
The output is:
The result table lists movie names and their respective IMDb rating.
Subqueries are queries nested within another SQL query. They can be used in various places such as SELECT
, FROM
, WHERE
, and others. Subqueries are useful for situations where you need to perform multiple steps to retrieve the desired data.
The syntax of subqueries is:
Let's take a look at each part of this query:
Outer Query
SELECT column1, column2, ...
: Specifies the columns you want to retrieve fromtable_name1
.FROM table_name1
: Specifies the main table from which you want to fetch the data.WHERE column_name operator
: Begins theWHERE
clause that will filter the rows based on a condition involving a subquery.
Subquery (Inner Query)
SELECT column1
: Specifies the column you want to retrieve in the subquery.FROM table_name2
: Specifies the table from which the subquery will fetch the data.WHERE condition
: Applies a condition to filter the rows in the subquery.
Operator:
- The
operator
in theWHERE
clause can be a comparison operator (such as=
,<
,>
,<=
,>=
,<>
) or other operators likeIN
,EXISTS
.
With this foundation of subquery syntax, let's take a look at an example. One common use of subqueries is to compare values against an average or other aggregate results. Let's form a query to find the names of movies that have an IMDb rating greater than the average IMDb rating of all the movies.
First we need a query to find the average IMDb rating.
The output of this query is:
The average IMDb rating of the movies is approximately 7.4.
We will use the resulting average to filter movies in the outer query.
In this final query:
- We join the
movies
andmovie_details
tables using themovie_id
column. - We use a subquery to calculate the average IMDb rating
- We filter movies that have an IMDb rating greater than the average.
The output is
This output shows the movie name and IMDb rating for every movie greater than the average IMDb rating (7.4).
This powerful combination of JOIN
and subqueries allows us to perform complex data analysis with ease.
Congratulations! You have successfully learned how to use subqueries in PostgreSQL to achieve more complex data analysis tasks. Here's a quick recap of what we covered:
- The importance and usage of
JOIN
operations. - The syntax of subqueries
- A step-by-step breakdown of comparing values against an average using subqueries.
Now it’s time to apply what you've learned with some practice exercises. Happy querying!
