Welcome back! In this lesson, we're going to dive into SQL aggregate functions on multiple tables. This lesson will combine everything you have learned so far about SQL queries.
Let's get started putting everything you've learned together.
For convenience, the tables and columns in our Marvel movies dataset are:
Movies Table
Plain text1 movie_id | movie_name | release_date | phase 2----------+------------+--------------+-------
Movie Details Table
Plain text1 movie_id | budget_million_usd | box_office_million_usd | imdb_rating | runtime_minutes 2----------+--------------------+------------------------+-------------+-----------------
Characters Table
Plain text1 character_id | movie_id | character_name | actor | screen_time_minutes 2--------------+----------+----------------+-------+---------------------
Let's begin with our first example, where we will aggregate total box office earnings by phase. We will use an INNER JOIN
on the movies
and movie_details
tables. We will then use the SUM
function and the GROUP BY
clause to find the total box office earnings per phase. This advanced query is:
SQL1SELECT movies.phase,
2SUM(movie_details.box_office_million_usd) AS "Total Box Office"
3FROM movies
4INNER JOIN movie_details ON movies.movie_id = movie_details.movie_id
5GROUP BY movies.phase;
There's a lot going on in this query. Let's break it down step by step.
- We first use
SELECT
to obtain themovies.phase
column and the sum of box office sales. We use the aliasTotal Box Office
for the result of theSUM
. FROM movies
selects the primary tableINNER JOIN movie_details
specifies the table to join withmovies
ON movies.movie_id = movie_details.movie_id
matches the rows of themovies
andmovie_details
column based on themovie_id
GROUP BY movies.phase
specifies the rows of the output table each correspond to a phase
The output is:
Plain text1 phase | Total Box Office 2-------+------------------ 3 3 | 13501.8 4 5 | 1521 5 4 | 6374.8 6 2 | 5272.3 7 1 | 3812.0 8(5 rows)
The output shows the sum of box office sales for each phase.
In this example, we want to find the average IMDb score for the movies in each phase that feature the character Thor. The steps are:
- Use
INNER JOIN
to join all 3 tables based on themovie_id
column. - Find the average IMDb rating
- Filter the movies by only using movies that feature Thor.
- Group the rows by
movies.phase
- Sort the average IMDb ratings in descending order.
Let's take a look at the most advanced query yet:
SQL1SELECT movies.phase, AVG(movie_details.imdb_rating) AS "Average Rating"
2FROM movies
3INNER JOIN movie_details ON movies.movie_id = movie_details.movie_id
4INNER JOIN characters ON movies.movie_id = characters.movie_id
5WHERE characters.character_name LIKE '%Thor%'
6GROUP BY movies.phase
7ORDER BY "Average Rating" DESC;
Let's look at each line individually:
SELECT movies.phase, AVG(movie_details.imdb_rating) AS "Average Rating"
- This line specifies that our result should have a
phase
column from themovies
table and a column calledAverage Rating
with the average IMDb rating
FROM movies
- Specifies the
movies
table as the primary table from which data will be fetched.
INNER JOIN movie_details ON movies.movie_id = movie_details.movie_id
- Combines the
movies
table with themovie_details
table, matching rows based on themovie_id
column, which must be present in both tables.
INNER JOIN characters ON movies.movie_id = characters.movie_id
- Further combines the result set with the
characters
table, again matching rows based on themovie_id
column.
WHERE characters.character_name LIKE '%Thor%'
- Filters the records to include only those where the
character_name
column in thecharacters
table contains the text "Thor". The%
wildcard allows for any characters to appear before or after "Thor".
GROUP BY movies.phase
- Groups the resulting records by the
phase
column from themovies
table, ensuring that the average IMDb rating is calculated separately for each phase.
ORDER BY "Average Rating" DESC;
- Sorts the final result set in descending order based on the calculated
Average Rating
.
The output is:
Plain text1 phase | Average Rating 2-------+-------------------- 3 3 | 8.2333333333333333 4 1 | 7.5000000000000000 5 4 | 7.5000000000000000 6 2 | 7.1000000000000000 7(4 rows)
Let's analze this output:
phase
: This column represents the phase number from themovies
table.Average Rating
: This column shows the calculated average IMDb rating for movies featuring Thor.
Phase 3
- The average IMDb rating for movies featuring Thor in Phase 3 is approximately 8.23.
Phase 1
- The average IMDb rating for Thor movies in Phase 1 is 7.50.
Phase 4
- The average IMDb rating for Thor movies in Phase 4 is also 7.50.
Phase 2
- The average IMDb rating for Thor movies in Phase 2 is approximately 7.10.
This allows us to see at a glance which phase had the highest and lowest average IMDb ratings for movies featuring Thor.
Congratulations on finishing the last lesson of this course. In this lesson, we covered how to use SQL aggregate functions such as SUM()
and AVG()
alongside the GROUP BY
clause in PostgreSQL. We also demonstrated how to join multiple tables and filter data using the WHERE
clause with conditions like LIKE
.
Now it's your turn to practice these concepts. The upcoming exercises will help you solidify your understanding and build confidence in writing SQL queries using aggregate functions.
Great work, and happy querying!