Lesson 5
Applying SQL Aggregate Functions to Multiple Tables
Introduction

Welcome back! In this lesson, we're going to dive into SQL aggregate functions on multiple tables. This lesson will combine everything you have learned so far about SQL queries.

Let's get started putting everything you've learned together.

Dataset Review

For convenience, the tables and columns in our Marvel movies dataset are:

Movies Table

Plain text
1 movie_id | movie_name | release_date | phase 2----------+------------+--------------+-------

Movie Details Table

Plain text
1 movie_id | budget_million_usd | box_office_million_usd | imdb_rating | runtime_minutes 2----------+--------------------+------------------------+-------------+-----------------

Characters Table

Plain text
1 character_id | movie_id | character_name | actor | screen_time_minutes 2--------------+----------+----------------+-------+---------------------
Aggregating Total Box Office Earnings by Phase Using JOINs

Let's begin with our first example, where we will aggregate total box office earnings by phase. We will use an INNER JOIN on the movies and movie_details tables. We will then use the SUM function and the GROUP BY clause to find the total box office earnings per phase. This advanced query is:

SQL
1SELECT movies.phase, 2SUM(movie_details.box_office_million_usd) AS "Total Box Office" 3FROM movies 4INNER JOIN movie_details ON movies.movie_id = movie_details.movie_id 5GROUP BY movies.phase;

There's a lot going on in this query. Let's break it down step by step.

  • We first use SELECT to obtain the movies.phase column and the sum of box office sales. We use the alias Total Box Office for the result of the SUM.
  • FROM movies selects the primary table
  • INNER JOIN movie_details specifies the table to join with movies
  • ON movies.movie_id = movie_details.movie_id matches the rows of the movies and movie_details column based on the movie_id
  • GROUP BY movies.phase specifies the rows of the output table each correspond to a phase

The output is:

Plain text
1 phase | Total Box Office 2-------+------------------ 3 3 | 13501.8 4 5 | 1521 5 4 | 6374.8 6 2 | 5272.3 7 1 | 3812.0 8(5 rows)

The output shows the sum of box office sales for each phase.

Aggregating Average IMDb Ratings by Phase for Movies with Thor

In this example, we want to find the average IMDb score for the movies in each phase that feature the character Thor. The steps are:

  • Use INNER JOIN to join all 3 tables based on the movie_id column.
  • Find the average IMDb rating
  • Filter the movies by only using movies that feature Thor.
  • Group the rows by movies.phase
  • Sort the average IMDb ratings in descending order.

Let's take a look at the most advanced query yet:

SQL
1SELECT movies.phase, AVG(movie_details.imdb_rating) AS "Average Rating" 2FROM movies 3INNER JOIN movie_details ON movies.movie_id = movie_details.movie_id 4INNER JOIN characters ON movies.movie_id = characters.movie_id 5WHERE characters.character_name LIKE '%Thor%' 6GROUP BY movies.phase 7ORDER BY "Average Rating" DESC;

Let's look at each line individually:

SELECT movies.phase, AVG(movie_details.imdb_rating) AS "Average Rating"

  • This line specifies that our result should have a phase column from the movies table and a column called Average Rating with the average IMDb rating

FROM movies

  • Specifies the movies table as the primary table from which data will be fetched.

INNER JOIN movie_details ON movies.movie_id = movie_details.movie_id

  • Combines the movies table with the movie_details table, matching rows based on the movie_id column, which must be present in both tables.

INNER JOIN characters ON movies.movie_id = characters.movie_id

  • Further combines the result set with the characters table, again matching rows based on the movie_id column.

WHERE characters.character_name LIKE '%Thor%'

  • Filters the records to include only those where the character_name column in the characters table contains the text "Thor". The % wildcard allows for any characters to appear before or after "Thor".

GROUP BY movies.phase

  • Groups the resulting records by the phase column from the movies table, ensuring that the average IMDb rating is calculated separately for each phase.

ORDER BY "Average Rating" DESC;

  • Sorts the final result set in descending order based on the calculated Average Rating.

The output is:

Plain text
1 phase | Average Rating 2-------+-------------------- 3 3 | 8.2333333333333333 4 1 | 7.5000000000000000 5 4 | 7.5000000000000000 6 2 | 7.1000000000000000 7(4 rows)

Let's analze this output:

  • phase: This column represents the phase number from the movies table.
  • Average Rating: This column shows the calculated average IMDb rating for movies featuring Thor.

Phase 3

  • The average IMDb rating for movies featuring Thor in Phase 3 is approximately 8.23.

Phase 1

  • The average IMDb rating for Thor movies in Phase 1 is 7.50.

Phase 4

  • The average IMDb rating for Thor movies in Phase 4 is also 7.50.

Phase 2

  • The average IMDb rating for Thor movies in Phase 2 is approximately 7.10.

This allows us to see at a glance which phase had the highest and lowest average IMDb ratings for movies featuring Thor.

Summary and Preparation for Practice

Congratulations on finishing the last lesson of this course. In this lesson, we covered how to use SQL aggregate functions such as SUM() and AVG() alongside the GROUP BY clause in PostgreSQL. We also demonstrated how to join multiple tables and filter data using the WHERE clause with conditions like LIKE.

Now it's your turn to practice these concepts. The upcoming exercises will help you solidify your understanding and build confidence in writing SQL queries using aggregate functions.

Great work, and happy querying!

Enjoy this lesson? Now it's time to practice with Cosmo!
Practice is how you turn knowledge into actual skills.