Welcome to this lesson on aggregate functions in PostgreSQL. In the previous lessons, we covered how to use the COUNT
function for quantitative analysis and utilized the DISTINCT
clause to explore unique values in our data. Now, we will dive into more advanced data analysis techniques. In this lesson, we will explore the SUM
, AVG
, MAX
and MIN
functions.
By the end of this lesson, you'll be able to analyze different aspects of the movie data, like calculating total box office earnings, average IMDb ratings, and identifying extreme values.
For convenience, the tables and columns in our Marvel movies dataset are:
Movies Table
Plain text1 movie_id | movie_name | release_date | phase 2----------+------------+--------------+-------
Movie Details Table
Plain text1 movie_id | budget_million_usd | box_office_million_usd | imdb_rating | runtime_minutes 2----------+--------------------+------------------------+-------------+-----------------
Characters Table
Plain text1 character_id | movie_id | character_name | actor | screen_time_minutes 2--------------+----------+----------------+-------+---------------------
Here is the basic syntax for using aggregate functions.
SQL1SELECT aggregate_function(column_name) 2FROM table_name;
We want to find the total box office sales for all 33 movies in our dataset. To do this, we use the SUM()
aggregate function to add the values of box_office_million_usd
of each row in the movie_details
table. We also want the resulting table to have a column named "total_sales". The query is:
SQL1SELECT SUM(movie_details.box_office_million_usd) AS "Total Sales" 2FROM movie_details;
In this code, we pass the movie_details.box_office_million_usd
column into the SUM
function. We use AS
to create an alias for the column called "Total Sales".
The result is:
Plain text1 Total Sales 2------------- 3 30481.9 4(1 row)
Using the SUM
function, we can see that the total box office sales from all 33 movies is 30481.9 million dollars ($30,481,900,000).
Now, let's use the AVG
function, to find the average IMDb rating for movies with a runtime longer than 120 minutes and less than or equal to 120 minutes.
The queries are:
SQL1SELECT AVG(movie_details.imdb_rating)
2FROM movie_details
3WHERE movie_details.runtime_minutes > 120;
4
5SELECT AVG(movie_details.imdb_rating)
6FROM movie_details
7WHERE movie_details.runtime_minutes <= 120;
In the code, we pass the movie_details.imdb_rating
into the AVG
function. In the first query, we only use movies with a runtime greater than 120 minutes. The second query only uses movies with a runtime less than or equal to 120 minutes.
The output is:
Plain text1 avg 2-------------------- 3 7.4740740740740741 4(1 row) 5 6 avg 7-------------------- 8 7.0833333333333333 9(1 row)
Using the AVG
function with the WHERE
clause, we see that movies with a runtime longer than 2 hours tend to have higher IMDb ratings.
The MAX
and MIN
functions find the maximum and minimum values of a column. MAX
and MIN
can accept numbers or even text. The MIN
value of a text column is the text that comes first alphabetically. Similarly, MAX
finds the text that comes last alphabetically.
SQL1SELECT MIN(movies.movie_name) AS "First Movie", 2MAX(movies.movie_name) AS "Last Movie" 3FROM movies;
In this code, we pass the movies.movie_name
into both the MAX
and MIN
function.
The output is:
Plain text1 First Movie | Last Movie 2-------------+---------------------- 3 Ant-Man | Thor: The Dark World 4(1 row)
When listing the movies in alphabetical order, "Ant-Man" comes first and "Thor: The Dark World" comes last.
To recap, in this lesson, we focused on the following key points:
- Using aggregate functions like
SUM()
,AVG()
,MAX()
, andMIN()
. - Using
AS
to create aliases for aggregate functions - Filtering aggregate function input using the
WHERE
Next, you'll apply what you've learned through practice exercises to solidify your understanding of these concepts. Happy querying!