Welcome to the next lesson! In this lesson, we will focus on analyzing Marvel movies trends using the GROUP BY
clause. This clause is a crucial part of SQL that allows you to summarize information and gain insights based on specific categories. By the end of this lesson, you will be able to group data effectively and analyze trends using PostgreSQL.
For convenience, the tables and columns in our Marvel movies dataset are:
Movies Table
Plain text1 movie_id | movie_name | release_date | phase 2----------+------------+--------------+-------
Movie Details Table
Plain text1 movie_id | budget_million_usd | box_office_million_usd | imdb_rating | runtime_minutes 2----------+--------------------+------------------------+-------------+-----------------
Characters Table
Plain text1 character_id | movie_id | character_name | actor | screen_time_minutes 2--------------+----------+----------------+-------+---------------------
The GROUP BY
clause groups rows that have the same values into summary rows. It is often used with aggregate functions (COUNT
, SUM
, AVG
, MAX
, MIN
) to perform calculations on each group of data.
In simpler terms, if you have a dataset with numerous rows, the GROUP BY
clause can help you categorize this data into meaningful groups. For example, you might want to group movies by their release year or by the phase they belong to.
Before diving into examples, let's explore the basic syntax of the GROUP BY
clause.
SQL1SELECT column1, aggregate_function(column2) 2FROM table_name 3GROUP BY column1;
SELECT column1, aggregate_function(column2)
: Selects the columns you want to include in the result. One column should be a column you want to group by, and the other should be an aggregate function applied to another column.FROM table_name
: Specifies the table from which to retrieve the data.GROUP BY column1
: Groups the results by the specified column.
Now, let's write an SQL query to group the movies by their phases and count the number of movies in each phase.
SQL1SELECT phase, COUNT(movie_id) AS "Number of Movies" 2FROM movies 3GROUP BY phase;
SELECT phase, COUNT(movie_id) AS "Number of Movies"
: Selects thephase
column and counts the number ofmovie_id
in each phase.FROM movies
: Specifies the table from which to retrieve the data.GROUP BY phase
: Groups the results by thephase
column.
The output is:
Plain text1 phase | "Number of Movies" 2-------+---------------- 3 3 | 11 4 5 | 3 5 4 | 7 6 2 | 6 7 1 | 6 8(5 rows)
The output has two columns. The phase
column and Number of Movies
column. The output is grouped by the phase
value with the corresponding COUNT
of movies in each phase.
Let's move on to a more advanced example. Suppose you want to group movies by their release year instead of phases. Here is how you can do it using the EXTRACT
function:
SQL1SELECT EXTRACT(year FROM release_date) AS "Year", 2COUNT(movie_id) as "Total Movies" 3FROM movies 4GROUP BY EXTRACT(year FROM release_date) 5ORDER BY "Total Movies" DESC;
- The
EXTRACT
function is used to retrieve a specific part of a date or time value. In this case, it extracts the year from therelease_date
.
Let's break down this full query line by line:
SELECT EXTRACT(year FROM release_date) AS "Year", COUNT(movie_id) as "Total Movies"
: This part of the query extracts the year from therelease_date
and labels it asYear
. It also counts the number ofmovie_id
values in each year and labels it asTotal Movies
.FROM movies
: Specifies themovies
table as the source of the data.GROUP BY EXTRACT(year FROM release_date)
: Groups the results by the year extracted from therelease_date
.ORDER BY "Total Movies" DESC
: Sorts the results in descending order based on theTotal Movies
column.
The output is:
Plain text1 Year | Total Movies 2------+-------------- 3 2021 | 4 4 2018 | 3 5 2017 | 3 6 2023 | 3 7 2022 | 3 8 2019 | 3 9 2013 | 2 10 2014 | 2 11 2016 | 2 12 2008 | 2 13 2011 | 2 14 2015 | 2 15 2012 | 1 16 2010 | 1 17(14 rows)
This query shows the number of movies released each year, sorted by the total number of movies in descending order.
In this lesson, you learned how to use the GROUP BY
clause in PostgreSQL to analyze trends in your data. We covered:
- The purpose and usage of the
GROUP BY
clause. - A basic example of grouping movies by their phases.
- Using
EXTRACT
to group columns by year
Congratulations on completing this lesson! Take some time to practice the exercises that follow and continue to explore the capabilities of SQL.