Lesson 4
Analyzing Trends with GROUP BY
Introduction

Welcome to the next lesson! In this lesson, we will focus on analyzing Marvel movies trends using the GROUP BY clause. This clause is a crucial part of SQL that allows you to summarize information and gain insights based on specific categories. By the end of this lesson, you will be able to group data effectively and analyze trends using PostgreSQL.

Dataset Review

For convenience, the tables and columns in our Marvel movies dataset are:

Movies Table

Plain text
1 movie_id | movie_name | release_date | phase 2----------+------------+--------------+-------

Movie Details Table

Plain text
1 movie_id | budget_million_usd | box_office_million_usd | imdb_rating | runtime_minutes 2----------+--------------------+------------------------+-------------+-----------------

Characters Table

Plain text
1 character_id | movie_id | character_name | actor | screen_time_minutes 2--------------+----------+----------------+-------+---------------------
Exploring the GROUP BY Clause

The GROUP BY clause groups rows that have the same values into summary rows. It is often used with aggregate functions (COUNT, SUM, AVG, MAX, MIN) to perform calculations on each group of data.

In simpler terms, if you have a dataset with numerous rows, the GROUP BY clause can help you categorize this data into meaningful groups. For example, you might want to group movies by their release year or by the phase they belong to.

Basic Syntax of GROUP BY

Before diving into examples, let's explore the basic syntax of the GROUP BY clause.

SQL
1SELECT column1, aggregate_function(column2) 2FROM table_name 3GROUP BY column1;
  • SELECT column1, aggregate_function(column2): Selects the columns you want to include in the result. One column should be a column you want to group by, and the other should be an aggregate function applied to another column.
  • FROM table_name: Specifies the table from which to retrieve the data.
  • GROUP BY column1: Groups the results by the specified column.
Grouping by a Single Column

Now, let's write an SQL query to group the movies by their phases and count the number of movies in each phase.

SQL
1SELECT phase, COUNT(movie_id) AS "Number of Movies" 2FROM movies 3GROUP BY phase;
  • SELECT phase, COUNT(movie_id) AS "Number of Movies": Selects the phase column and counts the number of movie_id in each phase.
  • FROM movies: Specifies the table from which to retrieve the data.
  • GROUP BY phase: Groups the results by the phase column.

The output is:

Plain text
1 phase | "Number of Movies" 2-------+---------------- 3 3 | 11 4 5 | 3 5 4 | 7 6 2 | 6 7 1 | 6 8(5 rows)

The output has two columns. The phase column and Number of Movies column. The output is grouped by the phase value with the corresponding COUNT of movies in each phase.

Advanced Example: Grouping by Date

Let's move on to a more advanced example. Suppose you want to group movies by their release year instead of phases. Here is how you can do it using the EXTRACT function:

SQL
1SELECT EXTRACT(year FROM release_date) AS "Year", 2COUNT(movie_id) as "Total Movies" 3FROM movies 4GROUP BY EXTRACT(year FROM release_date) 5ORDER BY "Total Movies" DESC;
  • The EXTRACT function is used to retrieve a specific part of a date or time value. In this case, it extracts the year from the release_date.

Let's break down this full query line by line:

  • SELECT EXTRACT(year FROM release_date) AS "Year", COUNT(movie_id) as "Total Movies": This part of the query extracts the year from the release_date and labels it as Year. It also counts the number of movie_id values in each year and labels it as Total Movies.
  • FROM movies: Specifies the movies table as the source of the data.
  • GROUP BY EXTRACT(year FROM release_date): Groups the results by the year extracted from the release_date.
  • ORDER BY "Total Movies" DESC: Sorts the results in descending order based on the Total Movies column.

The output is:

Plain text
1 Year | Total Movies 2------+-------------- 3 2021 | 4 4 2018 | 3 5 2017 | 3 6 2023 | 3 7 2022 | 3 8 2019 | 3 9 2013 | 2 10 2014 | 2 11 2016 | 2 12 2008 | 2 13 2011 | 2 14 2015 | 2 15 2012 | 1 16 2010 | 1 17(14 rows)

This query shows the number of movies released each year, sorted by the total number of movies in descending order.

Summary and Review

In this lesson, you learned how to use the GROUP BY clause in PostgreSQL to analyze trends in your data. We covered:

  • The purpose and usage of the GROUP BY clause.
  • A basic example of grouping movies by their phases.
  • Using EXTRACT to group columns by year

Congratulations on completing this lesson! Take some time to practice the exercises that follow and continue to explore the capabilities of SQL.

Enjoy this lesson? Now it's time to practice with Cosmo!
Practice is how you turn knowledge into actual skills.