Introduction

Welcome to the next lesson! In this lesson, we will focus on analyzing Marvel movies trends using the GROUP BY clause. This clause is a crucial part of SQL that allows you to summarize information and gain insights based on specific categories. By the end of this lesson, you will be able to group data effectively and analyze trends using PostgreSQL.

Dataset Review

For convenience, the tables and columns in our Marvel movies dataset are:

Movies Table

Movie Details Table

Characters Table

Exploring the GROUP BY Clause

The GROUP BY clause groups rows that have the same values into summary rows. It is often used with aggregate functions (COUNT, SUM, AVG, MAX, MIN) to perform calculations on each group of data.

In simpler terms, if you have a dataset with numerous rows, the GROUP BY clause can help you categorize this data into meaningful groups. For example, you might want to group movies by their release year or by the phase they belong to.

Basic Syntax of GROUP BY

Before diving into examples, let's explore the basic syntax of the GROUP BY clause.

  • SELECT column1, aggregate_function(column2): Selects the columns you want to include in the result. One column should be a column you want to group by, and the other should be an aggregate function applied to another column.
  • FROM table_name: Specifies the table from which to retrieve the data.
  • GROUP BY column1: Groups the results by the specified column.
Grouping by a Single Column

Now, let's write an SQL query to group the movies by their phases and count the number of movies in each phase.

  • SELECT phase, COUNT(movie_id) AS "Number of Movies": Selects the phase column and counts the number of movie_id in each phase.
  • FROM movies: Specifies the table from which to retrieve the data.
  • GROUP BY phase: Groups the results by the phase column.

The output is:

The output has two columns. The phase column and Number of Movies column. The output is grouped by the phase value with the corresponding COUNT of movies in each phase.

Advanced Example: Grouping by Date

Let's move on to a more advanced example. Suppose you want to group movies by their release year instead of phases. Here is how you can do it using the EXTRACT function:

  • The EXTRACT function is used to retrieve a specific part of a date or time value. In this case, it extracts the year from the release_date.

Let's break down this full query line by line:

  • SELECT EXTRACT(year FROM release_date) AS "Year", COUNT(movie_id) as "Total Movies": This part of the query extracts the year from the release_date and labels it as Year. It also counts the number of movie_id values in each year and labels it as Total Movies.
  • FROM movies: Specifies the movies table as the source of the data.
  • GROUP BY EXTRACT(year FROM release_date): Groups the results by the year extracted from the release_date.
  • ORDER BY "Total Movies" DESC: Sorts the results in descending order based on the Total Movies column.

The output is:

This query shows the number of movies released each year, sorted by the total number of movies in descending order.

Summary and Review

In this lesson, you learned how to use the GROUP BY clause in PostgreSQL to analyze trends in your data. We covered:

  • The purpose and usage of the GROUP BY clause.
  • A basic example of grouping movies by their phases.
  • Using EXTRACT to group columns by year

Congratulations on completing this lesson! Take some time to practice the exercises that follow and continue to explore the capabilities of SQL.

Sign up
Join the 1M+ learners on CodeSignal
Be a part of our community of 1M+ users who develop and demonstrate their skills on CodeSignal