Welcome to the next lesson! In this lesson, we will focus on analyzing Marvel movies trends using the GROUP BY clause. This clause is a crucial part of SQL that allows you to summarize information and gain insights based on specific categories. By the end of this lesson, you will be able to group data effectively and analyze trends using PostgreSQL.
For convenience, the tables and columns in our Marvel movies dataset are:
Movies Table
Movie Details Table
Characters Table
The GROUP BY clause groups rows that have the same values into summary rows. It is often used with aggregate functions (COUNT, SUM, AVG, MAX, MIN) to perform calculations on each group of data.
In simpler terms, if you have a dataset with numerous rows, the GROUP BY clause can help you categorize this data into meaningful groups. For example, you might want to group movies by their release year or by the phase they belong to.
Before diving into examples, let's explore the basic syntax of the GROUP BY clause.
SELECT column1, aggregate_function(column2): Selects the columns you want to include in the result. One column should be a column you want to group by, and the other should be an aggregate function applied to another column.FROM table_name: Specifies the table from which to retrieve the data.GROUP BY column1: Groups the results by the specified column.
Now, let's write an SQL query to group the movies by their phases and count the number of movies in each phase.
SELECT phase, COUNT(movie_id) AS "Number of Movies": Selects thephasecolumn and counts the number ofmovie_idin each phase.FROM movies: Specifies the table from which to retrieve the data.GROUP BY phase: Groups the results by thephasecolumn.
The output is:
The output has two columns. The phase column and Number of Movies column. The output is grouped by the phase value with the corresponding COUNT of movies in each phase.
Let's move on to a more advanced example. Suppose you want to group movies by their release year instead of phases. Here is how you can do it using the EXTRACT function:
- The
EXTRACTfunction is used to retrieve a specific part of a date or time value. In this case, it extracts the year from therelease_date.
Let's break down this full query line by line:
SELECT EXTRACT(year FROM release_date) AS "Year", COUNT(movie_id) as "Total Movies": This part of the query extracts the year from therelease_dateand labels it asYear. It also counts the number ofmovie_idvalues in each year and labels it asTotal Movies.FROM movies: Specifies themoviestable as the source of the data.GROUP BY EXTRACT(year FROM release_date): Groups the results by the year extracted from therelease_date.ORDER BY "Total Movies" DESC: Sorts the results in descending order based on the column.
In this lesson, you learned how to use the GROUP BY clause in PostgreSQL to analyze trends in your data. We covered:
- The purpose and usage of the
GROUP BYclause. - A basic example of grouping movies by their phases.
- Using
EXTRACTto group columns by year
Congratulations on completing this lesson! Take some time to practice the exercises that follow and continue to explore the capabilities of SQL.
