Welcome back! In our previous lessons, we've explored various ways to query and analyze data using PostgreSQL. We began by mastering text-based queries, moved on to understanding subqueries, and learned to create new columns and perform mathematical operations.
In this lesson, we'll dive deeper into complex queries and conditional logic:
- Using the
CASE
statement to categorize data. - Combining multiple conditions in the
WHERE
clause. - Enhancing queries with subqueries.
These techniques will help you make your data queries more powerful and insightful.
For convenience, the tables and columns in our Marvel movies dataset are:
Movies Table
Movie Details Table
Characters Table
The CASE
statement allows you to perform conditional logic directly within your queries. It's similar to an IF-THEN-ELSE statement in programming languages.
The syntax of a CASE
statement is as follows:
CASE
: Begins the conditional logic block.WHEN condition1 THEN result1
: Ifcondition1
is true, the statement returnsresult1
.WHEN condition2 THEN result2
: Ifcondition2
is true, the statement returnsresult2
.ELSE resultN
: If none of the conditions are true, the statement returnsresultN
. This part is optional but recommended for completeness.END
: Ends theCASE
block.AS alias_name
: (Optional) Provides a temporary name for the resulting column.
Let's see a real-world application of the CASE
statement. We'll categorize movies based on their box office earnings into three categories: Flop, Hit, and Blockbuster.
We want to categorize movies as either a "Flop", "Hit", or "Blockbuster" based on box office sales. The query is:
Let's break down this query:
- We use
INNER JOIN
to combine data from themovies
andmovie_details
tables based onmovie_id
. - The
SELECT
statement retrieves movie names. CASE
begins the conditional logic block to categorize the movies based on their box office earnings.WHEN movie_details.box_office_million_usd < 400 THEN 'Flop'
: If the movie's box office earnings are less than 400 million USD, the movie is categorized as 'Flop'.WHEN movie_details.box_office_million_usd BETWEEN 400 AND 1000 THEN 'Hit'
: If the movie's box office earnings are between 400 and 1000 million USD (inclusive), the movie is categorized as 'Hit'.ELSE 'Blockbuster'
: If the movie's box office earnings are greater than 1000 million USD, the movie is categorized as 'Blockbuster'.END AS success_category
: Ends theCASE
block and gives the result column the aliassuccess_category
.
The output is:
From the output, we can see that "The Incredible Hulk" is listed as a 'Flop', "Iron Man" is listed as a 'Hit', and "The Avengers" is listed as a 'Blockbuster'.
This example demonstrates how you can use the CASE
statement to derive meaningful categories from numerical data.
Congratulations on reaching the last lesson of the "Intro to SQL with PostgreSQL" course path! Now let's put your PostgreSQL super powers to the test. In this example, we combine every query technique we have learned.
We want to create a table with the following columns:
- Movie Name: The name of the movie.
- Success Level: Categorized as 'Flop', 'Hit', or 'Blockbuster' based on the box office earnings.
- Profit in Millions: The difference between the box office earnings and the budget, representing the movie's profit.
- Average Screen Time: The rounded average screen time of characters in the movie, displayed in minutes with 'mins' appended.
Only movies with an IMDb rating higher than average should be included, and the output should be sorted by profit in descending order.
Let's take a look at this super query:
Let's break this query down line by line:
SELECT Clause:
- Retrieves the
movie_name
. - Uses a
CASE
statement to determine the "Success Level" of each movie based on itsbox_office_million_usd
:Flop
for earnings less than 400 million USD.Hit
for earnings between 400 and 1000 million USD.Blockbuster
for earnings greater than 1000 million USD.
- Computes the "Profit in Millions" by subtracting the
budget_million_usd
from thebox_office_million_usd
. - Calculates the "Average Screen Time" for characters in the movie by rounding the average
screen_time_minutes
to the nearest whole number and appending 'mins'.
FROM Clause:
- Uses the
movies
table as the base dataset.
INNER JOIN:
- Combines
movies
withmovie_details
usingmovie_id
. - Combines
movies
withcharacters
also usingmovie_id
.
WHERE Clause:
- Filters the results to include only movies where the
imdb_rating
is higher than the average IMDb rating of all movies. - The subquery
(SELECT AVG(movie_details.imdb_rating) FROM movie_details)
calculates the average IMDb rating across all movies.
GROUP BY Clause:
- Groups the results by
movie_name
,box_office_million_usd
, andbudget_million_usd
to aggregate the average screen time correctly. When using aggregate functions, all non-aggregate columns in theSELECT
statement must be included in theGROUP BY
clause.
ORDER BY Clause:
- Orders the results based on "Profit in Millions" in descending order to list the most profitable movies first.
The output is:
The output is a list of movies including their name, success level, profit in millions, and average screen time, filtered to include only those movies with above-average IMDb ratings and sorted by profit in descending order.
Notice that none of the movies are labeled as a flop. Remember that we filtered out all movies with an IMDb rating less than average. From our output, we know that any movie labeled as a flop was excluded from our table. This means that all movies that made less than 400 million dollars in profit have an IMDb rating lower than average.
Congratulations on reaching the final lesson in the "Intro to SQL with PostgreSQL" course path. You have now mastered the key techniques for dealing with complex queries and conditional logic in PostgreSQL. You've come a long way in unlocking your PostgreSQL Super Powers! Keep exploring and practicing to further enhance your skills.
