Introduction and Overview

Welcome back! In this lesson, we will focus on mastering subqueries for data analysis in PostgreSQL. Subqueries, also known as inner queries or nested queries, allow you to perform more complex queries by embedding one query inside another. This technique is particularly useful for comparing aggregated data, filtering results, and more. Let’s dive in!

Dataset Review

For convenience, the tables and columns in our Marvel movies dataset are:

Movies Table

Movie Details Table

Characters Table

Understanding Joining Tables

Before we explore subqueries, let's briefly review joining tables in SQL. Joining tables is a fundamental operation used to combine data from two or more tables based on a related column.

Consider the following example:

In this example:

  • We select movie_name from the movies table and imdb_rating from the movie_details table
  • We join the movies table with the movie_details table using the movie_id column, which is common to both tables.

The output is:

The result table lists movie names and their respective IMDb rating.

Introduction to Subqueries and Syntax

Subqueries are queries nested within another SQL query. They can be used in various places such as SELECT, FROM, WHERE, and others. Subqueries are useful for situations where you need to perform multiple steps to retrieve the desired data.

The syntax of subqueries is:

Let's take a look at each part of this query:

Outer Query

  • SELECT column1, column2, ...: Specifies the columns you want to retrieve from table_name1.
  • FROM table_name1: Specifies the main table from which you want to fetch the data.
  • WHERE column_name operator: Begins the WHERE clause that will filter the rows based on a condition involving a subquery.

Subquery (Inner Query)

  • SELECT column1: Specifies the column you want to retrieve in the subquery.
  • FROM table_name2: Specifies the table from which the subquery will fetch the data.
  • WHERE condition: Applies a condition to filter the rows in the subquery.

Operator:

  • The operator in the WHERE clause can be a comparison operator (such as =, <, >, <=, >=, <>) or other operators like IN, EXISTS.
Subquery Example

With this foundation of subquery syntax, let's take a look at an example. One common use of subqueries is to compare values against an average or other aggregate results. Let's form a query to find the names of movies that have an IMDb rating greater than the average IMDb rating of all the movies.

First we need a query to find the average IMDb rating.

The output of this query is:

The average IMDb rating of the movies is approximately 7.4.

We will use the resulting average to filter movies in the outer query.

In this final query:

  • We join the movies and movie_details tables using the movie_id column.
  • We use a subquery to calculate the average IMDb rating
  • We filter movies that have an IMDb rating greater than the average.

The output is

This output shows the movie name and IMDb rating for every movie greater than the average IMDb rating (7.4).

This powerful combination of JOIN and subqueries allows us to perform complex data analysis with ease.

Summary and Next Steps

Congratulations! You have successfully learned how to use subqueries in PostgreSQL to achieve more complex data analysis tasks. Here's a quick recap of what we covered:

  • The importance and usage of JOIN operations.
  • The syntax of subqueries
  • A step-by-step breakdown of comparing values against an average using subqueries.

Now it’s time to apply what you've learned with some practice exercises. Happy querying!

Sign up
Join the 1M+ learners on CodeSignal
Be a part of our community of 1M+ users who develop and demonstrate their skills on CodeSignal