Lesson 2
Utilizing DISTINCT for Uniqueness Exploration
Introduction to the DISTINCT Clause

Welcome back! In the previous lesson, we explored the COUNT function for quantitative analysis in PostgreSQL. Today, we'll dive into another essential SQL feature: the DISTINCT clause.

The DISTINCT clause is used to return unique values from a column, helping you find and filter out duplicate data. For example, the characters table has 243 rows, but some of those characters appear in multiple movies. Therefore, they appear multiple times in the table. To find out how many distinct character names there are, we can use the DISTINCT function.

Dataset Review

For convenience, the tables and columns in our Marvel movies dataset are:

Movies Table

Plain text
1 movie_id | movie_name | release_date | phase 2----------+------------+--------------+-------

Movie Details Table

Plain text
1 movie_id | budget_million_usd | box_office_million_usd | imdb_rating | runtime_minutes 2----------+--------------------+------------------------+-------------+-----------------

Characters Table

Plain text
1 character_id | movie_id | character_name | actor | screen_time_minutes 2--------------+----------+----------------+-------+---------------------
Basic Syntax of DISTINCT

Let's start by understanding the syntax of the DISTINCT clause:

SQL
1SELECT DISTINCT column_name 2FROM table_name;

Here's a quick breakdown of the syntax:

  • SELECT DISTINCT: This part of the command tells PostgreSQL that you want to select unique values from a specific column.
  • column_name: Replace this with the name of the column you want to filter for unique values.
  • FROM table_name: Replace this with the name of the table containing the data.

Now, let's see it in action with a practical example.

Selecting Unique Character Names

Suppose we want to get a list of unique character names. The SQL query is:

SQL
1SELECT DISTINCT character_name 2FROM characters;
  • SELECT DISTINCT character_name: This part selects unique character_name values from the characters table.
  • FROM characters: Specifies the characters table where the data is located.

When you run this query, you will get a table of unique character names. The first 20 entries of the output are:

Plain text
1 character_name 2------------------------------------- 3 Stephen Strange/Doctor Strange 4 JARVIS 5 Mysterio 6 Adrian Toomes/Vulture 7 Karl Mordo 8 Christine Palmer 9 Nebula 10 Rocket 11 Gamora 12 Phil Coulson 13 Makkari 14 Flash Thompson 15 Drax the Destroyer 16 Gorr the God Butcher 17 Ned Leeds 18 Max Dillon/Electro 19 Odin 20 Everett Ross 21 Ronan the Accuser 22 Pepper Potts 23.... 24(147 rows)

By using the DISTINCT clause, we eliminate the duplicate entries and get a list of unique character names. The output tells us that there are 147 unique character names in the characters table.

Combining DISTINCT and COUNT

To combine COUNT and DISTINCT, you can use the COUNT function with the DISTINCT keyword inside it. This allows you to count the number of unique values in a column without listing the whole output table.

Here's the syntax:

SQL
1SELECT COUNT(DISTINCT column_name) 2FROM table_name;

The query to count the number of unique character names is:

SQL
1SELECT COUNT(DISTINCT character_name) AS unique_character_count 2FROM characters;
  • COUNT(DISTINCT character_name): This part counts the number of unique character_name values in the characters table.
  • AS unique_character_count: This assigns an alias unique_character_count to the resulting count.
  • FROM characters: Specifies the characters table where the data is located.

The following output shows that there are 147 distinct character names.

Plain text
1 unique_character_count 2------------------------ 3 147 4(1 row)
Summary and Next Steps

Today, you learned how to use the DISTINCT clause to extract unique values from a column in PostgreSQL. Let's summarize the key points:

  • The DISTINCT clause helps eliminate duplicate rows from queries.
  • We walked through a practical example of selecting unique character names from a table.
  • We combined the COUNT function and DISTINCT clause to find the number of unique character names

Now, you're ready to apply what you've learned in the upcoming practice exercises. These exercises will help reinforce your understanding and give you hands-on experience with the DISTINCT clause. Keep up the great work, and let's continue our journey into mastering SQL functions and clauses with PostgreSQL!

Enjoy this lesson? Now it's time to practice with Cosmo!
Practice is how you turn knowledge into actual skills.