Lesson 1
Mastering Text-Based Queries in PostgreSQL
Introduction and Overview

Welcome to the first lesson of our course on Advanced Query Techniques and Conditional Logic in PostgreSQL. In this lesson, we will focus on mastering text-based queries.

By the end of this lesson, you will know how to search for patterns in text and handle case sensitivity effectively using PostgreSQL.

Dataset Overview

This course uses the Marvel movies dataset from previous courses. The dataset has a movies table, movie_details table, and a characters table. As a refresher, the tables are:

Movies Table

Plain text
1 movie_id | movie_name | release_date | phase 2----------+---------------------------------------------+--------------+------- 3 1 | Iron Man | 2008-05-02 | 1 4 2 | The Incredible Hulk | 2008-06-13 | 1 5 3 | Iron Man 2 | 2010-05-07 | 1 6 4 | Thor | 2011-05-06 | 1 7 5 | Captain America: The First Avenger | 2011-07-22 | 1 8 6 | The Avengers | 2012-05-04 | 1 9 7 | Iron Man 3 | 2013-05-03 | 2 10 8 | Thor: The Dark World | 2013-11-08 | 2 11 9 | Captain America: The Winter Soldier | 2014-04-04 | 2 12 10 | Guardians of the Galaxy | 2014-08-01 | 2 13 11 | Avengers: Age of Ultron | 2015-05-01 | 2 14 12 | Ant-Man | 2015-07-17 | 2 15 13 | Captain America: Civil War | 2016-05-06 | 3 16 14 | Doctor Strange | 2016-11-04 | 3 17 15 | Guardians of the Galaxy Vol. 2 | 2017-05-05 | 3 18 16 | Spider-Man: Homecoming | 2017-07-07 | 3 19 17 | Thor: Ragnarok | 2017-11-03 | 3 20 18 | Black Panther | 2018-02-16 | 3 21 19 | Avengers: Infinity War | 2018-04-27 | 3 22 20 | Ant-Man and The Wasp | 2018-07-06 | 3 23 21 | Captain Marvel | 2019-03-08 | 3 24 22 | Avengers: Endgame | 2019-04-26 | 3 25 23 | Spider-Man: Far From Home | 2019-07-02 | 3 26 24 | Black Widow | 2021-07-09 | 4 27 25 | Shang-Chi and the Legend of the Ten Rings | 2021-09-03 | 4 28 26 | Eternals | 2021-11-05 | 4 29 27 | Spider-Man: No Way Home | 2021-12-17 | 4 30 28 | Doctor Strange in the Multiverse of Madness | 2022-05-06 | 4 31 29 | Thor: Love and Thunder | 2022-07-08 | 4 32 30 | Black Panther: Wakanda Forever | 2022-11-11 | 4 33 31 | Ant-Man and The Wasp: Quantumania | 2023-02-17 | 5 34 32 | Guardians of the Galaxy Vol. 3 | 2023-05-05 | 5 35 33 | The Marvels | 2023-11-10 | 5 36(33 rows)

The movies table includes the first 33 Marvel movies. Each entry in the table includes a value for movie_id, movie_name, release_date and phase. The movie_id column corresponds with the movie_id columns from the movie_details and characters tables.

Movie Details Table

Plain text
1 movie_id | budget_million_usd | box_office_million_usd | imdb_rating | runtime_minutes 2----------+--------------------+------------------------+-------------+----------------- 3 1 | 140 | 585.2 | 7.9 | 126 4 2 | 150 | 263.4 | 6.7 | 112 5 3 | 200 | 623.9 | 7.0 | 124 6 4 | 150 | 449.3 | 7.0 | 115 7 5 | 140 | 370.6 | 6.9 | 124 8 6 | 220 | 1519.6 | 8.0 | 143 9 7 | 200 | 1215.4 | 7.2 | 130 10 8 | 170 | 644.6 | 6.9 | 112 11 9 | 170 | 714.3 | 7.7 | 136 12 10 | 170 | 773.3 | 8.0 | 121 13 11 | 250 | 1405.4 | 7.3 | 141 14 12 | 130 | 519.3 | 7.3 | 117 15 13 | 250 | 1153.3 | 7.8 | 147 16 14 | 165 | 677.7 | 7.5 | 115 17 15 | 200 | 863.8 | 7.6 | 136 18 16 | 175 | 880.2 | 7.4 | 133 19 17 | 180 | 850.8 | 7.9 | 130 20 18 | 200 | 1346.9 | 7.3 | 134 21 19 | 321 | 2048.4 | 8.4 | 149 22 20 | 162 | 622.7 | 7.1 | 118 23 21 | 175 | 1128.3 | 6.9 | 123 24 22 | 356 | 2797.8 | 8.4 | 181 25 23 | 160 | 1131.9 | 7.5 | 129 26 24 | 200 | 378.5 | 6.8 | 134 27 25 | 200 | 430.0 | 7.6 | 132 28 26 | 200 | 402.9 | 6.8 | 157 29 27 | 260 | 1995.4 | 8.4 | 148 30 28 | 180 | 1594.7 | 7.8 | 132 31 29 | 200 | 714.3 | 7.5 | 130 32 30 | 250 | 859 | 7.3 | 161 33 31 | 200 | 476 | 6.2 | 125 34 32 | 250 | 845 | 8.1 | 150 35 33 | 250 | 200 | 6.1 | 124 36(33 rows)

The movie_details table includes details for the first 33 Marvel movies. Each entry in the table includes a value for movie_id, budget_million_usd, box_office_million_usd, imdb_rating, and runtime_minutes. The movie_id column corresponds with the movie_id columns from the movies and characters tables.

Characters Table

Plain text
1 character_id | movie_id | character_name | actor | screen_time_minutes 2--------------+----------+-------------------------------------+------------------------+--------------------- 3 1 | 1 | Tony Stark | Robert Downey Jr. | 120 4 2 | 1 | Pepper Potts | Gwyneth Paltrow | 40 5 3 | 1 | James Rhodes | Terrence Howard | 30 6 4 | 1 | Obadiah Stane | Jeff Bridges | 25 7 5 | 1 | Happy Hogan | Jon Favreau | 20 8 6 | 1 | Agent Coulson | Clark Gregg | 15 9 7 | 1 | Raza | Faran Tahir | 10 10 8 | 1 | Yinsen | Shaun Toub | 10 11 9 | 2 | Bruce Banner/Hulk | Edward Norton | 110 12 10 | 2 | Betty Ross | Liv Tyler | 35 13 11 | 2 | Thaddeus Ross | William Hurt | 25 14 12 | 2 | Emil Blonsky/Abomination | Tim Roth | 20 15 13 | 2 | Leonard Samson | Ty Burrell | 15 16 14 | 2 | General Ross | William Hurt | 15 17 15 | 2 | Jack McGee | Tim Blake Nelson | 10 18 16 | 3 | Natasha Romanoff/Black Widow | Scarlett Johansson | 100 19 17 | 3 | Nick Fury | Samuel L. Jackson | 30 20 18 | 3 | James Rhodes/War Machine | Don Cheadle | 25 21 19 | 3 | Ivan Vanko/Whiplash | Mickey Rourke | 20 22 20 | 3 | Justin Hammer | Sam Rockwell | 15 23 21 | 3 | JARVIS | Paul Bettany | 10 24 22 | 3 | Howard Stark | John Slattery | 10 25...... 26(243 rows)

The characters table contains 243 entries for characters that appear in the first 33 Marvel movies. Each entry has a character_id, movie_id, character_name, actor, and screen_time_minutes. The movie_id column corresponds with the movie_id column from the movies and movie_details table.

Now that we understand the dataset, let's dive into text-based queries.

Introduction to Pattern Matching using %

Pattern matching in PostgreSQL is primarily facilitated through the use of the % wildcard within the LIKE and ILIKE operators. The % wildcard matches any sequence of characters, including an empty sequence.

The LIKE operator performs case-sensitive pattern matching while the ILIKE operator performs case-insensitive pattern matching.

Placing % at the end of a text query matches any string that starts with the text placed before %. For example, the pattern A% matches any string that starts with the letter "A" followed by any sequence of characters, including an empty sequence.

Placing % at the start of a text query matches any string that ends with the text placed after %. For example, the pattern %a matches any string that ends with the letter "a".

Finding Characters Starting with "A"

Suppose we want to find characters with names that begin with "A". The query is:

SQL
1SELECT DISTINCT character_name 2FROM characters 3WHERE character_name LIKE 'A%';

The output is:

Plain text
1 character_name 2------------------------------- 3 Ayo 4 Aunt May 5 Alexei Shostakov/Red Guardian 6 Aldrich Killian 7 America Chavez 8 Agent Coulson 9 Adrian Toomes/Vulture 10 Alexander Pierce 11(8 rows)

Here, any character name starting with "A" (case-sensitive) followed by any sequence of characters is selected.

Find Characters Ending with "a"

The query to find character names that end with "a" (case-sensitive) preceded by any sequence of characters is:

SQL
1SELECT DISTINCT character_name 2FROM characters 3WHERE character_name LIKE '%a';

The output is:

Plain text
1 character_name 2------------------------------ 3 Ramonda 4 Yelena Belova 5 Hela 6 Nakia 7 Raza 8 Clea 9 Steve Rogers/Captain America 10 Nebula 11 Thena 12 Gamora 13 Frigga 14 Minn-Erva 15 Yondu Udonta 16(13 rows)

Any character name ending with "a" will be included, regardless of the preceding sequence of characters.

Exact Matches

In PostgreSQL, you can use the = operator to find exact matches in text-based queries. This operator ensures only entries that exactly match the specified text are returned.

To find exact matches for the character name "Tony Stark", you can use the following query:

SQL
1SELECT character_name 2FROM characters 3WHERE character_name = 'Tony Stark';

The output is:

Plain text
1 character_name 2---------------- 3 Tony Stark 4(1 row)

This query returns only the entries where the character_name is exactly "Tony Stark", with no additional characters or variations in case.

Substring Matches

Placing % at the beginning and end of a text finds all entries that contain that substring. The % at the beginning means any sequence of characters (including an empty sequence) can come before the substring, and the % at the ends means any sequence of characters (including an empty sequence) can come after the substring. Let's take a look at an example:

SQL
1SELECT DISTINCT character_name 2FROM characters 3WHERE character_name LIKE '%Tony Stark%';

The output of this query is:

Plain text
1 character_name 2--------------------- 3 Tony Stark 4 Tony Stark/Iron Man 5(2 rows)

The output indicates that there are two distinct entries in the character_name column that contain "Tony Stark": "Tony Stark" and "Tony Stark/Iron Man".

"Tony Stark/Iron Man" did not appear in the query containing WHERE character_name = 'Tony Stark' because "Tony Stark/Iron Man" is not an exact match.

Exploring Case Sensitivity with ILIKE

PostgreSQL provides two pattern-matching functions: LIKE and ILIKE. LIKE performs case-sensitive pattern matching, while ILIKE performs case-insensitive pattern matching.

Let's revisit the example of finding the characters whose names begin with "A". In the characters table, the first letter of each character's name is capitalized.

SQL
1SELECT character_name 2FROM characters 3WHERE character_name LIKE 'a%';

The above query does not return any entries because no character names start with lowercase "a". The output is:

Plain text
1 character_name 2---------------- 3(0 rows)

We can use ILIKE to find all names that begin with the letter "A"/"a" regardless of case.

SQL
1SELECT DISTINCT character_name 2FROM characters 3WHERE character_name ILIKE 'a%';

The output of this query is:

Plain text
1 character_name 2------------------------------- 3 Ayo 4 Aunt May 5 Alexei Shostakov/Red Guardian 6 Aldrich Killian 7 America Chavez 8 Agent Coulson 9 Adrian Toomes/Vulture 10 Alexander Pierce 11(8 rows)

The output shows how ILIKE can be used to query text without regard to case.

Handling Case Sensitivity

Sometimes, you might want to handle case sensitivity explicitly. We can do this using the UPPER and LOWER functions. These functions return their input in all capital letters and all lowercase letters, respectively.

To find names that are fully uppercase, we use the UPPER function as follows:

SQL
1SELECT character_name 2FROM characters 3WHERE character_name = UPPER(character_name);

UPPER(character_name) returns the character_name in all caps. Using WHERE character_name = UPPER(character_name);, we find all character names that remain unchanged when passed into the UPPER function.

The output is:

Plain text
1 character_name 2---------------- 3 JARVIS 4 MJ 5 MJ 6(3 rows)

The output shows the names of characters that only contain capital letters.

Similarly, to find names in lowercase, we use the LOWER function.

SQL
1SELECT actor 2FROM characters 3WHERE actor = LOWER(actor);

The output is:

Plain text
1 actor 2------- 3(0 rows)

This query returns an empty table, as no actor names are in all lowercase.

Summary and Next Steps

In this lesson, we:

  • Reviewed the Marvel movies database
  • Explored pattern matching using % and =
  • Found exact matches and substring matches in text data
  • Handled case sensitivity using the UPPER and LOWER functions

You are now well-equipped to handle text-based queries in PostgreSQL. The next steps involve practicing these queries through exercises to reinforce your understanding. Happy querying!

Enjoy this lesson? Now it's time to practice with Cosmo!
Practice is how you turn knowledge into actual skills.