Welcome to the first lesson of our course on Advanced Query Techniques and Conditional Logic in PostgreSQL. In this lesson, we will focus on mastering text-based queries.
By the end of this lesson, you will know how to search for patterns in text and handle case sensitivity effectively using PostgreSQL.
This course uses the Marvel movies dataset from previous courses. The dataset has a movies
table, movie_details
table, and a characters
table. As a refresher, the tables are:
Movies Table
Plain text1 movie_id | movie_name | release_date | phase 2----------+---------------------------------------------+--------------+------- 3 1 | Iron Man | 2008-05-02 | 1 4 2 | The Incredible Hulk | 2008-06-13 | 1 5 3 | Iron Man 2 | 2010-05-07 | 1 6 4 | Thor | 2011-05-06 | 1 7 5 | Captain America: The First Avenger | 2011-07-22 | 1 8 6 | The Avengers | 2012-05-04 | 1 9 7 | Iron Man 3 | 2013-05-03 | 2 10 8 | Thor: The Dark World | 2013-11-08 | 2 11 9 | Captain America: The Winter Soldier | 2014-04-04 | 2 12 10 | Guardians of the Galaxy | 2014-08-01 | 2 13 11 | Avengers: Age of Ultron | 2015-05-01 | 2 14 12 | Ant-Man | 2015-07-17 | 2 15 13 | Captain America: Civil War | 2016-05-06 | 3 16 14 | Doctor Strange | 2016-11-04 | 3 17 15 | Guardians of the Galaxy Vol. 2 | 2017-05-05 | 3 18 16 | Spider-Man: Homecoming | 2017-07-07 | 3 19 17 | Thor: Ragnarok | 2017-11-03 | 3 20 18 | Black Panther | 2018-02-16 | 3 21 19 | Avengers: Infinity War | 2018-04-27 | 3 22 20 | Ant-Man and The Wasp | 2018-07-06 | 3 23 21 | Captain Marvel | 2019-03-08 | 3 24 22 | Avengers: Endgame | 2019-04-26 | 3 25 23 | Spider-Man: Far From Home | 2019-07-02 | 3 26 24 | Black Widow | 2021-07-09 | 4 27 25 | Shang-Chi and the Legend of the Ten Rings | 2021-09-03 | 4 28 26 | Eternals | 2021-11-05 | 4 29 27 | Spider-Man: No Way Home | 2021-12-17 | 4 30 28 | Doctor Strange in the Multiverse of Madness | 2022-05-06 | 4 31 29 | Thor: Love and Thunder | 2022-07-08 | 4 32 30 | Black Panther: Wakanda Forever | 2022-11-11 | 4 33 31 | Ant-Man and The Wasp: Quantumania | 2023-02-17 | 5 34 32 | Guardians of the Galaxy Vol. 3 | 2023-05-05 | 5 35 33 | The Marvels | 2023-11-10 | 5 36(33 rows)
The movies
table includes the first 33 Marvel movies. Each entry in the table includes a value for movie_id
, movie_name
, release_date
and phase
. The movie_id
column corresponds with the movie_id
columns from the movie_details
and characters
tables.
Movie Details Table
Plain text1 movie_id | budget_million_usd | box_office_million_usd | imdb_rating | runtime_minutes 2----------+--------------------+------------------------+-------------+----------------- 3 1 | 140 | 585.2 | 7.9 | 126 4 2 | 150 | 263.4 | 6.7 | 112 5 3 | 200 | 623.9 | 7.0 | 124 6 4 | 150 | 449.3 | 7.0 | 115 7 5 | 140 | 370.6 | 6.9 | 124 8 6 | 220 | 1519.6 | 8.0 | 143 9 7 | 200 | 1215.4 | 7.2 | 130 10 8 | 170 | 644.6 | 6.9 | 112 11 9 | 170 | 714.3 | 7.7 | 136 12 10 | 170 | 773.3 | 8.0 | 121 13 11 | 250 | 1405.4 | 7.3 | 141 14 12 | 130 | 519.3 | 7.3 | 117 15 13 | 250 | 1153.3 | 7.8 | 147 16 14 | 165 | 677.7 | 7.5 | 115 17 15 | 200 | 863.8 | 7.6 | 136 18 16 | 175 | 880.2 | 7.4 | 133 19 17 | 180 | 850.8 | 7.9 | 130 20 18 | 200 | 1346.9 | 7.3 | 134 21 19 | 321 | 2048.4 | 8.4 | 149 22 20 | 162 | 622.7 | 7.1 | 118 23 21 | 175 | 1128.3 | 6.9 | 123 24 22 | 356 | 2797.8 | 8.4 | 181 25 23 | 160 | 1131.9 | 7.5 | 129 26 24 | 200 | 378.5 | 6.8 | 134 27 25 | 200 | 430.0 | 7.6 | 132 28 26 | 200 | 402.9 | 6.8 | 157 29 27 | 260 | 1995.4 | 8.4 | 148 30 28 | 180 | 1594.7 | 7.8 | 132 31 29 | 200 | 714.3 | 7.5 | 130 32 30 | 250 | 859 | 7.3 | 161 33 31 | 200 | 476 | 6.2 | 125 34 32 | 250 | 845 | 8.1 | 150 35 33 | 250 | 200 | 6.1 | 124 36(33 rows)
The movie_details
table includes details for the first 33 Marvel movies. Each entry in the table includes a value for movie_id
, budget_million_usd
, box_office_million_usd
, imdb_rating
, and runtime_minutes
. The movie_id
column corresponds with the movie_id
columns from the movies
and characters
tables.
Characters Table
Plain text1 character_id | movie_id | character_name | actor | screen_time_minutes 2--------------+----------+-------------------------------------+------------------------+--------------------- 3 1 | 1 | Tony Stark | Robert Downey Jr. | 120 4 2 | 1 | Pepper Potts | Gwyneth Paltrow | 40 5 3 | 1 | James Rhodes | Terrence Howard | 30 6 4 | 1 | Obadiah Stane | Jeff Bridges | 25 7 5 | 1 | Happy Hogan | Jon Favreau | 20 8 6 | 1 | Agent Coulson | Clark Gregg | 15 9 7 | 1 | Raza | Faran Tahir | 10 10 8 | 1 | Yinsen | Shaun Toub | 10 11 9 | 2 | Bruce Banner/Hulk | Edward Norton | 110 12 10 | 2 | Betty Ross | Liv Tyler | 35 13 11 | 2 | Thaddeus Ross | William Hurt | 25 14 12 | 2 | Emil Blonsky/Abomination | Tim Roth | 20 15 13 | 2 | Leonard Samson | Ty Burrell | 15 16 14 | 2 | General Ross | William Hurt | 15 17 15 | 2 | Jack McGee | Tim Blake Nelson | 10 18 16 | 3 | Natasha Romanoff/Black Widow | Scarlett Johansson | 100 19 17 | 3 | Nick Fury | Samuel L. Jackson | 30 20 18 | 3 | James Rhodes/War Machine | Don Cheadle | 25 21 19 | 3 | Ivan Vanko/Whiplash | Mickey Rourke | 20 22 20 | 3 | Justin Hammer | Sam Rockwell | 15 23 21 | 3 | JARVIS | Paul Bettany | 10 24 22 | 3 | Howard Stark | John Slattery | 10 25...... 26(243 rows)
The characters
table contains 243 entries for characters that appear in the first 33 Marvel movies. Each entry has a character_id
, movie_id
, character_name
, actor
, and screen_time_minutes
. The movie_id
column corresponds with the movie_id
column from the movies
and movie_details
table.
Now that we understand the dataset, let's dive into text-based queries.
Pattern matching in PostgreSQL is primarily facilitated through the use of the %
wildcard within the LIKE
and ILIKE
operators. The %
wildcard matches any sequence of characters, including an empty sequence.
The LIKE
operator performs case-sensitive pattern matching while the ILIKE
operator performs case-insensitive pattern matching.
Placing %
at the end of a text query matches any string that starts with the text placed before %
. For example, the pattern A%
matches any string that starts with the letter "A" followed by any sequence of characters, including an empty sequence.
Placing %
at the start of a text query matches any string that ends with the text placed after %
. For example, the pattern %a
matches any string that ends with the letter "a".
Suppose we want to find characters with names that begin with "A". The query is:
SQL1SELECT DISTINCT character_name 2FROM characters 3WHERE character_name LIKE 'A%';
The output is:
Plain text1 character_name 2------------------------------- 3 Ayo 4 Aunt May 5 Alexei Shostakov/Red Guardian 6 Aldrich Killian 7 America Chavez 8 Agent Coulson 9 Adrian Toomes/Vulture 10 Alexander Pierce 11(8 rows)
Here, any character name starting with "A" (case-sensitive) followed by any sequence of characters is selected.
The query to find character names that end with "a" (case-sensitive) preceded by any sequence of characters is:
SQL1SELECT DISTINCT character_name 2FROM characters 3WHERE character_name LIKE '%a';
The output is:
Plain text1 character_name 2------------------------------ 3 Ramonda 4 Yelena Belova 5 Hela 6 Nakia 7 Raza 8 Clea 9 Steve Rogers/Captain America 10 Nebula 11 Thena 12 Gamora 13 Frigga 14 Minn-Erva 15 Yondu Udonta 16(13 rows)
Any character name ending with "a" will be included, regardless of the preceding sequence of characters.
In PostgreSQL, you can use the =
operator to find exact matches in text-based queries. This operator ensures only entries that exactly match the specified text are returned.
To find exact matches for the character name "Tony Stark", you can use the following query:
SQL1SELECT character_name
2FROM characters
3WHERE character_name = 'Tony Stark';
The output is:
Plain text1 character_name 2---------------- 3 Tony Stark 4(1 row)
This query returns only the entries where the character_name
is exactly "Tony Stark", with no additional characters or variations in case.
Placing %
at the beginning and end of a text finds all entries that contain that substring. The %
at the beginning means any sequence of characters (including an empty sequence) can come before the substring, and the %
at the ends means any sequence of characters (including an empty sequence) can come after the substring. Let's take a look at an example:
SQL1SELECT DISTINCT character_name 2FROM characters 3WHERE character_name LIKE '%Tony Stark%';
The output of this query is:
Plain text1 character_name 2--------------------- 3 Tony Stark 4 Tony Stark/Iron Man 5(2 rows)
The output indicates that there are two distinct entries in the character_name
column that contain "Tony Stark": "Tony Stark" and "Tony Stark/Iron Man".
"Tony Stark/Iron Man" did not appear in the query containing WHERE character_name = 'Tony Stark'
because "Tony Stark/Iron Man" is not an exact match.
PostgreSQL provides two pattern-matching functions: LIKE
and ILIKE
. LIKE
performs case-sensitive pattern matching, while ILIKE
performs case-insensitive pattern matching.
Let's revisit the example of finding the characters whose names begin with "A". In the characters
table, the first letter of each character's name is capitalized.
SQL1SELECT character_name 2FROM characters 3WHERE character_name LIKE 'a%';
The above query does not return any entries because no character names start with lowercase "a". The output is:
Plain text1 character_name 2---------------- 3(0 rows)
We can use ILIKE
to find all names that begin with the letter "A"/"a" regardless of case.
SQL1SELECT DISTINCT character_name 2FROM characters 3WHERE character_name ILIKE 'a%';
The output of this query is:
Plain text1 character_name 2------------------------------- 3 Ayo 4 Aunt May 5 Alexei Shostakov/Red Guardian 6 Aldrich Killian 7 America Chavez 8 Agent Coulson 9 Adrian Toomes/Vulture 10 Alexander Pierce 11(8 rows)
The output shows how ILIKE
can be used to query text without regard to case.
Sometimes, you might want to handle case sensitivity explicitly. We can do this using the UPPER
and LOWER
functions. These functions return their input in all capital letters and all lowercase letters, respectively.
To find names that are fully uppercase, we use the UPPER
function as follows:
SQL1SELECT character_name
2FROM characters
3WHERE character_name = UPPER(character_name);
UPPER(character_name)
returns the character_name
in all caps.
Using WHERE character_name = UPPER(character_name);
, we find all character names that remain unchanged when passed into the UPPER
function.
The output is:
Plain text1 character_name 2---------------- 3 JARVIS 4 MJ 5 MJ 6(3 rows)
The output shows the names of characters that only contain capital letters.
Similarly, to find names in lowercase, we use the LOWER
function.
SQL1SELECT actor
2FROM characters
3WHERE actor = LOWER(actor);
The output is:
Plain text1 actor 2------- 3(0 rows)
This query returns an empty table, as no actor names are in all lowercase.
In this lesson, we:
- Reviewed the Marvel movies database
- Explored pattern matching using
%
and=
- Found exact matches and substring matches in text data
- Handled case sensitivity using the
UPPER
andLOWER
functions
You are now well-equipped to handle text-based queries in PostgreSQL. The next steps involve practicing these queries through exercises to reinforce your understanding. Happy querying!