Welcome to the first lesson of our course on Advanced Query Techniques and Conditional Logic in PostgreSQL. In this lesson, we will focus on mastering text-based queries.
By the end of this lesson, you will know how to search for patterns in text and handle case sensitivity effectively using PostgreSQL.
This course uses the Marvel movies dataset from previous courses. The dataset has a movies
table, movie_details
table, and a characters
table. As a refresher, the tables are:
Movies Table
The movies
table includes the first 33 Marvel movies. Each entry in the table includes a value for movie_id
, movie_name
, release_date
and phase
. The movie_id
column corresponds with the movie_id
columns from the movie_details
and characters
tables.
Movie Details Table
The movie_details
table includes details for the first 33 Marvel movies. Each entry in the table includes a value for movie_id
, budget_million_usd
, box_office_million_usd
, imdb_rating
, and runtime_minutes
. The movie_id
column corresponds with the movie_id
columns from the movies
and characters
tables.
Characters Table
The characters
table contains 243 entries for characters that appear in the first 33 Marvel movies. Each entry has a character_id
, movie_id
, character_name
, actor
, and screen_time_minutes
. The movie_id
column corresponds with the movie_id
column from the movies
and movie_details
table.
Now that we understand the dataset, let's dive into text-based queries.
Pattern matching in PostgreSQL is primarily facilitated through the use of the %
wildcard within the LIKE
and ILIKE
operators. The %
wildcard matches any sequence of characters, including an empty sequence.
The LIKE
operator performs case-sensitive pattern matching while the ILIKE
operator performs case-insensitive pattern matching.
Placing %
at the end of a text query matches any string that starts with the text placed before %
. For example, the pattern A%
matches any string that starts with the letter "A" followed by any sequence of characters, including an empty sequence.
Placing %
at the start of a text query matches any string that ends with the text placed after %
. For example, the pattern %a
matches any string that ends with the letter "a".
Suppose we want to find characters with names that begin with "A". The query is:
The output is:
Here, any character name starting with "A" (case-sensitive) followed by any sequence of characters is selected.
The query to find character names that end with "a" (case-sensitive) preceded by any sequence of characters is:
The output is:
Any character name ending with "a" will be included, regardless of the preceding sequence of characters.
In PostgreSQL, you can use the =
operator to find exact matches in text-based queries. This operator ensures only entries that exactly match the specified text are returned.
To find exact matches for the character name "Tony Stark", you can use the following query:
The output is:
This query returns only the entries where the character_name
is exactly "Tony Stark", with no additional characters or variations in case.
Placing %
at the beginning and end of a text finds all entries that contain that substring. The %
at the beginning means any sequence of characters (including an empty sequence) can come before the substring, and the %
at the ends means any sequence of characters (including an empty sequence) can come after the substring. Let's take a look at an example:
The output of this query is:
The output indicates that there are two distinct entries in the character_name
column that contain "Tony Stark": "Tony Stark" and "Tony Stark/Iron Man".
"Tony Stark/Iron Man" did not appear in the query containing WHERE character_name = 'Tony Stark'
because "Tony Stark/Iron Man" is not an exact match.
PostgreSQL provides two pattern-matching functions: LIKE
and ILIKE
. LIKE
performs case-sensitive pattern matching, while ILIKE
performs case-insensitive pattern matching.
Let's revisit the example of finding the characters whose names begin with "A". In the characters
table, the first letter of each character's name is capitalized.
The above query does not return any entries because no character names start with lowercase "a". The output is:
We can use ILIKE
to find all names that begin with the letter "A"/"a" regardless of case.
The output of this query is:
The output shows how ILIKE
can be used to query text without regard to case.
Sometimes, you might want to handle case sensitivity explicitly. We can do this using the UPPER
and LOWER
functions. These functions return their input in all capital letters and all lowercase letters, respectively.
To find names that are fully uppercase, we use the UPPER
function as follows:
UPPER(character_name)
returns the character_name
in all caps.
Using WHERE character_name = UPPER(character_name);
, we find all character names that remain unchanged when passed into the UPPER
function.
The output is:
The output shows the names of characters that only contain capital letters.
Similarly, to find names in lowercase, we use the LOWER
function.
The output is:
This query returns an empty table, as no actor names are in all lowercase.
In this lesson, we:
- Reviewed the Marvel movies database
- Explored pattern matching using
%
and=
- Found exact matches and substring matches in text data
- Handled case sensitivity using the
UPPER
andLOWER
functions
You are now well-equipped to handle text-based queries in PostgreSQL. The next steps involve practicing these queries through exercises to reinforce your understanding. Happy querying!
