Welcome to "Building the Movie Recommendation Base Backend using Codex"! In this course, we are going to build a powerful backend system capable of recommending movies and TV shows.
To build a recommendation engine, we first need data. We are starting with a raw dataset: a CSV file containing thousands of Netflix titles, including their descriptions, actors, directors, and ratings. However, a simple text file is not powerful enough for a complex application. We need to move this data into a PostgreSQL database.
In this first lesson, we will focus on the "planning and migration" phase. We will not write every single line of code by hand. Instead, we will learn how to leverage Codex, an AI coding assistant, to accelerate our development. We will use Codex to design our database structure, write scripts to load our data, and verify that everything looks correct.
Before we start asking Codex to write code for us, we must address a critical topic: Security.
When you use an AI tool like Codex, you are sending data to an external server for processing. This leads to the "Golden Rule" of AI development:
Never send secrets, passwords, or Personally Identifiable Information (PII) to an AI assistant.
Let's look at an example of an unsafe prompt:
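A sketch of what such a prompt might look like. The hostname, username, and password here are entirely made up for illustration:

```
Write a Python function that connects to my Postgres database at
db.mycompany.com with the username "admin" and the password "SuperSecret123".
```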
This is dangerous because you just shared your password. If the AI logs this conversation, your password is no longer private.
Here is the safe way to ask the same question:
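One illustrative phrasing that keeps the secret out of the prompt entirely:

```
Write a Python function that connects to a Postgres database.
Read the host, username, and password from environment variables.
```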
By using placeholders or environment variables, we get the code we need without exposing our secrets. Throughout this course, we will practice this safe approach.
A database needs a blueprint, known as a Schema. This defines what tables we have and how they relate to each other. Our raw data is in a CSV file, which is just one big table. However, a movie might have multiple actors, and an actor might be in multiple movies.
To handle this efficiently, we need to "normalize" our data. This means breaking it into smaller, related tables. We can ask Codex to help us design this.
We might prompt Codex like this:
"I have a dataset of movies with columns for title, director, cast, and country. The 'cast' column contains comma-separated lists of actors. Please generate a PostgreSQL schema that normalizes this data."
Codex will generate SQL code to create these tables. Let's look at how this structure is built, step by step.
First, it creates the main table for the shows:
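A sketch of what that table might look like. The exact table and column names here are assumptions based on the CSV fields mentioned above; Codex's actual output may differ:

```sql
-- Core table: one row per movie or TV show.
CREATE TABLE shows (
    show_id      TEXT PRIMARY KEY,
    type         TEXT,            -- 'Movie' or 'TV Show'
    title        TEXT NOT NULL,
    director     TEXT,
    country      TEXT,
    release_year INTEGER,
    rating       TEXT,
    description  TEXT
);
```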
This table holds the core information for every movie or TV show.
Next, instead of storing a long list of actors inside the shows table, Codex creates a separate table just for actors:
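For example, a minimal sketch (names are assumed, not prescribed by the dataset):

```sql
-- Each actor is stored exactly once.
CREATE TABLE actors (
    actor_id SERIAL PRIMARY KEY,
    name     TEXT UNIQUE NOT NULL
);
```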
Finally, to link them together, it creates a "junction" table. This table says, "This specific Actor is in this specific Show."
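A sketch of that junction table, assuming the shows and actors tables use the key columns shown above:

```sql
-- One row per (show, actor) pairing.
CREATE TABLE show_actors (
    show_id  TEXT    REFERENCES shows (show_id),
    actor_id INTEGER REFERENCES actors (actor_id),
    PRIMARY KEY (show_id, actor_id)
);
```

The composite primary key also prevents the same actor from being linked to the same show twice.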
By using this structure, our database is organized and efficient. We save this blueprint as schema.sql.
Now that we have a database structure, we need to move the data from our CSV file into these new tables. Doing this manually for thousands of movies would take forever. We need a Python script to automate it. At a high level, the script will:
- Load the raw CSV data into memory.
- Split the comma-separated cast strings into individual actor names.
- Connect to Postgres and insert the shows, the actors, and the links between them.
Let's look at the logic Codex uses to build this script, load_netflix.py.
First, the script needs to load the raw data. Codex often suggests using a library called pandas for this, as it is very good at handling CSV files.
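A minimal sketch of that loading step. The filename is an assumption; adjust it to match your dataset:

```python
import os

import pandas as pd

# Assumed filename for the raw dataset.
CSV_PATH = "netflix_titles.csv"

def load_raw_data(source):
    """Read the raw Netflix CSV into a DataFrame.

    keep_default_na=False keeps empty cells as empty strings
    instead of NaN, which simplifies the cleaning steps later.
    """
    return pd.read_csv(source, keep_default_na=False)

if __name__ == "__main__" and os.path.exists(CSV_PATH):
    df = load_raw_data(CSV_PATH)
    print(f"Loaded {len(df)} rows with columns: {list(df.columns)}")
```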
The raw data has strings like "Actor A, Actor B, Actor C." The script needs to split these into individual names so they can go into our actors table.
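One way that splitting logic might look, sketched as a small helper that also handles empty cells and stray whitespace:

```python
def split_names(raw):
    """Turn 'Actor A, Actor B, Actor C' into a clean list of names.

    Empty or missing values yield an empty list rather than [''].
    """
    if not raw:
        return []
    return [name.strip() for name in raw.split(",") if name.strip()]
```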
Finally, the script connects to Postgres and inserts the data. It separates the data into lists: one list for movies, one for actors, and one for the connections between them.
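A sketch of that final step, assuming the psycopg2 driver and table names of shows, actors, and show_actors (with a unique constraint on actor names); the script Codex generates for you may differ:

```python
import os

def build_rows(records):
    """Separate raw records into show rows, unique actor names, and show-actor links."""
    show_rows, actor_names, links = [], set(), []
    for rec in records:
        show_rows.append((rec["show_id"], rec["title"], rec["release_year"]))
        for name in rec.get("actors", []):
            actor_names.add(name)
            links.append((rec["show_id"], name))
    return show_rows, sorted(actor_names), links

def insert_all(conn, show_rows, actor_names, links):
    """Insert shows and actors, then connect them via the junction table."""
    with conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO shows (show_id, title, release_year) VALUES (%s, %s, %s)",
            show_rows,
        )
        # Requires a UNIQUE constraint on actors.name.
        cur.executemany(
            "INSERT INTO actors (name) VALUES (%s) ON CONFLICT (name) DO NOTHING",
            [(n,) for n in actor_names],
        )
        cur.executemany(
            """INSERT INTO show_actors (show_id, actor_id)
               SELECT %s, actor_id FROM actors WHERE name = %s""",
            links,
        )
    conn.commit()

if __name__ == "__main__" and "DATABASE_URL" in os.environ:
    import psycopg2  # assumed driver: pip install psycopg2-binary

    # Connection details come from the environment -- never hard-coded.
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
```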
By running this script, we transform our messy text file into a structured, queryable database.
After running a migration script, we should never assume it worked perfectly. We need to verify the results. We can ask Codex to write a "sanity check" script, check_import.py.
This script performs simple queries to ensure the data looks right.
The most basic check is counting the rows. If our CSV had 8,000 titles, our database should also have 8,000 titles.
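A sketch of that row-count check, again assuming the psycopg2 driver, a shows table, and the dataset filename used earlier (all assumptions you should adjust):

```python
import csv
import os

CSV_PATH = "netflix_titles.csv"  # assumed filename for the raw dataset

def count_csv_rows(path):
    """Count data rows in the CSV, excluding the header line."""
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.reader(f)) - 1

if __name__ == "__main__" and "DATABASE_URL" in os.environ:
    import psycopg2  # assumed driver: pip install psycopg2-binary

    with psycopg2.connect(os.environ["DATABASE_URL"]) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*) FROM shows")
            db_count = cur.fetchone()[0]
    csv_count = count_csv_rows(CSV_PATH)
    print(f"CSV rows: {csv_count}, database rows: {db_count}")
```

Using csv.reader (rather than counting newlines) matters here, because descriptions may contain quoted line breaks that span multiple physical lines.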
We also want to see a few actual records to ensure the relationships are working. For example, if we query a movie, do we get the correct actors back?
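A sketch of such a spot check. The query assumes the shows, actors, and show_actors table names discussed in this lesson, and the helper groups the joined rows for readable output:

```python
import os

# Hypothetical spot-check query joining shows to their actors.
SPOT_CHECK_SQL = """
    SELECT s.title, a.name
    FROM shows s
    JOIN show_actors sa ON sa.show_id = s.show_id
    JOIN actors a ON a.actor_id = sa.actor_id
    ORDER BY s.title, a.name
    LIMIT 10
"""

def format_cast(rows):
    """Group (title, actor) pairs into 'Title: Actor A, Actor B' lines."""
    grouped = {}
    for title, actor in rows:
        grouped.setdefault(title, []).append(actor)
    return [f"{title}: {', '.join(names)}" for title, names in grouped.items()]

if __name__ == "__main__" and "DATABASE_URL" in os.environ:
    import psycopg2  # assumed driver: pip install psycopg2-binary

    with psycopg2.connect(os.environ["DATABASE_URL"]) as conn:
        with conn.cursor() as cur:
            cur.execute(SPOT_CHECK_SQL)
            for line in format_cast(cur.fetchall()):
                print(line)
```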
If this script prints out movie titles with their correct cast members, we know our migration was a success.
In this lesson, we established the foundation for our movie recommendation backend.
- Security: We learned never to share secrets with AI tools.
- Planning: We used Codex to design a normalized SQL schema (schema.sql) that organizes our data efficiently.
- Automation: We generated a Python script (load_netflix.py) to clean and load our CSV data into Postgres.
- Verification: We created a check script (check_import.py) to confirm our data is safe and sound.
Now it is time for you to put this into practice. In the upcoming exercises, you will use Codex to generate these scripts and run the migration yourself. This is the first step in building a robust backend!
