Welcome back. In our previous lessons, we successfully built a solid foundation for our application: a normalized database containing movies, TV shows, actors, and directors. We also established a Python structure to interact with this data.
However, a database of movies alone is not enough to build a recommendation engine. To suggest titles a user might enjoy, we need to understand their preferences. We need to know who they are, what they have watched, and what they thought about those titles.
In this lesson, we will evolve our database from a static catalog into a dynamic system that captures user behavior. We will design and implement three new tables to store user profiles, written reviews, and viewing history.
Before we write any SQL code, let us plan the structure of our new data. We need to track three distinct concepts:
- The User: We need a central record for each person using our app. This will be the
userstable. It will hold basic information like a unique ID and an email address. - Explicit Feedback (Reviews): This occurs when a user takes the time to write a review or give a star rating. We will store this in a
user_reviewstable. - Implicit Feedback (Views): Often, users do not write reviews; they simply watch things. Knowing what a user has watched is a powerful signal for recommendations. We will track this in a
user_viewstable.
These tables will not exist in isolation. They must connect to our existing data. A review, for example, must belong to a specific User and reference a specific Show.
Let’s visualize our new schema changes.
- USERS: Each user has a unique
user_idand email. - USER_REVIEWS: Each review is linked to a user and a show. A user can only review a show once.
- USER_VIEWS: Tracks if a user has watched a show. Each user-show pair is unique.
Since our recommendation engine will frequently ask questions like "Find all reviews by User X" or "Find all users who watched Movie Y," we should add indexes to speed up these lookups.
Indexes act like a book's index, allowing the database to find specific rows without scanning the entire table.
To ensure our new tables are correctly set up and accessible, you should update your Python scripts as follows:
-
In
check_import.py:- Add a function to query and print the counts of users, reviews, and views from the new tables, similar to how you count movies and shows.
- Update the main function to call this new function and display the results.
-
In
load_netflix.py:- Update the schema creation logic to include the new
users,user_reviews, anduser_viewstables, along with their constraints and indexes. - Ensure these tables are created when the script initializes the database, so they are available for future data loading and verification.
- Update the schema creation logic to include the new
These updates will help you confirm that the new tables exist and are accessible from your application code.
In this lesson, we expanded our database schema to support user interactions.
- We created the
userstable to manage user identities. - We created
user_reviewswith constraints to ensure valid ratings (1-5) and data integrity. - We created
user_viewsto track watch history. - We updated our verification script to confirm these tables exist in the database.
In the upcoming practice exercises, you will be tasked with writing the SQL for these tables yourself and updating the Python script to verify your work. This ensures you are ready to start building the API endpoints that will populate these tables in future lessons.
