Introduction: Moving Beyond Static Data

Welcome back. In our previous lessons, we successfully built a solid foundation for our application: a normalized database containing movies, TV shows, actors, and directors. We also established a Python structure to interact with this data.

However, a database of movies alone is not enough to build a recommendation engine. To suggest titles a user might enjoy, we need to understand their preferences. We need to know who they are, what they have watched, and what they thought about those titles.

In this lesson, we will evolve our database from a static catalog into a dynamic system that captures user behavior. We will design and implement three new tables to store user profiles, written reviews, and viewing history.

Designing the User-Centric Tables

Before we write any SQL code, let us plan the structure of our new data. We need to track three distinct concepts:

  1. The User: We need a central record for each person using our app. This will be the users table. It will hold basic information like a unique ID and an email address.
  2. Explicit Feedback (Reviews): This occurs when a user takes the time to write a review or give a star rating. We will store this in a user_reviews table.
  3. Implicit Feedback (Views): Often, users do not write reviews; they simply watch things. Knowing what a user has watched is a powerful signal for recommendations. We will track this in a user_views table.

These tables will not exist in isolation. They must connect to our existing data. A review, for example, must belong to a specific User and reference a specific Show.

Visualizing the Schema Changes

Let’s visualize our new schema changes.

erDiagram USERS { INT user_id PK "Primary key" TEXT email "Unique email" TEXT display_name "Display name" } USER_REVIEWS { INT review_id PK "Primary key" INT user_id FK "References USERS" TEXT show_id FK "References SHOWS" SMALLINT rating "1-5" TEXT review_text "Review text" TIMESTAMPTZ created_at "Timestamp" } USER_VIEWS { INT user_id FK,PK "References USERS" TEXT show_id FK,PK "References SHOWS" BOOLEAN viewed "Watched or not" TIMESTAMPTZ last_updated "Timestamp" } USERS||--o{USER_REVIEWS:"writes" USERS||--o{USER_VIEWS:"views"
Table Relationships and Constraints
  • USERS: Each user has a unique user_id and email.
  • USER_REVIEWS: Each review is linked to a user and a show. A user can only review a show once.
  • USER_VIEWS: Tracks if a user has watched a show. Each user-show pair is unique.
Adding Performance Indexes

Since our recommendation engine will frequently ask questions like "Find all reviews by User X" or "Find all users who watched Movie Y," we should add indexes to speed up these lookups.

Indexes act like a book's index, allowing the database to find specific rows without scanning the entire table.

Verifying the Schema with Python

To ensure our new tables are correctly set up and accessible, you should update your Python scripts as follows:

  • In check_import.py:

    • Add a function to query and print the counts of users, reviews, and views from the new tables, similar to how you count movies and shows.
    • Update the main function to call this new function and display the results.
  • In load_netflix.py:

    • Update the schema creation logic to include the new users, user_reviews, and user_views tables, along with their constraints and indexes.
    • Ensure these tables are created when the script initializes the database, so they are available for future data loading and verification.

These updates will help you confirm that the new tables exist and are accessible from your application code.

Summary and Practice Prep

In this lesson, we expanded our database schema to support user interactions.

  1. We created the users table to manage user identities.
  2. We created user_reviews with constraints to ensure valid ratings (1-5) and data integrity.
  3. We created user_views to track watch history.
  4. We updated our verification script to confirm these tables exist in the database.

In the upcoming practice exercises, you will be tasked with writing the SQL for these tables yourself and updating the Python script to verify your work. This ensures you are ready to start building the API endpoints that will populate these tables in future lessons.

Sign up
Join the 1M+ learners on CodeSignal
Be a part of our community of 1M+ users who develop and demonstrate their skills on CodeSignal