Introduction: Keeping Your Recipe Data Clean

Welcome back! So far, you have learned how to build and use API endpoints to retrieve, search, and rate recipes in your cooking helper app. As your app grows and more recipes are added, it’s important to make sure your data stays clean and reliable. One common problem in real-world applications is duplicate data: the same recipe appearing more than once in your database.

Duplicate recipes can confuse users, make search results messy, and even affect features like ratings and recommendations. In this lesson, you will learn how to identify and remove duplicate recipes from your application. This is an important step in keeping your app’s data accurate and user-friendly.

By the end of this lesson, you’ll know how to find and safely delete duplicate recipes, making your cooking helper app more professional and enjoyable for users.

How Duplicate Recipes Happen

Before we look at the solution, let’s talk about how duplicate recipes can appear in your database. Sometimes, users might add the same recipe twice by mistake. Other times, small differences, such as extra spaces or different capitalization, can make two recipes look different to a computer, even though they are the same to a person.

For example, a person would consider these two names duplicates, but an exact string comparison treats them as different:

  • "Chocolate Cake"
  • " chocolate cake "

In our project, we consider recipes to be duplicates if their names are the same when you ignore spaces at the beginning or end and treat uppercase and lowercase letters as the same. This is called normalizing the name.

Understanding the Duplicate Recipe Removal Process

Let’s break down how the duplicate recipe removal process works, step by step.

1. Connecting to Your App and Database

First, the solution needs to connect to your app and the database where your recipes are stored. This is done by setting up the correct environment and importing the necessary modules and models:

  • The code sets up the environment so the application’s code can be found and used.
  • The database session and the Recipe model are imported so the solution can work with your recipes.
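
A minimal sketch of that setup, assuming the script lives one directory below the project root. The helper name add_project_root and the module paths in the commented imports are hypothetical placeholders; substitute whatever your project actually uses.

```python
import sys
from pathlib import Path


def add_project_root(script_path: str, levels_up: int = 1) -> Path:
    """Put the project root on sys.path so application modules can be imported."""
    # parents[0] is the script's own directory; parents[1] is one level above it.
    root = Path(script_path).resolve().parents[levels_up]
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# In a real script you would call this with __file__ and then import the
# database session and the Recipe model. The module paths below are
# placeholders, not the lesson's actual layout:
#   add_project_root(__file__)
#   from app.database import session
#   from app.models import Recipe
```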

2. Normalizing Recipe Names

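To find duplicates, the solution needs to compare recipe names in a way that ignores leading and trailing whitespace and capitalization. Spaces inside the name still count. This is done with a helper function that chains two string methods:

  • strip() removes whitespace (spaces, tabs, and newlines) at the beginning and end.
  • lower() makes all letters lowercase.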

For example:

  • " Chocolate Cake " becomes "chocolate cake".
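
The helper can be sketched in a couple of lines. The function name normalize_name is an assumption made for illustration; the lesson's own helper may be named differently.

```python
def normalize_name(name: str) -> str:
    """Return the name with surrounding whitespace removed and letters lowercased."""
    return name.strip().lower()


print(normalize_name(" Chocolate Cake "))  # chocolate cake
# The two example names from earlier normalize to the same string:
print(normalize_name("Chocolate Cake") == normalize_name(" chocolate cake "))  # True
```

Because only the ends are stripped, a name with two spaces between words would still be treated as a different recipe.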

3. Finding and Listing Duplicates

The solution then loads all recipes and groups them by their normalized name:

  • all_recipes = session.query(Recipe).all() gets every recipe from the database.
  • name_map is a dictionary that groups recipes by their normalized name.
  • If a group contains more than one recipe, those recipes are duplicates of one another.

The solution then prints out the duplicates it finds.
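
The grouping step might look like the sketch below. To keep it runnable without the app, it uses a small dataclass as a stand-in for the Recipe model (the id field is an assumption) and a hard-coded list in place of the database query. The function name find_duplicates and the printed format are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Recipe:
    """Stand-in for the app's Recipe model, with only the fields used here."""
    id: int
    name: str


def find_duplicates(recipes):
    """Group recipes by normalized name and keep groups with more than one entry."""
    name_map = defaultdict(list)
    for recipe in recipes:
        name_map[recipe.name.strip().lower()].append(recipe)
    return {name: group for name, group in name_map.items() if len(group) > 1}


# In the real script this list would come from session.query(Recipe).all().
all_recipes = [
    Recipe(1, "Chocolate Cake"),
    Recipe(2, " chocolate cake "),
    Recipe(3, "Pancakes"),
]

for name, group in find_duplicates(all_recipes).items():
    ids = ", ".join(str(r.id) for r in group)
    print(f"{name!r}: recipe ids {ids}")
```

With this sample data, only the "chocolate cake" group is reported, since "Pancakes" appears once.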

Example Output:

4. Safely Removing Duplicates

The solution asks you if you want to delete the duplicates, keeping only one copy of each:

  • If you type y, the solution deletes all but one recipe in each duplicate group.
  • If you type anything else, it does nothing.

This makes sure you don’t accidentally delete recipes without checking first.
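
The confirmation step can be sketched as a function that decides which recipes to delete. The name extras_to_delete, the prompt text, and the choice to keep the first recipe in each group are assumptions; the lesson only says that one copy is kept. The ask parameter defaults to input and is there so the logic can be tried without typing at a prompt.

```python
def extras_to_delete(duplicate_groups, ask=input):
    """Return the recipes to delete, or an empty list if the user declines.

    duplicate_groups maps a normalized name to a list of recipes.
    The first recipe in each group is kept; the rest are returned.
    """
    answer = ask("Delete duplicates, keeping one copy of each? (y/n): ")
    if answer != "y":
        return []  # anything other than "y" changes nothing
    return [recipe for group in duplicate_groups.values() for recipe in group[1:]]


# Strings stand in for Recipe objects to keep the example short.
groups = {"chocolate cake": ["recipe 1", "recipe 2", "recipe 3"]}
print(extras_to_delete(groups, ask=lambda prompt: "y"))  # ['recipe 2', 'recipe 3']
print(extras_to_delete(groups, ask=lambda prompt: ""))   # []
```

In a SQLAlchemy script, each returned recipe would then be passed to session.delete(), followed by a single session.commit().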

When a duplicate recipe is deleted, you don’t have to worry about its reviews being left behind in the database. This is because the Review model uses cascading deletes on its recipe_id foreign key. This means that when a recipe is deleted, all reviews linked to that recipe are automatically deleted by the database. This helps keep your data consistent and prevents orphaned reviews.
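
To see database-level cascading in isolation, here is a small sketch using Python's built-in sqlite3 module rather than the app's models. Using SQLite is an assumption, as are the table names and every column other than recipe_id. In SQLAlchemy, a comparable foreign key is typically declared with ForeignKey(..., ondelete="CASCADE").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and therefore cascades) when this is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE recipes (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE reviews (
        id INTEGER PRIMARY KEY,
        recipe_id INTEGER NOT NULL REFERENCES recipes(id) ON DELETE CASCADE,
        rating INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO recipes (id, name) VALUES (?, ?)",
    [(1, "Chocolate Cake"), (2, " chocolate cake ")],
)
conn.executemany(
    "INSERT INTO reviews (recipe_id, rating) VALUES (?, ?)",
    [(1, 5), (2, 4), (2, 3)],
)

# Delete the duplicate recipe; its two reviews are removed by the database.
conn.execute("DELETE FROM recipes WHERE id = 2")
remaining_reviews = conn.execute("SELECT recipe_id, rating FROM reviews").fetchall()
print(remaining_reviews)  # [(1, 5)]
```

Note that the reviews attached to a deleted duplicate are removed, not moved to the copy that is kept.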

5. Running the Duplicate Removal

To run the duplicate removal, you can use the following code:

Example: Running the Duplicate Removal

Let’s see what happens when you run the duplicate removal process.

If there are no duplicates, you’ll see:

If duplicates are found, you’ll see something like:

If you type y and press Enter, the solution will remove the extra copies and show:

If you press Enter or type anything else, it will show:

This gives you a chance to review what will be deleted before making any changes.

Summary and What’s Next

In this lesson, you learned why duplicate recipes can be a problem and how to keep your recipe database clean by finding and safely removing duplicates. You saw how to connect to your app, normalize recipe names, find duplicates, and safely remove them with your confirmation.

Good data hygiene is important for any real-world app. By keeping your recipes unique, you make your cooking helper more reliable and enjoyable for users.

Congratulations on reaching the end of this course! You now have the skills to build, search, and maintain a high-quality recipe API. Be sure to try the practice exercises next to reinforce what you’ve learned. Great job!
