Introduction: Turning Web Pages into Usable Recipes

Welcome back! In the previous lesson, you learned how to use AI to generate new recipes from a list of ingredients. Now, let’s take the next step: extracting recipes from real-world web pages.

Many cooking websites have great recipes, but the information is often buried in messy HTML code. Our goal is to build a script that can take a raw HTML file, use AI to extract a clean recipe, and then store that recipe in our database. This process is a key part of making our AI Cooking Helper smarter and more useful.

By the end of this lesson, you’ll understand how to automate recipe extraction from HTML using Python, prompt templates, and your existing database setup.

Quick Recall: Recipe Generation with AI

Before we dive in, let’s briefly remind ourselves how we previously generated recipes with AI.

In the last lesson, you learned how to:

  • Use prompt templates to ask the AI for a recipe based on a list of ingredients.
  • Send these prompts to the AI and receive a structured recipe in response.
  • Parse the AI’s response and use it in your Flask app.

This time, instead of generating a recipe from scratch, we’ll use the AI to extract a recipe from a messy HTML page. The process is similar, but the input and prompts are a bit different.

How the Extraction Script Works

Let’s look at the big picture before we break things down.

The script you’ll be working with is called extract_and_store_recipe.py. Its job is to:

  1. Read a raw HTML file from your computer.
  2. Use AI to extract a clean recipe from that HTML.
  3. Parse the AI’s response into structured data (name, ingredients, steps).
  4. Store the recipe in your database, making sure not to add duplicates.

Here’s a simple diagram of the flow:

HTML file → AI Extraction → Structured Recipe → Database Storage

This script brings together everything you’ve learned so far: prompt templates, LLM calls, and database operations.
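
The four steps above can be sketched as one small function. This is only an outline, not the actual contents of extract_and_store_recipe.py: the name `run_pipeline` and the three callables passed into it (`extract`, `parse`, `store`) are hypothetical stand-ins for the AI call, the parser, and the database code covered later in this lesson.

```python
from pathlib import Path
from typing import Callable, Optional


def run_pipeline(
    html_path: str,
    extract: Callable[[str], str],
    parse: Callable[[str], dict],
    store: Callable[[dict], bool],
) -> Optional[dict]:
    """Run the four steps in order and return the parsed recipe, or None."""
    # Step 1: read the raw HTML from disk.
    html = Path(html_path).read_text(encoding="utf-8")

    # Step 2: ask the AI for a clean recipe (plain text in the agreed format).
    ai_text = extract(html)

    # Step 3: turn that text into structured data.
    recipe = parse(ai_text)
    if not recipe or not recipe.get("name"):
        return None  # nothing usable came back, so skip the database

    # Step 4: store it; the store callable is expected to skip duplicates.
    store(recipe)
    return recipe
```

Passing the three steps in as functions keeps the outline testable without a real AI service or database: you can swap in simple fakes while you experiment.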

Using Prompts to Extract Recipes from HTML

The first step is to get the AI to read the HTML and return a clean recipe. We do this by sending it a carefully crafted prompt.

Let’s look at how the script prepares and sends this prompt.

Loading the Prompt Templates

We use two prompt templates:

  • A system prompt that tells the AI what its job is.
  • A user prompt that gives the AI the actual HTML to process.

Here’s how the script loads and fills in these templates:

  • generate_response is a function that loads the prompt templates, fills in the {{html}} variable with your HTML, and sends the request to the AI.
  • The AI is told to extract a recipe and return it in a specific format.
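
To make the template step concrete, here is a minimal sketch of filling the {{html}} placeholder and handing both prompts to the AI. Several things here are assumptions: the template wording is invented for illustration, the real templates are loaded from files rather than defined inline, and the real generate_response may take different arguments. The `send` parameter is a hypothetical stand-in for whatever function actually calls the AI.

```python
from typing import Callable

# Hypothetical template text; the course's real prompt files may differ.
SYSTEM_TEMPLATE = (
    "You extract recipes from HTML. "
    "Reply with Name:, Ingredients:, and Steps: sections."
)
USER_TEMPLATE = "Extract the recipe from this page:\n\n{{html}}"


def fill_template(template: str, variables: dict) -> str:
    """Replace each {{key}} placeholder with its value."""
    for key, value in variables.items():
        template = template.replace("{{" + key + "}}", value)
    return template


def generate_response(html: str, send: Callable[[str, str], str]) -> str:
    """Build both prompts and pass them to `send` (assumed signature)."""
    system_prompt = fill_template(SYSTEM_TEMPLATE, {})
    user_prompt = fill_template(USER_TEMPLATE, {"html": html})
    # `send` receives (system_prompt, user_prompt) and returns the AI's text.
    return send(system_prompt, user_prompt)
```

For a quick experiment you could call `generate_response(html, fake_send)` with a `fake_send` that simply returns a canned recipe string, which lets you inspect the filled prompts before spending any API calls.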

Example: What the AI Sees

System prompt:

User prompt:

When the script runs, it replaces {{html}} with the actual HTML content. The AI then returns a recipe in the requested format.

Parsing and Saving the Recipe

Once the AI returns its response, we need to turn that text into structured data and save it to the database.

Parsing the AI’s Response

The script uses a function called parse_recipe_string to break the AI’s response into parts:

  • The function reads each line of the AI’s response.
  • It looks for the Name:, Ingredients:, and Steps: sections.
  • It collects the recipe name, a list of ingredients, and a list of steps.

You might find the code familiar, as it’s the same logic we used in the generate_recipe function inside routes.py.
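
As a refresher, the parsing steps above might look roughly like this. It is a sketch rather than the course's exact code: it assumes the function returns a dictionary with `name`, `ingredients`, and `steps` keys, and that list items may start with "-", "*", "•", or numbering such as "1.". Both are guesses about the response format.

```python
import re

# Assumed item markers: a bullet character, or a number followed by "." or ")".
_ITEM_MARKER = re.compile(r"^(?:[-*•]\s*|\d+[.)]\s+)")


def parse_recipe_string(text: str) -> dict:
    """Split a 'Name: / Ingredients: / Steps:' response into a dict."""
    recipe = {"name": "", "ingredients": [], "steps": []}
    section = None  # which list the current line belongs to

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        lowered = line.lower()
        if lowered.startswith("name:"):
            recipe["name"] = line[len("name:"):].strip()
            section = None
        elif lowered.startswith("ingredients:"):
            section = "ingredients"
        elif lowered.startswith("steps:"):
            section = "steps"
        elif section is not None:
            # Drop the leading bullet or number, keep the rest of the line.
            item = _ITEM_MARKER.sub("", line).strip()
            if item:
                recipe[section].append(item)
    return recipe


sample = """Name: Avocado Toast
Ingredients:
- 2 slices bread
- 1 avocado
Steps:
1. Toast the bread.
2. Mash the avocado and spread it on top.
"""
parsed = parse_recipe_string(sample)
```

With this sample, `parsed["name"]` is "Avocado Toast", and the two lists hold the ingredient and step text with the bullets and numbering removed. Lines that appear before any section header are ignored, which gives some tolerance for extra chatter in the AI's reply.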

Example output:

Storing the Recipe in the Database

Now, let’s see how the script saves the recipe:

  • The function first checks if the recipe data is empty.
  • It opens a database session and checks if a recipe with the same name already exists.
  • If not, it creates a new Recipe object and adds each ingredient, creating new ones if needed.
  • Finally, it saves everything to the database.

You might also find this code familiar, as it’s the same logic we used in the add_manual_recipe.py script, with an added check that avoids inserting duplicates.
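
The duplicate check and the "create ingredients if needed" step can be illustrated with a self-contained sketch. Note the substitution: this example uses Python's built-in sqlite3 module instead of the course's database session and Recipe model, so that it runs on its own. The table layout and the `store_recipe` name are hypothetical, chosen only to mirror the steps listed above.

```python
import sqlite3

# Hypothetical schema for illustration; the course's models may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recipes (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    steps TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS ingredients (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS recipe_ingredients (
    recipe_id INTEGER NOT NULL REFERENCES recipes(id),
    ingredient_id INTEGER NOT NULL REFERENCES ingredients(id),
    PRIMARY KEY (recipe_id, ingredient_id)
);
"""


def store_recipe(conn: sqlite3.Connection, recipe: dict) -> bool:
    """Insert the recipe unless the name exists. Returns True if added."""
    if not recipe or not recipe.get("name"):
        return False  # empty parse result: nothing to store

    with conn:  # commits on success, rolls back if an error is raised
        exists = conn.execute(
            "SELECT 1 FROM recipes WHERE name = ?", (recipe["name"],)
        ).fetchone()
        if exists:
            return False  # duplicate name: leave the database untouched

        cur = conn.execute(
            "INSERT INTO recipes (name, steps) VALUES (?, ?)",
            (recipe["name"], "\n".join(recipe.get("steps", []))),
        )
        recipe_id = cur.lastrowid

        for ingredient in recipe.get("ingredients", []):
            # Reuse an existing ingredient row, or create it if missing.
            conn.execute(
                "INSERT OR IGNORE INTO ingredients (name) VALUES (?)",
                (ingredient,),
            )
            (ingredient_id,) = conn.execute(
                "SELECT id FROM ingredients WHERE name = ?", (ingredient,)
            ).fetchone()
            conn.execute(
                "INSERT OR IGNORE INTO recipe_ingredients VALUES (?, ?)",
                (recipe_id, ingredient_id),
            )
    return True
```

To try it, open a connection with `sqlite3.connect(":memory:")`, run `conn.executescript(SCHEMA)`, and call `store_recipe` twice with the same recipe: the first call returns True and the second returns False, because the name check stops the duplicate insert.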

Example output:

or, if the recipe already exists:

Summary and What’s Next

In this lesson, you learned how to extract a recipe from a messy HTML page using AI and store it in your database. You saw how the script:

  • Reads an HTML file.
  • Uses prompt templates to guide the AI.
  • Parses the AI’s response into structured data.
  • Saves the recipe and its ingredients to your database, skipping duplicates.

Next, you’ll get hands-on practice running and modifying this script. You’ll see how it works with real HTML files and learn how to troubleshoot and improve the extraction process. Great job making it this far — let’s keep building your AI Cooking Helper!
