Introduction: The Web Searcher Module

Welcome back! In the previous lesson, you learned how to use the DDGS library to search the web, fetch a single web page, and convert its content from HTML to Markdown. These are essential skills for building an automated research tool.

In this lesson, we will take the next step by creating a Web Searcher module. This module will allow you to search for a topic, fetch the top web pages, and convert their content to Markdown — all in one go. This is a key part of building a tool that can gather and process information from the web automatically.

By the end of this lesson, you will know how to combine searching, fetching, and converting web content into a single, reusable function. This keeps your code cleaner and prepares you for more advanced automation tasks later in the course.

Before we dive in, let’s quickly remind ourselves of the main tools we have used so far:

  • DDGS: This library lets us perform web searches in Python and get results as structured data.
  • httpx: This is a library for making HTTP requests, which we use to fetch web pages.
  • html_to_markdown: This tool converts HTML content into Markdown, making it easier to read and process.

You have already seen how to use each of these tools separately. Now, we will see how to use them together to automate the process of searching for and retrieving useful web content.

Building the `search_and_fetch_markdown` Function

Now, let’s put everything we learned together into a single function. We want a function that:

  1. Takes a search query.
  2. Searches the web for the top results.
  3. Fetches each result’s web page.
  4. Converts the HTML to Markdown.
  5. Returns a list of dictionaries, each with the title, URL, and Markdown content.

Let’s build this step by step.

Step 1: Define the Function and Set Up the Search

  • We import the needed libraries.
  • The function takes a query, plus optional `max_results` and `timeout` arguments.
  • We create a DDGS object and perform the search.
  • We prepare an empty list to store the results.
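The search part of this step can be sketched on its own, so it can be tried before the fetch loop exists. This is a minimal sketch, not the lesson's exact code: the helper name `search_top_results`, the `search_fn` parameter, and the default of 5 results are assumptions made for illustration. It also assumes the `DDGS` class is importable from the `ddgs` package. The `timeout` argument is left out here because it only matters once pages are fetched.

```python
from typing import Any, Callable, Optional

# A search backend takes (query, max_results) and returns a list of result dicts.
SearchFn = Callable[[str, int], list[dict[str, Any]]]


def ddgs_text_search(query: str, max_results: int) -> list[dict[str, Any]]:
    """Search with DDGS. Assumes the `ddgs` package is installed."""
    # Imported lazily so this module still loads where `ddgs` is missing.
    from ddgs import DDGS

    return list(DDGS().text(query, max_results=max_results))


def search_top_results(
    query: str,
    max_results: int = 5,  # assumed default, not taken from the lesson
    search_fn: Optional[SearchFn] = None,
) -> list[dict[str, Any]]:
    """Return the raw search hits for `query` (hypothetical helper)."""
    # Fall back to the DDGS-backed search unless a replacement is supplied.
    search = search_fn or ddgs_text_search
    return search(query, max_results)
```

Passing a stand-in `search_fn` makes it possible to exercise the helper without network access, which is handy when experimenting or writing tests.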

Step 2: Loop Through Results and Fetch Content

  • For each result, we get the URL and title.
  • If the URL is missing, we skip it.
  • We use a try block to handle errors. If fetching or converting fails, we record the error.
  • If successful, we convert the HTML to Markdown and add the result to our list.
  • At the end, the function returns a list of dictionaries, each with the title, URL, and Markdown content.
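Putting both steps together, the whole function might look roughly like the sketch below. Several details are assumptions rather than the lesson's confirmed code: the default values, the dictionary keys `title`, `url`, and `markdown`, the choice to store an error message in the `markdown` field when a page fails, and the keyword-only `search`, `fetch`, and `convert` hooks, which are hypothetical additions that let the loop run without network access. The import of `convert_to_markdown` from `html_to_markdown` is also an assumption, since the function name may differ between versions of that package.

```python
from typing import Any, Callable, Optional


def _default_search(query: str, max_results: int) -> list[dict[str, Any]]:
    from ddgs import DDGS  # assumption: DDGS comes from the `ddgs` package

    return list(DDGS().text(query, max_results=max_results))


def _default_fetch(url: str, timeout: float) -> str:
    import httpx

    response = httpx.get(url, timeout=timeout, follow_redirects=True)
    response.raise_for_status()  # treat 4xx/5xx responses as failures
    return response.text


def _default_convert(html: str) -> str:
    # Assumption: the converter's name may vary between package versions.
    from html_to_markdown import convert_to_markdown

    return convert_to_markdown(html)


def search_and_fetch_markdown(
    query: str,
    max_results: int = 5,   # assumed default
    timeout: float = 10.0,  # assumed default, in seconds
    *,
    search: Optional[Callable[[str, int], list[dict[str, Any]]]] = None,
    fetch: Optional[Callable[[str, float], str]] = None,
    convert: Optional[Callable[[str], str]] = None,
) -> list[dict[str, str]]:
    """Search, fetch each hit, and return title/url/markdown dictionaries."""
    search = search or _default_search
    fetch = fetch or _default_fetch
    convert = convert or _default_convert

    results: list[dict[str, str]] = []
    for hit in search(query, max_results):
        url = hit.get("href")
        title = hit.get("title", "")
        if not url:
            continue  # nothing to fetch for this hit
        try:
            markdown = convert(fetch(url, timeout))
        except Exception as exc:
            # One way to "record the error": keep the entry, store the message.
            markdown = f"Error fetching {url}: {exc}"
        results.append({"title": title, "url": url, "markdown": markdown})
    return results
```

Catching a broad `Exception` keeps one bad page from aborting the whole run, at the cost of hiding the error type. A stricter version could catch only HTTP and conversion errors.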

Example Output

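If you call the function with a search query and print each result, you should see one entry per page: its title, its URL, and the beginning of its Markdown content. If a page could not be fetched or converted, its entry carries the recorded error instead.

Output in this shape shows that your function is working: it searches, fetches, and converts web pages to Markdown.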
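To inspect results without flooding the terminal, a small formatting helper can print a short preview of each entry. The sketch below is illustrative only: `format_preview` is a hypothetical helper, the dictionary keys `title`, `url`, and `markdown` are assumed, and the sample entry is made-up placeholder data rather than real search output.

```python
def format_preview(results: list[dict[str, str]], max_chars: int = 200) -> str:
    """Build a numbered, truncated preview of search results."""
    lines = []
    for index, item in enumerate(results, start=1):
        # Collapse whitespace so a multi-line page fits on one preview line.
        snippet = " ".join(item.get("markdown", "").split())[:max_chars]
        lines.append(f"{index}. {item.get('title', '')}")
        lines.append(f"   {item.get('url', '')}")
        lines.append(f"   {snippet}")
    return "\n".join(lines)


# Placeholder entry standing in for what the function might return.
sample = [
    {
        "title": "Example Domain",
        "url": "https://example.com",
        "markdown": "# Example Domain\n\nThis domain is for use in examples.",
    }
]
print(format_preview(sample, max_chars=16))
```

With real data, the same helper could be applied to the list returned by `search_and_fetch_markdown`.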

Summary And What’s Next

In this lesson, you learned how to build a Web Searcher module that can:

  • Search the web for a topic.
  • Fetch the top web pages from the results.
  • Convert each page’s HTML content to Markdown.
  • Return all this information in a structured way.

This function is a powerful building block for your automated research tool. In the next practice exercises, you will get hands-on experience using and modifying this function. You will practice searching for different topics, handling errors, and working with the Markdown output.

Great job making it this far — let’s move on to the practice and solidify your skills!
