Introduction: Automating Web Search for Research

Welcome to the first lesson of our course, "Automating Web Content Retrieval and Parsing in Python". In this course, you will learn how to build a research tool that can search the web, gather content, and process information automatically.

Automating web search is a key part of building modern research tools. Instead of searching for information manually, you can write a Python module to do it for you. This saves time and allows you to collect and process large amounts of information quickly.

In this lesson, we will focus on using DDGS, a metasearch library that aggregates results from multiple web search services.

By the end of this lesson, you will know how to:

  • Search the web using DDGS in Python
  • Fetch the first result from your search
  • Convert the web page content into a readable format

Let’s get started!

Using DDGS to Search the Web

The DDGS library allows you to perform web searches directly from Python. This is helpful because you can automate the process of finding information online.

DDGS works as a metasearch tool: it automatically selects and queries different search backends (such as DuckDuckGo, Brave, or others) to give you a diverse set of results. You do not need to choose the backend yourself; the library handles this for you.

First, let’s see how to import the DDGS library and perform a simple search.

  • Here, we import the DDGS class from the library.
  • We define a search query, in this case, "Python programming".
  • We call the .text() method to perform the search and ask for just one result (max_results=1).
  • The results variable will contain a list of search results.
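
The search described above can be sketched as a small function. This is a minimal sketch, assuming the library is installed as the `ddgs` package and that `DDGS().text(query, max_results=...)` behaves as described in this lesson. The function name `search_web` and its `client` parameter are hypothetical additions, so that a stand-in object can replace the real search engine during testing.

```python
from typing import Any, Optional


def search_web(query: str, max_results: int = 1, client: Optional[Any] = None) -> list:
    """Return a list of result dictionaries for `query`.

    `client` is a hypothetical testing hook; when it is None, a real
    DDGS instance is created (this requires the `ddgs` package).
    """
    if client is None:
        # Imported lazily so this module can be loaded without ddgs installed.
        from ddgs import DDGS
        client = DDGS()
    # list() guarantees a list is returned, even if a stand-in client
    # yields results lazily; DDGS().text() itself returns a list.
    return list(client.text(query, max_results=max_results))


# Example (needs the ddgs package and network access):
# results = search_web("Python programming", max_results=1)
# print(results)
```

Keeping the import inside the function lets the module load where `ddgs` is not installed; the trade-off is that a missing package is only reported when a search actually runs.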

Each result is a dictionary with keys like title, href (the URL), and body (a short description).
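
Because a result may occasionally lack one of these keys, reading them with `dict.get()` is a defensive option. The helper `describe_result` below is a hypothetical name, and the example dictionary holds placeholder values for illustration, not real search output.

```python
def describe_result(result: dict) -> str:
    """Format one search result dictionary as a short, readable summary.

    .get() supplies a fallback, so a missing key does not raise KeyError.
    """
    title = result.get("title", "(no title)")
    url = result.get("href", "(no URL)")
    snippet = result.get("body", "")
    return f"{title}\n{url}\n{snippet}"


# Placeholder values for illustration only, not real search output.
example = {
    "title": "Example page title",
    "href": "https://example.com/",
    "body": "A short description of the page.",
}
print(describe_result(example))
```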

Note: On CodeSignal, the library is already installed, so you do not need to install it yourself. On your own computer, you would need to install it first, for example with pip.

Extracting and Fetching the First Search Result

Now that we have search results, let’s extract the URL of the first result and fetch the web page content.

First, let’s check if we got any results and get the URL:

  • We check if results is not empty and if the first result has a "href" key.
  • If so, we extract the URL and print it.
  • If not, we print a message saying no valid results were found.

Even though we request only one result with max_results=1, DDGS().text() still returns a list. To access that result, we take the first item of the list with results[0].
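
The emptiness check and the `results[0]` access described above can be combined into one helper. This is a sketch assuming results follow the list-of-dictionaries shape described earlier; `first_result_url` is a hypothetical name, and the `results` list below is placeholder data, not live output.

```python
from typing import Optional


def first_result_url(results: list) -> Optional[str]:
    """Return the URL of the first search result, or None if unavailable.

    DDGS().text() returns a list even when max_results=1, so the single
    result is still taken out with results[0].
    """
    if results and "href" in results[0]:
        return results[0]["href"]
    return None


# Placeholder data standing in for a real search response.
results = [{"title": "Example", "href": "https://example.com/", "body": "Placeholder snippet."}]
url = first_result_url(results)
if url:
    print(f"First result URL: {url}")
else:
    print("No valid results found.")
```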

Next, let’s fetch the content of the web page using the httpx library:

  • We use httpx.get() to download the web page at the given URL.
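  • The timeout=10 argument sets a 10-second timeout, so the request fails with an error instead of hanging if the server is too slow.
  • response.raise_for_status() raises an exception if the server responds with an error status code, such as 404 or 500.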
  • We print the first 200 characters of the web page content to see what we got.
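
A minimal sketch of the fetch step, assuming `httpx` is installed. One detail worth knowing: unlike `requests`, httpx does not follow redirects by default, and `raise_for_status()` treats an unfollowed redirect as an error, so the sketch passes `follow_redirects=True`. The name `fetch_html` and the `client` parameter are hypothetical; by default the `httpx` module itself is used, which makes the call equivalent to `httpx.get(url, timeout=10, follow_redirects=True)`.

```python
from typing import Any, Optional


def fetch_html(url: str, timeout: float = 10.0, client: Optional[Any] = None) -> str:
    """Download `url` and return the response body as text.

    `client` is a hypothetical testing hook; when it is None, the httpx
    module itself is used, so this calls httpx.get(...).
    """
    if client is None:
        import httpx  # imported lazily; requires the httpx package
        client = httpx
    # httpx does not follow redirects by default, so opt in explicitly.
    response = client.get(url, timeout=timeout, follow_redirects=True)
    # With httpx, this raises httpx.HTTPStatusError for any non-2xx status.
    response.raise_for_status()
    return response.text


# Example (needs httpx and network access):
# html = fetch_html("https://example.com/")
# print(html[:200])
```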

Converting Web Content to Markdown

Web pages are usually written in HTML, which can be hard to read. To make the content easier to work with, we can convert it to Markdown, a simpler text format.

We will use the html_to_markdown library for this. Here’s how you can do it:

  • We import the convert_to_markdown function.
  • We pass the HTML content to this function, and it returns a Markdown version.
  • We print the first 200 characters of the Markdown content.
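
A minimal sketch of the conversion step, assuming the installed version of html_to_markdown exposes `convert_to_markdown` as described in this lesson (library APIs can change between releases, so it is worth checking the version you have). The wrapper name `html_to_md` and its `converter` parameter are hypothetical, added so another conversion function can be plugged in, for example in tests.

```python
from typing import Callable, Optional


def html_to_md(html: str, converter: Optional[Callable[[str], str]] = None) -> str:
    """Convert an HTML string to Markdown.

    When `converter` is None, convert_to_markdown from html_to_markdown
    is used (requires the html-to-markdown package).
    """
    if converter is None:
        from html_to_markdown import convert_to_markdown
        converter = convert_to_markdown
    return converter(html)


# Example (needs html-to-markdown):
# markdown = html_to_md("<h1>Title</h1><p>Some <b>bold</b> text.</p>")
# print(markdown[:200])  # first 200 characters, as in the lesson
```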

Markdown is much easier to read and process than raw HTML, which is why we use this step.

Summary and What’s Next

In this lesson, you learned how to:

  • Use the DDGS library to search the web from Python
  • Extract the first search result and fetch its web page content using httpx
  • Convert the web page’s HTML to Markdown for easier reading
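
To see how the three steps connect, here is a minimal end-to-end sketch. It assumes the same libraries as the lesson (`ddgs`, `httpx`, and `html_to_markdown` with `convert_to_markdown`); the function `research_first_result` and its three optional parameters are hypothetical, added only so test doubles can replace the network-dependent pieces.

```python
from typing import Any, Callable, Optional


def research_first_result(
    query: str,
    search_client: Optional[Any] = None,
    http_client: Optional[Any] = None,
    converter: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Search, fetch the first result, and return its content as Markdown.

    Returns None when the search yields no usable URL. The optional
    arguments are hypothetical hooks; by default the real DDGS, httpx,
    and html_to_markdown are used.
    """
    if search_client is None:
        from ddgs import DDGS
        search_client = DDGS()
    if http_client is None:
        import httpx
        http_client = httpx
    if converter is None:
        from html_to_markdown import convert_to_markdown
        converter = convert_to_markdown

    # Step 1: search; the result is a list even with max_results=1.
    results = search_client.text(query, max_results=1)
    if not results or "href" not in results[0]:
        return None

    # Step 2: fetch the page HTML, following redirects.
    response = http_client.get(results[0]["href"], timeout=10, follow_redirects=True)
    response.raise_for_status()

    # Step 3: convert the HTML to Markdown.
    return converter(response.text)
```

Returning `None` when no URL is found mirrors the "no valid results" branch of the lesson and leaves the caller to decide what to print.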

These steps are the foundation for building an automated research tool. In the practice exercises that follow, you will get hands-on experience with these skills. You will write your own code to search, fetch, and convert web content, preparing you for more advanced features in future lessons.

Let’s move on to the practice exercises and start building your DeepResearcher!
