Welcome back! In the last lesson, you learned how to avoid common pitfalls in your web searcher, such as duplicate results and broken links. Now, we will take your skills a step further by making your web searcher more reliable and safe.
When you automate web content retrieval, you will often face problems like network errors, slow responses, or even accidentally processing the same page more than once. If you don’t handle these issues, your research tool might miss important information or waste time and resources.
In this lesson, you will learn how to:
- Automatically retry failed web requests, and
- Use logging to monitor what happens during your web search.
By the end of this lesson, you will have a web searcher that is much more robust and ready for real-world use.
When you fetch web pages, sometimes things go wrong. The website might be slow, your internet connection might drop, or the server might return an error. If you don’t handle these problems, your program could crash or miss important data.
To solve this, you can use the `tenacity` library. This library lets you automatically retry a function if it fails due to certain errors, such as timeouts or connection problems.
Let’s start by importing the necessary modules and setting up a simple retry mechanism.
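Here is a minimal sketch of the setup, assuming you have installed `httpx` and `tenacity` (for example, with `pip install httpx tenacity`):

```python
import httpx
from httpx import TimeoutException, RequestError, HTTPStatusError
from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_exception_type
```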
Here’s what each import does:
- `tenacity` provides decorators and tools for retrying functions.
- `httpx` is the library we use to make HTTP requests.
- The exception types (`TimeoutException`, `RequestError`, `HTTPStatusError`) help us specify which errors should trigger a retry.
Now, let’s create a function that fetches a web page and automatically retries if it fails due to a network error or timeout.
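A sketch of such a function follows; the name `fetch_page`, the 10-second request timeout, and the fixed 2-second wait between attempts are illustrative choices, not requirements:

```python
@retry(
    stop=stop_after_attempt(3),  # give up after 3 total attempts
    wait=wait_fixed(2),          # wait 2 seconds between attempts
    retry=retry_if_exception_type((TimeoutException, RequestError, HTTPStatusError)),
)
def fetch_page(url: str) -> str:
    """Fetch a web page, retrying on timeouts, connection errors, and HTTP errors."""
    response = httpx.get(url, timeout=10.0)
    response.raise_for_status()  # raises HTTPStatusError for 4xx/5xx responses
    return response.text
```

In this sketch, `stop_after_attempt(3)` caps the total number of attempts, `wait_fixed(2)` pauses between them, and `retry_if_exception_type` limits retries to the listed exceptions so that genuine bugs in your code still fail immediately.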
Let’s break down what’s happening here:
- The `@retry` decorator tells Python to retry the function up to 3 times if it fails due to a timeout, connection error, or HTTP error.
When your program runs, it’s helpful to know what’s happening — especially when things go wrong. Logging lets you record messages about what your program is doing, which can help you debug problems or understand how your code is working.
Python’s built-in `logging` module makes this easy.
Let’s set up basic logging:
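A minimal sketch; the exact format string is an assumption, chosen to show the timestamp, level, and message:

```python
import logging

logging.basicConfig(
    level=logging.INFO,  # show INFO and above (WARNING, ERROR, CRITICAL)
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
```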
- This sets the logging level to `INFO`, so you’ll see informational messages and warnings.
- The format makes it clear what type of message is being logged.
Now, let’s add logging to our web searcher. For example, you can log warnings when a request fails:
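Here is one way to combine the two, as a sketch that extends the earlier `fetch_page` (repeating the imports and logger so it stands alone); re-raising each exception lets `tenacity` see the failure and decide whether to retry:

```python
import logging
import httpx
from httpx import TimeoutException, RequestError, HTTPStatusError
from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_exception_type

logger = logging.getLogger(__name__)

@retry(
    stop=stop_after_attempt(3),
    wait=wait_fixed(2),
    retry=retry_if_exception_type((TimeoutException, RequestError, HTTPStatusError)),
)
def fetch_page(url: str) -> str:
    """Fetch a web page, logging and retrying on failures."""
    try:
        response = httpx.get(url, timeout=10.0)
        response.raise_for_status()
        logger.info(f"Successfully fetched {url}")
        return response.text
    except TimeoutException:
        logger.warning(f"Timeout while fetching {url}")
        raise  # re-raise so tenacity can retry
    except HTTPStatusError as exc:
        logger.warning(f"HTTP error {exc.response.status_code} for {url}")
        raise
    except RequestError:
        logger.warning(f"Connection error while fetching {url}")
        raise
    except Exception:
        logger.error(f"Unexpected error while fetching {url}")
        raise
```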
- If a timeout occurs, you log a warning with the URL.
- If there’s an HTTP error, you log the status code and URL.
- If there’s a connection error or any other unexpected error, you log those as well.
Example Output:
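With the format string sketched above, a run that hits a slow page and then succeeds on a later attempt might log lines like these (timestamps and URL are purely illustrative):

```
2024-05-01 10:15:03,412 - WARNING - Timeout while fetching https://example.com/slow-page
2024-05-01 10:15:05,431 - WARNING - Timeout while fetching https://example.com/slow-page
2024-05-01 10:15:07,448 - INFO - Successfully fetched https://example.com/slow-page
```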
Logging helps you see what went wrong and where, making it much easier to fix problems.
In this lesson, you learned how to make your web searcher more reliable and safe by:
- Retrying failed requests automatically with the `tenacity` library, and
- Using logging to monitor successes and failures.
These improvements will help your automated research tool handle real-world problems and give you better control over what happens during web searches.
Next, you’ll get a chance to practice these skills with hands-on exercises. This will help you reinforce what you’ve learned and prepare you for building even more advanced features. Keep up the great work!
