Welcome back! In the last lesson, you learned how to avoid common pitfalls in your web searcher, such as duplicate results and broken links. Now, we will take your skills a step further by making your web searcher more reliable and safe.
When you automate web content retrieval, you will often face problems like network errors, slow responses, or even accidentally processing the same page more than once. If you don’t handle these issues, your research tool might miss important information or waste time and resources.
In this lesson, you will learn how to:
- Automatically retry failed web requests, and
- Use logging to monitor what happens during your web search.
By the end of this lesson, you will have a web searcher that is much more robust and ready for real-world use.
When you fetch web pages, sometimes things go wrong. The website might be slow, your internet connection might drop, or the server might return an error. If you don’t handle these problems, your program could crash or miss important data.
To solve this, you can use the `tenacity` library. This library lets you automatically retry a function if it fails due to certain errors, such as timeouts or connection problems.
Let’s start by importing the necessary modules and setting up a simple retry mechanism.
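Here is a minimal sketch of the setup, assuming you have installed `httpx` and `tenacity` (for example, with `pip install httpx tenacity`):

```python
import httpx
from httpx import TimeoutException, RequestError, HTTPStatusError
from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_exception_type
```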
Here’s what each import does:
- `tenacity` provides decorators and tools for retrying functions.
- `httpx` is the library we use to make HTTP requests.
- The exception types (`TimeoutException`, `RequestError`, `HTTPStatusError`) help us specify which errors should trigger a retry.
Now, let’s create a function that fetches a web page and automatically retries if it fails due to a network error or timeout.
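A sketch of such a function follows; the name `fetch_page`, the 10-second request timeout, and the fixed 2-second wait between attempts are illustrative choices, not requirements:

```python
@retry(
    stop=stop_after_attempt(3),  # give up after 3 total attempts
    wait=wait_fixed(2),          # wait 2 seconds between attempts
    retry=retry_if_exception_type((TimeoutException, RequestError, HTTPStatusError)),
)
def fetch_page(url: str) -> str:
    """Fetch a web page, retrying on timeouts, connection errors, and HTTP errors."""
    response = httpx.get(url, timeout=10.0)
    response.raise_for_status()  # raises HTTPStatusError for 4xx/5xx responses
    return response.text
```

In this sketch, `stop_after_attempt(3)` caps the total number of attempts, `wait_fixed(2)` pauses between them, and `retry_if_exception_type` limits retries to the listed exceptions so that genuine bugs in your code still fail immediately.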
Let’s break down what’s happening here:
- The `@retry` decorator tells Python to retry the function up to 3 times if it fails due to a timeout, connection error, or HTTP error.
When your program runs, it’s helpful to know what’s happening — especially when things go wrong. Logging lets you record messages about what your program is doing, which can help you debug problems or understand how your code is working.
Python’s built-in `logging` module makes this easy.
Let’s set up basic logging:
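A minimal sketch; the exact format string is an assumption, chosen to show the timestamp, level, and message:

```python
import logging

logging.basicConfig(
    level=logging.INFO,  # show INFO and above (WARNING, ERROR, CRITICAL)
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
```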
- This sets the logging level to `INFO`, so you’ll see informational messages and warnings.
- The format makes it clear what type of message is being logged.
Now, let’s add logging to our web searcher. For example, you can log warnings when a request fails:
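Here is one way to combine the two, as a sketch that extends the earlier `fetch_page` (repeating the imports and logger so it stands alone); re-raising each exception lets `tenacity` see the failure and decide whether to retry:

```python
import logging
import httpx
from httpx import TimeoutException, RequestError, HTTPStatusError
from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_exception_type

logger = logging.getLogger(__name__)

@retry(
    stop=stop_after_attempt(3),
    wait=wait_fixed(2),
    retry=retry_if_exception_type((TimeoutException, RequestError, HTTPStatusError)),
)
def fetch_page(url: str) -> str:
    """Fetch a web page, logging and retrying on failures."""
    try:
        response = httpx.get(url, timeout=10.0)
        response.raise_for_status()
        logger.info(f"Successfully fetched {url}")
        return response.text
    except TimeoutException:
        logger.warning(f"Timeout while fetching {url}")
        raise  # re-raise so tenacity can retry
    except HTTPStatusError as exc:
        logger.warning(f"HTTP error {exc.response.status_code} for {url}")
        raise
    except RequestError:
        logger.warning(f"Connection error while fetching {url}")
        raise
    except Exception:
        logger.error(f"Unexpected error while fetching {url}")
        raise
```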
- If a timeout occurs, you log a warning with the URL.
- If there’s an HTTP error, you log the status code and URL.
- If there’s a connection error or any other unexpected error, you log those as well.
Example Output:
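With the format string sketched above, a run that hits a slow page and then succeeds on a later attempt might log lines like these (timestamps and URL are purely illustrative):

```
2024-05-01 10:15:03,412 - WARNING - Timeout while fetching https://example.com/slow-page
2024-05-01 10:15:05,431 - WARNING - Timeout while fetching https://example.com/slow-page
2024-05-01 10:15:07,448 - INFO - Successfully fetched https://example.com/slow-page
```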
Logging helps you see what went wrong and where, making it much easier to fix problems.
In this lesson, you learned how to make your web searcher more reliable and safe by:
- Retrying failed requests automatically with the `tenacity` library, and
- Using logging to monitor successes and failures.
These improvements will help your automated research tool handle real-world problems and give you better control over what happens during web searches.
Next, you’ll get a chance to practice these skills with hands-on exercises. This will help you reinforce what you’ve learned and prepare you for building even more advanced features. Keep up the great work!
