Introduction: The Role of Search Queries in DeepResearcher

Welcome back! In the previous lesson, you learned how the DeepResearcher project is organized and how the main program connects different parts of the research tool. Now, we are ready to dive into one of the most important steps in automated research: generating search queries.

When you want to research a topic, you usually start by typing a question or a phrase into a search engine. The quality of your search queries can make a big difference in the results you get. In DeepResearcher, we want to automate this process so that the tool can come up with several smart search queries based on a user’s topic. This helps us gather more complete and relevant information from the web.

In this lesson, you will learn how DeepResearcher uses OpenAI to generate a list of search queries from a user’s input. This is a key step that powers the rest of the research process.

How DeepResearcher Generates Search Queries

Let’s look at how DeepResearcher turns a user’s research topic into a set of search queries using OpenAI.

The main function responsible for this is called generate_initial_search_queries. Here’s how it works, step by step.

1. Collecting the User’s Query

First, we need to get the topic or question the user wants to research. This is usually a string, like "What are the health benefits of green tea?"

  • input() asks the user to type in their research topic.
  • .strip() removes any extra spaces at the beginning or end.
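The collection step can be sketched like this (the prompt wording is an assumption; the real project may phrase it differently):

```python
# Minimal sketch of step 1.
def collect_user_query():
    # input() asks the user to type in their research topic;
    # .strip() removes extra spaces at the beginning or end.
    return input("What would you like to research? ").strip()
```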

2. Preparing Variables for the Language Model

Next, we prepare the user’s query to send to the language model. We put it into a dictionary called variables.

This dictionary will be used to fill in the prompt template for the language model.
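As a sketch, the dictionary might look like this (the key name "query" is an assumption about the placeholder the prompt template expects):

```python
user_query = "What are the health benefits of green tea?"

# Assumption: the prompt template contains a placeholder that this key
# fills in, e.g. "{query}" in the user prompt file.
variables = {"query": user_query}
```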

3. Generating the Search Queries with OpenAI

Now, we use the generate_response function to ask the language model (like GPT-3.5 or GPT-4) to generate search queries for us. We provide it with two prompt files and the variables.

  • "search_generator_system" and "search_generator_user" are the names of the prompt files that you will have to write.
  • variables is the dictionary we just created.
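The call shape can be sketched as follows. Note that generate_response is part of DeepResearcher: its real implementation loads the two prompt files, fills in the variables, and calls the OpenAI API. It is stubbed here only so the example is self-contained.

```python
# Hedged sketch: generate_response is stubbed with a canned string
# shaped like the model output described below.
def generate_response(system_prompt_name, user_prompt_name, variables):
    return '["health benefits of green tea", "green tea antioxidants"]'

variables = {"query": "What are the health benefits of green tea?"}
search_queries_str = generate_response(
    "search_generator_system", "search_generator_user", variables
)
```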

The language model will read the prompts and the user’s query, then return a string that should look like a Python list of search queries.

Understanding and Validating Model Output

The language model is supposed to return a string that looks like a Python list of strings, like this:
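For example (the queries shown are illustrative):

```python
# Well-formed model output: a string containing a Python list of queries.
search_queries_str = (
    '["health benefits of green tea", '
    '"green tea antioxidants research", '
    '"green tea and heart health"]'
)
```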

But sometimes the output is not exactly what we expect, so we need to check for and handle that case.

1. Evaluating the Output

We use the eval() function to turn the string into a real Python list.

  • eval(search_queries_str) tries to convert the string to a Python object.
  • We check if the result is a list. If not, we raise an error.
  • If anything goes wrong, we print an error message and return an empty list.

This makes sure that our program only continues if we get a valid list of search queries.
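The validation step above can be sketched like this (parse_search_queries is an illustrative name, not necessarily the one used in the project):

```python
def parse_search_queries(search_queries_str):
    try:
        # eval() converts the string to a Python object.
        search_queries = eval(search_queries_str)
        # If the result is not a list, raise an error.
        if not isinstance(search_queries, list):
            raise ValueError("model output is not a list")
        return search_queries
    except Exception as error:
        # If anything goes wrong, report it and return an empty list.
        print(f"Error parsing search queries: {error}")
        return []
```

One design note: eval() will execute arbitrary Python code, so for untrusted model output, ast.literal_eval from the standard library is a safer alternative that only parses literals like lists and strings.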

2. Why This Validation Is Important

Language models can sometimes return results in the wrong format, especially if the prompt is not clear or if there is a mistake. By checking the output, we make sure our program doesn’t crash or use bad data.

Summary And What’s Next

In this lesson, you learned how DeepResearcher uses OpenAI to generate a set of search queries from a user’s research topic. You saw how we:

  • Collect the user’s input
  • Prepare variables for the language model
  • Use prompt files to guide the model
  • Validate the output to make sure it’s a proper list of search queries

These steps are essential for making sure our research tool starts with strong, relevant search queries. In the next set of practice exercises, you’ll get hands-on experience with generating and handling search queries yourself. This will help you see how good queries can lead to better research results.
