Introduction: Laying the Foundation for DeepResearcher

Welcome to the first lesson of the "Creating a Researcher in Python with OpenAI" course! In this course, you will learn how to build DeepResearcher, an AI-powered research tool that can search the web, gather information, and generate a final report — all using Python.

Before we dive into the details of web searching and AI, it’s important to set up a solid foundation. A clear project structure will help you keep your code organized, make it easier to add new features, and help you debug problems as you go. In this lesson, we’ll walk through the basic structure of the DeepResearcher project and explain how the main program is set up.

By the end of this lesson, you’ll understand how the main parts of the project fit together and be ready to start building out each piece in future lessons.

Recall: Project Flowchart

Let’s start with a quick reminder of how DeepResearcher works:

flowchart TD
    B1["LLM: Generate Search Queries"]
    C["Web: Get top Search Results"]
    D["Web: Download HTML"]
    E["Web: Convert to Markdown"]
    F{"Is it Relevant?"}
    G["LLM: Extract Relevant Context"]
    K["Delete entry"]
    H{"More Research Needed?"}
    I["LLM: Generate Final Report"]
    A["User Input: Research Query"] --> B1
    B1 --> C
    C --> D
    D --> E
    E --> F
    F -- Yes --> G
    F -- No --> K
    G --> H
    H -- Yes --> B1
    H -- No --> I
    I --> J["Output: Research Report"]

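This flowchart shows the process step by step and how each component fits into the overall workflow. The LLM (large language model) steps generate search queries, extract relevant context, and write the final report, while the Web steps fetch search results, download pages, and convert them to Markdown. Together they automate the research process.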
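The control flow in the flowchart can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name run_research, every parameter name, and the max_rounds limit are hypothetical, and each step is passed in as a plain function so the sketch runs without a real LLM or web access.

```python
def run_research(user_query, generate_queries, search, download_html,
                 to_markdown, is_relevant, extract_context,
                 needs_more_research, write_report, max_rounds=3):
    """Follow the flowchart: query -> search -> fetch -> filter -> report."""
    contexts = []
    for _ in range(max_rounds):
        # LLM: Generate Search Queries
        for search_query in generate_queries(user_query, contexts):
            # Web: Get top Search Results
            for url in search(search_query):
                # Web: Download HTML, then Convert to Markdown
                markdown = to_markdown(download_html(url))
                # Is it Relevant? "No" means the entry is dropped.
                if not is_relevant(user_query, markdown):
                    continue
                # LLM: Extract Relevant Context
                contexts.append(extract_context(user_query, markdown))
        # More Research Needed? "No" leaves the loop.
        if not needs_more_research(user_query, contexts):
            break
    # LLM: Generate Final Report
    return write_report(user_query, contexts)


# Toy stand-ins (hypothetical data) so the sketch can be run as-is.
pages = {
    "https://example.com/a": "<p>solar panels convert light</p>",
    "https://example.com/b": "<p>cooking recipes</p>",
}

report = run_research(
    "solar",
    generate_queries=lambda query, contexts: [query],
    search=lambda search_query: list(pages),
    download_html=lambda url: pages[url],
    to_markdown=lambda html: html.replace("<p>", "").replace("</p>", ""),
    is_relevant=lambda query, markdown: query in markdown,
    extract_context=lambda query, markdown: markdown,
    needs_more_research=lambda query, contexts: False,
    write_report=lambda query, contexts: f"Report on {query}: " + "; ".join(contexts),
)
print(report)  # Report on solar: solar panels convert light
```

In the real project, the LLM and web steps would call a model and fetch live pages; here they are replaced by small lambdas so that only the loop structure is on display.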

Understanding the Main Program

Now, let’s look at the main file of our project: main.py. This file is the entry point for DeepResearcher. It’s where the program starts running.

Let’s break down the key parts of this file step by step.

1. Importing Functions

At the top of the file, we import the clear_visited_pages function from another part of our project. This makes it available in our main program, where it resets the record of visited web pages before a new research run.
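As a minimal sketch of what a helper like clear_visited_pages might do, assume (hypothetically) that the project tracks fetched URLs in a module-level set named visited_pages. The set, its name, and the function body are illustrative assumptions; the project's real implementation may differ.

```python
# Hypothetical module-level store of URLs that were already fetched.
visited_pages = set()


def clear_visited_pages():
    """Forget every page recorded during a previous run."""
    visited_pages.clear()


# Record one page, then reset the store.
visited_pages.add("https://example.com/article")
clear_visited_pages()
print(len(visited_pages))  # 0
```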

2. Defining Function Stubs

Next, we see several function definitions. Right now, these functions are just “stubs” — they don’t do anything yet, but they show what the main steps of our program will be.

  • generate_initial_search_queries: This will take the user’s research topic and create a list of search queries.
  • perform_iterative_research: This will handle the main research loop, searching the web and collecting information.
  • generate_final_report: This will take all the information we’ve gathered and create a final report.

The pass statement is a placeholder. It means “do nothing for now.” We’ll fill in these functions in later lessons.
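A sketch of what such stubs could look like is shown below. The function names come from the list above, but the parameter names are assumptions made for illustration and may not match the course's exact signatures.

```python
# Parameter names are illustrative assumptions, not the course's exact signatures.

def generate_initial_search_queries(topic):
    """Turn the user's research topic into a list of search queries."""
    pass  # placeholder: does nothing, so the function returns None


def perform_iterative_research(topic, queries, all_queries, iterations):
    """Run the main research loop: search the web and collect information."""
    pass


def generate_final_report(topic, contexts):
    """Combine everything that was gathered into a final report."""
    pass
```

Because a function body cannot be empty in Python, pass keeps each definition valid. Calling any of these stubs simply returns None until the body is filled in.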

3. The Main Function

The main logic of the program is inside the research_main() function. Let’s break down what it does:

  • The program asks the user for a research topic and how many times to repeat the research process.
  • It clears any previously visited web pages.
  • It generates the first set of search queries.
  • If there are no search queries, it stops.
  • It copies the search queries to keep track of all queries used.
  • It performs the main research loop.
  • Finally, it generates a report.
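
The steps above can be sketched roughly as follows. This is an approximation rather than the course's exact code: the prompt texts, parameter names, and return values are assumptions, and clear_visited_pages is replaced by a local stand-in so the example runs on its own.

```python
# Stand-ins so the sketch is self-contained. In the lesson's main.py,
# clear_visited_pages is imported from another project module, and the
# three research functions are stubs that later lessons fill in.
def clear_visited_pages():
    pass


def generate_initial_search_queries(topic):
    pass  # returns None until implemented


def perform_iterative_research(topic, queries, all_queries, iterations):
    pass


def generate_final_report(topic, contexts):
    pass


def research_main():
    # Ask for the research topic and how many times to repeat the process.
    topic = input("Research topic: ").strip()
    iterations = int(input("Number of research iterations: "))

    # Clear any previously visited web pages.
    clear_visited_pages()

    # Generate the first set of search queries; stop if there are none.
    queries = generate_initial_search_queries(topic)
    if not queries:
        print("No search queries were generated. Exiting.")
        return None

    # Copy the list so the full query history stays separate from `queries`.
    all_queries = list(queries)

    # Run the main research loop, then build the report from what it found.
    contexts = perform_iterative_research(topic, queries, all_queries, iterations)
    report = generate_final_report(topic, contexts)
    print(report)
    return report
```

With the stubs still empty, generate_initial_search_queries returns None, so a call to research_main() in this sketch stops at the "no search queries" check. Once the stubs are implemented, the same function carries on through the research loop and the report.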

4. Running the Program

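At the bottom of the file, we see the standard if __name__ == "__main__": check, followed by a call to research_main().

This means: “If this file is run directly, start the program by calling research_main().” If the file is imported from another module instead, nothing runs automatically.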
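
A minimal sketch of this pattern is shown below. The body of research_main here is a stand-in for illustration only, not the lesson's actual function.

```python
def research_main():
    # Stand-in body; the real function runs the full research flow.
    print("Starting DeepResearcher...")


# __name__ equals "__main__" only when this file is executed directly
# (for example with `python main.py`), not when another module imports it.
if __name__ == "__main__":
    research_main()
```

The guard lets other files import functions from main.py, for testing for example, without starting the interactive program as a side effect.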

Summary and What’s Next

In this lesson, you learned how to set up the basic structure for the DeepResearcher project. You saw how the main program is organized, what each function is responsible for, and how the program flows from user input to generating a final report.

This structure will make it much easier to build and test each part of the project as we move forward. In the next practice exercises, you’ll get hands-on experience working with this structure and preparing your own project files. After that, we’ll start filling in each function to bring DeepResearcher to life, step by step.
