Introduction & Context

Welcome back! In the previous lesson, you learned how to structure your agent’s output using Pydantic models and the output_type parameter. This made your agent’s responses more predictable and easier to use in your applications. You also practiced accessing specific fields from the agent’s output, which is a key skill for building robust systems.

Now, let’s take the next step: making your agent-powered applications more responsive and interactive. In many real-world scenarios — such as chatbots, web apps, or tools that need to handle multiple users at once — waiting for an agent to finish its work before doing anything else can make your application feel slow or unresponsive. This is where asynchronous and streamed execution modes come in. These modes allow your program to keep working while the agent is thinking, or even to show results as soon as they are available, creating a smoother and more engaging user experience.

In this lesson, you will learn how to run agents asynchronously and how to process their outputs in real time using streaming. By the end, you will be able to build applications that feel fast and interactive, even when working with complex AI agents.

Overview Of OpenAI Agents SDK Execution Modes

The OpenAI Agents SDK provides three main ways to run agents: synchronous, asynchronous, and streamed execution. You have already used synchronous execution in earlier lessons, where your code waits for the agent to finish before moving on. This is simple and works well for basic scripts, but it can block your program and make it less responsive.

Asynchronous execution allows your program to start an agent task and then do other things while waiting for the result. This is especially useful in applications with user interfaces, web servers, or any situation where you do not want to block the main thread. Streamed execution takes this a step further by letting you process the agent’s output as it is being generated, token by token or event by event. This is ideal for real-time applications, such as chatbots that display text as it is typed out.

Here is a quick comparison of the three modes:

| Mode         | Method                | Blocking? | Real-Time Output? | Use Case Example           |
| ------------ | --------------------- | --------- | ----------------- | -------------------------- |
| Synchronous  | Runner.run_sync       | Yes       | No                | Simple scripts, batch jobs |
| Asynchronous | Runner.run            | No        | No                | Web servers, UI apps       |
| Streamed     | Runner.run_streamed   | No        | Yes               | Chatbots, live dashboards  |

By using asynchronous and streamed execution, you can make your applications more efficient and user-friendly.

Asynchronous Execution Mode

Asynchronous execution is a way to run tasks in the background, so your program does not have to wait for them to finish before moving on. In Python, this is done using the async and await keywords. When you run an agent asynchronously, you can perform other tasks — such as handling user input or updating a display — while the agent is working.

Let’s look at an example. Suppose you want to get a travel recommendation from your agent, but you do not want your program to freeze while waiting for the answer. Here is how you can do it using asynchronous execution:

In this example, the main function is defined as async, and the agent is run using await Runner.run(...). The program uses Python’s built-in asyncio library to manage asynchronous execution, starting the event loop with asyncio.run(main()). This setup allows your program to remain responsive and perform other tasks while waiting for the agent’s response. Once the result is ready, you can access the structured output just as before.

Asynchronous execution is a great choice when you want your application to stay responsive, especially if you need to handle multiple requests at the same time.

Streamed Execution Mode

Streamed execution is designed for situations where you want to process the agent’s output as soon as it is available, rather than waiting for the entire response. This is especially useful for real-time applications, such as chatbots or live dashboards, where you want to display information to the user as quickly as possible.

With the OpenAI Agents SDK, you can use the Runner.run_streamed method to start the agent and then iterate over the streaming events as they arrive. Each event can represent a new chunk of text or another type of update from the agent.

Here is an example that builds on the previous code, but now uses streaming to print the agent’s response as it is generated:

Here’s how you can print each token as soon as it is generated using run_streamed in the OpenAI Agents SDK:

  1. Start the agent in streaming mode using Runner.run_streamed. This lets you receive the agent’s response bit by bit, instead of waiting for the whole answer.

  2. Loop through the streaming events with async for. Each event represents a new piece of the agent’s response.

  3. Check if the event contains new text by looking for events of type "raw_response_event" and making sure the event data is a ResponseTextDeltaEvent.

  4. Print the new text immediately by accessing event.data.delta. This will show each token or chunk as soon as it is available, just like watching someone type in a chat. Here’s how the print statement works:

    • event.data.delta: Contains the latest chunk or token generated by the agent during streaming.
    • end="": Prevents Python from adding a newline after each chunk, so the streamed text appears as a continuous line.
    • flush=True: Forces Python to immediately write the output to the console, ensuring each new chunk is displayed to the user as soon as it arrives.

When the agent is finished, you can still access the structured final output as before. Streamed execution is perfect for applications where you want to keep users engaged and informed as soon as new information is available.

Real-World Use Cases And Best Practices

Asynchronous and streamed execution modes are especially valuable in real-world applications that require speed and responsiveness. For example, chatbots and virtual assistants benefit from streaming because users can see answers as they are being generated, making the interaction feel more natural. Web applications and APIs often use asynchronous execution to handle many requests at once without slowing down.

When using these modes, keep in mind a few best practices. First, always make sure your code is properly structured to handle asynchronous functions — this usually means using async def and await in the right places. Second, when working with streaming, be careful to process each event as it arrives and update your user interface or logs accordingly. Finally, remember that not all environments support asynchronous code in the same way — for example, Jupyter notebooks already run an event loop, so calling asyncio.run() there raises an error and you should await your coroutine directly instead. Always test your application in the environment where it will run.

Summary & Next Steps

In this lesson, you learned how to make your agent-powered applications more responsive and interactive by using asynchronous and streamed execution modes. You saw how asynchronous execution allows your program to keep working while waiting for the agent’s response, and how streamed execution lets you process and display output in real time. These techniques are essential for building modern, user-friendly applications that feel fast and engaging.

You are now ready to practice these concepts in hands-on exercises. In the next part of the course, you will get the chance to implement asynchronous and streamed agent execution yourself, handle streaming events, and build applications that respond to users in real time. Keep experimenting, and you will soon be able to create powerful, interactive AI experiences!
