Welcome back! In the previous lesson, you learned how to structure your agent's output using Zod schemas and the outputType parameter. This made your agent's responses more predictable and type-safe, allowing you to confidently access specific fields in the agent's output — an essential skill for building robust JavaScript applications.
Now, let's take the next step: making your agent-powered applications more responsive and interactive. In many real-world scenarios — such as chatbots, web apps, or interactive tools — waiting for an agent to complete its entire response before showing anything to the user can make your application feel slow or unresponsive. This is where streaming execution comes in. While the standard async execution waits for the complete response, streaming allows you to process and display the agent's output as it's being generated, creating a smoother and more engaging user experience.
In this lesson, you'll learn the differences between non-streaming and streaming execution modes, and how to implement streaming in your JavaScript applications. By the end, you'll be able to build applications that feel fast and interactive, even when working with complex AI agents.
The OpenAI Agents SDK for JavaScript provides two ways to handle agent responses: non-streaming (default) and streaming execution. Both are asynchronous operations (using async/await), but they differ in how and when you receive the agent's output.
Non-streaming execution is what you've been using so far. When you call await run(agent, input), your code waits for the agent to completely finish generating its response before returning the result. This is simple and works well for many use cases, but it means users have to wait for the entire response before seeing anything.
Streaming execution allows you to receive and process the agent's output as it's being generated, token by token or chunk by chunk. This creates a more interactive experience, similar to how ChatGPT displays text as it "types" out responses. To enable streaming, you simply add { stream: true } to your run call.
Here's a quick comparison:

| | Non-streaming | Streaming |
| --- | --- | --- |
| When output arrives | All at once, after the full response is generated | Incrementally, as each chunk is generated |
| How to enable | Default behavior of `run(agent, input)` | Pass `{ stream: true }` to `run` |
| Best for | Batch processing, APIs, short responses | Chat interfaces, long-form content, live feedback |
Think of streaming like watching a video that loads progressively versus downloading the entire file before you can watch it — streaming provides immediate feedback and keeps users engaged throughout the process.
Let's start by reviewing the non-streaming approach you're already familiar with. This is the default behavior when you run an agent:
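As a refresher, a minimal non-streaming run might look like the sketch below. The agent's name and instructions are placeholders; the example assumes the `@openai/agents` package and an `OPENAI_API_KEY` in your environment.

```javascript
import { Agent, run } from '@openai/agents';

// A simple agent; the name and instructions are just placeholders.
const agent = new Agent({
  name: 'Storyteller',
  instructions: 'You are a concise storyteller.',
});

// run() resolves only after the agent has finished generating its
// entire response, so nothing is shown to the user until it completes.
const result = await run(agent, 'Tell me a short story about a robot.');
console.log(result.finalOutput);
```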
In this example, when you run the agent, you'll wait several seconds before the complete response appears all at once. The user has no indication that the agent is working during this time, which can make the application feel unresponsive.
Streamed execution is designed for situations where you want to process the agent’s output as soon as it is available, rather than waiting for the entire response. This is especially useful for real-time applications, such as chatbots or live dashboards, where you want to display information to the user as quickly as possible.
With the OpenAI Agents SDK, you can use the run function with the { stream: true } option to start the agent and then process the streaming events as they arrive. Each event can represent a new chunk of text or another type of update from the agent.
Here’s an example that prints the agent’s response as it is generated:
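A streaming version might look like the following sketch. The agent definition is again a placeholder; the `compatibleWithNodeStreams` option is what makes `toTextStream()` return a Node.js `Readable` that supports `.pipe()`.

```javascript
import { Agent, run } from '@openai/agents';

const agent = new Agent({
  name: 'Storyteller', // placeholder name and instructions
  instructions: 'You are a concise storyteller.',
});

// Passing { stream: true } returns a StreamedRunResult instead of
// waiting for the complete response.
const streamResult = await run(agent, 'Tell me a short story about a robot.', {
  stream: true,
});

// Pipe the text chunks straight to stdout as they arrive.
streamResult
  .toTextStream({ compatibleWithNodeStreams: true })
  .pipe(process.stdout);

// Wait until the agent has finished streaming...
await streamResult.completed;

// ...then the final output is available as usual.
console.log('\nFinal output:', streamResult.finalOutput);
```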
Here’s what’s happening in this example:
- Start the agent in streaming mode by passing `{ stream: true }` to the `run` function. This lets you receive the agent's response bit by bit, instead of waiting for the whole answer.
- Stream the output to the console using `toTextStream()`, which pipes the streamed text directly to `process.stdout` as it arrives.
- Wait for the stream to complete with `await streamResult.completed`.
- Access the final output as before, using `streamResult.finalOutput`.
While the run() function provides a convenient way to execute agents with streaming, you can also use the Runner class directly for more control over the execution environment.
The Runner class supports streaming in the same way as the top-level run() function. Here's how to use it:
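A sketch of the `Runner`-based approach, under the same assumptions as before (placeholder agent, `@openai/agents` installed):

```javascript
import { Agent, Runner } from '@openai/agents';

const agent = new Agent({
  name: 'Storyteller', // placeholder
  instructions: 'You are a concise storyteller.',
});

// A Runner instance can be configured once and reused across calls.
const runner = new Runner();

// Streaming is enabled the same way as with the top-level run().
const streamResult = await runner.run(agent, 'Tell me a short story.', {
  stream: true,
});

streamResult
  .toTextStream({ compatibleWithNodeStreams: true })
  .pipe(process.stdout);

await streamResult.completed;
```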
The streaming behavior is identical whether you use the top-level `run()` function or a `Runner` instance's `run()` method: both return the same `StreamedRunResult` object with the same streaming capabilities and event handling.
If you want more control over the streaming events (for example, to update a UI or log each chunk), you can manually process the events using an async iterator:
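A sketch of manual event processing, filtering for the text-delta events described below (same placeholder agent as in the earlier examples):

```javascript
import { Agent, run } from '@openai/agents';

const agent = new Agent({
  name: 'Storyteller', // placeholder
  instructions: 'You are a concise storyteller.',
});

const streamResult = await run(agent, 'Tell me a short story.', {
  stream: true,
});

// Iterate over every streaming event as it arrives.
for await (const event of streamResult) {
  // Only raw model events carry the model's incremental text output.
  if (
    event.type === 'raw_model_stream_event' &&
    event.data.type === 'output_text_delta'
  ) {
    // event.data.delta is the newly generated text fragment; here we
    // print it, but you could update a UI or log it instead.
    process.stdout.write(event.data.delta);
  }
}

await streamResult.completed;
```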
When you iterate over streamResult, here's what happens:
- You receive various event types as the agent processes your request, not just text output
- The `raw_model_stream_event` type contains the actual streaming data from the model
- Within these events, `output_text_delta` represents a chunk of text output
- The `delta` field contains the new text fragment that was just generated
- Each iteration processes one chunk, allowing you to handle it immediately (update UI, log, transform, etc.)
This manual approach gives you complete control over how streaming data is processed and displayed, making it ideal for building sophisticated user interfaces or applications with special streaming requirements.
Choose the right approach based on your use case:
Use Non-Streaming When:
- Building APIs that need complete responses before returning
- Processing batches of requests
- The response is short and quick
- You need to validate or transform the complete response before showing it
- Working with systems that expect complete data
Use Streaming When:
- Building interactive chat interfaces
- Creating live demos or presentations
- Working with long-form content generation
- Providing real-time feedback to users
- Building applications where perceived performance matters
In this lesson, you learned how streaming execution can make your agent-powered applications feel more responsive and interactive. You saw the difference between non-streaming and streaming modes, learned how to implement streaming with the StreamedRunResult object, and explored best practices for building real-time applications.
Key takeaways:
- Non-streaming execution (`await run(agent, input)`) waits for the complete response
- Streaming execution (`await run(agent, input, { stream: true })`) provides output as it's generated
- Use `toTextStream()` for simple text streaming, or iterate over events for more control
- Always wait for `streamResult.completed` before considering the interaction done
- Choose streaming for interactive applications and non-streaming for batch processing or APIs
In the next part of the course, you'll practice implementing streaming in various scenarios, handle streaming with tool calls and handoffs, and build a real-time application that showcases the power of streaming agents. Get ready to create AI experiences that feel truly interactive!
