Introduction & Context

Welcome back! In the previous lesson, you learned how to structure your agent's output using Zod schemas and the outputType parameter. This made your agent's responses more predictable and type-safe, allowing you to confidently access specific fields in the agent's output — an essential skill for building robust JavaScript applications.

Now, let's take the next step: making your agent-powered applications more responsive and interactive. In many real-world scenarios — such as chatbots, web apps, or interactive tools — waiting for an agent to complete its entire response before showing anything to the user can make your application feel slow or unresponsive. This is where streaming execution comes in. While the standard async execution waits for the complete response, streaming allows you to process and display the agent's output as it's being generated, creating a smoother and more engaging user experience.

In this lesson, you'll learn the differences between non-streaming and streaming execution modes, and how to implement streaming in your JavaScript applications. By the end, you'll be able to build applications that feel fast and interactive, even when working with complex AI agents.

Understanding Non-Streaming vs Streaming Execution

The OpenAI Agents SDK for JavaScript provides two ways to handle agent responses: non-streaming (default) and streaming execution. Both are asynchronous operations (using async/await), but they differ in how and when you receive the agent's output.

Non-streaming execution is what you've been using so far. When you call await run(agent, input), your code waits for the agent to completely finish generating its response before returning the result. This is simple and works well for many use cases, but it means users have to wait for the entire response before seeing anything.

Streaming execution allows you to receive and process the agent's output as it's being generated, token by token or chunk by chunk. This creates a more interactive experience, similar to how ChatGPT displays text as it "types" out responses. To enable streaming, you simply add { stream: true } to your run call.

Here's a quick comparison:

| Aspect | Non-Streaming | Streaming |
| --- | --- | --- |
| Method | `await run(...)` | `await run(..., { stream: true })` |
| Response timing | All at once when complete | Incrementally as generated |
| Result type | `RunResult` | `StreamedRunResult` |
| User experience | Wait, then see full response | See response appearing live |
| Use cases | APIs, batch processing, simple apps | Chat UIs, live demos, interactive apps |

Think of streaming like watching a video that loads progressively versus downloading the entire file before you can watch it — streaming provides immediate feedback and keeps users engaged throughout the process.

Non-Streaming Execution (Standard Approach)

Let's start by reviewing the non-streaming approach you're already familiar with. This is the default behavior when you run an agent:
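
The example below is a minimal sketch of this pattern; the agent's name, instructions, and prompt are illustrative placeholders, not part of the SDK:

```javascript
import { Agent, run } from '@openai/agents';

// A simple agent; the name and instructions are just examples.
const agent = new Agent({
  name: 'Storyteller',
  instructions: 'You write short, friendly stories.',
});

async function main() {
  // run() waits until the agent has finished generating the entire response.
  const result = await run(agent, 'Tell me a story about a curious robot.');

  // Only now does any output become visible to the user.
  console.log(result.finalOutput);
}

main();
```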

In this example, when you run the agent, you'll wait several seconds before the complete response appears all at once. The user has no indication that the agent is working during this time, which can make the application feel unresponsive.

Streaming Execution for Real-Time Output

Streaming execution is designed for situations where you want to process the agent’s output as soon as it is available, rather than waiting for the entire response. This is especially useful for real-time applications, such as chatbots or live dashboards, where you want to display information to the user as quickly as possible.

With the OpenAI Agents SDK, you can use the run function with the { stream: true } option to start the agent and then process the streaming events as they arrive. Each event can represent a new chunk of text or another type of update from the agent.

Here’s an example that prints the agent’s response as it is generated:
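
A sketch of what this might look like — the prompt is a placeholder, and the `compatibleWithNodeStreams` option (which makes `toTextStream()` return a Node.js stream rather than a web `ReadableStream`) is an assumption about the environment:

```javascript
import { Agent, run } from '@openai/agents';

const agent = new Agent({
  name: 'Storyteller',
  instructions: 'You write short, friendly stories.',
});

async function main() {
  // Start the agent in streaming mode.
  const streamResult = await run(agent, 'Tell me a story about a curious robot.', {
    stream: true,
  });

  // Pipe the generated text to stdout as chunks arrive.
  streamResult
    .toTextStream({ compatibleWithNodeStreams: true })
    .pipe(process.stdout);

  // Wait until the agent run has fully finished.
  await streamResult.completed;

  // The complete response is still available afterward.
  console.log('\nFinal output:', streamResult.finalOutput);
}

main();
```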

Here’s what’s happening in this example:

  1. Start the agent in streaming mode by passing { stream: true } to the run function. This lets you receive the agent’s response bit by bit, instead of waiting for the whole answer.
  2. Stream the output to the console using toTextStream(), which returns a stream of the generated text that you can pipe to process.stdout as it arrives.
  3. Wait for the stream to complete with await streamResult.completed.
  4. Access the final output as before, using streamResult.finalOutput.

Using the Runner Class for Streaming

While the run() function provides a convenient way to execute agents with streaming, you can also use the Runner class directly for more control over the execution environment.

The Runner class supports streaming in the same way as the top-level run() function. Here's how to use it:
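
A sketch of this approach — the Runner is constructed with default options here, and the agent and prompt are illustrative:

```javascript
import { Agent, Runner } from '@openai/agents';

const agent = new Agent({
  name: 'Storyteller',
  instructions: 'You write short, friendly stories.',
});

async function main() {
  // A Runner lets you configure the execution environment once and reuse it.
  const runner = new Runner();

  // Streaming works the same way as with the top-level run() function.
  const streamResult = await runner.run(
    agent,
    'Tell me a story about a curious robot.',
    { stream: true },
  );

  streamResult.toTextStream({ compatibleWithNodeStreams: true }).pipe(process.stdout);
  await streamResult.completed;
}

main();
```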

The streaming behavior is identical whether you use the top-level run() function or a Runner instance's run() method; both return the same StreamedRunResult object with the same streaming capabilities and event handling.

Processing Streaming Events Manually

If you want more control over the streaming events (for example, to update a UI or log each chunk), you can manually process the events using an async iterator:
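
The sketch below filters the event stream down to text deltas and writes each fragment as it arrives; the agent and prompt are placeholders:

```javascript
import { Agent, run } from '@openai/agents';

const agent = new Agent({
  name: 'Storyteller',
  instructions: 'You write short, friendly stories.',
});

async function main() {
  const streamResult = await run(agent, 'Tell me a story about a curious robot.', {
    stream: true,
  });

  // Iterate over streaming events as the agent processes the request.
  for await (const event of streamResult) {
    // Raw model events carry the low-level streaming data from the model.
    if (
      event.type === 'raw_model_stream_event' &&
      event.data.type === 'output_text_delta'
    ) {
      // delta holds the newly generated text fragment; handle it immediately.
      process.stdout.write(event.data.delta);
    }
  }

  await streamResult.completed;
}

main();
```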

When you iterate over streamResult, here's what happens:

  • You receive various event types as the agent processes your request, not just text output
  • The raw_model_stream_event type contains the actual streaming data from the model
  • Within these events, output_text_delta represents a chunk of text output
  • The delta field contains the new text fragment that was just generated
  • Each iteration processes one chunk, allowing you to handle it immediately (update UI, log, transform, etc.)

This manual approach gives you complete control over how streaming data is processed and displayed, making it ideal for building sophisticated user interfaces or applications with special streaming requirements.

When to Use Streaming vs Non-Streaming

Choose the right approach based on your use case:

Use Non-Streaming When:

  • Building APIs that need complete responses before returning
  • Processing batches of requests
  • The response is short and quick
  • You need to validate or transform the complete response before showing it
  • Working with systems that expect complete data

Use Streaming When:

  • Building interactive chat interfaces
  • Creating live demos or presentations
  • Working with long-form content generation
  • Providing real-time feedback to users
  • Building applications where perceived performance matters

Summary & Next Steps

In this lesson, you learned how streaming execution can make your agent-powered applications feel more responsive and interactive. You saw the difference between non-streaming and streaming modes, learned how to implement streaming with the StreamedRunResult object, and explored best practices for building real-time applications.

Key takeaways:

  • Non-streaming execution (await run(agent, input)) waits for the complete response
  • Streaming execution (await run(agent, input, { stream: true })) provides output as it's generated
  • Use toTextStream() for simple text streaming or iterate over events for more control
  • Always wait for streamResult.completed before considering the interaction done
  • Choose streaming for interactive applications and non-streaming for batch processing or APIs

In the next part of the course, you'll practice implementing streaming in various scenarios, handle streaming with tool calls and handoffs, and build a real-time application that showcases the power of streaming agents. Get ready to create AI experiences that feel truly interactive!
