Launching Agents with RESTful APIs

Introduction: Why Expose Agents via APIs?

Think about how you use your favorite apps: checking email on your phone, responding from your laptop, or asking your smart speaker to read messages. Each device talks to the same service through an API. This is the power of decoupling — separating core logic from any single interface so it can be accessed from anywhere.

In this course, we'll put three of those factors into practice by building a real API layer around an AI agent. This lesson introduces Factor 11 (Trigger from anywhere, meet users where they are) and begins Factor 6 (Launch / Pause / Resume with simple APIs) by implementing the launch part.

We'll build a FastAPI server that exposes an AI agent through REST endpoints, making it accessible from web browsers, mobile apps, scripts, or any tool that can make HTTP requests.

Project Structure

As we transition from building standalone agent logic to exposing it via a web service, our project structure reflects a clear separation between the API layer and the core logic. You are already familiar with the core directory, which houses the agent's decision-making logic, tools, and models. We now introduce a server directory to handle web requests and a top-level test.py script to simulate a client:

src/core/: Contains the Agent class and State models you've used in previous lessons.
src/server/main.py: This is where you will write your FastAPI code. It imports the agent from core and exposes it through REST endpoints.
test.py: A standalone script that uses the requests library to "talk" to your server, allowing you to test the full lifecycle of an agent task.

This separation is central to Factor 11: your agent logic remains independent of how it is triggered — whether it's via a CLI, a web API, a Slack bot, or a cron job. The server directory is just one thin adapter around a stable core.

Setting Up FastAPI

FastAPI is a modern Python web framework designed for building APIs quickly and efficiently. It uses Python type hints to validate data automatically and generates interactive documentation for your endpoints. To get started, you create an instance of the FastAPI application:

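This single line creates your API application. The app object will be used to define routes using Python decorators. If you were installing FastAPI locally, you would run pip install fastapi uvicorn in your terminal. To start the server locally, you would run a command such as uvicorn src.server.main:app --reload from the project root (the module path here assumes the src/server/main.py layout described above).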

This tells uvicorn to load the app object from your script and serve it, usually on http://localhost:8000. The --reload flag automatically restarts the server whenever you change your code. On CodeSignal, this environment is already set up for you.

Defining the Launch Request Model

When your API receives data from a client, you must ensure that the data is structured correctly. FastAPI uses Pydantic models to validate incoming requests using Python's type hints:
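
A request model along these lines would match the description that follows. The class name LaunchRequest and the input_prompt field come from the lesson; treat the rest as a sketch.

```python
from pydantic import BaseModel, ValidationError


class LaunchRequest(BaseModel):
    """Body expected by the launch endpoint."""

    # The task or question the agent should work on.
    input_prompt: str
```

Constructing LaunchRequest without input_prompt raises a pydantic ValidationError, which is the same check FastAPI applies to incoming request bodies.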

The LaunchRequest model defines a single field called input_prompt, which must be a string. When a client sends a POST request to launch an agent, FastAPI automatically validates that the request body contains this field and that it is a string. If validation fails, FastAPI responds with a 422 error whose body describes which field was rejected and why.

Understanding the State Model

The agent's progress is tracked through a State model that comes from your core agent logic. This model includes fields like id, status, steps, context, and final_answer:
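
The real State model lives in src/core/ and is not reproduced in this lesson text, so the following is only a plausible sketch. The field names come from the lesson; the types, defaults, and the "running" status value are assumptions.

```python
from typing import Any, Dict, List, Optional
from uuid import uuid4

from pydantic import BaseModel, Field


class State(BaseModel):
    """Assumed shape of the agent state; the real model is defined in core."""

    # Unique identifier the client uses to look the state up later.
    id: str = Field(default_factory=lambda: str(uuid4()))
    # Lifecycle marker; "running" is an assumed value, not taken from the lesson.
    status: str = "running"
    # Number of reasoning or tool steps taken so far.
    steps: int = 0
    # Trace of messages and tool calls accumulated during the run.
    context: List[Dict[str, Any]] = Field(default_factory=list)
    # Filled in once the agent finishes.
    final_answer: Optional[str] = None
```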

The State model serves as both the response type for your endpoints and the internal representation of where the agent is in its workflow. By declaring response_model=State in your endpoint decorators, FastAPI knows to serialize the state object into JSON format when sending responses to clients.

Managing State with In-Memory Storage

When an agent starts working on a task, it might take several seconds or even minutes to complete. During this time, you need to store information about the agent's progress so clients can check back later:
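
A sketch of that storage is shown here. The name states matches the dictionary referred to later in the lesson; the State class is a reduced stand-in so the snippet runs on its own, and in the project it would be imported from core instead.

```python
from typing import Dict, Optional

from pydantic import BaseModel


class State(BaseModel):
    """Reduced stand-in for the State model in src/core/ (assumed fields)."""

    id: str
    status: str = "running"
    final_answer: Optional[str] = None


# Maps each state id to the most recent State for that agent run.
states: Dict[str, State] = {}
```

Storing and reading a state is then ordinary dictionary access, for example states[state.id] = state and states.get(state_id).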

This creates an empty dictionary where keys are strings (the state ids) and values are State objects. In-memory storage is simple and works well for learning, but everything in it is lost when the server restarts. In the next lesson, you will replace it with a database that persists data across restarts.

Creating the Agent Instance

Before building endpoints, you need an Agent instance that will handle the actual work:
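
In the project this is likely a one-line instantiation of the class imported from core. The sketch below uses a hypothetical stand-in Agent so that it runs on its own; the constructor arguments and the run signature of the real class are not shown in this lesson and are assumptions.

```python
# In the lesson project the class would be imported from the core package,
# for example: from src.core.agent import Agent  (module path is an assumption).


class Agent:
    """Hypothetical stand-in for the core Agent class."""

    def run(self, state):
        # The real implementation would make LLM calls and execute tools here.
        return state


# One module-level instance, created at import time and reused by every request.
agent = Agent()
```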

This creates an agent with all the tools and capabilities defined in your core logic. The agent's run method will process states, make LLM calls, and execute tool functions as needed.

Building the Launch Endpoint Structure

The launch endpoint is where clients start a new agent workflow. This is the first piece of Factor 6 — giving external systems a simple API to launch agent execution:

The @app.post("/agent/launch") decorator maps this function to POST requests on /agent/launch. The function creates an initial State, stores it, and uses BackgroundTasks to schedule the execution, which FastAPI runs after the response has been sent. This allows the API to return the id to the client immediately while the heavy work happens in the background.

Implementing Background Task Execution

Agent workflows can take a long time to complete, so running them directly in the endpoint would block the HTTP request. Instead, we can use a helper function:
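
A helper of this kind might look like the sketch below. The name run_agent_task, the stand-in Agent and State classes, and the status values are hypothetical. The error handling is also an addition that the lesson does not describe; it is included so that a failed run does not stay in "running" forever.

```python
from typing import Dict, Optional

from pydantic import BaseModel


class State(BaseModel):
    """Stand-in for the core State model (assumed fields)."""

    id: str
    status: str = "running"
    steps: int = 0
    final_answer: Optional[str] = None


class Agent:
    """Hypothetical stand-in; the real agent would call the LLM and tools."""

    def run(self, state: State) -> State:
        state.steps += 1
        state.status = "completed"
        state.final_answer = "stub answer"
        return state


states: Dict[str, State] = {}
agent = Agent()


def run_agent_task(state_id: str) -> None:
    """Run the agent for one stored state and save the result."""
    state = states[state_id]
    try:
        # Store whatever the agent returns so later GET requests see it.
        states[state_id] = agent.run(state)
    except Exception as exc:  # assumption: not described in the lesson
        state.status = "failed"
        state.final_answer = f"Agent error: {exc}"
        states[state_id] = state
```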

This function retrieves the state, runs the agent via agent.run(), and updates the states dictionary with the results (including the updated steps count and final_answer).

Creating the State Retrieval Endpoint

Once a client has launched an agent and received a state_id, they need a way to check the agent's progress:

The @app.get() decorator maps this function to GET requests with a path parameter for the state_id. Together with the launch endpoint, this forms the foundation of Factor 6: clients can launch a workflow and then poll its status through simple API calls.

Setting Up the Test Script

To see your API in action, you need a client. The test script starts by launching an agent:
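
The launch step of such a client might be sketched like this. The use of requests, the /agent/launch route, and the input_prompt field come from the lesson; the base URL, the function name, and the timeout are assumptions.

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed address of the local uvicorn server


def launch_agent(prompt: str, base_url: str = BASE_URL) -> dict:
    """POST the prompt to the launch endpoint and return the initial state."""
    response = requests.post(
        f"{base_url}/agent/launch",
        json={"input_prompt": prompt},
        timeout=10,
    )
    # Surface 4xx and 5xx responses as exceptions instead of failing silently.
    response.raise_for_status()
    return response.json()
```

With the server running, calling launch_agent with a prompt returns the initial state as a dictionary, and its id field is what the client uses for later status checks.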

This test script is a demonstration of Factor 11 — it's a separate process accessing the agent through its REST API.

Implementing the Polling Loop

After launching the agent, the test script repeatedly checks the agent's status until it reaches a terminal state:
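
A polling loop in this spirit is sketched below. The state route, the "running" status value, and the one-second interval are assumptions. The fetch parameter is an illustrative addition that lets the loop be exercised without a live server.

```python
import time
from typing import Callable

import requests

BASE_URL = "http://localhost:8000"  # assumed address of the local uvicorn server


def fetch_state(state_id: str) -> dict:
    """GET the current state from the assumed /agent/state/{state_id} route."""
    response = requests.get(f"{BASE_URL}/agent/state/{state_id}", timeout=10)
    response.raise_for_status()
    return response.json()


def poll_until_done(
    state_id: str,
    fetch: Callable[[str], dict] = fetch_state,
    interval: float = 1.0,
) -> dict:
    """Poll until the status is no longer "running", then return that state."""
    while True:
        state = fetch(state_id)
        print(f"status={state['status']} steps={state.get('steps')}")
        if state["status"] != "running":  # "running" is an assumed status value
            return state
        # Wait between requests so the client does not flood the server.
        time.sleep(interval)
```

In a real script it is usually worth adding an upper bound on the number of polls so the client cannot loop forever if the server never reports a terminal status.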

The while True loop continues until the status indicates the agent is no longer running.

Viewing the Complete Output

When you run the test script, you'll see the agent's progress:

The output shows the agent finished in 8 steps. The context array contains the full trace of reasoning and tool calls used to reach the final_answer. This transparency is valuable for debugging and understanding how the agent arrives at its conclusions.

Summary and What's Next

You've implemented the first part of Factor 6 (Launch / Pause / Resume with simple APIs) and addressed Factor 11 (Trigger from anywhere) by decoupling your agent from the interface that triggers it. You learned to create FastAPI endpoints, validate data with Pydantic, use BackgroundTasks, and implement a polling client. In the next lesson, you'll replace the in-memory dictionary with a real database so that agent states survive server restarts.

Next Lesson: Persisting States with Databases and Callbacks
