In the previous lesson, you built a FastAPI server that exposes AI agents through REST endpoints, storing agent states in a Python dictionary. This worked well for learning the basics, but had a critical flaw: when your server restarts, all agent states disappear. For Factor 6 — Launch / Pause / Resume with simple APIs to work reliably, an agent that was paused yesterday must still be resumable today. Database persistence solves this by storing states on disk. In this lesson, you'll replace in-memory storage with SQLite using SQLAlchemy and implement progress callbacks that save state after each agent step, enabling real-time progress tracking for clients. This persistent foundation is what makes the pause and resume endpoints you'll build in the next lesson truly practical.
To begin working with a database, you need to create a connection engine and configure where the database file will be stored. SQLAlchemy provides this through the create_engine function, which takes a connection string specifying the database type and location:
The db_path variable uses Path to construct an absolute path to a file named agent_states.db inside a data directory. The mkdir(parents=True, exist_ok=True) call ensures the directory exists before SQLite tries to create the database file. The connection string sqlite:///{db_path} tells SQLAlchemy to use SQLite with the specified file path, and setting echo=False prevents SQLAlchemy from logging all SQL statements to the console. The declarative_base() function creates a base class that all your database models will inherit from, while sessionmaker creates a factory for database sessions that represent transaction boundaries. With this foundation in place, you can now define how agent states will be stored.
With the engine and session factory ready, you can define a database model that represents how agent states will be stored. SQLAlchemy models are Python classes that inherit from the Base class you created earlier:
The __tablename__ attribute tells SQLAlchemy to create a database table called states, and each class attribute corresponds to a column in that table. The id column uses String type and is marked as the primary_key, meaning each record must have a unique identifier. The steps column stores an integer with a default value of 0, while status tracks the agent's current state. The context and pending_tool_calls columns use the JSON column type, which stores complex Python data structures directly in the database. The and columns use type and are marked as because they only have values in specific situations. After defining your model, you need to create the actual database table.
Once you have defined your SQLAlchemy models, you can automatically create the corresponding database tables with a single line of code:
This line examines all models that inherit from Base and creates corresponding tables in the database. If the tables already exist, SQLAlchemy skips them, making this safe to run every time your server starts. Now you have a complete database schema ready to store agent states, but you need a way to convert between your Pydantic models and these database models.
Your API uses Pydantic's State model for validation and serialization, while SQLAlchemy's StateModel handles database operations. You need a helper function to convert a Pydantic model to a database model:
This function takes a State object and creates a new StateModel instance by copying each field. Because both models have the same field names, this is straightforward—you simply pass each attribute from the Pydantic model to the SQLAlchemy model's constructor. The StateModel instance returned by this function can be added to a database session and saved, allowing you to persist Pydantic states to disk.
The reverse conversion takes a database model and creates a Pydantic model, which you'll need when reading states from the database to return them through your API:
This function performs the opposite transformation, taking attributes from the StateModel and constructing a State object. The or [] expressions handle cases where the database might return None for the context or pending_tool_calls fields—Pydantic expects these to be lists, so the conversion provides empty lists as defaults. These two conversion functions form a bridge between your persistence layer and your API layer, but you still need a safe way to manage database transactions.
Database sessions need careful management to ensure data integrity. If an error occurs during a transaction, you must roll back any changes to prevent partial updates from corrupting your database:
The @contextmanager decorator turns this function into a context manager that can be used with Python's with statement. When you enter the with block, the function creates a new session and yields it to your code. If your code completes without raising an exception, the try block calls session.commit() to save all changes. If an exception occurs, the except block calls session.rollback() to undo any changes, then re-raises the exception. The finally block ensures the session is always closed, releasing database connections and preventing resource leaks. This pattern guarantees you never forget to commit or close a session, making your database code more reliable.
As a reminder from earlier lessons, the agent's run method processes states by making LLM calls and executing tools until the workflow completes. To enable incremental persistence, you need to modify run to accept an optional callback function that gets called after each step:
The progress_callback parameter defaults to None, making it optional. After each call to self._next_step(state), the code checks whether a callback was provided and calls it with the updated state. The callback is also called in the exception handler, ensuring that failed states get persisted. By accepting callbacks rather than hardcoding database operations, the run method remains flexible and can work with different persistence strategies. Notice the is_resuming logic: when the agent already has steps completed (meaning it was paused or interrupted), max_steps_allowed is calculated relative to the current step count, so a resumed agent gets a full budget of additional steps — a detail that will matter when you implement pause and resume in the next lesson.
To leverage the callback mechanism you just added to the agent, you need to create a function that saves state to the database after each step:
The _create_progress_callback function returns a closure that captures the state_id. Each time the agent completes a step, this callback queries the database for the existing record using session.query(StateModel).filter(StateModel.id == state_id).first(), updates all its fields with the new state values, and lets the context manager commit the changes automatically when the with block exits. This incremental saving means clients polling the state endpoint will see progress as it happens, rather than waiting for the entire workflow to complete.
With database models and progress callbacks in place, you can update your server endpoints to persist states. The launch endpoint now saves the initial state to the database instead of storing it in a dictionary:
This code opens a database session using the with get_db_session() context manager, converts the Pydantic State to a SQLAlchemy StateModel using pydantic_to_db(), then adds the new record to the session with session.add(). The context manager automatically commits this change when the with block ends. The endpoint still returns initial_state immediately and schedules background processing, maintaining responsive behavior while ensuring the state survives server restarts.
The background task function needs to load state from the database, run the agent with progress callbacks, and perform a final save when complete:
The function first loads the initial state from the database and converts it to a Pydantic model using db_to_pydantic(). It then creates a progress callback using the helper function you defined earlier and passes it to agent.run(). As the agent works, the callback saves state after each step. After the agent finishes, the function performs one final database update to ensure the terminal status is persisted, with the context manager committing the update automatically when the block exits, guaranteeing that clients always see the complete final state.
The state retrieval endpoint needs to query the database instead of looking up a dictionary key:
This endpoint opens a database session, queries for a StateModel with the matching id, and returns a 404 error if no record exists. If the record is found, it converts the database model to a Pydantic State using db_to_pydantic() and returns it. FastAPI automatically serializes the Pydantic model to JSON for the response, and now your endpoint reads from persistent storage rather than volatile memory.
When you run the test script from the previous lesson against this updated server, you'll see real-time progress as the agent works through each step:
Notice how the polling loop displays progress at each step, showing the current step count and any pending tool calls. Each line represents a fresh query to the database, demonstrating that states are being saved incrementally. If you restart your server in the middle of this workflow, the background execution will stop because this is a simple demo architecture rather than a durable worker system. However, the data remains safe; when you poll again with the same state ID, you'll find the last saved state right where it left off. This persistent foundation is exactly what you need to implement manual resume functionality in the next lesson.
After the agent completes all steps, the test script displays the final state with the full conversation history and answer:
The output shows that the agent successfully completed the task in seven steps, with the context array containing a detailed trace of every function call the agent made while solving the quadratic equation. This complete state is safely stored in your database, ready to be retrieved even if the server restarts, demonstrating the reliability of your new persistence layer.
You've transformed your API from using ephemeral in-memory storage to a durable persistence layer with SQLite and SQLAlchemy, enabling states to survive server restarts and providing real-time progress visibility through incremental saves. While the database state is now durable, remember that our execution model remains a lightweight teaching pattern—if the server stops, the background task stops too. However, this persistence infrastructure is what makes Factor 6 — Launch / Pause / Resume with simple APIs truly viable — without durable state, pausing and resuming would be meaningless. In the next lesson, you'll build on this foundation to implement the pause and resume endpoints that complete Factor 6.
