Welcome back! So far, you have learned how to explore and prepare data, train a linear regression model, and evaluate its performance on unseen data. In the previous lesson, you saw how to use your trained model to make predictions and assess how well it generalizes to new situations.
Now, it is time to take the next big step: making your model available for others to use. In real-world applications, machine learning models are rarely used only by the person who trained them. Instead, they are often deployed as web services so that other applications, websites, or users can send data and receive predictions. This is where REST APIs come in. In this lesson, you will learn how to deploy your trained model with a REST API using FastAPI. By the end of this lesson, you will know how to build, run, and test an API that serves predictions from your model, making your work accessible and useful in real-world scenarios.
Before we dive into the code, let's briefly discuss what a REST API is and why it is important in machine learning deployment. A REST API (Representational State Transfer Application Programming Interface) is a way for different software systems to communicate over the web using standard HTTP methods like `GET` and `POST`. When you deploy your model as a REST API, you make it possible for other programs to send data to your model and receive predictions in return, all through simple web requests.
FastAPI is a modern Python web framework designed for building APIs quickly and efficiently. It is known for its speed, ease of use, and automatic documentation features. FastAPI also makes it easy to handle errors, which is important for building reliable machine learning services. By using FastAPI, you can create a scalable and production-ready API with minimal code.
The first step in building your API is to load the trained model that you saved in previous lessons. You will use the `joblib` library to load the model from the joblib file:
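A minimal sketch (the filename `model.joblib` is an assumption; use whatever name you saved your model under):

```python
import joblib

# Load the trained linear regression model from disk
# ("model.joblib" is an assumed filename from the earlier save step)
model = joblib.load("model.joblib")
```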
This loads your trained linear regression model into memory so it can be used to make predictions when requests come in.
Next, you need to create a FastAPI application instance. This will serve as the foundation for your API:
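A minimal sketch (the `title` and `version` values shown here are illustrative):

```python
from fastapi import FastAPI

# Create the application instance; title and version are illustrative
app = FastAPI(title="House Price Prediction API", version="1.0")
```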
The `title` and `version` parameters are optional but help document your API. FastAPI will automatically generate interactive documentation based on this information.
To handle incoming data, you can work directly with the raw JSON sent in the request. FastAPI provides access to the request body through the `Request` object. Here’s how you can define the prediction endpoint:
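A sketch of the endpoint, assuming the `app` and `model` objects created above, with an HTTP 400 response on bad input:

```python
import pandas as pd
from fastapi import Request, HTTPException

@app.post("/predict")
async def predict(request: Request):
    try:
        # Parse the raw JSON body into a Python dictionary
        features = await request.json()
        # Wrap the dict in a list to build a one-row DataFrame
        input_df = pd.DataFrame([features])
        # Run the model and unwrap the single prediction
        prediction = model.predict(input_df)[0]
        return {"prediction": float(prediction), "status": "success"}
    except Exception as e:
        # Missing fields or invalid data produce an HTTP 400 error
        raise HTTPException(status_code=400, detail=str(e))
```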
Let’s break down what’s happening:
- The endpoint is set up at `/predict`, which means you (or anyone else) can send data to this specific URL to get a prediction. It accepts POST requests, which are used when you want to send data to the server for processing.
- The function receives the request object, and `await request.json()` parses the incoming JSON body into a Python dictionary.
- The features dictionary is wrapped in a list and converted into a pandas DataFrame, which is the format expected by the trained model.
- The model makes a prediction, and the result is returned as a JSON response with a `prediction` value and a `status` string.
- If any error occurs (such as missing fields or invalid data), an HTTP 400 error is returned with the error message.
Once your API code is ready, you need to run it using Uvicorn. Open your terminal and run the following command:
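Assuming your API code lives in a file named `main.py`:

```bash
uvicorn main:app --reload
```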
When you run this command, you should see output like this:
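(Representative Uvicorn output; the exact lines and process IDs vary by version and machine.)

```
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [28720]
INFO:     Started server process [28722]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
```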
This means your API server is now running and ready to accept requests on port 8000. The `--reload` flag means the server will automatically restart if you make changes to your code.
Now that your server is running, you can send requests to test it. First, let's prepare the sample data that we'll send to the API:
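A sketch of the sample payload; the feature names are an assumption based on the standard California housing dataset, and the values are illustrative:

```python
# Sample input with all nine features the model expects
# (names and values are illustrative; match your own training data)
sample_data = {
    "MedInc": 8.3252,       # median income in the block group
    "HouseAge": 41.0,       # median house age
    "AveRooms": 6.9841,     # average rooms per household
    "AveBedrms": 1.0238,    # average bedrooms per household
    "Population": 322.0,    # block group population
    "AveOccup": 2.5556,     # average household occupancy
    "Latitude": 37.88,
    "Longitude": -122.23,
    "RoomsPerHousehold": 5.43,  # engineered feature (illustrative value)
}
```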
This sample data contains all nine features that our model expects, including the engineered feature `RoomsPerHousehold`.
Next, define the API endpoint URL and send the request:
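A sketch, assuming the server is running locally on the default port 8000:

```python
import requests

# Endpoint URL for the locally running API (default Uvicorn port 8000)
url = "http://127.0.0.1:8000/predict"

# Send the sample data as JSON in the POST body
response = requests.post(url, json=sample_data)
```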
The `requests.post()` function sends a POST request to our API with the sample data in JSON format.
Finally, handle the response and display the results:
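A sketch of client-side error handling, repeating the request inside the `try` block so connection errors are caught, matching the error cases described below:

```python
try:
    response = requests.post(url, json=sample_data)
    response.raise_for_status()  # raise HTTPError for 4xx/5xx responses
    result = response.json()
    print(f"Status code: {response.status_code}")
    print(f"Prediction: {result['prediction']}")
    print(f"Status: {result['status']}")
except requests.exceptions.ConnectionError:
    # The server is not running or unreachable
    print("Error: could not connect to the server. Is it running?")
except requests.exceptions.HTTPError as err:
    # The server returned an error status code (e.g., 400 or 500)
    print(f"HTTP error: {err}")
except Exception as err:
    # Any other unexpected failure
    print(f"Unexpected error: {err}")
```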
This code demonstrates proper error handling when working with APIs. The `try-except` block catches different types of errors that might occur: connection errors (when the server isn't running), HTTP errors (when the server returns an error status code), and general exceptions. The status code 200 indicates a successful request, while other codes like 400 (Bad Request) or 500 (Internal Server Error) indicate problems. By checking the status code and handling exceptions, your client code becomes more robust and provides helpful feedback when things go wrong.
When you run this code while the server is running, you should see output like:
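(The prediction value here is illustrative; yours depends on your trained model.)

```
Status code: 200
Prediction: 4.526
Status: success
```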
Meanwhile, in the server terminal, you will see a new log entry showing that the request was processed:
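(Representative access log line; the client port after `127.0.0.1:` will differ.)

```
INFO:     127.0.0.1:54321 - "POST /predict HTTP/1.1" 200 OK
```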
The last line shows that a POST request was made to the `/predict` endpoint from IP address `127.0.0.1` (localhost) and it returned a `200 OK` status, indicating the request was successful.
Building and deploying machine learning models locally is a great way to learn the fundamentals, but it comes with real limitations. As your projects grow, you may find your machine running out of memory with large datasets, training times becoming impractically long, and your simple FastAPI server struggling to handle more than a handful of requests. Managing consistent environments across different machines or team members can be tricky, and meeting enterprise security or scalability requirements is often out of reach with local setups.
In professional settings, these challenges are addressed by cloud-based platforms like Amazon SageMaker. SageMaker allows you to train models on powerful, scalable infrastructure, deploy them as production-ready, auto-scaling endpoints, and manage everything from security to monitoring with ease. You only pay for the resources you use, and you can scale up or down as needed—eliminating the headaches of server management and environment inconsistencies.
While this unit focuses on local deployment to help you master the core concepts, cloud solutions like SageMaker are the next step for taking your machine learning projects to production scale. In future lessons, you’ll learn how to leverage these tools to build, train, and deploy models efficiently and securely in real-world environments.
In this lesson, you learned how to deploy your trained machine learning model as a REST API using FastAPI. You built the API step by step: loading the trained model, creating the FastAPI application, creating a prediction endpoint that parses raw JSON, running the server with Uvicorn, and testing it by sending requests.
These skills are essential for making your machine learning models useful in real-world applications. By exposing your model as an API, you allow other programs and users to benefit from your work. You've also seen how local development, while great for learning, has limitations that enterprise-scale solutions like SageMaker can overcome.
You are now ready to move on to hands-on practice, where you will build and test your own prediction API. Make sure to complete all the exercises in this unit to solidify your understanding of these fundamental concepts. Once you've mastered these basics, you'll be perfectly prepared for our next courses, where you'll discover how to harness the full power of AWS to build, train, and deploy models at enterprise scale!
