Introduction & Overview

Welcome to another lesson on deploying models with SageMaker! In the previous unit, you learned how to deploy a locally trained model to a SageMaker serverless endpoint. You gained valuable experience with the fundamental concepts of model deployment: packaging model artifacts, uploading to S3, creating entry point scripts, and configuring serverless inference. Now it's time to build on that foundation and explore a more streamlined deployment workflow.

In this lesson, you'll learn how to deploy models that were trained directly within the SageMaker ecosystem using SageMaker estimators. This represents a natural progression in your learning journey because, when you train models using SageMaker's built-in training capabilities, the deployment process becomes significantly more streamlined. Instead of manually packaging and uploading model artifacts, SageMaker automatically handles these steps for you since the model artifacts are already stored within the SageMaker environment.

The key difference you'll discover is that SageMaker estimators come with built-in deployment capabilities that eliminate much of the manual configuration work you performed in the previous lesson. You'll learn how to attach to existing training jobs, leverage SageMaker's automatic artifact management, and deploy models with just a few lines of code while still maintaining the cost-effective benefits of serverless inference.

By the end of this lesson, you'll understand how to retrieve completed training jobs from your SageMaker environment, attach to those training jobs to create deployable estimators, configure serverless inference settings for optimal performance and cost management, and deploy estimator models to live endpoints that can serve real-time predictions. This knowledge will prepare you for more advanced deployment scenarios and help you build efficient machine learning workflows entirely within the SageMaker ecosystem.

Retrieving and Attaching to a Completed Training Job

When you train models using SageMaker estimators, the training process creates a training job that stores all the necessary information about your model, including the trained artifacts, training configuration, and metadata. To deploy one of these models, you first need to retrieve information about the completed training job and then attach to it to create a deployable estimator object.

As you did in the previous course when downloading model artifacts, you'll start by initializing a SageMaker session and retrieving your completed training jobs:
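A minimal sketch of this step might look like the following; it assumes the boto3 and sagemaker libraries are installed and your AWS credentials are already configured:

```python
import boto3
import sagemaker

# Initialize the SageMaker session and a low-level SageMaker client
sagemaker_session = sagemaker.Session()
sm_client = boto3.client("sagemaker")

# Retrieve the most recent completed training jobs
response = sm_client.list_training_jobs(
    SortBy="CreationTime",
    SortOrder="Descending",
    StatusEquals="Completed",
)

# Keep only sklearn estimator jobs, which follow the "sagemaker-scikit-learn-..." naming pattern
sklearn_jobs = [
    job["TrainingJobName"]
    for job in response["TrainingJobSummaries"]
    if "sagemaker-scikit-learn" in job["TrainingJobName"]
]

training_job_name = sklearn_jobs[0]
print(f"Using training job: {training_job_name}")
```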

We're filtering for jobs containing sagemaker-scikit-learn because this is the default naming pattern that SageMaker uses for sklearn estimator training jobs. This helps us quickly identify the relevant training jobs from your account.

Now comes the key difference from the previous lesson. Instead of manually downloading and packaging model artifacts, you can attach directly to the existing training job to recreate the estimator:
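Here's a sketch of the attach step, reusing the training_job_name and sagemaker_session variables from the previous snippet:

```python
from sagemaker.sklearn.estimator import SKLearn

# Reconstruct a deployable estimator from the completed training job
estimator = SKLearn.attach(
    training_job_name=training_job_name,
    sagemaker_session=sagemaker_session,
)
```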

The SKLearn.attach() method reconstructs an estimator object from the completed training job, automatically populating all the information SageMaker needs for deployment — the model artifacts location, training configuration, and framework specifications. This eliminates the manual packaging and uploading steps you performed in the previous lesson, since the estimator already has access to everything stored within the SageMaker environment.

Configuring Serverless Inference

As you learned in the previous lesson, serverless inference provides a cost-effective and scalable way to deploy models without managing infrastructure. The configuration process remains the same when deploying estimator models, but it's worth reviewing the key concepts and parameters that control your endpoint's behavior.
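As a sketch, the serverless configuration might look like this:

```python
from sagemaker.serverless import ServerlessInferenceConfig

# Same serverless settings as in the previous lesson
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,  # 2 GB of memory for the serverless endpoint
    max_concurrency=10,      # maximum number of simultaneous invocations
)
```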

The configuration parameters remain the same whether you're deploying locally trained models or SageMaker estimators. We'll use the same memory_size_in_mb=2048 for 2 GB of memory allocation and max_concurrency=10 to limit simultaneous requests, providing the same cost control and performance benefits you experienced in the previous lesson.

Deploying the Estimator to a Serverless Endpoint

With your estimator attached and serverless configuration ready, you can now deploy your model to a live endpoint. The deployment process for estimator models is more streamlined than deploying locally trained models because SageMaker already has access to all the necessary model artifacts and configuration information.
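Here's a minimal deployment sketch; the endpoint name is an illustrative choice, and wait=False lets the call return immediately so you can poll the deployment status yourself:

```python
from sagemaker.utils import unique_name_from_base

# Give the endpoint a unique, recognizable name
endpoint_name = unique_name_from_base("sklearn-serverless-endpoint")

# Deploy the attached estimator to a serverless endpoint
estimator.deploy(
    serverless_inference_config=serverless_config,
    endpoint_name=endpoint_name,
    wait=False,  # return immediately; we'll monitor the deployment status below
)
```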

The estimator.deploy() method initiates the deployment process using the serverless configuration you specified. Notice how much simpler this is compared to the previous lesson — you don't need to create a separate model object, specify entry point scripts, or configure framework versions. The estimator already contains all this information from the original training job.

Since the deployment happens in the background, you need to actively monitor its progress. The describe_endpoint() call queries the current status of your endpoint deployment. When you first run this code, you'll typically see output like:
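Here's a minimal sketch of that status check, reusing the sm_client and endpoint_name variables from the earlier snippets:

```python
# Query the current deployment status of the serverless endpoint
response = sm_client.describe_endpoint(EndpointName=endpoint_name)
print(f"Endpoint status: {response['EndpointStatus']}")
```

```
Endpoint status: Creating
```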

Your endpoint progresses through several states during deployment. It starts as Creating while AWS provisions the serverless infrastructure, then transitions to InService when it's ready to handle prediction requests. If something goes wrong during deployment, the status will show Failed, and you can examine the error details to troubleshoot the issue.

Testing the Deployed Endpoint for Inference

Once your endpoint reaches the InService status, it's ready to handle prediction requests. Testing your deployed endpoint serves two critical purposes: verifying that the deployment was successful and ensuring that your model maintains its expected performance in the production environment.

The process of connecting to and testing an estimator-based endpoint is identical to what you learned in the previous lesson, but let's review the key concepts to reinforce your understanding:
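A sketch of the connection setup, assuming your model exchanges data in CSV format as in the previous lesson:

```python
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer

# Connect to the deployed endpoint by name
predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sagemaker_session,
    serializer=CSVSerializer(),      # convert Python data to CSV for the request
    deserializer=CSVDeserializer(),  # convert the CSV response back to Python lists
)
```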

The Predictor class creates a connection to your deployed endpoint using the endpoint name. As you learned previously, this predictor handles all the HTTP communication complexity, allowing you to focus on sending data and receiving predictions. The serializer and deserializer configuration remains the same — converting your Python data structures to CSV format for transmission and converting the CSV responses back to Python objects.

Now you can load your test data and make predictions to evaluate your deployed model's performance:
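Here's a sketch of the evaluation step; the file name test_data.csv and the target column name are illustrative assumptions, and accuracy is used as an example metric for a classification model, so substitute your own dataset, label column, and metric:

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Load the test set and separate features from the target variable
test_df = pd.read_csv("test_data.csv")  # assumed file name
X_test = test_df.drop(columns=["target"])  # assumed target column name
y_test = test_df["target"]

# Send the features to the endpoint; serialization to CSV happens automatically
predictions = predictor.predict(X_test.values)

# CSVDeserializer returns rows of strings, so flatten and convert before scoring
predicted_labels = [float(value) for row in predictions for value in row]
print(f"Accuracy: {accuracy_score(y_test, predicted_labels):.3f}")
```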

The evaluation process loads your test dataset, separates the features from the target variable, and sends the features to your endpoint for prediction. The predictor.predict(X_test.values) call triggers the entire serialization process, sending your data to the SageMaker endpoint and receiving predictions back.

Summary & Next Steps

Congratulations! You've successfully learned how to deploy SageMaker estimator models using serverless inference, building upon the foundational deployment concepts from your previous lesson. You discovered how to retrieve and attach to completed training jobs, which provides a much more streamlined deployment workflow compared to manually packaging locally trained models. The key advantage of working with SageMaker estimators is that all the model artifacts, configuration details, and framework specifications are automatically managed by SageMaker, eliminating the manual steps you performed in the previous lesson.

You've now mastered two fundamental deployment approaches in SageMaker, giving you the flexibility to choose the right method based on whether your models are trained locally or within the SageMaker ecosystem. In the upcoming practice exercises, you'll apply these concepts hands-on to solidify your understanding of estimator deployment workflows. Happy coding!
