Welcome to your first lesson in managing ML resources with the SageMaker AI Console. In this lesson, you'll learn to navigate the console like a seasoned ML engineer, discovering what's already running in your environment and understanding the current state of your machine learning infrastructure.
Our focus is on three essential skills: finding training jobs to see what models have been built, checking endpoints to see what's currently deployed and serving predictions, and understanding how to move quickly between these key sections. By the end, you’ll be able to answer questions like, “Is our latest model still training?” or “Which endpoint is serving production traffic?”—all by exploring the console.
Before diving deeper into resource management, let's quickly revisit what endpoints are and how they relate to models and endpoint configurations in SageMaker.
An endpoint is a fully managed, real-time prediction service that hosts your trained ML model so it can receive inference requests. Endpoints are the live, running resources that consume compute and incur costs as long as they are active.
There are three key components involved in deploying a model for real-time inference:
- Model: This is the trained artifact (such as a .tar.gz file) that contains your model weights and any necessary code for inference.
- Endpoint Configuration: This defines how your model will be deployed, including the instance type, instance count, and other deployment parameters. It acts as a blueprint for the endpoint.
- Endpoint: This is the actual running service created from a specific endpoint configuration and model. The endpoint receives traffic and serves predictions.
The typical workflow is:
- You create a model.
- You define an endpoint configuration that references the model and specifies deployment settings.
- You create an endpoint using the endpoint configuration.
Understanding this relationship is crucial: endpoints depend on endpoint configurations, which in turn reference models. When managing resources, always consider these dependencies to avoid accidentally deleting components that are still in use.
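Outside the console, the same three-step workflow maps onto three boto3 calls. The sketch below is a minimal illustration, not a ready-to-run deployment: the resource names, container image URI, and S3 path are placeholders, and the SageMaker client is passed in so the dependency chain (endpoint config references the model, endpoint references the config) stays visible.

```python
# Minimal sketch of the model -> endpoint config -> endpoint workflow.
# All names, the image URI, and the S3 path are hypothetical placeholders.

def deploy_model(sm, role_arn):
    """Walk the three-step deployment workflow with a SageMaker client `sm`
    (e.g. boto3.client("sagemaker")) and an IAM execution role ARN."""
    # 1. Create a model from a trained artifact and an inference container.
    sm.create_model(
        ModelName="demo-model",
        PrimaryContainer={
            "Image": "<inference-container-image-uri>",
            "ModelDataUrl": "s3://my-bucket/model.tar.gz",
        },
        ExecutionRoleArn=role_arn,
    )
    # 2. Define an endpoint configuration that references the model
    #    and fixes the deployment settings (instance type and count).
    sm.create_endpoint_config(
        EndpointConfigName="demo-config",
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": "demo-model",        # references the model above
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
        }],
    )
    # 3. Create the live endpoint from that configuration.
    sm.create_endpoint(
        EndpointName="demo-endpoint",
        EndpointConfigName="demo-config",     # references the config above
    )
```

Because the calls must happen in this order, deleting a model or configuration that a running endpoint still references is exactly the dependency mistake the paragraph above warns about.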
To begin, you’ll need to access the SageMaker AI Console from the AWS Management Console. Here’s how:
- Log in to AWS: Go to console.aws.amazon.com and sign in.
- Search for SageMaker: Use the search bar at the top to type "SageMaker".
- Open SageMaker AI: Select "Amazon SageMaker AI" from the search results.
The SageMaker AI Console is organized into sections that mirror the ML workflow, with the most relevant for us being:
- Training: Where you manage and monitor model training jobs.
- Inference: Where you handle deployed models and endpoints.
The video below demonstrates these steps and gives you a first look at the console’s layout.
After following these steps, you should feel comfortable getting to the SageMaker AI Console and recognizing where the main navigation sections are. Next, let’s explore how to find out what training jobs are running or have completed.
The Training section is your starting point for everything related to building models. Here, you can:
- See a list of all training jobs in your environment.
- Check the status of each job (InProgress, Completed, Failed, etc.).
- View basic details like job names and start/end times.
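Everything the Training section shows can also be pulled programmatically. The sketch below uses the boto3 `list_training_jobs` API; the formatting helper is kept as pure Python, separate from the AWS call, which requires valid credentials to actually run.

```python
# Sketch: reproduce the console's Training list via the SageMaker API.

def summarize_training_jobs(jobs):
    """Turn ListTrainingJobs summaries into one-line "name: status" strings
    (statuses include InProgress, Completed, Failed, etc.)."""
    return [f"{j['TrainingJobName']}: {j['TrainingJobStatus']}" for j in jobs]

def fetch_training_jobs(max_results=10):
    """Fetch the most recent training jobs. Needs AWS credentials."""
    import boto3
    sm = boto3.client("sagemaker")
    resp = sm.list_training_jobs(MaxResults=max_results, SortBy="CreationTime")
    return summarize_training_jobs(resp["TrainingJobSummaries"])
```

This is the API view of the same question the console answers visually: "is our latest model still training?"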
The following video will show you how to navigate the Training section and interpret what you see.
Once you know how to locate and review your training jobs, the next step is to see which models are actually deployed and serving predictions.
The Inference section is where you keep track of what’s live in production. In this section, you can:
- View all deployed endpoints.
- Check the status of each endpoint (such as InService or Failed).
- See each endpoint’s configuration and its associated model.
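The same health check can be scripted with the boto3 `list_endpoints` API. As a sketch, the filtering helper below is pure Python so it works on any response-shaped data; the fetching function needs real AWS credentials.

```python
# Sketch: flag endpoints that are not healthy, mirroring the Inference section.

def unhealthy_endpoints(endpoints):
    """From ListEndpoints summaries, return the names of endpoints whose
    status is anything other than InService (e.g. Creating, Failed)."""
    return [e["EndpointName"] for e in endpoints
            if e["EndpointStatus"] != "InService"]

def fetch_unhealthy_endpoints():
    """List deployed endpoints and flag the unhealthy ones. Needs credentials."""
    import boto3
    sm = boto3.client("sagemaker")
    resp = sm.list_endpoints()
    return unhealthy_endpoints(resp["Endpoints"])
```

An empty result from `unhealthy_endpoints` is the scripted equivalent of scanning the Inference section and seeing every endpoint InService.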
Watch the next video to see how to access and review your endpoints in the Inference section.
With this understanding of how to find deployed endpoints and check their health, you’ll always be able to answer the question of what’s running in production.
You now have the foundational skills to navigate the SageMaker AI Console. You know how to access the console, find and review training jobs, and check on deployed endpoints. With these skills, you can quickly assess the state of your ML resources and confidently answer questions about what’s running in your environment.
Now, you’ll get to experience the console firsthand in our practice session. You’ll be given credentials to access an AWS account and will use the SageMaker AI Console directly. Your task will be to answer questions about the resources you find—just like you would in a real-world ML project. This is your chance to put your navigation skills to the test and build confidence working with live SageMaker environments.
