Welcome back! So far, you have learned how to trigger AWS Lambda functions automatically when files are uploaded to S3, and even how to start a Glue ETL job in response to those uploads. These event-driven automations are powerful, but sometimes you need your data pipeline to run on a regular schedule, not just when new data arrives.
For example, you might want to check your data lake every hour to see if new files have arrived or run a cleanup task every night. Scheduling Lambda functions allows you to automate these regular tasks without waiting for an event like a file upload. This lesson will show you how to set up Lambda functions that run automatically at set times, making your data pipeline even more flexible and reliable.
To run Lambda functions on a schedule, AWS provides a service called EventBridge (previously known as CloudWatch Events). EventBridge lets you define rules that trigger actions at specific times or intervals. You can think of it as a cloud-based version of a cron job or a task scheduler.
When you create a scheduled rule in EventBridge, you specify how often it should run, such as every hour, every day, or at a specific time. EventBridge then automatically invokes your Lambda function according to that schedule. This means you do not need to rely on external triggers like S3 uploads; your function runs at the times you define.
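To make the invocation concrete, here is a minimal sketch of how a handler might recognize that it was started by a schedule. The sample event is hand-written to mirror the general shape of a scheduled event (a source of aws.events and a detail-type of Scheduled Event); the IDs, account number, and rule name in it are made up, and the helper name describe_trigger is hypothetical.

```python
def describe_trigger(event):
    """Return a short label saying whether the event looks like a scheduled one."""
    is_scheduled = (
        event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )
    if is_scheduled:
        # "resources" lists the ARN of the rule that fired.
        rules = ", ".join(event.get("resources", []))
        return f"scheduled run from {rules}"
    return "not a scheduled run"


# Hand-written sample; every value below is illustrative, not real output.
sample_event = {
    "version": "0",
    "id": "00000000-0000-0000-0000-000000000000",
    "detail-type": "Scheduled Event",
    "source": "aws.events",
    "account": "123456789012",
    "time": "2024-01-01T12:00:00Z",
    "region": "us-east-1",
    "resources": ["arn:aws:events:us-east-1:123456789012:rule/hourly-file-check"],
    "detail": {},
}

print(describe_trigger(sample_event))
```

A scheduled function often ignores the event entirely, since the schedule itself carries no data; a check like this is mainly useful when the same function can also be triggered by other sources.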
Let's look at an example of a Lambda function designed to run on a schedule. This function will check a specific S3 bucket and folder (prefix) for new files, log the number of files found, and print a message if new files are detected. Here is the code:
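A minimal sketch of such a function is shown below. The bucket name, the prefix, the count_files helper, and the shape of the return value are assumptions made for illustration. The optional s3_client parameter is also an addition, included only so the function can be tried without AWS access.

```python
# Hypothetical names for illustration; replace with your own bucket and prefix.
BUCKET_NAME = "my-data-lake-bucket"
PREFIX = "incoming/"


def count_files(s3_client, bucket, prefix):
    """Return the number of objects S3 reports under the given prefix."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # KeyCount is the number of keys returned by this single call.
    return response.get("KeyCount", 0)


def lambda_handler(event, context, s3_client=None):
    """Entry point for the scheduled run; the event payload is not used here."""
    if s3_client is None:
        # Imported lazily so the rest of the module loads without boto3 installed.
        import boto3

        s3_client = boto3.client("s3")

    file_count = count_files(s3_client, BUCKET_NAME, PREFIX)
    print(f"Found {file_count} file(s) in s3://{BUCKET_NAME}/{PREFIX}")

    if file_count > 0:
        # In a real pipeline, further processing could be triggered here.
        print("New files detected.")

    return {"statusCode": 200, "fileCount": file_count}
```

Lambda calls the handler with only event and context, so s3_client stays None in AWS and a real boto3 client is created.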
How does this work?
- The function connects to S3 using boto3.
- It checks a specific bucket and folder (you can change these to match your setup).
- It counts how many files are present.
- If any files are found, it prints a message (in a real pipeline, you could trigger more processing here).
- It returns a status and the file count.
The S3 call list_objects_v2() asks AWS for the objects currently stored in a bucket, optionally filtered by a prefix such as incoming/. It returns a response dictionary with metadata about the matching objects. In this example, we use KeyCount from that response as a simple way to measure how many files are currently waiting in that folder. This is your first direct use of the S3 API through boto3, so the main idea is: Lambda calls S3, S3 returns a dictionary, and your code reads the fields it needs from that dictionary.
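As a sketch of "reading the fields you need", the helper below pulls a few values out of a response dictionary. The sample response is hand-written to resemble what list_objects_v2() returns; its values are made up, and summarize_listing is a hypothetical name. One detail worth knowing: a single call returns at most 1,000 keys, and IsTruncated tells you whether more remain.

```python
def summarize_listing(response):
    """Extract the count, the object keys, and the truncation flag from a response."""
    # "Contents" is absent when nothing matches, so default to an empty list.
    keys = [obj["Key"] for obj in response.get("Contents", [])]
    return {
        "count": response.get("KeyCount", 0),
        "keys": keys,
        # True when more results exist beyond this page of up to 1,000 keys.
        "truncated": response.get("IsTruncated", False),
    }


# Hand-written sample shaped like a response; not real output.
sample_response = {
    "Name": "my-data-lake-bucket",
    "Prefix": "incoming/",
    "KeyCount": 2,
    "MaxKeys": 1000,
    "IsTruncated": False,
    "Contents": [
        {"Key": "incoming/orders-001.csv", "Size": 2048},
        {"Key": "incoming/orders-002.csv", "Size": 4096},
    ],
}

print(summarize_listing(sample_response))
```

If a folder may hold more than 1,000 objects, KeyCount from one call undercounts, and the listing would need to be paginated.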
Before setting up your schedule, it's important to understand how EventBridge lets you specify when your function should run. EventBridge supports two types of schedule expressions: rate expressions and cron expressions. Each has its own use case and syntax.
Rate Expressions:
Rate expressions are simple and great for running tasks at regular intervals. The syntax is rate(value unit), where the value is a number and the unit can be minute, minutes, hour, hours, day, or days.
Important: The minimum granularity for rate expressions is 1 minute. You cannot schedule a function to run more frequently than once per minute using rate expressions.
Examples:
- rate(1 minute): Runs every minute (the most frequent option)
- rate(5 minutes): Runs every 5 minutes
- rate(1 hour): Runs every hour
- rate(12 hours): Runs every 12 hours
- rate(1 day): Runs once per day
Use rate expressions when you need simple, regular intervals and don't care about the exact time of day.
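One easy mistake with rate expressions is the unit form: a value of 1 takes the singular unit, and larger values take the plural. The helper below is a small sketch that builds the string with the right form; rate_expression is a hypothetical name, not part of boto3.

```python
def rate_expression(value, unit):
    """Build a rate(...) schedule expression such as rate(5 minutes)."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour', or 'day'")
    if not isinstance(value, int) or value < 1:
        raise ValueError("value must be a positive whole number")
    # A value of 1 uses the singular unit; anything larger uses the plural.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


print(rate_expression(1, "minute"))   # rate(1 minute)
print(rate_expression(12, "hour"))    # rate(12 hours)
```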
Cron Expressions:
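Cron expressions give you more precise control over scheduling. They let you specify exact times, specific days of the week, or complex patterns. The syntax is cron(minutes hours day-of-month month day-of-week year), with six fields evaluated in UTC. You cannot set specific values for both day-of-month and day-of-week; when you use one, put ? in the other.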
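As a sketch, the two helpers below build common cron patterns for EventBridge. The function names are hypothetical; the strings they return follow the six-field EventBridge form noted in the code comments.

```python
def _check_time(hour, minute):
    """Reject times outside the 24-hour clock."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")


def daily_cron(hour, minute=0):
    """Cron expression that fires once a day at hour:minute (UTC)."""
    _check_time(hour, minute)
    # Fields: minutes hours day-of-month month day-of-week year.
    # '?' in day-of-week means "no specific value".
    return f"cron({minute} {hour} * * ? *)"


def weekday_cron(hour, minute=0):
    """Cron expression that fires Monday through Friday at hour:minute (UTC)."""
    _check_time(hour, minute)
    # Day-of-week is set here, so day-of-month takes the '?' placeholder.
    return f"cron({minute} {hour} ? * MON-FRI *)"


print(daily_cron(2))         # cron(0 2 * * ? *)
print(weekday_cron(8, 30))   # cron(30 8 ? * MON-FRI *)
```

Use cron expressions when the time of day matters, for example a cleanup task that should run at 02:00 UTC every night.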
Now that you have a Lambda function ready to run on a schedule and understand how schedule expressions work, let's see how to actually set up the schedule in AWS. This is where EventBridge comes in: it triggers your Lambda function at the times or intervals you choose, much like setting an alarm clock for your code.
Here's what you need to do:
- Create a Rule in EventBridge: You tell EventBridge when you want your Lambda function to run. For example, you can say, "Run this function every hour."
- Give EventBridge Permission: For security, AWS needs to know that EventBridge is allowed to run your Lambda function. You have to give it permission.
- Connect the Rule to Your Lambda Function: Finally, you link your Lambda function to the EventBridge rule, so the rule knows which function to run.
Let's look at a simple Python script that does all of this using the AWS SDK for Python, called boto3. You can run this script from your computer (make sure you have AWS credentials set up).
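The sketch below shows one way such a script might look. The rule name, function name, and statement ID are hypothetical, and the helper takes its clients as arguments so it can be exercised with stand-ins. The boto3 calls it relies on are put_rule, add_permission, get_function, and put_targets.

```python
def schedule_lambda(events_client, lambda_client, rule_name, function_name,
                    schedule_expression):
    """Create a scheduled rule, allow it to invoke the function, and link them."""
    # Step 1: create (or update) the rule with the chosen schedule.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule_expression,
        State="ENABLED",
    )
    rule_arn = rule["RuleArn"]

    # Step 2: give EventBridge permission to invoke the function.
    # Note: this call fails if the same StatementId already exists.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",  # hypothetical naming scheme
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # Step 3: point the rule at the function, which is identified by its ARN.
    function = lambda_client.get_function(FunctionName=function_name)
    function_arn = function["Configuration"]["FunctionArn"]
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule_arn


def main():
    """Run the setup against your AWS account using your configured credentials."""
    import boto3  # imported here so the helper above loads without boto3

    rule_arn = schedule_lambda(
        boto3.client("events"),
        boto3.client("lambda"),
        rule_name="hourly-file-check",      # hypothetical rule name
        function_name="check-new-files",    # hypothetical function name
        schedule_expression="rate(1 hour)",
    )
    print(f"Created rule: {rule_arn}")
```

Calling main() applies the setup for real, so the names above need to match a function that already exists in your account.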
In this lesson, you learned how to automate regular data tasks by running Lambda functions on a schedule using EventBridge. You saw how to write a Lambda function that checks for new files in S3, learned about the two types of schedule expressions (rate and cron) and their use cases, and how to set up an EventBridge rule to trigger your function. You also learned that rate expressions have a minimum granularity of 1 minute, which helps you set realistic expectations for how frequently your automated tasks can run.
Next, you will get hands-on practice setting up and testing your own scheduled Lambda functions. You will create schedules using both rate and cron expressions, connect them to your Lambda code, and verify that everything runs as expected. This will help you build confidence in automating regular tasks in your AWS data pipelines.
