Welcome to your first lesson in the Data Pipeline Automation on AWS course. In this course, you will learn how to automate data tasks using AWS Lambda. AWS Lambda is a serverless compute service that lets you run code without having to manage servers. This means you can focus on writing your logic, and AWS takes care of running it for you, scaling it as needed, and charging you only for the compute time you use.
Serverless computing is especially useful for data automation because it lets you respond to events, such as a file arriving in a storage bucket or a scheduled time being reached, without keeping a server running all the time. Lambda functions are small, focused pieces of code that many AWS services can trigger, which makes them a key building block for automated data pipelines.
For data engineering specifically, this can save both money and operational effort. Many data workloads are bursty: a file lands in S3, a validation check runs, a routing function starts a Glue job, or a scheduled task performs a quick health check. If you keep a traditional server running for those tasks, you pay for idle time between events. With Lambda, you pay only for the short execution time you actually use, which is often much cheaper for ingestion, validation, orchestration, and monitoring steps. Operationally, serverless also removes work like patching hosts, managing cron servers, planning capacity for occasional spikes, and keeping custom daemons alive just to wait for the next event.
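To make the idle-time argument concrete, here is a small worked calculation. It assumes that Lambda's compute charge scales with configured memory multiplied by execution time (measured in GB-seconds). The workload numbers are hypothetical and no prices are included, so treat it as a way to reason about usage, not as a cost estimate.

```python
# Hypothetical workload numbers, chosen only to illustrate the arithmetic.
MB_PER_GB = 1024


def billed_gb_seconds(invocations: int, avg_duration_s: float, memory_mb: int) -> float:
    """Approximate Lambda compute usage in GB-seconds.

    Usage is configured memory (in GB) multiplied by execution time. This sketch
    ignores per-request charges, any free tier, and per-invocation rounding.
    """
    return invocations * avg_duration_s * (memory_mb / MB_PER_GB)


def busy_fraction(invocations: int, avg_duration_s: float, window_s: float) -> float:
    """Fraction of a time window during which the code is actually running."""
    return (invocations * avg_duration_s) / window_s


if __name__ == "__main__":
    # Assumed example: 1,000 files per day, 0.5 s of work each, 128 MB of memory.
    print(billed_gb_seconds(1_000, 0.5, 128))       # 62.5 GB-seconds per day
    print(busy_fraction(1_000, 0.5, 24 * 60 * 60))  # roughly 0.0058, under 1% of the day
```

An always-on server for the same workload would sit idle for more than 99% of the day, which is the time Lambda does not bill for.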
In this lesson, you will write, deploy, and test your very first Lambda function. You will also learn how to view your function's output in Amazon CloudWatch Logs.
At the heart of every AWS Lambda function is a special function called the handler. In Python, this is usually named lambda_handler, but you can choose any name as long as you tell AWS which function to use. The handler is the entry point for your code: it is the function that AWS calls whenever your Lambda is triggered.
The handler function always takes two arguments: event and context. Let's break down what each one does:
The event parameter contains information about what triggered the function. This is where you'll find the data you need to process. For example:
- If a file was uploaded to S3, the event will include the bucket name and file key
- If your Lambda was triggered by an API call, the event will contain the request parameters
- If it ran on a schedule, the event might contain timing information
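As an illustration of the S3 case, the sketch below pulls the bucket name and object key out of an S3 event notification. It reads only the fields it needs; real S3 events carry many more. The bucket and key names in the sample event are hypothetical.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification."""
    objects = []
    # The event holds a list under "Records", so loop over every entry.
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        raw_key = s3_info.get("object", {}).get("key")
        if bucket and raw_key:
            # Keys arrive URL-encoded (a space becomes "+"), so decode them first.
            objects.append((bucket, unquote_plus(raw_key)))
    return objects


def lambda_handler(event, context):
    objects = extract_s3_objects(event)
    for bucket, key in objects:
        print(f"New object: s3://{bucket}/{key}")
    return {"processed": len(objects)}


if __name__ == "__main__":
    # Trimmed, hypothetical event: real S3 events include many more fields.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "example-raw-data"},
                    "object": {"key": "incoming/daily+report.csv"}}}
        ]
    }
    print(lambda_handler(sample_event, None))
    # New object: s3://example-raw-data/incoming/daily report.csv
    # {'processed': 1}
```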
The context parameter provides information about the runtime environment itself, such as:
- How much memory is available to your function
- How much time is left before the function times out
- The request ID for tracking and debugging
For most data automation tasks, you will work primarily with the event parameter since it contains the actual data and triggers you need to process. The context is useful for advanced scenarios like monitoring execution time or handling function limits.
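For example, a handler that works through a list of items can use the context object to log its request ID and stop before it runs out of time. This is a minimal sketch: the items field, the 10-second safety margin, and the FakeContext stand-in (handy for running the handler locally) are hypothetical. The attributes aws_request_id, memory_limit_in_mb, and get_remaining_time_in_millis() exist on the real Python context object.

```python
SAFETY_MARGIN_MS = 10_000  # assumed buffer; tune it for your workload


def lambda_handler(event, context):
    # Log the request ID so all lines from this invocation can be grouped.
    print(f"request_id={context.aws_request_id} memory_mb={context.memory_limit_in_mb}")

    items = event.get("items", [])  # "items" is a hypothetical event field
    processed = []
    for item in items:
        # Stop before the timeout instead of being cut off mid-item.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            print("Running low on time; stopping early")
            break
        processed.append(item)

    return {"processed": len(processed), "remaining": len(items) - len(processed)}


class FakeContext:
    """Hypothetical stand-in for the real context object when testing locally."""

    aws_request_id = "local-test"
    memory_limit_in_mb = 128

    def __init__(self, remaining_ms):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self._remaining_ms


if __name__ == "__main__":
    print(lambda_handler({"items": ["a", "b"]}, FakeContext(remaining_ms=60_000)))
```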
With these basics in place, you're ready to write Lambda functions that can receive data from various AWS services and process it automatically.
Start by creating a file named lambda_function.py. This file will contain the code that AWS Lambda will execute. The function you define, called lambda_handler, is the entry point. AWS will automatically call this function whenever your Lambda is triggered. The function receives two arguments: event (the input data or trigger information) and context (information about the runtime environment).
In this example, the function checks if the event dictionary contains a key called name. If it does, it uses that value; otherwise, it defaults to "World". The greeting is printed, which means it will be sent to CloudWatch Logs automatically by AWS Lambda.
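A minimal version of lambda_function.py that matches this description might look like the following. The exact greeting text and the shape of the returned dictionary are assumptions made for illustration.

```python
# lambda_function.py
def lambda_handler(event, context):
    # Use the "name" value from the event if present, otherwise fall back to "World".
    name = event.get("name", "World")
    greeting = f"Hello, {name}!"

    # Anything printed to stdout is captured by Lambda and sent to CloudWatch Logs.
    print(greeting)

    # The return value becomes the response returned to whoever invoked the function.
    return {"message": greeting}


if __name__ == "__main__":
    # Quick local check: the context argument is unused here, so None is fine.
    print(lambda_handler({"name": "Alice"}, None))
```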
Before you can deploy your function to AWS Lambda, you need to package it. If your function only uses the Python standard library, you can simply zip your .py file. If you have additional dependencies, you would need to include them in the zip file as well.
This command creates a zip archive called function.zip containing your function code. For the deployment method used in this lesson, Lambda expects your code to be uploaded as a .zip file, called a deployment package.
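If you prefer to stay in Python, the same packaging step can be sketched with the standard library's zipfile module. The file names default to the ones used in this lesson; the function name build_deployment_package is hypothetical.

```python
import zipfile
from pathlib import Path


def build_deployment_package(source_file: str = "lambda_function.py",
                             zip_path: str = "function.zip") -> Path:
    """Create a zip archive with the handler module at the archive root.

    With the handler lambda_function.lambda_handler, Lambda imports the module
    lambda_function, so the .py file must sit at the top level of the zip.
    """
    source = Path(source_file)
    archive = Path(zip_path)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname drops any directory prefix so the file lands at the root.
        zf.write(source, arcname=source.name)
    return archive
```

Calling build_deployment_package() from the folder that contains lambda_function.py produces function.zip next to it.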
To deploy your function, you use the AWS CLI. Make sure the AWS CLI is installed and configured with your credentials (aws configure). You also need an IAM execution role that Lambda can assume. Attaching the AWS managed policy AWSLambdaBasicExecutionRole to that role gives your function permission to write its logs to CloudWatch.
The following command creates a new Lambda function:
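- The --function-name flag sets the name of your Lambda function.
- The --runtime flag specifies the Python version.
- The --role flag is the ARN of the IAM role that Lambda uses to execute your function.
- The --handler flag tells AWS which function to call, in the form <filename>.<function_name> (for this lesson, lambda_function.lambda_handler).
- The --zip-file flag is the path to your zipped code, prefixed with fileb:// so the CLI reads it as binary (for example, fileb://function.zip).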
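The same deployment can be scripted in Python. This sketch assumes the AWS SDK for Python (boto3) is installed and configured; it builds the keyword arguments for the Lambda client's create_function call, with each key mirroring one CLI flag. The helper names are hypothetical.

```python
from pathlib import Path


def create_function_request(function_name: str, runtime: str, role_arn: str,
                            zip_path: str = "function.zip",
                            handler: str = "lambda_function.lambda_handler") -> dict:
    """Build keyword arguments for boto3's Lambda create_function call.

    FunctionName, Runtime, Role, and Handler match --function-name, --runtime,
    --role, and --handler. Code["ZipFile"] replaces --zip-file: boto3 takes
    the raw bytes of the archive rather than a fileb:// path.
    """
    return {
        "FunctionName": function_name,
        "Runtime": runtime,
        "Role": role_arn,
        "Handler": handler,
        "Code": {"ZipFile": Path(zip_path).read_bytes()},
    }


def deploy(request: dict) -> str:
    """Send the request to AWS and return the new function's ARN."""
    import boto3  # imported here so the request builder works without boto3

    client = boto3.client("lambda")
    response = client.create_function(**request)
    return response["FunctionArn"]
```

You would call deploy(create_function_request("my-first-lambda", <runtime>, <role-arn>)), supplying the same runtime identifier and role ARN you would pass to the CLI.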
If you make changes to your code later, you can update the function using:
This command uploads the new version of your code to AWS Lambda.
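A Python version of the edit-and-redeploy loop could re-zip the handler and call the Lambda client's update_function_code. This is a sketch assuming boto3; the client is passed in as a parameter (a hypothetical design choice) so the function can be exercised locally with a stand-in object.

```python
import zipfile
from pathlib import Path


def rebuild_and_update(function_name, source_file="lambda_function.py",
                       zip_path="function.zip", client=None):
    """Re-zip the handler file and upload it as the function's new code."""
    source = Path(source_file)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Keep the module at the archive root so the handler setting still resolves.
        zf.write(source, arcname=source.name)

    if client is None:
        import boto3  # requires boto3 and configured AWS credentials
        client = boto3.client("lambda")

    # Same step as the CLI's update-function-code command.
    return client.update_function_code(
        FunctionName=function_name,
        ZipFile=Path(zip_path).read_bytes(),
    )
```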
Once your function is deployed, you can test it by invoking it from the command line. The following command sends a test event to your Lambda function:
- The --payload flag sends a JSON object as the event to your function.
- The --cli-binary-format raw-in-base64-out flag is needed with AWS CLI v2 when you pass inline JSON. By default, CLI v2 expects binary parameters such as --payload to be base64-encoded, so plain JSON text is rejected or decoded incorrectly. This flag tells the CLI to send the JSON exactly as written.
- The output (the return value from your function) is saved to output.json.
This simulates an event triggering your Lambda and allows you to see the response.
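The same test invocation can be made from Python. This sketch assumes boto3; because the SDK takes the payload as bytes, no --cli-binary-format setting is involved. The invoke_function helper is hypothetical, and the optional client parameter exists only so it can be tried with a stand-in.

```python
import json


def invoke_function(function_name, event, client=None):
    """Invoke a Lambda function synchronously and return its decoded JSON result."""
    if client is None:
        import boto3  # requires boto3 and configured AWS credentials
        client = boto3.client("lambda")

    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result, as the CLI does by default
        Payload=json.dumps(event).encode("utf-8"),
    )
    # Payload is a stream of bytes; read it and parse the handler's JSON return value.
    result = json.loads(response["Payload"].read())

    # FunctionError is present in the response when the handler raised an exception.
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda returned an error: {result}")
    return result
```

For example, invoke_function("my-first-lambda", {"name": "Alice"}) returns whatever your handler returned for that event.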
Every time your Lambda function runs, AWS automatically captures anything you print or log and sends it to CloudWatch Logs. This is useful for debugging and monitoring your function.
To view your logs in the AWS Console:
- Open the CloudWatch service.
- Go to Log groups.
- Find the log group named /aws/lambda/my-first-lambda.
- Click the most recent log stream to see the output from your function, including any print statements.
If you prefer to use the AWS CLI, you can retrieve the latest log stream name with:
Copy the logStreamName from the output, then fetch the log events with:
Replace <log-stream-name> with the value you copied. This will display the logs generated by your Lambda function, including the greeting message you printed.
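The two CLI steps can also be combined in Python with the CloudWatch Logs client. This sketch assumes boto3 and the log group name used in this lesson; the helper name is hypothetical. It reads only the first page of events, so a very long stream would need the response's nextForwardToken to page further.

```python
LOG_GROUP = "/aws/lambda/my-first-lambda"


def latest_log_messages(log_group=LOG_GROUP, client=None):
    """Return the messages from the most recently active log stream."""
    if client is None:
        import boto3  # requires boto3 and configured AWS credentials
        client = boto3.client("logs")

    # Step 1: find the stream with the newest event.
    streams = client.describe_log_streams(
        logGroupName=log_group,
        orderBy="LastEventTime",
        descending=True,
        limit=1,
    )["logStreams"]
    if not streams:
        return []

    # Step 2: fetch that stream's events, oldest first (first page only).
    events = client.get_log_events(
        logGroupName=log_group,
        logStreamName=streams[0]["logStreamName"],
        startFromHead=True,
    )["events"]
    return [event["message"] for event in events]
```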
By following these steps, you have written, packaged, deployed, and tested a Lambda function, and learned how to view its output in CloudWatch Logs. This workflow is the foundation for building and monitoring automated data pipelines on AWS.
The examples in this course use print(...) because it keeps the code short and makes CloudWatch output easy to follow while learning. In production, prefer Python's logging module so you can control log levels and move toward structured logs more easily. As the course becomes more event-driven, keep two additional habits in mind: first, design S3-triggered workflows to be idempotent because duplicate events can happen; second, add lightweight retry and backoff around AWS SDK calls when you integrate with downstream services such as Glue.
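The logging and retry habits can be combined in a small helper. This is a sketch, not a library API: call_with_retries is hypothetical, and it uses exponential backoff with "full jitter" (a random wait between zero and the current cap). It relies on the Lambda Python runtime already attaching a handler to the root logger, so setting the level is usually enough for messages to reach CloudWatch.

```python
import logging
import random
import time

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def call_with_retries(operation, max_attempts=4, base_delay_s=0.5, max_delay_s=8.0,
                      retryable=(Exception,), sleep=time.sleep):
    """Call operation(); on a retryable error, back off and try again.

    The cap grows as base_delay_s * 2**attempt (limited to max_delay_s), and the
    actual wait is random below that cap so concurrent functions don't retry in
    lockstep. In real code, narrow `retryable` to transient errors only.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable as exc:
            if attempt == max_attempts - 1:
                logger.error("Giving up after %d attempts: %s", max_attempts, exc)
                raise
            delay = random.uniform(0, min(max_delay_s, base_delay_s * 2 ** attempt))
            logger.warning("Attempt %d failed (%s); retrying in %.2fs", attempt + 1, exc, delay)
            sleep(delay)
```

With boto3 you might wrap a Glue call as call_with_retries(lambda: glue.start_job_run(JobName="example-job")), where the job name is hypothetical. boto3 also has configurable built-in retries, so a helper like this is most useful for errors you decide to retry yourself.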
In this lesson, you wrote a simple Lambda function in Python, deployed it to AWS Lambda using the AWS CLI, and invoked the function. You learned how to view its output in CloudWatch Logs, both through the AWS Console and the CLI. All logs from your function are automatically sent to CloudWatch without any extra code needed for this integration. You are now ready to automate data tasks and monitor your Lambda functions on AWS.
