Create a serverless file-processing app
One of the most common use cases for Lambda is to perform file processing tasks. For example, you might use a Lambda function to automatically create PDF files from HTML files or images, or to create thumbnails when a user uploads an image.
In this example, you create an app which automatically encrypts PDF files when they are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket. To implement this app, you create the following resources:
- An S3 bucket for users to upload PDF files to
- A Lambda function in Python which reads the uploaded file and creates an encrypted, password-protected version of it
- A second S3 bucket for Lambda to save the encrypted file in
You also create an AWS Identity and Access Management (IAM) policy to give your Lambda function permission to perform read and write operations on your S3 buckets.
Tip
If you’re brand new to Lambda, we recommend that you carry out the tutorial Create your first Lambda function before creating this example app.
You can deploy your app manually by creating and configuring resources with the AWS Management Console or the AWS Command Line Interface (AWS CLI). You can also deploy the app by using the AWS Serverless Application Model (AWS SAM). AWS SAM is an infrastructure as code (IaC) tool. With IaC, you don’t create resources manually, but define them in code and then deploy them automatically.
If you want to learn more about using Lambda with IaC before deploying this example app, see Using Lambda with infrastructure as code (IaC).
Prerequisites
Before you can create the example app, make sure you have the required command line tools installed.
- AWS CLI - You can manually deploy the resources for your app using either the AWS Management Console or the AWS CLI. To use the CLI, install it by following the installation instructions in the AWS Command Line Interface User Guide.
- AWS SAM CLI - If you want to deploy the example app using AWS SAM, you need to install both the AWS CLI and the AWS SAM CLI. To install the AWS SAM CLI, follow the installation instructions in the AWS SAM User Guide.
- pytest module - After you've deployed your app, you can test it using an automated Python test script that we provide. To use this script, install the pytest package in your local development environment by running the following command:

  ```
  pip install pytest
  ```

To deploy the app using AWS SAM, Docker must also be installed on your build machine.
Downloading the example app files
To create and test the example app, you create the following files in your project directory:
- lambda_function.py - the Python function code for the Lambda function that performs the file encryption
- requirements.txt - a manifest file defining the dependencies that your Python function code requires
- template.yaml - an AWS SAM template you can use to deploy the app
- test_pdf_encrypt.py - a test script you can use to automatically test your application
- pytest.ini - a configuration file for the test script
The following sections show the code and explain the role of each file in creating and testing your app. To create the files on your local machine, either copy and paste the code below, or download the files from the aws-lambda-developer-guide GitHub repo.

Copy and paste the following code into a file named lambda_function.py.
```python
from pypdf import PdfReader, PdfWriter
import uuid
import os
from urllib.parse import unquote_plus
import boto3

# Create the S3 client to download and upload objects from S3
s3_client = boto3.client('s3')

def lambda_handler(event, context):
    # Iterate over the S3 event object and get the key for all uploaded files
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])  # Decode the S3 object key to remove any URL-encoded characters
        download_path = f'/tmp/{uuid.uuid4()}.pdf'  # Create a path in the Lambda tmp directory to save the file to
        upload_path = f'/tmp/converted-{uuid.uuid4()}.pdf'  # Create another path to save the encrypted file to

        # If the file is a PDF, encrypt it and upload it to the destination S3 bucket
        if key.lower().endswith('.pdf'):
            s3_client.download_file(bucket, key, download_path)
            encrypt_pdf(download_path, upload_path)
            encrypted_key = add_encrypted_suffix(key)
            s3_client.upload_file(upload_path, f'{bucket}-encrypted', encrypted_key)

# Define the function to encrypt the PDF file with a password
def encrypt_pdf(file_path, encrypted_file_path):
    reader = PdfReader(file_path)
    writer = PdfWriter()

    for page in reader.pages:
        writer.add_page(page)

    # Add a password to the new PDF
    writer.encrypt("my-secret-password")

    # Save the new PDF to a file
    with open(encrypted_file_path, "wb") as file:
        writer.write(file)

# Define a function to add a suffix to the original filename after encryption
def add_encrypted_suffix(original_key):
    filename, extension = original_key.rsplit('.', 1)
    return f'{filename}_encrypted.{extension}'
```
Note
In this example code, a password for the encrypted file (my-secret-password) is hardcoded into the function code. In a production application, don't include sensitive information like passwords in your function code. Use AWS Secrets Manager to securely store sensitive parameters.
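As a minimal sketch of that approach, your function could fetch the password at runtime instead of hardcoding it. The secret name pdf-encryption-password is a hypothetical placeholder, and this assumes you've created the secret as a plaintext string and granted the function's execution role the secretsmanager:GetSecretValue permission:

```python
import boto3

# Client for retrieving secrets at runtime
secrets_client = boto3.client('secretsmanager')

def get_pdf_password():
    # 'pdf-encryption-password' is a hypothetical secret name - create it in
    # Secrets Manager first and allow secretsmanager:GetSecretValue in the
    # function's execution role
    response = secrets_client.get_secret_value(SecretId='pdf-encryption-password')
    return response['SecretString']
```

You could then call get_pdf_password() inside encrypt_pdf in place of the hardcoded string passed to writer.encrypt().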
The Python function code contains three functions - the handler function that Lambda runs when your function is invoked, and two separate functions named add_encrypted_suffix and encrypt_pdf that the handler calls to perform the PDF encryption.
When your function is invoked by Amazon S3, Lambda passes a JSON formatted event argument to the function that contains details about the event that caused the invocation. In this case, the information includes the name of the S3 bucket and the object keys for the uploaded files. To learn more about the format of the event object for Amazon S3, see Process Amazon S3 event notifications with Lambda.
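As a rough illustration, the parts of the event object that this function reads have a shape like the following (abbreviated here; a real S3 event notification contains many more fields):

```json
{
    "Records": [
        {
            "s3": {
                "bucket": { "name": "EXAMPLE-BUCKET" },
                "object": { "key": "my-file.pdf" }
            }
        }
    ]
}
```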
Your function then uses the AWS SDK for Python (Boto3) to download the PDF files specified in the event object to its local temporary storage directory, before encrypting them using the pypdf package. Finally, the function uses the Boto3 SDK to store the encrypted file in your S3 destination bucket.
Copy and paste the following code into a file named requirements.txt.

```
boto3
pypdf
```
For this example, your function code has only two dependencies that aren't part of the standard Python library - the SDK for Python (Boto3) and the pypdf package the function uses to perform the PDF encryption.
Note
A version of the SDK for Python (Boto3) is included as part of the Lambda runtime, so your code would run without adding Boto3 to your function's deployment package. However, to maintain full control of your function's dependencies and avoid possible issues with version misalignment, best practice for Python is to include all function dependencies in your function's deployment package. See Runtime dependencies in Python to learn more.
Copy and paste the following code into a file named template.yaml.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  EncryptPDFFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: EncryptPDF
      Architectures: [x86_64]
      CodeUri: ./
      Handler: lambda_function.lambda_handler
      Runtime: python3.12
      Timeout: 15
      MemorySize: 256
      LoggingConfig:
        LogFormat: JSON
      Policies:
        - AmazonS3FullAccess
      Events:
        S3Event:
          Type: S3
          Properties:
            Bucket: !Ref PDFSourceBucket
            Events: s3:ObjectCreated:*
  PDFSourceBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: EXAMPLE-BUCKET
  EncryptedPDFBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: EXAMPLE-BUCKET-encrypted
```
The AWS SAM template defines the resources you create for your app. In this example, the template defines a Lambda function using the AWS::Serverless::Function type and two S3 buckets using the AWS::S3::Bucket type. The bucket names specified in the template are placeholders. Before you deploy the app using AWS SAM, you need to edit the template to rename the buckets with globally unique names that meet the S3 bucket naming rules. This step is explained further in Deploy the resources using AWS SAM.
The definition of the Lambda function resource configures a trigger for the function using the S3Event event property. This trigger causes your function to be invoked whenever an object is created in your source bucket.

The function definition also specifies an AWS Identity and Access Management (IAM) policy to be attached to the function's execution role. The AWS managed policy AmazonS3FullAccess gives your function the permissions it needs to read and write objects to Amazon S3.
Copy and paste the following code into a file named test_pdf_encrypt.py.

```python
import boto3
import json
import pytest
import time
import os

@pytest.fixture
def lambda_client():
    return boto3.client('lambda')

@pytest.fixture
def s3_client():
    return boto3.client('s3')

@pytest.fixture
def logs_client():
    return boto3.client('logs')

@pytest.fixture(scope='session')
def cleanup():
    # Create a new S3 client for cleanup
    s3_client = boto3.client('s3')

    yield
    # Cleanup code will be executed after all tests have finished

    # Delete test.pdf from the source bucket
    source_bucket = 'EXAMPLE-BUCKET'
    source_file_key = 'test.pdf'
    s3_client.delete_object(Bucket=source_bucket, Key=source_file_key)
    print(f"\nDeleted {source_file_key} from {source_bucket}")

    # Delete test_encrypted.pdf from the destination bucket
    destination_bucket = 'EXAMPLE-BUCKET-encrypted'
    destination_file_key = 'test_encrypted.pdf'
    s3_client.delete_object(Bucket=destination_bucket, Key=destination_file_key)
    print(f"Deleted {destination_file_key} from {destination_bucket}")

@pytest.mark.order(1)
def test_source_bucket_available(s3_client):
    s3_bucket_name = 'EXAMPLE-BUCKET'
    file_name = 'test.pdf'
    file_path = os.path.join(os.path.dirname(__file__), file_name)

    file_uploaded = False
    try:
        s3_client.upload_file(file_path, s3_bucket_name, file_name)
        file_uploaded = True
    except:
        print("Error: couldn't upload file")

    assert file_uploaded, "Could not upload file to S3 bucket"

@pytest.mark.order(2)
def test_lambda_invoked(logs_client):
    # Wait for a few seconds to make sure the logs are available
    time.sleep(5)

    # Get the latest log stream for the specified log group
    log_streams = logs_client.describe_log_streams(
        logGroupName='/aws/lambda/EncryptPDF',
        orderBy='LastEventTime',
        descending=True,
        limit=1
    )

    latest_log_stream_name = log_streams['logStreams'][0]['logStreamName']

    # Retrieve the log events from the latest log stream
    log_events = logs_client.get_log_events(
        logGroupName='/aws/lambda/EncryptPDF',
        logStreamName=latest_log_stream_name
    )

    success_found = False
    for event in log_events['events']:
        message = json.loads(event['message'])
        status = message.get('record', {}).get('status')
        if status == 'success':
            success_found = True
            break

    assert success_found, "Lambda function execution did not report 'success' status in logs."

@pytest.mark.order(3)
def test_encrypted_file_in_bucket(s3_client):
    # Specify the destination S3 bucket and the expected converted file key
    destination_bucket = 'EXAMPLE-BUCKET-encrypted'
    converted_file_key = 'test_encrypted.pdf'

    try:
        # Attempt to retrieve the metadata of the converted file from the destination S3 bucket
        s3_client.head_object(Bucket=destination_bucket, Key=converted_file_key)
    except s3_client.exceptions.ClientError as e:
        # If the file is not found, the test will fail
        pytest.fail(f"Converted file '{converted_file_key}' not found in the destination bucket: {str(e)}")

def test_cleanup(cleanup):
    # This test uses the cleanup fixture and will be executed last
    pass
```
The automated test script executes three test functions to confirm correct operation of your app:

- The test test_source_bucket_available confirms that your source bucket has been successfully created by uploading a test PDF file to the bucket.
- The test test_lambda_invoked interrogates the latest CloudWatch Logs log stream for your function to confirm that when you uploaded the test file, your Lambda function ran and reported success.
- The test test_encrypted_file_in_bucket confirms that your destination bucket contains the encrypted test_encrypted.pdf file.
After all these tests have run, the script runs an additional cleanup step to delete the test.pdf and test_encrypted.pdf files from both your source and destination buckets.

As with the AWS SAM template, the bucket names specified in this file are placeholders. Before running the test, you need to edit this file with your app's real bucket names. This step is explained further in Testing the app with the automated script.
Copy and paste the following code into a file named pytest.ini.

```ini
[pytest]
markers =
    order: specify test execution order
```
This is needed to specify the order in which the tests in the test_pdf_encrypt.py script run.
Deploying the app
You can create and deploy the resources for this example app either manually or by using AWS SAM. In a production environment, we recommend that you use an IaC tool like AWS SAM to quickly and repeatably deploy whole serverless applications without using manual processes.
For this example, follow the console or AWS CLI instructions to learn how to configure each AWS resource separately, or skip ahead to Deploy the resources using AWS SAM to quickly deploy the app using a few CLI commands.
Deploy the resources manually
To deploy your app manually, you carry out the following steps:
- Create source and destination Amazon S3 buckets
- Create a Lambda function that encrypts a PDF file and saves the encrypted version to an S3 bucket
- Configure a Lambda trigger that invokes your function when objects are uploaded to your source bucket
Follow the instructions in the following paragraphs to create and configure your resources.
Create two S3 buckets
First create two S3 buckets. The first bucket is the source bucket you will upload your PDF files to. The second bucket is used by Lambda to save the encrypted file when you invoke your function.
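If you're deploying with the AWS CLI, a minimal sketch of this step looks like the following. The names shown are placeholders - choose your own globally unique, lowercase names, and note that the destination bucket's name must be your source bucket's name with -encrypted appended, because the function code derives the destination bucket name that way:

```
aws s3 mb s3://EXAMPLE-BUCKET
aws s3 mb s3://EXAMPLE-BUCKET-encrypted
```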
Create an execution role (AWS CLI only)
An execution role is an IAM role that grants a Lambda function permission to access AWS services and resources. When you create a function using the Lambda console, Lambda automatically creates an execution role. You only need to create a role manually if you choose to deploy the app using the AWS CLI. To give your function read and write access to Amazon S3, you attach the AWS managed policy AmazonS3FullAccess.
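As a sketch of this step with the AWS CLI, you could create the role and attach the policy with commands like the following (the role name LambdaS3Role is a hypothetical placeholder):

```
aws iam create-role --role-name LambdaS3Role \
    --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'

aws iam attach-role-policy --role-name LambdaS3Role \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
```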
Create the function deployment package
To create your function, you create a deployment package containing your function code and its dependencies. For this application, your function code uses a separate library for the PDF encryption.
To create the deployment package
- Navigate to the project directory containing the lambda_function.py and requirements.txt files you created or downloaded from GitHub earlier and create a new directory named package.
- Install the dependencies specified in the requirements.txt file in your package directory by running the following command.

  ```
  pip install -r requirements.txt --target ./package/
  ```

- Create a .zip file containing your application code and its dependencies. In Linux or macOS, run the following commands from your command line interface.

  ```
  cd package
  zip -r ../lambda_function.zip .
  cd ..
  zip lambda_function.zip lambda_function.py
  ```

  In Windows, use your preferred zip tool to create the lambda_function.zip file. Make sure that your lambda_function.py file and the folders containing your dependencies are all at the root of the .zip file.
You can also create your deployment package using a Python virtual environment. See Working with .zip file archives for Python Lambda functions.
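As a rough sketch of the virtual environment approach on Linux or macOS (the site-packages path assumes Python 3.12; adjust it to your Python version):

```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cd venv/lib/python3.12/site-packages
zip -r ../../../../lambda_function.zip .
cd ../../../../
zip lambda_function.zip lambda_function.py
```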
Create the Lambda function
You now use the deployment package you created in the previous step to deploy your Lambda function.
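With the AWS CLI, a create command along these lines deploys the package. The role ARN is a placeholder - substitute the ARN of the execution role you created earlier. The timeout and memory settings shown match the values in the AWS SAM template:

```
aws lambda create-function --function-name EncryptPDF \
    --runtime python3.12 --handler lambda_function.lambda_handler \
    --zip-file fileb://lambda_function.zip \
    --role arn:aws:iam::123456789012:role/LambdaS3Role \
    --timeout 15 --memory-size 256
```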
Configure an Amazon S3 trigger to invoke the function
For your Lambda function to run when you upload a file to your source bucket, you need to configure a trigger for your function. You can configure the Amazon S3 trigger using either the console or the AWS CLI.
Important
This procedure configures the S3 bucket to invoke your function every time that an object is created in the bucket. Be sure to configure this only on the source bucket. If your Lambda function creates objects in the same bucket that invokes it, your function can be invoked continuously in a loop.
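With the AWS CLI, the trigger setup is a two-step sketch: first grant Amazon S3 permission to invoke the function, then add a notification configuration to the source bucket. The bucket name, account ID, and Region shown are placeholders:

```
aws lambda add-permission --function-name EncryptPDF \
    --statement-id s3-invoke --action lambda:InvokeFunction \
    --principal s3.amazonaws.com --source-arn arn:aws:s3:::EXAMPLE-BUCKET

aws s3api put-bucket-notification-configuration --bucket EXAMPLE-BUCKET \
    --notification-configuration '{"LambdaFunctionConfigurations": [{"LambdaFunctionArn": "arn:aws:lambda:us-west-2:123456789012:function:EncryptPDF", "Events": ["s3:ObjectCreated:*"]}]}'
```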
Deploy the resources using AWS SAM
To deploy the example app using the AWS SAM CLI, carry out the following steps.
Make sure that you have installed the latest version of the AWS SAM CLI and that Docker is installed and running on your build machine.
- Edit the template.yaml file to specify the name of your S3 buckets. S3 buckets must have globally unique names that meet the S3 bucket naming rules. Replace the bucket name EXAMPLE-BUCKET with a name of your choice consisting of lowercase letters, numbers, dots (.), and hyphens (-). For the destination bucket, replace EXAMPLE-BUCKET-encrypted with <source-bucket>-encrypted, where <source-bucket> is the name you chose for your source bucket.
is the name you chose for your source bucket. -
Run the following command from the directory in which you saved your
template.yaml
,lambda_function.py
, andrequirements.txt
files.sam build --use-container
This command gathers the build artifacts for your application and places them in the proper format and location to deploy them. Specifying the
--use-container
option builds your function inside a Lambda-like Docker container. We use it here so you don't need to have Python 3.12 installed on your local machine for the build to work.During the build process, AWS SAM looks for the Lambda function code in the location you specified with the
CodeUri
property in the template. In this case, we specified the current directory as the location (./
).If a
requirements.txt
file is present, AWS SAM uses it to gather the specified dependencies. By default, AWS SAM creates a .zip deployment package with your function code and dependencies. You can also choose to deploy your function as a container image using the PackageType property. -
To deploy your application and create the Lambda and Amazon S3 resources specified in your AWS SAM template, run the following command.
sam deploy --guided
Using the
--guided
flag means that AWS SAM will show you prompts to guide you through the deployment process. For this deployment, accept the default options by pressing Enter.
During the deployment process, AWS SAM creates the following resources in your AWS account:
- An AWS CloudFormation stack named sam-app
- A Lambda function with the name EncryptPDF
- Two S3 buckets with the names you chose when you edited the template.yaml AWS SAM template file
- An IAM execution role for your function with the name format sam-app-EncryptPDFFunctionRole-2qGaapHFWOQ8
When AWS SAM finishes creating your resources, you should see the following message:
```
Successfully created/updated stack - sam-app in us-west-2
```
Testing the app
To test your app, you upload a PDF file to your source bucket, and confirm that Lambda creates an encrypted version of the file in your destination bucket. In this example, you can test this manually using the console or the AWS CLI, or by using the provided test script.
For production applications, you can use traditional test methods and techniques, such as unit testing, to confirm the correct functioning of your Lambda function code. Best practice is also to conduct tests like those in the provided test script which perform integration testing with real, cloud-based resources. Integration testing in the cloud confirms that your infrastructure has been correctly deployed and that events flow between different services as expected. To learn more, see How to test serverless functions and applications.
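For example, a minimal unit test for the add_encrypted_suffix helper from lambda_function.py might look like the following sketch. Running it assumes the module's imports (pypdf and boto3) are installed locally and that an AWS Region is configured, since importing the module creates a Boto3 client:

```python
# A hypothetical unit test for the filename helper in lambda_function.py
from lambda_function import add_encrypted_suffix

def test_add_encrypted_suffix():
    # The suffix is inserted before the file extension
    assert add_encrypted_suffix('report.pdf') == 'report_encrypted.pdf'
    assert add_encrypted_suffix('docs/q1.summary.pdf') == 'docs/q1.summary_encrypted.pdf'
```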
Testing the app manually
You can test your function manually by adding a PDF file to your Amazon S3 source bucket. When you add your file to the source bucket, your Lambda function should be automatically invoked and should store an encrypted version of the file in your target bucket.
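For example, with the AWS CLI (the bucket names are placeholders for your own):

```
aws s3 cp test.pdf s3://EXAMPLE-BUCKET/
aws s3 ls s3://EXAMPLE-BUCKET-encrypted/
```

After a few seconds, you should see test_encrypted.pdf in the destination bucket listing.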
Testing the app with the automated script
To test your app using the provided test script, first ensure that the pytest
module is installed in your local environment. You can install
pytest
by running the following command:
pip install pytest
You also need to edit the code in the test_pdf_encrypt.py file to replace the placeholder bucket names with the names of your Amazon S3 source and destination buckets. Make the following changes to test_pdf_encrypt.py:
- In the test_source_bucket_available function, replace EXAMPLE-BUCKET with the name of your source bucket.
- In the test_encrypted_file_in_bucket function, replace EXAMPLE-BUCKET-encrypted with <source-bucket>-encrypted, where <source-bucket> is the name of your source bucket.
- In the cleanup function, replace EXAMPLE-BUCKET with the name of your source bucket, and replace EXAMPLE-BUCKET-encrypted with <source-bucket>-encrypted, where <source-bucket> is the name of your source bucket.
To run the tests, do the following:

- Save a PDF file named test.pdf in the directory containing the test_pdf_encrypt.py and pytest.ini files.
- Open a terminal or shell program and run the following command from the directory containing the test files.

  ```
  pytest -s -v
  ```
When the test completes, you should see output like the following:
```
============================================================== test session starts =========================================================
platform linux -- Python 3.12.2, pytest-7.2.2, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/pdf_encrypt_app/.hypothesis/examples')
Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket=<bucket_type>
rootdir: /home/pdf_encrypt_app, configfile: pytest.ini
plugins: anyio-3.7.1, hypothesis-6.70.0, localserver-0.7.1, random-order-1.1.0
collected 4 items

test_pdf_encrypt.py::test_source_bucket_available PASSED
test_pdf_encrypt.py::test_lambda_invoked PASSED
test_pdf_encrypt.py::test_encrypted_file_in_bucket PASSED
test_pdf_encrypt.py::test_cleanup PASSED
Deleted test.pdf from EXAMPLE-BUCKET
Deleted test_encrypted.pdf from EXAMPLE-BUCKET-encrypted

=============================================================== 4 passed in 7.32s ==========================================================
```
Next steps
Now that you've created this example app, you can use the provided code as a basis to create other types of file-processing application. Modify the code in the lambda_function.py file to implement the file-processing logic for your use case.
Many typical file-processing use cases involve image processing. When using Python, the most popular image-processing libraries like pillow typically contain C or C++ components that must be compiled for the target platform. When deploying your resources with AWS SAM, you need to take some extra steps to include the right source distribution in your deployment package. Because AWS SAM won't install dependencies for a different platform than your build machine, specifying the correct source distribution (.whl file) in your requirements.txt file won't work if your build machine uses an operating system or architecture that's different from the Lambda execution environment. Instead, you should do one of the following:
- Use the --use-container option when running sam build. When you specify this option, AWS SAM downloads a container base image that's compatible with the Lambda execution environment and builds your function's deployment package in a Docker container using that image. To learn more, see Building a Lambda function inside of a provided container.
- Build your function's .zip deployment package yourself using the correct source distribution binary and save the .zip file in the directory you specify as the CodeUri in the AWS SAM template. A sketch of this approach follows this list. To learn more about building .zip deployment packages for Python using binary distributions, see Creating a .zip deployment package with dependencies and Creating .zip deployment packages with native libraries.
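As an illustration of the second approach, pip can fetch prebuilt manylinux wheels that match the Lambda execution environment. The following is a sketch for an x86_64, Python 3.12 function; the manylinux2014 platform tag is an assumption that works for many packages, but check which wheels your dependencies actually publish:

```
pip install -r requirements.txt --target ./package \
    --platform manylinux2014_x86_64 \
    --implementation cp --python-version 3.12 \
    --only-binary=:all:
```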