Installing Python dependencies
A Python dependency is any package or distribution that is not included in the Apache Airflow base install for your Apache Airflow version on your Amazon Managed Workflows for Apache Airflow environment.
This topic describes the steps to install Apache Airflow Python dependencies on your Amazon MWAA environment using a requirements.txt file in your Amazon S3 bucket.
Prerequisites
You'll need the following before you can complete the steps on this page.
- Permissions — Your AWS account must have been granted access by your administrator to the AmazonMWAAFullConsoleAccess access control policy for your environment. In addition, your Amazon MWAA environment must be permitted by your execution role to access the AWS resources used by your environment.
- Access — If you require access to public repositories to install dependencies directly on the web server, your environment must be configured with public network web server access. For more information, see Apache Airflow access modes.
- Amazon S3 configuration — The Amazon S3 bucket used to store your DAGs, custom plugins in plugins.zip, and Python dependencies in requirements.txt must be configured with Public Access Blocked and Versioning Enabled. A CLI sketch for applying both settings follows this list.
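If you want to apply both bucket settings from the command line, the following is a minimal sketch using the AWS CLI (YOUR_S3_BUCKET_NAME is a placeholder):

    # Block all public access on the bucket.
    aws s3api put-public-access-block --bucket YOUR_S3_BUCKET_NAME \
        --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

    # Enable versioning so each requirements.txt upload gets a version ID.
    aws s3api put-bucket-versioning --bucket YOUR_S3_BUCKET_NAME \
        --versioning-configuration Status=Enabled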
How it works
On Amazon MWAA, you install all Python dependencies by uploading a requirements.txt file to your Amazon S3 bucket, then specifying the version of the file on the Amazon MWAA console each time you update the file. Amazon MWAA runs pip3 install -r requirements.txt to install the Python dependencies on the Apache Airflow scheduler and each of the workers.
To use Python dependencies on your environment, you must do three things:
- Create a requirements.txt file locally.
- Upload the local requirements.txt to your Amazon S3 bucket.
- Specify the version of this file in the Requirements file field on the Amazon MWAA console.
Note
If this is the first time you're creating and uploading a requirements.txt to your Amazon S3 bucket, you also need to specify the path to the file on the Amazon MWAA console. You only need to complete this step once.
Python dependencies overview
You can install Apache Airflow extras and other Python dependencies on your environment from the Python Package Index (PyPI.org), from Python wheels (.whl), or from a private PyPI/PEP-503 compliant repository.
Python dependencies location and size limits
The Apache Airflow Scheduler and the Workers look for the packages in the requirements.txt file, and the packages are installed on the environment at /usr/local/airflow/.local/bin.
- Size limit. We recommend a requirements.txt file that references libraries whose combined size is less than 1 GB. The more libraries Amazon MWAA needs to install, the longer the startup time on an environment. Although Amazon MWAA doesn't explicitly limit the size of installed libraries, if dependencies can't be installed within ten minutes, the Fargate service will time out and attempt to roll back the environment to a stable state. A way to estimate the combined size locally is sketched below.
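One way to estimate that combined size before you upload is to download (without installing) everything your requirements.txt resolves to and measure the result. A minimal sketch follows; note that local resolution can differ from the MWAA image if your Python version or platform differs, and the ./mwaa-deps directory name is arbitrary:

    # Download the packages requirements.txt resolves to into a scratch directory.
    pip3 download -r requirements.txt -d ./mwaa-deps

    # Report the combined size of the downloaded archives.
    du -sh ./mwaa-deps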
Creating a requirements.txt file
The following sections describe the steps we recommend for creating a requirements.txt file locally.
Step one: Test Python dependencies using the Amazon MWAA CLI utility
- The command line interface (CLI) utility replicates an Amazon Managed Workflows for Apache Airflow environment locally.
- The CLI builds a Docker container image locally that’s similar to an Amazon MWAA production image. This allows you to run a local Apache Airflow environment to develop and test DAGs, custom plugins, and dependencies before deploying to Amazon MWAA.
- To run the CLI, see the aws-mwaa-local-runner on GitHub. A sketch of a typical local test loop follows this list.
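As an illustration, a typical local test loop looks something like the following. This is a sketch that assumes the helper script and folder names used in the aws-mwaa-local-runner repository; check the repository's README for the commands your version supports.

    # Clone the local runner and build a local Apache Airflow image.
    git clone https://github.com/aws/aws-mwaa-local-runner.git
    cd aws-mwaa-local-runner
    ./mwaa-local-env build-image

    # Copy your requirements.txt into the runner's requirements folder,
    # then start a local environment that installs it on startup.
    cp /path/to/requirements.txt ./requirements/requirements.txt
    ./mwaa-local-env start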
Step two: Create the requirements.txt
The following section describes how to specify Python dependencies from the Python Package Index in a requirements.txt file.
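As an example, a small requirements.txt that pins exact versions might look like the following. The package names, the extra, and the version pins here are illustrative, not prescriptive; pinning exact versions is recommended so installs stay reproducible across environment updates.

    # Example requirements.txt -- illustrative packages and version pins.
    apache-airflow[ssh]==2.5.1
    boto3==1.26.90
    paramiko==2.12.0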
Uploading requirements.txt to Amazon S3
You can use the Amazon S3 console or the AWS Command Line Interface (AWS CLI) to upload a requirements.txt file to your Amazon S3 bucket.
Using the AWS CLI
The AWS Command Line Interface (AWS CLI) is an open source tool that enables you to interact with AWS services using commands in your command-line shell. To complete the steps in this section, you need the AWS CLI installed and configured with credentials for your account.
To upload using the AWS CLI
- Use the following command to list all of your Amazon S3 buckets.
    aws s3 ls
- Use the following command to list the files and folders in the Amazon S3 bucket for your environment.
    aws s3 ls s3://YOUR_S3_BUCKET_NAME
- The following command uploads a requirements.txt file to an Amazon S3 bucket.
    aws s3 cp requirements.txt s3://YOUR_S3_BUCKET_NAME/requirements.txt
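Because Versioning is enabled on the bucket, each upload creates a new object version, and it's a version of this file that you'll select on the Amazon MWAA console. If you want to confirm the latest version ID from the command line, one option is the following sketch (YOUR_S3_BUCKET_NAME is a placeholder):

    # Print the version ID of the most recent requirements.txt upload.
    aws s3api list-object-versions --bucket YOUR_S3_BUCKET_NAME \
        --prefix requirements.txt \
        --query 'Versions[?IsLatest].[VersionId]' --output text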
Using the Amazon S3 console
The Amazon S3 console is a web-based user interface that allows you to create and manage the resources in your Amazon S3 bucket.
To upload using the Amazon S3 console
- Open the Environments page on the Amazon MWAA console.
- Choose an environment.
- Select the S3 bucket link in the DAG code in S3 pane to open your storage bucket on the Amazon S3 console.
- Choose Upload.
- Choose Add file.
- Select the local copy of your requirements.txt, then choose Upload.
Installing Python dependencies on your environment
This section describes how to install the dependencies you uploaded to your Amazon S3 bucket by specifying the path to the requirements.txt file, and specifying the version of the requirements.txt file each time it's updated.
Specifying the path to requirements.txt on the Amazon MWAA console (the first time)
If this is the first time you're creating and uploading a requirements.txt to your Amazon S3 bucket, you also need to specify the path to the file on the Amazon MWAA console. You only need to complete this step once.
- Open the Environments page on the Amazon MWAA console.
- Choose an environment.
- Choose Edit.
- On the DAG code in Amazon S3 pane, choose Browse S3 next to the Requirements file - optional field.
- Select the requirements.txt file on your Amazon S3 bucket.
- Choose Choose.
- Choose Next, Update environment.
You can begin using the new packages immediately after your environment finishes updating.
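This first-time association can also be made with the AWS CLI instead of the console; a minimal sketch, where MyAirflowEnvironment is a placeholder for your environment name:

    # Point the environment at the requirements.txt file in its S3 bucket.
    aws mwaa update-environment --name MyAirflowEnvironment \
        --requirements-s3-path requirements.txt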
Specifying the requirements.txt version on the Amazon MWAA console
You need to specify the version of your requirements.txt file on the Amazon MWAA console each time you upload a new version of your requirements.txt in your Amazon S3 bucket.
- Open the Environments page on the Amazon MWAA console.
- Choose an environment.
- Choose Edit.
- On the DAG code in Amazon S3 pane, choose a requirements.txt version in the dropdown list.
- Choose Next, Update environment.
You can begin using the new packages immediately after your environment finishes updating.
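If you script your deployments, the same version update can be made with the AWS CLI; a sketch, where the version ID is the one Amazon S3 assigned to your latest upload (for example, the value returned by the list-object-versions command shown earlier):

    # Pin the environment to a specific object version of requirements.txt.
    aws mwaa update-environment --name MyAirflowEnvironment \
        --requirements-s3-path requirements.txt \
        --requirements-s3-object-version EXAMPLE_VERSION_ID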
Viewing logs for your requirements.txt
You can view Apache Airflow logs for the Scheduler scheduling your workflows and parsing your dags folder. The following steps describe how to open the log group for the Scheduler on the Amazon MWAA console, and view Apache Airflow logs on the CloudWatch Logs console.
To view logs for a requirements.txt
- Open the Environments page on the Amazon MWAA console.
- Choose an environment.
- Choose the Airflow scheduler log group on the Monitoring pane.
- Choose the requirements_install_ip log in Log streams.
You should see the list of packages that were installed on the environment at
/usr/local/airflow/.local/bin
. For example:Collecting appdirs==1.4.4 (from -r /usr/local/airflow/.local/bin (line 1)) Downloading https://files.pythonhosted.org/packages/3b/00/2344469e2084fb28kjdsfiuyweb47389789vxbmnbjhsdgf5463acd6cf5e3db69324/appdirs-1.4.4-py2.py3-none-any.whl Collecting astroid==2.4.2 (from -r /usr/local/airflow/.local/bin (line 2))
- Review the list of packages and whether any of these encountered an error during installation. If something went wrong, you may see an error similar to the following:
    2021-03-05T14:34:42.731-07:00 No matching distribution found for LibraryName==1.0.0 (from -r /usr/local/airflow/.local/bin (line 4))
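You can also scan the install log stream from the command line; a sketch using CloudWatch Logs, assuming the scheduler log group follows the airflow-YOUR_ENVIRONMENT_NAME-Scheduler naming pattern (confirm the exact group name on the Monitoring pane):

    # Search the requirements install stream for failed package resolutions.
    aws logs filter-log-events \
        --log-group-name airflow-YOUR_ENVIRONMENT_NAME-Scheduler \
        --log-stream-name-prefix requirements_install \
        --filter-pattern '"No matching distribution"'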
What's next?
- Test your DAGs, custom plugins, and Python dependencies locally using the aws-mwaa-local-runner on GitHub.