Amazon S3

The following are the requirements and connection instructions for using Amazon Simple Storage Service (Amazon S3) with Amazon AppFlow.

Note

You can use Amazon S3 as a source or a destination.

Requirements

  • Your S3 buckets must be in the same AWS Region as your console and flow.

  • If you use Amazon S3 as the data source, you must place your source files inside a folder in your S3 bucket.

  • If your source files are in CSV format, each file must have a header row. The header row is a series of field names separated by commas. (For an example of these source requirements, see the sketch after this list.)

  • Each source file must not exceed 125 MB. However, you can upload multiple CSV or JSONL files to the source location, and Amazon AppFlow reads all of them to transfer data over a single flow run. For any applicable destination data transfer limits, see Quotas for Amazon AppFlow.

  • To prevent unauthorized access and potential security issues, Amazon AppFlow does not support cross-account access to S3 buckets.
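
As an illustration of these source requirements, the following is a minimal sketch that uploads a small CSV file, including a header row, into a folder (prefix) of a source bucket with the AWS SDK for Python (Boto3). The bucket and folder names are placeholders, not values from this guide.

import boto3

# Placeholder names; replace with your own bucket and folder (prefix).
BUCKET = "my-appflow-source-bucket"   # must be in the same AWS Region as the flow
PREFIX = "appflow-source/"            # source files must sit inside a folder

# A CSV source file needs a header row: field names separated by commas.
csv_body = "account_id,account_name,created_date\n001,Example Corp,2024-01-15\n"

s3 = boto3.client("s3")
s3.put_object(
    Bucket=BUCKET,
    Key=f"{PREFIX}accounts.csv",      # each file must stay under 125 MB
    Body=csv_body.encode("utf-8"),
)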

Connection instructions

To use Amazon S3 as a source or destination while creating a flow
  1. Sign in to the AWS Management Console and open the Amazon AppFlow console at https://console.aws.amazon.com/appflow/.

  2. Choose Create flow.

  3. For Flow details, enter a name and description for the flow.

  4. (Optional) To use a customer managed CMK instead of the default AWS managed CMK, choose Data encryption, Customize encryption settings and then choose an existing CMK or create a new one.

  5. (Optional) To add a tag, choose Tags, Add tag and then enter the key name and value.

  6. Choose Next.

  7. Choose Amazon S3 from the Source name or Destination name dropdown list.

  8. Under Bucket details, select the S3 bucket that you're retrieving data from or adding data to. You can specify a prefix, which is equivalent to a folder within the S3 bucket: the folder that contains your source files, or the folder where records are written at the destination.

[Image: The Bucket details form, with fields for choosing an S3 bucket and entering a bucket prefix.]

Now that you are connected to your S3 bucket, you can continue with the flow creation steps as described in Creating flows in Amazon AppFlow.
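
If you prefer to create the flow programmatically, the console steps above correspond roughly to a single CreateFlow API call. The following is a minimal sketch that uses Boto3; the flow name, bucket names, and prefixes are placeholders, and the single map-all task is a simplification of the field mapping that you would normally configure. Confirm the exact parameter shapes against the Boto3 Amazon AppFlow documentation.

import boto3

appflow = boto3.client("appflow")

response = appflow.create_flow(
    flowName="s3-to-s3-example",                     # placeholder flow name
    description="Copy records from one S3 folder to another",
    # kmsArn="arn:aws:kms:...",                      # optional: customer managed CMK (step 4)
    triggerConfig={"triggerType": "OnDemand"},       # run the flow on demand
    sourceFlowConfig={
        "connectorType": "S3",
        "sourceConnectorProperties": {
            "S3": {
                "bucketName": "my-appflow-source-bucket",
                "bucketPrefix": "appflow-source",    # folder that holds the source files
                "s3InputFormatConfig": {"s3InputFileType": "CSV"},
            }
        },
    },
    destinationFlowConfigList=[
        {
            "connectorType": "S3",
            "destinationConnectorProperties": {
                "S3": {
                    "bucketName": "my-appflow-destination-bucket",
                    "bucketPrefix": "appflow-output",
                }
            },
        }
    ],
    # A single map-all task; real flows usually map and transform individual fields.
    tasks=[{"taskType": "Map_all", "sourceFields": [], "taskProperties": {}}],
)
print(response["flowArn"], response["flowStatus"])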

Tip

If you aren’t connected successfully, ensure that you have followed the instructions in the Requirements section above.
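
One quick way to check the Region requirement is to compare the bucket's Region with the Region of your current AWS session, for example with the following sketch (the bucket name is a placeholder):

import boto3

session = boto3.session.Session()
s3 = session.client("s3")

# get_bucket_location returns a LocationConstraint of None for buckets in us-east-1.
location = s3.get_bucket_location(Bucket="my-appflow-source-bucket")
bucket_region = location.get("LocationConstraint") or "us-east-1"

print(f"Bucket Region: {bucket_region}, session Region: {session.region_name}")
if bucket_region != session.region_name:
    print("Region mismatch: the bucket must be in the same Region as your console and flow.")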

Notes

  • When you use Amazon S3 as a source, you can run schedule-triggered flows at a maximum frequency of one flow run per minute.

  • When you use Amazon S3 as a destination, the following additional settings are available.

AWS Glue Data Catalog settings

Catalog the data that you transfer in the AWS Glue Data Catalog. When you catalog your data, you make it easier to discover and access with AWS analytics and machine learning services. For more information, see Cataloging the data output from an Amazon AppFlow flow.

Data format preference

  • For the input files, you can specify your preferred file format. The following options are currently available: CSV and JSONL.

  • For the exported records, you can specify your preferred file format. The following options are currently available: JSONL (default), CSV, or Apache Parquet.

Note

If you choose Parquet as the format for your destination file in Amazon S3, the option to aggregate all records into one file per flow run is not available. When you choose Parquet, Amazon AppFlow writes the output as strings and does not declare the data types that the source defines.

Filename preference

  • You can choose to add a timestamp to the filename.

  • The filename then ends with the file creation timestamp in YYYY-MM-DDThh:mm:ss format.

  • The creation timestamp is in UTC.

Partition and aggregation settings

Organize the data that you transfer into partitions and files of a specified size. These settings can help you optimize query performance for applications that access the data. For more information, see Partitioning and aggregating data output from an Amazon AppFlow flow.
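
For reference, these destination settings map onto the S3 destination properties and the metadata catalog configuration of the CreateFlow API. The following sketch shows how they might look in Boto3; the bucket, role, database, and prefix values are placeholders, and you should confirm the exact field names against the Boto3 Amazon AppFlow documentation.

# Illustrative S3 destination properties; pass this dict as an entry of
# destinationFlowConfigList in the create_flow call shown earlier.
s3_destination = {
    "connectorType": "S3",
    "destinationConnectorProperties": {
        "S3": {
            "bucketName": "my-appflow-destination-bucket",
            "bucketPrefix": "appflow-output",
            "s3OutputFormatConfig": {
                # "JSON" corresponds to the JSONL option; "CSV" and "PARQUET" are the alternatives.
                "fileType": "JSON",
                # Organize output into date/time-based folder paths (partitioning).
                "prefixConfig": {"prefixType": "PATH", "prefixFormat": "DAY"},
                # Aggregation into a single file per flow run is not available with Parquet.
                "aggregationConfig": {"aggregationType": "None"},
            },
        }
    },
}

# Illustrative AWS Glue Data Catalog settings; pass this dict as the
# metadataCatalogConfig parameter of create_flow.
glue_catalog_config = {
    "glueDataCatalog": {
        "roleArn": "arn:aws:iam::111122223333:role/appflow-glue-access",
        "databaseName": "appflow_db",
        "tablePrefix": "s3_flow",
    }
}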

Supported destinations

When you create a flow that uses Amazon S3 as the data source, you can set the destination to any of the following connectors:

  • Amazon Connect

  • Amazon Honeycode

  • Amazon Redshift

  • Amazon S3

  • Marketo

  • Salesforce

  • SAP OData

  • Snowflake

  • Upsolver

  • Zendesk

You can also set the destination to any custom connectors that you create with the Amazon AppFlow Custom Connector SDKs for Python or Java. You can download these SDKs from GitHub.
