Step 3: Configure security settings
- IAM role
-
The crawler assumes this role. It must have permissions similar to the AWS managed policy AWSGlueServiceRole. For Amazon S3 and DynamoDB sources, it must also have permissions to access the data store. If the crawler reads Amazon S3 data encrypted with AWS Key Management Service (AWS KMS), then the role must also have decrypt permissions on the AWS KMS key (a sample decrypt statement is sketched after the policy examples below).

For an Amazon S3 data store, additional permissions attached to the role would be similar to the following:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject" ], "Resource": [ "arn:aws:s3:::
bucket/object
*" ] } ] }For an Amazon DynamoDB data store, additional permissions attached to the role would be similar to the following:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "dynamodb:DescribeTable", "dynamodb:Scan" ], "Resource": [ "arn:aws:dynamodb:
region
:account-id
:table/table-name
*" ] } ] }In order to add your own JDBC driver, additional permissions need to be added.
To add your own JDBC driver, additional permissions need to be added:

- Grant permissions for the following job actions: CreateJob, DeleteJob, GetJob, GetJobRun, StartJobRun. A sample policy statement for these actions is sketched after this list.
- Grant permissions for the following Amazon S3 actions: s3:DeleteObject, s3:GetObject, s3:ListBucket, s3:PutObject.

  Note
  The s3:ListBucket permission is not needed if the Amazon S3 bucket policy is disabled.

- Grant the service principal access to the bucket/folder in the Amazon S3 policy. Example Amazon S3 policy:

  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Sid": "VisualEditor0",
              "Effect": "Allow",
              "Action": [
                  "s3:PutObject",
                  "s3:GetObject",
                  "s3:ListBucket",
                  "s3:DeleteObject"
              ],
              "Resource": [
                  "arn:aws:s3:::bucket-name/driver-parent-folder/driver.jar",
                  "arn:aws:s3:::bucket-name"
              ]
          }
      ]
  }
AWS Glue creates the following folders (_crawler and _glue_job_crawler) at the same level as the JDBC driver in your Amazon S3 bucket. For example, if the driver path is <s3-path/driver_folder/driver.jar>, then the following folders will be created if they do not already exist:

- <s3-path/driver_folder/_crawler>
- <s3-path/driver_folder/_glue_job_crawler>
Optionally, you can add a security configuration to a crawler to specify at-rest encryption options.
For more information, see Step 2: Create an IAM role for AWS Glue and Identity and access management for AWS Glue.
- Lake Formation configuration - optional
-
Allow the crawler to use Lake Formation credentials for crawling the data source.
Selecting Use Lake Formation credentials for crawling S3 data source allows the crawler to use Lake Formation credentials when crawling the data source. If the data source belongs to another account, you must provide the registered account ID. Otherwise, the crawler crawls only the data sources associated with the account. This setting applies only to Amazon S3 and Data Catalog data sources. The sketch after this paragraph shows the equivalent API fields.
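If you create the crawler through the AWS Glue CreateCrawler API instead of the console, these options correspond to the LakeFormationConfiguration fields. The fragment below is a minimal sketch; the account ID is a placeholder and is needed only for cross-account crawls:

"LakeFormationConfiguration": {
    "UseLakeFormationCredentials": true,
    "AccountId": "account-id"
}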
- Security configuration - optional
-
Settings include security configurations, which specify at-rest encryption options.
Note
Once a security configuration has been set on a crawler, you can change it, but you cannot remove it. To lower the level of security on a crawler, explicitly set the security feature to DISABLED within your configuration, or create a new crawler.
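For illustration, a security configuration like the one passed to the AWS Glue CreateSecurityConfiguration API might look like the following sketch; the configuration name and KMS key ARN are placeholders, and the DISABLED values show how individual encryption features are explicitly turned off:

{
    "Name": "crawler-security-configuration",
    "EncryptionConfiguration": {
        "S3Encryption": [
            {
                "S3EncryptionMode": "SSE-KMS",
                "KmsKeyArn": "arn:aws:kms:region:account-id:key/key-id"
            }
        ],
        "CloudWatchEncryption": {
            "CloudWatchEncryptionMode": "DISABLED"
        },
        "JobBookmarksEncryption": {
            "JobBookmarksEncryptionMode": "DISABLED"
        }
    }
}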