CreateCrawlerCommand

Creates a new crawler with the specified targets, role, configuration, and optional schedule. At least one crawl target must be specified in Targets, for example in the S3Targets, JdbcTargets, or DynamoDBTargets field.

Example Syntax

Use a bare-bones client and the command you need to make an API call.

import { GlueClient, CreateCrawlerCommand } from "@aws-sdk/client-glue"; // ES Modules import
// const { GlueClient, CreateCrawlerCommand } = require("@aws-sdk/client-glue"); // CommonJS import
const client = new GlueClient(config); // config: GlueClientConfig, e.g. { region: "us-east-1" }
const input = { // CreateCrawlerRequest
  Name: "STRING_VALUE", // required
  Role: "STRING_VALUE", // required
  DatabaseName: "STRING_VALUE",
  Description: "STRING_VALUE",
  Targets: { // CrawlerTargets
    S3Targets: [ // S3TargetList
      { // S3Target
        Path: "STRING_VALUE",
        Exclusions: [ // PathList
          "STRING_VALUE",
        ],
        ConnectionName: "STRING_VALUE",
        SampleSize: Number("int"),
        EventQueueArn: "STRING_VALUE",
        DlqEventQueueArn: "STRING_VALUE",
      },
    ],
    JdbcTargets: [ // JdbcTargetList
      { // JdbcTarget
        ConnectionName: "STRING_VALUE",
        Path: "STRING_VALUE",
        Exclusions: [
          "STRING_VALUE",
        ],
        EnableAdditionalMetadata: [ // EnableAdditionalMetadata
          "COMMENTS" || "RAWTYPES",
        ],
      },
    ],
    MongoDBTargets: [ // MongoDBTargetList
      { // MongoDBTarget
        ConnectionName: "STRING_VALUE",
        Path: "STRING_VALUE",
        ScanAll: true || false,
      },
    ],
    DynamoDBTargets: [ // DynamoDBTargetList
      { // DynamoDBTarget
        Path: "STRING_VALUE",
        scanAll: true || false,
        scanRate: Number("double"),
      },
    ],
    CatalogTargets: [ // CatalogTargetList
      { // CatalogTarget
        DatabaseName: "STRING_VALUE", // required
        Tables: [ // CatalogTablesList // required
          "STRING_VALUE",
        ],
        ConnectionName: "STRING_VALUE",
        EventQueueArn: "STRING_VALUE",
        DlqEventQueueArn: "STRING_VALUE",
      },
    ],
    DeltaTargets: [ // DeltaTargetList
      { // DeltaTarget
        DeltaTables: [
          "STRING_VALUE",
        ],
        ConnectionName: "STRING_VALUE",
        WriteManifest: true || false,
        CreateNativeDeltaTable: true || false,
      },
    ],
    IcebergTargets: [ // IcebergTargetList
      { // IcebergTarget
        Paths: [
          "STRING_VALUE",
        ],
        ConnectionName: "STRING_VALUE",
        Exclusions: [
          "STRING_VALUE",
        ],
        MaximumTraversalDepth: Number("int"),
      },
    ],
    HudiTargets: [ // HudiTargetList
      { // HudiTarget
        Paths: [ // PathList
          "STRING_VALUE",
        ],
        ConnectionName: "STRING_VALUE",
        Exclusions: [ // PathList
          "STRING_VALUE",
        ],
        MaximumTraversalDepth: Number("int"),
      },
    ],
  },
  Schedule: "STRING_VALUE",
  Classifiers: [ // ClassifierNameList
    "STRING_VALUE",
  ],
  TablePrefix: "STRING_VALUE",
  SchemaChangePolicy: { // SchemaChangePolicy
    UpdateBehavior: "LOG" || "UPDATE_IN_DATABASE",
    DeleteBehavior: "LOG" || "DELETE_FROM_DATABASE" || "DEPRECATE_IN_DATABASE",
  },
  RecrawlPolicy: { // RecrawlPolicy
    RecrawlBehavior: "CRAWL_EVERYTHING" || "CRAWL_NEW_FOLDERS_ONLY" || "CRAWL_EVENT_MODE",
  },
  LineageConfiguration: { // LineageConfiguration
    CrawlerLineageSettings: "ENABLE" || "DISABLE",
  },
  LakeFormationConfiguration: { // LakeFormationConfiguration
    UseLakeFormationCredentials: true || false,
    AccountId: "STRING_VALUE",
  },
  Configuration: "STRING_VALUE",
  CrawlerSecurityConfiguration: "STRING_VALUE",
  Tags: { // TagsMap
    "<keys>": "STRING_VALUE",
  },
};
const command = new CreateCrawlerCommand(input);
const response = await client.send(command);
// {} (CreateCrawlerResponse has no modeled fields; only response.$metadata is populated)
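Because at least one crawl target is required, a minimal client-side pre-check can catch an empty Targets object before the request is sent. The sketch below is an assumption, not part of the SDK: validateCreateCrawlerInput, hasCrawlTarget, and TARGET_LISTS are hypothetical helpers that mirror the CrawlerTargets fields shown in the example syntax, and the crawler name, role, database, and bucket are placeholder values. The service remains the authority on what is valid.

```javascript
// Target list fields of CrawlerTargets, as listed in the example syntax.
const TARGET_LISTS = [
  "S3Targets", "JdbcTargets", "MongoDBTargets", "DynamoDBTargets",
  "CatalogTargets", "DeltaTargets", "IcebergTargets", "HudiTargets",
];

// True if at least one target list in `targets` has an entry.
function hasCrawlTarget(targets) {
  return TARGET_LISTS.some(
    (key) => Array.isArray(targets?.[key]) && targets[key].length > 0,
  );
}

// Hypothetical pre-check: returns a list of problems (empty means OK to send).
// The service still performs the authoritative validation.
function validateCreateCrawlerInput(input) {
  const problems = [];
  if (!input.Name) problems.push("Name is required");
  if (!input.Role) problems.push("Role is required");
  if (!hasCrawlTarget(input.Targets)) problems.push("At least one crawl target is required");
  return problems;
}

// Hypothetical values: replace the role, bucket, and database with your own.
const input = {
  Name: "example-sales-crawler",
  Role: "AWSGlueServiceRole-example",
  DatabaseName: "example_db",
  Targets: { S3Targets: [{ Path: "s3://example-bucket/sales/" }] },
  Schedule: "cron(15 12 * * ? *)", // daily at 12:15 UTC; omit for an on-demand crawler
};

console.log(validateCreateCrawlerInput(input)); // []
```

When the returned list is empty, the object can be passed to new CreateCrawlerCommand(input) and sent with client.send(command), as in the example syntax.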

CreateCrawlerCommand Input

See CreateCrawlerCommandInput for more details.

Name (required): string | undefined
Name of the new crawler.

Role (required): string | undefined
The IAM role, or the Amazon Resource Name (ARN) of an IAM role, that the new crawler uses to access customer resources.

Targets (required): CrawlerTargets | undefined
A collection of targets to crawl.

Classifiers: string[] | undefined
A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.

Configuration: string | undefined
Crawler configuration information. This versioned JSON string lets you specify aspects of a crawler's behavior. For more information, see Setting crawler configuration options.

CrawlerSecurityConfiguration: string | undefined
The name of the SecurityConfiguration structure to be used by this crawler.

DatabaseName: string | undefined
The Glue database where results are written, such as arn:aws:daylight:us-east-1::database/sometable/*.

Description: string | undefined
A description of the new crawler.

LakeFormationConfiguration: LakeFormationConfiguration | undefined
Specifies Lake Formation configuration settings for the crawler.

LineageConfiguration: LineageConfiguration | undefined
Specifies data lineage configuration settings for the crawler.

RecrawlPolicy: RecrawlPolicy | undefined
A policy that specifies whether to crawl the entire dataset again (CRAWL_EVERYTHING), crawl only folders that were added since the last crawler run (CRAWL_NEW_FOLDERS_ONLY), or crawl only the changes identified by Amazon S3 events (CRAWL_EVENT_MODE).

Schedule: string | undefined
A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, specify cron(15 12 * * ? *).

SchemaChangePolicy: SchemaChangePolicy | undefined
The policy for the crawler's update and deletion behavior.

TablePrefix: string | undefined
The table prefix used for catalog tables that are created.

Tags: Record<string, string> | undefined
The tags to use with this crawler request. You may use tags to limit access to the crawler. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide.
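Schedule and Configuration are both plain strings that are easy to get subtly wrong by hand. The sketch below shows two hypothetical helpers, dailyCron and crawlerConfiguration, whose results could be assigned to those fields. The six-field cron layout follows the cron(15 12 * * ? *) example above; the Configuration keys (Version, CrawlerOutput.Partitions.AddOrUpdateBehavior) are an assumption based on the crawler configuration options guide and should be checked against it before use.

```javascript
// Hypothetical helper: daily Glue schedule in UTC, e.g. "cron(15 12 * * ? *)".
// Fields: minutes hours day-of-month month day-of-week year; "?" means
// "no specific value" for day-of-week because day-of-month is "*".
function dailyCron(hourUtc, minute) {
  if (!Number.isInteger(hourUtc) || hourUtc < 0 || hourUtc > 23) {
    throw new RangeError("hourUtc must be an integer from 0 to 23");
  }
  if (!Number.isInteger(minute) || minute < 0 || minute > 59) {
    throw new RangeError("minute must be an integer from 0 to 59");
  }
  return `cron(${minute} ${hourUtc} * * ? *)`;
}

// Configuration is a versioned JSON string; building an object and
// serializing it avoids hand-escaping quotes. The keys are an assumption
// based on the crawler configuration options guide.
function crawlerConfiguration() {
  return JSON.stringify({
    Version: 1.0, // serialized as 1, the same JSON number
    CrawlerOutput: {
      Partitions: { AddOrUpdateBehavior: "InheritFromTable" },
    },
  });
}

console.log(dailyCron(12, 15)); // cron(15 12 * * ? *)
console.log(crawlerConfiguration());
// {"Version":1,"CrawlerOutput":{"Partitions":{"AddOrUpdateBehavior":"InheritFromTable"}}}
```

Validating the hour and minute up front turns a malformed schedule into an immediate local error instead of a rejected request.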

CreateCrawlerCommand Output

$metadata (required): ResponseMetadata
Metadata pertaining to this request.
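Since CreateCrawler returns no modeled output fields, success is signaled by client.send resolving without an exception, and $metadata is the only content worth inspecting. The sketch below uses a hypothetical helper, summarizeCreateCrawlerResponse, to format the httpStatusCode, requestId, and attempts fields of ResponseMetadata for logging; the sample object is a stand-in, not real service output.

```javascript
// CreateCrawler has no modeled output, so the response holds only $metadata.
// Hypothetical helper that formats the fields most useful for logging.
function summarizeCreateCrawlerResponse(response) {
  const { httpStatusCode, requestId, attempts } = response?.$metadata ?? {};
  return `status=${httpStatusCode} requestId=${requestId} attempts=${attempts}`;
}

// Example with a stand-in response object (not real service output):
const sample = { $metadata: { httpStatusCode: 200, requestId: "example-request-id", attempts: 1 } };
console.log(summarizeCreateCrawlerResponse(sample));
// status=200 requestId=example-request-id attempts=1
```

Logging the request ID is mainly useful when a crawler later behaves unexpectedly and the original create call needs to be traced.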

Throws

AlreadyExistsException (client fault)
A resource to be created or added already exists.

InvalidInputException (client fault)
The input provided was not valid.

OperationTimeoutException (client fault)
The operation timed out.

ResourceNumberLimitExceededException (client fault)
A numerical resource limit was exceeded.

GlueServiceException
Base exception class for all service exceptions from the Glue service.
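Callers usually want to react differently to each exception listed above, for example treating AlreadyExistsException as "nothing to do". The sketch below switches on err.name, which SDK v3 service exceptions set to the modeled exception name; checking err instanceof AlreadyExistsException with the classes imported from "@aws-sdk/client-glue" is an equivalent approach. classifyCreateCrawlerError and its return values are hypothetical, and treating OperationTimeoutException as worth another attempt is an assumption, not documented behavior.

```javascript
// Hypothetical helper: maps the CreateCrawler exceptions listed above to a
// suggested next step. SDK v3 service exceptions expose the modeled
// exception name on `err.name`.
function classifyCreateCrawlerError(err) {
  switch (err?.name) {
    case "AlreadyExistsException":
      return "exists"; // a crawler with this Name exists; consider UpdateCrawlerCommand
    case "InvalidInputException":
      return "fix-input"; // correct the request before sending it again
    case "OperationTimeoutException":
      return "retry"; // assumption: worth another attempt after a delay
    case "ResourceNumberLimitExceededException":
      return "quota"; // delete unused crawlers or request a limit increase
    default:
      return "unknown"; // anything else, including non-Glue errors
  }
}

// Usage inside an async function:
// try {
//   await client.send(new CreateCrawlerCommand(input));
// } catch (err) {
//   if (classifyCreateCrawlerError(err) !== "exists") throw err;
// }
```

Switching on the name keeps the helper free of SDK imports, which makes it easy to unit test with plain objects.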