CreateCrawlerCommand
Creates a new crawler with specified targets, role, configuration, and optional schedule. At least one crawl target must be specified, in the S3Targets field, the JdbcTargets field, or the DynamoDBTargets field.
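The at-least-one-target rule can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: `hasCrawlTarget` and `REQUIRED_TARGET_KEYS` are hypothetical names, and the default check assumes that only the three target lists named above count toward the rule.

```javascript
// Hypothetical helper (not part of @aws-sdk/client-glue).
// Returns true when at least one of the given target lists is a non-empty array.
const REQUIRED_TARGET_KEYS = ["S3Targets", "JdbcTargets", "DynamoDBTargets"];

function hasCrawlTarget(targets, keys = REQUIRED_TARGET_KEYS) {
  if (targets === null || typeof targets !== "object") {
    return false;
  }
  return keys.some(
    (key) => Array.isArray(targets[key]) && targets[key].length > 0
  );
}

// The path below is a placeholder, not a real bucket.
console.log(hasCrawlTarget({ S3Targets: [{ Path: "s3://example-bucket/data/" }] }));
console.log(hasCrawlTarget({ S3Targets: [] }));
```

The `keys` parameter is there so that the check can be widened if other target types turn out to satisfy the rule for a given use case.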
Example Syntax
Use a bare-bones client and the command you need to make an API call.
import { GlueClient, CreateCrawlerCommand } from "@aws-sdk/client-glue"; // ES Modules import
// const { GlueClient, CreateCrawlerCommand } = require("@aws-sdk/client-glue"); // CommonJS import
const client = new GlueClient(config);
const input = { // CreateCrawlerRequest
  Name: "STRING_VALUE", // required
  Role: "STRING_VALUE", // required
  DatabaseName: "STRING_VALUE",
  Description: "STRING_VALUE",
  Targets: { // CrawlerTargets
    S3Targets: [ // S3TargetList
      { // S3Target
        Path: "STRING_VALUE",
        Exclusions: [ // PathList
          "STRING_VALUE",
        ],
        ConnectionName: "STRING_VALUE",
        SampleSize: Number("int"),
        EventQueueArn: "STRING_VALUE",
        DlqEventQueueArn: "STRING_VALUE",
      },
    ],
    JdbcTargets: [ // JdbcTargetList
      { // JdbcTarget
        ConnectionName: "STRING_VALUE",
        Path: "STRING_VALUE",
        Exclusions: [
          "STRING_VALUE",
        ],
        EnableAdditionalMetadata: [ // EnableAdditionalMetadata
          "COMMENTS" || "RAWTYPES",
        ],
      },
    ],
    MongoDBTargets: [ // MongoDBTargetList
      { // MongoDBTarget
        ConnectionName: "STRING_VALUE",
        Path: "STRING_VALUE",
        ScanAll: true || false,
      },
    ],
    DynamoDBTargets: [ // DynamoDBTargetList
      { // DynamoDBTarget
        Path: "STRING_VALUE",
        scanAll: true || false,
        scanRate: Number("double"),
      },
    ],
    CatalogTargets: [ // CatalogTargetList
      { // CatalogTarget
        DatabaseName: "STRING_VALUE", // required
        Tables: [ // CatalogTablesList // required
          "STRING_VALUE",
        ],
        ConnectionName: "STRING_VALUE",
        EventQueueArn: "STRING_VALUE",
        DlqEventQueueArn: "STRING_VALUE",
      },
    ],
    DeltaTargets: [ // DeltaTargetList
      { // DeltaTarget
        DeltaTables: [
          "STRING_VALUE",
        ],
        ConnectionName: "STRING_VALUE",
        WriteManifest: true || false,
        CreateNativeDeltaTable: true || false,
      },
    ],
    IcebergTargets: [ // IcebergTargetList
      { // IcebergTarget
        Paths: [
          "STRING_VALUE",
        ],
        ConnectionName: "STRING_VALUE",
        Exclusions: [
          "STRING_VALUE",
        ],
        MaximumTraversalDepth: Number("int"),
      },
    ],
    HudiTargets: [ // HudiTargetList
      { // HudiTarget
        Paths: "<PathList>",
        ConnectionName: "STRING_VALUE",
        Exclusions: "<PathList>",
        MaximumTraversalDepth: Number("int"),
      },
    ],
  },
  Schedule: "STRING_VALUE",
  Classifiers: [ // ClassifierNameList
    "STRING_VALUE",
  ],
  TablePrefix: "STRING_VALUE",
  SchemaChangePolicy: { // SchemaChangePolicy
    UpdateBehavior: "LOG" || "UPDATE_IN_DATABASE",
    DeleteBehavior: "LOG" || "DELETE_FROM_DATABASE" || "DEPRECATE_IN_DATABASE",
  },
  RecrawlPolicy: { // RecrawlPolicy
    RecrawlBehavior: "CRAWL_EVERYTHING" || "CRAWL_NEW_FOLDERS_ONLY" || "CRAWL_EVENT_MODE",
  },
  LineageConfiguration: { // LineageConfiguration
    CrawlerLineageSettings: "ENABLE" || "DISABLE",
  },
  LakeFormationConfiguration: { // LakeFormationConfiguration
    UseLakeFormationCredentials: true || false,
    AccountId: "STRING_VALUE",
  },
  Configuration: "STRING_VALUE",
  CrawlerSecurityConfiguration: "STRING_VALUE",
  Tags: { // TagsMap
    "<keys>": "STRING_VALUE",
  },
};
const command = new CreateCrawlerCommand(input);
const response = await client.send(command);
// {};
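The syntax listing shows every possible field, but a working request usually sets only a few of them. The sketch below builds a minimal input for a single S3 path. `buildS3CrawlerInput` is a hypothetical helper, and the crawler name, role ARN, bucket path, and database name are placeholder values rather than real resources.

```javascript
// Hypothetical helper: builds a minimal CreateCrawlerRequest for one S3 path.
// Name, Role, and Targets are always set; optional fields are copied only
// when the caller provides them.
function buildS3CrawlerInput({ name, role, s3Path, databaseName, tablePrefix, tags }) {
  if (!name || !role || !s3Path) {
    throw new TypeError("name, role, and s3Path are required");
  }
  const input = {
    Name: name,
    Role: role,
    Targets: { S3Targets: [{ Path: s3Path }] },
  };
  if (databaseName !== undefined) input.DatabaseName = databaseName;
  if (tablePrefix !== undefined) input.TablePrefix = tablePrefix;
  if (tags !== undefined) input.Tags = { ...tags };
  return input;
}

// Placeholder values for illustration only.
const exampleInput = buildS3CrawlerInput({
  name: "example-crawler",
  role: "arn:aws:iam::123456789012:role/ExampleGlueRole",
  s3Path: "s3://example-bucket/data/",
  databaseName: "example_db",
  tags: { team: "analytics" },
});
console.log(JSON.stringify(exampleInput, null, 2));
```

The resulting object would then be passed to the command in the same way as in the listing, that is `await client.send(new CreateCrawlerCommand(exampleInput))`.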
CreateCrawlerCommand Input
Parameter | Type | Description |
---|---|---|
Name Required | string | undefined | Name of the new crawler. |
Role Required | string | undefined | The IAM role or Amazon Resource Name (ARN) of an IAM role used by the new crawler to access customer resources. |
Targets Required | CrawlerTargets | undefined | A collection of targets to crawl. |
Classifiers | string[] | undefined | A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification. |
Configuration | string | undefined | Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Setting crawler configuration options. |
CrawlerSecurityConfiguration | string | undefined | The name of the |
DatabaseName | string | undefined | The Glue database where results are written, such as: |
Description | string | undefined | A description of the new crawler. |
LakeFormationConfiguration | LakeFormationConfiguration | undefined | Specifies Lake Formation configuration settings for the crawler. |
LineageConfiguration | LineageConfiguration | undefined | Specifies data lineage configuration settings for the crawler. |
RecrawlPolicy | RecrawlPolicy | undefined | A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run. |
Schedule | string | undefined | A |
SchemaChangePolicy | SchemaChangePolicy | undefined | The policy for the crawler's update and deletion behavior. |
TablePrefix | string | undefined | The table prefix used for catalog tables that are created. |
Tags | Record<string, string> | undefined | The tags to use with this crawler request. You may use tags to limit access to the crawler. For more information about tags in Glue, see Amazon Web Services Tags in Glue in the developer guide. |
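Because `Configuration` is passed as a JSON string rather than an object, it is often easier to build it as an object and serialize it at the last moment. The following is a sketch under stated assumptions: `toCrawlerConfiguration` is a hypothetical helper, and the `Version` and `CrawlerOutput` keys follow the examples in the Glue developer guide's crawler configuration options, which remains the authority on which keys are supported.

```javascript
// Hypothetical helper: serializes a crawler configuration object into the
// JSON string expected by the Configuration parameter.
function toCrawlerConfiguration(options) {
  if (options === null || typeof options !== "object" || Array.isArray(options)) {
    throw new TypeError("options must be a plain object");
  }
  // Assumption: the configuration is versioned with a top-level Version key,
  // as in the developer guide's examples. A caller-supplied Version wins.
  return JSON.stringify({ Version: 1.0, ...options });
}

// Assumed option shape, taken from the developer guide's examples.
const configuration = toCrawlerConfiguration({
  CrawlerOutput: {
    Partitions: { AddOrUpdateBehavior: "InheritFromTable" },
  },
});
console.log(configuration);
```

The returned string would be assigned to the `Configuration` field of the input object.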
CreateCrawlerCommand Output
Parameter | Type | Description |
---|---|---|
$metadata Required | ResponseMetadata | Metadata pertaining to this request. |
Throws
Name | Fault | Details |
---|---|---|
AlreadyExistsException | client | A resource to be created or added already exists. |
InvalidInputException | client | The input provided was not valid. |
OperationTimeoutException | client | The operation timed out. |
ResourceNumberLimitExceededException | client | A resource numerical limit was exceeded. |
GlueServiceException | | Base exception class for all service exceptions from Glue service. |
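Callers typically branch on which of these exceptions was thrown. The sketch below maps an error to a coarse follow-up action. It assumes that errors thrown by `client.send()` carry the exception name from the table in their `name` property. `classifyCreateCrawlerError` and the action labels are hypothetical, and the choice of action for each exception is one possible policy rather than SDK behavior.

```javascript
// Hypothetical helper: maps an error from client.send(new CreateCrawlerCommand(...))
// to a coarse action label, based only on the error's `name` property.
function classifyCreateCrawlerError(error) {
  const name = error !== null && typeof error === "object" ? error.name : undefined;
  switch (name) {
    case "AlreadyExistsException":
      return "already-exists"; // a crawler with this name is already present
    case "InvalidInputException":
      return "fix-input"; // the request must be corrected before resending
    case "OperationTimeoutException":
      return "retry"; // assumption: a timeout is worth retrying
    case "ResourceNumberLimitExceededException":
      return "limit-exceeded"; // a resource limit blocks the request
    default:
      return "unknown";
  }
}

// Plain objects stand in for SDK exceptions in this illustration.
console.log(classifyCreateCrawlerError({ name: "AlreadyExistsException" }));
console.log(classifyCreateCrawlerError(new Error("network failure")));
```

In practice the function would be called from the `catch` block of a `try`/`catch` around `client.send(command)`.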