SDK for PHP 3.x

Client: Aws\GlueDataBrew\GlueDataBrewClient
Service ID: databrew
Version: 2017-07-25

This page describes the parameters and results for the operations of the AWS Glue DataBrew API (2017-07-25), and shows how to use the Aws\GlueDataBrew\GlueDataBrewClient object to call those operations. This documentation is specific to the 2017-07-25 API version of the service.

Operation Summary

Each of the following operations can be created from a client using $client->getCommand('CommandName'), where "CommandName" is the name of one of the following operations. Note: a command is a value that encapsulates an operation and the parameters used to create an HTTP request.

You can also create and send a command immediately using the magic methods available on a client object: $client->commandName(/* parameters */). You can send the command asynchronously (returning a promise) by appending the word "Async" to the operation name: $client->commandNameAsync(/* parameters */).
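
As a minimal sketch of the naming convention described above, the following derives the synchronous and asynchronous method names from an operation name. The helper `methodNamesFor` is hypothetical and not part of the SDK; the commented calls assume a configured client in `$client` and a parameter array in `$params`.

```php
<?php
declare(strict_types=1);

/**
 * Derive the client method names for an operation name.
 * Hypothetical helper for illustration only; not part of the SDK.
 */
function methodNamesFor(string $operation): array
{
    // The magic method is the operation name with a lowercase first letter.
    $sync = lcfirst($operation);

    // Appending "Async" gives the promise-returning variant.
    return ['sync' => $sync, 'async' => $sync . 'Async'];
}

$names = methodNamesFor('CreateDataset');
echo $names['sync'], PHP_EOL;  // createDataset
echo $names['async'], PHP_EOL; // createDatasetAsync

// With a configured Aws\GlueDataBrew\GlueDataBrewClient in $client, the
// three call styles would look like this (not executed here):
//   $result  = $client->createDataset($params);
//   $promise = $client->createDatasetAsync($params);
//   $result  = $client->execute($client->getCommand('CreateDataset', $params));
```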

BatchDeleteRecipeVersion ( array $params = [] )
Deletes one or more versions of a recipe at a time.
CreateDataset ( array $params = [] )
Creates a new DataBrew dataset.
CreateProfileJob ( array $params = [] )
Creates a new job to analyze a dataset and create its data profile.
CreateProject ( array $params = [] )
Creates a new DataBrew project.
CreateRecipe ( array $params = [] )
Creates a new DataBrew recipe.
CreateRecipeJob ( array $params = [] )
Creates a new job to transform input data, using steps defined in an existing Glue DataBrew recipe.
CreateRuleset ( array $params = [] )
Creates a new ruleset that can be used in a profile job to validate the data quality of a dataset.
CreateSchedule ( array $params = [] )
Creates a new schedule for one or more DataBrew jobs.
DeleteDataset ( array $params = [] )
Deletes a dataset from DataBrew.
DeleteJob ( array $params = [] )
Deletes the specified DataBrew job.
DeleteProject ( array $params = [] )
Deletes an existing DataBrew project.
DeleteRecipeVersion ( array $params = [] )
Deletes a single version of a DataBrew recipe.
DeleteRuleset ( array $params = [] )
Deletes a ruleset.
DeleteSchedule ( array $params = [] )
Deletes the specified DataBrew schedule.
DescribeDataset ( array $params = [] )
Returns the definition of a specific DataBrew dataset.
DescribeJob ( array $params = [] )
Returns the definition of a specific DataBrew job.
DescribeJobRun ( array $params = [] )
Returns information about one run of a DataBrew job.
DescribeProject ( array $params = [] )
Returns the definition of a specific DataBrew project.
DescribeRecipe ( array $params = [] )
Returns the definition of a specific DataBrew recipe corresponding to a particular version.
DescribeRuleset ( array $params = [] )
Retrieves detailed information about the ruleset.
DescribeSchedule ( array $params = [] )
Returns the definition of a specific DataBrew schedule.
ListDatasets ( array $params = [] )
Lists all of the DataBrew datasets.
ListJobRuns ( array $params = [] )
Lists all of the previous runs of a particular DataBrew job.
ListJobs ( array $params = [] )
Lists all of the DataBrew jobs that are defined.
ListProjects ( array $params = [] )
Lists all of the DataBrew projects that are defined.
ListRecipeVersions ( array $params = [] )
Lists the versions of a particular DataBrew recipe, except for LATEST_WORKING.
ListRecipes ( array $params = [] )
Lists all of the DataBrew recipes that are defined.
ListRulesets ( array $params = [] )
Lists all rulesets available in the current account or rulesets associated with a specific resource (dataset).
ListSchedules ( array $params = [] )
Lists the DataBrew schedules that are defined.
ListTagsForResource ( array $params = [] )
Lists all the tags for a DataBrew resource.
PublishRecipe ( array $params = [] )
Publishes a new version of a DataBrew recipe.
SendProjectSessionAction ( array $params = [] )
Performs a recipe step within an interactive DataBrew session that's currently open.
StartJobRun ( array $params = [] )
Runs a DataBrew job.
StartProjectSession ( array $params = [] )
Creates an interactive session, enabling you to manipulate data in a DataBrew project.
StopJobRun ( array $params = [] )
Stops a particular run of a job.
TagResource ( array $params = [] )
Adds metadata tags to a DataBrew resource, such as a dataset, project, recipe, job, or schedule.
UntagResource ( array $params = [] )
Removes metadata tags from a DataBrew resource.
UpdateDataset ( array $params = [] )
Modifies the definition of an existing DataBrew dataset.
UpdateProfileJob ( array $params = [] )
Modifies the definition of an existing profile job.
UpdateProject ( array $params = [] )
Modifies the definition of an existing DataBrew project.
UpdateRecipe ( array $params = [] )
Modifies the definition of the LATEST_WORKING version of a DataBrew recipe.
UpdateRecipeJob ( array $params = [] )
Modifies the definition of an existing DataBrew recipe job.
UpdateRuleset ( array $params = [] )
Updates the specified ruleset.
UpdateSchedule ( array $params = [] )
Modifies the definition of an existing DataBrew schedule.

Paginators

Paginators handle automatically iterating over paginated API results. Paginators are associated with specific API operations, and they accept the parameters that the corresponding API operation accepts. You can get a paginator from a client class using getPaginator($paginatorName, $operationParameters). This client supports the following paginators:

ListDatasets
ListJobRuns
ListJobs
ListProjects
ListRecipeVersions
ListRecipes
ListRulesets
ListSchedules
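
The sketch below shows one way to consume the pages a paginator yields. It assumes each ListDatasets page exposes a 'Datasets' list whose entries carry a 'Name' key; that result shape is not shown in this section, so treat it as an assumption. The helper `collectDatasetNames` is hypothetical, and plain arrays stand in for result pages so the example runs without credentials.

```php
<?php
declare(strict_types=1);

/**
 * Collect dataset names from an iterable of ListDatasets result pages.
 * Assumes each page has a 'Datasets' list of entries with a 'Name' key.
 */
function collectDatasetNames(iterable $pages): array
{
    $names = [];
    foreach ($pages as $page) {
        // A page without 'Datasets' contributes nothing.
        foreach ($page['Datasets'] ?? [] as $dataset) {
            $names[] = $dataset['Name'];
        }
    }
    return $names;
}

// Stand-in pages; a paginator would yield one result object per page.
$pages = [
    ['Datasets' => [['Name' => 'orders'], ['Name' => 'customers']]],
    ['Datasets' => [['Name' => 'returns']]],
];
print_r(collectDatasetNames($pages));

// With a configured client (not executed here):
//   $names = collectDatasetNames($client->getPaginator('ListDatasets'));
```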

Operations

BatchDeleteRecipeVersion

$result = $client->batchDeleteRecipeVersion([/* ... */]);
$promise = $client->batchDeleteRecipeVersionAsync([/* ... */]);

Deletes one or more versions of a recipe at a time.

The entire request will be rejected if:

  • The recipe does not exist.

  • There is an invalid version identifier in the list of versions.

  • The version list is empty.

  • The version list size exceeds 50.

  • The version list contains duplicate entries.

The request will complete successfully, but with partial failures, if:

  • A version does not exist.

  • A version is being used by a job.

  • You specify LATEST_WORKING, but it's being used by a project.

  • The version fails to be deleted.

The LATEST_WORKING version will only be deleted if the recipe has no other versions. If you try to delete LATEST_WORKING while other versions exist (or if they can't be deleted), then LATEST_WORKING will be listed as a partial failure in the response.
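
Some of the rejection rules above can be checked locally before sending a request. The following is a sketch only: `findBatchDeleteProblems` is a hypothetical helper, it covers just the rules that need no service call (empty list, more than 50 entries, duplicates), and the service remains the source of truth.

```php
<?php
declare(strict_types=1);

/**
 * Return the locally detectable reasons a version list would cause the
 * whole request to be rejected. Hypothetical helper, not part of the SDK.
 */
function findBatchDeleteProblems(array $versions): array
{
    $problems = [];
    if (count($versions) === 0) {
        $problems[] = 'The version list is empty.';
    }
    if (count($versions) > 50) {
        $problems[] = 'The version list size exceeds 50.';
    }
    // array_unique compares entries as strings by default.
    if (count($versions) !== count(array_unique($versions))) {
        $problems[] = 'The version list contains duplicate entries.';
    }
    return $problems;
}

print_r(findBatchDeleteProblems(['1.0', '1.0']));
```

An empty result from the helper does not guarantee success; the recipe may still be missing, or a version identifier may be invalid.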

Parameter Syntax

$result = $client->batchDeleteRecipeVersion([
    'Name' => '<string>', // REQUIRED
    'RecipeVersions' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the recipe whose versions are to be deleted.

RecipeVersions
Required: Yes
Type: Array of strings

An array of version identifiers, for the recipe versions to be deleted. You can specify numeric versions (X.Y) or LATEST_WORKING. LATEST_PUBLISHED is not supported.

Result Syntax

[
    'Errors' => [
        [
            'ErrorCode' => '<string>',
            'ErrorMessage' => '<string>',
            'RecipeVersion' => '<string>',
        ],
        // ...
    ],
    'Name' => '<string>',
]

Result Details

Members
Errors
Type: Array of RecipeVersionErrorDetail structures

Errors, if any, that occurred while attempting to delete the recipe versions.

Name
Required: Yes
Type: string

The name of the recipe that was modified.
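
Because a request can succeed with partial failures, callers usually need to inspect 'Errors'. The sketch below maps each failed version to a readable message. `failedVersions` is a hypothetical helper, and the error code and message strings in the sample are placeholders rather than real service values.

```php
<?php
declare(strict_types=1);

/**
 * Map each failed recipe version to "ErrorCode: ErrorMessage".
 * Accepts any array-like result with the shape shown in Result Syntax.
 */
function failedVersions($result): array
{
    $failed = [];
    foreach ($result['Errors'] ?? [] as $error) {
        $failed[$error['RecipeVersion']] =
            $error['ErrorCode'] . ': ' . $error['ErrorMessage'];
    }
    return $failed;
}

// Stand-in result; the code and message below are placeholders.
$result = [
    'Name' => 'example-recipe',
    'Errors' => [
        [
            'ErrorCode' => '<error code>',
            'ErrorMessage' => '<error message>',
            'RecipeVersion' => 'LATEST_WORKING',
        ],
    ],
];

foreach (failedVersions($result) as $version => $reason) {
    echo "Could not delete {$version} ({$reason})", PHP_EOL;
}
```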

Errors

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.

CreateDataset

$result = $client->createDataset([/* ... */]);
$promise = $client->createDatasetAsync([/* ... */]);

Creates a new DataBrew dataset.

Parameter Syntax

$result = $client->createDataset([
    'Format' => 'CSV|JSON|PARQUET|EXCEL|ORC',
    'FormatOptions' => [
        'Csv' => [
            'Delimiter' => '<string>',
            'HeaderRow' => true || false,
        ],
        'Excel' => [
            'HeaderRow' => true || false,
            'SheetIndexes' => [<integer>, ...],
            'SheetNames' => ['<string>', ...],
        ],
        'Json' => [
            'MultiLine' => true || false,
        ],
    ],
    'Input' => [ // REQUIRED
        'DataCatalogInputDefinition' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
            'TempDirectory' => [
                'Bucket' => '<string>', // REQUIRED
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
        ],
        'DatabaseInputDefinition' => [
            'DatabaseTableName' => '<string>',
            'GlueConnectionName' => '<string>', // REQUIRED
            'QueryString' => '<string>',
            'TempDirectory' => [
                'Bucket' => '<string>', // REQUIRED
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
        ],
        'Metadata' => [
            'SourceArn' => '<string>',
        ],
        'S3InputDefinition' => [
            'Bucket' => '<string>', // REQUIRED
            'BucketOwner' => '<string>',
            'Key' => '<string>',
        ],
    ],
    'Name' => '<string>', // REQUIRED
    'PathOptions' => [
        'FilesLimit' => [
            'MaxFiles' => <integer>, // REQUIRED
            'Order' => 'DESCENDING|ASCENDING',
            'OrderedBy' => 'LAST_MODIFIED_DATE',
        ],
        'LastModifiedDateCondition' => [
            'Expression' => '<string>', // REQUIRED
            'ValuesMap' => ['<string>', ...], // REQUIRED
        ],
        'Parameters' => [
            '<PathParameterName>' => [
                'CreateColumn' => true || false,
                'DatetimeOptions' => [
                    'Format' => '<string>', // REQUIRED
                    'LocaleCode' => '<string>',
                    'TimezoneOffset' => '<string>',
                ],
                'Filter' => [
                    'Expression' => '<string>', // REQUIRED
                    'ValuesMap' => ['<string>', ...], // REQUIRED
                ],
                'Name' => '<string>', // REQUIRED
                'Type' => 'Datetime|Number|String', // REQUIRED
            ],
            // ...
        ],
    ],
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
Format
Type: string

The file format of a dataset that is created from an Amazon S3 file or folder.

FormatOptions
Type: FormatOptions structure

Represents a set of options that define the structure of either comma-separated value (CSV), Excel, or JSON input.

Input
Required: Yes
Type: Input structure

Represents information on how DataBrew can find data, in either the Glue Data Catalog or Amazon S3.

Name
Required: Yes
Type: string

The name of the dataset to be created. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.

PathOptions
Type: PathOptions structure

A set of options that defines how DataBrew interprets an Amazon S3 path of the dataset.

Tags
Type: Associative array of custom string keys (TagKey) to strings

Metadata tags to apply to this dataset.
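
As an illustration of how these members fit together, the sketch below builds a parameter array for a CSV dataset stored in Amazon S3. `buildCsvDatasetParams` is a hypothetical helper, and the bucket, key, and dataset names are placeholders; only members shown in the parameter syntax above are used.

```php
<?php
declare(strict_types=1);

/**
 * Build CreateDataset parameters for a comma-delimited CSV file in S3
 * whose first row is a header. Hypothetical helper, not part of the SDK.
 */
function buildCsvDatasetParams(string $name, string $bucket, string $key): array
{
    return [
        'Name' => $name,                       // REQUIRED
        'Format' => 'CSV',
        'FormatOptions' => [
            'Csv' => ['Delimiter' => ',', 'HeaderRow' => true],
        ],
        'Input' => [                           // REQUIRED
            'S3InputDefinition' => [
                'Bucket' => $bucket,           // REQUIRED
                'Key' => $key,
            ],
        ],
    ];
}

$params = buildCsvDatasetParams('example-dataset', 'example-bucket', 'data/input.csv');
print_r($params);

// With a configured client (not executed here):
//   $result = $client->createDataset($params);
```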

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the dataset that you created.

Errors

AccessDeniedException:

Access to the specified resource was denied.

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ServiceQuotaExceededException:

A service quota is exceeded.

ValidationException:

The input parameters for this request failed validation.

CreateProfileJob

$result = $client->createProfileJob([/* ... */]);
$promise = $client->createProfileJobAsync([/* ... */]);

Creates a new job to analyze a dataset and create its data profile.

Parameter Syntax

$result = $client->createProfileJob([
    'Configuration' => [
        'ColumnStatisticsConfigurations' => [
            [
                'Selectors' => [
                    [
                        'Name' => '<string>',
                        'Regex' => '<string>',
                    ],
                    // ...
                ],
                'Statistics' => [ // REQUIRED
                    'IncludedStatistics' => ['<string>', ...],
                    'Overrides' => [
                        [
                            'Parameters' => ['<string>', ...], // REQUIRED
                            'Statistic' => '<string>', // REQUIRED
                        ],
                        // ...
                    ],
                ],
            ],
            // ...
        ],
        'DatasetStatisticsConfiguration' => [
            'IncludedStatistics' => ['<string>', ...],
            'Overrides' => [
                [
                    'Parameters' => ['<string>', ...], // REQUIRED
                    'Statistic' => '<string>', // REQUIRED
                ],
                // ...
            ],
        ],
        'EntityDetectorConfiguration' => [
            'AllowedStatistics' => [
                [
                    'Statistics' => ['<string>', ...], // REQUIRED
                ],
                // ...
            ],
            'EntityTypes' => ['<string>', ...], // REQUIRED
        ],
        'ProfileColumns' => [
            [
                'Name' => '<string>',
                'Regex' => '<string>',
            ],
            // ...
        ],
    ],
    'DatasetName' => '<string>', // REQUIRED
    'EncryptionKeyArn' => '<string>',
    'EncryptionMode' => 'SSE-KMS|SSE-S3',
    'JobSample' => [
        'Mode' => 'FULL_DATASET|CUSTOM_ROWS',
        'Size' => <integer>,
    ],
    'LogSubscription' => 'ENABLE|DISABLE',
    'MaxCapacity' => <integer>,
    'MaxRetries' => <integer>,
    'Name' => '<string>', // REQUIRED
    'OutputLocation' => [ // REQUIRED
        'Bucket' => '<string>', // REQUIRED
        'BucketOwner' => '<string>',
        'Key' => '<string>',
    ],
    'RoleArn' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
    'Timeout' => <integer>,
    'ValidationConfigurations' => [
        [
            'RulesetArn' => '<string>', // REQUIRED
            'ValidationMode' => 'CHECK_ALL',
        ],
        // ...
    ],
]);

Parameter Details

Members
Configuration
Type: ProfileConfiguration structure

Configuration for profile jobs. Used to select columns, do evaluations, and override default parameters of evaluations. When configuration is null, the profile job will run with default settings.

DatasetName
Required: Yes
Type: string

The name of the dataset that this job is to act upon.

EncryptionKeyArn
Type: string

The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.

EncryptionMode
Type: string

The encryption mode for the job, which can be one of the following:

  • SSE-KMS - Server-side encryption with keys managed by KMS.

  • SSE-S3 - Server-side encryption with keys managed by Amazon S3.

JobSample
Type: JobSample structure

Sample configuration for profile jobs only. Determines the number of rows on which the profile job will be executed. If a JobSample value is not provided, the default value will be used. The default value is CUSTOM_ROWS for the mode parameter and 20000 for the size parameter.

LogSubscription
Type: string

Enables or disables Amazon CloudWatch logging for the job. If logging is enabled, CloudWatch writes one log stream for each job run.

MaxCapacity
Type: int

The maximum number of nodes that DataBrew can use when the job processes data.

MaxRetries
Type: int

The maximum number of times to retry the job after a job run fails.

Name
Required: Yes
Type: string

The name of the job to be created. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.

OutputLocation
Required: Yes
Type: S3Location structure

Represents an Amazon S3 location (bucket name, bucket owner, and object key) where DataBrew can read input data, or write output from a job.

RoleArn
Required: Yes
Type: string

The Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.

Tags
Type: Associative array of custom string keys (TagKey) to strings

Metadata tags to apply to this job.

Timeout
Type: int

The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.

ValidationConfigurations
Type: Array of ValidationConfiguration structures

List of validation configurations that are applied to the profile job.
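
The JobSample member is easy to get wrong, so the sketch below makes the choice explicit: pass a row count for CUSTOM_ROWS, or null to profile the full dataset. Both helpers are hypothetical. Omitting 'Size' for FULL_DATASET is an assumption based on the size only being meaningful for CUSTOM_ROWS, and the role ARN and names in the usage are placeholders.

```php
<?php
declare(strict_types=1);

/**
 * Build a JobSample structure. null means "profile every row";
 * an integer means "profile that many rows" (CUSTOM_ROWS).
 */
function jobSample(?int $rows): array
{
    if ($rows === null) {
        return ['Mode' => 'FULL_DATASET'];
    }
    return ['Mode' => 'CUSTOM_ROWS', 'Size' => $rows];
}

/** Build the required CreateProfileJob members plus an explicit JobSample. */
function buildProfileJobParams(
    string $name,
    string $datasetName,
    string $roleArn,
    string $outputBucket,
    ?int $sampleRows = 20000 // mirrors the documented default size
): array {
    return [
        'Name' => $name,
        'DatasetName' => $datasetName,
        'RoleArn' => $roleArn,
        'OutputLocation' => ['Bucket' => $outputBucket],
        'JobSample' => jobSample($sampleRows),
    ];
}

$params = buildProfileJobParams(
    'example-profile-job',
    'example-dataset',
    'arn:aws:iam::123456789012:role/ExampleDataBrewRole', // placeholder ARN
    'example-output-bucket'
);
print_r($params);
```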

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the job that was created.

Errors

AccessDeniedException:

Access to the specified resource was denied.

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ResourceNotFoundException:

One or more resources can't be found.

ServiceQuotaExceededException:

A service quota is exceeded.

ValidationException:

The input parameters for this request failed validation.

CreateProject

$result = $client->createProject([/* ... */]);
$promise = $client->createProjectAsync([/* ... */]);

Creates a new DataBrew project.

Parameter Syntax

$result = $client->createProject([
    'DatasetName' => '<string>', // REQUIRED
    'Name' => '<string>', // REQUIRED
    'RecipeName' => '<string>', // REQUIRED
    'RoleArn' => '<string>', // REQUIRED
    'Sample' => [
        'Size' => <integer>,
        'Type' => 'FIRST_N|LAST_N|RANDOM', // REQUIRED
    ],
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
DatasetName
Required: Yes
Type: string

The name of an existing dataset to associate this project with.

Name
Required: Yes
Type: string

A unique name for the new project. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.

RecipeName
Required: Yes
Type: string

The name of an existing recipe to associate with the project.

RoleArn
Required: Yes
Type: string

The Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role to be assumed for this request.

Sample
Type: Sample structure

Represents the sample size and sampling type for DataBrew to use for interactive data analysis.

Tags
Type: Associative array of custom string keys (TagKey) to strings

Metadata tags to apply to this project.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the project that you created.

Errors

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

InternalServerException:

An internal service failure occurred.

ServiceQuotaExceededException:

A service quota is exceeded.

ValidationException:

The input parameters for this request failed validation.

CreateRecipe

$result = $client->createRecipe([/* ... */]);
$promise = $client->createRecipeAsync([/* ... */]);

Creates a new DataBrew recipe.

Parameter Syntax

$result = $client->createRecipe([
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Steps' => [ // REQUIRED
        [
            'Action' => [ // REQUIRED
                'Operation' => '<string>', // REQUIRED
                'Parameters' => ['<string>', ...],
            ],
            'ConditionExpressions' => [
                [
                    'Condition' => '<string>', // REQUIRED
                    'TargetColumn' => '<string>', // REQUIRED
                    'Value' => '<string>',
                ],
                // ...
            ],
        ],
        // ...
    ],
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
Description
Type: string

A description for the recipe.

Name
Required: Yes
Type: string

A unique name for the recipe. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.

Steps
Required: Yes
Type: Array of RecipeStep structures

An array containing the steps to be performed by the recipe. Each recipe step consists of one recipe action and (optionally) an array of condition expressions.

Tags
Type: Associative array of custom string keys (TagKey) to strings

Metadata tags to apply to this recipe.
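
The structure of a recipe step (one action plus optional condition expressions) can be sketched as follows. `recipeStep` is a hypothetical helper. The operation name 'UPPER_CASE' and its 'sourceColumn' parameter are illustrative assumptions that should be checked against the DataBrew recipe actions reference, as is the treatment of 'Parameters' as a map of string keys to string values.

```php
<?php
declare(strict_types=1);

/**
 * Build one RecipeStep: a required action and, optionally, an array of
 * condition expressions. Hypothetical helper, not part of the SDK.
 */
function recipeStep(string $operation, array $parameters = [], array $conditions = []): array
{
    $step = ['Action' => ['Operation' => $operation]];
    if ($parameters !== []) {
        $step['Action']['Parameters'] = $parameters;
    }
    if ($conditions !== []) {
        $step['ConditionExpressions'] = $conditions;
    }
    return $step;
}

$params = [
    'Name' => 'example-recipe',
    'Description' => 'Upper-cases one column (illustrative).',
    'Steps' => [
        // Operation and parameter names here are assumptions.
        recipeStep('UPPER_CASE', ['sourceColumn' => 'city']),
    ],
];
print_r($params);

// With a configured client (not executed here):
//   $result = $client->createRecipe($params);
```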

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the recipe that you created.

Errors

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ServiceQuotaExceededException:

A service quota is exceeded.

ValidationException:

The input parameters for this request failed validation.

CreateRecipeJob

$result = $client->createRecipeJob([/* ... */]);
$promise = $client->createRecipeJobAsync([/* ... */]);

Creates a new job to transform input data, using steps defined in an existing Glue DataBrew recipe.

Parameter Syntax

$result = $client->createRecipeJob([
    'DataCatalogOutputs' => [
        [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'DatabaseOptions' => [
                'TableName' => '<string>', // REQUIRED
                'TempDirectory' => [
                    'Bucket' => '<string>', // REQUIRED
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'Overwrite' => true || false,
            'S3Options' => [
                'Location' => [ // REQUIRED
                    'Bucket' => '<string>', // REQUIRED
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'TableName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'DatabaseOutputs' => [
        [
            'DatabaseOptions' => [ // REQUIRED
                'TableName' => '<string>', // REQUIRED
                'TempDirectory' => [
                    'Bucket' => '<string>', // REQUIRED
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'DatabaseOutputMode' => 'NEW_TABLE',
            'GlueConnectionName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'DatasetName' => '<string>',
    'EncryptionKeyArn' => '<string>',
    'EncryptionMode' => 'SSE-KMS|SSE-S3',
    'LogSubscription' => 'ENABLE|DISABLE',
    'MaxCapacity' => <integer>,
    'MaxRetries' => <integer>,
    'Name' => '<string>', // REQUIRED
    'Outputs' => [
        [
            'CompressionFormat' => 'GZIP|LZ4|SNAPPY|BZIP2|DEFLATE|LZO|BROTLI|ZSTD|ZLIB',
            'Format' => 'CSV|JSON|PARQUET|GLUEPARQUET|AVRO|ORC|XML|TABLEAUHYPER',
            'FormatOptions' => [
                'Csv' => [
                    'Delimiter' => '<string>',
                ],
            ],
            'Location' => [ // REQUIRED
                'Bucket' => '<string>', // REQUIRED
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
            'MaxOutputFiles' => <integer>,
            'Overwrite' => true || false,
            'PartitionColumns' => ['<string>', ...],
        ],
        // ...
    ],
    'ProjectName' => '<string>',
    'RecipeReference' => [
        'Name' => '<string>', // REQUIRED
        'RecipeVersion' => '<string>',
    ],
    'RoleArn' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
    'Timeout' => <integer>,
]);

Parameter Details

Members
DataCatalogOutputs
Type: Array of DataCatalogOutput structures

One or more artifacts that represent the Glue Data Catalog output from running the job.

DatabaseOutputs
Type: Array of DatabaseOutput structures

Represents a list of JDBC database output objects that define the output destination for a DataBrew recipe job to write to.

DatasetName
Type: string

The name of the dataset that this job processes.

EncryptionKeyArn
Type: string

The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.

EncryptionMode
Type: string

The encryption mode for the job, which can be one of the following:

  • SSE-KMS - Server-side encryption with keys managed by KMS.

  • SSE-S3 - Server-side encryption with keys managed by Amazon S3.

LogSubscription
Type: string

Enables or disables Amazon CloudWatch logging for the job. If logging is enabled, CloudWatch writes one log stream for each job run.

MaxCapacity
Type: int

The maximum number of nodes that DataBrew can consume when the job processes data.

MaxRetries
Type: int

The maximum number of times to retry the job after a job run fails.

Name
Required: Yes
Type: string

A unique name for the job. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.

Outputs
Type: Array of Output structures

One or more artifacts that represent the output from running the job.

ProjectName
Type: string

Either the name of an existing project, or a combination of a recipe and a dataset to associate with the recipe.

RecipeReference
Type: RecipeReference structure

Represents the name and version of a DataBrew recipe.

RoleArn
Required: Yes
Type: string

The Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.

Tags
Type: Associative array of custom string keys (TagKey) to strings

Metadata tags to apply to this job.

Timeout
Type: int

The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.
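
A minimal sketch of a recipe job request that pairs a dataset with a recipe and writes one output to Amazon S3. Both helpers are hypothetical, all names and the role ARN are placeholders, and the format check only mirrors the values listed in the parameter syntax above. Using DatasetName together with RecipeReference, rather than ProjectName, is one possible combination and is assumed here.

```php
<?php
declare(strict_types=1);

/** Output formats listed in the parameter syntax for 'Outputs'. */
const OUTPUT_FORMATS = [
    'CSV', 'JSON', 'PARQUET', 'GLUEPARQUET', 'AVRO', 'ORC', 'XML', 'TABLEAUHYPER',
];

/** Build one Output entry, rejecting formats that are not listed above. */
function s3Output(string $bucket, string $key, string $format = 'CSV'): array
{
    if (!in_array($format, OUTPUT_FORMATS, true)) {
        throw new InvalidArgumentException("Unsupported output format: {$format}");
    }
    return [
        'Format' => $format,
        'Location' => ['Bucket' => $bucket, 'Key' => $key],
    ];
}

/** Build CreateRecipeJob parameters from a dataset and a recipe name. */
function buildRecipeJobParams(
    string $name,
    string $roleArn,
    string $datasetName,
    string $recipeName,
    array $outputs
): array {
    return [
        'Name' => $name,
        'RoleArn' => $roleArn,
        'DatasetName' => $datasetName,
        'RecipeReference' => ['Name' => $recipeName],
        'Outputs' => $outputs,
    ];
}

$params = buildRecipeJobParams(
    'example-recipe-job',
    'arn:aws:iam::123456789012:role/ExampleDataBrewRole', // placeholder ARN
    'example-dataset',
    'example-recipe',
    [s3Output('example-output-bucket', 'results/', 'PARQUET')]
);
print_r($params);
```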

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the job that you created.

Errors

AccessDeniedException:

Access to the specified resource was denied.

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ResourceNotFoundException:

One or more resources can't be found.

ServiceQuotaExceededException:

A service quota is exceeded.

ValidationException:

The input parameters for this request failed validation.

CreateRuleset

$result = $client->createRuleset([/* ... */]);
$promise = $client->createRulesetAsync([/* ... */]);

Creates a new ruleset that can be used in a profile job to validate the data quality of a dataset.

Parameter Syntax

$result = $client->createRuleset([
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Rules' => [ // REQUIRED
        [
            'CheckExpression' => '<string>', // REQUIRED
            'ColumnSelectors' => [
                [
                    'Name' => '<string>',
                    'Regex' => '<string>',
                ],
                // ...
            ],
            'Disabled' => true || false,
            'Name' => '<string>', // REQUIRED
            'SubstitutionMap' => ['<string>', ...],
            'Threshold' => [
                'Type' => 'GREATER_THAN_OR_EQUAL|LESS_THAN_OR_EQUAL|GREATER_THAN|LESS_THAN',
                'Unit' => 'COUNT|PERCENTAGE',
                'Value' => <float>, // REQUIRED
            ],
        ],
        // ...
    ],
    'Tags' => ['<string>', ...],
    'TargetArn' => '<string>', // REQUIRED
]);

Parameter Details

Members
Description
Type: string

The description of the ruleset.

Name
Required: Yes
Type: string

The name of the ruleset to be created. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.

Rules
Required: Yes
Type: Array of Rule structures

A list of rules that are defined with the ruleset. A rule includes one or more checks to be validated on a DataBrew dataset.

Tags
Type: Associative array of custom string keys (TagKey) to strings

Metadata tags to apply to the ruleset.

TargetArn
Required: Yes
Type: string

The Amazon Resource Name (ARN) of a resource (dataset) that the ruleset is associated with.
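
The Threshold structure accepts only the enumerated 'Type' and 'Unit' values shown in the parameter syntax, so a small guard can catch typos before a request is sent. `threshold` is a hypothetical helper; the check expression in the usage is left as a placeholder because its syntax is not covered in this section.

```php
<?php
declare(strict_types=1);

/**
 * Build a Threshold structure, validating 'Type' and 'Unit' against the
 * values listed in the parameter syntax. Hypothetical helper.
 */
function threshold(float $value, string $type, string $unit): array
{
    $types = ['GREATER_THAN_OR_EQUAL', 'LESS_THAN_OR_EQUAL', 'GREATER_THAN', 'LESS_THAN'];
    $units = ['COUNT', 'PERCENTAGE'];

    if (!in_array($type, $types, true)) {
        throw new InvalidArgumentException("Unknown threshold type: {$type}");
    }
    if (!in_array($unit, $units, true)) {
        throw new InvalidArgumentException("Unknown threshold unit: {$unit}");
    }
    return ['Value' => $value, 'Type' => $type, 'Unit' => $unit];
}

$rule = [
    'Name' => 'example-rule',
    'CheckExpression' => '<check expression>', // placeholder, syntax not shown here
    'Threshold' => threshold(100.0, 'GREATER_THAN_OR_EQUAL', 'PERCENTAGE'),
];
print_r($rule);
```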

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The unique name of the created ruleset.

Errors

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ServiceQuotaExceededException:

A service quota is exceeded.

ValidationException:

The input parameters for this request failed validation.

CreateSchedule

$result = $client->createSchedule([/* ... */]);
$promise = $client->createScheduleAsync([/* ... */]);

Creates a new schedule for one or more DataBrew jobs. Jobs can be run at a specific date and time, or at regular intervals.

Parameter Syntax

$result = $client->createSchedule([
    'CronExpression' => '<string>', // REQUIRED
    'JobNames' => ['<string>', ...],
    'Name' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
]);

Parameter Details

Members
CronExpression
Required: Yes
Type: string

The date or dates and time or times when the jobs are to be run. For more information, see Cron expressions in the Glue DataBrew Developer Guide.

JobNames
Type: Array of strings

The name or names of one or more jobs to be run.

Name
Required: Yes
Type: string

A unique name for the schedule. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.

Tags
Type: Associative array of custom string keys (TagKey) to strings

Metadata tags to apply to this schedule.
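
The character rule for names (alphanumeric, hyphen, period, and space) can be checked before a request is built. The sketch below is an assumption-laden illustration: `isValidDataBrewName` and `buildScheduleParams` are hypothetical helpers, no length limit is enforced because none is stated here, and the cron expression is passed through unchanged as a placeholder.

```php
<?php
declare(strict_types=1);

/**
 * Check a name against the documented character set:
 * A-Z, a-z, 0-9, hyphen, period, and space. Empty names are rejected.
 */
function isValidDataBrewName(string $name): bool
{
    return preg_match('/\A[A-Za-z0-9.\- ]+\z/', $name) === 1;
}

/** Build CreateSchedule parameters after validating the schedule name. */
function buildScheduleParams(string $name, string $cronExpression, array $jobNames = []): array
{
    if (!isValidDataBrewName($name)) {
        throw new InvalidArgumentException("Invalid schedule name: {$name}");
    }
    $params = ['Name' => $name, 'CronExpression' => $cronExpression];
    if ($jobNames !== []) {
        $params['JobNames'] = $jobNames;
    }
    return $params;
}

// The cron expression is a placeholder; see the Developer Guide for its syntax.
print_r(buildScheduleParams('nightly run.v2', '<cron expression>', ['example-job']));
```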

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the schedule that was created.

Errors

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ServiceQuotaExceededException:

A service quota is exceeded.

ValidationException:

The input parameters for this request failed validation.

DeleteDataset

$result = $client->deleteDataset([/* ... */]);
$promise = $client->deleteDatasetAsync([/* ... */]);

Deletes a dataset from DataBrew.

Parameter Syntax

$result = $client->deleteDataset([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the dataset to be deleted.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the dataset that you deleted.

Errors

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.

DeleteJob

$result = $client->deleteJob([/* ... */]);
$promise = $client->deleteJobAsync([/* ... */]);

Deletes the specified DataBrew job.

Parameter Syntax

$result = $client->deleteJob([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the job to be deleted.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the job that you deleted.

Errors

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.

DeleteProject

$result = $client->deleteProject([/* ... */]);
$promise = $client->deleteProjectAsync([/* ... */]);

Deletes an existing DataBrew project.

Parameter Syntax

$result = $client->deleteProject([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the project to be deleted.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the project that you deleted.

Errors

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.

DeleteRecipeVersion

$result = $client->deleteRecipeVersion([/* ... */]);
$promise = $client->deleteRecipeVersionAsync([/* ... */]);

Deletes a single version of a DataBrew recipe.

Parameter Syntax

$result = $client->deleteRecipeVersion([
    'Name' => '<string>', // REQUIRED
    'RecipeVersion' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the recipe.

RecipeVersion
Required: Yes
Type: string

The version of the recipe to be deleted. You can specify a numeric version (X.Y) or LATEST_WORKING. LATEST_PUBLISHED is not supported.

Result Syntax

[
    'Name' => '<string>',
    'RecipeVersion' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the recipe that was deleted.

RecipeVersion
Required: Yes
Type: string

The version of the recipe that was deleted.

Errors

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
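The RecipeVersion rule described above can be checked on the client side before calling deleteRecipeVersion. This is only a sketch: the helper name is hypothetical, and the regular expression is an assumption about what "numeric version (X.Y)" means; the service performs its own validation and may accept or reject other forms.

```php
<?php
// Hypothetical pre-flight check for the RecipeVersion parameter of
// deleteRecipeVersion. The X.Y pattern is an assumption, not the service's rule.
function isDeletableRecipeVersion(string $version): bool
{
    // LATEST_WORKING is accepted; LATEST_PUBLISHED is documented as unsupported.
    if ($version === 'LATEST_WORKING') {
        return true;
    }
    // Numeric versions take the form X.Y, for example "1.0".
    return preg_match('/^\d+\.\d+$/', $version) === 1;
}
```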

DeleteRuleset

$result = $client->deleteRuleset([/* ... */]);
$promise = $client->deleteRulesetAsync([/* ... */]);

Deletes a ruleset.

Parameter Syntax

$result = $client->deleteRuleset([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the ruleset to be deleted.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the deleted ruleset.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

DeleteSchedule

$result = $client->deleteSchedule([/* ... */]);
$promise = $client->deleteScheduleAsync([/* ... */]);

Deletes the specified DataBrew schedule.

Parameter Syntax

$result = $client->deleteSchedule([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the schedule to be deleted.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the schedule that was deleted.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.

DescribeDataset

$result = $client->describeDataset([/* ... */]);
$promise = $client->describeDatasetAsync([/* ... */]);

Returns the definition of a specific DataBrew dataset.

Parameter Syntax

$result = $client->describeDataset([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the dataset to be described.

Result Syntax

[
    'CreateDate' => <DateTime>,
    'CreatedBy' => '<string>',
    'Format' => 'CSV|JSON|PARQUET|EXCEL|ORC',
    'FormatOptions' => [
        'Csv' => [
            'Delimiter' => '<string>',
            'HeaderRow' => true || false,
        ],
        'Excel' => [
            'HeaderRow' => true || false,
            'SheetIndexes' => [<integer>, ...],
            'SheetNames' => ['<string>', ...],
        ],
        'Json' => [
            'MultiLine' => true || false,
        ],
    ],
    'Input' => [
        'DataCatalogInputDefinition' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'TableName' => '<string>',
            'TempDirectory' => [
                'Bucket' => '<string>',
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
        ],
        'DatabaseInputDefinition' => [
            'DatabaseTableName' => '<string>',
            'GlueConnectionName' => '<string>',
            'QueryString' => '<string>',
            'TempDirectory' => [
                'Bucket' => '<string>',
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
        ],
        'Metadata' => [
            'SourceArn' => '<string>',
        ],
        'S3InputDefinition' => [
            'Bucket' => '<string>',
            'BucketOwner' => '<string>',
            'Key' => '<string>',
        ],
    ],
    'LastModifiedBy' => '<string>',
    'LastModifiedDate' => <DateTime>,
    'Name' => '<string>',
    'PathOptions' => [
        'FilesLimit' => [
            'MaxFiles' => <integer>,
            'Order' => 'DESCENDING|ASCENDING',
            'OrderedBy' => 'LAST_MODIFIED_DATE',
        ],
        'LastModifiedDateCondition' => [
            'Expression' => '<string>',
            'ValuesMap' => ['<string>', ...],
        ],
        'Parameters' => [
            '<PathParameterName>' => [
                'CreateColumn' => true || false,
                'DatetimeOptions' => [
                    'Format' => '<string>',
                    'LocaleCode' => '<string>',
                    'TimezoneOffset' => '<string>',
                ],
                'Filter' => [
                    'Expression' => '<string>',
                    'ValuesMap' => ['<string>', ...],
                ],
                'Name' => '<string>',
                'Type' => 'Datetime|Number|String',
            ],
            // ...
        ],
    ],
    'ResourceArn' => '<string>',
    'Source' => 'S3|DATA-CATALOG|DATABASE',
    'Tags' => ['<string>', ...],
]

Result Details

Members
CreateDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the dataset was created.

CreatedBy
Type: string

The identifier (user name) of the user who created the dataset.

Format
Type: string

The file format of a dataset that is created from an Amazon S3 file or folder.

FormatOptions
Type: FormatOptions structure

A set of options that define the structure of comma-separated value (CSV), Excel, or JSON input.

Input
Required: Yes
Type: Input structure

Information on how DataBrew can find the data: in the Glue Data Catalog, in Amazon S3, or in a database reached through a Glue connection.

LastModifiedBy
Type: string

The identifier (user name) of the user who last modified the dataset.

LastModifiedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the dataset was last modified.

Name
Required: Yes
Type: string

The name of the dataset.

PathOptions
Type: PathOptions structure

A set of options that defines how DataBrew interprets an Amazon S3 path of the dataset.

ResourceArn
Type: string

The Amazon Resource Name (ARN) of the dataset.

Source
Type: string

The location of the data for this dataset: Amazon S3 (S3), the Glue Data Catalog (DATA-CATALOG), or a database (DATABASE).

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Metadata tags associated with this dataset.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
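The Input member of a describeDataset result holds one of several definitions. The sketch below turns it into a short, human-readable location string. The helper name and the output format ("s3://...", "catalog:...", "connection:...") are illustrative choices, not SDK conventions.

```php
<?php
// Hypothetical helper: summarizes where a dataset's data lives.
// $result may be the Aws\Result returned by describeDataset() or a plain array.
function describeDatasetLocation($result): string
{
    $input = $result['Input'] ?? [];

    if (isset($input['S3InputDefinition'])) {
        $s3 = $input['S3InputDefinition'];
        return 's3://' . $s3['Bucket'] . '/' . ($s3['Key'] ?? '');
    }
    if (isset($input['DataCatalogInputDefinition'])) {
        $catalog = $input['DataCatalogInputDefinition'];
        return 'catalog:' . $catalog['DatabaseName'] . '.' . $catalog['TableName'];
    }
    if (isset($input['DatabaseInputDefinition'])) {
        $db = $input['DatabaseInputDefinition'];
        // A database input can name a table or carry a query string instead.
        return 'connection:' . $db['GlueConnectionName'] . '/'
            . ($db['DatabaseTableName'] ?? '(query)');
    }
    return 'unknown';
}
```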

DescribeJob

$result = $client->describeJob([/* ... */]);
$promise = $client->describeJobAsync([/* ... */]);

Returns the definition of a specific DataBrew job.

Parameter Syntax

$result = $client->describeJob([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the job to be described.

Result Syntax

[
    'CreateDate' => <DateTime>,
    'CreatedBy' => '<string>',
    'DataCatalogOutputs' => [
        [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'DatabaseOptions' => [
                'TableName' => '<string>',
                'TempDirectory' => [
                    'Bucket' => '<string>',
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'Overwrite' => true || false,
            'S3Options' => [
                'Location' => [
                    'Bucket' => '<string>',
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'TableName' => '<string>',
        ],
        // ...
    ],
    'DatabaseOutputs' => [
        [
            'DatabaseOptions' => [
                'TableName' => '<string>',
                'TempDirectory' => [
                    'Bucket' => '<string>',
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'DatabaseOutputMode' => 'NEW_TABLE',
            'GlueConnectionName' => '<string>',
        ],
        // ...
    ],
    'DatasetName' => '<string>',
    'EncryptionKeyArn' => '<string>',
    'EncryptionMode' => 'SSE-KMS|SSE-S3',
    'JobSample' => [
        'Mode' => 'FULL_DATASET|CUSTOM_ROWS',
        'Size' => <integer>,
    ],
    'LastModifiedBy' => '<string>',
    'LastModifiedDate' => <DateTime>,
    'LogSubscription' => 'ENABLE|DISABLE',
    'MaxCapacity' => <integer>,
    'MaxRetries' => <integer>,
    'Name' => '<string>',
    'Outputs' => [
        [
            'CompressionFormat' => 'GZIP|LZ4|SNAPPY|BZIP2|DEFLATE|LZO|BROTLI|ZSTD|ZLIB',
            'Format' => 'CSV|JSON|PARQUET|GLUEPARQUET|AVRO|ORC|XML|TABLEAUHYPER',
            'FormatOptions' => [
                'Csv' => [
                    'Delimiter' => '<string>',
                ],
            ],
            'Location' => [
                'Bucket' => '<string>',
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
            'MaxOutputFiles' => <integer>,
            'Overwrite' => true || false,
            'PartitionColumns' => ['<string>', ...],
        ],
        // ...
    ],
    'ProfileConfiguration' => [
        'ColumnStatisticsConfigurations' => [
            [
                'Selectors' => [
                    [
                        'Name' => '<string>',
                        'Regex' => '<string>',
                    ],
                    // ...
                ],
                'Statistics' => [
                    'IncludedStatistics' => ['<string>', ...],
                    'Overrides' => [
                        [
                            'Parameters' => ['<string>', ...],
                            'Statistic' => '<string>',
                        ],
                        // ...
                    ],
                ],
            ],
            // ...
        ],
        'DatasetStatisticsConfiguration' => [
            'IncludedStatistics' => ['<string>', ...],
            'Overrides' => [
                [
                    'Parameters' => ['<string>', ...],
                    'Statistic' => '<string>',
                ],
                // ...
            ],
        ],
        'EntityDetectorConfiguration' => [
            'AllowedStatistics' => [
                [
                    'Statistics' => ['<string>', ...],
                ],
                // ...
            ],
            'EntityTypes' => ['<string>', ...],
        ],
        'ProfileColumns' => [
            [
                'Name' => '<string>',
                'Regex' => '<string>',
            ],
            // ...
        ],
    ],
    'ProjectName' => '<string>',
    'RecipeReference' => [
        'Name' => '<string>',
        'RecipeVersion' => '<string>',
    ],
    'ResourceArn' => '<string>',
    'RoleArn' => '<string>',
    'Tags' => ['<string>', ...],
    'Timeout' => <integer>,
    'Type' => 'PROFILE|RECIPE',
    'ValidationConfigurations' => [
        [
            'RulesetArn' => '<string>',
            'ValidationMode' => 'CHECK_ALL',
        ],
        // ...
    ],
]

Result Details

Members
CreateDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the job was created.

CreatedBy
Type: string

The identifier (user name) of the user associated with the creation of the job.

DataCatalogOutputs
Type: Array of DataCatalogOutput structures

One or more artifacts that represent the Glue Data Catalog output from running the job.

DatabaseOutputs
Type: Array of DatabaseOutput structures

A list of JDBC database output objects that define the output destinations for a DataBrew recipe job to write to.

DatasetName
Type: string

The dataset that the job acts upon.

EncryptionKeyArn
Type: string

The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.

EncryptionMode
Type: string

The encryption mode for the job, which can be one of the following:

  • SSE-KMS - Server-side encryption with keys managed by KMS.

  • SSE-S3 - Server-side encryption with keys managed by Amazon S3.

JobSample
Type: JobSample structure

Sample configuration for profile jobs only. Determines the number of rows on which the profile job will be executed.

LastModifiedBy
Type: string

The identifier (user name) of the user who last modified the job.

LastModifiedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the job was last modified.

LogSubscription
Type: string

Indicates whether Amazon CloudWatch logging is enabled for this job.

MaxCapacity
Type: int

The maximum number of compute nodes that DataBrew can consume when the job processes data.

MaxRetries
Type: int

The maximum number of times to retry the job after a job run fails.

Name
Required: Yes
Type: string

The name of the job.

Outputs
Type: Array of Output structures

One or more artifacts that represent the output from running the job.

ProfileConfiguration
Type: ProfileConfiguration structure

Configuration for profile jobs. Used to select columns, do evaluations, and override default parameters of evaluations. When configuration is null, the profile job will run with default settings.

ProjectName
Type: string

The DataBrew project associated with this job.

RecipeReference
Type: RecipeReference structure

Represents the name and version of a DataBrew recipe.

ResourceArn
Type: string

The Amazon Resource Name (ARN) of the job.

RoleArn
Type: string

The ARN of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Metadata tags associated with this job.

Timeout
Type: int

The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.

Type
Type: string

The job type, which must be one of the following:

  • PROFILE - The job analyzes the dataset to determine its size, data types, data distribution, and more.

  • RECIPE - The job applies one or more transformations to a dataset.

ValidationConfigurations
Type: Array of ValidationConfiguration structures

List of validation configurations that are applied to the profile job.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
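A describeJob result can be large, so it is often convenient to pull out just the output destinations. The sketch below assumes that each entry's Location (Bucket and Key) refers to an Amazon S3 location; the helper name is hypothetical.

```php
<?php
// Hypothetical helper: collects the output locations of a job as s3:// URIs.
// $result may be the Aws\Result returned by describeJob() or a plain array.
function listJobOutputUris($result): array
{
    $uris = [];
    // Outputs may be absent, for example when the job writes elsewhere.
    foreach ($result['Outputs'] ?? [] as $output) {
        $location = $output['Location'];
        $uris[] = 's3://' . $location['Bucket'] . '/' . ($location['Key'] ?? '');
    }
    return $uris;
}
```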

DescribeJobRun

$result = $client->describeJobRun([/* ... */]);
$promise = $client->describeJobRunAsync([/* ... */]);

Returns the details of one run of a DataBrew job.

Parameter Syntax

$result = $client->describeJobRun([
    'Name' => '<string>', // REQUIRED
    'RunId' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the job being processed during this run.

RunId
Required: Yes
Type: string

The unique identifier of the job run.

Result Syntax

[
    'Attempt' => <integer>,
    'CompletedOn' => <DateTime>,
    'DataCatalogOutputs' => [
        [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'DatabaseOptions' => [
                'TableName' => '<string>',
                'TempDirectory' => [
                    'Bucket' => '<string>',
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'Overwrite' => true || false,
            'S3Options' => [
                'Location' => [
                    'Bucket' => '<string>',
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'TableName' => '<string>',
        ],
        // ...
    ],
    'DatabaseOutputs' => [
        [
            'DatabaseOptions' => [
                'TableName' => '<string>',
                'TempDirectory' => [
                    'Bucket' => '<string>',
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'DatabaseOutputMode' => 'NEW_TABLE',
            'GlueConnectionName' => '<string>',
        ],
        // ...
    ],
    'DatasetName' => '<string>',
    'ErrorMessage' => '<string>',
    'ExecutionTime' => <integer>,
    'JobName' => '<string>',
    'JobSample' => [
        'Mode' => 'FULL_DATASET|CUSTOM_ROWS',
        'Size' => <integer>,
    ],
    'LogGroupName' => '<string>',
    'LogSubscription' => 'ENABLE|DISABLE',
    'Outputs' => [
        [
            'CompressionFormat' => 'GZIP|LZ4|SNAPPY|BZIP2|DEFLATE|LZO|BROTLI|ZSTD|ZLIB',
            'Format' => 'CSV|JSON|PARQUET|GLUEPARQUET|AVRO|ORC|XML|TABLEAUHYPER',
            'FormatOptions' => [
                'Csv' => [
                    'Delimiter' => '<string>',
                ],
            ],
            'Location' => [
                'Bucket' => '<string>',
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
            'MaxOutputFiles' => <integer>,
            'Overwrite' => true || false,
            'PartitionColumns' => ['<string>', ...],
        ],
        // ...
    ],
    'ProfileConfiguration' => [
        'ColumnStatisticsConfigurations' => [
            [
                'Selectors' => [
                    [
                        'Name' => '<string>',
                        'Regex' => '<string>',
                    ],
                    // ...
                ],
                'Statistics' => [
                    'IncludedStatistics' => ['<string>', ...],
                    'Overrides' => [
                        [
                            'Parameters' => ['<string>', ...],
                            'Statistic' => '<string>',
                        ],
                        // ...
                    ],
                ],
            ],
            // ...
        ],
        'DatasetStatisticsConfiguration' => [
            'IncludedStatistics' => ['<string>', ...],
            'Overrides' => [
                [
                    'Parameters' => ['<string>', ...],
                    'Statistic' => '<string>',
                ],
                // ...
            ],
        ],
        'EntityDetectorConfiguration' => [
            'AllowedStatistics' => [
                [
                    'Statistics' => ['<string>', ...],
                ],
                // ...
            ],
            'EntityTypes' => ['<string>', ...],
        ],
        'ProfileColumns' => [
            [
                'Name' => '<string>',
                'Regex' => '<string>',
            ],
            // ...
        ],
    ],
    'RecipeReference' => [
        'Name' => '<string>',
        'RecipeVersion' => '<string>',
    ],
    'RunId' => '<string>',
    'StartedBy' => '<string>',
    'StartedOn' => <DateTime>,
    'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT',
    'ValidationConfigurations' => [
        [
            'RulesetArn' => '<string>',
            'ValidationMode' => 'CHECK_ALL',
        ],
        // ...
    ],
]

Result Details

Members
Attempt
Type: int

The number of times that DataBrew has attempted to run the job.

CompletedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the job completed processing.

DataCatalogOutputs
Type: Array of DataCatalogOutput structures

One or more artifacts that represent the Glue Data Catalog output from running the job.

DatabaseOutputs
Type: Array of DatabaseOutput structures

A list of JDBC database output objects that define the output destinations for a DataBrew recipe job to write to.

DatasetName
Type: string

The name of the dataset for the job to process.

ErrorMessage
Type: string

A message indicating an error (if any) that was encountered when the job ran.

ExecutionTime
Type: int

The amount of time, in seconds, during which the job run consumed resources.

JobName
Required: Yes
Type: string

The name of the job being processed during this run.

JobSample
Type: JobSample structure

Sample configuration for profile jobs only. Determines the number of rows on which the profile job will be executed. If a JobSample value is not provided, the default value will be used. The default value is CUSTOM_ROWS for the mode parameter and 20000 for the size parameter.

LogGroupName
Type: string

The name of an Amazon CloudWatch log group, where the job writes diagnostic messages when it runs.

LogSubscription
Type: string

The current status of Amazon CloudWatch logging for the job run.

Outputs
Type: Array of Output structures

One or more output artifacts from a job run.

ProfileConfiguration
Type: ProfileConfiguration structure

Configuration for profile jobs. Used to select columns, do evaluations, and override default parameters of evaluations. When configuration is null, the profile job will run with default settings.

RecipeReference
Type: RecipeReference structure

Represents the name and version of a DataBrew recipe.

RunId
Type: string

The unique identifier of the job run.

StartedBy
Type: string

The Amazon Resource Name (ARN) of the user who started the job run.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the job run began.

State
Type: string

The current state of the job run entity itself.

ValidationConfigurations
Type: Array of ValidationConfiguration structures

List of validation configurations that are applied to the profile job.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
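A common use of describeJobRun is polling until a run finishes. The following is a sketch under stated assumptions: the set of terminal states is inferred from the State values listed above (STARTING, RUNNING, and STOPPING are treated as still in progress), and waitForJobRun and JOB_RUN_TERMINAL_STATES are hypothetical names, not part of the SDK.

```php
<?php
// Assumption: these State values mean the run will not change any further.
const JOB_RUN_TERMINAL_STATES = ['STOPPED', 'SUCCEEDED', 'FAILED', 'TIMEOUT'];

// Hypothetical helper: calls $describe repeatedly until the run reaches a
// terminal state, then returns the last result. $describe is any callable
// that returns a describeJobRun result (Aws\Result or array).
function waitForJobRun(callable $describe, int $maxPolls = 60, int $delaySeconds = 10)
{
    for ($i = 0; $i < $maxPolls; $i++) {
        $run = $describe();
        if (in_array($run['State'], JOB_RUN_TERMINAL_STATES, true)) {
            return $run;
        }
        // Do not sleep after the final poll.
        if ($i < $maxPolls - 1) {
            sleep($delaySeconds);
        }
    }
    throw new RuntimeException("Job run did not finish after {$maxPolls} polls");
}

// Possible usage with a real client ($client and $runId are placeholders):
// $run = waitForJobRun(function () use ($client, $runId) {
//     return $client->describeJobRun(['Name' => 'my-job', 'RunId' => $runId]);
// });
```

Passing the call in as a callable keeps the polling logic independent of the client, which makes it easy to exercise without network access.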

DescribeProject

$result = $client->describeProject([/* ... */]);
$promise = $client->describeProjectAsync([/* ... */]);

Returns the definition of a specific DataBrew project.

Parameter Syntax

$result = $client->describeProject([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the project to be described.

Result Syntax

[
    'CreateDate' => <DateTime>,
    'CreatedBy' => '<string>',
    'DatasetName' => '<string>',
    'LastModifiedBy' => '<string>',
    'LastModifiedDate' => <DateTime>,
    'Name' => '<string>',
    'OpenDate' => <DateTime>,
    'OpenedBy' => '<string>',
    'RecipeName' => '<string>',
    'ResourceArn' => '<string>',
    'RoleArn' => '<string>',
    'Sample' => [
        'Size' => <integer>,
        'Type' => 'FIRST_N|LAST_N|RANDOM',
    ],
    'SessionStatus' => 'ASSIGNED|FAILED|INITIALIZING|PROVISIONING|READY|RECYCLING|ROTATING|TERMINATED|TERMINATING|UPDATING',
    'Tags' => ['<string>', ...],
]

Result Details

Members
CreateDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the project was created.

CreatedBy
Type: string

The identifier (user name) of the user who created the project.

DatasetName
Type: string

The dataset associated with the project.

LastModifiedBy
Type: string

The identifier (user name) of the user who last modified the project.

LastModifiedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the project was last modified.

Name
Required: Yes
Type: string

The name of the project.

OpenDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the project was opened.

OpenedBy
Type: string

The identifier (user name) of the user that opened the project for use.

RecipeName
Type: string

The recipe associated with this project.

ResourceArn
Type: string

The Amazon Resource Name (ARN) of the project.

RoleArn
Type: string

The ARN of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.

Sample
Type: Sample structure

Represents the sample size and sampling type for DataBrew to use for interactive data analysis.

SessionStatus
Type: string

Describes the current state of the session:

  • PROVISIONING - allocating resources for the session.

  • INITIALIZING - getting the session ready for first use.

  • ASSIGNED - the session is ready for use.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Metadata tags associated with this project.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
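The SessionStatus descriptions above can be used to decide whether a project session is usable. The sketch below relies only on the three states this page describes; the other enum values are reported as having no description here rather than guessed at. Both function names are hypothetical.

```php
<?php
// Hypothetical helper: true when the describeProject result reports a session
// that is ready for use (ASSIGNED, per the descriptions on this page).
function isSessionReady($result): bool
{
    return ($result['SessionStatus'] ?? null) === 'ASSIGNED';
}

// Hypothetical helper: maps a status to the description given on this page.
function describeSessionStatus(string $status): string
{
    $known = [
        'PROVISIONING' => 'allocating resources for the session',
        'INITIALIZING' => 'getting the session ready for first use',
        'ASSIGNED' => 'the session is ready for use',
    ];
    // Statuses without a description on this page are not interpreted.
    return $known[$status] ?? 'no description available';
}
```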

DescribeRecipe

$result = $client->describeRecipe([/* ... */]);
$promise = $client->describeRecipeAsync([/* ... */]);

Returns the definition of a specific DataBrew recipe corresponding to a particular version.

Parameter Syntax

$result = $client->describeRecipe([
    'Name' => '<string>', // REQUIRED
    'RecipeVersion' => '<string>',
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the recipe to be described.

RecipeVersion
Type: string

The recipe version identifier. If this parameter isn't specified, then the latest published version is returned.

Result Syntax

[
    'CreateDate' => <DateTime>,
    'CreatedBy' => '<string>',
    'Description' => '<string>',
    'LastModifiedBy' => '<string>',
    'LastModifiedDate' => <DateTime>,
    'Name' => '<string>',
    'ProjectName' => '<string>',
    'PublishedBy' => '<string>',
    'PublishedDate' => <DateTime>,
    'RecipeVersion' => '<string>',
    'ResourceArn' => '<string>',
    'Steps' => [
        [
            'Action' => [
                'Operation' => '<string>',
                'Parameters' => ['<string>', ...],
            ],
            'ConditionExpressions' => [
                [
                    'Condition' => '<string>',
                    'TargetColumn' => '<string>',
                    'Value' => '<string>',
                ],
                // ...
            ],
        ],
        // ...
    ],
    'Tags' => ['<string>', ...],
]

Result Details

Members
CreateDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the recipe was created.

CreatedBy
Type: string

The identifier (user name) of the user who created the recipe.

Description
Type: string

The description of the recipe.

LastModifiedBy
Type: string

The identifier (user name) of the user who last modified the recipe.

LastModifiedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the recipe was last modified.

Name
Required: Yes
Type: string

The name of the recipe.

ProjectName
Type: string

The name of the project associated with this recipe.

PublishedBy
Type: string

The identifier (user name) of the user who last published the recipe.

PublishedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the recipe was last published.

RecipeVersion
Type: string

The recipe version identifier.

ResourceArn
Type: string

The ARN of the recipe.

Steps
Type: Array of RecipeStep structures

One or more steps to be performed by the recipe. Each step consists of an action, and the conditions under which the action should succeed.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Metadata tags associated with this recipe.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
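To get a quick overview of a recipe, the Steps member of a describeRecipe result can be reduced to one line per step. This is a sketch with a hypothetical helper name and an arbitrary output format.

```php
<?php
// Hypothetical helper: one summary line per recipe step, in order.
// $result may be the Aws\Result returned by describeRecipe() or a plain array.
function summarizeRecipeSteps($result): array
{
    $lines = [];
    foreach ($result['Steps'] ?? [] as $index => $step) {
        $operation = $step['Action']['Operation'];
        // ConditionExpressions may be missing when a step has no conditions.
        $conditions = count($step['ConditionExpressions'] ?? []);
        $lines[] = sprintf('%d. %s (%d condition(s))', $index + 1, $operation, $conditions);
    }
    return $lines;
}
```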

DescribeRuleset

$result = $client->describeRuleset([/* ... */]);
$promise = $client->describeRulesetAsync([/* ... */]);

Retrieves detailed information about the ruleset.

Parameter Syntax

$result = $client->describeRuleset([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the ruleset to be described.

Result Syntax

[
    'CreateDate' => <DateTime>,
    'CreatedBy' => '<string>',
    'Description' => '<string>',
    'LastModifiedBy' => '<string>',
    'LastModifiedDate' => <DateTime>,
    'Name' => '<string>',
    'ResourceArn' => '<string>',
    'Rules' => [
        [
            'CheckExpression' => '<string>',
            'ColumnSelectors' => [
                [
                    'Name' => '<string>',
                    'Regex' => '<string>',
                ],
                // ...
            ],
            'Disabled' => true || false,
            'Name' => '<string>',
            'SubstitutionMap' => ['<string>', ...],
            'Threshold' => [
                'Type' => 'GREATER_THAN_OR_EQUAL|LESS_THAN_OR_EQUAL|GREATER_THAN|LESS_THAN',
                'Unit' => 'COUNT|PERCENTAGE',
                'Value' => <float>,
            ],
        ],
        // ...
    ],
    'Tags' => ['<string>', ...],
    'TargetArn' => '<string>',
]

Result Details

Members
CreateDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the ruleset was created.

CreatedBy
Type: string

The Amazon Resource Name (ARN) of the user who created the ruleset.

Description
Type: string

The description of the ruleset.

LastModifiedBy
Type: string

The Amazon Resource Name (ARN) of the user who last modified the ruleset.

LastModifiedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The modification date and time of the ruleset.

Name
Required: Yes
Type: string

The name of the ruleset.

ResourceArn
Type: string

The Amazon Resource Name (ARN) for the ruleset.

Rules
Type: Array of Rule structures

A list of rules that are defined with the ruleset. A rule includes one or more checks to be validated on a DataBrew dataset.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Metadata tags that have been applied to the ruleset.

TargetArn
Type: string

The Amazon Resource Name (ARN) of a resource (dataset) that the ruleset is associated with.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
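Each rule in a describeRuleset result carries a Disabled flag. The sketch below lists the names of the rules that are still active. It assumes that a rule without a Disabled key counts as enabled; the helper name is hypothetical.

```php
<?php
// Hypothetical helper: names of the rules that are not disabled.
// $result may be the Aws\Result returned by describeRuleset() or a plain array.
function enabledRuleNames($result): array
{
    $names = [];
    foreach ($result['Rules'] ?? [] as $rule) {
        // Assumption: a missing or false Disabled value means the rule is active.
        if (empty($rule['Disabled'])) {
            $names[] = $rule['Name'];
        }
    }
    return $names;
}
```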

DescribeSchedule

$result = $client->describeSchedule([/* ... */]);
$promise = $client->describeScheduleAsync([/* ... */]);

Returns the definition of a specific DataBrew schedule.

Parameter Syntax

$result = $client->describeSchedule([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the schedule to be described.

Result Syntax

[
    'CreateDate' => <DateTime>,
    'CreatedBy' => '<string>',
    'CronExpression' => '<string>',
    'JobNames' => ['<string>', ...],
    'LastModifiedBy' => '<string>',
    'LastModifiedDate' => <DateTime>,
    'Name' => '<string>',
    'ResourceArn' => '<string>',
    'Tags' => ['<string>', ...],
]

Result Details

Members
CreateDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the schedule was created.

CreatedBy
Type: string

The identifier (user name) of the user who created the schedule.

CronExpression
Type: string

The date or dates and time or times when the jobs are to be run for the schedule. For more information, see Cron expressions in the Glue DataBrew Developer Guide.

JobNames
Type: Array of strings

The name or names of one or more jobs to be run by using the schedule.

LastModifiedBy
Type: string

The identifier (user name) of the user who last modified the schedule.

LastModifiedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the schedule was last modified.

Name
Required: Yes
Type: string

The name of the schedule.

ResourceArn
Type: string

The Amazon Resource Name (ARN) of the schedule.

Tags
Type: Associative array of custom string keys (TagKey) to strings

Metadata tags associated with this schedule.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
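
As a rough illustration of reading the result members listed above, the following sketch turns a describeSchedule result into a one-line summary. The helper name summarizeSchedule is hypothetical and not part of the SDK; it only assumes that the result can be read with array syntax and that CreateDate, when present, is a DateTime-compatible object.

```php
<?php
// Hypothetical helper (not part of the SDK): builds a one-line summary from
// a describeSchedule() result. Works with a plain array or with any object
// that supports array access.
function summarizeSchedule($schedule): string
{
    $jobs = $schedule['JobNames'] ?? [];
    $created = $schedule['CreateDate'] ?? null;

    return sprintf(
        '%s runs [%s] on cron "%s" (created %s)',
        $schedule['Name'],
        implode(', ', $jobs),
        $schedule['CronExpression'] ?? 'n/a',
        $created instanceof DateTimeInterface ? $created->format('Y-m-d') : 'unknown'
    );
}

// Assumed usage with an already configured client:
// $result = $client->describeSchedule(['Name' => 'my-schedule']);
// echo summarizeSchedule($result), PHP_EOL;
```

Only Name is marked as required in the result, so the sketch falls back to placeholder text for the other members.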

ListDatasets

$result = $client->listDatasets([/* ... */]);
$promise = $client->listDatasetsAsync([/* ... */]);

Lists all of the DataBrew datasets.

Parameter Syntax

$result = $client->listDatasets([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of results to return in this request.

NextToken
Type: string

The token returned by a previous call to retrieve the next set of results.

Result Syntax

[
    'Datasets' => [
        [
            'AccountId' => '<string>',
            'CreateDate' => <DateTime>,
            'CreatedBy' => '<string>',
            'Format' => 'CSV|JSON|PARQUET|EXCEL|ORC',
            'FormatOptions' => [
                'Csv' => [
                    'Delimiter' => '<string>',
                    'HeaderRow' => true || false,
                ],
                'Excel' => [
                    'HeaderRow' => true || false,
                    'SheetIndexes' => [<integer>, ...],
                    'SheetNames' => ['<string>', ...],
                ],
                'Json' => [
                    'MultiLine' => true || false,
                ],
            ],
            'Input' => [
                'DataCatalogInputDefinition' => [
                    'CatalogId' => '<string>',
                    'DatabaseName' => '<string>',
                    'TableName' => '<string>',
                    'TempDirectory' => [
                        'Bucket' => '<string>',
                        'BucketOwner' => '<string>',
                        'Key' => '<string>',
                    ],
                ],
                'DatabaseInputDefinition' => [
                    'DatabaseTableName' => '<string>',
                    'GlueConnectionName' => '<string>',
                    'QueryString' => '<string>',
                    'TempDirectory' => [
                        'Bucket' => '<string>',
                        'BucketOwner' => '<string>',
                        'Key' => '<string>',
                    ],
                ],
                'Metadata' => [
                    'SourceArn' => '<string>',
                ],
                'S3InputDefinition' => [
                    'Bucket' => '<string>',
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'LastModifiedBy' => '<string>',
            'LastModifiedDate' => <DateTime>,
            'Name' => '<string>',
            'PathOptions' => [
                'FilesLimit' => [
                    'MaxFiles' => <integer>,
                    'Order' => 'DESCENDING|ASCENDING',
                    'OrderedBy' => 'LAST_MODIFIED_DATE',
                ],
                'LastModifiedDateCondition' => [
                    'Expression' => '<string>',
                    'ValuesMap' => ['<string>', ...],
                ],
                'Parameters' => [
                    '<PathParameterName>' => [
                        'CreateColumn' => true || false,
                        'DatetimeOptions' => [
                            'Format' => '<string>',
                            'LocaleCode' => '<string>',
                            'TimezoneOffset' => '<string>',
                        ],
                        'Filter' => [
                            'Expression' => '<string>',
                            'ValuesMap' => ['<string>', ...],
                        ],
                        'Name' => '<string>',
                        'Type' => 'Datetime|Number|String',
                    ],
                    // ...
                ],
            ],
            'ResourceArn' => '<string>',
            'Source' => 'S3|DATA-CATALOG|DATABASE',
            'Tags' => ['<string>', ...],
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
Datasets
Required: Yes
Type: Array of Dataset structures

A list of datasets that are defined.

NextToken
Type: string

A token that you can use in a subsequent call to retrieve the next set of results.

Errors

ValidationException:

The input parameters for this request failed validation.
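
The MaxResults and NextToken members imply a paging loop: call the operation, collect the items, and pass the returned NextToken into the next call until none is returned. A minimal sketch of that loop follows. The helper collectAllPages is hypothetical, and the list operation is passed in as a callable so the loop itself does not depend on the SDK. Where the SDK provides its own paginator for an operation, that may be the better choice.

```php
<?php
// Hypothetical helper: calls a list operation repeatedly, following
// NextToken until the response no longer contains one.
function collectAllPages(callable $listOperation, string $resultKey, array $params = []): array
{
    $items = [];
    unset($params['NextToken']); // always start from the first page

    do {
        $page = $listOperation($params);
        foreach ($page[$resultKey] ?? [] as $item) {
            $items[] = $item;
        }
        $nextToken = $page['NextToken'] ?? null;
        $params['NextToken'] = $nextToken;
    } while ($nextToken !== null && $nextToken !== '');

    return $items;
}

// Assumed usage with an already configured client:
// $datasets = collectAllPages(
//     function (array $p) use ($client) { return $client->listDatasets($p); },
//     'Datasets',
//     ['MaxResults' => 50]
// );
```

The same loop should apply to the other list operations on this page by changing the callable and the result key (for example 'Jobs' or 'Schedules').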

ListJobRuns

$result = $client->listJobRuns([/* ... */]);
$promise = $client->listJobRunsAsync([/* ... */]);

Lists all of the previous runs of a particular DataBrew job.

Parameter Syntax

$result = $client->listJobRuns([
    'MaxResults' => <integer>,
    'Name' => '<string>', // REQUIRED
    'NextToken' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of results to return in this request.

Name
Required: Yes
Type: string

The name of the job.

NextToken
Type: string

The token returned by a previous call to retrieve the next set of results.

Result Syntax

[
    'JobRuns' => [
        [
            'Attempt' => <integer>,
            'CompletedOn' => <DateTime>,
            'DataCatalogOutputs' => [
                [
                    'CatalogId' => '<string>',
                    'DatabaseName' => '<string>',
                    'DatabaseOptions' => [
                        'TableName' => '<string>',
                        'TempDirectory' => [
                            'Bucket' => '<string>',
                            'BucketOwner' => '<string>',
                            'Key' => '<string>',
                        ],
                    ],
                    'Overwrite' => true || false,
                    'S3Options' => [
                        'Location' => [
                            'Bucket' => '<string>',
                            'BucketOwner' => '<string>',
                            'Key' => '<string>',
                        ],
                    ],
                    'TableName' => '<string>',
                ],
                // ...
            ],
            'DatabaseOutputs' => [
                [
                    'DatabaseOptions' => [
                        'TableName' => '<string>',
                        'TempDirectory' => [
                            'Bucket' => '<string>',
                            'BucketOwner' => '<string>',
                            'Key' => '<string>',
                        ],
                    ],
                    'DatabaseOutputMode' => 'NEW_TABLE',
                    'GlueConnectionName' => '<string>',
                ],
                // ...
            ],
            'DatasetName' => '<string>',
            'ErrorMessage' => '<string>',
            'ExecutionTime' => <integer>,
            'JobName' => '<string>',
            'JobSample' => [
                'Mode' => 'FULL_DATASET|CUSTOM_ROWS',
                'Size' => <integer>,
            ],
            'LogGroupName' => '<string>',
            'LogSubscription' => 'ENABLE|DISABLE',
            'Outputs' => [
                [
                    'CompressionFormat' => 'GZIP|LZ4|SNAPPY|BZIP2|DEFLATE|LZO|BROTLI|ZSTD|ZLIB',
                    'Format' => 'CSV|JSON|PARQUET|GLUEPARQUET|AVRO|ORC|XML|TABLEAUHYPER',
                    'FormatOptions' => [
                        'Csv' => [
                            'Delimiter' => '<string>',
                        ],
                    ],
                    'Location' => [
                        'Bucket' => '<string>',
                        'BucketOwner' => '<string>',
                        'Key' => '<string>',
                    ],
                    'MaxOutputFiles' => <integer>,
                    'Overwrite' => true || false,
                    'PartitionColumns' => ['<string>', ...],
                ],
                // ...
            ],
            'RecipeReference' => [
                'Name' => '<string>',
                'RecipeVersion' => '<string>',
            ],
            'RunId' => '<string>',
            'StartedBy' => '<string>',
            'StartedOn' => <DateTime>,
            'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT',
            'ValidationConfigurations' => [
                [
                    'RulesetArn' => '<string>',
                    'ValidationMode' => 'CHECK_ALL',
                ],
                // ...
            ],
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
JobRuns
Required: Yes
Type: Array of JobRun structures

A list of job runs that have occurred for the specified job.

NextToken
Type: string

A token that you can use in a subsequent call to retrieve the next set of results.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
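
A common use of the JobRuns list is to find runs that did not finish successfully. The sketch below picks out runs whose State is FAILED or TIMEOUT (two of the State values shown in the result syntax) and maps each RunId to its ErrorMessage. The helper name failedRunMessages is hypothetical.

```php
<?php
// Hypothetical helper: maps RunId => ErrorMessage for runs that ended in
// the FAILED or TIMEOUT state.
function failedRunMessages(iterable $jobRuns): array
{
    $messages = [];
    foreach ($jobRuns as $run) {
        $state = $run['State'] ?? null;
        if ($state === 'FAILED' || $state === 'TIMEOUT') {
            $messages[$run['RunId']] = $run['ErrorMessage'] ?? '(no error message)';
        }
    }

    return $messages;
}

// Assumed usage with an already configured client:
// $result = $client->listJobRuns(['Name' => 'my-job']);
// print_r(failedRunMessages($result['JobRuns']));
```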

ListJobs

$result = $client->listJobs([/* ... */]);
$promise = $client->listJobsAsync([/* ... */]);

Lists all of the DataBrew jobs that are defined.

Parameter Syntax

$result = $client->listJobs([
    'DatasetName' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'ProjectName' => '<string>',
]);

Parameter Details

Members
DatasetName
Type: string

The name of a dataset. If specified, only jobs that act on this dataset are returned.

MaxResults
Type: int

The maximum number of results to return in this request.

NextToken
Type: string

A token generated by DataBrew that specifies where to continue pagination if a previous request was truncated. To get the next set of pages, pass in the NextToken value from the response object of the previous page call.

ProjectName
Type: string

The name of a project. If specified, only jobs that are associated with this project are returned.

Result Syntax

[
    'Jobs' => [
        [
            'AccountId' => '<string>',
            'CreateDate' => <DateTime>,
            'CreatedBy' => '<string>',
            'DataCatalogOutputs' => [
                [
                    'CatalogId' => '<string>',
                    'DatabaseName' => '<string>',
                    'DatabaseOptions' => [
                        'TableName' => '<string>',
                        'TempDirectory' => [
                            'Bucket' => '<string>',
                            'BucketOwner' => '<string>',
                            'Key' => '<string>',
                        ],
                    ],
                    'Overwrite' => true || false,
                    'S3Options' => [
                        'Location' => [
                            'Bucket' => '<string>',
                            'BucketOwner' => '<string>',
                            'Key' => '<string>',
                        ],
                    ],
                    'TableName' => '<string>',
                ],
                // ...
            ],
            'DatabaseOutputs' => [
                [
                    'DatabaseOptions' => [
                        'TableName' => '<string>',
                        'TempDirectory' => [
                            'Bucket' => '<string>',
                            'BucketOwner' => '<string>',
                            'Key' => '<string>',
                        ],
                    ],
                    'DatabaseOutputMode' => 'NEW_TABLE',
                    'GlueConnectionName' => '<string>',
                ],
                // ...
            ],
            'DatasetName' => '<string>',
            'EncryptionKeyArn' => '<string>',
            'EncryptionMode' => 'SSE-KMS|SSE-S3',
            'JobSample' => [
                'Mode' => 'FULL_DATASET|CUSTOM_ROWS',
                'Size' => <integer>,
            ],
            'LastModifiedBy' => '<string>',
            'LastModifiedDate' => <DateTime>,
            'LogSubscription' => 'ENABLE|DISABLE',
            'MaxCapacity' => <integer>,
            'MaxRetries' => <integer>,
            'Name' => '<string>',
            'Outputs' => [
                [
                    'CompressionFormat' => 'GZIP|LZ4|SNAPPY|BZIP2|DEFLATE|LZO|BROTLI|ZSTD|ZLIB',
                    'Format' => 'CSV|JSON|PARQUET|GLUEPARQUET|AVRO|ORC|XML|TABLEAUHYPER',
                    'FormatOptions' => [
                        'Csv' => [
                            'Delimiter' => '<string>',
                        ],
                    ],
                    'Location' => [
                        'Bucket' => '<string>',
                        'BucketOwner' => '<string>',
                        'Key' => '<string>',
                    ],
                    'MaxOutputFiles' => <integer>,
                    'Overwrite' => true || false,
                    'PartitionColumns' => ['<string>', ...],
                ],
                // ...
            ],
            'ProjectName' => '<string>',
            'RecipeReference' => [
                'Name' => '<string>',
                'RecipeVersion' => '<string>',
            ],
            'ResourceArn' => '<string>',
            'RoleArn' => '<string>',
            'Tags' => ['<string>', ...],
            'Timeout' => <integer>,
            'Type' => 'PROFILE|RECIPE',
            'ValidationConfigurations' => [
                [
                    'RulesetArn' => '<string>',
                    'ValidationMode' => 'CHECK_ALL',
                ],
                // ...
            ],
        ],
        // ...
    ],
    'NextToken' => '<string>',
]

Result Details

Members
Jobs
Required: Yes
Type: Array of Job structures

A list of jobs that are defined.

NextToken
Type: string

A token that you can use in a subsequent call to retrieve the next set of results.

Errors

ValidationException:

The input parameters for this request failed validation.
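
All listJobs parameters are optional, so a caller typically includes only the filters it actually has. One possible way to build the parameter array is sketched below. The helper name buildListJobsParams is hypothetical.

```php
<?php
// Hypothetical helper: builds a listJobs() parameter array and leaves out
// any filter that was not supplied.
function buildListJobsParams(
    ?string $datasetName = null,
    ?string $projectName = null,
    ?int $maxResults = null
): array {
    $params = [
        'DatasetName' => $datasetName,
        'ProjectName' => $projectName,
        'MaxResults'  => $maxResults,
    ];

    // Keep only the members that were actually provided.
    return array_filter($params, static fn($value) => $value !== null);
}

// Assumed usage with an already configured client:
// $result = $client->listJobs(buildListJobsParams('my-dataset'));
```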

ListProjects

$result = $client->listProjects([/* ... */]);
$promise = $client->listProjectsAsync([/* ... */]);

Lists all of the DataBrew projects that are defined.

Parameter Syntax

$result = $client->listProjects([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of results to return in this request.

NextToken
Type: string

The token returned by a previous call to retrieve the next set of results.

Result Syntax

[
    'NextToken' => '<string>',
    'Projects' => [
        [
            'AccountId' => '<string>',
            'CreateDate' => <DateTime>,
            'CreatedBy' => '<string>',
            'DatasetName' => '<string>',
            'LastModifiedBy' => '<string>',
            'LastModifiedDate' => <DateTime>,
            'Name' => '<string>',
            'OpenDate' => <DateTime>,
            'OpenedBy' => '<string>',
            'RecipeName' => '<string>',
            'ResourceArn' => '<string>',
            'RoleArn' => '<string>',
            'Sample' => [
                'Size' => <integer>,
                'Type' => 'FIRST_N|LAST_N|RANDOM',
            ],
            'Tags' => ['<string>', ...],
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A token that you can use in a subsequent call to retrieve the next set of results.

Projects
Required: Yes
Type: Array of Project structures

A list of projects that are defined.

Errors

ValidationException:

The input parameters for this request failed validation.

ListRecipeVersions

$result = $client->listRecipeVersions([/* ... */]);
$promise = $client->listRecipeVersionsAsync([/* ... */]);

Lists the versions of a particular DataBrew recipe, except for LATEST_WORKING.

Parameter Syntax

$result = $client->listRecipeVersions([
    'MaxResults' => <integer>,
    'Name' => '<string>', // REQUIRED
    'NextToken' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of results to return in this request.

Name
Required: Yes
Type: string

The name of the recipe for which to return version information.

NextToken
Type: string

The token returned by a previous call to retrieve the next set of results.

Result Syntax

[
    'NextToken' => '<string>',
    'Recipes' => [
        [
            'CreateDate' => <DateTime>,
            'CreatedBy' => '<string>',
            'Description' => '<string>',
            'LastModifiedBy' => '<string>',
            'LastModifiedDate' => <DateTime>,
            'Name' => '<string>',
            'ProjectName' => '<string>',
            'PublishedBy' => '<string>',
            'PublishedDate' => <DateTime>,
            'RecipeVersion' => '<string>',
            'ResourceArn' => '<string>',
            'Steps' => [
                [
                    'Action' => [
                        'Operation' => '<string>',
                        'Parameters' => ['<string>', ...],
                    ],
                    'ConditionExpressions' => [
                        [
                            'Condition' => '<string>',
                            'TargetColumn' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                ],
                // ...
            ],
            'Tags' => ['<string>', ...],
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A token that you can use in a subsequent call to retrieve the next set of results.

Recipes
Required: Yes
Type: Array of Recipe structures

A list of versions for the specified recipe.

Errors

ValidationException:

The input parameters for this request failed validation.
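
To pick the most recent entry from the returned Recipes list, the RecipeVersion strings can be compared. The sketch below assumes that published versions are numeric strings such as "1.0" or "2.1"; this format is an assumption and is not stated on this page, so any value that does not match it is skipped. The helper name highestRecipeVersion is hypothetical.

```php
<?php
// Hypothetical helper: returns the highest numeric RecipeVersion found in a
// list of recipe entries, or null if none match the assumed "X" or "X.Y" form.
function highestRecipeVersion(iterable $recipes): ?string
{
    $highest = null;
    foreach ($recipes as $recipe) {
        $version = $recipe['RecipeVersion'] ?? null;
        // Assumption: versions look like "1.0"; skip anything else.
        if ($version === null || !preg_match('/^\d+(\.\d+)?$/', $version)) {
            continue;
        }
        if ($highest === null || version_compare($version, $highest, '>')) {
            $highest = $version;
        }
    }

    return $highest;
}

// Assumed usage with an already configured client:
// $result = $client->listRecipeVersions(['Name' => 'my-recipe']);
// echo highestRecipeVersion($result['Recipes']) ?? 'none', PHP_EOL;
```

version_compare is used instead of a plain string comparison so that "10.0" sorts above "9.1".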

ListRecipes

$result = $client->listRecipes([/* ... */]);
$promise = $client->listRecipesAsync([/* ... */]);

Lists all of the DataBrew recipes that are defined.

Parameter Syntax

$result = $client->listRecipes([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'RecipeVersion' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of results to return in this request.

NextToken
Type: string

The token returned by a previous call to retrieve the next set of results.

RecipeVersion
Type: string

Return only those recipes with a version identifier of LATEST_WORKING or LATEST_PUBLISHED. If RecipeVersion is omitted, ListRecipes returns all of the LATEST_PUBLISHED recipe versions.

Valid values: LATEST_WORKING | LATEST_PUBLISHED

Result Syntax

[
    'NextToken' => '<string>',
    'Recipes' => [
        [
            'CreateDate' => <DateTime>,
            'CreatedBy' => '<string>',
            'Description' => '<string>',
            'LastModifiedBy' => '<string>',
            'LastModifiedDate' => <DateTime>,
            'Name' => '<string>',
            'ProjectName' => '<string>',
            'PublishedBy' => '<string>',
            'PublishedDate' => <DateTime>,
            'RecipeVersion' => '<string>',
            'ResourceArn' => '<string>',
            'Steps' => [
                [
                    'Action' => [
                        'Operation' => '<string>',
                        'Parameters' => ['<string>', ...],
                    ],
                    'ConditionExpressions' => [
                        [
                            'Condition' => '<string>',
                            'TargetColumn' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                ],
                // ...
            ],
            'Tags' => ['<string>', ...],
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A token that you can use in a subsequent call to retrieve the next set of results.

Recipes
Required: Yes
Type: Array of Recipe structures

A list of recipes that are defined.

Errors

ValidationException:

The input parameters for this request failed validation.
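
Because RecipeVersion accepts only LATEST_WORKING or LATEST_PUBLISHED for this operation, a caller may prefer to reject other values locally instead of waiting for a ValidationException. A minimal sketch, with the hypothetical helper name buildListRecipesParams:

```php
<?php
// Hypothetical helper: validates RecipeVersion against the two documented
// values before building the listRecipes() parameter array.
function buildListRecipesParams(?string $recipeVersion = null, ?int $maxResults = null): array
{
    $validVersions = ['LATEST_WORKING', 'LATEST_PUBLISHED'];
    if ($recipeVersion !== null && !in_array($recipeVersion, $validVersions, true)) {
        throw new InvalidArgumentException(
            'RecipeVersion must be one of: ' . implode(', ', $validVersions)
        );
    }

    $params = [];
    if ($recipeVersion !== null) {
        $params['RecipeVersion'] = $recipeVersion;
    }
    if ($maxResults !== null) {
        $params['MaxResults'] = $maxResults;
    }

    return $params;
}

// Assumed usage with an already configured client:
// $result = $client->listRecipes(buildListRecipesParams('LATEST_WORKING'));
```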

ListRulesets

$result = $client->listRulesets([/* ... */]);
$promise = $client->listRulesetsAsync([/* ... */]);

Lists all rulesets available in the current account or rulesets associated with a specific resource (dataset).

Parameter Syntax

$result = $client->listRulesets([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'TargetArn' => '<string>',
]);

Parameter Details

Members
MaxResults
Type: int

The maximum number of results to return in this request.

NextToken
Type: string

A token generated by DataBrew that specifies where to continue pagination if a previous request was truncated. To get the next set of pages, pass in the NextToken value from the response object of the previous page call.

TargetArn
Type: string

The Amazon Resource Name (ARN) of a resource (dataset). If specified, only rulesets that are associated with this resource are returned.

Result Syntax

[
    'NextToken' => '<string>',
    'Rulesets' => [
        [
            'AccountId' => '<string>',
            'CreateDate' => <DateTime>,
            'CreatedBy' => '<string>',
            'Description' => '<string>',
            'LastModifiedBy' => '<string>',
            'LastModifiedDate' => <DateTime>,
            'Name' => '<string>',
            'ResourceArn' => '<string>',
            'RuleCount' => <integer>,
            'Tags' => ['<string>', ...],
            'TargetArn' => '<string>',
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A token that you can use in a subsequent call to retrieve the next set of results.

Rulesets
Required: Yes
Type: Array of RulesetItem structures

A list of RulesetItem structures. Each RulesetItem contains metadata of a ruleset.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.

ListSchedules

$result = $client->listSchedules([/* ... */]);
$promise = $client->listSchedulesAsync([/* ... */]);

Lists the DataBrew schedules that are defined.

Parameter Syntax

$result = $client->listSchedules([
    'JobName' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);

Parameter Details

Members
JobName
Type: string

The name of the job that these schedules apply to.

MaxResults
Type: int

The maximum number of results to return in this request.

NextToken
Type: string

The token returned by a previous call to retrieve the next set of results.

Result Syntax

[
    'NextToken' => '<string>',
    'Schedules' => [
        [
            'AccountId' => '<string>',
            'CreateDate' => <DateTime>,
            'CreatedBy' => '<string>',
            'CronExpression' => '<string>',
            'JobNames' => ['<string>', ...],
            'LastModifiedBy' => '<string>',
            'LastModifiedDate' => <DateTime>,
            'Name' => '<string>',
            'ResourceArn' => '<string>',
            'Tags' => ['<string>', ...],
        ],
        // ...
    ],
]

Result Details

Members
NextToken
Type: string

A token that you can use in a subsequent call to retrieve the next set of results.

Schedules
Required: Yes
Type: Array of Schedule structures

A list of schedules that are defined.

Errors

ValidationException:

The input parameters for this request failed validation.

ListTagsForResource

$result = $client->listTagsForResource([/* ... */]);
$promise = $client->listTagsForResourceAsync([/* ... */]);

Lists all the tags for a DataBrew resource.

Parameter Syntax

$result = $client->listTagsForResource([
    'ResourceArn' => '<string>', // REQUIRED
]);

Parameter Details

Members
ResourceArn
Required: Yes
Type: string

The Amazon Resource Name (ARN) string that uniquely identifies the DataBrew resource.

Result Syntax

[
    'Tags' => ['<string>', ...],
]

Result Details

Members
Tags
Type: Associative array of custom string keys (TagKey) to strings

A list of tags associated with the DataBrew resource.

Errors

InternalServerException:

An internal service failure occurred.

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
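
Since Tags is described as an associative array from tag key to tag value, it can be walked with a key/value foreach. The sketch below renders the tags as sorted "key=value" strings. The helper name formatTags is hypothetical.

```php
<?php
// Hypothetical helper: renders an associative tag array as sorted
// "key=value" strings.
function formatTags(array $tags): array
{
    $lines = [];
    foreach ($tags as $key => $value) {
        $lines[] = $key . '=' . $value;
    }
    sort($lines, SORT_STRING);

    return $lines;
}

// Assumed usage with an already configured client:
// $result = $client->listTagsForResource(['ResourceArn' => $arn]);
// echo implode(PHP_EOL, formatTags($result['Tags'] ?? [])), PHP_EOL;
```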

PublishRecipe

$result = $client->publishRecipe([/* ... */]);
$promise = $client->publishRecipeAsync([/* ... */]);

Publishes a new version of a DataBrew recipe.

Parameter Syntax

$result = $client->publishRecipe([
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Description
Type: string

A description of the recipe to be published, for this version of the recipe.

Name
Required: Yes
Type: string

The name of the recipe to be published.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the recipe that you published.

Errors

ValidationException:

The input parameters for this request failed validation.

ResourceNotFoundException:

One or more resources can't be found.

ServiceQuotaExceededException:

A service quota is exceeded.

SendProjectSessionAction

$result = $client->sendProjectSessionAction([/* ... */]);
$promise = $client->sendProjectSessionActionAsync([/* ... */]);

Performs a recipe step within an interactive DataBrew session that's currently open.

Parameter Syntax

$result = $client->sendProjectSessionAction([
    'ClientSessionId' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Preview' => true || false,
    'RecipeStep' => [
        'Action' => [ // REQUIRED
            'Operation' => '<string>', // REQUIRED
            'Parameters' => ['<string>', ...],
        ],
        'ConditionExpressions' => [
            [
                'Condition' => '<string>', // REQUIRED
                'TargetColumn' => '<string>', // REQUIRED
                'Value' => '<string>',
            ],
            // ...
        ],
    ],
    'StepIndex' => <integer>,
    'ViewFrame' => [
        'Analytics' => 'ENABLE|DISABLE',
        'ColumnRange' => <integer>,
        'HiddenColumns' => ['<string>', ...],
        'RowRange' => <integer>,
        'StartColumnIndex' => <integer>, // REQUIRED
        'StartRowIndex' => <integer>,
    ],
]);

Parameter Details

Members
ClientSessionId
Type: string

A unique identifier for an interactive session that's currently open and ready for work. The action will be performed on this session.

Name
Required: Yes
Type: string

The name of the project to apply the action to.

Preview
Type: boolean

If true, the result of the recipe step will be returned, but not applied.

RecipeStep
Type: RecipeStep structure

Represents a single step from a DataBrew recipe to be performed.

StepIndex
Type: int

The index from which to preview a step. This index is used to preview the result of steps that have already been applied, so that the resulting view frame is from earlier in the view frame stack.

ViewFrame
Type: ViewFrame structure

Represents the data being transformed during an action.

Result Syntax

[
    'ActionId' => <integer>,
    'Name' => '<string>',
    'Result' => '<string>',
]

Result Details

Members
ActionId
Type: int

A unique identifier for the action that was performed.

Name
Required: Yes
Type: string

The name of the project that was affected by the action.

Result
Type: string

A message indicating the result of performing the action.

Errors

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
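
The RecipeStep parameter is nested, with Action and Action.Operation marked as required, and Condition and TargetColumn required in each ConditionExpressions entry. The sketch below assembles such a structure and checks those required members locally. The helper name buildRecipeStep is hypothetical, and the operation and parameter names in the usage comment are illustrative placeholders only. It also assumes that Parameters is a map of parameter names to string values.

```php
<?php
// Hypothetical helper: assembles a RecipeStep array and checks the members
// that the parameter syntax marks as required.
function buildRecipeStep(string $operation, array $parameters = [], array $conditions = []): array
{
    if ($operation === '') {
        throw new InvalidArgumentException('Action.Operation is required.');
    }

    $step = ['Action' => ['Operation' => $operation]];
    if ($parameters !== []) {
        // Assumption: a map of parameter name => string value.
        $step['Action']['Parameters'] = $parameters;
    }

    foreach ($conditions as $index => $condition) {
        foreach (['Condition', 'TargetColumn'] as $requiredKey) {
            if (!isset($condition[$requiredKey]) || $condition[$requiredKey] === '') {
                throw new InvalidArgumentException(
                    "ConditionExpressions[$index] is missing required key $requiredKey."
                );
            }
        }
        $step['ConditionExpressions'][] = $condition;
    }

    return $step;
}

// Assumed usage with an already configured client. The operation name and
// parameter name below are illustrative placeholders:
// $client->sendProjectSessionAction([
//     'Name'       => 'my-project',
//     'Preview'    => true,
//     'RecipeStep' => buildRecipeStep('UPPER_CASE', ['sourceColumn' => 'name']),
// ]);
```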

StartJobRun

$result = $client->startJobRun([/* ... */]);
$promise = $client->startJobRunAsync([/* ... */]);

Runs a DataBrew job.

Parameter Syntax

$result = $client->startJobRun([
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the job to be run.

Result Syntax

[
    'RunId' => '<string>',
]

Result Details

Members
RunId
Required: Yes
Type: string

A system-generated identifier for this particular job run.

Errors

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ResourceNotFoundException:

One or more resources can't be found.

ServiceQuotaExceededException:

A service quota is exceeded.

ValidationException:

The input parameters for this request failed validation.
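
startJobRun returns only a RunId, so a caller that needs the outcome has to poll for it. The following sketch polls until the run reaches one of the end states listed for State elsewhere on this page (STOPPED, SUCCEEDED, FAILED, TIMEOUT). It assumes that the client's describeJobRun operation accepts Name and RunId and returns a State member. The helper name waitForJobRun and its default poll settings are hypothetical.

```php
<?php
// Hypothetical helper: polls a describe-job-run callable until the run
// reaches an end state or the poll budget is used up.
function waitForJobRun(
    callable $describeJobRun,
    string $jobName,
    string $runId,
    int $maxPolls = 60,
    int $sleepSeconds = 10
): string {
    $endStates = ['STOPPED', 'SUCCEEDED', 'FAILED', 'TIMEOUT'];
    $state = 'UNKNOWN';

    for ($poll = 0; $poll < $maxPolls; $poll++) {
        $run = $describeJobRun(['Name' => $jobName, 'RunId' => $runId]);
        $state = $run['State'] ?? 'UNKNOWN';
        if (in_array($state, $endStates, true)) {
            return $state;
        }
        if ($sleepSeconds > 0) {
            sleep($sleepSeconds);
        }
    }

    // Poll budget exhausted; return the last state that was observed.
    return $state;
}

// Assumed usage with an already configured client:
// $run = $client->startJobRun(['Name' => 'my-job']);
// $state = waitForJobRun(
//     function (array $p) use ($client) { return $client->describeJobRun($p); },
//     'my-job',
//     $run['RunId']
// );
```

Passing the describe call as a callable keeps the loop easy to exercise with a stub and free of any direct SDK dependency.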

StartProjectSession

$result = $client->startProjectSession([/* ... */]);
$promise = $client->startProjectSessionAsync([/* ... */]);

Creates an interactive session, enabling you to manipulate data in a DataBrew project.

Parameter Syntax

$result = $client->startProjectSession([
    'AssumeControl' => true || false,
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
AssumeControl
Type: boolean

A value that, if true, enables you to take control of a session, even if a different client is currently accessing the project.

Name
Required: Yes
Type: string

The name of the project to act upon.

Result Syntax

[
    'ClientSessionId' => '<string>',
    'Name' => '<string>',
]

Result Details

Members
ClientSessionId
Type: string

A system-generated identifier for the session.

Name
Required: Yes
Type: string

The name of the project to be acted upon.

Errors

ConflictException:

Updating or deleting a resource can cause an inconsistent state.

ResourceNotFoundException:

One or more resources can't be found.

ServiceQuotaExceededException:

A service quota is exceeded.

ValidationException:

The input parameters for this request failed validation.
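
The ClientSessionId returned here matches the ClientSessionId parameter of sendProjectSessionAction, which suggests the following flow: start a session, then send a step with Preview set to true so that the step's result is returned but not applied. The sketch below wires the two calls together through callables. The helper name previewStepInSession is hypothetical, and the sketch does not wait for the session to become ready, which a real caller may need to do.

```php
<?php
// Hypothetical helper: starts a project session, then previews one recipe
// step in it and returns the Result message.
function previewStepInSession(
    callable $startSession,
    callable $sendAction,
    string $projectName,
    array $recipeStep
): string {
    $session = $startSession(['Name' => $projectName, 'AssumeControl' => false]);

    $params = [
        'Name'       => $projectName,
        'Preview'    => true, // return the step's result without applying it
        'RecipeStep' => $recipeStep,
    ];
    if (!empty($session['ClientSessionId'])) {
        $params['ClientSessionId'] = $session['ClientSessionId'];
    }

    $result = $sendAction($params);

    return (string) ($result['Result'] ?? '');
}

// Assumed usage with an already configured client:
// $message = previewStepInSession(
//     function (array $p) use ($client) { return $client->startProjectSession($p); },
//     function (array $p) use ($client) { return $client->sendProjectSessionAction($p); },
//     'my-project',
//     $recipeStep
// );
```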

StopJobRun

$result = $client->stopJobRun([/* ... */]);
$promise = $client->stopJobRunAsync([/* ... */]);

Stops a particular run of a job.

Parameter Syntax

$result = $client->stopJobRun([
    'Name' => '<string>', // REQUIRED
    'RunId' => '<string>', // REQUIRED
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the job to be stopped.

RunId
Required: Yes
Type: string

The ID of the job run to be stopped.

Result Syntax

[
    'RunId' => '<string>',
]

Result Details

Members
RunId
Required: Yes
Type: string

The ID of the job run that you stopped.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.

TagResource

$result = $client->tagResource([/* ... */]);
$promise = $client->tagResourceAsync([/* ... */]);

Adds metadata tags to a DataBrew resource, such as a dataset, project, recipe, job, or schedule.

Parameter Syntax

$result = $client->tagResource([
    'ResourceArn' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
ResourceArn
Required: Yes
Type: string

The DataBrew resource to which tags should be added. The value for this parameter is an Amazon Resource Name (ARN). For DataBrew, you can tag a dataset, a job, a project, or a recipe.

Tags
Required: Yes
Type: Associative array of custom string keys (TagKey) to strings

One or more tags to be assigned to the resource.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InternalServerException:

An internal service failure occurred.

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.

UntagResource

$result = $client->untagResource([/* ... */]);
$promise = $client->untagResourceAsync([/* ... */]);

Removes metadata tags from a DataBrew resource.

Parameter Syntax

$result = $client->untagResource([
    'ResourceArn' => '<string>', // REQUIRED
    'TagKeys' => ['<string>', ...], // REQUIRED
]);

Parameter Details

Members
ResourceArn
Required: Yes
Type: string

A DataBrew resource from which you want to remove a tag or tags. The value for this parameter is an Amazon Resource Name (ARN).

TagKeys
Required: Yes
Type: Array of strings

The tag keys (names) of one or more tags to be removed.

Result Syntax

[]

Result Details

The results for this operation are always empty.

Errors

InternalServerException:

An internal service failure occurred.

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
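Unlike Tags in TagResource, TagKeys is a plain list of strings. The following sketch (buildUntagResourceParams is a hypothetical helper name) normalizes the list by removing duplicates and re-indexing it before it is passed to untagResource().

```php
<?php
// Hypothetical helper: builds the parameter array for untagResource().
function buildUntagResourceParams(string $resourceArn, array $tagKeys): array
{
    // Cast to strings, drop duplicates, and re-index so that the value
    // is sent as a list rather than as a map with gaps in its keys.
    $keys = array_values(array_unique(array_map('strval', $tagKeys)));

    if ($keys === []) {
        throw new InvalidArgumentException('At least one tag key is required.');
    }

    return [
        'ResourceArn' => $resourceArn,
        'TagKeys'     => $keys,
    ];
}
```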

UpdateDataset

$result = $client->updateDataset([/* ... */]);
$promise = $client->updateDatasetAsync([/* ... */]);

Modifies the definition of an existing DataBrew dataset.

Parameter Syntax

$result = $client->updateDataset([
    'Format' => 'CSV|JSON|PARQUET|EXCEL|ORC',
    'FormatOptions' => [
        'Csv' => [
            'Delimiter' => '<string>',
            'HeaderRow' => true || false,
        ],
        'Excel' => [
            'HeaderRow' => true || false,
            'SheetIndexes' => [<integer>, ...],
            'SheetNames' => ['<string>', ...],
        ],
        'Json' => [
            'MultiLine' => true || false,
        ],
    ],
    'Input' => [ // REQUIRED
        'DataCatalogInputDefinition' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
            'TempDirectory' => [
                'Bucket' => '<string>', // REQUIRED
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
        ],
        'DatabaseInputDefinition' => [
            'DatabaseTableName' => '<string>',
            'GlueConnectionName' => '<string>', // REQUIRED
            'QueryString' => '<string>',
            'TempDirectory' => [
                'Bucket' => '<string>', // REQUIRED
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
        ],
        'Metadata' => [
            'SourceArn' => '<string>',
        ],
        'S3InputDefinition' => [
            'Bucket' => '<string>', // REQUIRED
            'BucketOwner' => '<string>',
            'Key' => '<string>',
        ],
    ],
    'Name' => '<string>', // REQUIRED
    'PathOptions' => [
        'FilesLimit' => [
            'MaxFiles' => <integer>, // REQUIRED
            'Order' => 'DESCENDING|ASCENDING',
            'OrderedBy' => 'LAST_MODIFIED_DATE',
        ],
        'LastModifiedDateCondition' => [
            'Expression' => '<string>', // REQUIRED
            'ValuesMap' => ['<string>', ...], // REQUIRED
        ],
        'Parameters' => [
            '<PathParameterName>' => [
                'CreateColumn' => true || false,
                'DatetimeOptions' => [
                    'Format' => '<string>', // REQUIRED
                    'LocaleCode' => '<string>',
                    'TimezoneOffset' => '<string>',
                ],
                'Filter' => [
                    'Expression' => '<string>', // REQUIRED
                    'ValuesMap' => ['<string>', ...], // REQUIRED
                ],
                'Name' => '<string>', // REQUIRED
                'Type' => 'Datetime|Number|String', // REQUIRED
            ],
            // ...
        ],
    ],
]);

Parameter Details

Members
Format
Type: string

The file format of a dataset that is created from an Amazon S3 file or folder.

FormatOptions
Type: FormatOptions structure

Represents a set of options that define the structure of either comma-separated value (CSV), Excel, or JSON input.

Input
Required: Yes
Type: Input structure

Represents information on how DataBrew can find data, in either the Glue Data Catalog or Amazon S3.

Name
Required: Yes
Type: string

The name of the dataset to be updated.

PathOptions
Type: PathOptions structure

A set of options that defines how DataBrew interprets an Amazon S3 path of the dataset.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the dataset that you updated.

Errors

AccessDeniedException:

Access to the specified resource was denied.

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
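The UpdateDataset parameter syntax is large, but a typical request fills in only a few branches. As a sketch under stated assumptions, the hypothetical helper below builds parameters for a CSV dataset stored under an Amazon S3 prefix and limits selection to the most recently modified files. The comma delimiter, the header row setting, and the check that MaxFiles is at least 1 are choices made for this example, not requirements stated on this page.

```php
<?php
// Hypothetical helper: builds updateDataset() parameters for a CSV dataset
// in Amazon S3, restricted to the $maxFiles most recently modified files.
function buildCsvDatasetUpdate(string $name, string $bucket, string $key, int $maxFiles): array
{
    if ($maxFiles < 1) {
        // Assumption of this sketch: a limit below 1 is not meaningful.
        throw new InvalidArgumentException('MaxFiles must be at least 1.');
    }

    return [
        'Name'   => $name,   // REQUIRED
        'Format' => 'CSV',
        'FormatOptions' => [
            'Csv' => ['Delimiter' => ',', 'HeaderRow' => true],
        ],
        'Input' => [         // REQUIRED
            'S3InputDefinition' => ['Bucket' => $bucket, 'Key' => $key],
        ],
        'PathOptions' => [
            'FilesLimit' => [
                'MaxFiles'  => $maxFiles,
                'Order'     => 'DESCENDING',
                'OrderedBy' => 'LAST_MODIFIED_DATE',
            ],
        ],
    ];
}
```

The result can be passed to $client->updateDataset(); only one of the input definitions under Input is populated here.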

UpdateProfileJob

$result = $client->updateProfileJob([/* ... */]);
$promise = $client->updateProfileJobAsync([/* ... */]);

Modifies the definition of an existing profile job.

Parameter Syntax

$result = $client->updateProfileJob([
    'Configuration' => [
        'ColumnStatisticsConfigurations' => [
            [
                'Selectors' => [
                    [
                        'Name' => '<string>',
                        'Regex' => '<string>',
                    ],
                    // ...
                ],
                'Statistics' => [ // REQUIRED
                    'IncludedStatistics' => ['<string>', ...],
                    'Overrides' => [
                        [
                            'Parameters' => ['<string>', ...], // REQUIRED
                            'Statistic' => '<string>', // REQUIRED
                        ],
                        // ...
                    ],
                ],
            ],
            // ...
        ],
        'DatasetStatisticsConfiguration' => [
            'IncludedStatistics' => ['<string>', ...],
            'Overrides' => [
                [
                    'Parameters' => ['<string>', ...], // REQUIRED
                    'Statistic' => '<string>', // REQUIRED
                ],
                // ...
            ],
        ],
        'EntityDetectorConfiguration' => [
            'AllowedStatistics' => [
                [
                    'Statistics' => ['<string>', ...], // REQUIRED
                ],
                // ...
            ],
            'EntityTypes' => ['<string>', ...], // REQUIRED
        ],
        'ProfileColumns' => [
            [
                'Name' => '<string>',
                'Regex' => '<string>',
            ],
            // ...
        ],
    ],
    'EncryptionKeyArn' => '<string>',
    'EncryptionMode' => 'SSE-KMS|SSE-S3',
    'JobSample' => [
        'Mode' => 'FULL_DATASET|CUSTOM_ROWS',
        'Size' => <integer>,
    ],
    'LogSubscription' => 'ENABLE|DISABLE',
    'MaxCapacity' => <integer>,
    'MaxRetries' => <integer>,
    'Name' => '<string>', // REQUIRED
    'OutputLocation' => [ // REQUIRED
        'Bucket' => '<string>', // REQUIRED
        'BucketOwner' => '<string>',
        'Key' => '<string>',
    ],
    'RoleArn' => '<string>', // REQUIRED
    'Timeout' => <integer>,
    'ValidationConfigurations' => [
        [
            'RulesetArn' => '<string>', // REQUIRED
            'ValidationMode' => 'CHECK_ALL',
        ],
        // ...
    ],
]);

Parameter Details

Members
Configuration
Type: ProfileConfiguration structure

Configuration for profile jobs. Used to select columns, do evaluations, and override default parameters of evaluations. When configuration is null, the profile job will run with default settings.

EncryptionKeyArn
Type: string

The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.

EncryptionMode
Type: string

The encryption mode for the job, which can be one of the following:

  • SSE-KMS - Server-side encryption with keys managed by KMS.

  • SSE-S3 - Server-side encryption with keys managed by Amazon S3.

JobSample
Type: JobSample structure

Sample configuration for profile jobs only. Determines the number of rows on which the profile job is run. If a JobSample value isn't provided, the default value is used. The default value is CUSTOM_ROWS for the Mode parameter and 20000 for the Size parameter.

LogSubscription
Type: string

Enables or disables Amazon CloudWatch logging for the job. If logging is enabled, CloudWatch writes one log stream for each job run.

MaxCapacity
Type: int

The maximum number of compute nodes that DataBrew can use when the job processes data.

MaxRetries
Type: int

The maximum number of times to retry the job after a job run fails.

Name
Required: Yes
Type: string

The name of the job to be updated.

OutputLocation
Required: Yes
Type: S3Location structure

Represents an Amazon S3 location (bucket name, bucket owner, and object key) where DataBrew can read input data, or write output from a job.

RoleArn
Required: Yes
Type: string

The Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.

Timeout
Type: int

The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.

ValidationConfigurations
Type: Array of ValidationConfiguration structures

List of validation configurations that are applied to the profile job.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the job that was updated.

Errors

AccessDeniedException:

Access to the specified resource was denied.

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
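To illustrate how the optional JobSample member interacts with its documented default, the sketch below builds a minimal UpdateProfileJob request. The helper name buildProfileJobUpdate is hypothetical. When no row count is given, JobSample is omitted so that the default described above (CUSTOM_ROWS with a size of 20000) applies.

```php
<?php
// Hypothetical helper: builds a minimal updateProfileJob() parameter array.
function buildProfileJobUpdate(
    string $name,
    string $roleArn,
    string $outputBucket,
    ?int $sampleRows = null
): array {
    $params = [
        'Name'           => $name,                        // REQUIRED
        'RoleArn'        => $roleArn,                     // REQUIRED
        'OutputLocation' => ['Bucket' => $outputBucket],  // REQUIRED
    ];

    if ($sampleRows !== null) {
        if ($sampleRows < 1) {
            // Assumption of this sketch: a sample needs at least one row.
            throw new InvalidArgumentException('Sample size must be positive.');
        }
        $params['JobSample'] = ['Mode' => 'CUSTOM_ROWS', 'Size' => $sampleRows];
    }

    return $params;
}
```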

UpdateProject

$result = $client->updateProject([/* ... */]);
$promise = $client->updateProjectAsync([/* ... */]);

Modifies the definition of an existing DataBrew project.

Parameter Syntax

$result = $client->updateProject([
    'Name' => '<string>', // REQUIRED
    'RoleArn' => '<string>', // REQUIRED
    'Sample' => [
        'Size' => <integer>,
        'Type' => 'FIRST_N|LAST_N|RANDOM', // REQUIRED
    ],
]);

Parameter Details

Members
Name
Required: Yes
Type: string

The name of the project to be updated.

RoleArn
Required: Yes
Type: string

The Amazon Resource Name (ARN) of the IAM role to be assumed for this request.

Sample
Type: Sample structure

Represents the sample size and sampling type for DataBrew to use for interactive data analysis.

Result Syntax

[
    'LastModifiedDate' => <DateTime>,
    'Name' => '<string>',
]

Result Details

Members
LastModifiedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the project was last modified.

Name
Required: Yes
Type: string

The name of the project that you updated.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
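The Sample member requires a Type from a fixed set of values, while Size is optional. A minimal sketch of a builder that enforces this locally follows; buildProjectUpdate is a hypothetical name, and the allowed values are copied from the parameter syntax above.

```php
<?php
// Hypothetical helper: builds the parameter array for updateProject().
function buildProjectUpdate(
    string $name,
    string $roleArn,
    string $sampleType,
    ?int $sampleSize = null
): array {
    $allowedTypes = ['FIRST_N', 'LAST_N', 'RANDOM'];
    if (!in_array($sampleType, $allowedTypes, true)) {
        throw new InvalidArgumentException("Unsupported sample type: $sampleType");
    }

    $sample = ['Type' => $sampleType]; // Type is REQUIRED within Sample
    if ($sampleSize !== null) {
        $sample['Size'] = $sampleSize;
    }

    return [
        'Name'    => $name,
        'RoleArn' => $roleArn,
        'Sample'  => $sample,
    ];
}
```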

UpdateRecipe

$result = $client->updateRecipe([/* ... */]);
$promise = $client->updateRecipeAsync([/* ... */]);

Modifies the definition of the LATEST_WORKING version of a DataBrew recipe.

Parameter Syntax

$result = $client->updateRecipe([
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Steps' => [
        [
            'Action' => [ // REQUIRED
                'Operation' => '<string>', // REQUIRED
                'Parameters' => ['<string>', ...],
            ],
            'ConditionExpressions' => [
                [
                    'Condition' => '<string>', // REQUIRED
                    'TargetColumn' => '<string>', // REQUIRED
                    'Value' => '<string>',
                ],
                // ...
            ],
        ],
        // ...
    ],
]);

Parameter Details

Members
Description
Type: string

A description of the recipe.

Name
Required: Yes
Type: string

The name of the recipe to be updated.

Steps
Type: Array of RecipeStep structures

One or more steps to be performed by the recipe. Each step consists of an action, and the conditions under which the action should succeed.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the recipe that was updated.

Errors

ValidationException:

The input parameters for this request failed validation.

ResourceNotFoundException:

One or more resources can't be found.
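Each entry in Steps pairs a required Action with optional ConditionExpressions. The sketch below assembles one step; buildRecipeStep is a hypothetical helper. It treats Parameters as an associative array of parameter names to string values, which is an assumption of this example, because the syntax block above shows only ['<string>', ...]. Operation names and condition names are passed through unchecked; see the Glue DataBrew Developer Guide for the supported values.

```php
<?php
// Hypothetical helper: builds one entry of the 'Steps' member of updateRecipe().
function buildRecipeStep(string $operation, array $parameters = [], array $conditions = []): array
{
    $step = ['Action' => ['Operation' => $operation]]; // Operation is REQUIRED

    if ($parameters !== []) {
        // Assumption: parameter names map to string values.
        $step['Action']['Parameters'] = array_map('strval', $parameters);
    }

    foreach ($conditions as $condition) {
        // Condition and TargetColumn are REQUIRED; Value is optional.
        if (!isset($condition['Condition'], $condition['TargetColumn'])) {
            throw new InvalidArgumentException('Condition and TargetColumn are required.');
        }
        $step['ConditionExpressions'][] = $condition;
    }

    return $step;
}
```

One or more steps built this way can be passed as 'Steps' => [$step1, $step2] together with the required Name.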

UpdateRecipeJob

$result = $client->updateRecipeJob([/* ... */]);
$promise = $client->updateRecipeJobAsync([/* ... */]);

Modifies the definition of an existing DataBrew recipe job.

Parameter Syntax

$result = $client->updateRecipeJob([
    'DataCatalogOutputs' => [
        [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'DatabaseOptions' => [
                'TableName' => '<string>', // REQUIRED
                'TempDirectory' => [
                    'Bucket' => '<string>', // REQUIRED
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'Overwrite' => true || false,
            'S3Options' => [
                'Location' => [ // REQUIRED
                    'Bucket' => '<string>', // REQUIRED
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'TableName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'DatabaseOutputs' => [
        [
            'DatabaseOptions' => [ // REQUIRED
                'TableName' => '<string>', // REQUIRED
                'TempDirectory' => [
                    'Bucket' => '<string>', // REQUIRED
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'DatabaseOutputMode' => 'NEW_TABLE',
            'GlueConnectionName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'EncryptionKeyArn' => '<string>',
    'EncryptionMode' => 'SSE-KMS|SSE-S3',
    'LogSubscription' => 'ENABLE|DISABLE',
    'MaxCapacity' => <integer>,
    'MaxRetries' => <integer>,
    'Name' => '<string>', // REQUIRED
    'Outputs' => [
        [
            'CompressionFormat' => 'GZIP|LZ4|SNAPPY|BZIP2|DEFLATE|LZO|BROTLI|ZSTD|ZLIB',
            'Format' => 'CSV|JSON|PARQUET|GLUEPARQUET|AVRO|ORC|XML|TABLEAUHYPER',
            'FormatOptions' => [
                'Csv' => [
                    'Delimiter' => '<string>',
                ],
            ],
            'Location' => [ // REQUIRED
                'Bucket' => '<string>', // REQUIRED
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
            'MaxOutputFiles' => <integer>,
            'Overwrite' => true || false,
            'PartitionColumns' => ['<string>', ...],
        ],
        // ...
    ],
    'RoleArn' => '<string>', // REQUIRED
    'Timeout' => <integer>,
]);

Parameter Details

Members
DataCatalogOutputs
Type: Array of DataCatalogOutput structures

One or more artifacts that represent the Glue Data Catalog output from running the job.

DatabaseOutputs
Type: Array of DatabaseOutput structures

Represents a list of JDBC database output objects that define the output destination for a DataBrew recipe job to write into.

EncryptionKeyArn
Type: string

The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.

EncryptionMode
Type: string

The encryption mode for the job, which can be one of the following:

  • SSE-KMS - Server-side encryption with keys managed by KMS.

  • SSE-S3 - Server-side encryption with keys managed by Amazon S3.

LogSubscription
Type: string

Enables or disables Amazon CloudWatch logging for the job. If logging is enabled, CloudWatch writes one log stream for each job run.

MaxCapacity
Type: int

The maximum number of nodes that DataBrew can consume when the job processes data.

MaxRetries
Type: int

The maximum number of times to retry the job after a job run fails.

Name
Required: Yes
Type: string

The name of the job to update.

Outputs
Type: Array of Output structures

One or more artifacts that represent the output from running the job.

RoleArn
Required: Yes
Type: string

The Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.

Timeout
Type: int

The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the job that you updated.

Errors

AccessDeniedException:

Access to the specified resource was denied.

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
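As an illustration of the Outputs member, the hypothetical helper below builds a single Amazon S3 output entry and checks Format and CompressionFormat against the values listed in the parameter syntax. It is a sketch, not a complete request; Name and RoleArn are still required at the top level.

```php
<?php
// Hypothetical helper: builds one entry of the 'Outputs' member of
// updateRecipeJob(). Allowed values are copied from the parameter syntax.
function buildRecipeJobOutput(
    string $bucket,
    string $key,
    string $format = 'CSV',
    ?string $compression = null
): array {
    $formats = ['CSV', 'JSON', 'PARQUET', 'GLUEPARQUET', 'AVRO', 'ORC', 'XML', 'TABLEAUHYPER'];
    $compressions = ['GZIP', 'LZ4', 'SNAPPY', 'BZIP2', 'DEFLATE', 'LZO', 'BROTLI', 'ZSTD', 'ZLIB'];

    if (!in_array($format, $formats, true)) {
        throw new InvalidArgumentException("Unsupported Format: $format");
    }
    if ($compression !== null && !in_array($compression, $compressions, true)) {
        throw new InvalidArgumentException("Unsupported CompressionFormat: $compression");
    }

    $output = [
        'Location' => ['Bucket' => $bucket, 'Key' => $key], // Location is REQUIRED
        'Format'   => $format,
    ];
    if ($compression !== null) {
        $output['CompressionFormat'] = $compression;
    }

    return $output;
}
```

The entry can be used as 'Outputs' => [buildRecipeJobOutput('my-bucket', 'out/')] in the updateRecipeJob() parameters; the bucket and key shown are placeholders.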

UpdateRuleset

$result = $client->updateRuleset([/* ... */]);
$promise = $client->updateRulesetAsync([/* ... */]);

Updates the specified ruleset.

Parameter Syntax

$result = $client->updateRuleset([
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Rules' => [ // REQUIRED
        [
            'CheckExpression' => '<string>', // REQUIRED
            'ColumnSelectors' => [
                [
                    'Name' => '<string>',
                    'Regex' => '<string>',
                ],
                // ...
            ],
            'Disabled' => true || false,
            'Name' => '<string>', // REQUIRED
            'SubstitutionMap' => ['<string>', ...],
            'Threshold' => [
                'Type' => 'GREATER_THAN_OR_EQUAL|LESS_THAN_OR_EQUAL|GREATER_THAN|LESS_THAN',
                'Unit' => 'COUNT|PERCENTAGE',
                'Value' => <float>, // REQUIRED
            ],
        ],
        // ...
    ],
]);

Parameter Details

Members
Description
Type: string

The description of the ruleset.

Name
Required: Yes
Type: string

The name of the ruleset to be updated.

Rules
Required: Yes
Type: Array of Rule structures

A list of rules that are defined with the ruleset. A rule includes one or more checks to be validated on a DataBrew dataset.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the updated ruleset.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ValidationException:

The input parameters for this request failed validation.
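A rule combines a required Name and CheckExpression with an optional SubstitutionMap and Threshold. The following sketch builds one entry for Rules; buildRulesetRule is a hypothetical name. Treating SubstitutionMap as an associative array of variable names to string values and defaulting the threshold Type to GREATER_THAN_OR_EQUAL are assumptions of this example. The check expression is passed through unchanged.

```php
<?php
// Hypothetical helper: builds one entry of the 'Rules' member of updateRuleset().
function buildRulesetRule(
    string $name,
    string $checkExpression,
    array $substitutionMap = [],
    ?float $thresholdValue = null,
    string $thresholdUnit = 'PERCENTAGE'
): array {
    $rule = [
        'Name'            => $name,            // REQUIRED
        'CheckExpression' => $checkExpression, // REQUIRED
    ];

    if ($substitutionMap !== []) {
        // Assumption: variable names map to string values.
        $rule['SubstitutionMap'] = array_map('strval', $substitutionMap);
    }

    if ($thresholdValue !== null) {
        if (!in_array($thresholdUnit, ['COUNT', 'PERCENTAGE'], true)) {
            throw new InvalidArgumentException("Unsupported Unit: $thresholdUnit");
        }
        $rule['Threshold'] = [
            'Value' => $thresholdValue,        // REQUIRED within Threshold
            'Type'  => 'GREATER_THAN_OR_EQUAL',
            'Unit'  => $thresholdUnit,
        ];
    }

    return $rule;
}
```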

UpdateSchedule

$result = $client->updateSchedule([/* ... */]);
$promise = $client->updateScheduleAsync([/* ... */]);

Modifies the definition of an existing DataBrew schedule.

Parameter Syntax

$result = $client->updateSchedule([
    'CronExpression' => '<string>', // REQUIRED
    'JobNames' => ['<string>', ...],
    'Name' => '<string>', // REQUIRED
]);

Parameter Details

Members
CronExpression
Required: Yes
Type: string

The date or dates and time or times when the jobs are to be run. For more information, see Cron expressions in the Glue DataBrew Developer Guide.

JobNames
Type: Array of strings

The name or names of one or more jobs to be run for this schedule.

Name
Required: Yes
Type: string

The name of the schedule to update.

Result Syntax

[
    'Name' => '<string>',
]

Result Details

Members
Name
Required: Yes
Type: string

The name of the schedule that was updated.

Errors

ResourceNotFoundException:

One or more resources can't be found.

ServiceQuotaExceededException:

A service quota is exceeded.

ValidationException:

The input parameters for this request failed validation.
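A minimal sketch of assembling UpdateSchedule parameters follows. The helper name buildScheduleUpdate is hypothetical. The cron expression is treated as an opaque string and only checked for emptiness, since its syntax is defined in the Glue DataBrew Developer Guide rather than on this page; duplicate job names are removed as a convenience.

```php
<?php
// Hypothetical helper: builds the parameter array for updateSchedule().
function buildScheduleUpdate(string $name, string $cronExpression, array $jobNames = []): array
{
    $cronExpression = trim($cronExpression);
    if ($cronExpression === '') {
        throw new InvalidArgumentException('CronExpression is required.');
    }

    $params = [
        'Name'           => $name,           // REQUIRED
        'CronExpression' => $cronExpression, // REQUIRED
    ];

    // JobNames is optional; send it only when there is something to send.
    $jobs = array_values(array_unique($jobNames));
    if ($jobs !== []) {
        $params['JobNames'] = $jobs;
    }

    return $params;
}
```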

Shapes

AccessDeniedException

Description

Access to the specified resource was denied.

Members
Message
Type: string

AllowedStatistics

Description

Configuration of statistics that are allowed to be run on columns that contain detected entities. When undefined, no statistics will be computed on columns that contain detected entities.

Members
Statistics
Required: Yes
Type: Array of strings

One or more column statistics to allow for columns that contain detected entities.

ColumnSelector

Description

Selector of a column from a dataset for profile job configuration. One selector includes either a column name or a regular expression.

Members
Name
Type: string

The name of a column from a dataset.

Regex
Type: string

A regular expression for selecting a column from a dataset.

ColumnStatisticsConfiguration

Description

Configuration for column evaluations for a profile job. ColumnStatisticsConfiguration can be used to select evaluations and override parameters of evaluations for particular columns.

Members
Selectors
Type: Array of ColumnSelector structures

List of column selectors. Selectors can be used to select columns from the dataset. When selectors are undefined, configuration will be applied to all supported columns.

Statistics
Required: Yes
Type: StatisticsConfiguration structure

Configuration for evaluations. Statistics can be used to select evaluations and override parameters of evaluations.

ConditionExpression

Description

Represents an individual condition that evaluates to true or false.

Conditions are used with recipe actions. The action is only performed for column values where the condition evaluates to true.

If a recipe requires more than one condition, then the recipe must specify multiple ConditionExpression elements. Each condition is applied to the rows in a dataset first, before the recipe action is performed.

Members
Condition
Required: Yes
Type: string

A specific condition to apply to a recipe action. For more information, see Recipe structure in the Glue DataBrew Developer Guide.

TargetColumn
Required: Yes
Type: string

A column to apply this condition to.

Value
Type: string

A value that the condition must evaluate to for the condition to succeed.

ConflictException

Description

Updating or deleting a resource can cause an inconsistent state.

Members
Message
Type: string

CsvOptions

Description

Represents a set of options that define how DataBrew will read a comma-separated value (CSV) file when creating a dataset from that file.

Members
Delimiter
Type: string

A single character that specifies the delimiter being used in the CSV file.

HeaderRow
Type: boolean

A variable that specifies whether the first row in the file is parsed as the header. If this value is false, column names are auto-generated.

CsvOutputOptions

Description

Represents a set of options that define how DataBrew will write a comma-separated value (CSV) file.

Members
Delimiter
Type: string

A single character that specifies the delimiter used to create CSV job output.

DataCatalogInputDefinition

Description

Represents how metadata stored in the Glue Data Catalog is defined in a DataBrew dataset.

Members
CatalogId
Type: string

The unique identifier of the Amazon Web Services account that holds the Data Catalog that stores the data.

DatabaseName
Required: Yes
Type: string

The name of a database in the Data Catalog.

TableName
Required: Yes
Type: string

The name of a database table in the Data Catalog. This table corresponds to a DataBrew dataset.

TempDirectory
Type: S3Location structure

Represents an Amazon S3 location where DataBrew can store intermediate results.

DataCatalogOutput

Description

Represents options that specify how and where in the Glue Data Catalog DataBrew writes the output generated by recipe jobs.

Members
CatalogId
Type: string

The unique identifier of the Amazon Web Services account that holds the Data Catalog that stores the data.

DatabaseName
Required: Yes
Type: string

The name of a database in the Data Catalog.

DatabaseOptions
Type: DatabaseTableOutputOptions structure

Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.

Overwrite
Type: boolean

A value that, if true, means that any data in the location specified for output is overwritten with new output. Not supported with DatabaseOptions.

S3Options
Type: S3TableOutputOptions structure

Represents options that specify how and where DataBrew writes the Amazon S3 output generated by recipe jobs.

TableName
Required: Yes
Type: string

The name of a table in the Data Catalog.

DatabaseInputDefinition

Description

Connection information for dataset input files stored in a database.

Members
DatabaseTableName
Type: string

The table within the target database.

GlueConnectionName
Required: Yes
Type: string

The Glue Connection that stores the connection information for the target database.

QueryString
Type: string

Custom SQL to run against the provided Glue connection. This SQL will be used as the input for DataBrew projects and jobs.

TempDirectory
Type: S3Location structure

Represents an Amazon S3 location (bucket name, bucket owner, and object key) where DataBrew can read input data, or write output from a job.

DatabaseOutput

Description

Represents a JDBC database output object which defines the output destination for a DataBrew recipe job to write into.

Members
DatabaseOptions
Required: Yes
Type: DatabaseTableOutputOptions structure

Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.

DatabaseOutputMode
Type: string

The output mode to write into the database. Currently supported option: NEW_TABLE.

GlueConnectionName
Required: Yes
Type: string

The Glue connection that stores the connection information for the target database.

DatabaseTableOutputOptions

Description

Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.

Members
TableName
Required: Yes
Type: string

A prefix for the name of a table DataBrew will create in the database.

TempDirectory
Type: S3Location structure

Represents an Amazon S3 location (bucket name and object key) where DataBrew can store intermediate results.

Dataset

Description

Represents a dataset that can be processed by DataBrew.

Members
AccountId
Type: string

The ID of the Amazon Web Services account that owns the dataset.

CreateDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the dataset was created.

CreatedBy
Type: string

The Amazon Resource Name (ARN) of the user who created the dataset.

Format
Type: string

The file format of a dataset that is created from an Amazon S3 file or folder.

FormatOptions
Type: FormatOptions structure

A set of options that define how DataBrew interprets the data in the dataset.

Input
Required: Yes
Type: Input structure

Information on how DataBrew can find the dataset, in either the Glue Data Catalog or Amazon S3.

LastModifiedBy
Type: string

The Amazon Resource Name (ARN) of the user who last modified the dataset.

LastModifiedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last modification date and time of the dataset.

Name
Required: Yes
Type: string

The unique name of the dataset.

PathOptions
Type: PathOptions structure

A set of options that defines how DataBrew interprets an Amazon S3 path of the dataset.

ResourceArn
Type: string

The unique Amazon Resource Name (ARN) for the dataset.

Source
Type: string

The location of the data for the dataset, either Amazon S3 or the Glue Data Catalog.

Tags
Type: Associative array of custom string keys (TagKey) to strings

Metadata tags that have been applied to the dataset.

DatasetParameter

Description

Represents a dataset parameter that defines type and conditions for a parameter in the Amazon S3 path of the dataset.

Members
CreateColumn
Type: boolean

Optional boolean value that defines whether the captured value of this parameter should be used to create a new column in a dataset.

DatetimeOptions
Type: DatetimeOptions structure

Additional parameter options such as a format and a timezone. Required for datetime parameters.

Filter
Type: FilterExpression structure

The optional filter expression structure to apply additional matching criteria to the parameter.

Name
Required: Yes
Type: string

The name of the parameter that is used in the dataset's Amazon S3 path.

Type
Required: Yes
Type: string

The type of the dataset parameter. Can be one of 'String', 'Number', or 'Datetime'.

DatetimeOptions

Description

Represents additional options for correct interpretation of datetime parameters used in the Amazon S3 path of a dataset.

Members
Format
Required: Yes
Type: string

Required. Defines the datetime format used for a date parameter in the Amazon S3 path. Use only supported datetime specifiers and separation characters; escape all literal a-z or A-Z characters with single quotes. For example, "MM.dd.yyyy-'at'-HH:mm".

LocaleCode
Type: string

Optional value for a non-US locale code, needed for correct interpretation of some date formats.

TimezoneOffset
Type: string

Optional value for a timezone offset of the datetime parameter value in the Amazon S3 path. Shouldn't be used if Format for this parameter includes timezone fields. If no offset is specified, UTC is assumed.
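The DatasetParameter and DatetimeOptions shapes are linked by one rule: DatetimeOptions is required when the parameter Type is Datetime, and Format is required within it. The sketch below enforces that rule locally; buildDatasetParameter is a hypothetical helper, and the format string in the usage note is the example given for the Format member.

```php
<?php
// Hypothetical helper: builds one DatasetParameter for PathOptions['Parameters'].
function buildDatasetParameter(
    string $name,
    string $type,
    ?array $datetimeOptions = null,
    bool $createColumn = false
): array {
    if (!in_array($type, ['Datetime', 'Number', 'String'], true)) {
        throw new InvalidArgumentException("Unsupported Type: $type");
    }

    $parameter = [
        'Name'         => $name, // REQUIRED
        'Type'         => $type, // REQUIRED
        'CreateColumn' => $createColumn,
    ];

    if ($type === 'Datetime') {
        // DatetimeOptions is required for datetime parameters, and Format
        // is required within DatetimeOptions.
        if (empty($datetimeOptions['Format'])) {
            throw new InvalidArgumentException('Datetime parameters need DatetimeOptions with a Format.');
        }
        $parameter['DatetimeOptions'] = $datetimeOptions;
    }

    return $parameter;
}
```

For example, buildDatasetParameter('day', 'Datetime', ['Format' => "MM.dd.yyyy-'at'-HH:mm"]) returns an entry that can be stored under the key 'day' in PathOptions['Parameters'].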

EntityDetectorConfiguration

Description

Configuration of entity detection for a profile job. When undefined, entity detection is disabled.

Members
AllowedStatistics
Type: Array of AllowedStatistics structures

Configuration of statistics that are allowed to be run on columns that contain detected entities. When undefined, no statistics will be computed on columns that contain detected entities.

EntityTypes
Required: Yes
Type: Array of strings

Entity types to detect. Can be any of the following:

  • USA_SSN

  • EMAIL

  • USA_ITIN

  • USA_PASSPORT_NUMBER

  • PHONE_NUMBER

  • USA_DRIVING_LICENSE

  • BANK_ACCOUNT

  • CREDIT_CARD

  • IP_ADDRESS

  • MAC_ADDRESS

  • USA_DEA_NUMBER

  • USA_HCPCS_CODE

  • USA_NATIONAL_PROVIDER_IDENTIFIER

  • USA_NATIONAL_DRUG_CODE

  • USA_HEALTH_INSURANCE_CLAIM_NUMBER

  • USA_MEDICARE_BENEFICIARY_IDENTIFIER

  • USA_CPT_CODE

  • PERSON_NAME

  • DATE

The Entity type group USA_ALL is also supported, and includes all of the above entity types except PERSON_NAME and DATE.
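Because EntityTypes accepts only the values listed above plus the USA_ALL group, a request can be checked locally before it is sent. The following sketch is one way to do that; buildEntityDetectorConfiguration is a hypothetical helper, and the list of known types is copied from this page.

```php
<?php
// Hypothetical helper: builds an EntityDetectorConfiguration structure and
// rejects entity types that are not listed in this reference.
function buildEntityDetectorConfiguration(array $entityTypes, array $allowedStatistics = []): array
{
    $known = [
        'USA_SSN', 'EMAIL', 'USA_ITIN', 'USA_PASSPORT_NUMBER', 'PHONE_NUMBER',
        'USA_DRIVING_LICENSE', 'BANK_ACCOUNT', 'CREDIT_CARD', 'IP_ADDRESS',
        'MAC_ADDRESS', 'USA_DEA_NUMBER', 'USA_HCPCS_CODE',
        'USA_NATIONAL_PROVIDER_IDENTIFIER', 'USA_NATIONAL_DRUG_CODE',
        'USA_HEALTH_INSURANCE_CLAIM_NUMBER', 'USA_MEDICARE_BENEFICIARY_IDENTIFIER',
        'USA_CPT_CODE', 'PERSON_NAME', 'DATE',
        'USA_ALL', // group: every type above except PERSON_NAME and DATE
    ];

    $unknown = array_diff($entityTypes, $known);
    if ($entityTypes === [] || $unknown !== []) {
        throw new InvalidArgumentException(
            'EntityTypes must be non-empty and contain only known types.'
        );
    }

    $config = ['EntityTypes' => array_values(array_unique($entityTypes))];

    if ($allowedStatistics !== []) {
        // AllowedStatistics is an array of structures, each holding a
        // required 'Statistics' list.
        $config['AllowedStatistics'] = [['Statistics' => array_values($allowedStatistics)]];
    }

    return $config;
}
```

When AllowedStatistics is left out, no statistics are computed on columns that contain detected entities, as described above.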

ExcelOptions

Description

Represents a set of options that define how DataBrew will interpret a Microsoft Excel file when creating a dataset from that file.

Members
HeaderRow
Type: boolean

A variable that specifies whether the first row in the file is parsed as the header. If this value is false, column names are auto-generated.

SheetIndexes
Type: Array of ints

One or more sheet numbers in the Excel file that will be included in the dataset.

SheetNames
Type: Array of strings

One or more named sheets in the Excel file that will be included in the dataset.

FilesLimit

Description

Represents a limit imposed on the number of Amazon S3 files that should be selected for a dataset from a connected Amazon S3 path.

Members
MaxFiles
Required: Yes
Type: int

The number of Amazon S3 files to select.

Order
Type: string

The order in which Amazon S3 files are sorted before selection. The default is DESCENDING, which selects the most recent files first. The other possible value is ASCENDING.

OrderedBy
Type: string

The criterion by which Amazon S3 files are sorted before selection. The default is LAST_MODIFIED_DATE, which is currently the only allowed value.

FilterExpression

Description

Represents a structure for defining parameter conditions. Supported conditions are described here: Supported conditions for dynamic datasets in the Glue DataBrew Developer Guide.

Members
Expression
Required: Yes
Type: string

An expression that includes condition names followed by substitution variables, possibly grouped and combined with other conditions. For example, "(starts_with :prefix1 or starts_with :prefix2) and (ends_with :suffix1 or ends_with :suffix2)". Substitution variables should start with the ':' symbol.

ValuesMap
Required: Yes
Type: Associative array of custom string keys (ValueReference) to strings

The map of substitution variable names to their values used in this filter expression.
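Expression and ValuesMap have to stay in sync: every substitution variable in the expression needs an entry in the map. As a sketch, the hypothetical helper below generates both from a list of prefixes, using the starts_with condition and the ':prefixN' naming from the example above. Keeping the leading ':' in the map keys is an assumption of this sketch.

```php
<?php
// Hypothetical helper: builds a FilterExpression that matches values
// starting with any of the given prefixes.
function buildStartsWithAnyFilter(array $prefixes): array
{
    if ($prefixes === []) {
        throw new InvalidArgumentException('At least one prefix is required.');
    }

    $clauses = [];
    $values = [];
    foreach (array_values($prefixes) as $index => $prefix) {
        $reference = ':prefix' . ($index + 1); // :prefix1, :prefix2, ...
        $clauses[] = 'starts_with ' . $reference;
        $values[$reference] = (string) $prefix;
    }

    return [
        'Expression' => implode(' or ', $clauses), // REQUIRED
        'ValuesMap'  => $values,                   // REQUIRED
    ];
}
```

The returned structure can be used as the Filter member of a DatasetParameter.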

FormatOptions

Description

Represents a set of options that define the structure of either comma-separated value (CSV), Excel, or JSON input.

Members
Csv
Type: CsvOptions structure

Options that define how CSV input is to be interpreted by DataBrew.

Excel
Type: ExcelOptions structure

Options that define how Excel input is to be interpreted by DataBrew.

Json
Type: JsonOptions structure

Options that define how JSON input is to be interpreted by DataBrew.
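
A minimal sketch of a FormatOptions array for Excel input, using the ExcelOptions members HeaderRow and SheetNames. The sheet name is a placeholder chosen for illustration.

```php
<?php
// FormatOptions for an Excel source; 'Sheet1' is a placeholder sheet name.
$formatOptions = [
    'Excel' => [
        'HeaderRow'  => true,       // parse the first row as the header
        'SheetNames' => ['Sheet1'], // named sheets to include in the dataset
    ],
];
```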

Input

Description

Represents information on how DataBrew can find data, in either the Glue Data Catalog or Amazon S3.

Members
DataCatalogInputDefinition
Type: DataCatalogInputDefinition structure

The Glue Data Catalog parameters for the data.

DatabaseInputDefinition
Type: DatabaseInputDefinition structure

Connection information for dataset input files stored in a database.

Metadata
Type: Metadata structure

Contains additional resource information needed for specific datasets.

S3InputDefinition
Type: S3Location structure

The Amazon S3 location where the data is stored.
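
For an Amazon S3 source, an Input array might look like the sketch below. The bucket name and object key are placeholders; the nested keys follow the S3Location structure described on this page.

```php
<?php
// Input pointing at an object in Amazon S3; bucket and key are placeholders.
$input = [
    'S3InputDefinition' => [
        'Bucket' => 'amzn-s3-demo-bucket', // required by S3Location
        'Key'    => 'input/orders.json',   // optional object key
    ],
];
```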

InternalServerException

Description

An internal service failure occurred.

Members
Message
Type: string

Job

Description

Represents all of the attributes of a DataBrew job.

Members
AccountId
Type: string

The ID of the Amazon Web Services account that owns the job.

CreateDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the job was created.

CreatedBy
Type: string

The Amazon Resource Name (ARN) of the user who created the job.

DataCatalogOutputs
Type: Array of DataCatalogOutput structures

One or more artifacts that represent the Glue Data Catalog output from running the job.

DatabaseOutputs
Type: Array of DatabaseOutput structures

Represents a list of JDBC database output objects that define the output destination for a DataBrew recipe job to write into.

DatasetName
Type: string

A dataset that the job is to process.

EncryptionKeyArn
Type: string

The Amazon Resource Name (ARN) of an encryption key that is used to protect the job output. For more information, see Encrypting data written by DataBrew jobs.

EncryptionMode
Type: string

The encryption mode for the job, which can be one of the following:

  • SSE-KMS - Server-side encryption with keys managed by KMS.

  • SSE-S3 - Server-side encryption with keys managed by Amazon S3.

JobSample
Type: JobSample structure

A sample configuration for profile jobs only, which determines the number of rows on which the profile job is run. If a JobSample value isn't provided, the default value is used. The default value is CUSTOM_ROWS for the mode parameter and 20,000 for the size parameter.

LastModifiedBy
Type: string

The Amazon Resource Name (ARN) of the user who last modified the job.

LastModifiedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The modification date and time of the job.

LogSubscription
Type: string

The current status of Amazon CloudWatch logging for the job.

MaxCapacity
Type: int

The maximum number of nodes that can be consumed when the job processes data.

MaxRetries
Type: int

The maximum number of times to retry the job after a job run fails.

Name
Required: Yes
Type: string

The unique name of the job.

Outputs
Type: Array of Output structures

One or more artifacts that represent output from running the job.

ProjectName
Type: string

The name of the project that the job is associated with.

RecipeReference
Type: RecipeReference structure

A set of steps that the job runs.

ResourceArn
Type: string

The unique Amazon Resource Name (ARN) for the job.

RoleArn
Type: string

The Amazon Resource Name (ARN) of the role to be assumed for this job.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Metadata tags that have been applied to the job.

Timeout
Type: int

The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.

Type
Type: string

The type of the job, which must be one of the following:

  • PROFILE - A job to analyze a dataset, to determine its size, data types, data distribution, and more.

  • RECIPE - A job to apply one or more transformations to a dataset.

ValidationConfigurations
Type: Array of ValidationConfiguration structures

List of validation configurations that are applied to the profile job.

JobRun

Description

Represents one run of a DataBrew job.

Members
Attempt
Type: int

The number of times that DataBrew has attempted to run the job.

CompletedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the job completed processing.

DataCatalogOutputs
Type: Array of DataCatalogOutput structures

One or more artifacts that represent the Glue Data Catalog output from running the job.

DatabaseOutputs
Type: Array of DatabaseOutput structures

Represents a list of JDBC database output objects that define the output destination for a DataBrew recipe job to write into.

DatasetName
Type: string

The name of the dataset for the job to process.

ErrorMessage
Type: string

A message indicating an error (if any) that was encountered when the job ran.

ExecutionTime
Type: int

The amount of time, in seconds, during which a job run consumed resources.

JobName
Type: string

The name of the job being processed during this run.

JobSample
Type: JobSample structure

A sample configuration for profile jobs only, which determines the number of rows on which the profile job is run. If a JobSample value isn't provided, the default is used. The default value is CUSTOM_ROWS for the mode parameter and 20,000 for the size parameter.

LogGroupName
Type: string

The name of an Amazon CloudWatch log group, where the job writes diagnostic messages when it runs.

LogSubscription
Type: string

The current status of Amazon CloudWatch logging for the job run.

Outputs
Type: Array of Output structures

One or more output artifacts from a job run.

RecipeReference
Type: RecipeReference structure

The set of steps processed by the job.

RunId
Type: string

The unique identifier of the job run.

StartedBy
Type: string

The Amazon Resource Name (ARN) of the user who initiated the job run.

StartedOn
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the job run began.

State
Type: string

The current state of the job run entity itself.

ValidationConfigurations
Type: Array of ValidationConfiguration structures

List of validation configurations that are applied to the profile job run.

JobSample

Description

A sample configuration for profile jobs only, which determines the number of rows on which the profile job is run. If a JobSample value isn't provided, the default is used. The default value is CUSTOM_ROWS for the mode parameter and 20,000 for the size parameter.

Members
Mode
Type: string

A value that determines whether the profile job is run on the entire dataset or a specified number of rows. This value must be one of the following:

  • FULL_DATASET - The profile job is run on the entire dataset.

  • CUSTOM_ROWS - The profile job is run on the number of rows specified in the Size parameter.

Size
Type: long (int|float)

The Size parameter is only required when the mode is CUSTOM_ROWS. The profile job is run on the specified number of rows. The maximum value for size is Long.MAX_VALUE.

Long.MAX_VALUE = 9223372036854775807
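
The default and the Size requirement described above can be mirrored client-side, as in the sketch below. The helper resolveJobSample is hypothetical; the service applies the default itself, so this only illustrates the documented rules.

```php
<?php
// Hypothetical helper: returns the documented default when no JobSample
// is given, and otherwise checks the Mode and Size rules described above.
function resolveJobSample(?array $jobSample): array
{
    if ($jobSample === null) {
        // Documented default: CUSTOM_ROWS with a size of 20,000 rows.
        return ['Mode' => 'CUSTOM_ROWS', 'Size' => 20000];
    }

    $mode = $jobSample['Mode'] ?? null;
    if (!in_array($mode, ['FULL_DATASET', 'CUSTOM_ROWS'], true)) {
        throw new InvalidArgumentException('Mode must be FULL_DATASET or CUSTOM_ROWS.');
    }
    // Size is only required when the mode is CUSTOM_ROWS.
    if ($mode === 'CUSTOM_ROWS' && !isset($jobSample['Size'])) {
        throw new InvalidArgumentException('Size is required when Mode is CUSTOM_ROWS.');
    }

    return $jobSample;
}

$default = resolveJobSample(null);
$full    = resolveJobSample(['Mode' => 'FULL_DATASET']);
```

On 64-bit PHP builds, PHP_INT_MAX equals 9223372036854775807, so the documented maximum fits in a native integer.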

JsonOptions

Description

Represents the JSON-specific options that define how input is to be interpreted by Glue DataBrew.

Members
MultiLine
Type: boolean

A value that specifies whether JSON input contains embedded new line characters.

Metadata

Description

Contains additional resource information needed for specific datasets.

Members
SourceArn
Type: string

The Amazon Resource Name (ARN) associated with the dataset. Currently, DataBrew only supports ARNs from Amazon AppFlow.

Output

Description

Represents options that specify how and where in Amazon S3 DataBrew writes the output generated by recipe jobs or profile jobs.

Members
CompressionFormat
Type: string

The compression algorithm used to compress the output text of the job.

Format
Type: string

The data format of the output of the job.

FormatOptions
Type: OutputFormatOptions structure

Represents options that define how DataBrew formats job output files.

Location
Required: Yes
Type: S3Location structure

The location in Amazon S3 where the job writes its output.

MaxOutputFiles
Type: int

Maximum number of files to be generated by the job and written to the output folder. For output partitioned by column(s), the MaxOutputFiles value is the maximum number of files per partition.

Overwrite
Type: boolean

A value that, if true, means that any data in the location specified for output is overwritten with new output.

PartitionColumns
Type: Array of strings

The names of one or more partition columns for the output of the job.
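
A sketch of an Output array that combines the members above. The bucket, key prefix, and column name are placeholders; Format and CompressionFormat are left out because their allowed values are not listed in this section.

```php
<?php
// Output written to Amazon S3; bucket, key, and column name are placeholders.
$output = [
    'Location' => [
        'Bucket' => 'amzn-s3-demo-bucket',
        'Key'    => 'output/',
    ],
    'Overwrite'        => true,       // replace existing data at the location
    'PartitionColumns' => ['region'], // partition the output by this column
    // Because the output is partitioned, this limit applies per partition.
    'MaxOutputFiles'   => 4,
];
```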

OutputFormatOptions

Description

Represents a set of options that define the structure of comma-separated value (CSV) job output.

Members
Csv
Type: CsvOutputOptions structure

Represents a set of options that define the structure of comma-separated value (CSV) job output.

PathOptions

Description

Represents a set of options that define how DataBrew selects files for a given Amazon S3 path in a dataset.

Members
FilesLimit
Type: FilesLimit structure

If provided, this structure imposes a limit on the number of files that should be selected.

LastModifiedDateCondition
Type: FilterExpression structure

If provided, this structure defines a date range for matching Amazon S3 objects based on their LastModifiedDate attribute in Amazon S3.

Parameters
Type: Associative array of custom strings keys (PathParameterName) to DatasetParameter structures

A structure that maps names of parameters used in the Amazon S3 path of a dataset to their definitions.

ProfileConfiguration

Description

Configuration for profile jobs. Configuration can be used to select columns, do evaluations, and override default parameters of evaluations. When configuration is undefined, the profile job will apply default settings to all supported columns.

Members
ColumnStatisticsConfigurations
Type: Array of ColumnStatisticsConfiguration structures

List of configurations for column evaluations. ColumnStatisticsConfigurations are used to select evaluations and override parameters of evaluations for particular columns. When ColumnStatisticsConfigurations is undefined, the profile job will profile all supported columns and run all supported evaluations.

DatasetStatisticsConfiguration
Type: StatisticsConfiguration structure

Configuration for inter-column evaluations. Configuration can be used to select evaluations and override parameters of evaluations. When configuration is undefined, the profile job will run all supported inter-column evaluations.

EntityDetectorConfiguration
Type: EntityDetectorConfiguration structure

Configuration of entity detection for a profile job. When undefined, entity detection is disabled.

ProfileColumns
Type: Array of ColumnSelector structures

List of column selectors. ProfileColumns can be used to select columns from the dataset. When ProfileColumns is undefined, the profile job will profile all supported columns.

Project

Description

Represents all of the attributes of a DataBrew project.

Members
AccountId
Type: string

The ID of the Amazon Web Services account that owns the project.

CreateDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the project was created.

CreatedBy
Type: string

The Amazon Resource Name (ARN) of the user who created the project.

DatasetName
Type: string

The dataset that the project is to act upon.

LastModifiedBy
Type: string

The Amazon Resource Name (ARN) of the user who last modified the project.

LastModifiedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last modification date and time for the project.

Name
Required: Yes
Type: string

The unique name of a project.

OpenDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the project was opened.

OpenedBy
Type: string

The Amazon Resource Name (ARN) of the user that opened the project for use.

RecipeName
Required: Yes
Type: string

The name of a recipe that will be developed during a project session.

ResourceArn
Type: string

The Amazon Resource Name (ARN) for the project.

RoleArn
Type: string

The Amazon Resource Name (ARN) of the role that will be assumed for this project.

Sample
Type: Sample structure

The sample size and sampling type to apply to the data. If this parameter isn't specified, then the sample consists of the first 500 rows from the dataset.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Metadata tags that have been applied to the project.

Recipe

Description

Represents one or more actions to be performed on a DataBrew dataset.

Members
CreateDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the recipe was created.

CreatedBy
Type: string

The Amazon Resource Name (ARN) of the user who created the recipe.

Description
Type: string

The description of the recipe.

LastModifiedBy
Type: string

The Amazon Resource Name (ARN) of the user who last modified the recipe.

LastModifiedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The last modification date and time of the recipe.

Name
Required: Yes
Type: string

The unique name for the recipe.

ProjectName
Type: string

The name of the project that the recipe is associated with.

PublishedBy
Type: string

The Amazon Resource Name (ARN) of the user who published the recipe.

PublishedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the recipe was published.

RecipeVersion
Type: string

The identifier for the version for the recipe. Must be one of the following:

  • Numeric version (X.Y) - X and Y stand for major and minor version numbers. The maximum length of each is 6 digits, and neither can be negative values. Both X and Y are required, and "0.0" isn't a valid version.

  • LATEST_WORKING - the most recent valid version being developed in a DataBrew project.

  • LATEST_PUBLISHED - the most recent published version.
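
The RecipeVersion rules listed above can be approximated with a local check such as the following. The function name isValidRecipeVersion is hypothetical, and the service remains the authority on what it accepts.

```php
<?php
// Hypothetical helper: checks a RecipeVersion string against the rules
// listed above (X.Y with up to 6 digits each, or one of the two aliases).
function isValidRecipeVersion(string $version): bool
{
    if ($version === 'LATEST_WORKING' || $version === 'LATEST_PUBLISHED') {
        return true;
    }
    // Both X and Y are required; digits only, so negatives are rejected.
    if (preg_match('/^\d{1,6}\.\d{1,6}$/', $version) !== 1) {
        return false;
    }

    // "0.0" is explicitly not a valid version.
    return $version !== '0.0';
}
```

For example, "1.0" and "LATEST_WORKING" pass this check, while "0.0", "1", and "1234567.0" do not.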

ResourceArn
Type: string

The Amazon Resource Name (ARN) for the recipe.

Steps
Type: Array of RecipeStep structures

A list of steps that are defined by the recipe.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Metadata tags that have been applied to the recipe.

RecipeAction

Description

Represents a transformation and associated parameters that are used to apply a change to a DataBrew dataset. For more information, see Recipe actions reference.

Members
Operation
Required: Yes
Type: string

The name of a valid DataBrew transformation to be performed on the data.

Parameters
Type: Associative array of custom strings keys (ParameterName) to strings

Contextual parameters for the transformation.

RecipeReference

Description

Represents the name and version of a DataBrew recipe.

Members
Name
Required: Yes
Type: string

The name of the recipe.

RecipeVersion
Type: string

The identifier for the version for the recipe.

RecipeStep

Description

Represents a single step from a DataBrew recipe to be performed.

Members
Action
Required: Yes
Type: RecipeAction structure

The particular action to be performed in the recipe step.

ConditionExpressions
Type: Array of ConditionExpression structures

One or more conditions that must be met for the recipe step to succeed.

All of the conditions in the array must be met. In other words, all of the conditions must be combined using a logical AND operation.
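
A single RecipeStep might be written as the array below. The operation name UPPER_CASE and its sourceColumn parameter are assumptions used for illustration and should be checked against the Recipe actions reference; the column name is a placeholder.

```php
<?php
// One recipe step. The operation and parameter names are assumptions;
// consult the Recipe actions reference for the valid set.
$recipeStep = [
    'Action' => [
        'Operation'  => 'UPPER_CASE',
        'Parameters' => [
            'sourceColumn' => 'last_name', // placeholder column name
        ],
    ],
];

// A recipe's Steps member is a list of such steps.
$steps = [$recipeStep];
```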

RecipeVersionErrorDetail

Description

Represents any errors encountered when attempting to delete multiple recipe versions.

Members
ErrorCode
Type: string

The HTTP status code for the error.

ErrorMessage
Type: string

The text of the error message.

RecipeVersion
Type: string

The identifier for the recipe version associated with this error.

ResourceNotFoundException

Description

One or more resources can't be found.

Members
Message
Type: string

Rule

Description

Represents a single data quality requirement that should be validated in the scope of this dataset.

Members
CheckExpression
Required: Yes
Type: string

The expression which includes column references, condition names followed by variable references, possibly grouped and combined with other conditions. For example, (:col1 starts_with :prefix1 or :col1 starts_with :prefix2) and (:col1 ends_with :suffix1 or :col1 ends_with :suffix2). Column and value references are substitution variables that should start with the ':' symbol. Depending on the context, substitution variables' values can be either an actual value or a column name. These values are defined in the SubstitutionMap. If a CheckExpression starts with a column reference, then ColumnSelectors in the rule should be null. If ColumnSelectors has been defined, then there should be no column reference in the left side of a condition, for example, is_between :val1 and :val2.

For more information, see Available checks.

ColumnSelectors
Type: Array of ColumnSelector structures

List of column selectors. Selectors can be used to select columns from the dataset by name or by regular expression. The rule is applied to the selected columns.

Disabled
Type: boolean

A value that specifies whether the rule is disabled. Once a rule is disabled, a profile job will not validate it during a job run. Default value is false.

Name
Required: Yes
Type: string

The name of the rule.

SubstitutionMap
Type: Associative array of custom strings keys (ValueReference) to strings

The map of substitution variable names to their values used in a check expression. Variable names should start with a ':' (colon). Variable values can either be actual values or column names. To differentiate between the two, column names should be enclosed in backticks, for example, ":col1": "`Column A`".

Threshold
Type: Threshold structure

The threshold used with a non-aggregate check expression. Non-aggregate check expressions will be applied to each row in a specific column, and the threshold will be used to determine whether the validation succeeds.
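
The sketch below shows one way a Rule could be assembled, with a column reference distinguished from a literal value by backticks. The rule name, column name, and prefix value are placeholders, the exact formatting the service expects for literal values is not specified here, and the helper columnReferences is hypothetical.

```php
<?php
// Example rule; names and values are placeholders.
$rule = [
    'Name'            => 'sku-prefix-check',
    'CheckExpression' => ':col1 starts_with :prefix1',
    'SubstitutionMap' => [
        ':col1'    => '`sku`', // backticks mark a column name
        ':prefix1' => 'AB',    // no backticks: treated as a value
    ],
    'Threshold' => [
        'Value' => 90.0,
        'Unit'  => 'PERCENTAGE',
    ],
];

// Hypothetical helper: returns the variable names whose values are
// enclosed in backticks, that is, the column references.
function columnReferences(array $substitutionMap): array
{
    $references = [];
    foreach ($substitutionMap as $name => $value) {
        if (strlen($value) >= 2 && $value[0] === '`' && substr($value, -1) === '`') {
            $references[] = $name;
        }
    }

    return $references;
}

$columns = columnReferences($rule['SubstitutionMap']);
```

Because this CheckExpression starts with a column reference, ColumnSelectors is left unset, as the CheckExpression description requires.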

RulesetItem

Description

Contains metadata about the ruleset.

Members
AccountId
Type: string

The ID of the Amazon Web Services account that owns the ruleset.

CreateDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the ruleset was created.

CreatedBy
Type: string

The Amazon Resource Name (ARN) of the user who created the ruleset.

Description
Type: string

The description of the ruleset.

LastModifiedBy
Type: string

The Amazon Resource Name (ARN) of the user who last modified the ruleset.

LastModifiedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The modification date and time of the ruleset.

Name
Required: Yes
Type: string

The name of the ruleset.

ResourceArn
Type: string

The Amazon Resource Name (ARN) for the ruleset.

RuleCount
Type: int

The number of rules that are defined in the ruleset.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Metadata tags that have been applied to the ruleset.

TargetArn
Required: Yes
Type: string

The Amazon Resource Name (ARN) of a resource (dataset) that the ruleset is associated with.

S3Location

Description

Represents an Amazon S3 location (bucket name, bucket owner, and object key) where DataBrew can read input data, or write output from a job.

Members
Bucket
Required: Yes
Type: string

The Amazon S3 bucket name.

BucketOwner
Type: string

The Amazon Web Services account ID of the bucket owner.

Key
Type: string

The unique name of the object in the bucket.

S3TableOutputOptions

Description

Represents options that specify how and where DataBrew writes the Amazon S3 output generated by recipe jobs.

Members
Location
Required: Yes
Type: S3Location structure

Represents an Amazon S3 location (bucket name and object key) where DataBrew can write output from a job.

Sample

Description

Represents the sample size and sampling type for DataBrew to use for interactive data analysis.

Members
Size
Type: int

The number of rows in the sample.

Type
Required: Yes
Type: string

The way in which DataBrew obtains rows from a dataset.

Schedule

Description

Represents one or more dates and times when a job is to run.

Members
AccountId
Type: string

The ID of the Amazon Web Services account that owns the schedule.

CreateDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time that the schedule was created.

CreatedBy
Type: string

The Amazon Resource Name (ARN) of the user who created the schedule.

CronExpression
Type: string

The dates and times when the job is to run. For more information, see Cron expressions in the Glue DataBrew Developer Guide.

JobNames
Type: Array of strings

A list of jobs to be run, according to the schedule.

LastModifiedBy
Type: string

The Amazon Resource Name (ARN) of the user who last modified the schedule.

LastModifiedDate
Type: timestamp (string|DateTime or anything parsable by strtotime)

The date and time when the schedule was last modified.

Name
Required: Yes
Type: string

The name of the schedule.

ResourceArn
Type: string

The Amazon Resource Name (ARN) of the schedule.

Tags
Type: Associative array of custom strings keys (TagKey) to strings

Metadata tags that have been applied to the schedule.

ServiceQuotaExceededException

Description

A service quota is exceeded.

Members
Message
Type: string

StatisticOverride

Description

Override of a particular evaluation for a profile job.

Members
Parameters
Required: Yes
Type: Associative array of custom strings keys (ParameterName) to strings

A map that includes overrides of an evaluation’s parameters.

Statistic
Required: Yes
Type: string

The name of an evaluation.

StatisticsConfiguration

Description

Configuration of evaluations for a profile job. This configuration can be used to select evaluations and override the parameters of selected evaluations.

Members
IncludedStatistics
Type: Array of strings

List of included evaluations. When the list is undefined, all supported evaluations will be included.

Overrides
Type: Array of StatisticOverride structures

List of overrides for evaluations.

Threshold

Description

The threshold used with a non-aggregate check expression. The non-aggregate check expression will be applied to each row in a specific column. Then the threshold will be used to determine whether the validation succeeds.

Members
Type
Type: string

The type of threshold, which determines how the actual count of rows that satisfy the rule is compared to the threshold value.

Unit
Type: string

The unit of the threshold value. Can be either COUNT or PERCENTAGE of the full sample size used for validation.

Value
Required: Yes
Type: double

The value of a threshold.
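
To make the two units concrete, the following sketch converts a threshold into a number of rows for a given sample size. This is one reading of the Unit description above, not the service's implementation, and the helper thresholdInRows is hypothetical.

```php
<?php
// Hypothetical helper: expresses a threshold as a row count, assuming
// PERCENTAGE is relative to the full sample size used for validation.
function thresholdInRows(array $threshold, int $sampleSize): float
{
    switch ($threshold['Unit']) {
        case 'COUNT':
            return (float) $threshold['Value'];
        case 'PERCENTAGE':
            return $threshold['Value'] * $sampleSize / 100;
        default:
            throw new InvalidArgumentException('Unit must be COUNT or PERCENTAGE.');
    }
}

// 90 percent of a 2,000-row sample.
$rows = thresholdInRows(['Value' => 90.0, 'Unit' => 'PERCENTAGE'], 2000);
```

How the resulting row count is compared with the rows that satisfy the rule depends on the threshold Type.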

ValidationConfiguration

Description

Configuration for data quality validation. Used to select the Rulesets and Validation Mode to be used in the profile job. When ValidationConfiguration is null, the profile job will run without data quality validation.

Members
RulesetArn
Required: Yes
Type: string

The Amazon Resource Name (ARN) for the ruleset to be validated in the profile job. The TargetArn of the selected ruleset should be the same as the Amazon Resource Name (ARN) of the dataset that is associated with the profile job.

ValidationMode
Type: string

Mode of data quality validation. Default mode is “CHECK_ALL” which verifies all rules defined in the selected ruleset.
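
The TargetArn requirement above can be checked before a profile job is created, as in this sketch. The ARNs are made-up placeholders, and rulesetTargetsDataset is a hypothetical helper that works on a RulesetItem-shaped array.

```php
<?php
// Placeholder ARNs for illustration only.
$datasetArn = 'arn:aws:databrew:us-east-1:111122223333:dataset/example-dataset';
$rulesetItem = [
    'Name'        => 'example-ruleset',
    'ResourceArn' => 'arn:aws:databrew:us-east-1:111122223333:ruleset/example-ruleset',
    'TargetArn'   => $datasetArn,
];

// Hypothetical helper: true when the ruleset is associated with the dataset.
function rulesetTargetsDataset(array $rulesetItem, string $datasetArn): bool
{
    return ($rulesetItem['TargetArn'] ?? null) === $datasetArn;
}

$validationConfigurations = [];
if (rulesetTargetsDataset($rulesetItem, $datasetArn)) {
    $validationConfigurations[] = [
        'RulesetArn'     => $rulesetItem['ResourceArn'],
        'ValidationMode' => 'CHECK_ALL', // the documented default mode
    ];
}
```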

ValidationException

Description

The input parameters for this request failed validation.

Members
Message
Type: string

ViewFrame

Description

Represents the data being transformed during an action.

Members
Analytics
Type: string

Controls whether analytics computation is enabled or disabled. Enabled by default.

ColumnRange
Type: int

The number of columns to include in the view frame, beginning with the StartColumnIndex value and ignoring any columns in the HiddenColumns list.

HiddenColumns
Type: Array of strings

A list of columns to hide in the view frame.

RowRange
Type: int

The number of rows to include in the view frame, beginning with the StartRowIndex value.

StartColumnIndex
Required: Yes
Type: int

The starting index for the range of columns to return in the view frame.

StartRowIndex
Type: int

The starting index for the range of rows to return in the view frame.
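
One possible reading of how StartColumnIndex, ColumnRange, and HiddenColumns interact is sketched below. It assumes zero-based indexes and is only an illustration of the descriptions above; visibleColumns is a hypothetical helper, not how the service is known to compute the frame.

```php
<?php
// Hypothetical helper: picks the columns a view frame would show,
// assuming StartColumnIndex is zero-based.
function visibleColumns(array $columns, array $viewFrame): array
{
    $hidden = $viewFrame['HiddenColumns'] ?? [];

    // Start at StartColumnIndex, then drop any hidden columns.
    $candidates = array_slice($columns, $viewFrame['StartColumnIndex']);
    $visible    = array_values(array_diff($candidates, $hidden));

    // ColumnRange caps how many of the remaining columns are included.
    if (isset($viewFrame['ColumnRange'])) {
        $visible = array_slice($visible, 0, $viewFrame['ColumnRange']);
    }

    return $visible;
}

$shown = visibleColumns(
    ['a', 'b', 'c', 'd', 'e'],
    ['StartColumnIndex' => 1, 'ColumnRange' => 2, 'HiddenColumns' => ['c']]
);
```

With these inputs the frame starts at 'b', skips the hidden column 'c', and stops after two columns.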