AWS Glue DataBrew 2017-07-25
- Client: Aws\GlueDataBrew\GlueDataBrewClient
- Service ID: databrew
- Version: 2017-07-25
This page describes the parameters and results for the operations of the AWS Glue DataBrew (2017-07-25), and shows how to use the Aws\GlueDataBrew\GlueDataBrewClient object to call the described operations. This documentation is specific to the 2017-07-25 API version of the service.
Operation Summary
Each of the following operations can be created from a client using $client->getCommand('CommandName'), where "CommandName" is the name of one of the following operations. Note: a command is a value that encapsulates an operation and the parameters used to create an HTTP request.
You can also create and send a command immediately using the magic methods available on a client object: $client->commandName(/* parameters */).
You can send the command asynchronously (returning a promise) by appending the word "Async" to the operation name: $client->commandNameAsync(/* parameters */).
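The three invocation styles described above can be sketched side by side. This is a minimal illustration, not the SDK's canonical usage: it assumes $client is an already-configured Aws\GlueDataBrew\GlueDataBrewClient, and the helper name describeDatasetThreeWays and the dataset name are hypothetical.

```php
<?php
// Sketch only: $client is assumed to be a configured
// Aws\GlueDataBrew\GlueDataBrewClient (region and credentials set elsewhere).

/**
 * Calls DescribeDataset in three equivalent ways and returns all three results.
 * The function name is hypothetical and exists only for this illustration.
 */
function describeDatasetThreeWays($client, string $datasetName): array
{
    $params = ['Name' => $datasetName];

    // 1. Build a command object explicitly, then execute it.
    $command = $client->getCommand('DescribeDataset', $params);
    $viaCommand = $client->execute($command);

    // 2. Magic method: creates and sends the command in one step.
    $viaMagic = $client->describeDataset($params);

    // 3. Append "Async" to get a promise; wait() blocks until it resolves.
    $promise = $client->describeDatasetAsync($params);
    $viaAsync = $promise->wait();

    return [$viaCommand, $viaMagic, $viaAsync];
}
```

In practice only one of the three styles is needed per call; the explicit command form is mainly useful when the command has to be inspected or modified before it is sent.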
- BatchDeleteRecipeVersion ( array $params = [] )
- Deletes one or more versions of a recipe at a time.
- CreateDataset ( array $params = [] )
- Creates a new DataBrew dataset.
- CreateProfileJob ( array $params = [] )
- Creates a new job to analyze a dataset and create its data profile.
- CreateProject ( array $params = [] )
- Creates a new DataBrew project.
- CreateRecipe ( array $params = [] )
- Creates a new DataBrew recipe.
- CreateRecipeJob ( array $params = [] )
- Creates a new job to transform input data, using steps defined in an existing Glue DataBrew recipe.
- CreateRuleset ( array $params = [] )
- Creates a new ruleset that can be used in a profile job to validate the data quality of a dataset.
- CreateSchedule ( array $params = [] )
- Creates a new schedule for one or more DataBrew jobs.
- DeleteDataset ( array $params = [] )
- Deletes a dataset from DataBrew.
- DeleteJob ( array $params = [] )
- Deletes the specified DataBrew job.
- DeleteProject ( array $params = [] )
- Deletes an existing DataBrew project.
- DeleteRecipeVersion ( array $params = [] )
- Deletes a single version of a DataBrew recipe.
- DeleteRuleset ( array $params = [] )
- Deletes a ruleset.
- DeleteSchedule ( array $params = [] )
- Deletes the specified DataBrew schedule.
- DescribeDataset ( array $params = [] )
- Returns the definition of a specific DataBrew dataset.
- DescribeJob ( array $params = [] )
- Returns the definition of a specific DataBrew job.
- DescribeJobRun ( array $params = [] )
- Represents one run of a DataBrew job.
- DescribeProject ( array $params = [] )
- Returns the definition of a specific DataBrew project.
- DescribeRecipe ( array $params = [] )
- Returns the definition of a specific DataBrew recipe corresponding to a particular version.
- DescribeRuleset ( array $params = [] )
- Retrieves detailed information about the ruleset.
- DescribeSchedule ( array $params = [] )
- Returns the definition of a specific DataBrew schedule.
- ListDatasets ( array $params = [] )
- Lists all of the DataBrew datasets.
- ListJobRuns ( array $params = [] )
- Lists all of the previous runs of a particular DataBrew job.
- ListJobs ( array $params = [] )
- Lists all of the DataBrew jobs that are defined.
- ListProjects ( array $params = [] )
- Lists all of the DataBrew projects that are defined.
- ListRecipeVersions ( array $params = [] )
- Lists the versions of a particular DataBrew recipe, except for LATEST_WORKING.
- ListRecipes ( array $params = [] )
- Lists all of the DataBrew recipes that are defined.
- ListRulesets ( array $params = [] )
- Lists all rulesets available in the current account or rulesets associated with a specific resource (dataset).
- ListSchedules ( array $params = [] )
- Lists the DataBrew schedules that are defined.
- ListTagsForResource ( array $params = [] )
- Lists all the tags for a DataBrew resource.
- PublishRecipe ( array $params = [] )
- Publishes a new version of a DataBrew recipe.
- SendProjectSessionAction ( array $params = [] )
- Performs a recipe step within an interactive DataBrew session that's currently open.
- StartJobRun ( array $params = [] )
- Runs a DataBrew job.
- StartProjectSession ( array $params = [] )
- Creates an interactive session, enabling you to manipulate data in a DataBrew project.
- StopJobRun ( array $params = [] )
- Stops a particular run of a job.
- TagResource ( array $params = [] )
- Adds metadata tags to a DataBrew resource, such as a dataset, project, recipe, job, or schedule.
- UntagResource ( array $params = [] )
- Removes metadata tags from a DataBrew resource.
- UpdateDataset ( array $params = [] )
- Modifies the definition of an existing DataBrew dataset.
- UpdateProfileJob ( array $params = [] )
- Modifies the definition of an existing profile job.
- UpdateProject ( array $params = [] )
- Modifies the definition of an existing DataBrew project.
- UpdateRecipe ( array $params = [] )
- Modifies the definition of the LATEST_WORKING version of a DataBrew recipe.
- UpdateRecipeJob ( array $params = [] )
- Modifies the definition of an existing DataBrew recipe job.
- UpdateRuleset ( array $params = [] )
- Updates the specified ruleset.
- UpdateSchedule ( array $params = [] )
- Modifies the definition of an existing DataBrew schedule.
Paginators
Paginators handle automatically iterating over paginated API results. Paginators are associated with specific API operations, and they accept the parameters that the corresponding API operation accepts. You can get a paginator from a client class using getPaginator($paginatorName, $operationParameters). This client supports the following paginators:
- ListDatasets
- ListJobRuns
- ListJobs
- ListProjects
- ListRecipeVersions
- ListRecipes
- ListRulesets
- ListSchedules
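As a rough sketch of how a paginator from the list above might be consumed: the helper below walks every page of ListDatasets and collects dataset names. It assumes $client is a configured GlueDataBrewClient and that each page exposes its items under a 'Datasets' key with a 'Name' field; the function name collectDatasetNames is hypothetical.

```php
<?php
/**
 * Collects dataset names across all pages of ListDatasets.
 * Assumption: each page holds its items under 'Datasets', each with a 'Name'.
 */
function collectDatasetNames($client, array $params = []): array
{
    $names = [];

    // getPaginator() returns an iterable that fetches the next page on demand.
    foreach ($client->getPaginator('ListDatasets', $params) as $page) {
        foreach ($page['Datasets'] ?? [] as $dataset) {
            $names[] = $dataset['Name'];
        }
    }

    return $names;
}
```

The same loop shape should apply to the other paginators listed, with the operation name and the result key swapped accordingly.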
Operations
BatchDeleteRecipeVersion
$result = $client->batchDeleteRecipeVersion([/* ... */]);
$promise = $client->batchDeleteRecipeVersionAsync([/* ... */]);
Deletes one or more versions of a recipe at a time.
The entire request will be rejected if:
- The recipe does not exist.
- There is an invalid version identifier in the list of versions.
- The version list is empty.
- The version list size exceeds 50.
- The version list contains duplicate entries.
The request will complete successfully, but with partial failures, if:
- A version does not exist.
- A version is being used by a job.
- You specify LATEST_WORKING, but it's being used by a project.
- The version fails to be deleted.
The LATEST_WORKING version will only be deleted if the recipe has no other versions. If you try to delete LATEST_WORKING while other versions exist (or if they can't be deleted), then LATEST_WORKING will be listed as a partial failure in the response.
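Some of the rejection conditions above can be checked before the request is sent, and partial failures can be read from the 'Errors' member of the result. The following is a sketch under those assumptions; the helper names buildBatchDeleteParams and partialFailures are hypothetical, and the checks cover only the list-level rules (empty list, more than 50 entries, duplicates), not recipe existence or identifier validity, which only the service can decide.

```php
<?php
/**
 * Builds BatchDeleteRecipeVersion parameters after checking the list-level
 * conditions that would cause the whole request to be rejected.
 */
function buildBatchDeleteParams(string $recipeName, array $versions): array
{
    if ($versions === []) {
        throw new InvalidArgumentException('The version list is empty.');
    }
    if (count($versions) > 50) {
        throw new InvalidArgumentException('The version list size exceeds 50.');
    }
    if (count(array_unique($versions)) !== count($versions)) {
        throw new InvalidArgumentException('The version list contains duplicate entries.');
    }

    return ['Name' => $recipeName, 'RecipeVersions' => array_values($versions)];
}

/**
 * Extracts per-version failures from a BatchDeleteRecipeVersion result.
 * Returns a list of ['version' => ..., 'code' => ..., 'message' => ...].
 */
function partialFailures($result): array
{
    $failed = [];
    foreach ($result['Errors'] ?? [] as $error) {
        $failed[] = [
            'version' => $error['RecipeVersion'],
            'code'    => $error['ErrorCode'],
            'message' => $error['ErrorMessage'],
        ];
    }
    return $failed;
}
```

A caller would pass the built array to $client->batchDeleteRecipeVersion() and then hand the result to partialFailures() to see which versions were not deleted.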
Parameter Syntax
$result = $client->batchDeleteRecipeVersion([
    'Name' => '<string>', // REQUIRED
    'RecipeVersions' => ['<string>', ...], // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the recipe whose versions are to be deleted.
- RecipeVersions
-
- Required: Yes
- Type: Array of strings
An array of version identifiers, for the recipe versions to be deleted. You can specify numeric versions (X.Y) or LATEST_WORKING. LATEST_PUBLISHED is not supported.
Result Syntax
[
    'Errors' => [
        [
            'ErrorCode' => '<string>',
            'ErrorMessage' => '<string>',
            'RecipeVersion' => '<string>',
        ],
        // ...
    ],
    'Name' => '<string>',
]
Result Details
Members
- Errors
-
- Type: Array of RecipeVersionErrorDetail structures
Errors, if any, that occurred while attempting to delete the recipe versions.
- Name
-
- Required: Yes
- Type: string
The name of the recipe that was modified.
Errors
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
CreateDataset
$result = $client->createDataset([/* ... */]);
$promise = $client->createDatasetAsync([/* ... */]);
Creates a new DataBrew dataset.
Parameter Syntax
$result = $client->createDataset([
    'Format' => 'CSV|JSON|PARQUET|EXCEL|ORC',
    'FormatOptions' => [
        'Csv' => [
            'Delimiter' => '<string>',
            'HeaderRow' => true || false,
        ],
        'Excel' => [
            'HeaderRow' => true || false,
            'SheetIndexes' => [<integer>, ...],
            'SheetNames' => ['<string>', ...],
        ],
        'Json' => [
            'MultiLine' => true || false,
        ],
    ],
    'Input' => [ // REQUIRED
        'DataCatalogInputDefinition' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
            'TempDirectory' => [
                'Bucket' => '<string>', // REQUIRED
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
        ],
        'DatabaseInputDefinition' => [
            'DatabaseTableName' => '<string>',
            'GlueConnectionName' => '<string>', // REQUIRED
            'QueryString' => '<string>',
            'TempDirectory' => [
                'Bucket' => '<string>', // REQUIRED
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
        ],
        'Metadata' => [
            'SourceArn' => '<string>',
        ],
        'S3InputDefinition' => [
            'Bucket' => '<string>', // REQUIRED
            'BucketOwner' => '<string>',
            'Key' => '<string>',
        ],
    ],
    'Name' => '<string>', // REQUIRED
    'PathOptions' => [
        'FilesLimit' => [
            'MaxFiles' => <integer>, // REQUIRED
            'Order' => 'DESCENDING|ASCENDING',
            'OrderedBy' => 'LAST_MODIFIED_DATE',
        ],
        'LastModifiedDateCondition' => [
            'Expression' => '<string>', // REQUIRED
            'ValuesMap' => ['<string>', ...], // REQUIRED
        ],
        'Parameters' => [
            '<PathParameterName>' => [
                'CreateColumn' => true || false,
                'DatetimeOptions' => [
                    'Format' => '<string>', // REQUIRED
                    'LocaleCode' => '<string>',
                    'TimezoneOffset' => '<string>',
                ],
                'Filter' => [
                    'Expression' => '<string>', // REQUIRED
                    'ValuesMap' => ['<string>', ...], // REQUIRED
                ],
                'Name' => '<string>', // REQUIRED
                'Type' => 'Datetime|Number|String', // REQUIRED
            ],
            // ...
        ],
    ],
    'Tags' => ['<string>', ...],
]);
Parameter Details
Members
- Format
-
- Type: string
The file format of a dataset that is created from an Amazon S3 file or folder.
- FormatOptions
-
- Type: FormatOptions structure
Represents a set of options that define the structure of either comma-separated value (CSV), Excel, or JSON input.
- Input
-
- Required: Yes
- Type: Input structure
Represents information on how DataBrew can find data, in either the Glue Data Catalog or Amazon S3.
- Name
-
- Required: Yes
- Type: string
The name of the dataset to be created. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.
- PathOptions
-
- Type: PathOptions structure
A set of options that defines how DataBrew interprets an Amazon S3 path of the dataset.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Metadata tags to apply to this dataset.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the dataset that you created.
Errors
- AccessDeniedException:
Access to the specified resource was denied.
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ServiceQuotaExceededException:
A service quota is exceeded.
- ValidationException:
The input parameters for this request failed validation.
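As an illustration of the CreateDataset parameters, the sketch below assembles a request for a CSV file stored in Amazon S3 and checks the dataset name against the valid characters listed above. The helper name buildCsvDatasetParams, the comma delimiter, and the header-row choice are assumptions made for the example, not service defaults.

```php
<?php
/**
 * Builds CreateDataset parameters for a CSV object in Amazon S3.
 * Only 'Name' and 'Input' are required; the format settings are illustrative.
 */
function buildCsvDatasetParams(string $name, string $bucket, string $key): array
{
    // Valid characters: alphanumeric, hyphen, period, and space.
    if (!preg_match('/^[A-Za-z0-9. -]+$/', $name)) {
        throw new InvalidArgumentException("Invalid dataset name: $name");
    }

    return [
        'Name' => $name,
        'Format' => 'CSV',
        'FormatOptions' => [
            'Csv' => ['Delimiter' => ',', 'HeaderRow' => true],
        ],
        'Input' => [
            'S3InputDefinition' => ['Bucket' => $bucket, 'Key' => $key],
        ],
    ];
}
```

The returned array could then be passed to $client->createDataset(), whose result carries the name of the created dataset under 'Name'.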
CreateProfileJob
$result = $client->createProfileJob([/* ... */]);
$promise = $client->createProfileJobAsync([/* ... */]);
Creates a new job to analyze a dataset and create its data profile.
Parameter Syntax
$result = $client->createProfileJob([
    'Configuration' => [
        'ColumnStatisticsConfigurations' => [
            [
                'Selectors' => [
                    [
                        'Name' => '<string>',
                        'Regex' => '<string>',
                    ],
                    // ...
                ],
                'Statistics' => [ // REQUIRED
                    'IncludedStatistics' => ['<string>', ...],
                    'Overrides' => [
                        [
                            'Parameters' => ['<string>', ...], // REQUIRED
                            'Statistic' => '<string>', // REQUIRED
                        ],
                        // ...
                    ],
                ],
            ],
            // ...
        ],
        'DatasetStatisticsConfiguration' => [
            'IncludedStatistics' => ['<string>', ...],
            'Overrides' => [
                [
                    'Parameters' => ['<string>', ...], // REQUIRED
                    'Statistic' => '<string>', // REQUIRED
                ],
                // ...
            ],
        ],
        'EntityDetectorConfiguration' => [
            'AllowedStatistics' => [
                [
                    'Statistics' => ['<string>', ...], // REQUIRED
                ],
                // ...
            ],
            'EntityTypes' => ['<string>', ...], // REQUIRED
        ],
        'ProfileColumns' => [
            [
                'Name' => '<string>',
                'Regex' => '<string>',
            ],
            // ...
        ],
    ],
    'DatasetName' => '<string>', // REQUIRED
    'EncryptionKeyArn' => '<string>',
    'EncryptionMode' => 'SSE-KMS|SSE-S3',
    'JobSample' => [
        'Mode' => 'FULL_DATASET|CUSTOM_ROWS',
        'Size' => <integer>,
    ],
    'LogSubscription' => 'ENABLE|DISABLE',
    'MaxCapacity' => <integer>,
    'MaxRetries' => <integer>,
    'Name' => '<string>', // REQUIRED
    'OutputLocation' => [ // REQUIRED
        'Bucket' => '<string>', // REQUIRED
        'BucketOwner' => '<string>',
        'Key' => '<string>',
    ],
    'RoleArn' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
    'Timeout' => <integer>,
    'ValidationConfigurations' => [
        [
            'RulesetArn' => '<string>', // REQUIRED
            'ValidationMode' => 'CHECK_ALL',
        ],
        // ...
    ],
]);
Parameter Details
Members
- Configuration
-
- Type: ProfileConfiguration structure
Configuration for profile jobs. Used to select columns, do evaluations, and override default parameters of evaluations. When configuration is null, the profile job will run with default settings.
- DatasetName
-
- Required: Yes
- Type: string
The name of the dataset that this job is to act upon.
- EncryptionKeyArn
-
- Type: string
The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.
- EncryptionMode
-
- Type: string
The encryption mode for the job, which can be one of the following:
- SSE-KMS - Server-side encryption with KMS-managed keys.
- SSE-S3 - Server-side encryption with keys managed by Amazon S3.
- JobSample
-
- Type: JobSample structure
Sample configuration for profile jobs only. Determines the number of rows on which the profile job will be executed. If a JobSample value is not provided, the default value will be used. The default value is CUSTOM_ROWS for the mode parameter and 20000 for the size parameter.
- LogSubscription
-
- Type: string
Enables or disables Amazon CloudWatch logging for the job. If logging is enabled, CloudWatch writes one log stream for each job run.
- MaxCapacity
-
- Type: int
The maximum number of nodes that DataBrew can use when the job processes data.
- MaxRetries
-
- Type: int
The maximum number of times to retry the job after a job run fails.
- Name
-
- Required: Yes
- Type: string
The name of the job to be created. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.
- OutputLocation
-
- Required: Yes
- Type: S3Location structure
Represents an Amazon S3 location (bucket name, bucket owner, and object key) where DataBrew can read input data, or write output from a job.
- RoleArn
-
- Required: Yes
- Type: string
The Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Metadata tags to apply to this job.
- Timeout
-
- Type: int
The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.
- ValidationConfigurations
-
- Type: Array of ValidationConfiguration structures
List of validation configurations that are applied to the profile job.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the job that was created.
Errors
- AccessDeniedException:
Access to the specified resource was denied.
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ResourceNotFoundException:
One or more resources can't be found.
- ServiceQuotaExceededException:
A service quota is exceeded.
- ValidationException:
The input parameters for this request failed validation.
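A minimal sketch of assembling CreateProfileJob parameters follows. It sets only the required members and, optionally, a JobSample whose mode is checked against the two documented values. The helper name buildProfileJobParams and all argument values in a real call (role ARN, bucket, names) are hypothetical; when no sample is passed the member is omitted so the service default described above applies.

```php
<?php
/**
 * Builds CreateProfileJob parameters with the required members and an
 * optional JobSample. Omitting $jobSample leaves the service default in place.
 */
function buildProfileJobParams(
    string $jobName,
    string $datasetName,
    string $roleArn,
    string $outputBucket,
    ?array $jobSample = null
): array {
    $params = [
        'Name' => $jobName,
        'DatasetName' => $datasetName,
        'RoleArn' => $roleArn,
        'OutputLocation' => ['Bucket' => $outputBucket],
    ];

    if ($jobSample !== null) {
        $mode = $jobSample['Mode'] ?? 'CUSTOM_ROWS';
        if (!in_array($mode, ['FULL_DATASET', 'CUSTOM_ROWS'], true)) {
            throw new InvalidArgumentException("Unknown JobSample mode: $mode");
        }
        $params['JobSample'] = ['Mode' => $mode];
        if (isset($jobSample['Size'])) {
            $params['JobSample']['Size'] = (int) $jobSample['Size'];
        }
    }

    return $params;
}
```

The result of buildProfileJobParams() is intended to be passed directly to $client->createProfileJob().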
CreateProject
$result = $client->createProject([/* ... */]);
$promise = $client->createProjectAsync([/* ... */]);
Creates a new DataBrew project.
Parameter Syntax
$result = $client->createProject([
    'DatasetName' => '<string>', // REQUIRED
    'Name' => '<string>', // REQUIRED
    'RecipeName' => '<string>', // REQUIRED
    'RoleArn' => '<string>', // REQUIRED
    'Sample' => [
        'Size' => <integer>,
        'Type' => 'FIRST_N|LAST_N|RANDOM', // REQUIRED
    ],
    'Tags' => ['<string>', ...],
]);
Parameter Details
Members
- DatasetName
-
- Required: Yes
- Type: string
The name of an existing dataset to associate this project with.
- Name
-
- Required: Yes
- Type: string
A unique name for the new project. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.
- RecipeName
-
- Required: Yes
- Type: string
The name of an existing recipe to associate with the project.
- RoleArn
-
- Required: Yes
- Type: string
The Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role to be assumed for this request.
- Sample
-
- Type: Sample structure
Represents the sample size and sampling type for DataBrew to use for interactive data analysis.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Metadata tags to apply to this project.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the project that you created.
Errors
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- InternalServerException:
An internal service failure occurred.
- ServiceQuotaExceededException:
A service quota is exceeded.
- ValidationException:
The input parameters for this request failed validation.
CreateRecipe
$result = $client->createRecipe([/* ... */]);
$promise = $client->createRecipeAsync([/* ... */]);
Creates a new DataBrew recipe.
Parameter Syntax
$result = $client->createRecipe([
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Steps' => [ // REQUIRED
        [
            'Action' => [ // REQUIRED
                'Operation' => '<string>', // REQUIRED
                'Parameters' => ['<string>', ...],
            ],
            'ConditionExpressions' => [
                [
                    'Condition' => '<string>', // REQUIRED
                    'TargetColumn' => '<string>', // REQUIRED
                    'Value' => '<string>',
                ],
                // ...
            ],
        ],
        // ...
    ],
    'Tags' => ['<string>', ...],
]);
Parameter Details
Members
- Description
-
- Type: string
A description for the recipe.
- Name
-
- Required: Yes
- Type: string
A unique name for the recipe. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.
- Steps
-
- Required: Yes
- Type: Array of RecipeStep structures
An array containing the steps to be performed by the recipe. Each recipe step consists of one recipe action and (optionally) an array of condition expressions.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Metadata tags to apply to this recipe.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the recipe that you created.
Errors
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ServiceQuotaExceededException:
A service quota is exceeded.
- ValidationException:
The input parameters for this request failed validation.
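The shape of a recipe step (one action plus an optional list of condition expressions) can be sketched with two small helpers. The names recipeStep and buildRecipeParams are hypothetical, 'Parameters' is treated here as an associative array of strings, and any operation name or parameter key passed in (for example 'UPPER_CASE' with 'sourceColumn') is an assumption that should be checked against the DataBrew recipe actions reference.

```php
<?php
/**
 * Builds one RecipeStep: a required Action and optional ConditionExpressions.
 * Each condition must carry 'Condition' and 'TargetColumn'; 'Value' is optional.
 */
function recipeStep(string $operation, array $parameters = [], array $conditions = []): array
{
    $step = ['Action' => ['Operation' => $operation]];
    if ($parameters !== []) {
        $step['Action']['Parameters'] = $parameters;
    }

    foreach ($conditions as $condition) {
        if (!isset($condition['Condition'], $condition['TargetColumn'])) {
            throw new InvalidArgumentException('Condition and TargetColumn are required.');
        }
    }
    if ($conditions !== []) {
        $step['ConditionExpressions'] = array_values($conditions);
    }

    return $step;
}

/** Builds CreateRecipe parameters from a name and a non-empty list of steps. */
function buildRecipeParams(string $name, array $steps, string $description = ''): array
{
    if ($steps === []) {
        throw new InvalidArgumentException('At least one step is required.');
    }
    $params = ['Name' => $name, 'Steps' => array_values($steps)];
    if ($description !== '') {
        $params['Description'] = $description;
    }
    return $params;
}
```

For example, buildRecipeParams('clean-names', [recipeStep('UPPER_CASE', ['sourceColumn' => 'last_name'])]) yields an array suitable for $client->createRecipe(), assuming that operation and column exist.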
CreateRecipeJob
$result = $client->createRecipeJob([/* ... */]);
$promise = $client->createRecipeJobAsync([/* ... */]);
Creates a new job to transform input data, using steps defined in an existing Glue DataBrew recipe.
Parameter Syntax
$result = $client->createRecipeJob([
    'DataCatalogOutputs' => [
        [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'DatabaseOptions' => [
                'TableName' => '<string>', // REQUIRED
                'TempDirectory' => [
                    'Bucket' => '<string>', // REQUIRED
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'Overwrite' => true || false,
            'S3Options' => [
                'Location' => [ // REQUIRED
                    'Bucket' => '<string>', // REQUIRED
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'TableName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'DatabaseOutputs' => [
        [
            'DatabaseOptions' => [ // REQUIRED
                'TableName' => '<string>', // REQUIRED
                'TempDirectory' => [
                    'Bucket' => '<string>', // REQUIRED
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'DatabaseOutputMode' => 'NEW_TABLE',
            'GlueConnectionName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'DatasetName' => '<string>',
    'EncryptionKeyArn' => '<string>',
    'EncryptionMode' => 'SSE-KMS|SSE-S3',
    'LogSubscription' => 'ENABLE|DISABLE',
    'MaxCapacity' => <integer>,
    'MaxRetries' => <integer>,
    'Name' => '<string>', // REQUIRED
    'Outputs' => [
        [
            'CompressionFormat' => 'GZIP|LZ4|SNAPPY|BZIP2|DEFLATE|LZO|BROTLI|ZSTD|ZLIB',
            'Format' => 'CSV|JSON|PARQUET|GLUEPARQUET|AVRO|ORC|XML|TABLEAUHYPER',
            'FormatOptions' => [
                'Csv' => [
                    'Delimiter' => '<string>',
                ],
            ],
            'Location' => [ // REQUIRED
                'Bucket' => '<string>', // REQUIRED
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
            'MaxOutputFiles' => <integer>,
            'Overwrite' => true || false,
            'PartitionColumns' => ['<string>', ...],
        ],
        // ...
    ],
    'ProjectName' => '<string>',
    'RecipeReference' => [
        'Name' => '<string>', // REQUIRED
        'RecipeVersion' => '<string>',
    ],
    'RoleArn' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
    'Timeout' => <integer>,
]);
Parameter Details
Members
- DataCatalogOutputs
-
- Type: Array of DataCatalogOutput structures
One or more artifacts that represent the Glue Data Catalog output from running the job.
- DatabaseOutputs
-
- Type: Array of DatabaseOutput structures
Represents a list of JDBC database output objects that define the output destinations for a DataBrew recipe job to write to.
- DatasetName
-
- Type: string
The name of the dataset that this job processes.
- EncryptionKeyArn
-
- Type: string
The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.
- EncryptionMode
-
- Type: string
The encryption mode for the job, which can be one of the following:
- SSE-KMS - Server-side encryption with keys managed by KMS.
- SSE-S3 - Server-side encryption with keys managed by Amazon S3.
- LogSubscription
-
- Type: string
Enables or disables Amazon CloudWatch logging for the job. If logging is enabled, CloudWatch writes one log stream for each job run.
- MaxCapacity
-
- Type: int
The maximum number of nodes that DataBrew can consume when the job processes data.
- MaxRetries
-
- Type: int
The maximum number of times to retry the job after a job run fails.
- Name
-
- Required: Yes
- Type: string
A unique name for the job. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.
- Outputs
-
- Type: Array of Output structures
One or more artifacts that represent the output from running the job.
- ProjectName
-
- Type: string
Either the name of an existing project, or a combination of a recipe and a dataset to associate with the recipe.
- RecipeReference
-
- Type: RecipeReference structure
Represents the name and version of a DataBrew recipe.
- RoleArn
-
- Required: Yes
- Type: string
The Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Metadata tags to apply to this job.
- Timeout
-
- Type: int
The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the job that you created.
Errors
- AccessDeniedException:
Access to the specified resource was denied.
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ResourceNotFoundException:
One or more resources can't be found.
- ServiceQuotaExceededException:
A service quota is exceeded.
- ValidationException:
The input parameters for this request failed validation.
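The sketch below assembles CreateRecipeJob parameters with a single Amazon S3 output. It reads the ProjectName description above as meaning the job source is either a project or a dataset plus a recipe reference, and enforces exactly one of the two; that reading, along with the helper names s3Output and buildRecipeJobParams, is an assumption of this example rather than documented behavior.

```php
<?php
// Output formats copied from the 'Format' enumeration in the parameter syntax.
const RECIPE_JOB_FORMATS = [
    'CSV', 'JSON', 'PARQUET', 'GLUEPARQUET', 'AVRO', 'ORC', 'XML', 'TABLEAUHYPER',
];

/** Builds one entry for the 'Outputs' list, writing to an S3 location. */
function s3Output(string $bucket, string $key, string $format = 'CSV'): array
{
    if (!in_array($format, RECIPE_JOB_FORMATS, true)) {
        throw new InvalidArgumentException("Unsupported output format: $format");
    }
    return ['Format' => $format, 'Location' => ['Bucket' => $bucket, 'Key' => $key]];
}

/**
 * Builds CreateRecipeJob parameters. $source holds either 'ProjectName', or
 * 'DatasetName' together with 'RecipeReference' (assumed to be exclusive).
 */
function buildRecipeJobParams(string $jobName, string $roleArn, array $source, array $outputs): array
{
    $hasProject = isset($source['ProjectName']);
    $hasRecipe = isset($source['DatasetName'], $source['RecipeReference']['Name']);
    if ($hasProject === $hasRecipe) {
        throw new InvalidArgumentException(
            'Give either ProjectName, or DatasetName plus RecipeReference.'
        );
    }

    $params = ['Name' => $jobName, 'RoleArn' => $roleArn, 'Outputs' => array_values($outputs)];
    if ($hasProject) {
        $params['ProjectName'] = $source['ProjectName'];
    } else {
        $params['DatasetName'] = $source['DatasetName'];
        $params['RecipeReference'] = $source['RecipeReference'];
    }
    return $params;
}
```

The resulting array is meant to be passed to $client->createRecipeJob(); the service remains the authority on which combinations it accepts.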
CreateRuleset
$result = $client->createRuleset([/* ... */]);
$promise = $client->createRulesetAsync([/* ... */]);
Creates a new ruleset that can be used in a profile job to validate the data quality of a dataset.
Parameter Syntax
$result = $client->createRuleset([
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Rules' => [ // REQUIRED
        [
            'CheckExpression' => '<string>', // REQUIRED
            'ColumnSelectors' => [
                [
                    'Name' => '<string>',
                    'Regex' => '<string>',
                ],
                // ...
            ],
            'Disabled' => true || false,
            'Name' => '<string>', // REQUIRED
            'SubstitutionMap' => ['<string>', ...],
            'Threshold' => [
                'Type' => 'GREATER_THAN_OR_EQUAL|LESS_THAN_OR_EQUAL|GREATER_THAN|LESS_THAN',
                'Unit' => 'COUNT|PERCENTAGE',
                'Value' => <float>, // REQUIRED
            ],
        ],
        // ...
    ],
    'Tags' => ['<string>', ...],
    'TargetArn' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Description
-
- Type: string
The description of the ruleset.
- Name
-
- Required: Yes
- Type: string
The name of the ruleset to be created. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.
- Rules
-
- Required: Yes
- Type: Array of Rule structures
A list of rules that are defined with the ruleset. A rule includes one or more checks to be validated on a DataBrew dataset.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Metadata tags to apply to the ruleset.
- TargetArn
-
- Required: Yes
- Type: string
The Amazon Resource Name (ARN) of a resource (dataset) that the ruleset is associated with.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The unique name of the created ruleset.
Errors
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ServiceQuotaExceededException:
A service quota is exceeded.
- ValidationException:
The input parameters for this request failed validation.
CreateSchedule
$result = $client->createSchedule([/* ... */]);
$promise = $client->createScheduleAsync([/* ... */]);
Creates a new schedule for one or more DataBrew jobs. Jobs can be run at a specific date and time, or at regular intervals.
Parameter Syntax
$result = $client->createSchedule([
    'CronExpression' => '<string>', // REQUIRED
    'JobNames' => ['<string>', ...],
    'Name' => '<string>', // REQUIRED
    'Tags' => ['<string>', ...],
]);
Parameter Details
Members
- CronExpression
-
- Required: Yes
- Type: string
The date or dates and time or times when the jobs are to be run. For more information, see Cron expressions in the Glue DataBrew Developer Guide.
- JobNames
-
- Type: Array of strings
The name or names of one or more jobs to be run.
- Name
-
- Required: Yes
- Type: string
A unique name for the schedule. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Metadata tags to apply to this schedule.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the schedule that was created.
Errors
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ServiceQuotaExceededException:
A service quota is exceeded.
- ValidationException:
The input parameters for this request failed validation.
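A small sketch of building CreateSchedule parameters is shown below. The helper name buildScheduleParams is hypothetical, and the cron string is passed through unchanged: its exact syntax is defined in the Glue DataBrew Developer Guide, so any literal used when calling the helper is an assumption to verify there.

```php
<?php
/**
 * Builds CreateSchedule parameters. The cron expression is not parsed here;
 * it is forwarded as-is and validated by the service.
 */
function buildScheduleParams(string $name, string $cronExpression, array $jobNames = [], array $tags = []): array
{
    if (trim($cronExpression) === '') {
        throw new InvalidArgumentException('CronExpression is required.');
    }

    $params = ['Name' => $name, 'CronExpression' => $cronExpression];
    if ($jobNames !== []) {
        // JobNames is a plain list of strings.
        $params['JobNames'] = array_values($jobNames);
    }
    if ($tags !== []) {
        // Tags is an associative array of tag key => tag value.
        $params['Tags'] = $tags;
    }
    return $params;
}
```

The array returned here can be handed to $client->createSchedule(), whose result reports the schedule name under 'Name'.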
DeleteDataset
$result = $client->deleteDataset([/* ... */]);
$promise = $client->deleteDatasetAsync([/* ... */]);
Deletes a dataset from DataBrew.
Parameter Syntax
$result = $client->deleteDataset([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the dataset to be deleted.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the dataset that you deleted.
Errors
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
DeleteJob
$result = $client->deleteJob([/* ... */]);
$promise = $client->deleteJobAsync([/* ... */]);
Deletes the specified DataBrew job.
Parameter Syntax
$result = $client->deleteJob([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the job to be deleted.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the job that you deleted.
Errors
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
DeleteProject
$result = $client->deleteProject([/* ... */]);
$promise = $client->deleteProjectAsync([/* ... */]);
Deletes an existing DataBrew project.
Parameter Syntax
$result = $client->deleteProject([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the project to be deleted.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the project that you deleted.
Errors
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
DeleteRecipeVersion
$result = $client->deleteRecipeVersion([/* ... */]);
$promise = $client->deleteRecipeVersionAsync([/* ... */]);
Deletes a single version of a DataBrew recipe.
Parameter Syntax
$result = $client->deleteRecipeVersion([
    'Name' => '<string>', // REQUIRED
    'RecipeVersion' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the recipe.
- RecipeVersion
-
- Required: Yes
- Type: string
The version of the recipe to be deleted. You can specify a numeric version (X.Y) or LATEST_WORKING. LATEST_PUBLISHED is not supported.
Result Syntax
[ 'Name' => '<string>', 'RecipeVersion' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the recipe that was deleted.
- RecipeVersion
-
- Required: Yes
- Type: string
The version of the recipe that was deleted.
Errors
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
DeleteRuleset
$result = $client->deleteRuleset([/* ... */]);
$promise = $client->deleteRulesetAsync([/* ... */]);
Deletes a ruleset.
Parameter Syntax
$result = $client->deleteRuleset([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the ruleset to be deleted.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the deleted ruleset.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
DeleteSchedule
$result = $client->deleteSchedule([/* ... */]);
$promise = $client->deleteScheduleAsync([/* ... */]);
Deletes the specified DataBrew schedule.
Parameter Syntax
$result = $client->deleteSchedule([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the schedule to be deleted.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the schedule that was deleted.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
DescribeDataset
$result = $client->describeDataset([/* ... */]);
$promise = $client->describeDatasetAsync([/* ... */]);
Returns the definition of a specific DataBrew dataset.
Parameter Syntax
$result = $client->describeDataset([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the dataset to be described.
Result Syntax
[
    'CreateDate' => <DateTime>,
    'CreatedBy' => '<string>',
    'Format' => 'CSV|JSON|PARQUET|EXCEL|ORC',
    'FormatOptions' => [
        'Csv' => [
            'Delimiter' => '<string>',
            'HeaderRow' => true || false,
        ],
        'Excel' => [
            'HeaderRow' => true || false,
            'SheetIndexes' => [<integer>, ...],
            'SheetNames' => ['<string>', ...],
        ],
        'Json' => [
            'MultiLine' => true || false,
        ],
    ],
    'Input' => [
        'DataCatalogInputDefinition' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>',
            'TableName' => '<string>',
            'TempDirectory' => [
                'Bucket' => '<string>',
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
        ],
        'DatabaseInputDefinition' => [
            'DatabaseTableName' => '<string>',
            'GlueConnectionName' => '<string>',
            'QueryString' => '<string>',
            'TempDirectory' => [
                'Bucket' => '<string>',
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
        ],
        'Metadata' => [
            'SourceArn' => '<string>',
        ],
        'S3InputDefinition' => [
            'Bucket' => '<string>',
            'BucketOwner' => '<string>',
            'Key' => '<string>',
        ],
    ],
    'LastModifiedBy' => '<string>',
    'LastModifiedDate' => <DateTime>,
    'Name' => '<string>',
    'PathOptions' => [
        'FilesLimit' => [
            'MaxFiles' => <integer>,
            'Order' => 'DESCENDING|ASCENDING',
            'OrderedBy' => 'LAST_MODIFIED_DATE',
        ],
        'LastModifiedDateCondition' => [
            'Expression' => '<string>',
            'ValuesMap' => ['<string>', ...],
        ],
        'Parameters' => [
            '<PathParameterName>' => [
                'CreateColumn' => true || false,
                'DatetimeOptions' => [
                    'Format' => '<string>',
                    'LocaleCode' => '<string>',
                    'TimezoneOffset' => '<string>',
                ],
                'Filter' => [
                    'Expression' => '<string>',
                    'ValuesMap' => ['<string>', ...],
                ],
                'Name' => '<string>',
                'Type' => 'Datetime|Number|String',
            ],
            // ...
        ],
    ],
    'ResourceArn' => '<string>',
    'Source' => 'S3|DATA-CATALOG|DATABASE',
    'Tags' => ['<string>', ...],
]
Result Details
Members
- CreateDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the dataset was created.
- CreatedBy
-
- Type: string
The identifier (user name) of the user who created the dataset.
- Format
-
- Type: string
The file format of a dataset that is created from an Amazon S3 file or folder.
- FormatOptions
-
- Type: FormatOptions structure
Represents a set of options that define the structure of either comma-separated value (CSV), Excel, or JSON input.
- Input
-
- Required: Yes
- Type: Input structure
Represents information on how DataBrew can find data, in either the Glue Data Catalog or Amazon S3.
- LastModifiedBy
-
- Type: string
The identifier (user name) of the user who last modified the dataset.
- LastModifiedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the dataset was last modified.
- Name
-
- Required: Yes
- Type: string
The name of the dataset.
- PathOptions
-
- Type: PathOptions structure
A set of options that defines how DataBrew interprets an Amazon S3 path of the dataset.
- ResourceArn
-
- Type: string
The Amazon Resource Name (ARN) of the dataset.
- Source
-
- Type: string
The location of the data for this dataset: Amazon S3 (S3), the Glue Data Catalog (DATA-CATALOG), or a database (DATABASE).
- Tags
-
- Type: Associative array of custom string keys (TagKey) to strings
Metadata tags associated with this dataset.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
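The following is a minimal sketch of calling DescribeDataset and treating the documented ResourceNotFoundException as "no such dataset". The helper name findDataset() is hypothetical, and the choice to return null for a missing dataset is an assumption of this sketch, not SDK behavior. It relies on Aws\Exception\AwsException::getAwsErrorCode() to read the error code.

```php
<?php
// Sketch only: findDataset() is a hypothetical helper, not part of the SDK.
use Aws\Exception\AwsException;

/**
 * Returns a few fields of the dataset definition, or null if the dataset does not exist.
 *
 * $client is expected to be an Aws\GlueDataBrew\GlueDataBrewClient, for example:
 *   new Aws\GlueDataBrew\GlueDataBrewClient(['region' => 'us-east-1', 'version' => '2017-07-25'])
 */
function findDataset($client, string $name): ?array
{
    try {
        $result = $client->describeDataset(['Name' => $name]);
    } catch (AwsException $e) {
        if ($e->getAwsErrorCode() === 'ResourceNotFoundException') {
            return null;
        }
        throw $e; // ValidationException and any other error is re-thrown unchanged
    }

    // Name is the only result member marked as required besides Input,
    // so the other members are read defensively.
    return [
        'name'   => $result['Name'],
        'source' => $result['Source'] ?? null,
        'format' => $result['Format'] ?? null,
        'arn'    => $result['ResourceArn'] ?? null,
    ];
}
```

Because the result object supports array access, the same helper works with a plain array of the same shape, which keeps it easy to exercise without network access.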
DescribeJob
$result = $client->describeJob([/* ... */]);
$promise = $client->describeJobAsync([/* ... */]);
Returns the definition of a specific DataBrew job.
Parameter Syntax
$result = $client->describeJob([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the job to be described.
Result Syntax
[ 'CreateDate' => <DateTime>, 'CreatedBy' => '<string>', 'DataCatalogOutputs' => [ [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'DatabaseOptions' => [ 'TableName' => '<string>', 'TempDirectory' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'Overwrite' => true || false, 'S3Options' => [ 'Location' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'TableName' => '<string>', ], // ... ], 'DatabaseOutputs' => [ [ 'DatabaseOptions' => [ 'TableName' => '<string>', 'TempDirectory' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'DatabaseOutputMode' => 'NEW_TABLE', 'GlueConnectionName' => '<string>', ], // ... ], 'DatasetName' => '<string>', 'EncryptionKeyArn' => '<string>', 'EncryptionMode' => 'SSE-KMS|SSE-S3', 'JobSample' => [ 'Mode' => 'FULL_DATASET|CUSTOM_ROWS', 'Size' => <integer>, ], 'LastModifiedBy' => '<string>', 'LastModifiedDate' => <DateTime>, 'LogSubscription' => 'ENABLE|DISABLE', 'MaxCapacity' => <integer>, 'MaxRetries' => <integer>, 'Name' => '<string>', 'Outputs' => [ [ 'CompressionFormat' => 'GZIP|LZ4|SNAPPY|BZIP2|DEFLATE|LZO|BROTLI|ZSTD|ZLIB', 'Format' => 'CSV|JSON|PARQUET|GLUEPARQUET|AVRO|ORC|XML|TABLEAUHYPER', 'FormatOptions' => [ 'Csv' => [ 'Delimiter' => '<string>', ], ], 'Location' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], 'MaxOutputFiles' => <integer>, 'Overwrite' => true || false, 'PartitionColumns' => ['<string>', ...], ], // ... ], 'ProfileConfiguration' => [ 'ColumnStatisticsConfigurations' => [ [ 'Selectors' => [ [ 'Name' => '<string>', 'Regex' => '<string>', ], // ... ], 'Statistics' => [ 'IncludedStatistics' => ['<string>', ...], 'Overrides' => [ [ 'Parameters' => ['<string>', ...], 'Statistic' => '<string>', ], // ... ], ], ], // ... 
], 'DatasetStatisticsConfiguration' => [ 'IncludedStatistics' => ['<string>', ...], 'Overrides' => [ [ 'Parameters' => ['<string>', ...], 'Statistic' => '<string>', ], // ... ], ], 'EntityDetectorConfiguration' => [ 'AllowedStatistics' => [ [ 'Statistics' => ['<string>', ...], ], // ... ], 'EntityTypes' => ['<string>', ...], ], 'ProfileColumns' => [ [ 'Name' => '<string>', 'Regex' => '<string>', ], // ... ], ], 'ProjectName' => '<string>', 'RecipeReference' => [ 'Name' => '<string>', 'RecipeVersion' => '<string>', ], 'ResourceArn' => '<string>', 'RoleArn' => '<string>', 'Tags' => ['<string>', ...], 'Timeout' => <integer>, 'Type' => 'PROFILE|RECIPE', 'ValidationConfigurations' => [ [ 'RulesetArn' => '<string>', 'ValidationMode' => 'CHECK_ALL', ], // ... ], ]
Result Details
Members
- CreateDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the job was created.
- CreatedBy
-
- Type: string
The identifier (user name) of the user associated with the creation of the job.
- DataCatalogOutputs
-
- Type: Array of DataCatalogOutput structures
One or more artifacts that represent the Glue Data Catalog output from running the job.
- DatabaseOutputs
-
- Type: Array of DatabaseOutput structures
A list of JDBC database output objects that define the output destinations a DataBrew recipe job writes to.
- DatasetName
-
- Type: string
The dataset that the job acts upon.
- EncryptionKeyArn
-
- Type: string
The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.
- EncryptionMode
-
- Type: string
The encryption mode for the job, which can be one of the following:
  - SSE-KMS - Server-side encryption with keys managed by KMS.
  - SSE-S3 - Server-side encryption with keys managed by Amazon S3.
- JobSample
-
- Type: JobSample structure
Sample configuration for profile jobs only. Determines the number of rows on which the profile job will be executed.
- LastModifiedBy
-
- Type: string
The identifier (user name) of the user who last modified the job.
- LastModifiedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the job was last modified.
- LogSubscription
-
- Type: string
Indicates whether Amazon CloudWatch logging is enabled for this job.
- MaxCapacity
-
- Type: int
The maximum number of compute nodes that DataBrew can consume when the job processes data.
- MaxRetries
-
- Type: int
The maximum number of times to retry the job after a job run fails.
- Name
-
- Required: Yes
- Type: string
The name of the job.
- Outputs
-
- Type: Array of Output structures
One or more artifacts that represent the output from running the job.
- ProfileConfiguration
-
- Type: ProfileConfiguration structure
Configuration for profile jobs. Used to select columns, do evaluations, and override default parameters of evaluations. When configuration is null, the profile job will run with default settings.
- ProjectName
-
- Type: string
The DataBrew project associated with this job.
- RecipeReference
-
- Type: RecipeReference structure
Represents the name and version of a DataBrew recipe.
- ResourceArn
-
- Type: string
The Amazon Resource Name (ARN) of the job.
- RoleArn
-
- Type: string
The ARN of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.
- Tags
-
- Type: Associative array of custom string keys (TagKey) to strings
Metadata tags associated with this job.
- Timeout
-
- Type: int
The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.
- Type
-
- Type: string
The job type, which must be one of the following:
  - PROFILE - The job analyzes the dataset to determine its size, data types, data distribution, and more.
  - RECIPE - The job applies one or more transformations to a dataset.
- ValidationConfigurations
-
- Type: Array of ValidationConfiguration structures
List of validation configurations that are applied to the profile job.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
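As a rough illustration of reading a DescribeJob result, the sketch below condenses it into a small summary. The helper name summarizeJob() and the summary keys are hypothetical. The sketch assumes every entry in Outputs carries a Location with a Bucket, and treats Key as optional.

```php
<?php
// Sketch only: summarizeJob() is a hypothetical helper, not part of the SDK.

/**
 * Builds a short summary from a DescribeJob result.
 *
 * $job may be the result of $client->describeJob(['Name' => ...]) or a plain
 * array with the same shape.
 */
function summarizeJob($job): array
{
    $uris = [];
    foreach ($job['Outputs'] ?? [] as $output) {
        $location = $output['Location'];
        // Assumption: Bucket is always present; Key may be missing.
        $uris[] = 's3://' . $location['Bucket'] . '/' . ($location['Key'] ?? '');
    }

    return [
        'name'           => $job['Name'],
        'type'           => $job['Type'] ?? null,      // PROFILE or RECIPE
        'timeoutMinutes' => $job['Timeout'] ?? null,   // Timeout is expressed in minutes
        'maxRetries'     => $job['MaxRetries'] ?? null,
        's3Outputs'      => $uris,
    ];
}
```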
DescribeJobRun
$result = $client->describeJobRun([/* ... */]);
$promise = $client->describeJobRunAsync([/* ... */]);
Represents one run of a DataBrew job.
Parameter Syntax
$result = $client->describeJobRun([
    'Name' => '<string>', // REQUIRED
    'RunId' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the job being processed during this run.
- RunId
-
- Required: Yes
- Type: string
The unique identifier of the job run.
Result Syntax
[ 'Attempt' => <integer>, 'CompletedOn' => <DateTime>, 'DataCatalogOutputs' => [ [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'DatabaseOptions' => [ 'TableName' => '<string>', 'TempDirectory' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'Overwrite' => true || false, 'S3Options' => [ 'Location' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'TableName' => '<string>', ], // ... ], 'DatabaseOutputs' => [ [ 'DatabaseOptions' => [ 'TableName' => '<string>', 'TempDirectory' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'DatabaseOutputMode' => 'NEW_TABLE', 'GlueConnectionName' => '<string>', ], // ... ], 'DatasetName' => '<string>', 'ErrorMessage' => '<string>', 'ExecutionTime' => <integer>, 'JobName' => '<string>', 'JobSample' => [ 'Mode' => 'FULL_DATASET|CUSTOM_ROWS', 'Size' => <integer>, ], 'LogGroupName' => '<string>', 'LogSubscription' => 'ENABLE|DISABLE', 'Outputs' => [ [ 'CompressionFormat' => 'GZIP|LZ4|SNAPPY|BZIP2|DEFLATE|LZO|BROTLI|ZSTD|ZLIB', 'Format' => 'CSV|JSON|PARQUET|GLUEPARQUET|AVRO|ORC|XML|TABLEAUHYPER', 'FormatOptions' => [ 'Csv' => [ 'Delimiter' => '<string>', ], ], 'Location' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], 'MaxOutputFiles' => <integer>, 'Overwrite' => true || false, 'PartitionColumns' => ['<string>', ...], ], // ... ], 'ProfileConfiguration' => [ 'ColumnStatisticsConfigurations' => [ [ 'Selectors' => [ [ 'Name' => '<string>', 'Regex' => '<string>', ], // ... ], 'Statistics' => [ 'IncludedStatistics' => ['<string>', ...], 'Overrides' => [ [ 'Parameters' => ['<string>', ...], 'Statistic' => '<string>', ], // ... ], ], ], // ... ], 'DatasetStatisticsConfiguration' => [ 'IncludedStatistics' => ['<string>', ...], 'Overrides' => [ [ 'Parameters' => ['<string>', ...], 'Statistic' => '<string>', ], // ... 
], ], 'EntityDetectorConfiguration' => [ 'AllowedStatistics' => [ [ 'Statistics' => ['<string>', ...], ], // ... ], 'EntityTypes' => ['<string>', ...], ], 'ProfileColumns' => [ [ 'Name' => '<string>', 'Regex' => '<string>', ], // ... ], ], 'RecipeReference' => [ 'Name' => '<string>', 'RecipeVersion' => '<string>', ], 'RunId' => '<string>', 'StartedBy' => '<string>', 'StartedOn' => <DateTime>, 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT', 'ValidationConfigurations' => [ [ 'RulesetArn' => '<string>', 'ValidationMode' => 'CHECK_ALL', ], // ... ], ]
Result Details
Members
- Attempt
-
- Type: int
The number of times that DataBrew has attempted to run the job.
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the job completed processing.
- DataCatalogOutputs
-
- Type: Array of DataCatalogOutput structures
One or more artifacts that represent the Glue Data Catalog output from running the job.
- DatabaseOutputs
-
- Type: Array of DatabaseOutput structures
A list of JDBC database output objects that define the output destinations a DataBrew recipe job writes to.
- DatasetName
-
- Type: string
The name of the dataset for the job to process.
- ErrorMessage
-
- Type: string
A message indicating an error (if any) that was encountered when the job ran.
- ExecutionTime
-
- Type: int
The amount of time, in seconds, during which the job run consumed resources.
- JobName
-
- Required: Yes
- Type: string
The name of the job being processed during this run.
- JobSample
-
- Type: JobSample structure
Sample configuration for profile jobs only. Determines the number of rows on which the profile job will be executed. If a JobSample value is not provided, the default value will be used. The default value is CUSTOM_ROWS for the mode parameter and 20000 for the size parameter.
- LogGroupName
-
- Type: string
The name of an Amazon CloudWatch log group, where the job writes diagnostic messages when it runs.
- LogSubscription
-
- Type: string
The current status of Amazon CloudWatch logging for the job run.
- Outputs
-
- Type: Array of Output structures
One or more output artifacts from a job run.
- ProfileConfiguration
-
- Type: ProfileConfiguration structure
Configuration for profile jobs. Used to select columns, do evaluations, and override default parameters of evaluations. When configuration is null, the profile job will run with default settings.
- RecipeReference
-
- Type: RecipeReference structure
Represents the name and version of a DataBrew recipe.
- RunId
-
- Type: string
The unique identifier of the job run.
- StartedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who started the job run.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the job run began.
- State
-
- Type: string
The current state of the job run entity itself.
- ValidationConfigurations
-
- Type: Array of ValidationConfiguration structures
List of validation configurations that are applied to the profile job.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
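A common use of DescribeJobRun is polling until a run finishes. The sketch below shows one way to do that. The helper name waitForJobRun(), the polling interval, and the poll limit are all hypothetical. Treating STOPPED, SUCCEEDED, FAILED, and TIMEOUT as final states is an assumption based on the State values listed in the result syntax, not something this page states.

```php
<?php
// Sketch only: waitForJobRun() is a hypothetical helper, not part of the SDK.

/**
 * Polls DescribeJobRun until the run reaches a final state and returns that state.
 */
function waitForJobRun($client, string $jobName, string $runId, int $pollSeconds = 30, int $maxPolls = 120): string
{
    // Assumption: these four states are final; STARTING, RUNNING and STOPPING are not.
    $finalStates = ['STOPPED', 'SUCCEEDED', 'FAILED', 'TIMEOUT'];

    for ($poll = 0; $poll < $maxPolls; $poll++) {
        $run = $client->describeJobRun(['Name' => $jobName, 'RunId' => $runId]);
        if (in_array($run['State'], $finalStates, true)) {
            return $run['State'];
        }
        sleep($pollSeconds);
    }

    throw new RuntimeException("Job run {$runId} was still in progress after {$maxPolls} polls");
}
```

When the returned state is FAILED, the ErrorMessage member of the same result is the natural next thing to inspect.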
DescribeProject
$result = $client->describeProject([/* ... */]);
$promise = $client->describeProjectAsync([/* ... */]);
Returns the definition of a specific DataBrew project.
Parameter Syntax
$result = $client->describeProject([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the project to be described.
Result Syntax
[
    'CreateDate' => <DateTime>,
    'CreatedBy' => '<string>',
    'DatasetName' => '<string>',
    'LastModifiedBy' => '<string>',
    'LastModifiedDate' => <DateTime>,
    'Name' => '<string>',
    'OpenDate' => <DateTime>,
    'OpenedBy' => '<string>',
    'RecipeName' => '<string>',
    'ResourceArn' => '<string>',
    'RoleArn' => '<string>',
    'Sample' => [
        'Size' => <integer>,
        'Type' => 'FIRST_N|LAST_N|RANDOM',
    ],
    'SessionStatus' => 'ASSIGNED|FAILED|INITIALIZING|PROVISIONING|READY|RECYCLING|ROTATING|TERMINATED|TERMINATING|UPDATING',
    'Tags' => ['<string>', ...],
]
Result Details
Members
- CreateDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the project was created.
- CreatedBy
-
- Type: string
The identifier (user name) of the user who created the project.
- DatasetName
-
- Type: string
The dataset associated with the project.
- LastModifiedBy
-
- Type: string
The identifier (user name) of the user who last modified the project.
- LastModifiedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the project was last modified.
- Name
-
- Required: Yes
- Type: string
The name of the project.
- OpenDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the project was opened.
- OpenedBy
-
- Type: string
The identifier (user name) of the user that opened the project for use.
- RecipeName
-
- Type: string
The recipe associated with this project.
- ResourceArn
-
- Type: string
The Amazon Resource Name (ARN) of the project.
- RoleArn
-
- Type: string
The ARN of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.
- Sample
-
- Type: Sample structure
Represents the sample size and sampling type for DataBrew to use for interactive data analysis.
- SessionStatus
-
- Type: string
Describes the current state of the session:
  - PROVISIONING - allocating resources for the session.
  - INITIALIZING - getting the session ready for first use.
  - ASSIGNED - the session is ready for use.
- Tags
-
- Type: Associative array of custom string keys (TagKey) to strings
Metadata tags associated with this project.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
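The SessionStatus member can take more values than the three described in the result details. The sketch below shows one cautious way to report it. The helper name sessionStatusNote() and its wording are hypothetical. It attaches a meaning only to the three documented values and makes no claim about the others.

```php
<?php
// Sketch only: sessionStatusNote() is a hypothetical helper, not part of the SDK.

/**
 * Turns the SessionStatus of a DescribeProject result into a readable note.
 * $project may be the SDK result or a plain array with the same shape.
 */
function sessionStatusNote($project): string
{
    // Only these three values are described in the result details.
    $documented = [
        'PROVISIONING' => 'allocating resources for the session',
        'INITIALIZING' => 'getting the session ready for first use',
        'ASSIGNED'     => 'the session is ready for use',
    ];

    $status = $project['SessionStatus'] ?? null;
    if ($status === null) {
        return 'no session status reported';
    }

    return $status . ': ' . ($documented[$status] ?? 'meaning not described on this page');
}
```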
DescribeRecipe
$result = $client->describeRecipe([/* ... */]);
$promise = $client->describeRecipeAsync([/* ... */]);
Returns the definition of a specific DataBrew recipe corresponding to a particular version.
Parameter Syntax
$result = $client->describeRecipe([
    'Name' => '<string>', // REQUIRED
    'RecipeVersion' => '<string>',
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the recipe to be described.
- RecipeVersion
-
- Type: string
The recipe version identifier. If this parameter isn't specified, then the latest published version is returned.
Result Syntax
[
    'CreateDate' => <DateTime>,
    'CreatedBy' => '<string>',
    'Description' => '<string>',
    'LastModifiedBy' => '<string>',
    'LastModifiedDate' => <DateTime>,
    'Name' => '<string>',
    'ProjectName' => '<string>',
    'PublishedBy' => '<string>',
    'PublishedDate' => <DateTime>,
    'RecipeVersion' => '<string>',
    'ResourceArn' => '<string>',
    'Steps' => [
        [
            'Action' => [
                'Operation' => '<string>',
                'Parameters' => ['<string>', ...],
            ],
            'ConditionExpressions' => [
                [
                    'Condition' => '<string>',
                    'TargetColumn' => '<string>',
                    'Value' => '<string>',
                ],
                // ...
            ],
        ],
        // ...
    ],
    'Tags' => ['<string>', ...],
]
Result Details
Members
- CreateDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the recipe was created.
- CreatedBy
-
- Type: string
The identifier (user name) of the user who created the recipe.
- Description
-
- Type: string
The description of the recipe.
- LastModifiedBy
-
- Type: string
The identifier (user name) of the user who last modified the recipe.
- LastModifiedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the recipe was last modified.
- Name
-
- Required: Yes
- Type: string
The name of the recipe.
- ProjectName
-
- Type: string
The name of the project associated with this recipe.
- PublishedBy
-
- Type: string
The identifier (user name) of the user who last published the recipe.
- PublishedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the recipe was last published.
- RecipeVersion
-
- Type: string
The recipe version identifier.
- ResourceArn
-
- Type: string
The ARN of the recipe.
- Steps
-
- Type: Array of RecipeStep structures
One or more steps to be performed by the recipe. Each step consists of an action, and the conditions under which the action should succeed.
- Tags
-
- Type: Associative array of custom string keys (TagKey) to strings
Metadata tags associated with this recipe.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
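Since RecipeVersion is optional, a caller has to decide whether to send it at all. The sketch below shows one way to handle that and then list the operation of each step in order. The helper name recipeOperations() is hypothetical. It assumes each step has an Action with an Operation, as the result syntax suggests.

```php
<?php
// Sketch only: recipeOperations() is a hypothetical helper, not part of the SDK.

/**
 * Returns the Operation of every step in a recipe, in step order.
 * When $version is null, RecipeVersion is omitted and the latest published
 * version is described.
 */
function recipeOperations($client, string $name, ?string $version = null): array
{
    $params = ['Name' => $name];
    if ($version !== null) {
        $params['RecipeVersion'] = $version;
    }

    $recipe = $client->describeRecipe($params);

    $operations = [];
    foreach ($recipe['Steps'] ?? [] as $step) {
        $operations[] = $step['Action']['Operation'];
    }

    return $operations;
}
```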
DescribeRuleset
$result = $client->describeRuleset([/* ... */]);
$promise = $client->describeRulesetAsync([/* ... */]);
Retrieves detailed information about the ruleset.
Parameter Syntax
$result = $client->describeRuleset([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the ruleset to be described.
Result Syntax
[
    'CreateDate' => <DateTime>,
    'CreatedBy' => '<string>',
    'Description' => '<string>',
    'LastModifiedBy' => '<string>',
    'LastModifiedDate' => <DateTime>,
    'Name' => '<string>',
    'ResourceArn' => '<string>',
    'Rules' => [
        [
            'CheckExpression' => '<string>',
            'ColumnSelectors' => [
                [
                    'Name' => '<string>',
                    'Regex' => '<string>',
                ],
                // ...
            ],
            'Disabled' => true || false,
            'Name' => '<string>',
            'SubstitutionMap' => ['<string>', ...],
            'Threshold' => [
                'Type' => 'GREATER_THAN_OR_EQUAL|LESS_THAN_OR_EQUAL|GREATER_THAN|LESS_THAN',
                'Unit' => 'COUNT|PERCENTAGE',
                'Value' => <float>,
            ],
        ],
        // ...
    ],
    'Tags' => ['<string>', ...],
    'TargetArn' => '<string>',
]
Result Details
Members
- CreateDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the ruleset was created.
- CreatedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who created the ruleset.
- Description
-
- Type: string
The description of the ruleset.
- LastModifiedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who last modified the ruleset.
- LastModifiedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The modification date and time of the ruleset.
- Name
-
- Required: Yes
- Type: string
The name of the ruleset.
- ResourceArn
-
- Type: string
The Amazon Resource Name (ARN) for the ruleset.
- Rules
-
- Type: Array of Rule structures
A list of rules that are defined with the ruleset. A rule includes one or more checks to be validated on a DataBrew dataset.
- Tags
-
- Type: Associative array of custom string keys (TagKey) to strings
Metadata tags that have been applied to the ruleset.
- TargetArn
-
- Type: string
The Amazon Resource Name (ARN) of a resource (dataset) that the ruleset is associated with.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
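To make the Rules member easier to read, the sketch below renders each enabled rule as a single line. The helper name enabledRuleLines() and the output format are hypothetical. Mapping the threshold Type values to comparison symbols is an interpretation of their names, and skipping rules whose Disabled flag is true is a choice made for this sketch.

```php
<?php
// Sketch only: enabledRuleLines() is a hypothetical helper, not part of the SDK.

/**
 * Renders the enabled rules of a DescribeRuleset result as one line each.
 * $ruleset may be the SDK result or a plain array with the same shape.
 */
function enabledRuleLines($ruleset): array
{
    // Assumed reading of the Threshold Type values as comparison operators.
    $symbols = [
        'GREATER_THAN_OR_EQUAL' => '>=',
        'LESS_THAN_OR_EQUAL'    => '<=',
        'GREATER_THAN'          => '>',
        'LESS_THAN'             => '<',
    ];

    $lines = [];
    foreach ($ruleset['Rules'] ?? [] as $rule) {
        if ($rule['Disabled'] ?? false) {
            continue; // skip rules that are switched off
        }

        $line = $rule['Name'] . ': ' . $rule['CheckExpression'];
        if (isset($rule['Threshold'])) {
            $threshold = $rule['Threshold'];
            $type = $threshold['Type'] ?? '';
            $operator = $symbols[$type] ?? $type;
            $unit = $threshold['Unit'] ?? '';
            $line .= ' [' . trim($operator . ' ' . $threshold['Value'] . ' ' . $unit) . ']';
        }
        $lines[] = $line;
    }

    return $lines;
}
```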
DescribeSchedule
$result = $client->describeSchedule([/* ... */]);
$promise = $client->describeScheduleAsync([/* ... */]);
Returns the definition of a specific DataBrew schedule.
Parameter Syntax
$result = $client->describeSchedule([
    'Name' => '<string>', // REQUIRED
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the schedule to be described.
Result Syntax
[
    'CreateDate' => <DateTime>,
    'CreatedBy' => '<string>',
    'CronExpression' => '<string>',
    'JobNames' => ['<string>', ...],
    'LastModifiedBy' => '<string>',
    'LastModifiedDate' => <DateTime>,
    'Name' => '<string>',
    'ResourceArn' => '<string>',
    'Tags' => ['<string>', ...],
]
Result Details
Members
- CreateDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the schedule was created.
- CreatedBy
-
- Type: string
The identifier (user name) of the user who created the schedule.
- CronExpression
-
- Type: string
The date or dates and time or times when the jobs are to be run for the schedule. For more information, see Cron expressions in the Glue DataBrew Developer Guide.
- JobNames
-
- Type: Array of strings
The name or names of one or more jobs to be run by using the schedule.
- LastModifiedBy
-
- Type: string
The identifier (user name) of the user who last modified the schedule.
- LastModifiedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the schedule was last modified.
- Name
-
- Required: Yes
- Type: string
The name of the schedule.
- ResourceArn
-
- Type: string
The Amazon Resource Name (ARN) of the schedule.
- Tags
-
- Type: Associative array of custom string keys (TagKey) to strings
Metadata tags associated with this schedule.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
ListDatasets
$result = $client->listDatasets([/* ... */]);
$promise = $client->listDatasetsAsync([/* ... */]);
Lists all of the DataBrew datasets.
Parameter Syntax
$result = $client->listDatasets([
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results to return in this request.
- NextToken
-
- Type: string
The token returned by a previous call to retrieve the next set of results.
Result Syntax
[ 'Datasets' => [ [ 'AccountId' => '<string>', 'CreateDate' => <DateTime>, 'CreatedBy' => '<string>', 'Format' => 'CSV|JSON|PARQUET|EXCEL|ORC', 'FormatOptions' => [ 'Csv' => [ 'Delimiter' => '<string>', 'HeaderRow' => true || false, ], 'Excel' => [ 'HeaderRow' => true || false, 'SheetIndexes' => [<integer>, ...], 'SheetNames' => ['<string>', ...], ], 'Json' => [ 'MultiLine' => true || false, ], ], 'Input' => [ 'DataCatalogInputDefinition' => [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'TableName' => '<string>', 'TempDirectory' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'DatabaseInputDefinition' => [ 'DatabaseTableName' => '<string>', 'GlueConnectionName' => '<string>', 'QueryString' => '<string>', 'TempDirectory' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'Metadata' => [ 'SourceArn' => '<string>', ], 'S3InputDefinition' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'LastModifiedBy' => '<string>', 'LastModifiedDate' => <DateTime>, 'Name' => '<string>', 'PathOptions' => [ 'FilesLimit' => [ 'MaxFiles' => <integer>, 'Order' => 'DESCENDING|ASCENDING', 'OrderedBy' => 'LAST_MODIFIED_DATE', ], 'LastModifiedDateCondition' => [ 'Expression' => '<string>', 'ValuesMap' => ['<string>', ...], ], 'Parameters' => [ '<PathParameterName>' => [ 'CreateColumn' => true || false, 'DatetimeOptions' => [ 'Format' => '<string>', 'LocaleCode' => '<string>', 'TimezoneOffset' => '<string>', ], 'Filter' => [ 'Expression' => '<string>', 'ValuesMap' => ['<string>', ...], ], 'Name' => '<string>', 'Type' => 'Datetime|Number|String', ], // ... ], ], 'ResourceArn' => '<string>', 'Source' => 'S3|DATA-CATALOG|DATABASE', 'Tags' => ['<string>', ...], ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- Datasets
-
- Required: Yes
- Type: Array of Dataset structures
A list of datasets that are defined.
- NextToken
-
- Type: string
A token that you can use in a subsequent call to retrieve the next set of results.
Errors
- ValidationException:
The input parameters for this request failed validation.
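Because results are returned one page at a time, retrieving every dataset means passing each NextToken back until none is returned. The sketch below shows one way to do that. The generator name allDatasets() and the page size of 50 are hypothetical choices. The SDK's own paginator support may be a better fit in real code; this version only illustrates the token handshake described above.

```php
<?php
// Sketch only: allDatasets() is a hypothetical helper, not part of the SDK.

/**
 * Yields every dataset, requesting further pages while a NextToken is returned.
 */
function allDatasets($client, int $pageSize = 50): Generator
{
    $params = ['MaxResults' => $pageSize];

    do {
        $page = $client->listDatasets($params);
        foreach ($page['Datasets'] as $dataset) {
            yield $dataset;
        }
        // A missing NextToken means this was the last page.
        $token = $page['NextToken'] ?? null;
        $params['NextToken'] = $token;
    } while ($token !== null);
}
```

A caller can then write foreach (allDatasets($client) as $dataset) { ... } and stop early without fetching the remaining pages.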
ListJobRuns
$result = $client->listJobRuns([/* ... */]);
$promise = $client->listJobRunsAsync([/* ... */]);
Lists all of the previous runs of a particular DataBrew job.
Parameter Syntax
$result = $client->listJobRuns([
    'MaxResults' => <integer>,
    'Name' => '<string>', // REQUIRED
    'NextToken' => '<string>',
]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results to return in this request.
- Name
-
- Required: Yes
- Type: string
The name of the job.
- NextToken
-
- Type: string
The token returned by a previous call to retrieve the next set of results.
Result Syntax
[ 'JobRuns' => [ [ 'Attempt' => <integer>, 'CompletedOn' => <DateTime>, 'DataCatalogOutputs' => [ [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'DatabaseOptions' => [ 'TableName' => '<string>', 'TempDirectory' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'Overwrite' => true || false, 'S3Options' => [ 'Location' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'TableName' => '<string>', ], // ... ], 'DatabaseOutputs' => [ [ 'DatabaseOptions' => [ 'TableName' => '<string>', 'TempDirectory' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'DatabaseOutputMode' => 'NEW_TABLE', 'GlueConnectionName' => '<string>', ], // ... ], 'DatasetName' => '<string>', 'ErrorMessage' => '<string>', 'ExecutionTime' => <integer>, 'JobName' => '<string>', 'JobSample' => [ 'Mode' => 'FULL_DATASET|CUSTOM_ROWS', 'Size' => <integer>, ], 'LogGroupName' => '<string>', 'LogSubscription' => 'ENABLE|DISABLE', 'Outputs' => [ [ 'CompressionFormat' => 'GZIP|LZ4|SNAPPY|BZIP2|DEFLATE|LZO|BROTLI|ZSTD|ZLIB', 'Format' => 'CSV|JSON|PARQUET|GLUEPARQUET|AVRO|ORC|XML|TABLEAUHYPER', 'FormatOptions' => [ 'Csv' => [ 'Delimiter' => '<string>', ], ], 'Location' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], 'MaxOutputFiles' => <integer>, 'Overwrite' => true || false, 'PartitionColumns' => ['<string>', ...], ], // ... ], 'RecipeReference' => [ 'Name' => '<string>', 'RecipeVersion' => '<string>', ], 'RunId' => '<string>', 'StartedBy' => '<string>', 'StartedOn' => <DateTime>, 'State' => 'STARTING|RUNNING|STOPPING|STOPPED|SUCCEEDED|FAILED|TIMEOUT', 'ValidationConfigurations' => [ [ 'RulesetArn' => '<string>', 'ValidationMode' => 'CHECK_ALL', ], // ... ], ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- JobRuns
-
- Required: Yes
- Type: Array of JobRun structures
A list of job runs that have occurred for the specified job.
- NextToken
-
- Type: string
A token that you can use in a subsequent call to retrieve the next set of results.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
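One practical use of ListJobRuns is finding the runs of a job that failed. The sketch below collects each failed run and its error message. The helper name failedRunMessages() and the placeholder text for a missing ErrorMessage are hypothetical. It makes no assumption about the order in which runs are returned.

```php
<?php
// Sketch only: failedRunMessages() is a hypothetical helper, not part of the SDK.

/**
 * Returns a map of RunId => ErrorMessage for every run of the job in state FAILED.
 */
function failedRunMessages($client, string $jobName): array
{
    $failed = [];
    $params = ['Name' => $jobName];

    do {
        $page = $client->listJobRuns($params);
        foreach ($page['JobRuns'] as $run) {
            if (($run['State'] ?? null) === 'FAILED') {
                $failed[$run['RunId']] = $run['ErrorMessage'] ?? '(no error message)';
            }
        }
        $token = $page['NextToken'] ?? null;
        $params['NextToken'] = $token;
    } while ($token !== null);

    return $failed;
}
```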
ListJobs
$result = $client->listJobs([/* ... */]);
$promise = $client->listJobsAsync([/* ... */]);
Lists all of the DataBrew jobs that are defined.
Parameter Syntax
$result = $client->listJobs([
    'DatasetName' => '<string>',
    'MaxResults' => <integer>,
    'NextToken' => '<string>',
    'ProjectName' => '<string>',
]);
Parameter Details
Members
- DatasetName
-
- Type: string
The name of a dataset. When specified, only jobs that act on this dataset are returned.
- MaxResults
-
- Type: int
The maximum number of results to return in this request.
- NextToken
-
- Type: string
A token generated by DataBrew that specifies where to continue pagination if a previous request was truncated. To get the next set of pages, pass in the NextToken value from the response object of the previous page call.
- ProjectName
-
- Type: string
The name of a project. When specified, only jobs that are associated with this project are returned.
Result Syntax
[ 'Jobs' => [ [ 'AccountId' => '<string>', 'CreateDate' => <DateTime>, 'CreatedBy' => '<string>', 'DataCatalogOutputs' => [ [ 'CatalogId' => '<string>', 'DatabaseName' => '<string>', 'DatabaseOptions' => [ 'TableName' => '<string>', 'TempDirectory' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'Overwrite' => true || false, 'S3Options' => [ 'Location' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'TableName' => '<string>', ], // ... ], 'DatabaseOutputs' => [ [ 'DatabaseOptions' => [ 'TableName' => '<string>', 'TempDirectory' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], ], 'DatabaseOutputMode' => 'NEW_TABLE', 'GlueConnectionName' => '<string>', ], // ... ], 'DatasetName' => '<string>', 'EncryptionKeyArn' => '<string>', 'EncryptionMode' => 'SSE-KMS|SSE-S3', 'JobSample' => [ 'Mode' => 'FULL_DATASET|CUSTOM_ROWS', 'Size' => <integer>, ], 'LastModifiedBy' => '<string>', 'LastModifiedDate' => <DateTime>, 'LogSubscription' => 'ENABLE|DISABLE', 'MaxCapacity' => <integer>, 'MaxRetries' => <integer>, 'Name' => '<string>', 'Outputs' => [ [ 'CompressionFormat' => 'GZIP|LZ4|SNAPPY|BZIP2|DEFLATE|LZO|BROTLI|ZSTD|ZLIB', 'Format' => 'CSV|JSON|PARQUET|GLUEPARQUET|AVRO|ORC|XML|TABLEAUHYPER', 'FormatOptions' => [ 'Csv' => [ 'Delimiter' => '<string>', ], ], 'Location' => [ 'Bucket' => '<string>', 'BucketOwner' => '<string>', 'Key' => '<string>', ], 'MaxOutputFiles' => <integer>, 'Overwrite' => true || false, 'PartitionColumns' => ['<string>', ...], ], // ... ], 'ProjectName' => '<string>', 'RecipeReference' => [ 'Name' => '<string>', 'RecipeVersion' => '<string>', ], 'ResourceArn' => '<string>', 'RoleArn' => '<string>', 'Tags' => ['<string>', ...], 'Timeout' => <integer>, 'Type' => 'PROFILE|RECIPE', 'ValidationConfigurations' => [ [ 'RulesetArn' => '<string>', 'ValidationMode' => 'CHECK_ALL', ], // ... ], ], // ... ], 'NextToken' => '<string>', ]
Result Details
Members
- Jobs
-
- Required: Yes
- Type: Array of Job structures
A list of jobs that are defined.
- NextToken
-
- Type: string
A token that you can use in a subsequent call to retrieve the next set of results.
Errors
- ValidationException:
The input parameters for this request failed validation.
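DatasetName and ProjectName are both optional filters, so a request should carry only the ones the caller actually supplies. The sketch below fetches a single page that way. The helper name listJobNames() and the shape of its return value are hypothetical. Dropping unset filters from the request, rather than sending empty strings, is a choice made for this sketch.

```php
<?php
// Sketch only: listJobNames() is a hypothetical helper, not part of the SDK.

/**
 * Fetches one page of job names, optionally filtered by dataset and/or project.
 * Pass the returned 'nextToken' back in to fetch the following page.
 */
function listJobNames($client, ?string $datasetName = null, ?string $projectName = null, ?string $nextToken = null): array
{
    // Keep only the parameters that were actually supplied.
    $params = array_filter([
        'DatasetName' => $datasetName,
        'ProjectName' => $projectName,
        'NextToken'   => $nextToken,
    ], function ($value) {
        return $value !== null;
    });

    $page = $client->listJobs($params);

    return [
        'names'     => array_column($page['Jobs'], 'Name'),
        'nextToken' => $page['NextToken'] ?? null,
    ];
}
```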
ListProjects
$result = $client->listProjects([/* ... */]);
$promise = $client->listProjectsAsync([/* ... */]);
Lists all of the DataBrew projects that are defined.
Parameter Syntax
$result = $client->listProjects([ 'MaxResults' => <integer>, 'NextToken' => '<string>', ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results to return in this request.
- NextToken
-
- Type: string
The token returned by a previous call to retrieve the next set of results.
Result Syntax
[
    'NextToken' => '<string>',
    'Projects' => [
        [
            'AccountId' => '<string>',
            'CreateDate' => <DateTime>,
            'CreatedBy' => '<string>',
            'DatasetName' => '<string>',
            'LastModifiedBy' => '<string>',
            'LastModifiedDate' => <DateTime>,
            'Name' => '<string>',
            'OpenDate' => <DateTime>,
            'OpenedBy' => '<string>',
            'RecipeName' => '<string>',
            'ResourceArn' => '<string>',
            'RoleArn' => '<string>',
            'Sample' => [
                'Size' => <integer>,
                'Type' => 'FIRST_N|LAST_N|RANDOM',
            ],
            'Tags' => ['<string>', ...],
        ],
        // ...
    ],
]
Result Details
Members
- NextToken
-
- Type: string
A token that you can use in a subsequent call to retrieve the next set of results.
- Projects
-
- Required: Yes
- Type: Array of Project structures
A list of projects that are defined.
Errors
- ValidationException:
The input parameters for this request failed validation.
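The NextToken handshake described above can be sketched as a loop that feeds each response's token into the next request. This is a minimal sketch: the helper name listAllProjectNames and the default page size are illustrative assumptions, and $client is assumed to be an Aws\GlueDataBrew\GlueDataBrewClient (or any object exposing listProjects() with the documented shapes).

```php
<?php
/**
 * Collect every project name by following NextToken until it is absent.
 * Hypothetical helper, not part of the SDK.
 */
function listAllProjectNames($client, int $pageSize = 50): array
{
    $names = [];
    $params = ['MaxResults' => $pageSize];

    do {
        $result = $client->listProjects($params);

        // Projects is a required member of the result.
        foreach ($result['Projects'] as $project) {
            $names[] = $project['Name'];
        }

        // NextToken is optional; its absence means this was the last page.
        $token = $result['NextToken'] ?? null;
        $params['NextToken'] = $token;
    } while ($token !== null && $token !== '');

    return $names;
}

// Usage (assumes the AWS SDK for PHP is installed and credentials are configured):
// $client = new Aws\GlueDataBrew\GlueDataBrewClient(['region' => 'us-east-1', 'version' => '2017-07-25']);
// print_r(listAllProjectNames($client));
```

The same loop shape applies to the other List* operations in this section, since they share the MaxResults and NextToken members.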
ListRecipeVersions
$result = $client->listRecipeVersions([/* ... */]);
$promise = $client->listRecipeVersionsAsync([/* ... */]);
Lists the versions of a particular DataBrew recipe, except for LATEST_WORKING.
Parameter Syntax
$result = $client->listRecipeVersions([ 'MaxResults' => <integer>, 'Name' => '<string>', // REQUIRED 'NextToken' => '<string>', ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results to return in this request.
- Name
-
- Required: Yes
- Type: string
The name of the recipe for which to return version information.
- NextToken
-
- Type: string
The token returned by a previous call to retrieve the next set of results.
Result Syntax
[
    'NextToken' => '<string>',
    'Recipes' => [
        [
            'CreateDate' => <DateTime>,
            'CreatedBy' => '<string>',
            'Description' => '<string>',
            'LastModifiedBy' => '<string>',
            'LastModifiedDate' => <DateTime>,
            'Name' => '<string>',
            'ProjectName' => '<string>',
            'PublishedBy' => '<string>',
            'PublishedDate' => <DateTime>,
            'RecipeVersion' => '<string>',
            'ResourceArn' => '<string>',
            'Steps' => [
                [
                    'Action' => [
                        'Operation' => '<string>',
                        'Parameters' => ['<string>', ...],
                    ],
                    'ConditionExpressions' => [
                        [
                            'Condition' => '<string>',
                            'TargetColumn' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                ],
                // ...
            ],
            'Tags' => ['<string>', ...],
        ],
        // ...
    ],
]
Result Details
Members
- NextToken
-
- Type: string
A token that you can use in a subsequent call to retrieve the next set of results.
- Recipes
-
- Required: Yes
- Type: Array of Recipe structures
A list of versions for the specified recipe.
Errors
- ValidationException:
The input parameters for this request failed validation.
ListRecipes
$result = $client->listRecipes([/* ... */]);
$promise = $client->listRecipesAsync([/* ... */]);
Lists all of the DataBrew recipes that are defined.
Parameter Syntax
$result = $client->listRecipes([ 'MaxResults' => <integer>, 'NextToken' => '<string>', 'RecipeVersion' => '<string>', ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results to return in this request.
- NextToken
-
- Type: string
The token returned by a previous call to retrieve the next set of results.
- RecipeVersion
-
- Type: string
Return only those recipes with a version identifier of LATEST_WORKING or LATEST_PUBLISHED. If RecipeVersion is omitted, ListRecipes returns all of the LATEST_PUBLISHED recipe versions.
Valid values: LATEST_WORKING | LATEST_PUBLISHED
Result Syntax
[
    'NextToken' => '<string>',
    'Recipes' => [
        [
            'CreateDate' => <DateTime>,
            'CreatedBy' => '<string>',
            'Description' => '<string>',
            'LastModifiedBy' => '<string>',
            'LastModifiedDate' => <DateTime>,
            'Name' => '<string>',
            'ProjectName' => '<string>',
            'PublishedBy' => '<string>',
            'PublishedDate' => <DateTime>,
            'RecipeVersion' => '<string>',
            'ResourceArn' => '<string>',
            'Steps' => [
                [
                    'Action' => [
                        'Operation' => '<string>',
                        'Parameters' => ['<string>', ...],
                    ],
                    'ConditionExpressions' => [
                        [
                            'Condition' => '<string>',
                            'TargetColumn' => '<string>',
                            'Value' => '<string>',
                        ],
                        // ...
                    ],
                ],
                // ...
            ],
            'Tags' => ['<string>', ...],
        ],
        // ...
    ],
]
Result Details
Members
- NextToken
-
- Type: string
A token that you can use in a subsequent call to retrieve the next set of results.
- Recipes
-
- Required: Yes
- Type: Array of Recipe structures
A list of recipes that are defined.
Errors
- ValidationException:
The input parameters for this request failed validation.
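As an illustration of the RecipeVersion filter, the sketch below builds a ListRecipes request and rejects any value other than the two documented ones. The helper name buildListRecipesParams is a hypothetical convenience, not an SDK function; the client-side validation is optional, since the service is expected to reject invalid input with a ValidationException anyway.

```php
<?php
/**
 * Build the request array for listRecipes().
 * Hypothetical helper: RecipeVersion is left out entirely when null, in which
 * case the service returns the LATEST_PUBLISHED versions (per the parameter
 * description above).
 */
function buildListRecipesParams(?string $recipeVersion = null, ?int $maxResults = null): array
{
    $allowed = ['LATEST_WORKING', 'LATEST_PUBLISHED'];
    $params = [];

    if ($recipeVersion !== null) {
        if (!in_array($recipeVersion, $allowed, true)) {
            throw new InvalidArgumentException(
                'RecipeVersion must be one of: ' . implode(', ', $allowed)
            );
        }
        $params['RecipeVersion'] = $recipeVersion;
    }
    if ($maxResults !== null) {
        $params['MaxResults'] = $maxResults;
    }

    return $params;
}

// Usage, assuming $client is a GlueDataBrewClient:
// $result = $client->listRecipes(buildListRecipesParams('LATEST_WORKING', 25));
```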
ListRulesets
$result = $client->listRulesets([/* ... */]);
$promise = $client->listRulesetsAsync([/* ... */]);
Lists all rulesets available in the current account, or only the rulesets associated with a specific resource (dataset).
Parameter Syntax
$result = $client->listRulesets([ 'MaxResults' => <integer>, 'NextToken' => '<string>', 'TargetArn' => '<string>', ]);
Parameter Details
Members
- MaxResults
-
- Type: int
The maximum number of results to return in this request.
- NextToken
-
- Type: string
A token generated by DataBrew that specifies where to continue pagination if a previous request was truncated. To get the next set of pages, pass in the NextToken value from the response object of the previous page call.
- TargetArn
-
- Type: string
The Amazon Resource Name (ARN) of a resource (dataset). When this parameter is specified, only the rulesets associated with that resource are returned.
Result Syntax
[
    'NextToken' => '<string>',
    'Rulesets' => [
        [
            'AccountId' => '<string>',
            'CreateDate' => <DateTime>,
            'CreatedBy' => '<string>',
            'Description' => '<string>',
            'LastModifiedBy' => '<string>',
            'LastModifiedDate' => <DateTime>,
            'Name' => '<string>',
            'ResourceArn' => '<string>',
            'RuleCount' => <integer>,
            'Tags' => ['<string>', ...],
            'TargetArn' => '<string>',
        ],
        // ...
    ],
]
Result Details
Members
- NextToken
-
- Type: string
A token that you can use in a subsequent call to retrieve the next set of results.
- Rulesets
-
- Required: Yes
- Type: Array of RulesetItem structures
A list of RulesetItem structures. Each RulesetItem contains the metadata of a ruleset.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
ListSchedules
$result = $client->listSchedules([/* ... */]);
$promise = $client->listSchedulesAsync([/* ... */]);
Lists the DataBrew schedules that are defined.
Parameter Syntax
$result = $client->listSchedules([ 'JobName' => '<string>', 'MaxResults' => <integer>, 'NextToken' => '<string>', ]);
Parameter Details
Members
- JobName
-
- Type: string
The name of the job that these schedules apply to.
- MaxResults
-
- Type: int
The maximum number of results to return in this request.
- NextToken
-
- Type: string
The token returned by a previous call to retrieve the next set of results.
Result Syntax
[
    'NextToken' => '<string>',
    'Schedules' => [
        [
            'AccountId' => '<string>',
            'CreateDate' => <DateTime>,
            'CreatedBy' => '<string>',
            'CronExpression' => '<string>',
            'JobNames' => ['<string>', ...],
            'LastModifiedBy' => '<string>',
            'LastModifiedDate' => <DateTime>,
            'Name' => '<string>',
            'ResourceArn' => '<string>',
            'Tags' => ['<string>', ...],
        ],
        // ...
    ],
]
Result Details
Members
- NextToken
-
- Type: string
A token that you can use in a subsequent call to retrieve the next set of results.
- Schedules
-
- Required: Yes
- Type: Array of Schedule structures
A list of schedules that are defined.
Errors
- ValidationException:
The input parameters for this request failed validation.
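The Async variant returns a promise, so several lookups can be started before any of them is waited on. The sketch below assumes $client is a GlueDataBrewClient and that the returned promise exposes wait(), as the promises used by the AWS SDK for PHP do; the helper name firstPageOfSchedulesByJob is hypothetical, and only the first page of each result is read.

```php
<?php
/**
 * Request the first page of schedules for several jobs, starting all requests
 * before waiting on any of them so the HTTP handler can overlap them where it can.
 * Hypothetical helper. Returns [jobName => list of schedule names].
 * Later pages (NextToken) are not followed.
 */
function firstPageOfSchedulesByJob($client, array $jobNames): array
{
    $promises = [];
    foreach ($jobNames as $jobName) {
        $promises[$jobName] = $client->listSchedulesAsync(['JobName' => $jobName]);
    }

    $byJob = [];
    foreach ($promises as $jobName => $promise) {
        $result = $promise->wait(); // blocks until this request settles
        $byJob[$jobName] = array_column($result['Schedules'], 'Name');
    }

    return $byJob;
}

// Usage, assuming $client is a GlueDataBrewClient and the job names exist:
// print_r(firstPageOfSchedulesByJob($client, ['job-a', 'job-b']));
```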
ListTagsForResource
$result = $client->listTagsForResource([/* ... */]);
$promise = $client->listTagsForResourceAsync([/* ... */]);
Lists all the tags for a DataBrew resource.
Parameter Syntax
$result = $client->listTagsForResource([ 'ResourceArn' => '<string>', // REQUIRED ]);
Parameter Details
Members
- ResourceArn
-
- Required: Yes
- Type: string
The Amazon Resource Name (ARN) string that uniquely identifies the DataBrew resource.
Result Syntax
[ 'Tags' => ['<string>', ...], ]
Result Details
Members
- Tags
-
- Type: Associative array of custom string keys (TagKey) to strings
A list of tags associated with the DataBrew resource.
Errors
- InternalServerException:
An internal service failure occurred.
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
PublishRecipe
$result = $client->publishRecipe([/* ... */]);
$promise = $client->publishRecipeAsync([/* ... */]);
Publishes a new version of a DataBrew recipe.
Parameter Syntax
$result = $client->publishRecipe([ 'Description' => '<string>', 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Description
-
- Type: string
A description of the recipe to be published, for this version of the recipe.
- Name
-
- Required: Yes
- Type: string
The name of the recipe to be published.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the recipe that you published.
Errors
- ValidationException:
The input parameters for this request failed validation.
- ResourceNotFoundException:
One or more resources can't be found.
- ServiceQuotaExceededException:
A service quota is exceeded.
SendProjectSessionAction
$result = $client->sendProjectSessionAction([/* ... */]);
$promise = $client->sendProjectSessionActionAsync([/* ... */]);
Performs a recipe step within an interactive DataBrew session that's currently open.
Parameter Syntax
$result = $client->sendProjectSessionAction([
    'ClientSessionId' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Preview' => true || false,
    'RecipeStep' => [
        'Action' => [ // REQUIRED
            'Operation' => '<string>', // REQUIRED
            'Parameters' => ['<string>', ...],
        ],
        'ConditionExpressions' => [
            [
                'Condition' => '<string>', // REQUIRED
                'TargetColumn' => '<string>', // REQUIRED
                'Value' => '<string>',
            ],
            // ...
        ],
    ],
    'StepIndex' => <integer>,
    'ViewFrame' => [
        'Analytics' => 'ENABLE|DISABLE',
        'ColumnRange' => <integer>,
        'HiddenColumns' => ['<string>', ...],
        'RowRange' => <integer>,
        'StartColumnIndex' => <integer>, // REQUIRED
        'StartRowIndex' => <integer>,
    ],
]);
Parameter Details
Members
- ClientSessionId
-
- Type: string
A unique identifier for an interactive session that's currently open and ready for work. The action will be performed on this session.
- Name
-
- Required: Yes
- Type: string
The name of the project to apply the action to.
- Preview
-
- Type: boolean
If true, the result of the recipe step will be returned, but not applied.
- RecipeStep
-
- Type: RecipeStep structure
Represents a single step from a DataBrew recipe to be performed.
- StepIndex
-
- Type: int
The index from which to preview a step. This index is used to preview the result of steps that have already been applied, so that the resulting view frame is from earlier in the view frame stack.
- ViewFrame
-
- Type: ViewFrame structure
Represents the data being transformed during an action.
Result Syntax
[ 'ActionId' => <integer>, 'Name' => '<string>', 'Result' => '<string>', ]
Result Details
Members
- ActionId
-
- Type: int
A unique identifier for the action that was performed.
- Name
-
- Required: Yes
- Type: string
The name of the project that was affected by the action.
- Result
-
- Type: string
A message indicating the result of performing the action.
Errors
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
StartJobRun
$result = $client->startJobRun([/* ... */]);
$promise = $client->startJobRunAsync([/* ... */]);
Runs a DataBrew job.
Parameter Syntax
$result = $client->startJobRun([ 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the job to be run.
Result Syntax
[ 'RunId' => '<string>', ]
Result Details
Members
- RunId
-
- Required: Yes
- Type: string
A system-generated identifier for this particular job run.
Errors
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ResourceNotFoundException:
One or more resources can't be found.
- ServiceQuotaExceededException:
A service quota is exceeded.
- ValidationException:
The input parameters for this request failed validation.
StartProjectSession
$result = $client->startProjectSession([/* ... */]);
$promise = $client->startProjectSessionAsync([/* ... */]);
Creates an interactive session, enabling you to manipulate data in a DataBrew project.
Parameter Syntax
$result = $client->startProjectSession([ 'AssumeControl' => true || false, 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- AssumeControl
-
- Type: boolean
A value that, if true, enables you to take control of a session, even if a different client is currently accessing the project.
- Name
-
- Required: Yes
- Type: string
The name of the project to act upon.
Result Syntax
[ 'ClientSessionId' => '<string>', 'Name' => '<string>', ]
Result Details
Members
- ClientSessionId
-
- Type: string
A system-generated identifier for the session.
- Name
-
- Required: Yes
- Type: string
The name of the project to be acted upon.
Errors
- ConflictException:
Updating or deleting a resource can cause an inconsistent state.
- ResourceNotFoundException:
One or more resources can't be found.
- ServiceQuotaExceededException:
A service quota is exceeded.
- ValidationException:
The input parameters for this request failed validation.
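StartProjectSession and SendProjectSessionAction are typically used together: the ClientSessionId returned by the first is passed to the second. The following is a minimal sketch under stated assumptions: previewRecipeStep is a hypothetical helper, $client is assumed to be a GlueDataBrewClient, and the sketch does not wait for the session to become ready, which a real caller may need to do.

```php
<?php
/**
 * Open (or take over) an interactive session and preview one recipe step.
 * Hypothetical helper. With Preview => true the step's result is returned
 * but not applied, per the Preview parameter description.
 */
function previewRecipeStep($client, string $projectName, array $recipeStep, bool $assumeControl = false): array
{
    $session = $client->startProjectSession([
        'Name' => $projectName,
        'AssumeControl' => $assumeControl,
    ]);

    $request = [
        'Name' => $projectName,
        'Preview' => true,
        'RecipeStep' => $recipeStep,
    ];
    // ClientSessionId is not marked required in the result, so pass it only when present.
    if (!empty($session['ClientSessionId'])) {
        $request['ClientSessionId'] = $session['ClientSessionId'];
    }

    $action = $client->sendProjectSessionAction($request);

    return [
        'ActionId' => $action['ActionId'] ?? null,
        'Result' => $action['Result'] ?? null,
    ];
}

// Usage, assuming $client is a GlueDataBrewClient; the step values are placeholders:
// $step = ['Action' => ['Operation' => '<string>', 'Parameters' => [/* ... */]]];
// print_r(previewRecipeStep($client, 'my-project', $step));
```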
StopJobRun
$result = $client->stopJobRun([/* ... */]);
$promise = $client->stopJobRunAsync([/* ... */]);
Stops a particular run of a job.
Parameter Syntax
$result = $client->stopJobRun([ 'Name' => '<string>', // REQUIRED 'RunId' => '<string>', // REQUIRED ]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the job to be stopped.
- RunId
-
- Required: Yes
- Type: string
The ID of the job run to be stopped.
Result Syntax
[ 'RunId' => '<string>', ]
Result Details
Members
- RunId
-
- Required: Yes
- Type: string
The ID of the job run that you stopped.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
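StopJobRun needs both the job name and the RunId that StartJobRun returned. One way to keep the two together is sketched below; startJobRunWithStopper is a hypothetical helper and $client is assumed to be a GlueDataBrewClient.

```php
<?php
/**
 * Start a job and return its RunId together with a closure that stops
 * exactly that run. Hypothetical helper, not part of the SDK.
 */
function startJobRunWithStopper($client, string $jobName): array
{
    $started = $client->startJobRun(['Name' => $jobName]);
    $runId = $started['RunId']; // required member of the StartJobRun result

    $stop = function () use ($client, $jobName, $runId): string {
        $stopped = $client->stopJobRun(['Name' => $jobName, 'RunId' => $runId]);
        return $stopped['RunId'];
    };

    return [$runId, $stop];
}

// Usage, assuming $client is a GlueDataBrewClient and the job exists:
// [$runId, $stop] = startJobRunWithStopper($client, 'my-job');
// $stop(); // stops the run identified by $runId
```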
TagResource
$result = $client->tagResource([/* ... */]);
$promise = $client->tagResourceAsync([/* ... */]);
Adds metadata tags to a DataBrew resource, such as a dataset, project, recipe, job, or schedule.
Parameter Syntax
$result = $client->tagResource([ 'ResourceArn' => '<string>', // REQUIRED 'Tags' => ['<string>', ...], // REQUIRED ]);
Parameter Details
Members
- ResourceArn
-
- Required: Yes
- Type: string
The DataBrew resource to which tags should be added. The value for this parameter is an Amazon Resource Name (ARN). For DataBrew, you can tag a dataset, a job, a project, or a recipe.
- Tags
-
- Required: Yes
- Type: Associative array of custom string keys (TagKey) to strings
One or more tags to be assigned to the resource.
Result Syntax
[]
Result Details
Errors
- InternalServerException:
An internal service failure occurred.
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
UntagResource
$result = $client->untagResource([/* ... */]);
$promise = $client->untagResourceAsync([/* ... */]);
Removes metadata tags from a DataBrew resource.
Parameter Syntax
$result = $client->untagResource([ 'ResourceArn' => '<string>', // REQUIRED 'TagKeys' => ['<string>', ...], // REQUIRED ]);
Parameter Details
Members
- ResourceArn
-
- Required: Yes
- Type: string
A DataBrew resource from which you want to remove a tag or tags. The value for this parameter is an Amazon Resource Name (ARN).
- TagKeys
-
- Required: Yes
- Type: Array of strings
The tag keys (names) of one or more tags to be removed.
Result Syntax
[]
Result Details
Errors
- InternalServerException:
An internal service failure occurred.
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
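The three tagging operations differ in shape: ListTagsForResource and TagResource use an associative array of key to value, while UntagResource takes a plain list of keys. The sketch below combines them to bring a resource's tags in line with a desired set. The helper name syncTags is hypothetical, $client is assumed to be a GlueDataBrewClient, and the sketch assumes that TagResource overwrites the value of a key that already exists.

```php
<?php
/**
 * Make a resource's tags match $desiredTags (key => value).
 * Hypothetical helper. Returns the list of keys that were removed.
 */
function syncTags($client, string $resourceArn, array $desiredTags): array
{
    $current = $client->listTagsForResource(['ResourceArn' => $resourceArn])['Tags'] ?? [];

    // Keys present now but absent from the desired set must be removed.
    // strval() undoes PHP's conversion of numeric-looking array keys to integers.
    $staleKeys = array_values(array_map(
        'strval',
        array_diff(array_keys($current), array_keys($desiredTags))
    ));

    if ($staleKeys !== []) {
        $client->untagResource(['ResourceArn' => $resourceArn, 'TagKeys' => $staleKeys]);
    }
    if ($desiredTags !== []) {
        $client->tagResource(['ResourceArn' => $resourceArn, 'Tags' => $desiredTags]);
    }

    return $staleKeys;
}

// Usage, assuming $client is a GlueDataBrewClient and $arn identifies a DataBrew resource:
// syncTags($client, $arn, ['env' => 'prod', 'team' => 'data']);
```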
UpdateDataset
$result = $client->updateDataset([/* ... */]);
$promise = $client->updateDatasetAsync([/* ... */]);
Modifies the definition of an existing DataBrew dataset.
Parameter Syntax
$result = $client->updateDataset([
    'Format' => 'CSV|JSON|PARQUET|EXCEL|ORC',
    'FormatOptions' => [
        'Csv' => [
            'Delimiter' => '<string>',
            'HeaderRow' => true || false,
        ],
        'Excel' => [
            'HeaderRow' => true || false,
            'SheetIndexes' => [<integer>, ...],
            'SheetNames' => ['<string>', ...],
        ],
        'Json' => [
            'MultiLine' => true || false,
        ],
    ],
    'Input' => [ // REQUIRED
        'DataCatalogInputDefinition' => [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'TableName' => '<string>', // REQUIRED
            'TempDirectory' => [
                'Bucket' => '<string>', // REQUIRED
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
        ],
        'DatabaseInputDefinition' => [
            'DatabaseTableName' => '<string>',
            'GlueConnectionName' => '<string>', // REQUIRED
            'QueryString' => '<string>',
            'TempDirectory' => [
                'Bucket' => '<string>', // REQUIRED
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
        ],
        'Metadata' => [
            'SourceArn' => '<string>',
        ],
        'S3InputDefinition' => [
            'Bucket' => '<string>', // REQUIRED
            'BucketOwner' => '<string>',
            'Key' => '<string>',
        ],
    ],
    'Name' => '<string>', // REQUIRED
    'PathOptions' => [
        'FilesLimit' => [
            'MaxFiles' => <integer>, // REQUIRED
            'Order' => 'DESCENDING|ASCENDING',
            'OrderedBy' => 'LAST_MODIFIED_DATE',
        ],
        'LastModifiedDateCondition' => [
            'Expression' => '<string>', // REQUIRED
            'ValuesMap' => ['<string>', ...], // REQUIRED
        ],
        'Parameters' => [
            '<PathParameterName>' => [
                'CreateColumn' => true || false,
                'DatetimeOptions' => [
                    'Format' => '<string>', // REQUIRED
                    'LocaleCode' => '<string>',
                    'TimezoneOffset' => '<string>',
                ],
                'Filter' => [
                    'Expression' => '<string>', // REQUIRED
                    'ValuesMap' => ['<string>', ...], // REQUIRED
                ],
                'Name' => '<string>', // REQUIRED
                'Type' => 'Datetime|Number|String', // REQUIRED
            ],
            // ...
        ],
    ],
]);
Parameter Details
Members
- Format
-
- Type: string
The file format of a dataset that is created from an Amazon S3 file or folder.
- FormatOptions
-
- Type: FormatOptions structure
Represents a set of options that define the structure of either comma-separated value (CSV), Excel, or JSON input.
- Input
-
- Required: Yes
- Type: Input structure
Represents information on how DataBrew can find data, in either the Glue Data Catalog or Amazon S3.
- Name
-
- Required: Yes
- Type: string
The name of the dataset to be updated.
- PathOptions
-
- Type: PathOptions structure
A set of options that defines how DataBrew interprets an Amazon S3 path of the dataset.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the dataset that you updated.
Errors
- AccessDeniedException:
Access to the specified resource was denied.
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
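Most members of the UpdateDataset request are optional; only Name and Input are required, and within S3InputDefinition only Bucket. A minimal sketch of a request for a CSV file in Amazon S3 follows. The helper name buildCsvS3DatasetUpdate and its defaults are illustrative assumptions, not SDK behaviour.

```php
<?php
/**
 * Build an updateDataset() request for a CSV dataset stored in Amazon S3.
 * Hypothetical helper; the delimiter and header-row defaults are choices made
 * for this sketch, not documented service defaults.
 */
function buildCsvS3DatasetUpdate(
    string $name,
    string $bucket,
    ?string $key = null,
    string $delimiter = ',',
    bool $headerRow = true
): array {
    if ($name === '' || $bucket === '') {
        throw new InvalidArgumentException('Name and Bucket are required.');
    }

    $s3 = ['Bucket' => $bucket];
    if ($key !== null) {
        $s3['Key'] = $key; // optional object key or prefix
    }

    return [
        'Name' => $name,
        'Format' => 'CSV',
        'FormatOptions' => [
            'Csv' => ['Delimiter' => $delimiter, 'HeaderRow' => $headerRow],
        ],
        'Input' => ['S3InputDefinition' => $s3],
    ];
}

// Usage, assuming $client is a GlueDataBrewClient:
// $result = $client->updateDataset(buildCsvS3DatasetUpdate('sales', 'my-bucket', 'raw/sales.csv', ';'));
// echo $result['Name'];
```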
UpdateProfileJob
$result = $client->updateProfileJob([/* ... */]);
$promise = $client->updateProfileJobAsync([/* ... */]);
Modifies the definition of an existing profile job.
Parameter Syntax
$result = $client->updateProfileJob([
    'Configuration' => [
        'ColumnStatisticsConfigurations' => [
            [
                'Selectors' => [
                    [
                        'Name' => '<string>',
                        'Regex' => '<string>',
                    ],
                    // ...
                ],
                'Statistics' => [ // REQUIRED
                    'IncludedStatistics' => ['<string>', ...],
                    'Overrides' => [
                        [
                            'Parameters' => ['<string>', ...], // REQUIRED
                            'Statistic' => '<string>', // REQUIRED
                        ],
                        // ...
                    ],
                ],
            ],
            // ...
        ],
        'DatasetStatisticsConfiguration' => [
            'IncludedStatistics' => ['<string>', ...],
            'Overrides' => [
                [
                    'Parameters' => ['<string>', ...], // REQUIRED
                    'Statistic' => '<string>', // REQUIRED
                ],
                // ...
            ],
        ],
        'EntityDetectorConfiguration' => [
            'AllowedStatistics' => [
                [
                    'Statistics' => ['<string>', ...], // REQUIRED
                ],
                // ...
            ],
            'EntityTypes' => ['<string>', ...], // REQUIRED
        ],
        'ProfileColumns' => [
            [
                'Name' => '<string>',
                'Regex' => '<string>',
            ],
            // ...
        ],
    ],
    'EncryptionKeyArn' => '<string>',
    'EncryptionMode' => 'SSE-KMS|SSE-S3',
    'JobSample' => [
        'Mode' => 'FULL_DATASET|CUSTOM_ROWS',
        'Size' => <integer>,
    ],
    'LogSubscription' => 'ENABLE|DISABLE',
    'MaxCapacity' => <integer>,
    'MaxRetries' => <integer>,
    'Name' => '<string>', // REQUIRED
    'OutputLocation' => [ // REQUIRED
        'Bucket' => '<string>', // REQUIRED
        'BucketOwner' => '<string>',
        'Key' => '<string>',
    ],
    'RoleArn' => '<string>', // REQUIRED
    'Timeout' => <integer>,
    'ValidationConfigurations' => [
        [
            'RulesetArn' => '<string>', // REQUIRED
            'ValidationMode' => 'CHECK_ALL',
        ],
        // ...
    ],
]);
Parameter Details
Members
- Configuration
-
- Type: ProfileConfiguration structure
Configuration for profile jobs. Used to select columns, do evaluations, and override default parameters of evaluations. When configuration is null, the profile job will run with default settings.
- EncryptionKeyArn
-
- Type: string
The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.
- EncryptionMode
-
- Type: string
The encryption mode for the job, which can be one of the following:
  - SSE-KMS - Server-side encryption with keys managed by KMS.
  - SSE-S3 - Server-side encryption with keys managed by Amazon S3.
- JobSample
-
- Type: JobSample structure
Sample configuration for profile jobs only. Determines the number of rows on which the profile job is executed. If a JobSample value isn't provided, the default is used: CUSTOM_ROWS for the Mode parameter and 20000 for the Size parameter.
- LogSubscription
-
- Type: string
Enables or disables Amazon CloudWatch logging for the job. If logging is enabled, CloudWatch writes one log stream for each job run.
- MaxCapacity
-
- Type: int
The maximum number of compute nodes that DataBrew can use when the job processes data.
- MaxRetries
-
- Type: int
The maximum number of times to retry the job after a job run fails.
- Name
-
- Required: Yes
- Type: string
The name of the job to be updated.
- OutputLocation
-
- Required: Yes
- Type: S3Location structure
Represents an Amazon S3 location (bucket name, bucket owner, and object key) where DataBrew can read input data, or write output from a job.
- RoleArn
-
- Required: Yes
- Type: string
The Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.
- Timeout
-
- Type: int
The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.
- ValidationConfigurations
-
- Type: Array of ValidationConfiguration structures
List of validation configurations that are applied to the profile job.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the job that was updated.
Errors
- AccessDeniedException:
Access to the specified resource was denied.
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
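A minimal UpdateProfileJob request needs Name, RoleArn, and an OutputLocation with a Bucket. The sketch below builds one and makes the JobSample choice explicit rather than relying on the documented default (CUSTOM_ROWS, 20000). buildProfileJobUpdate is a hypothetical helper; sending Size only for CUSTOM_ROWS is an assumption of this sketch, made because a row count has no obvious meaning for FULL_DATASET.

```php
<?php
/**
 * Build an updateProfileJob() request with an explicit JobSample.
 * Hypothetical helper; the parameter defaults mirror the documented
 * JobSample default (CUSTOM_ROWS with a size of 20000).
 */
function buildProfileJobUpdate(
    string $name,
    string $roleArn,
    string $outputBucket,
    string $sampleMode = 'CUSTOM_ROWS',
    int $sampleSize = 20000
): array {
    if (!in_array($sampleMode, ['FULL_DATASET', 'CUSTOM_ROWS'], true)) {
        throw new InvalidArgumentException("Unsupported JobSample mode: $sampleMode");
    }

    $jobSample = ['Mode' => $sampleMode];
    if ($sampleMode === 'CUSTOM_ROWS') {
        // Assumption: Size is only meaningful when a custom row count is requested.
        $jobSample['Size'] = $sampleSize;
    }

    return [
        'Name' => $name,
        'RoleArn' => $roleArn,
        'OutputLocation' => ['Bucket' => $outputBucket],
        'JobSample' => $jobSample,
    ];
}

// Usage, assuming $client is a GlueDataBrewClient and $roleArn is a valid IAM role ARN:
// $client->updateProfileJob(buildProfileJobUpdate('profile-sales', $roleArn, 'my-output-bucket', 'FULL_DATASET'));
```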
UpdateProject
$result = $client->updateProject([/* ... */]);
$promise = $client->updateProjectAsync([/* ... */]);
Modifies the definition of an existing DataBrew project.
Parameter Syntax
$result = $client->updateProject([
    'Name' => '<string>', // REQUIRED
    'RoleArn' => '<string>', // REQUIRED
    'Sample' => [
        'Size' => <integer>,
        'Type' => 'FIRST_N|LAST_N|RANDOM', // REQUIRED
    ],
]);
Parameter Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the project to be updated.
- RoleArn
-
- Required: Yes
- Type: string
The Amazon Resource Name (ARN) of the IAM role to be assumed for this request.
- Sample
-
- Type: Sample structure
Represents the sample size and sampling type for DataBrew to use for interactive data analysis.
Result Syntax
[ 'LastModifiedDate' => <DateTime>, 'Name' => '<string>', ]
Result Details
Members
- LastModifiedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the project was last modified.
- Name
-
- Required: Yes
- Type: string
The name of the project that you updated.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
UpdateRecipe
$result = $client->updateRecipe([/* ... */]);
$promise = $client->updateRecipeAsync([/* ... */]);
Modifies the definition of the LATEST_WORKING version of a DataBrew recipe.
Parameter Syntax
$result = $client->updateRecipe([
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Steps' => [
        [
            'Action' => [ // REQUIRED
                'Operation' => '<string>', // REQUIRED
                'Parameters' => ['<string>', ...],
            ],
            'ConditionExpressions' => [
                [
                    'Condition' => '<string>', // REQUIRED
                    'TargetColumn' => '<string>', // REQUIRED
                    'Value' => '<string>',
                ],
                // ...
            ],
        ],
        // ...
    ],
]);
Parameter Details
Members
- Description
-
- Type: string
A description of the recipe.
- Name
-
- Required: Yes
- Type: string
The name of the recipe to be updated.
- Steps
-
- Type: Array of RecipeStep structures
One or more steps to be performed by the recipe. Each step consists of an action, and the conditions under which the action should succeed.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the recipe that was updated.
Errors
- ValidationException:
The input parameters for this request failed validation.
- ResourceNotFoundException:
One or more resources can't be found.
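Each entry in Steps pairs a required Action with optional ConditionExpressions. The sketch below assembles one step and checks the members marked REQUIRED in the syntax above. makeRecipeStep is a hypothetical helper, and Parameters is assumed to be an associative array of parameter name to string value (the same key-to-string shape used for Tags).

```php
<?php
/**
 * Assemble one RecipeStep for updateRecipe() or sendProjectSessionAction().
 * Hypothetical helper: validates only the members marked REQUIRED above.
 */
function makeRecipeStep(string $operation, array $parameters = [], array $conditions = []): array
{
    if ($operation === '') {
        throw new InvalidArgumentException('Action.Operation is required.');
    }

    $step = ['Action' => ['Operation' => $operation]];
    if ($parameters !== []) {
        $step['Action']['Parameters'] = $parameters;
    }

    foreach ($conditions as $i => $condition) {
        if (!isset($condition['Condition'], $condition['TargetColumn'])) {
            throw new InvalidArgumentException(
                "Condition #$i needs both Condition and TargetColumn."
            );
        }
    }
    if ($conditions !== []) {
        $step['ConditionExpressions'] = array_values($conditions);
    }

    return $step;
}

// Usage, assuming $client is a GlueDataBrewClient; operation and parameter values are placeholders:
// $step = makeRecipeStep('<string>', ['<ParameterName>' => '<string>']);
// $client->updateRecipe(['Name' => 'my-recipe', 'Steps' => [$step]]);
```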
UpdateRecipeJob
$result = $client->updateRecipeJob([/* ... */]);
$promise = $client->updateRecipeJobAsync([/* ... */]);
Modifies the definition of an existing DataBrew recipe job.
Parameter Syntax
$result = $client->updateRecipeJob([
    'DataCatalogOutputs' => [
        [
            'CatalogId' => '<string>',
            'DatabaseName' => '<string>', // REQUIRED
            'DatabaseOptions' => [
                'TableName' => '<string>', // REQUIRED
                'TempDirectory' => [
                    'Bucket' => '<string>', // REQUIRED
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'Overwrite' => true || false,
            'S3Options' => [
                'Location' => [ // REQUIRED
                    'Bucket' => '<string>', // REQUIRED
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'TableName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'DatabaseOutputs' => [
        [
            'DatabaseOptions' => [ // REQUIRED
                'TableName' => '<string>', // REQUIRED
                'TempDirectory' => [
                    'Bucket' => '<string>', // REQUIRED
                    'BucketOwner' => '<string>',
                    'Key' => '<string>',
                ],
            ],
            'DatabaseOutputMode' => 'NEW_TABLE',
            'GlueConnectionName' => '<string>', // REQUIRED
        ],
        // ...
    ],
    'EncryptionKeyArn' => '<string>',
    'EncryptionMode' => 'SSE-KMS|SSE-S3',
    'LogSubscription' => 'ENABLE|DISABLE',
    'MaxCapacity' => <integer>,
    'MaxRetries' => <integer>,
    'Name' => '<string>', // REQUIRED
    'Outputs' => [
        [
            'CompressionFormat' => 'GZIP|LZ4|SNAPPY|BZIP2|DEFLATE|LZO|BROTLI|ZSTD|ZLIB',
            'Format' => 'CSV|JSON|PARQUET|GLUEPARQUET|AVRO|ORC|XML|TABLEAUHYPER',
            'FormatOptions' => [
                'Csv' => [
                    'Delimiter' => '<string>',
                ],
            ],
            'Location' => [ // REQUIRED
                'Bucket' => '<string>', // REQUIRED
                'BucketOwner' => '<string>',
                'Key' => '<string>',
            ],
            'MaxOutputFiles' => <integer>,
            'Overwrite' => true || false,
            'PartitionColumns' => ['<string>', ...],
        ],
        // ...
    ],
    'RoleArn' => '<string>', // REQUIRED
    'Timeout' => <integer>,
]);
Parameter Details
Members
- DataCatalogOutputs
-
- Type: Array of DataCatalogOutput structures
One or more artifacts that represent the Glue Data Catalog output from running the job.
- DatabaseOutputs
-
- Type: Array of DatabaseOutput structures
Represents a list of JDBC database output objects that define the output destinations a DataBrew recipe job writes to.
- EncryptionKeyArn
-
- Type: string
The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.
- EncryptionMode
-
- Type: string
The encryption mode for the job, which can be one of the following:
  - SSE-KMS - Server-side encryption with keys managed by KMS.
  - SSE-S3 - Server-side encryption with keys managed by Amazon S3.
- LogSubscription
-
- Type: string
Enables or disables Amazon CloudWatch logging for the job. If logging is enabled, CloudWatch writes one log stream for each job run.
- MaxCapacity
-
- Type: int
The maximum number of nodes that DataBrew can consume when the job processes data.
- MaxRetries
-
- Type: int
The maximum number of times to retry the job after a job run fails.
- Name
-
- Required: Yes
- Type: string
The name of the job to update.
- Outputs
-
- Type: Array of Output structures
One or more artifacts that represent the output from running the job.
- RoleArn
-
- Required: Yes
- Type: string
The Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.
- Timeout
-
- Type: int
The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the job that you updated.
Errors
- AccessDeniedException:
Access to the specified resource was denied.
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
UpdateRuleset
$result = $client->updateRuleset([/* ... */]);
$promise = $client->updateRulesetAsync([/* ... */]);
Updates the specified ruleset.
Parameter Syntax
$result = $client->updateRuleset([
    'Description' => '<string>',
    'Name' => '<string>', // REQUIRED
    'Rules' => [ // REQUIRED
        [
            'CheckExpression' => '<string>', // REQUIRED
            'ColumnSelectors' => [
                [
                    'Name' => '<string>',
                    'Regex' => '<string>',
                ],
                // ...
            ],
            'Disabled' => true || false,
            'Name' => '<string>', // REQUIRED
            'SubstitutionMap' => ['<string>', ...],
            'Threshold' => [
                'Type' => 'GREATER_THAN_OR_EQUAL|LESS_THAN_OR_EQUAL|GREATER_THAN|LESS_THAN',
                'Unit' => 'COUNT|PERCENTAGE',
                'Value' => <float>, // REQUIRED
            ],
        ],
        // ...
    ],
]);
Parameter Details
Members
- Description
-
- Type: string
The description of the ruleset.
- Name
-
- Required: Yes
- Type: string
The name of the ruleset to be updated.
- Rules
-
- Required: Yes
- Type: Array of Rule structures
A list of rules that are defined with the ruleset. A rule includes one or more checks to be validated on a DataBrew dataset.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the updated ruleset.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ValidationException:
The input parameters for this request failed validation.
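A Rule combines a name, a check expression, and optionally a Threshold whose Type and Unit are restricted to the values listed in the syntax above. The sketch below builds both pieces. makeThreshold and makeRule are hypothetical helpers; Type and Unit are sent only when supplied, because this page does not state their defaults, and the check-expression syntax itself is not covered here.

```php
<?php
/**
 * Build a Threshold structure, validating Type and Unit against the
 * values listed in the updateRuleset() syntax. Hypothetical helper.
 */
function makeThreshold(float $value, ?string $type = null, ?string $unit = null): array
{
    $types = ['GREATER_THAN_OR_EQUAL', 'LESS_THAN_OR_EQUAL', 'GREATER_THAN', 'LESS_THAN'];
    $units = ['COUNT', 'PERCENTAGE'];

    $threshold = ['Value' => $value];
    if ($type !== null) {
        if (!in_array($type, $types, true)) {
            throw new InvalidArgumentException("Unsupported threshold type: $type");
        }
        $threshold['Type'] = $type;
    }
    if ($unit !== null) {
        if (!in_array($unit, $units, true)) {
            throw new InvalidArgumentException("Unsupported threshold unit: $unit");
        }
        $threshold['Unit'] = $unit;
    }
    return $threshold;
}

/**
 * Build one Rule entry. Name and CheckExpression are the required members.
 */
function makeRule(string $name, string $checkExpression, array $substitutionMap = [], ?array $threshold = null): array
{
    $rule = ['Name' => $name, 'CheckExpression' => $checkExpression];
    if ($substitutionMap !== []) {
        $rule['SubstitutionMap'] = $substitutionMap;
    }
    if ($threshold !== null) {
        $rule['Threshold'] = $threshold;
    }
    return $rule;
}

// Usage, assuming $client is a GlueDataBrewClient; the expression is a placeholder:
// $rule = makeRule('my-rule', '<string>', [], makeThreshold(90.0, 'GREATER_THAN_OR_EQUAL', 'PERCENTAGE'));
// $client->updateRuleset(['Name' => 'my-ruleset', 'Rules' => [$rule]]);
```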
UpdateSchedule
$result = $client->updateSchedule([/* ... */]);
$promise = $client->updateScheduleAsync([/* ... */]);
Modifies the definition of an existing DataBrew schedule.
Parameter Syntax
$result = $client->updateSchedule([ 'CronExpression' => '<string>', // REQUIRED 'JobNames' => ['<string>', ...], 'Name' => '<string>', // REQUIRED ]);
Parameter Details
Members
- CronExpression
-
- Required: Yes
- Type: string
The date or dates and time or times when the jobs are to be run. For more information, see Cron expressions in the Glue DataBrew Developer Guide.
- JobNames
-
- Type: Array of strings
The name or names of one or more jobs to be run for this schedule.
- Name
-
- Required: Yes
- Type: string
The name of the schedule to update.
Result Syntax
[ 'Name' => '<string>', ]
Result Details
Members
- Name
-
- Required: Yes
- Type: string
The name of the schedule that was updated.
Errors
- ResourceNotFoundException:
One or more resources can't be found.
- ServiceQuotaExceededException:
A service quota is exceeded.
- ValidationException:
The input parameters for this request failed validation.
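A minimal sketch of an UpdateSchedule call that changes the cron expression and, optionally, the attached jobs. updateScheduleJobs is a hypothetical helper and $client is assumed to be a GlueDataBrewClient. Whether omitting JobNames leaves the previously attached jobs in place is not stated on this page, so verify that before relying on it.

```php
<?php
/**
 * Update a schedule's cron expression and, if given, its job list.
 * Hypothetical helper. Returns the schedule name reported by the service.
 */
function updateScheduleJobs($client, string $scheduleName, string $cronExpression, array $jobNames = []): string
{
    $params = [
        'Name' => $scheduleName,
        'CronExpression' => $cronExpression,
    ];
    if ($jobNames !== []) {
        // Drop duplicates and reindex so the SDK serializes a plain list.
        $params['JobNames'] = array_values(array_unique($jobNames));
    }

    $result = $client->updateSchedule($params);
    return $result['Name'];
}

// Usage; the cron string is a placeholder, see "Cron expressions" in the
// Glue DataBrew Developer Guide for the accepted format:
// updateScheduleJobs($client, 'nightly', '<string>', ['job-a', 'job-b']);
```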
Shapes
AccessDeniedException
Description
Access to the specified resource was denied.
Members
- Message
-
- Type: string
AllowedStatistics
Description
Configuration of statistics that are allowed to be run on columns that contain detected entities. When undefined, no statistics will be computed on columns that contain detected entities.
Members
- Statistics
-
- Required: Yes
- Type: Array of strings
One or more column statistics to allow for columns that contain detected entities.
ColumnSelector
Description
Selector of a column from a dataset for profile job configuration. One selector includes either a column name or a regular expression.
Members
- Name
-
- Type: string
The name of a column from a dataset.
- Regex
-
- Type: string
A regular expression for selecting a column from a dataset.
ColumnStatisticsConfiguration
Description
Configuration for column evaluations for a profile job. ColumnStatisticsConfiguration can be used to select evaluations and override parameters of evaluations for particular columns.
Members
- Selectors
-
- Type: Array of ColumnSelector structures
List of column selectors. Selectors can be used to select columns from the dataset. When selectors are undefined, configuration will be applied to all supported columns.
- Statistics
-
- Required: Yes
- Type: StatisticsConfiguration structure
Configuration for evaluations. Statistics can be used to select evaluations and override parameters of evaluations.
ConditionExpression
Description
Represents an individual condition that evaluates to true or false.
Conditions are used with recipe actions. The action is only performed for column values where the condition evaluates to true.
If a recipe requires more than one condition, then the recipe must specify multiple ConditionExpression elements. Each condition is applied to the rows in a dataset first, before the recipe action is performed.
Members
- Condition
-
- Required: Yes
- Type: string
A specific condition to apply to a recipe action. For more information, see Recipe structure in the Glue DataBrew Developer Guide.
- TargetColumn
-
- Required: Yes
- Type: string
A column to apply this condition to.
- Value
-
- Type: string
A value that the condition must evaluate to for the condition to succeed.
ConflictException
Description
Updating or deleting a resource can cause an inconsistent state.
Members
- Message
-
- Type: string
CsvOptions
Description
Represents a set of options that define how DataBrew will read a comma-separated value (CSV) file when creating a dataset from that file.
Members
- Delimiter
-
- Type: string
A single character that specifies the delimiter being used in the CSV file.
- HeaderRow
-
- Type: boolean
A variable that specifies whether the first row in the file is parsed as the header. If this value is false, column names are auto-generated.
CsvOutputOptions
Description
Represents a set of options that define how DataBrew will write a comma-separated value (CSV) file.
Members
- Delimiter
-
- Type: string
A single character that specifies the delimiter used to create CSV job output.
DataCatalogInputDefinition
Description
Represents how metadata stored in the Glue Data Catalog is defined in a DataBrew dataset.
Members
- CatalogId
-
- Type: string
The unique identifier of the Amazon Web Services account that holds the Data Catalog that stores the data.
- DatabaseName
-
- Required: Yes
- Type: string
The name of a database in the Data Catalog.
- TableName
-
- Required: Yes
- Type: string
The name of a database table in the Data Catalog. This table corresponds to a DataBrew dataset.
- TempDirectory
-
- Type: S3Location structure
Represents an Amazon S3 location where DataBrew can store intermediate results.
DataCatalogOutput
Description
Represents options that specify how and where in the Glue Data Catalog DataBrew writes the output generated by recipe jobs.
Members
- CatalogId
-
- Type: string
The unique identifier of the Amazon Web Services account that holds the Data Catalog that stores the data.
- DatabaseName
-
- Required: Yes
- Type: string
The name of a database in the Data Catalog.
- DatabaseOptions
-
- Type: DatabaseTableOutputOptions structure
Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
- Overwrite
-
- Type: boolean
A value that, if true, means that any data in the location specified for output is overwritten with new output. Not supported with DatabaseOptions.
- S3Options
-
- Type: S3TableOutputOptions structure
Represents options that specify how and where DataBrew writes the Amazon S3 output generated by recipe jobs.
- TableName
-
- Required: Yes
- Type: string
The name of a table in the Data Catalog.
DatabaseInputDefinition
Description
Connection information for dataset input files stored in a database.
Members
- DatabaseTableName
-
- Type: string
The table within the target database.
- GlueConnectionName
-
- Required: Yes
- Type: string
The Glue Connection that stores the connection information for the target database.
- QueryString
-
- Type: string
Custom SQL to run against the provided Glue connection. This SQL will be used as the input for DataBrew projects and jobs.
- TempDirectory
-
- Type: S3Location structure
Represents an Amazon S3 location (bucket name, bucket owner, and object key) where DataBrew can read input data, or write output from a job.
DatabaseOutput
Description
Represents a JDBC database output object which defines the output destination for a DataBrew recipe job to write into.
Members
- DatabaseOptions
-
- Required: Yes
- Type: DatabaseTableOutputOptions structure
Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
- DatabaseOutputMode
-
- Type: string
The output mode to write into the database. Currently supported option: NEW_TABLE.
- GlueConnectionName
-
- Required: Yes
- Type: string
The Glue connection that stores the connection information for the target database.
DatabaseTableOutputOptions
Description
Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
Members
- TableName
-
- Required: Yes
- Type: string
A prefix for the name of a table DataBrew will create in the database.
- TempDirectory
-
- Type: S3Location structure
Represents an Amazon S3 location (bucket name and object key) where DataBrew can store intermediate results.
Dataset
Description
Represents a dataset that can be processed by DataBrew.
Members
- AccountId
-
- Type: string
The ID of the Amazon Web Services account that owns the dataset.
- CreateDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the dataset was created.
- CreatedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who created the dataset.
- Format
-
- Type: string
The file format of a dataset that is created from an Amazon S3 file or folder.
- FormatOptions
-
- Type: FormatOptions structure
A set of options that define how DataBrew interprets the data in the dataset.
- Input
-
- Required: Yes
- Type: Input structure
Information on how DataBrew can find the dataset, in either the Glue Data Catalog or Amazon S3.
- LastModifiedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who last modified the dataset.
- LastModifiedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last modification date and time of the dataset.
- Name
-
- Required: Yes
- Type: string
The unique name of the dataset.
- PathOptions
-
- Type: PathOptions structure
A set of options that defines how DataBrew interprets an Amazon S3 path of the dataset.
- ResourceArn
-
- Type: string
The unique Amazon Resource Name (ARN) for the dataset.
- Source
-
- Type: string
The location of the data for the dataset, either Amazon S3 or the Glue Data Catalog.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Metadata tags that have been applied to the dataset.
DatasetParameter
Description
Represents a dataset parameter that defines type and conditions for a parameter in the Amazon S3 path of the dataset.
Members
- CreateColumn
-
- Type: boolean
Optional boolean value that defines whether the captured value of this parameter should be used to create a new column in a dataset.
- DatetimeOptions
-
- Type: DatetimeOptions structure
Additional parameter options such as a format and a timezone. Required for datetime parameters.
- Filter
-
- Type: FilterExpression structure
The optional filter expression structure to apply additional matching criteria to the parameter.
- Name
-
- Required: Yes
- Type: string
The name of the parameter that is used in the dataset's Amazon S3 path.
- Type
-
- Required: Yes
- Type: string
The type of the dataset parameter. Can be one of 'String', 'Number', or 'Datetime'.
DatetimeOptions
Description
Represents additional options for correct interpretation of datetime parameters used in the Amazon S3 path of a dataset.
Members
- Format
-
- Required: Yes
- Type: string
Required option that defines the datetime format used for a date parameter in the Amazon S3 path. Use only supported datetime specifiers and separator characters, and escape all literal a-z or A-Z characters with single quotes. For example, "MM.dd.yyyy-'at'-HH:mm".
- LocaleCode
-
- Type: string
Optional value for a non-US locale code, needed for correct interpretation of some date formats.
- TimezoneOffset
-
- Type: string
Optional value for a timezone offset of the datetime parameter value in the Amazon S3 path. Shouldn't be used if Format for this parameter includes timezone fields. If no offset is specified, UTC is assumed.
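A short sketch of how a datetime path parameter might be described with these options. The helper name `buildDatetimeParameter`, the parameter name `ingested`, and the `+02:00` offset string are assumptions for illustration. The format string is the example given above.

```php
// Hypothetical helper: builds a DatasetParameter of type Datetime.
function buildDatetimeParameter(
    string $name,
    string $format,
    ?string $timezoneOffset = null,
    bool $createColumn = false
): array {
    $options = ['Format' => $format]; // REQUIRED within DatetimeOptions

    // TimezoneOffset is optional; when it is omitted, UTC is assumed.
    if ($timezoneOffset !== null) {
        $options['TimezoneOffset'] = $timezoneOffset;
    }

    return [
        'Name' => $name,      // REQUIRED
        'Type' => 'Datetime', // REQUIRED
        'DatetimeOptions' => $options,
        'CreateColumn' => $createColumn,
    ];
}

// The literal letters "at" are escaped with single quotes inside the format.
$parameter = buildDatetimeParameter('ingested', "MM.dd.yyyy-'at'-HH:mm", '+02:00', true);
```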
EntityDetectorConfiguration
Description
Configuration of entity detection for a profile job. When undefined, entity detection is disabled.
Members
- AllowedStatistics
-
- Type: Array of AllowedStatistics structures
Configuration of statistics that are allowed to be run on columns that contain detected entities. When undefined, no statistics will be computed on columns that contain detected entities.
- EntityTypes
-
- Required: Yes
- Type: Array of strings
Entity types to detect. Can be any of the following:
-
USA_SSN
-
EMAIL
-
USA_ITIN
-
USA_PASSPORT_NUMBER
-
PHONE_NUMBER
-
USA_DRIVING_LICENSE
-
BANK_ACCOUNT
-
CREDIT_CARD
-
IP_ADDRESS
-
MAC_ADDRESS
-
USA_DEA_NUMBER
-
USA_HCPCS_CODE
-
USA_NATIONAL_PROVIDER_IDENTIFIER
-
USA_NATIONAL_DRUG_CODE
-
USA_HEALTH_INSURANCE_CLAIM_NUMBER
-
USA_MEDICARE_BENEFICIARY_IDENTIFIER
-
USA_CPT_CODE
-
PERSON_NAME
-
DATE
The Entity type group USA_ALL is also supported, and includes all of the above entity types except PERSON_NAME and DATE.
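The list above can be turned into a small client-side check, sketched here. The constant `DATABREW_ENTITY_TYPES` and the helper `buildEntityDetectorConfiguration` are hypothetical names. The service performs its own validation, so this only catches typos early.

```php
// Entity types as listed in this section.
const DATABREW_ENTITY_TYPES = [
    'USA_SSN', 'EMAIL', 'USA_ITIN', 'USA_PASSPORT_NUMBER', 'PHONE_NUMBER',
    'USA_DRIVING_LICENSE', 'BANK_ACCOUNT', 'CREDIT_CARD', 'IP_ADDRESS',
    'MAC_ADDRESS', 'USA_DEA_NUMBER', 'USA_HCPCS_CODE',
    'USA_NATIONAL_PROVIDER_IDENTIFIER', 'USA_NATIONAL_DRUG_CODE',
    'USA_HEALTH_INSURANCE_CLAIM_NUMBER', 'USA_MEDICARE_BENEFICIARY_IDENTIFIER',
    'USA_CPT_CODE', 'PERSON_NAME', 'DATE',
];

// Hypothetical helper: rejects unknown entity types before the request is sent.
function buildEntityDetectorConfiguration(array $entityTypes): array
{
    if ($entityTypes === []) {
        throw new InvalidArgumentException('EntityTypes is required.');
    }

    // USA_ALL is the documented group name, accepted alongside individual types.
    $allowed = array_merge(DATABREW_ENTITY_TYPES, ['USA_ALL']);
    $unknown = array_diff($entityTypes, $allowed);
    if ($unknown !== []) {
        throw new InvalidArgumentException('Unsupported entity types: ' . implode(', ', $unknown));
    }

    return ['EntityTypes' => array_values(array_unique($entityTypes))];
}

// USA_ALL excludes PERSON_NAME and DATE, so they are added explicitly.
$config = buildEntityDetectorConfiguration(['USA_ALL', 'PERSON_NAME', 'DATE']);
```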
ExcelOptions
Description
Represents a set of options that define how DataBrew will interpret a Microsoft Excel file when creating a dataset from that file.
Members
- HeaderRow
-
- Type: boolean
A variable that specifies whether the first row in the file is parsed as the header. If this value is false, column names are auto-generated.
- SheetIndexes
-
- Type: Array of ints
One or more sheet numbers in the Excel file that will be included in the dataset.
- SheetNames
-
- Type: Array of strings
One or more named sheets in the Excel file that will be included in the dataset.
FilesLimit
Description
Represents a limit imposed on the number of Amazon S3 files that should be selected for a dataset from a connected Amazon S3 path.
Members
- MaxFiles
-
- Required: Yes
- Type: int
The number of Amazon S3 files to select.
- Order
-
- Type: string
The order in which Amazon S3 files are sorted before their selection. The default is DESCENDING, meaning that the most recent files are selected first. The other possible value is ASCENDING.
- OrderedBy
-
- Type: string
The criterion used to sort Amazon S3 files before their selection. The default is LAST_MODIFIED_DATE, which is currently the only allowed value.
FilterExpression
Description
Represents a structure for defining parameter conditions. Supported conditions are described here: Supported conditions for dynamic datasets in the Glue DataBrew Developer Guide.
Members
- Expression
-
- Required: Yes
- Type: string
The expression which includes condition names followed by substitution variables, possibly grouped and combined with other conditions. For example, "(starts_with :prefix1 or starts_with :prefix2) and (ends_with :suffix1 or ends_with :suffix2)". Substitution variables should start with ':' symbol.
- ValuesMap
-
- Required: Yes
- Type: Associative array of custom strings keys (ValueReference) to strings
The map of substitution variable names to their values used in this filter expression.
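One way to keep Expression and ValuesMap in sync is sketched below. The helper `buildFilterExpression` and the sample values are hypothetical. The sketch assumes that ValuesMap keys include the leading colon, and it reuses the example expression quoted above.

```php
// Hypothetical helper: checks that every ":variable" in the expression has a value.
function buildFilterExpression(string $expression, array $valuesMap): array
{
    // Collect every substitution variable (":name") referenced in the expression.
    preg_match_all('/:[A-Za-z0-9_]+/', $expression, $matches);

    $missing = array_diff(array_unique($matches[0]), array_keys($valuesMap));
    if ($missing !== []) {
        throw new InvalidArgumentException('No value for: ' . implode(', ', $missing));
    }

    return [
        'Expression' => $expression, // REQUIRED
        'ValuesMap' => $valuesMap,   // REQUIRED
    ];
}

$filter = buildFilterExpression(
    '(starts_with :prefix1 or starts_with :prefix2) and (ends_with :suffix1 or ends_with :suffix2)',
    [
        ':prefix1' => 'sales_',
        ':prefix2' => 'orders_',
        ':suffix1' => '.csv',
        ':suffix2' => '.json',
    ]
);
```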
FormatOptions
Description
Represents a set of options that define the structure of either comma-separated value (CSV), Excel, or JSON input.
Members
- Csv
-
- Type: CsvOptions structure
Options that define how CSV input is to be interpreted by DataBrew.
- Excel
-
- Type: ExcelOptions structure
Options that define how Excel input is to be interpreted by DataBrew.
- Json
-
- Type: JsonOptions structure
Options that define how JSON input is to be interpreted by DataBrew.
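For CSV input, a FormatOptions value might be built like this. It is a sketch only: `buildCsvFormatOptions` is a hypothetical helper, and the single-byte delimiter check is a local assumption based on the "single character" wording in CsvOptions.

```php
// Hypothetical helper: builds FormatOptions for CSV input.
function buildCsvFormatOptions(string $delimiter = ',', bool $headerRow = true): array
{
    // Delimiter is documented as a single character.
    if (strlen($delimiter) !== 1) {
        throw new InvalidArgumentException('Delimiter must be a single character.');
    }

    return [
        'Csv' => [
            'Delimiter' => $delimiter,
            'HeaderRow' => $headerRow, // false means column names are auto-generated
        ],
    ];
}

// Tab-separated input whose first row holds the column names.
$formatOptions = buildCsvFormatOptions("\t", true);
```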
Input
Description
Represents information on how DataBrew can find data, in either the Glue Data Catalog or Amazon S3.
Members
- DataCatalogInputDefinition
-
- Type: DataCatalogInputDefinition structure
The Glue Data Catalog parameters for the data.
- DatabaseInputDefinition
-
- Type: DatabaseInputDefinition structure
Connection information for dataset input files stored in a database.
- Metadata
-
- Type: Metadata structure
Contains additional resource information needed for specific datasets.
- S3InputDefinition
-
- Type: S3Location structure
The Amazon S3 location where the data is stored.
InternalServerException
Description
An internal service failure occurred.
Members
- Message
-
- Type: string
Job
Description
Represents all of the attributes of a DataBrew job.
Members
- AccountId
-
- Type: string
The ID of the Amazon Web Services account that owns the job.
- CreateDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the job was created.
- CreatedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who created the job.
- DataCatalogOutputs
-
- Type: Array of DataCatalogOutput structures
One or more artifacts that represent the Glue Data Catalog output from running the job.
- DatabaseOutputs
-
- Type: Array of DatabaseOutput structures
Represents a list of JDBC database output objects that define the output destinations for a DataBrew recipe job to write into.
- DatasetName
-
- Type: string
A dataset that the job is to process.
- EncryptionKeyArn
-
- Type: string
The Amazon Resource Name (ARN) of an encryption key that is used to protect the job output. For more information, see Encrypting data written by DataBrew jobs.
- EncryptionMode
-
- Type: string
The encryption mode for the job, which can be one of the following:
-
SSE-KMS - Server-side encryption with keys managed by KMS.
-
SSE-S3 - Server-side encryption with keys managed by Amazon S3.
- JobSample
-
- Type: JobSample structure
A sample configuration for profile jobs only, which determines the number of rows on which the profile job is run. If a JobSample value isn't provided, the default value is used. The default value is CUSTOM_ROWS for the mode parameter and 20,000 for the size parameter.
- LastModifiedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who last modified the job.
- LastModifiedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The modification date and time of the job.
- LogSubscription
-
- Type: string
The current status of Amazon CloudWatch logging for the job.
- MaxCapacity
-
- Type: int
The maximum number of nodes that can be consumed when the job processes data.
- MaxRetries
-
- Type: int
The maximum number of times to retry the job after a job run fails.
- Name
-
- Required: Yes
- Type: string
The unique name of the job.
- Outputs
-
- Type: Array of Output structures
One or more artifacts that represent output from running the job.
- ProjectName
-
- Type: string
The name of the project that the job is associated with.
- RecipeReference
-
- Type: RecipeReference structure
A set of steps that the job runs.
- ResourceArn
-
- Type: string
The unique Amazon Resource Name (ARN) for the job.
- RoleArn
-
- Type: string
The Amazon Resource Name (ARN) of the role to be assumed for this job.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Metadata tags that have been applied to the job.
- Timeout
-
- Type: int
The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.
- Type
-
- Type: string
The job type of the job, which must be one of the following:
-
PROFILE - A job to analyze a dataset, to determine its size, data types, data distribution, and more.
-
RECIPE - A job to apply one or more transformations to a dataset.
- ValidationConfigurations
-
- Type: Array of ValidationConfiguration structures
List of validation configurations that are applied to the profile job.
JobRun
Description
Represents one run of a DataBrew job.
Members
- Attempt
-
- Type: int
The number of times that DataBrew has attempted to run the job.
- CompletedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the job completed processing.
- DataCatalogOutputs
-
- Type: Array of DataCatalogOutput structures
One or more artifacts that represent the Glue Data Catalog output from running the job.
- DatabaseOutputs
-
- Type: Array of DatabaseOutput structures
Represents a list of JDBC database output objects that define the output destinations for a DataBrew recipe job to write into.
- DatasetName
-
- Type: string
The name of the dataset for the job to process.
- ErrorMessage
-
- Type: string
A message indicating an error (if any) that was encountered when the job ran.
- ExecutionTime
-
- Type: int
The amount of time, in seconds, during which a job run consumed resources.
- JobName
-
- Type: string
The name of the job being processed during this run.
- JobSample
-
- Type: JobSample structure
A sample configuration for profile jobs only, which determines the number of rows on which the profile job is run. If a JobSample value isn't provided, the default is used. The default value is CUSTOM_ROWS for the mode parameter and 20,000 for the size parameter.
- LogGroupName
-
- Type: string
The name of an Amazon CloudWatch log group, where the job writes diagnostic messages when it runs.
- LogSubscription
-
- Type: string
The current status of Amazon CloudWatch logging for the job run.
- Outputs
-
- Type: Array of Output structures
One or more output artifacts from a job run.
- RecipeReference
-
- Type: RecipeReference structure
The set of steps processed by the job.
- RunId
-
- Type: string
The unique identifier of the job run.
- StartedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who initiated the job run.
- StartedOn
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the job run began.
- State
-
- Type: string
The current state of the job run entity itself.
- ValidationConfigurations
-
- Type: Array of ValidationConfiguration structures
List of validation configurations that are applied to the profile job run.
JobSample
Description
A sample configuration for profile jobs only, which determines the number of rows on which the profile job is run. If a JobSample value isn't provided, the default is used. The default value is CUSTOM_ROWS for the mode parameter and 20,000 for the size parameter.
Members
- Mode
-
- Type: string
A value that determines whether the profile job is run on the entire dataset or a specified number of rows. This value must be one of the following:
-
FULL_DATASET - The profile job is run on the entire dataset.
-
CUSTOM_ROWS - The profile job is run on the number of rows specified in the Size parameter.
- Size
-
- Type: long (int|float)
The Size parameter is only required when the mode is CUSTOM_ROWS. The profile job is run on the specified number of rows. The maximum value for size is Long.MAX_VALUE (9223372036854775807).
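The Mode and Size rules above can be mirrored in a small local check, sketched here. `resolveJobSample` is a hypothetical helper. Returning the documented default when no sample is given only illustrates what the service is described as doing, and the positive-integer check on Size is an assumption.

```php
// Hypothetical helper: normalizes a JobSample according to the rules in this section.
function resolveJobSample(?array $jobSample = null): array
{
    // Documented default when no JobSample value is provided.
    if ($jobSample === null) {
        return ['Mode' => 'CUSTOM_ROWS', 'Size' => 20000];
    }

    $mode = $jobSample['Mode'] ?? null;

    if ($mode === 'FULL_DATASET') {
        // Size is not needed when the whole dataset is profiled.
        return ['Mode' => 'FULL_DATASET'];
    }

    if ($mode === 'CUSTOM_ROWS') {
        $size = $jobSample['Size'] ?? null;
        if (!is_int($size) || $size < 1) {
            throw new InvalidArgumentException('CUSTOM_ROWS requires a positive integer Size.');
        }
        return ['Mode' => 'CUSTOM_ROWS', 'Size' => $size];
    }

    throw new InvalidArgumentException('Mode must be FULL_DATASET or CUSTOM_ROWS.');
}

$default = resolveJobSample();
$custom = resolveJobSample(['Mode' => 'CUSTOM_ROWS', 'Size' => 5000]);
$full = resolveJobSample(['Mode' => 'FULL_DATASET', 'Size' => 5000]);
```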
JsonOptions
Description
Represents the JSON-specific options that define how input is to be interpreted by Glue DataBrew.
Members
- MultiLine
-
- Type: boolean
A value that specifies whether JSON input contains embedded new line characters.
Metadata
Description
Contains additional resource information needed for specific datasets.
Members
- SourceArn
-
- Type: string
The Amazon Resource Name (ARN) associated with the dataset. Currently, DataBrew only supports ARNs from Amazon AppFlow.
Output
Description
Represents options that specify how and where in Amazon S3 DataBrew writes the output generated by recipe jobs or profile jobs.
Members
- CompressionFormat
-
- Type: string
The compression algorithm used to compress the output text of the job.
- Format
-
- Type: string
The data format of the output of the job.
- FormatOptions
-
- Type: OutputFormatOptions structure
Represents options that define how DataBrew formats job output files.
- Location
-
- Required: Yes
- Type: S3Location structure
The location in Amazon S3 where the job writes its output.
- MaxOutputFiles
-
- Type: int
Maximum number of files to be generated by the job and written to the output folder. For output partitioned by column(s), the MaxOutputFiles value is the maximum number of files per partition.
- Overwrite
-
- Type: boolean
A value that, if true, means that any data in the location specified for output is overwritten with new output.
- PartitionColumns
-
- Type: Array of strings
The names of one or more partition columns for the output of the job.
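An Output entry for a recipe job might look like the following sketch. The bucket, key, and column names are hypothetical, and the `CSV` and `GZIP` values are assumptions, since this section does not list the allowed Format and CompressionFormat values.

```php
// One element of the 'Outputs' array of a job (illustrative values only).
$output = [
    'Location' => [                       // REQUIRED
        'Bucket' => 'example-output-bucket',
        'Key' => 'databrew/sales/',
    ],
    'Format' => 'CSV',                    // assumed value
    'FormatOptions' => [
        'Csv' => ['Delimiter' => '|'],    // CsvOutputOptions
    ],
    'CompressionFormat' => 'GZIP',        // assumed value
    'Overwrite' => true,                  // replace existing data at the location
    'PartitionColumns' => ['region'],
    'MaxOutputFiles' => 5,                // per partition, because the output is partitioned
];

$outputs = [$output];
```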
OutputFormatOptions
Description
Represents a set of options that define the structure of comma-separated (CSV) job output.
Members
- Csv
-
- Type: CsvOutputOptions structure
Represents a set of options that define the structure of comma-separated value (CSV) job output.
PathOptions
Description
Represents a set of options that define how DataBrew selects files for a given Amazon S3 path in a dataset.
Members
- FilesLimit
-
- Type: FilesLimit structure
If provided, this structure imposes a limit on the number of files that should be selected.
- LastModifiedDateCondition
-
- Type: FilterExpression structure
If provided, this structure defines a date range for matching Amazon S3 objects based on their LastModifiedDate attribute in Amazon S3.
- Parameters
-
- Type: Associative array of custom strings keys (PathParameterName) to DatasetParameter structures
A structure that maps names of parameters used in the Amazon S3 path of a dataset to their definitions.
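A PathOptions value that combines a file limit with path parameters could be sketched as follows. The helper `buildPathParameters` and the parameter names are hypothetical. Keying the map by each parameter's own Name is an assumption about how PathParameterName relates to DatasetParameter.

```php
// Hypothetical helper: keys each DatasetParameter by its own Name.
function buildPathParameters(array $parameters): array
{
    $map = [];
    foreach ($parameters as $parameter) {
        $map[$parameter['Name']] = $parameter;
    }
    return $map;
}

$pathOptions = [
    'FilesLimit' => [
        'MaxFiles' => 10,                    // REQUIRED within FilesLimit
        'Order' => 'DESCENDING',             // most recent files first
        'OrderedBy' => 'LAST_MODIFIED_DATE', // currently the only allowed value
    ],
    'Parameters' => buildPathParameters([
        [
            'Name' => 'region',
            'Type' => 'String',
            'Filter' => [
                'Expression' => 'starts_with :prefix',
                'ValuesMap' => [':prefix' => 'eu-'],
            ],
        ],
        [
            'Name' => 'year',
            'Type' => 'Number',
            'CreateColumn' => true, // expose the captured value as a new column
        ],
    ]),
];
```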
ProfileConfiguration
Description
Configuration for profile jobs. Configuration can be used to select columns, do evaluations, and override default parameters of evaluations. When configuration is undefined, the profile job will apply default settings to all supported columns.
Members
- ColumnStatisticsConfigurations
-
- Type: Array of ColumnStatisticsConfiguration structures
List of configurations for column evaluations. ColumnStatisticsConfigurations are used to select evaluations and override parameters of evaluations for particular columns. When ColumnStatisticsConfigurations is undefined, the profile job will profile all supported columns and run all supported evaluations.
- DatasetStatisticsConfiguration
-
- Type: StatisticsConfiguration structure
Configuration for inter-column evaluations. Configuration can be used to select evaluations and override parameters of evaluations. When configuration is undefined, the profile job will run all supported inter-column evaluations.
- EntityDetectorConfiguration
-
- Type: EntityDetectorConfiguration structure
Configuration of entity detection for a profile job. When undefined, entity detection is disabled.
- ProfileColumns
-
- Type: Array of ColumnSelector structures
List of column selectors. ProfileColumns can be used to select columns from the dataset. When ProfileColumns is undefined, the profile job will profile all supported columns.
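A minimal ProfileConfiguration that narrows profiling to a few columns might be written like this. `buildColumnSelector` is a hypothetical helper, and the column names and regular expression are made up. Treating Name and Regex as mutually exclusive follows the ColumnSelector description, but it is enforced here only as a local assumption.

```php
// Hypothetical helper: a selector carries either a column name or a regular expression.
function buildColumnSelector(?string $name = null, ?string $regex = null): array
{
    // Exactly one of the two must be given.
    if (($name === null) === ($regex === null)) {
        throw new InvalidArgumentException('Provide exactly one of name or regex.');
    }

    return $name !== null ? ['Name' => $name] : ['Regex' => $regex];
}

$profileConfiguration = [
    'ProfileColumns' => [
        buildColumnSelector('customer_id'),
        buildColumnSelector(null, '^order_.*'),
    ],
    'EntityDetectorConfiguration' => [
        'EntityTypes' => ['EMAIL', 'PHONE_NUMBER'], // REQUIRED within this structure
    ],
];
```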
Project
Description
Represents all of the attributes of a DataBrew project.
Members
- AccountId
-
- Type: string
The ID of the Amazon Web Services account that owns the project.
- CreateDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the project was created.
- CreatedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who created the project.
- DatasetName
-
- Type: string
The dataset that the project is to act upon.
- LastModifiedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who last modified the project.
- LastModifiedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last modification date and time for the project.
- Name
-
- Required: Yes
- Type: string
The unique name of a project.
- OpenDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the project was opened.
- OpenedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user that opened the project for use.
- RecipeName
-
- Required: Yes
- Type: string
The name of a recipe that will be developed during a project session.
- ResourceArn
-
- Type: string
The Amazon Resource Name (ARN) for the project.
- RoleArn
-
- Type: string
The Amazon Resource Name (ARN) of the role that will be assumed for this project.
- Sample
-
- Type: Sample structure
The sample size and sampling type to apply to the data. If this parameter isn't specified, then the sample consists of the first 500 rows from the dataset.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Metadata tags that have been applied to the project.
Recipe
Description
Represents one or more actions to be performed on a DataBrew dataset.
Members
- CreateDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the recipe was created.
- CreatedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who created the recipe.
- Description
-
- Type: string
The description of the recipe.
- LastModifiedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who last modified the recipe.
- LastModifiedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The last modification date and time of the recipe.
- Name
-
- Required: Yes
- Type: string
The unique name for the recipe.
- ProjectName
-
- Type: string
The name of the project that the recipe is associated with.
- PublishedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who published the recipe.
- PublishedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the recipe was published.
- RecipeVersion
-
- Type: string
The identifier for the version for the recipe. Must be one of the following:
-
Numeric version (X.Y) - X and Y stand for major and minor version numbers. The maximum length of each is 6 digits, and neither can be negative values. Both X and Y are required, and "0.0" isn't a valid version.
-
LATEST_WORKING - the most recent valid version being developed in a DataBrew project.
-
LATEST_PUBLISHED - the most recent published version.
- ResourceArn
-
- Type: string
The Amazon Resource Name (ARN) for the recipe.
- Steps
-
- Type: Array of RecipeStep structures
A list of steps that are defined by the recipe.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Metadata tags that have been applied to the recipe.
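The RecipeVersion rules described for this shape can be checked locally with a sketch like the one below. `isValidRecipeVersion` is a hypothetical helper. Rejecting zero-padded forms of "0.0", such as "00.0", is an assumption that goes slightly beyond the wording above.

```php
// Hypothetical helper: checks a RecipeVersion string against the documented forms.
function isValidRecipeVersion(string $version): bool
{
    if ($version === 'LATEST_WORKING' || $version === 'LATEST_PUBLISHED') {
        return true;
    }

    // X.Y with one to six digits on each side; a minus sign cannot match.
    if (preg_match('/\A\d{1,6}\.\d{1,6}\z/', $version) !== 1) {
        return false;
    }

    // "0.0" isn't a valid version.
    [$major, $minor] = explode('.', $version);
    return !((int) $major === 0 && (int) $minor === 0);
}

$checks = [
    '1.0' => isValidRecipeVersion('1.0'),
    '0.0' => isValidRecipeVersion('0.0'),
    '1234567.0' => isValidRecipeVersion('1234567.0'),
    'LATEST_PUBLISHED' => isValidRecipeVersion('LATEST_PUBLISHED'),
];
```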
RecipeAction
Description
Represents a transformation and associated parameters that are used to apply a change to a DataBrew dataset. For more information, see Recipe actions reference.
Members
- Operation
-
- Required: Yes
- Type: string
The name of a valid DataBrew transformation to be performed on the data.
- Parameters
-
- Type: Associative array of custom strings keys (ParameterName) to strings
Contextual parameters for the transformation.
RecipeReference
Description
Represents the name and version of a DataBrew recipe.
Members
- Name
-
- Required: Yes
- Type: string
The name of the recipe.
- RecipeVersion
-
- Type: string
The identifier for the version for the recipe.
RecipeStep
Description
Represents a single step from a DataBrew recipe to be performed.
Members
- Action
-
- Required: Yes
- Type: RecipeAction structure
The particular action to be performed in the recipe step.
- ConditionExpressions
-
- Type: Array of ConditionExpression structures
One or more conditions that must be met for the recipe step to succeed.
All of the conditions in the array must be met. In other words, all of the conditions must be combined using a logical AND operation.
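A recipe step with two conditions might be laid out as in this sketch. The operation name, its `sourceColumn` parameter, and the condition names are placeholders that this page does not confirm, so check them against the Recipe actions reference and Recipe structure topics before use. Because every condition must hold, the step below would apply only to values matched by both entries.

```php
// One element of a recipe's 'Steps' array (all names and values are placeholders).
$step = [
    'Action' => [                       // REQUIRED
        'Operation' => 'UPPER_CASE',    // assumed operation name
        'Parameters' => [
            'sourceColumn' => 'city',   // assumed parameter name; values are strings
        ],
    ],
    // Both conditions must be met (logical AND) for the action to apply.
    'ConditionExpressions' => [
        [
            'Condition' => 'GREATER_THAN', // assumed condition name
            'TargetColumn' => 'amount',
            'Value' => '100',
        ],
        [
            'Condition' => 'LESS_THAN',    // assumed condition name
            'TargetColumn' => 'amount',
            'Value' => '1000',
        ],
    ],
];

$steps = [$step];
```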
RecipeVersionErrorDetail
Description
Represents any errors encountered when attempting to delete multiple recipe versions.
Members
- ErrorCode
-
- Type: string
The HTTP status code for the error.
- ErrorMessage
-
- Type: string
The text of the error message.
- RecipeVersion
-
- Type: string
The identifier for the recipe version associated with this error.
ResourceNotFoundException
Description
One or more resources can't be found.
Members
- Message
-
- Type: string
Rule
Description
Represents a single data quality requirement that should be validated in the scope of this dataset.
Members
- CheckExpression
-
- Required: Yes
- Type: string
The expression which includes column references, condition names followed by variable references, possibly grouped and combined with other conditions. For example, (:col1 starts_with :prefix1 or :col1 starts_with :prefix2) and (:col1 ends_with :suffix1 or :col1 ends_with :suffix2). Column and value references are substitution variables that should start with the ':' symbol. Depending on the context, substitution variables' values can be either an actual value or a column name. These values are defined in the SubstitutionMap. If a CheckExpression starts with a column reference, then ColumnSelectors in the rule should be null. If ColumnSelectors has been defined, then there should be no column reference in the left side of a condition, for example, is_between :val1 and :val2.
For more information, see Available checks.
- ColumnSelectors
-
- Type: Array of ColumnSelector structures
List of column selectors. Selectors can be used to select columns using a name or regular expression from the dataset. Rule will be applied to selected columns.
- Disabled
-
- Type: boolean
A value that specifies whether the rule is disabled. Once a rule is disabled, a profile job will not validate it during a job run. Default value is false.
- Name
-
- Required: Yes
- Type: string
The name of the rule.
- SubstitutionMap
-
- Type: Associative array of custom strings keys (ValueReference) to strings
The map of substitution variable names to their values used in a check expression. Variable names should start with a ':' (colon). Variable values can either be actual values or column names. To differentiate between the two, column names should be enclosed in backticks, for example, ":col1": "`Column A`".
- Threshold
-
- Type: Threshold structure
The threshold used with a non-aggregate check expression. Non-aggregate check expressions will be applied to each row in a specific column, and the threshold will be used to determine whether the validation succeeds.
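The two ways of targeting columns described for CheckExpression can be sketched side by side. `startsWithColumnReference` and `buildRule` are hypothetical helpers, and the rule, ruleset, and column names are made up. The leading-variable test is a rough heuristic rather than a parser for check expressions.

```php
// Rough heuristic (an assumption): an expression whose first token is a
// substitution variable is treated as starting with a column reference.
function startsWithColumnReference(string $checkExpression): bool
{
    return strncmp(ltrim($checkExpression, ' ('), ':', 1) === 0;
}

// Hypothetical helper: rejects rules that mix a leading column reference with ColumnSelectors.
function buildRule(string $name, string $checkExpression, array $substitutionMap, array $columnSelectors = []): array
{
    if (startsWithColumnReference($checkExpression) && $columnSelectors !== []) {
        throw new InvalidArgumentException('Use a column reference or ColumnSelectors, not both.');
    }

    $rule = [
        'Name' => $name,                       // REQUIRED
        'CheckExpression' => $checkExpression, // REQUIRED
        'SubstitutionMap' => $substitutionMap,
    ];
    if ($columnSelectors !== []) {
        $rule['ColumnSelectors'] = $columnSelectors;
    }
    return $rule;
}

// Column named inside the expression; backticks mark a column name in the map.
$byReference = buildRule(
    'score-range-by-reference',
    ':col1 is_between :val1 and :val2',
    [':col1' => '`Score`', ':val1' => '0', ':val2' => '100']
);

// Column chosen by a selector; the expression has no column on its left side.
$bySelector = buildRule(
    'score-range-by-selector',
    'is_between :val1 and :val2',
    [':val1' => '0', ':val2' => '100'],
    [['Name' => 'Score']]
);

// Require at least 95 percent of rows to pass this non-aggregate check.
$bySelector['Threshold'] = ['Value' => 95.0, 'Type' => 'GREATER_THAN_OR_EQUAL', 'Unit' => 'PERCENTAGE'];

$params = ['Name' => 'sales-quality', 'Rules' => [$byReference, $bySelector]];

// With a configured Aws\GlueDataBrew\GlueDataBrewClient in $client:
// $result = $client->updateRuleset($params);
```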
RulesetItem
Description
Contains metadata about the ruleset.
Members
- AccountId
-
- Type: string
The ID of the Amazon Web Services account that owns the ruleset.
- CreateDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the ruleset was created.
- CreatedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who created the ruleset.
- Description
-
- Type: string
The description of the ruleset.
- LastModifiedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who last modified the ruleset.
- LastModifiedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The modification date and time of the ruleset.
- Name
-
- Required: Yes
- Type: string
The name of the ruleset.
- ResourceArn
-
- Type: string
The Amazon Resource Name (ARN) for the ruleset.
- RuleCount
-
- Type: int
The number of rules that are defined in the ruleset.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Metadata tags that have been applied to the ruleset.
- TargetArn
-
- Required: Yes
- Type: string
The Amazon Resource Name (ARN) of a resource (dataset) that the ruleset is associated with.
S3Location
Description
Represents an Amazon S3 location (bucket name, bucket owner, and object key) where DataBrew can read input data, or write output from a job.
Members
- Bucket
-
- Required: Yes
- Type: string
The Amazon S3 bucket name.
- BucketOwner
-
- Type: string
The Amazon Web Services account ID of the bucket owner.
- Key
-
- Type: string
The unique name of the object in the bucket.
S3TableOutputOptions
Description
Represents options that specify how and where DataBrew writes the Amazon S3 output generated by recipe jobs.
Members
- Location
-
- Required: Yes
- Type: S3Location structure
Represents an Amazon S3 location (bucket name and object key) where DataBrew can write output from a job.
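As a sketch of how S3TableOutputOptions nests an S3Location, the array below uses a hypothetical bucket name, key prefix, and account ID. Only Location and Bucket are marked as required above.

```php
<?php
// Illustrative S3TableOutputOptions; bucket, key, and owner are hypothetical.
$s3TableOutputOptions = [
    'Location' => [
        'Bucket' => 'example-databrew-output',
        'Key' => 'recipe-jobs/output/',
        'BucketOwner' => '123456789012',
    ],
];
```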
Sample
Description
Represents the sample size and sampling type for DataBrew to use for interactive data analysis.
Members
- Size
-
- Type: int
The number of rows in the sample.
- Type
-
- Required: Yes
- Type: string
The way in which DataBrew obtains rows from a dataset.
Schedule
Description
Represents one or more dates and times when a job is to run.
Members
- AccountId
-
- Type: string
The ID of the Amazon Web Services account that owns the schedule.
- CreateDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time that the schedule was created.
- CreatedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who created the schedule.
- CronExpression
-
- Type: string
The dates and times when the job is to run. For more information, see Cron expressions in the Glue DataBrew Developer Guide.
- JobNames
-
- Type: Array of strings
A list of jobs to be run, according to the schedule.
- LastModifiedBy
-
- Type: string
The Amazon Resource Name (ARN) of the user who last modified the schedule.
- LastModifiedDate
-
- Type: timestamp (string|DateTime or anything parsable by strtotime)
The date and time when the schedule was last modified.
- Name
-
- Required: Yes
- Type: string
The name of the schedule.
- ResourceArn
-
- Type: string
The Amazon Resource Name (ARN) of the schedule.
- Tags
-
- Type: Associative array of custom strings keys (TagKey) to strings
Metadata tags that have been applied to the schedule.
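The writable members of a schedule (Name, CronExpression, JobNames, Tags) might be passed to CreateSchedule as sketched below. The schedule name, job name, and tag are hypothetical, and the cron string is an assumption based on the six-field AWS cron style; confirm the exact syntax against the Cron expressions topic in the Glue DataBrew Developer Guide. The client call is left commented out because it needs a configured GlueDataBrewClient.

```php
<?php
// Illustrative CreateSchedule parameters; all values are hypothetical.
$params = [
    'Name' => 'daily-noon-schedule',
    // Assumed format: minutes hours day-of-month month day-of-week year.
    'CronExpression' => 'cron(0 12 * * ? *)',
    'JobNames' => ['example-profile-job'],
    'Tags' => ['team' => 'analytics'],
];

// With a configured client ($client is assumed to exist):
// $result = $client->createSchedule($params);
```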
ServiceQuotaExceededException
Description
A service quota is exceeded.
Members
- Message
-
- Type: string
StatisticOverride
Description
Override of a particular evaluation for a profile job.
Members
- Parameters
-
- Required: Yes
- Type: Associative array of custom strings keys (ParameterName) to strings
A map that includes overrides of an evaluation’s parameters.
- Statistic
-
- Required: Yes
- Type: string
The name of an evaluation.
StatisticsConfiguration
Description
Configuration of evaluations for a profile job. This configuration can be used to select evaluations and override the parameters of selected evaluations.
Members
- IncludedStatistics
-
- Type: Array of strings
List of included evaluations. When the list is undefined, all supported evaluations will be included.
- Overrides
-
- Type: Array of StatisticOverride structures
List of overrides for evaluations.
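A StatisticsConfiguration could be assembled as in the sketch below. The evaluation names and the parameter name `columnNumber` are assumptions used for illustration and are not listed on this page; consult the Developer Guide for the supported evaluations and their parameters. Note that override parameter values are strings.

```php
<?php
// Illustrative StatisticsConfiguration; evaluation and parameter names are assumed.
$statisticsConfiguration = [
    // Omit IncludedStatistics to include all supported evaluations.
    'IncludedStatistics' => ['CORRELATION', 'DUPLICATE_ROWS_COUNT'],
    'Overrides' => [
        [
            'Statistic' => 'CORRELATION',
            'Parameters' => ['columnNumber' => '20'], // values are strings
        ],
    ],
];
```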
Threshold
Description
The threshold used with a non-aggregate check expression. The non-aggregate check expression will be applied to each row in a specific column. Then the threshold will be used to determine whether the validation succeeds.
Members
- Type
-
- Type: string
The type of a threshold. Used for comparison of an actual count of rows that satisfy the rule to the threshold value.
- Unit
-
- Type: string
Unit of threshold value. Can be either a COUNT or PERCENTAGE of the full sample size used for validation.
- Value
-
- Required: Yes
- Type: double
The value of a threshold.
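To make the threshold comparison concrete, the following sketch evaluates a Threshold array locally against a count of rows that satisfied the rule. It illustrates the described logic and is not the service's implementation. The function name `thresholdSatisfied`, the Type values, and the fallback defaults used when Type or Unit is absent are all assumptions.

```php
<?php
// Hypothetical local evaluation of a Threshold; not the service's code.
function thresholdSatisfied(array $threshold, int $passingRows, int $sampleSize): bool
{
    $unit = $threshold['Unit'] ?? 'COUNT';                 // assumed default
    $type = $threshold['Type'] ?? 'GREATER_THAN_OR_EQUAL'; // assumed default

    // Express the actual result in the same unit as the threshold value.
    if ($unit === 'PERCENTAGE') {
        $actual = $sampleSize > 0 ? 100.0 * $passingRows / $sampleSize : 0.0;
    } else {
        $actual = (float) $passingRows;
    }

    $value = (float) $threshold['Value'];
    switch ($type) {
        case 'GREATER_THAN_OR_EQUAL':
            return $actual >= $value;
        case 'GREATER_THAN':
            return $actual > $value;
        case 'LESS_THAN_OR_EQUAL':
            return $actual <= $value;
        case 'LESS_THAN':
            return $actual < $value;
        default:
            throw new InvalidArgumentException("Unknown threshold type: $type");
    }
}
```

For example, with a threshold of 95 PERCENTAGE and GREATER_THAN_OR_EQUAL, 96 passing rows out of a 100-row sample satisfies the threshold, while 90 passing rows does not.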
ValidationConfiguration
Description
Configuration for data quality validation. Used to select the Rulesets and Validation Mode to be used in the profile job. When ValidationConfiguration is null, the profile job will run without data quality validation.
Members
- RulesetArn
-
- Required: Yes
- Type: string
The Amazon Resource Name (ARN) for the ruleset to be validated in the profile job. The TargetArn of the selected ruleset should be the same as the Amazon Resource Name (ARN) of the dataset that is associated with the profile job.
- ValidationMode
-
- Type: string
Mode of data quality validation. Default mode is “CHECK_ALL” which verifies all rules defined in the selected ruleset.
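The requirement that a ruleset's TargetArn match the ARN of the profile job's dataset can be checked before building a ValidationConfiguration. The sketch below assumes a RulesetItem-shaped array is already available; the helper name `buildValidationConfiguration` and the ARNs are hypothetical.

```php
<?php
// Hypothetical helper: builds a ValidationConfiguration only when the
// ruleset targets the dataset used by the profile job.
function buildValidationConfiguration(array $rulesetItem, string $datasetArn): array
{
    if (($rulesetItem['TargetArn'] ?? null) !== $datasetArn) {
        throw new InvalidArgumentException('Ruleset TargetArn does not match the dataset ARN.');
    }
    return [
        'RulesetArn' => $rulesetItem['ResourceArn'],
        'ValidationMode' => 'CHECK_ALL', // the default mode described above
    ];
}

// Illustrative input; the ARNs are made up.
$datasetArn = 'arn:aws:databrew:us-east-1:123456789012:dataset/example-dataset';
$rulesetItem = [
    'Name' => 'example-ruleset',
    'ResourceArn' => 'arn:aws:databrew:us-east-1:123456789012:ruleset/example-ruleset',
    'TargetArn' => $datasetArn,
];
$validationConfiguration = buildValidationConfiguration($rulesetItem, $datasetArn);
```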
ValidationException
Description
The input parameters for this request failed validation.
Members
- Message
-
- Type: string
ViewFrame
Description
Represents the data being transformed during an action.
Members
- Analytics
-
- Type: string
Controls if analytics computation is enabled or disabled. Enabled by default.
- ColumnRange
-
- Type: int
The number of columns to include in the view frame, beginning with the StartColumnIndex value and ignoring any columns in the HiddenColumns list.
- HiddenColumns
-
- Type: Array of strings
A list of columns to hide in the view frame.
- RowRange
-
- Type: int
The number of rows to include in the view frame, beginning with the StartRowIndex value.
- StartColumnIndex
-
- Required: Yes
- Type: int
The starting index for the range of columns to return in the view frame.
- StartRowIndex
-
- Type: int
The starting index for the range of rows to return in the view frame.
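One way to read the column members of ViewFrame is sketched below: start at StartColumnIndex, skip hidden columns, and stop after ColumnRange columns. This is an interpretation for illustration only. The function name `visibleColumns` is hypothetical, the index is assumed to be zero-based, and hidden columns are assumed not to count toward ColumnRange.

```php
<?php
// Hypothetical local interpretation of the ViewFrame column members.
function visibleColumns(array $allColumns, array $viewFrame): array
{
    $start = $viewFrame['StartColumnIndex'];                 // assumed zero-based
    $hidden = $viewFrame['HiddenColumns'] ?? [];
    $range = $viewFrame['ColumnRange'] ?? count($allColumns);

    $selected = [];
    foreach (array_slice($allColumns, $start) as $column) {
        if (in_array($column, $hidden, true)) {
            continue; // hidden columns are skipped and not counted
        }
        if (count($selected) >= $range) {
            break;
        }
        $selected[] = $column;
    }
    return $selected;
}

// Illustrative input; column names are made up.
$viewFrame = [
    'StartColumnIndex' => 1,
    'ColumnRange' => 2,
    'HiddenColumns' => ['email'],
];
$columns = visibleColumns(['id', 'name', 'email', 'city', 'zip'], $viewFrame);
```

Under these assumptions, `$columns` holds `name` and `city`.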