CfnJob
- class aws_cdk.aws_databrew.CfnJob(scope, id, *, name, role_arn, type, database_outputs=None, data_catalog_outputs=None, dataset_name=None, encryption_key_arn=None, encryption_mode=None, job_sample=None, log_subscription=None, max_capacity=None, max_retries=None, output_location=None, outputs=None, profile_configuration=None, project_name=None, recipe=None, tags=None, timeout=None, validation_configurations=None)
Bases:
CfnResource
Specifies a new DataBrew job.
- See:
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-databrew-job.html
- CloudformationResource:
AWS::DataBrew::Job
- ExampleMetadata:
fixture=_generated
Example:
    # The code below shows an example of how to instantiate this type.
    # The values are placeholders you should change.
    from aws_cdk import aws_databrew as databrew

    cfn_job = databrew.CfnJob(self, "MyCfnJob",
        name="name",
        role_arn="roleArn",
        type="type",

        # the properties below are optional
        database_outputs=[databrew.CfnJob.DatabaseOutputProperty(
            database_options=databrew.CfnJob.DatabaseTableOutputOptionsProperty(
                table_name="tableName",
                # the properties below are optional
                temp_directory=databrew.CfnJob.S3LocationProperty(
                    bucket="bucket",
                    # the properties below are optional
                    bucket_owner="bucketOwner",
                    key="key"
                )
            ),
            glue_connection_name="glueConnectionName",
            # the properties below are optional
            database_output_mode="databaseOutputMode"
        )],
        data_catalog_outputs=[databrew.CfnJob.DataCatalogOutputProperty(
            database_name="databaseName",
            table_name="tableName",
            # the properties below are optional
            catalog_id="catalogId",
            database_options=databrew.CfnJob.DatabaseTableOutputOptionsProperty(
                table_name="tableName",
                # the properties below are optional
                temp_directory=databrew.CfnJob.S3LocationProperty(
                    bucket="bucket",
                    # the properties below are optional
                    bucket_owner="bucketOwner",
                    key="key"
                )
            ),
            overwrite=False,
            s3_options=databrew.CfnJob.S3TableOutputOptionsProperty(
                location=databrew.CfnJob.S3LocationProperty(
                    bucket="bucket",
                    # the properties below are optional
                    bucket_owner="bucketOwner",
                    key="key"
                )
            )
        )],
        dataset_name="datasetName",
        encryption_key_arn="encryptionKeyArn",
        encryption_mode="encryptionMode",
        job_sample=databrew.CfnJob.JobSampleProperty(
            mode="mode",
            size=123
        ),
        log_subscription="logSubscription",
        max_capacity=123,
        max_retries=123,
        output_location=databrew.CfnJob.OutputLocationProperty(
            bucket="bucket",
            # the properties below are optional
            bucket_owner="bucketOwner",
            key="key"
        ),
        outputs=[databrew.CfnJob.OutputProperty(
            location=databrew.CfnJob.S3LocationProperty(
                bucket="bucket",
                # the properties below are optional
                bucket_owner="bucketOwner",
                key="key"
            ),
            # the properties below are optional
            compression_format="compressionFormat",
            format="format",
            format_options=databrew.CfnJob.OutputFormatOptionsProperty(
                csv=databrew.CfnJob.CsvOutputOptionsProperty(
                    delimiter="delimiter"
                )
            ),
            max_output_files=123,
            overwrite=False,
            partition_columns=["partitionColumns"]
        )],
        profile_configuration=databrew.CfnJob.ProfileConfigurationProperty(
            column_statistics_configurations=[databrew.CfnJob.ColumnStatisticsConfigurationProperty(
                statistics=databrew.CfnJob.StatisticsConfigurationProperty(
                    included_statistics=["includedStatistics"],
                    overrides=[databrew.CfnJob.StatisticOverrideProperty(
                        parameters={
                            "parameters_key": "parameters"
                        },
                        statistic="statistic"
                    )]
                ),
                # the properties below are optional
                selectors=[databrew.CfnJob.ColumnSelectorProperty(
                    name="name",
                    regex="regex"
                )]
            )],
            dataset_statistics_configuration=databrew.CfnJob.StatisticsConfigurationProperty(
                included_statistics=["includedStatistics"],
                overrides=[databrew.CfnJob.StatisticOverrideProperty(
                    parameters={
                        "parameters_key": "parameters"
                    },
                    statistic="statistic"
                )]
            ),
            entity_detector_configuration=databrew.CfnJob.EntityDetectorConfigurationProperty(
                entity_types=["entityTypes"],
                # the properties below are optional
                allowed_statistics=databrew.CfnJob.AllowedStatisticsProperty(
                    statistics=["statistics"]
                )
            ),
            profile_columns=[databrew.CfnJob.ColumnSelectorProperty(
                name="name",
                regex="regex"
            )]
        ),
        project_name="projectName",
        recipe=databrew.CfnJob.RecipeProperty(
            name="name",
            # the properties below are optional
            version="version"
        ),
        tags=[CfnTag(
            key="key",
            value="value"
        )],
        timeout=123,
        validation_configurations=[databrew.CfnJob.ValidationConfigurationProperty(
            ruleset_arn="rulesetArn",
            # the properties below are optional
            validation_mode="validationMode"
        )]
    )
- Parameters:
scope (Construct) – Scope in which this resource is defined.
id (str) – Construct identifier for this resource (unique in its scope).
name (str) – The unique name of the job.
role_arn (str) – The Amazon Resource Name (ARN) of the role to be assumed for this job.
type (str) – The job type of the job, which must be one of the following: PROFILE – a job to analyze a dataset, to determine its size, data types, data distribution, and more; RECIPE – a job to apply one or more transformations to a dataset.
database_outputs (Union[IResolvable, Sequence[Union[IResolvable, DatabaseOutputProperty, Dict[str, Any]]], None]) – Represents a list of JDBC database output objects which defines the output destination for a DataBrew recipe job to write into.
data_catalog_outputs (Union[IResolvable, Sequence[Union[IResolvable, DataCatalogOutputProperty, Dict[str, Any]]], None]) – One or more artifacts that represent the AWS Glue Data Catalog output from running the job.
dataset_name (Optional[str]) – A dataset that the job is to process.
encryption_key_arn (Optional[str]) – The Amazon Resource Name (ARN) of an encryption key that is used to protect the job output. For more information, see Encrypting data written by DataBrew jobs.
encryption_mode (Optional[str]) – The encryption mode for the job, which can be one of the following: SSE-KMS – server-side encryption with keys managed by AWS KMS; SSE-S3 – server-side encryption with keys managed by Amazon S3.
job_sample (Union[IResolvable, JobSampleProperty, Dict[str, Any], None]) – A sample configuration for profile jobs only, which determines the number of rows on which the profile job is run. If a JobSample value isn’t provided, the default value is used. The default value is CUSTOM_ROWS for the mode parameter and 20,000 for the size parameter.
log_subscription (Optional[str]) – The current status of Amazon CloudWatch logging for the job.
max_capacity (Union[int, float, None]) – The maximum number of nodes that can be consumed when the job processes data.
max_retries (Union[int, float, None]) – The maximum number of times to retry the job after a job run fails.
output_location (Union[IResolvable, OutputLocationProperty, Dict[str, Any], None]) – The location in Amazon S3 where the job writes its output.
outputs (Union[IResolvable, Sequence[Union[IResolvable, OutputProperty, Dict[str, Any]]], None]) – One or more artifacts that represent output from running the job.
profile_configuration (Union[IResolvable, ProfileConfigurationProperty, Dict[str, Any], None]) – Configuration for profile jobs. Configuration can be used to select columns, do evaluations, and override default parameters of evaluations. When configuration is undefined, the profile job will apply default settings to all supported columns.
project_name (Optional[str]) – The name of the project that the job is associated with.
recipe (Union[IResolvable, RecipeProperty, Dict[str, Any], None]) – A series of data transformation steps that the job runs.
tags (Optional[Sequence[Union[CfnTag, Dict[str, Any]]]]) – Metadata tags that have been applied to the job.
timeout (Union[int, float, None]) – The job’s timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.
validation_configurations (Union[IResolvable, Sequence[Union[IResolvable, ValidationConfigurationProperty, Dict[str, Any]]], None]) – List of validation configurations that are applied to the profile job.
Methods
- add_deletion_override(path)
Syntactic sugar for addOverride(path, undefined).
- Parameters:
path (str) – The path of the value to delete.
- Return type:
None
- add_dependency(target)
Indicates that this resource depends on another resource and cannot be provisioned unless the other resource has been successfully provisioned.
This can be used for resources across stack (or nested stack) boundaries, and the dependency will automatically be transferred to the relevant scope.
- Parameters:
target (CfnResource) –
- Return type:
None
- add_depends_on(target)
(deprecated) Indicates that this resource depends on another resource and cannot be provisioned unless the other resource has been successfully provisioned.
- Parameters:
target (CfnResource) –
- Deprecated:
use addDependency
- Stability:
deprecated
- Return type:
None
- add_metadata(key, value)
Adds a value to the CloudFormation Resource Metadata.
- Parameters:
key (str) –
value (Any) –
- Return type:
None
Note that this is a different set of metadata from CDK node metadata; this metadata ends up in the stack template under the resource, whereas CDK node metadata ends up in the Cloud Assembly.
- add_override(path, value)
Adds an override to the synthesized CloudFormation resource.
To add a property override, either use addPropertyOverride or prefix path with “Properties.” (i.e. Properties.TopicName).
If the override is nested, separate each nested level using a dot (.) in the path parameter. If there is an array as part of the nesting, specify the index in the path.
To include a literal . in the property name, prefix it with a \. In most programming languages you will need to write this as "\\." because the \ itself will need to be escaped.
For example:

    cfn_resource.add_override("Properties.GlobalSecondaryIndexes.0.Projection.NonKeyAttributes", ["myattribute"])
    cfn_resource.add_override("Properties.GlobalSecondaryIndexes.1.ProjectionType", "INCLUDE")

would add the following overrides to the template:

    "Properties": {
      "GlobalSecondaryIndexes": [
        {
          "Projection": {
            "NonKeyAttributes": [ "myattribute" ]
            ...
          }
          ...
        },
        {
          "ProjectionType": "INCLUDE"
          ...
        },
      ]
      ...
    }

The value argument to addOverride will not be processed or translated in any way. Pass raw JSON values in here with the correct capitalization for CloudFormation. If you pass CDK classes or structs, they will be rendered with lowercased key names, and CloudFormation will reject the template.
- Parameters:
path (str) – The path of the property. You can use dot notation to override values in complex types. Any intermediate keys will be created as needed.
value (Any) – The value. Could be primitive or complex.
- Return type:
None
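The dotted-path rules above can be sketched in plain Python. This is an illustrative model of how an override path is interpreted (split on unescaped dots, numeric segments index into arrays), not the CDK's actual implementation:

```python
import re

def split_override_path(path):
    """Split a dotted override path, honoring backslash-escaped dots."""
    # A dot preceded by a backslash is a literal part of the key name.
    parts = re.split(r"(?<!\\)\.", path)
    return [p.replace("\\.", ".") for p in parts]

def apply_override(template, path, value):
    """Set `value` at `path` inside a nested dict/list structure,
    creating intermediate dicts and treating numeric keys as list indices."""
    *parents, last = split_override_path(path)
    node = template
    for key in parents:
        if isinstance(node, list):
            node = node[int(key)]
        else:
            node = node.setdefault(key, {})
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value

# Mirrors the GlobalSecondaryIndexes example above:
tpl = {"Properties": {"GlobalSecondaryIndexes": [{}, {}]}}
apply_override(tpl, "Properties.GlobalSecondaryIndexes.0.Projection.NonKeyAttributes", ["myattribute"])
apply_override(tpl, "Properties.GlobalSecondaryIndexes.1.ProjectionType", "INCLUDE")
```

Note how the escaped form "Key\\.WithDot" stays a single key, while "0" and "1" select elements of the GlobalSecondaryIndexes array.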
- add_property_deletion_override(property_path)
Adds an override that deletes the value of a property from the resource definition.
- Parameters:
property_path (str) – The path to the property.
- Return type:
None
- add_property_override(property_path, value)
Adds an override to a resource property.
Syntactic sugar for addOverride("Properties.<...>", value).
- Parameters:
property_path (str) – The path of the property.
value (Any) – The value.
- Return type:
None
- apply_removal_policy(policy=None, *, apply_to_update_replace_policy=None, default=None)
Sets the deletion policy of the resource based on the removal policy specified.
The Removal Policy controls what happens to this resource when it stops being managed by CloudFormation, either because you’ve removed it from the CDK application or because you’ve made a change that requires the resource to be replaced.
The resource can be deleted (RemovalPolicy.DESTROY), or left in your AWS account for data recovery and cleanup later (RemovalPolicy.RETAIN). In some cases, a snapshot can be taken of the resource prior to deletion (RemovalPolicy.SNAPSHOT). A list of resources that support this policy can be found in the following link:
- Parameters:
policy (Optional[RemovalPolicy]) –
apply_to_update_replace_policy (Optional[bool]) – Apply the same deletion policy to the resource’s “UpdateReplacePolicy”. Default: true
default (Optional[RemovalPolicy]) – The default policy to apply in case the removal policy is not defined. Default: - Default value is resource specific. To determine the default value for a resource, please consult that specific resource’s documentation.
- Return type:
None
- get_att(attribute_name, type_hint=None)
Returns a token for a runtime attribute of this resource.
Ideally, use generated attribute accessors (e.g. resource.arn), but this can be used for future compatibility in case there is no generated attribute.
- Parameters:
attribute_name (str) – The name of the attribute.
type_hint (Optional[ResolutionTypeHint]) –
- Return type:
Reference
- get_metadata(key)
Retrieves a value from the CloudFormation Resource Metadata.
- Parameters:
key (str) –
- Return type:
Any
Note that this is a different set of metadata from CDK node metadata; this metadata ends up in the stack template under the resource, whereas CDK node metadata ends up in the Cloud Assembly.
- inspect(inspector)
Examines the CloudFormation resource and discloses attributes.
- Parameters:
inspector (TreeInspector) – The tree inspector to collect and process attributes.
- Return type:
None
- obtain_dependencies()
Retrieves an array of resources this resource depends on.
This assembles dependencies on resources across stacks (including nested stacks) automatically.
- Return type:
List[Union[Stack, CfnResource]]
- obtain_resource_dependencies()
Get a shallow copy of dependencies between this resource and other resources in the same stack.
- Return type:
List[CfnResource]
- override_logical_id(new_logical_id)
Overrides the auto-generated logical ID with a specific ID.
- Parameters:
new_logical_id (str) – The new logical ID to use for this stack element.
- Return type:
None
- remove_dependency(target)
Indicates that this resource no longer depends on another resource.
This can be used for resources across stacks (including nested stacks), and the dependency will automatically be removed from the relevant scope.
- Parameters:
target (CfnResource) –
- Return type:
None
- replace_dependency(target, new_target)
Replaces one dependency with another.
- Parameters:
target (CfnResource) – The dependency to replace.
new_target (CfnResource) – The new dependency to add.
- Return type:
None
- to_string()
Returns a string representation of this construct.
- Return type:
str
- Returns:
a string representation of this resource
Attributes
- CFN_RESOURCE_TYPE_NAME = 'AWS::DataBrew::Job'
- cfn_options
Options for this resource, such as condition, update policy etc.
- cfn_resource_type
AWS resource type.
- creation_stack
- Returns:
The stack trace of the point where this Resource was created from, sourced from the metadata entry typed aws:cdk:logicalId, and with the bottom-most node internal entries filtered.
- data_catalog_outputs
One or more artifacts that represent the AWS Glue Data Catalog output from running the job.
- database_outputs
Represents a list of JDBC database output objects which defines the output destination for a DataBrew recipe job to write into.
- dataset_name
A dataset that the job is to process.
- encryption_key_arn
The Amazon Resource Name (ARN) of an encryption key that is used to protect the job output.
- encryption_mode
The encryption mode for the job, which can be one of the following: SSE-KMS (server-side encryption with keys managed by AWS KMS) or SSE-S3 (server-side encryption with keys managed by Amazon S3).
- job_sample
A sample configuration for profile jobs only, which determines the number of rows on which the profile job is run.
- log_subscription
The current status of Amazon CloudWatch logging for the job.
- logical_id
The logical ID for this CloudFormation stack element.
The logical ID of the element is calculated from the path of the resource node in the construct tree.
To override this value, use overrideLogicalId(newLogicalId).
- Returns:
the logical ID as a stringified token. This value will only get resolved during synthesis.
- max_capacity
The maximum number of nodes that can be consumed when the job processes data.
- max_retries
The maximum number of times to retry the job after a job run fails.
- name
The unique name of the job.
- node
The tree node.
- output_location
The location in Amazon S3 where the job writes its output.
- outputs
One or more artifacts that represent output from running the job.
- profile_configuration
Configuration for profile jobs.
- project_name
The name of the project that the job is associated with.
- recipe
A series of data transformation steps that the job runs.
- ref
Return a string that will be resolved to a CloudFormation { Ref } for this element.
If, by any chance, the intrinsic reference of a resource is not a string, you could coerce it to an IResolvable through Lazy.any({ produce: resource.ref }).
- role_arn
The Amazon Resource Name (ARN) of the role to be assumed for this job.
- stack
The stack in which this element is defined.
CfnElements must be defined within a stack scope (directly or indirectly).
- tags
Tag Manager which manages the tags for this resource.
- tags_raw
Metadata tags that have been applied to the job.
- timeout
The job’s timeout in minutes.
- type
The job type of the job, which must be one of the following: PROFILE (a job to analyze a dataset) or RECIPE (a job to apply one or more transformations to a dataset).
- validation_configurations
List of validation configurations that are applied to the profile job.
Static Methods
- classmethod is_cfn_element(x)
Returns true if a construct is a stack element (i.e. part of the synthesized CloudFormation template).
Uses duck-typing instead of instanceof to allow stack elements from different versions of this library to be included in the same stack.
- Parameters:
x (Any) –
- Return type:
bool
- Returns:
The construct as a stack element or undefined if it is not a stack element.
- classmethod is_cfn_resource(x)
Check whether the given object is a CfnResource.
- Parameters:
x (Any) –
- Return type:
bool
- classmethod is_construct(x)
Checks if x is a construct.
Use this method instead of instanceof to properly detect Construct instances, even when the construct library is symlinked.
Explanation: in JavaScript, multiple copies of the constructs library on disk are seen as independent, completely different libraries. As a consequence, the class Construct in each copy of the constructs library is seen as a different class, and an instance of one class will not test as instanceof the other class. npm install will not create installations like this, but users may manually symlink construct libraries together or use a monorepo tool: in those cases, multiple copies of the constructs library can be accidentally installed, and instanceof will behave unpredictably. It is safest to avoid using instanceof, and to use this type-testing method instead.
- Parameters:
x (Any) – Any object.
- Return type:
bool
- Returns:
true if x is an object created from a class which extends Construct.
AllowedStatisticsProperty
- class CfnJob.AllowedStatisticsProperty(*, statistics)
Bases:
object
Configuration of statistics that are allowed to be run on columns that contain detected entities.
When undefined, no statistics will be computed on columns that contain detected entities.
- Parameters:
statistics (Sequence[str]) – One or more column statistics to allow for columns that contain detected entities.
- ExampleMetadata:
fixture=_generated
Example:

    # The code below shows an example of how to instantiate this type.
    # The values are placeholders you should change.
    from aws_cdk import aws_databrew as databrew

    allowed_statistics_property = databrew.CfnJob.AllowedStatisticsProperty(
        statistics=["statistics"]
    )
Attributes
- statistics
One or more column statistics to allow for columns that contain detected entities.
ColumnSelectorProperty
- class CfnJob.ColumnSelectorProperty(*, name=None, regex=None)
Bases:
object
Selector of a column from a dataset for profile job configuration.
One selector includes either a column name or a regular expression.
- Parameters:
name (Optional[str]) – The name of a column from a dataset.
regex (Optional[str]) – A regular expression for selecting a column from a dataset.
- ExampleMetadata:
fixture=_generated
Example:

    # The code below shows an example of how to instantiate this type.
    # The values are placeholders you should change.
    from aws_cdk import aws_databrew as databrew

    column_selector_property = databrew.CfnJob.ColumnSelectorProperty(
        name="name",
        regex="regex"
    )
Attributes
- name
The name of a column from a dataset.
- regex
A regular expression for selecting a column from a dataset.
ColumnStatisticsConfigurationProperty
- class CfnJob.ColumnStatisticsConfigurationProperty(*, statistics, selectors=None)
Bases:
object
Configuration for column evaluations for a profile job.
ColumnStatisticsConfiguration can be used to select evaluations and override parameters of evaluations for particular columns.
- Parameters:
statistics (Union[IResolvable, StatisticsConfigurationProperty, Dict[str, Any]]) – Configuration for evaluations. Statistics can be used to select evaluations and override parameters of evaluations.
selectors (Union[IResolvable, Sequence[Union[IResolvable, ColumnSelectorProperty, Dict[str, Any]]], None]) – List of column selectors. Selectors can be used to select columns from the dataset. When selectors are undefined, configuration will be applied to all supported columns.
- ExampleMetadata:
fixture=_generated
Example:

    # The code below shows an example of how to instantiate this type.
    # The values are placeholders you should change.
    from aws_cdk import aws_databrew as databrew

    column_statistics_configuration_property = databrew.CfnJob.ColumnStatisticsConfigurationProperty(
        statistics=databrew.CfnJob.StatisticsConfigurationProperty(
            included_statistics=["includedStatistics"],
            overrides=[databrew.CfnJob.StatisticOverrideProperty(
                parameters={
                    "parameters_key": "parameters"
                },
                statistic="statistic"
            )]
        ),

        # the properties below are optional
        selectors=[databrew.CfnJob.ColumnSelectorProperty(
            name="name",
            regex="regex"
        )]
    )
Attributes
- selectors
List of column selectors.
Selectors can be used to select columns from the dataset. When selectors are undefined, configuration will be applied to all supported columns.
- statistics
Configuration for evaluations.
Statistics can be used to select evaluations and override parameters of evaluations.
CsvOutputOptionsProperty
- class CfnJob.CsvOutputOptionsProperty(*, delimiter=None)
Bases:
object
Represents a set of options that define how DataBrew will write a comma-separated value (CSV) file.
- Parameters:
delimiter (Optional[str]) – A single character that specifies the delimiter used to create CSV job output.
- ExampleMetadata:
fixture=_generated
Example:

    # The code below shows an example of how to instantiate this type.
    # The values are placeholders you should change.
    from aws_cdk import aws_databrew as databrew

    csv_output_options_property = databrew.CfnJob.CsvOutputOptionsProperty(
        delimiter="delimiter"
    )
Attributes
- delimiter
A single character that specifies the delimiter used to create CSV job output.
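The effect of the delimiter option can be illustrated with a plain Python sketch using the standard csv module. This only models what delimited output looks like; DataBrew itself writes the job output:

```python
import csv
import io

def write_rows(rows, delimiter=","):
    """Render rows the way a CSV writer would with the given delimiter."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

rows = [["id", "name"], ["1", "Alice"]]
assert write_rows(rows) == "id,name\n1,Alice\n"
assert write_rows(rows, delimiter="|") == "id|name\n1|Alice\n"
```

The same rows are rendered with either a comma or a pipe between fields, which is exactly the knob the delimiter property controls.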
DataCatalogOutputProperty
- class CfnJob.DataCatalogOutputProperty(*, database_name, table_name, catalog_id=None, database_options=None, overwrite=None, s3_options=None)
Bases:
object
Represents options that specify how and where in the AWS Glue Data Catalog DataBrew writes the output generated by recipe jobs.
- Parameters:
database_name (str) – The name of a database in the Data Catalog.
table_name (str) – The name of a table in the Data Catalog.
catalog_id (Optional[str]) – The unique identifier of the AWS account that holds the Data Catalog that stores the data.
database_options (Union[IResolvable, DatabaseTableOutputOptionsProperty, Dict[str, Any], None]) – Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
overwrite (Union[bool, IResolvable, None]) – A value that, if true, means that any data in the location specified for output is overwritten with new output. Not supported with DatabaseOptions.
s3_options (Union[IResolvable, S3TableOutputOptionsProperty, Dict[str, Any], None]) – Represents options that specify how and where DataBrew writes the Amazon S3 output generated by recipe jobs.
- ExampleMetadata:
fixture=_generated
Example:

    # The code below shows an example of how to instantiate this type.
    # The values are placeholders you should change.
    from aws_cdk import aws_databrew as databrew

    data_catalog_output_property = databrew.CfnJob.DataCatalogOutputProperty(
        database_name="databaseName",
        table_name="tableName",

        # the properties below are optional
        catalog_id="catalogId",
        database_options=databrew.CfnJob.DatabaseTableOutputOptionsProperty(
            table_name="tableName",
            # the properties below are optional
            temp_directory=databrew.CfnJob.S3LocationProperty(
                bucket="bucket",
                # the properties below are optional
                bucket_owner="bucketOwner",
                key="key"
            )
        ),
        overwrite=False,
        s3_options=databrew.CfnJob.S3TableOutputOptionsProperty(
            location=databrew.CfnJob.S3LocationProperty(
                bucket="bucket",
                # the properties below are optional
                bucket_owner="bucketOwner",
                key="key"
            )
        )
    )
Attributes
- catalog_id
The unique identifier of the AWS account that holds the Data Catalog that stores the data.
- database_name
The name of a database in the Data Catalog.
- database_options
Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
- overwrite
A value that, if true, means that any data in the location specified for output is overwritten with new output.
Not supported with DatabaseOptions.
- s3_options
Represents options that specify how and where DataBrew writes the Amazon S3 output generated by recipe jobs.
- table_name
The name of a table in the Data Catalog.
DatabaseOutputProperty
- class CfnJob.DatabaseOutputProperty(*, database_options, glue_connection_name, database_output_mode=None)
Bases:
object
Represents a JDBC database output object which defines the output destination for a DataBrew recipe job to write into.
- Parameters:
database_options (Union[IResolvable, DatabaseTableOutputOptionsProperty, Dict[str, Any]]) – Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
glue_connection_name (str) – The AWS Glue connection that stores the connection information for the target database.
database_output_mode (Optional[str]) – The output mode to write into the database. Currently supported option: NEW_TABLE.
- ExampleMetadata:
fixture=_generated
Example:

    # The code below shows an example of how to instantiate this type.
    # The values are placeholders you should change.
    from aws_cdk import aws_databrew as databrew

    database_output_property = databrew.CfnJob.DatabaseOutputProperty(
        database_options=databrew.CfnJob.DatabaseTableOutputOptionsProperty(
            table_name="tableName",
            # the properties below are optional
            temp_directory=databrew.CfnJob.S3LocationProperty(
                bucket="bucket",
                # the properties below are optional
                bucket_owner="bucketOwner",
                key="key"
            )
        ),
        glue_connection_name="glueConnectionName",

        # the properties below are optional
        database_output_mode="databaseOutputMode"
    )
Attributes
- database_options
Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
- database_output_mode
The output mode to write into the database.
Currently supported option: NEW_TABLE.
- glue_connection_name
The AWS Glue connection that stores the connection information for the target database.
DatabaseTableOutputOptionsProperty
- class CfnJob.DatabaseTableOutputOptionsProperty(*, table_name, temp_directory=None)
Bases:
object
Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
- Parameters:
table_name (str) – A prefix for the name of a table DataBrew will create in the database.
temp_directory (Union[IResolvable, S3LocationProperty, Dict[str, Any], None]) – Represents an Amazon S3 location (bucket name and object key) where DataBrew can store intermediate results.
- ExampleMetadata:
fixture=_generated
Example:

    # The code below shows an example of how to instantiate this type.
    # The values are placeholders you should change.
    from aws_cdk import aws_databrew as databrew

    database_table_output_options_property = databrew.CfnJob.DatabaseTableOutputOptionsProperty(
        table_name="tableName",

        # the properties below are optional
        temp_directory=databrew.CfnJob.S3LocationProperty(
            bucket="bucket",
            # the properties below are optional
            bucket_owner="bucketOwner",
            key="key"
        )
    )
Attributes
- table_name
A prefix for the name of a table DataBrew will create in the database.
- temp_directory
Represents an Amazon S3 location (bucket name and object key) where DataBrew can store intermediate results.
EntityDetectorConfigurationProperty
- class CfnJob.EntityDetectorConfigurationProperty(*, entity_types, allowed_statistics=None)
Bases:
object
Configuration of entity detection for a profile job.
When undefined, entity detection is disabled.
- Parameters:
entity_types (Sequence[str]) – Entity types to detect. Can be any of the following: USA_SSN, EMAIL, USA_ITIN, USA_PASSPORT_NUMBER, PHONE_NUMBER, USA_DRIVING_LICENSE, BANK_ACCOUNT, CREDIT_CARD, IP_ADDRESS, MAC_ADDRESS, USA_DEA_NUMBER, USA_HCPCS_CODE, USA_NATIONAL_PROVIDER_IDENTIFIER, USA_NATIONAL_DRUG_CODE, USA_HEALTH_INSURANCE_CLAIM_NUMBER, USA_MEDICARE_BENEFICIARY_IDENTIFIER, USA_CPT_CODE, PERSON_NAME, DATE. The entity type group USA_ALL is also supported, and includes all of the above entity types except PERSON_NAME and DATE.
allowed_statistics (Union[IResolvable, AllowedStatisticsProperty, Dict[str, Any], None]) – Configuration of statistics that are allowed to be run on columns that contain detected entities. When undefined, no statistics will be computed on columns that contain detected entities.
- ExampleMetadata:
fixture=_generated
Example:

    # The code below shows an example of how to instantiate this type.
    # The values are placeholders you should change.
    from aws_cdk import aws_databrew as databrew

    entity_detector_configuration_property = databrew.CfnJob.EntityDetectorConfigurationProperty(
        entity_types=["entityTypes"],

        # the properties below are optional
        allowed_statistics=databrew.CfnJob.AllowedStatisticsProperty(
            statistics=["statistics"]
        )
    )
Attributes
- allowed_statistics
Configuration of statistics that are allowed to be run on columns that contain detected entities.
When undefined, no statistics will be computed on columns that contain detected entities.
- entity_types
Entity types to detect. Can be any of the following:
USA_SSN
EMAIL
USA_ITIN
USA_PASSPORT_NUMBER
PHONE_NUMBER
USA_DRIVING_LICENSE
BANK_ACCOUNT
CREDIT_CARD
IP_ADDRESS
MAC_ADDRESS
USA_DEA_NUMBER
USA_HCPCS_CODE
USA_NATIONAL_PROVIDER_IDENTIFIER
USA_NATIONAL_DRUG_CODE
USA_HEALTH_INSURANCE_CLAIM_NUMBER
USA_MEDICARE_BENEFICIARY_IDENTIFIER
USA_CPT_CODE
PERSON_NAME
DATE
The entity type group USA_ALL is also supported, and includes all of the above entity types except PERSON_NAME and DATE.
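The USA_ALL grouping described above can be expressed as a simple set computation. The type names are copied from the list above; the grouping itself is an illustrative sketch, not a DataBrew API:

```python
# All individual entity types DataBrew can detect, per the list above.
ALL_ENTITY_TYPES = {
    "USA_SSN", "EMAIL", "USA_ITIN", "USA_PASSPORT_NUMBER", "PHONE_NUMBER",
    "USA_DRIVING_LICENSE", "BANK_ACCOUNT", "CREDIT_CARD", "IP_ADDRESS",
    "MAC_ADDRESS", "USA_DEA_NUMBER", "USA_HCPCS_CODE",
    "USA_NATIONAL_PROVIDER_IDENTIFIER", "USA_NATIONAL_DRUG_CODE",
    "USA_HEALTH_INSURANCE_CLAIM_NUMBER", "USA_MEDICARE_BENEFICIARY_IDENTIFIER",
    "USA_CPT_CODE", "PERSON_NAME", "DATE",
}

# USA_ALL covers every entity type except PERSON_NAME and DATE.
USA_ALL = ALL_ENTITY_TYPES - {"PERSON_NAME", "DATE"}
```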
JobSampleProperty
- class CfnJob.JobSampleProperty(*, mode=None, size=None)
Bases:
object
A sample configuration for profile jobs only, which determines the number of rows on which the profile job is run.
If a
JobSample
value isn’t provided, the default is used. The default value is CUSTOM_ROWS for the mode parameter and 20,000 for the size parameter.- Parameters:
mode (
Optional
[str
]) – A value that determines whether the profile job is run on the entire dataset or a specified number of rows. This value must be one of the following: - FULL_DATASET - The profile job is run on the entire dataset. - CUSTOM_ROWS - The profile job is run on the number of rows specified in theSize
parameter.size (
Union
[int
,float
,None
]) – TheSize
parameter is only required when the mode is CUSTOM_ROWS. The profile job is run on the specified number of rows. The maximum value for size is Long.MAX_VALUE. Long.MAX_VALUE = 9223372036854775807
- See:
- ExampleMetadata:
fixture=_generated
Example:
# The code below shows an example of how to instantiate this type.
# The values are placeholders you should change.
from aws_cdk import aws_databrew as databrew

job_sample_property = databrew.CfnJob.JobSampleProperty(
    mode="mode",
    size=123
)
Attributes
- mode
A value that determines whether the profile job is run on the entire dataset or a specified number of rows.
This value must be one of the following:
FULL_DATASET - The profile job is run on the entire dataset.
CUSTOM_ROWS - The profile job is run on the number of rows specified in the Size parameter.
- size
The Size parameter is only required when the mode is CUSTOM_ROWS. The profile job is run on the specified number of rows. The maximum value for size is Long.MAX_VALUE.
Long.MAX_VALUE = 9223372036854775807
OutputFormatOptionsProperty
- class CfnJob.OutputFormatOptionsProperty(*, csv=None)
Bases:
object
Represents a set of options that define the structure of comma-separated (CSV) job output.
- Parameters:
csv (Union[IResolvable, CsvOutputOptionsProperty, Dict[str, Any], None]) – Represents a set of options that define the structure of comma-separated value (CSV) job output.
- See:
- ExampleMetadata:
fixture=_generated
Example:
# The code below shows an example of how to instantiate this type.
# The values are placeholders you should change.
from aws_cdk import aws_databrew as databrew

output_format_options_property = databrew.CfnJob.OutputFormatOptionsProperty(
    csv=databrew.CfnJob.CsvOutputOptionsProperty(
        delimiter="delimiter"
    )
)
Attributes
- csv
Represents a set of options that define the structure of comma-separated value (CSV) job output.
OutputLocationProperty
- class CfnJob.OutputLocationProperty(*, bucket, bucket_owner=None, key=None)
Bases:
object
The location in Amazon S3 or AWS Glue Data Catalog where the job writes its output.
- Parameters:
bucket (str) – The Amazon S3 bucket name.
bucket_owner (Optional[str]) – The AWS account ID of the bucket owner.
key (Optional[str]) – The unique name of the object in the bucket.
- See:
- ExampleMetadata:
fixture=_generated
Example:
# The code below shows an example of how to instantiate this type.
# The values are placeholders you should change.
from aws_cdk import aws_databrew as databrew

output_location_property = databrew.CfnJob.OutputLocationProperty(
    bucket="bucket",

    # the properties below are optional
    bucket_owner="bucketOwner",
    key="key"
)
Attributes
- bucket
The Amazon S3 bucket name.
- bucket_owner
The AWS account ID of the bucket owner.
- key
The unique name of the object in the bucket.
OutputProperty
- class CfnJob.OutputProperty(*, location, compression_format=None, format=None, format_options=None, max_output_files=None, overwrite=None, partition_columns=None)
Bases:
object
Represents options that specify how and where in Amazon S3 DataBrew writes the output generated by recipe jobs or profile jobs.
- Parameters:
location (Union[IResolvable, S3LocationProperty, Dict[str, Any]]) – The location in Amazon S3 where the job writes its output.
compression_format (Optional[str]) – The compression algorithm used to compress the output text of the job.
format (Optional[str]) – The data format of the output of the job.
format_options (Union[IResolvable, OutputFormatOptionsProperty, Dict[str, Any], None]) – Represents options that define how DataBrew formats job output files.
max_output_files (Union[int, float, None]) – The maximum number of files to be generated by the job and written to the output folder.
overwrite (Union[bool, IResolvable, None]) – A value that, if true, means that any data in the location specified for output is overwritten with new output.
partition_columns (Optional[Sequence[str]]) – The names of one or more partition columns for the output of the job.
- See:
- ExampleMetadata:
fixture=_generated
Example:
# The code below shows an example of how to instantiate this type.
# The values are placeholders you should change.
from aws_cdk import aws_databrew as databrew

output_property = databrew.CfnJob.OutputProperty(
    location=databrew.CfnJob.S3LocationProperty(
        bucket="bucket",

        # the properties below are optional
        bucket_owner="bucketOwner",
        key="key"
    ),

    # the properties below are optional
    compression_format="compressionFormat",
    format="format",
    format_options=databrew.CfnJob.OutputFormatOptionsProperty(
        csv=databrew.CfnJob.CsvOutputOptionsProperty(
            delimiter="delimiter"
        )
    ),
    max_output_files=123,
    overwrite=False,
    partition_columns=["partitionColumns"]
)
Attributes
- compression_format
The compression algorithm used to compress the output text of the job.
- format
The data format of the output of the job.
- format_options
Represents options that define how DataBrew formats job output files.
- location
The location in Amazon S3 where the job writes its output.
- max_output_files
The maximum number of files to be generated by the job and written to the output folder.
- overwrite
A value that, if true, means that any data in the location specified for output is overwritten with new output.
- partition_columns
The names of one or more partition columns for the output of the job.
ProfileConfigurationProperty
- class CfnJob.ProfileConfigurationProperty(*, column_statistics_configurations=None, dataset_statistics_configuration=None, entity_detector_configuration=None, profile_columns=None)
Bases:
object
Configuration for profile jobs.
Configuration can be used to select columns, do evaluations, and override default parameters of evaluations. When configuration is undefined, the profile job will apply default settings to all supported columns.
- Parameters:
column_statistics_configurations (Union[IResolvable, Sequence[Union[IResolvable, ColumnStatisticsConfigurationProperty, Dict[str, Any]]], None]) – List of configurations for column evaluations. ColumnStatisticsConfigurations are used to select evaluations and override parameters of evaluations for particular columns. When ColumnStatisticsConfigurations is undefined, the profile job will profile all supported columns and run all supported evaluations.
dataset_statistics_configuration (Union[IResolvable, StatisticsConfigurationProperty, Dict[str, Any], None]) – Configuration for inter-column evaluations. Configuration can be used to select evaluations and override parameters of evaluations. When configuration is undefined, the profile job will run all supported inter-column evaluations.
entity_detector_configuration (Union[IResolvable, EntityDetectorConfigurationProperty, Dict[str, Any], None]) – Configuration of entity detection for a profile job. When undefined, entity detection is disabled.
profile_columns (Union[IResolvable, Sequence[Union[IResolvable, ColumnSelectorProperty, Dict[str, Any]]], None]) – List of column selectors. ProfileColumns can be used to select columns from the dataset. When ProfileColumns is undefined, the profile job will profile all supported columns.
- See:
- ExampleMetadata:
fixture=_generated
Example:
# The code below shows an example of how to instantiate this type.
# The values are placeholders you should change.
from aws_cdk import aws_databrew as databrew

profile_configuration_property = databrew.CfnJob.ProfileConfigurationProperty(
    column_statistics_configurations=[databrew.CfnJob.ColumnStatisticsConfigurationProperty(
        statistics=databrew.CfnJob.StatisticsConfigurationProperty(
            included_statistics=["includedStatistics"],
            overrides=[databrew.CfnJob.StatisticOverrideProperty(
                parameters={"parameters_key": "parameters"},
                statistic="statistic"
            )]
        ),

        # the properties below are optional
        selectors=[databrew.CfnJob.ColumnSelectorProperty(
            name="name",
            regex="regex"
        )]
    )],
    dataset_statistics_configuration=databrew.CfnJob.StatisticsConfigurationProperty(
        included_statistics=["includedStatistics"],
        overrides=[databrew.CfnJob.StatisticOverrideProperty(
            parameters={"parameters_key": "parameters"},
            statistic="statistic"
        )]
    ),
    entity_detector_configuration=databrew.CfnJob.EntityDetectorConfigurationProperty(
        entity_types=["entityTypes"],

        # the properties below are optional
        allowed_statistics=databrew.CfnJob.AllowedStatisticsProperty(
            statistics=["statistics"]
        )
    ),
    profile_columns=[databrew.CfnJob.ColumnSelectorProperty(
        name="name",
        regex="regex"
    )]
)
Attributes
- column_statistics_configurations
List of configurations for column evaluations.
ColumnStatisticsConfigurations are used to select evaluations and override parameters of evaluations for particular columns. When ColumnStatisticsConfigurations is undefined, the profile job will profile all supported columns and run all supported evaluations.
- dataset_statistics_configuration
Configuration for inter-column evaluations.
Configuration can be used to select evaluations and override parameters of evaluations. When configuration is undefined, the profile job will run all supported inter-column evaluations.
- entity_detector_configuration
Configuration of entity detection for a profile job.
When undefined, entity detection is disabled.
- profile_columns
List of column selectors.
ProfileColumns can be used to select columns from the dataset. When ProfileColumns is undefined, the profile job will profile all supported columns.
RecipeProperty
- class CfnJob.RecipeProperty(*, name, version=None)
Bases:
object
Represents one or more actions to be performed on a DataBrew dataset.
- Parameters:
name (str) – The unique name for the recipe.
version (Optional[str]) – The identifier for the version for the recipe.
- See:
- ExampleMetadata:
fixture=_generated
Example:
# The code below shows an example of how to instantiate this type.
# The values are placeholders you should change.
from aws_cdk import aws_databrew as databrew

recipe_property = databrew.CfnJob.RecipeProperty(
    name="name",

    # the properties below are optional
    version="version"
)
Attributes
- name
The unique name for the recipe.
- version
The identifier for the version for the recipe.
S3LocationProperty
- class CfnJob.S3LocationProperty(*, bucket, bucket_owner=None, key=None)
Bases:
object
Represents an Amazon S3 location (bucket name, bucket owner, and object key) where DataBrew can read input data, or write output from a job.
- Parameters:
bucket (
str
) – The Amazon S3 bucket name.bucket_owner (
Optional
[str
]) – The AWS account ID of the bucket owner.key (
Optional
[str
]) – The unique name of the object in the bucket.
- See:
- ExampleMetadata:
fixture=_generated
Example:
# The code below shows an example of how to instantiate this type.
# The values are placeholders you should change.
from aws_cdk import aws_databrew as databrew

s3_location_property = databrew.CfnJob.S3LocationProperty(
    bucket="bucket",

    # the properties below are optional
    bucket_owner="bucketOwner",
    key="key"
)
Attributes
- bucket
The Amazon S3 bucket name.
- bucket_owner
The AWS account ID of the bucket owner.
- key
The unique name of the object in the bucket.
S3TableOutputOptionsProperty
- class CfnJob.S3TableOutputOptionsProperty(*, location)
Bases:
object
Represents options that specify how and where DataBrew writes the Amazon S3 output generated by recipe jobs.
- Parameters:
location (Union[IResolvable, S3LocationProperty, Dict[str, Any]]) – Represents an Amazon S3 location (bucket name and object key) where DataBrew can write output from a job.
- See:
- ExampleMetadata:
fixture=_generated
Example:
# The code below shows an example of how to instantiate this type.
# The values are placeholders you should change.
from aws_cdk import aws_databrew as databrew

s3_table_output_options_property = databrew.CfnJob.S3TableOutputOptionsProperty(
    location=databrew.CfnJob.S3LocationProperty(
        bucket="bucket",

        # the properties below are optional
        bucket_owner="bucketOwner",
        key="key"
    )
)
Attributes
- location
Represents an Amazon S3 location (bucket name and object key) where DataBrew can write output from a job.
StatisticOverrideProperty
- class CfnJob.StatisticOverrideProperty(*, parameters, statistic)
Bases:
object
Override of a particular evaluation for a profile job.
- Parameters:
parameters (
Union
[IResolvable
,Mapping
[str
,str
]]) – A map that includes overrides of an evaluation’s parameters.statistic (
str
) – The name of an evaluation.
- See:
- ExampleMetadata:
fixture=_generated
Example:
# The code below shows an example of how to instantiate this type.
# The values are placeholders you should change.
from aws_cdk import aws_databrew as databrew

statistic_override_property = databrew.CfnJob.StatisticOverrideProperty(
    parameters={"parameters_key": "parameters"},
    statistic="statistic"
)
Attributes
- parameters
A map that includes overrides of an evaluation’s parameters.
- statistic
The name of an evaluation.
StatisticsConfigurationProperty
- class CfnJob.StatisticsConfigurationProperty(*, included_statistics=None, overrides=None)
Bases:
object
Configuration of evaluations for a profile job.
This configuration can be used to select evaluations and override the parameters of selected evaluations.
- Parameters:
included_statistics (Optional[Sequence[str]]) – List of included evaluations. When the list is undefined, all supported evaluations will be included.
overrides (Union[IResolvable, Sequence[Union[IResolvable, StatisticOverrideProperty, Dict[str, Any]]], None]) – List of overrides for evaluations.
- See:
- ExampleMetadata:
fixture=_generated
Example:
# The code below shows an example of how to instantiate this type.
# The values are placeholders you should change.
from aws_cdk import aws_databrew as databrew

statistics_configuration_property = databrew.CfnJob.StatisticsConfigurationProperty(
    included_statistics=["includedStatistics"],
    overrides=[databrew.CfnJob.StatisticOverrideProperty(
        parameters={"parameters_key": "parameters"},
        statistic="statistic"
    )]
)
Attributes
- included_statistics
List of included evaluations.
When the list is undefined, all supported evaluations will be included.
- overrides
List of overrides for evaluations.
ValidationConfigurationProperty
- class CfnJob.ValidationConfigurationProperty(*, ruleset_arn, validation_mode=None)
Bases:
object
Configuration for data quality validation.
Used to select the Rulesets and Validation Mode to be used in the profile job. When ValidationConfiguration is null, the profile job will run without data quality validation.
- Parameters:
ruleset_arn (str) – The Amazon Resource Name (ARN) for the ruleset to be validated in the profile job. The TargetArn of the selected ruleset should be the same as the Amazon Resource Name (ARN) of the dataset that is associated with the profile job.
validation_mode (Optional[str]) – Mode of data quality validation. Default mode is “CHECK_ALL”, which verifies all rules defined in the selected ruleset.
- See:
- ExampleMetadata:
fixture=_generated
Example:
# The code below shows an example of how to instantiate this type.
# The values are placeholders you should change.
from aws_cdk import aws_databrew as databrew

validation_configuration_property = databrew.CfnJob.ValidationConfigurationProperty(
    ruleset_arn="rulesetArn",

    # the properties below are optional
    validation_mode="validationMode"
)
Attributes
- ruleset_arn
The Amazon Resource Name (ARN) for the ruleset to be validated in the profile job.
The TargetArn of the selected ruleset should be the same as the Amazon Resource Name (ARN) of the dataset that is associated with the profile job.
- validation_mode
Mode of data quality validation.
Default mode is “CHECK_ALL” which verifies all rules defined in the selected ruleset.