interface ScalaSparkFlexEtlJobProps
Language | Type name |
---|---|
.NET | Amazon.CDK.AWS.Glue.Alpha.ScalaSparkFlexEtlJobProps |
Go | github.com/aws/aws-cdk-go/awscdkgluealpha/v2#ScalaSparkFlexEtlJobProps |
Java | software.amazon.awscdk.services.glue.alpha.ScalaSparkFlexEtlJobProps |
Python | aws_cdk.aws_glue_alpha.ScalaSparkFlexEtlJobProps |
TypeScript | @aws-cdk/aws-glue-alpha » ScalaSparkFlexEtlJobProps |
Flex Jobs class.
Flex jobs support Python and Scala. The flexible execution class is appropriate for non-urgent jobs such as pre-production jobs, testing, and one-time data loads. Flexible job runs are supported for jobs that use AWS Glue version 3.0 or later with G.1X or G.2X worker types; if no version is given, the job defaults to the latest version of Glue (currently Glue 3.0).
Similar to ETL, we'll enable these features: --enable-metrics, --enable-spark-ui, --enable-continuous-cloudwatch-log
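The three flags named above are AWS Glue special job parameters. As a rough sketch of what enabling them amounts to, the helper below builds the corresponding argument map and lets caller-supplied arguments override it. The function name `flexDefaultArguments`, the flag values, and the merge order are illustrative assumptions, not the construct's actual implementation.

```typescript
// Hypothetical helper: not part of @aws-cdk/aws-glue-alpha.
type GlueArguments = Record<string, string>;

function flexDefaultArguments(userArguments: GlueArguments = {}): GlueArguments {
  // Assumed values: the metrics flag is passed as a key with an empty value,
  // the other two flags as 'true'.
  const featureFlags: GlueArguments = {
    '--enable-metrics': '',
    '--enable-spark-ui': 'true',
    '--enable-continuous-cloudwatch-log': 'true',
  };
  // Later spreads win, so caller-supplied arguments take precedence.
  return { ...featureFlags, ...userArguments };
}

const merged = flexDefaultArguments({
  '--enable-spark-ui': 'false',
  '--my-arg': 'value',
});
```

Here `merged` keeps the metrics and continuous-logging flags, carries the caller's `--my-arg`, and reflects the caller's override of `--enable-spark-ui`.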
Example
// The code below shows an example of how to instantiate this type.
// The values are placeholders you should change.
import * as glue_alpha from '@aws-cdk/aws-glue-alpha';
import * as cdk from 'aws-cdk-lib';
import { aws_iam as iam } from 'aws-cdk-lib';
import { aws_logs as logs } from 'aws-cdk-lib';
import { aws_s3 as s3 } from 'aws-cdk-lib';
declare const bucket: s3.Bucket;
declare const code: glue_alpha.Code;
declare const connection: glue_alpha.Connection;
declare const logGroup: logs.LogGroup;
declare const role: iam.Role;
declare const securityConfiguration: glue_alpha.SecurityConfiguration;
const scalaSparkFlexEtlJobProps: glue_alpha.ScalaSparkFlexEtlJobProps = {
  className: 'className',
  role: role,
  script: code,
  // the properties below are optional
  connections: [connection],
  continuousLogging: {
    enabled: false,
    // the properties below are optional
    conversionPattern: 'conversionPattern',
    logGroup: logGroup,
    logStreamPrefix: 'logStreamPrefix',
    quiet: false,
  },
  defaultArguments: {
    defaultArgumentsKey: 'defaultArguments',
  },
  description: 'description',
  enableProfilingMetrics: false,
  extraFiles: [code],
  extraJars: [code],
  extraJarsFirst: false,
  // Flex jobs require Glue 3.0 or later
  glueVersion: glue_alpha.GlueVersion.V3_0,
  jobName: 'jobName',
  maxConcurrentRuns: 123,
  maxRetries: 123,
  notifyDelayAfter: cdk.Duration.minutes(30),
  numberOfWorkers: 123,
  securityConfiguration: securityConfiguration,
  sparkUI: {
    bucket: bucket,
    jobRunQueuingEnabled: false,
    prefix: 'prefix',
  },
  tags: {
    tagsKey: 'tags',
  },
  timeout: cdk.Duration.minutes(30),
  // Flex jobs require the G.1X or G.2X worker type
  workerType: glue_alpha.WorkerType.G_1X,
};
Properties
Name | Type | Description |
---|---|---|
className | string | The fully qualified Scala class name that serves as the entry point for the job. |
role | IRole | IAM Role (required). The IAM role to use for Glue job execution. It must be specified by the developer because the L2 doesn't have visibility into the actions the script(s) take during job execution. The role must trust the Glue service principal (glue.amazonaws.com) and be granted sufficient permissions. |
script | Code | Script Code Location (required). The script to run when the Glue job executes. |
connections? | IConnection[] | Connections (optional). A list of connections to use for this Glue job. Connections are used to connect to other AWS services or resources within a VPC. |
continuousLogging? | ContinuousLoggingProps | Enables continuous logging with the specified props. |
defaultArguments? | { [string]: string } | Default Arguments (optional). The default arguments for every run of this Glue job, specified as name-value pairs. |
description? | string | Description (optional). Developer-specified description of the Glue job. |
enableProfilingMetrics? | boolean | Enables the collection of metrics for job profiling. |
extraFiles? | Code[] | Additional files, such as configuration files, that AWS Glue copies to the working directory of your script before executing it. |
extraJars? | Code[] | Additional Java .jar files that AWS Glue adds to the Java classpath before executing your script. Only individual files are supported; directories are not. |
extraJarsFirst? | boolean | Setting this value to true prioritizes the customer's extra JAR files in the classpath. |
glueVersion? | GlueVersion | Glue Version. The version of Glue to use to execute this job. |
jobName? | string | Name of the Glue job (optional). Developer-specified name of the Glue job. |
maxConcurrentRuns? | number | Max Concurrent Runs (optional). The maximum number of runs this Glue job can concurrently run. |
maxRetries? | number | Max Retries (optional). The maximum number of retry attempts Glue performs if the job fails. |
notifyDelayAfter? | Duration | Specifies configuration properties of a notification (optional). |
numberOfWorkers? | number | Number of Workers (optional). The number of workers for Glue to use during job execution. |
securityConfiguration? | ISecurityConfiguration | Security Configuration (optional). Defines the encryption options for the Glue job. |
sparkUI? | SparkUIProps | Enables the Spark UI debugging and monitoring with the specified props. |
tags? | { [string]: string } | Tags (optional). A list of key:value pairs of tags to apply to this Glue job's resources. |
timeout? | Duration | Timeout (optional). The maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. |
workerType? | WorkerType | Worker Type (optional). The type of worker for Glue to use during job execution. Enum options: Standard, G_1X, G_2X, G_025X, G_4X, G_8X, Z_2X. |
className
Type:
string
The fully qualified Scala class name that serves as the entry point for the job.
See also: [--class in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html)
role
Type:
IRole
IAM Role (required). The IAM role to use for Glue job execution. It must be specified by the developer because the L2 doesn't have visibility into the actions the script(s) take during job execution. The role must trust the Glue service principal (glue.amazonaws.com) and be granted sufficient permissions.
See also: https://docs.aws.amazon.com/glue/latest/dg/getting-started-access.html
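To make the trust requirement concrete, here is a minimal sketch of the trust policy document such a role needs, written as a plain TypeScript object. The `TrustStatement` type and the `trustsGlue` helper are hypothetical names introduced only for illustration; the permissions the job needs depend on what your script does and are not shown.

```typescript
// Illustrative only: shape of an IAM trust policy that lets Glue assume the role.
interface TrustStatement {
  Effect: 'Allow' | 'Deny';
  Principal: { Service: string };
  Action: string;
}

const glueTrustPolicy: { Version: string; Statement: TrustStatement[] } = {
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Principal: { Service: 'glue.amazonaws.com' },
      Action: 'sts:AssumeRole',
    },
  ],
};

// Hypothetical check: does any statement allow the Glue service to assume the role?
function trustsGlue(policy: { Statement: TrustStatement[] }): boolean {
  return policy.Statement.some(
    (s) =>
      s.Effect === 'Allow' &&
      s.Principal.Service === 'glue.amazonaws.com' &&
      s.Action === 'sts:AssumeRole',
  );
}
```

In a CDK app you would normally not write this document by hand: creating the role with `new iam.Role(scope, 'JobRole', { assumedBy: new iam.ServicePrincipal('glue.amazonaws.com') })` sets up an equivalent trust relationship, after which you grant the permissions your script requires.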
script
Type:
Code
Script Code Location (required). The script to run when the Glue job executes.
The script can be uploaded from the local directory structure using fromAsset or referenced via an S3 location using fromBucket.
connections?
Type:
IConnection[]
(optional, default: [] - no connections are added to the job)
Connections (optional). A list of connections to use for this Glue job. Connections are used to connect to other AWS services or resources within a VPC.
continuousLogging?
Type:
ContinuousLoggingProps
(optional, default: continuous logging is enabled.)
Enables continuous logging with the specified props.
defaultArguments?
Type:
{ [string]: string }
(optional, default: no arguments)
Default Arguments (optional). The default arguments for every run of this Glue job, specified as name-value pairs.
See also: [https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html) for a list of reserved parameters
description?
Type:
string
(optional, default: no value)
Description (optional). Developer-specified description of the Glue job.
enableProfilingMetrics?
Type:
boolean
(optional, default: no profiling metrics emitted.)
Enables the collection of metrics for job profiling.
extraFiles?
Type:
Code[]
(optional, default: [] - no extra files are copied to the working directory)
Additional files, such as configuration files, that AWS Glue copies to the working directory of your script before executing it.
Only individual files are supported; directories are not.
See also: [--extra-files in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html)
extraJars?
Type:
Code[]
(optional, default: [] - no extra jars are added to the classpath)
Additional Java .jar files that AWS Glue adds to the Java classpath before executing your script. Only individual files are supported; directories are not.
See also: [--extra-jars in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html)
extraJarsFirst?
Type:
boolean
(optional, default: false - priority is not given to user-provided jars)
Setting this value to true prioritizes the customer's extra JAR files in the classpath.
See also: [--user-jars-first in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html)
glueVersion?
Type:
GlueVersion
(optional, default: 3.0 for ETL)
Glue Version. The version of Glue to use to execute this job.
jobName?
Type:
string
(optional, default: a name is automatically generated)
Name of the Glue job (optional). Developer-specified name of the Glue job.
maxConcurrentRuns?
Type:
number
(optional, default: 1)
Max Concurrent Runs (optional). The maximum number of runs this Glue job can concurrently run.
An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.
maxRetries?
Type:
number
(optional, default: 0)
Max Retries (optional). The maximum number of retry attempts Glue performs if the job fails.
notifyDelayAfter?
Type:
Duration
(optional, default: undefined)
Specifies configuration properties of a notification (optional).
After a job run starts, the number of minutes to wait before sending a job run delay notification.
numberOfWorkers?
Type:
number
(optional, default: 10)
Number of Workers (optional). The number of workers for Glue to use during job execution.
securityConfiguration?
Type:
ISecurityConfiguration
(optional, default: no security configuration.)
Security Configuration (optional). Defines the encryption options for the Glue job.
sparkUI?
Type:
SparkUIProps
(optional, default: Spark UI debugging and monitoring is disabled.)
Enables the Spark UI debugging and monitoring with the specified props.
tags?
Type:
{ [string]: string }
(optional, default: {} - no tags)
Tags (optional). A list of key:value pairs of tags to apply to this Glue job's resources.
timeout?
Type:
Duration
(optional, default: 2880 (2 days for non-streaming))
Timeout (optional). The maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status.
Specified in minutes.
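Because the default is quoted in minutes, it can help to spell out the arithmetic behind the figure of 2880. The sketch below uses a hypothetical helper, `daysToMinutes`, purely to show the conversion; in the props themselves the value is passed as a Duration, for example `cdk.Duration.minutes(30)` as in the example at the top of this page.

```typescript
const MINUTES_PER_HOUR = 60;
const HOURS_PER_DAY = 24;

// Hypothetical helper for reasoning about timeout values expressed in minutes.
function daysToMinutes(days: number): number {
  return days * HOURS_PER_DAY * MINUTES_PER_HOUR;
}

// Two days, the documented default for non-streaming jobs: 2 * 24 * 60 = 2880.
const defaultTimeoutMinutes = daysToMinutes(2);
```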
workerType?
Type:
WorkerType
(optional, default: WorkerType.G_1X)
Worker Type (optional). The type of worker for Glue to use during job execution. Enum options: Standard, G_1X, G_2X, G_025X, G_4X, G_8X, Z_2X.
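The enum lists more worker types than the flexible execution class accepts: as stated at the top of this page, Flex runs are supported on Glue 3.0 or later with G.1X or G.2X workers. A minimal sketch of that rule follows, using plain strings in place of the `GlueVersion` and `WorkerType` enums; the function name `isFlexCompatible` is hypothetical, and whether the construct performs an equivalent check is an assumption.

```typescript
// Worker types that the class description names as supported for Flex.
const FLEX_WORKER_TYPES: string[] = ['G.1X', 'G.2X'];
const MIN_FLEX_GLUE_VERSION = 3.0;

// Hypothetical check: plain strings stand in for the GlueVersion and WorkerType enums.
function isFlexCompatible(glueVersion: string, workerType: string): boolean {
  const version = parseFloat(glueVersion);
  return version >= MIN_FLEX_GLUE_VERSION && FLEX_WORKER_TYPES.indexOf(workerType) !== -1;
}

const okDefault = isFlexCompatible('3.0', 'G.1X');     // true: the documented defaults
const badWorker = isFlexCompatible('4.0', 'Standard'); // false: worker type not supported
const badVersion = isFlexCompatible('0.9', 'G.2X');    // false: Glue version too old
```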