Create a schedule to automatically process new data
The following section only applies to SageMaker Processing jobs. If you
used the default Canvas settings or EMR Serverless to create a
remote job to apply transforms to your full dataset, this section doesn’t apply.
If you're processing data periodically, you can create a schedule to run the
processing job automatically. For example, you can create a schedule that runs a
processing job automatically when you get new data. For more information about
processing jobs, see Export to Amazon S3.
When you create a job, you must specify an IAM role that has permissions to create
the job. You can use the AmazonSageMakerCanvasDataPrepFullAccess policy to add permissions.
Add the following trust policy to the role to allow EventBridge to assume it.
{
    "Effect": "Allow",
    "Principal": {
        "Service": "events.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
}
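The statement above is a fragment. A complete trust policy wraps it in a policy document. The following is a minimal sketch of that wrapping, assuming the role has no other trusted principals; the helper name build_trust_policy is hypothetical, and if your role already trusts other services, add the statement to the existing list instead of replacing the document.

```python
import json


def build_trust_policy(service="events.amazonaws.com"):
    """Wrap the AssumeRole statement in a complete IAM trust policy document."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Serialize the document so it can be pasted into the IAM console
# or passed to an API call that accepts a policy document string.
policy_json = json.dumps(build_trust_policy(), indent=2)
print(policy_json)
```

The resulting JSON can be pasted into the trust relationship editor for the role in the IAM console.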
When you create a schedule, Data Wrangler creates an eventRule in EventBridge. You
incur charges for both the event rules that you create and the instances used to run
the processing job.
For information about EventBridge pricing, see Amazon EventBridge pricing. For information about
processing job pricing, see Amazon SageMaker AI Pricing.
You can set a schedule using one of the following methods: a CRON expression, a RATE
expression, a recurring schedule, or a specific time.
The following sections provide procedures on scheduling jobs when filling out the SageMaker AI
Processing job settings while exporting your data to Amazon S3.
All of the following instructions begin in the Associate schedules section of
the SageMaker Processing job settings.
- CRON
-
Use the following procedure to create a schedule with a CRON
expression.
-
In the Export to Amazon S3 side panel, make sure
you've turned off the Auto job configuration toggle and
have the SageMaker Processing option selected.
-
In the SageMaker Processing job settings, open the
Associate schedules section and choose
Create new schedule.
-
The Create new schedule dialog box opens.
For Schedule Name, specify the name of the
schedule.
-
For Run Frequency, choose
CRON.
-
For each of the Minutes, Hours,
Days of month, Month,
and Day of week fields, enter valid CRON expression values.
-
Choose Create.
-
(Optional) Choose Add another schedule to run
the job on an additional schedule.
You can associate a maximum of two schedules. The schedules
are independent and don't affect each other unless the times
overlap.
-
Choose Export after you've filled out the
rest of the export job settings.
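For reference, EventBridge cron expressions have six fields (minutes, hours, day of month, month, day of week, year), and one of the two day fields must be a question mark. The following sketch assembles such an expression from the five values entered in the dialog box. How Canvas maps the dialog fields to the rule it creates is an assumption here, and the function name eventbridge_cron is hypothetical.

```python
def eventbridge_cron(minutes, hours, day_of_month, month, day_of_week, year="*"):
    """Assemble an EventBridge-style cron expression from individual field values.

    Assumption: the year field defaults to "*" because the dialog box
    described above has no year field.
    """
    fields = [str(f).strip() for f in
              (minutes, hours, day_of_month, month, day_of_week, year)]
    if any(not f or " " in f for f in fields):
        raise ValueError("each field must be a non-empty value without spaces")
    # EventBridge does not allow values in both day fields at once:
    # one of them must be "?".
    if (fields[2] == "?") == (fields[4] == "?"):
        raise ValueError("exactly one of day_of_month and day_of_week must be '?'")
    return "cron(" + " ".join(fields) + ")"


# 09:00 UTC every Monday through Friday
print(eventbridge_cron(0, 9, "?", "*", "MON-FRI"))  # cron(0 9 ? * MON-FRI *)
```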
- RATE
-
Use the following procedure to create a schedule with a RATE
expression.
-
In the Export to Amazon S3 side panel, make sure
you've turned off the Auto job configuration toggle and
have the SageMaker Processing option selected.
-
In the SageMaker Processing job settings, open the
Associate schedules section and choose
Create new schedule.
-
The Create new schedule dialog box opens.
For Schedule Name, specify the name of the
schedule.
-
For Run Frequency, choose
Rate.
-
For Value, specify an integer.
-
For Unit, select one of the following:
-
Choose Create.
-
(Optional) Choose Add another schedule to run
the job on an additional schedule.
You can associate a maximum of two schedules. The schedules
are independent and don't affect each other unless the times
overlap.
-
Choose Export after you've filled out
the rest of the export job settings.
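EventBridge rate expressions take the form rate(value unit), where the unit is minute, hour, or day, singular when the value is 1 and plural otherwise. The sketch below builds such an expression from a Value and Unit pair. It assumes the Unit options in the dialog box correspond to those EventBridge units; the function name eventbridge_rate is hypothetical.

```python
def eventbridge_rate(value, unit):
    """Build an EventBridge-style rate expression such as rate(12 hours)."""
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError("value must be a positive integer")
    base = unit.strip().lower().rstrip("s")  # accept "hour" or "hours"
    if base not in ("minute", "hour", "day"):
        raise ValueError("unit must be minute(s), hour(s), or day(s)")
    # EventBridge expects the singular unit for 1 and the plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {base}{suffix})"


print(eventbridge_rate(1, "Hours"))   # rate(1 hour)
print(eventbridge_rate(12, "hour"))   # rate(12 hours)
```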
- Recurring
-
Use the following procedure to create a schedule that runs a job on a
recurring basis.
-
In the Export to Amazon S3 side panel, make sure
you've turned off the Auto job configuration toggle and
have the SageMaker Processing option selected.
-
In the SageMaker Processing job settings, open the
Associate schedules section and choose
Create new schedule.
-
The Create new schedule dialog box opens.
For Schedule Name, specify the name of the
schedule.
-
For Run Frequency, choose
Recurring.
-
For Every x hours, specify the hourly
frequency that the job runs during the day. Valid values are
integers from 1 to 23, inclusive.
-
For On days, select one of the following
options:
-
Every Day
-
Weekends
-
Weekdays
-
Select Days
-
(Optional) If you've selected Select
Days, choose the days of the week to run the
job.
The schedule resets every day. If you schedule a job to run
every five hours, it runs at the following times during the
day:
-
00:00
-
05:00
-
10:00
-
15:00
-
20:00
-
Choose Create.
-
(Optional) Choose Add another schedule to run
the job on an additional schedule.
You can associate a maximum of two schedules. The schedules
are independent and don't affect each other unless the times
overlap.
-
Choose Export after you've filled out the rest of
the export job settings.
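Because a recurring schedule resets every day, the run times for a given Every x hours value can be listed by stepping through the day from midnight. The following is a small sketch of that calculation; the function name daily_run_times is hypothetical.

```python
def daily_run_times(every_x_hours):
    """List the times of day at which a job runs every `every_x_hours` hours.

    The schedule restarts at 00:00 each day, so the last gap of the day
    can be shorter than the interval.
    """
    if not 1 <= every_x_hours <= 23:
        raise ValueError("every_x_hours must be an integer from 1 to 23")
    return [f"{hour:02d}:00" for hour in range(0, 24, every_x_hours)]


print(daily_run_times(5))  # ['00:00', '05:00', '10:00', '15:00', '20:00']
```

With a five-hour interval, the gap between the 20:00 run and the next day's 00:00 run is four hours, not five.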
- Specific time
-
Use the following procedure to create a schedule that runs a job at
specific times.
-
In the Export to Amazon S3 side panel, make sure
you've turned off the Auto job configuration toggle and
have the SageMaker Processing option selected.
-
In the SageMaker Processing job settings, open the
Associate schedules section and choose
Create new schedule.
-
The Create new schedule dialog box opens.
For Schedule Name, specify the name of the
schedule.
-
For Run Frequency, choose
Start time.
-
For Start time, enter a time in UTC format
(for example, 09:00). The start time defaults
to the time zone where you are located.
-
For On days, select one of the following
options:
-
Every Day
-
Weekends
-
Weekdays
-
Select Days
-
(Optional) If you've selected Select
Days, choose the days of the week to run the
job.
-
Choose Create.
-
(Optional) Choose Add another schedule to run
the job on an additional schedule.
You can associate a maximum of two schedules. The schedules
are independent and don't affect each other unless the times
overlap.
-
Choose Export after you've filled out the rest
of the export job settings.
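If you need to express a local clock time in UTC before entering it, the conversion can be sketched as follows. This assumes you know your fixed offset from UTC; the function name local_time_to_utc is hypothetical. The second return value shows whether the UTC time falls on the previous or next day, which matters when you select days of the week.

```python
from datetime import datetime, timedelta, timezone


def local_time_to_utc(hhmm, utc_offset_hours):
    """Convert a local HH:MM time to UTC.

    Returns (utc_hhmm, day_shift), where day_shift is -1, 0, or 1 depending on
    whether the UTC time lands on the previous, same, or next calendar day.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    parsed = datetime.strptime(hhmm, "%H:%M")
    # Attach an arbitrary date so that day rollover can be detected.
    local = datetime(2024, 1, 15, parsed.hour, parsed.minute, tzinfo=local_tz)
    utc = local.astimezone(timezone.utc)
    return utc.strftime("%H:%M"), (utc.date() - local.date()).days


print(local_time_to_utc("09:00", -5))  # ('14:00', 0)
print(local_time_to_utc("02:00", 9))   # ('17:00', -1)
```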
You can use the SageMaker AI AWS Management Console to view the jobs that are scheduled to run. Your
processing jobs run within Pipelines. Each processing job has its own pipeline. It runs as a
processing step within the pipeline. You can view the schedules that you've created
within a pipeline. For information about viewing a pipeline, see View the details of a pipeline.
To view the jobs that you've scheduled, do the following.
-
Open Amazon SageMaker Studio Classic.
-
Open Pipelines.
-
View the pipelines for the jobs that you've created.
The name of the pipeline running the job is the job name with the prefix
canvas-data-prep-. For example, if you've created a job named
housing-data-feature-engineering, the name of the pipeline is
canvas-data-prep-housing-data-feature-engineering.
-
Choose the pipeline containing your job.
-
View the status of the pipelines. Pipelines with a Status
of Succeeded have run the processing job
successfully.
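The same lookup can be done programmatically. The sketch below uses the SageMaker ListPipelines and ListPipelineExecutions operations through a client object that you pass in (for example, boto3.client("sagemaker")). The function names are hypothetical, and the canvas-data-prep- prefix is taken from the example above.

```python
def pipeline_name_for_job(job_name, prefix="canvas-data-prep-"):
    """Derive the expected pipeline name from a job name."""
    return prefix + job_name


def find_job_pipelines(sagemaker_client, job_name):
    """Return the names of all pipelines whose name starts with the job's pipeline name."""
    names, token = [], None
    while True:
        kwargs = {"PipelineNamePrefix": pipeline_name_for_job(job_name)}
        if token:
            kwargs["NextToken"] = token
        page = sagemaker_client.list_pipelines(**kwargs)
        names += [p["PipelineName"] for p in page.get("PipelineSummaries", [])]
        token = page.get("NextToken")
        if not token:  # no more pages
            return names


def execution_statuses(sagemaker_client, pipeline_name):
    """Return the status of each execution on the first page of results."""
    page = sagemaker_client.list_pipeline_executions(PipelineName=pipeline_name)
    return [e["PipelineExecutionStatus"]
            for e in page.get("PipelineExecutionSummaries", [])]


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("sagemaker")
#   for name in find_job_pipelines(client, "housing-data-feature-engineering"):
#       print(name, execution_statuses(client, name))
```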
To stop a processing job from running, delete the event rule that specifies the
schedule. Deleting an event rule stops all the jobs associated with the schedule from
running. For information about deleting a rule, see Disabling or deleting an
Amazon EventBridge rule.
You can stop and delete the pipelines associated with the schedules as well. For
information about stopping a pipeline, see StopPipelineExecution. For information about deleting a pipeline, see
DeletePipeline.
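As a rough sketch of the rule deletion, EventBridge requires that a rule's targets be removed before the rule itself can be deleted. The function below takes an EventBridge client (for example, boto3.client("events")) and a rule name. The function name delete_schedule_rule and the rule name in the usage comment are hypothetical; look up the actual rule name in the EventBridge console. Rules managed by an AWS service might need additional options that this sketch doesn't cover.

```python
def delete_schedule_rule(events_client, rule_name):
    """Remove all targets from an EventBridge rule, then delete the rule.

    Returns the number of targets that were removed.
    """
    targets = events_client.list_targets_by_rule(Rule=rule_name).get("Targets", [])
    if targets:
        # A rule with targets attached cannot be deleted.
        events_client.remove_targets(Rule=rule_name,
                                     Ids=[t["Id"] for t in targets])
    events_client.delete_rule(Name=rule_name)
    return len(targets)


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   delete_schedule_rule(boto3.client("events"), "my-schedule-rule")
```

To pause a schedule without deleting it, the EventBridge DisableRule operation (disable_rule in boto3) is an alternative.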