Retry Failed Activities
Activities sometimes fail for ephemeral reasons, such as a temporary loss of connectivity. At another time, the activity might succeed, so the appropriate way to handle activity failure is often to retry the activity, perhaps multiple times.
There are a variety of strategies for retrying activities; the best one depends on the details of your workflow. The strategies fall into three basic categories:
- The retry-until-success strategy simply keeps retrying the activity until it completes.
- The exponential retry strategy increases the time interval between retry attempts exponentially until the activity completes or the process reaches a specified stopping point, such as a maximum number of attempts.
- The custom retry strategy decides whether or how to retry the activity after each failed attempt.
The following sections describe how to implement these strategies. The example workflow workers all use a single activity, unreliableActivity, which randomly does one of the following:
- Completes immediately
- Fails intentionally by exceeding the timeout value
- Fails intentionally by throwing IllegalStateException
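The implementation of unreliableActivity isn't shown in this topic. As a rough illustration only, an activity with these three behaviors could look like the following plain-Java sketch. The class name UnreliableActivitySketch, the Outcome enum, and the timeoutMillis parameter are hypothetical, and the sketch takes the outcome as an argument so that each behavior can be exercised deterministically.

```java
import java.util.Random;

class UnreliableActivitySketch {
    // Hypothetical names for the three behaviors listed above.
    enum Outcome { COMPLETE, TIME_OUT, THROW }

    private final Random random;
    private final long timeoutMillis;

    UnreliableActivitySketch(Random random, long timeoutMillis) {
        this.random = random;
        this.timeoutMillis = timeoutMillis;
    }

    // Picks one of the three outcomes with equal probability.
    Outcome pickOutcome() {
        Outcome[] outcomes = Outcome.values();
        return outcomes[random.nextInt(outcomes.length)];
    }

    void run(Outcome outcome) {
        switch (outcome) {
            case COMPLETE:
                return; // completes immediately
            case TIME_OUT:
                try {
                    Thread.sleep(timeoutMillis + 1); // runs longer than the timeout
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return;
            case THROW:
                throw new IllegalStateException("Intentional failure");
        }
    }

    public static void main(String[] args) {
        UnreliableActivitySketch activity = new UnreliableActivitySketch(new Random(), 10);
        Outcome outcome = activity.pickOutcome();
        System.out.println("Chosen outcome: " + outcome);
        try {
            activity.run(outcome);
            System.out.println("Activity returned");
        } catch (IllegalStateException e) {
            System.out.println("Activity threw: " + e.getMessage());
        }
    }
}
```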
Retry-Until-Success Strategy
The simplest retry strategy is to keep retrying the activity each time it fails until it eventually succeeds. The basic pattern is:
1. Implement a nested TryCatch or TryCatchFinally class in your workflow's entry point method.
2. Execute the activity in doTry.
3. If the activity fails, the framework calls doCatch, which flags the failure so that the entry point method runs again.
4. Repeat Steps 2 - 3 until the activity completes successfully.
The following workflow implements the retry-until-success strategy. The workflow interface is implemented in RetryActivityRecipeWorkflow and has one method, runUnreliableActivityTillSuccess, which is the workflow's entry point. The workflow worker is implemented in RetryActivityRecipeWorkflowImpl, as follows:
public class RetryActivityRecipeWorkflowImpl
    implements RetryActivityRecipeWorkflow {

    @Override
    public void runUnreliableActivityTillSuccess() {
        // Becomes ready with true if the activity failed, or false if it completed.
        final Settable<Boolean> retryActivity = new Settable<Boolean>();

        new TryCatch() {
            @Override
            protected void doTry() throws Throwable {
                Promise<Void> activityRanSuccessfully = client.unreliableActivity();
                setRetryActivityToFalse(activityRanSuccessfully, retryActivity);
            }

            @Override
            protected void doCatch(Throwable e) throws Throwable {
                retryActivity.set(true);
            }
        };
        restartRunUnreliableActivityTillSuccess(retryActivity);
    }

    // Runs only after activityRanSuccessfully is ready, that is, after the activity completes.
    @Asynchronous
    private void setRetryActivityToFalse(
            Promise<Void> activityRanSuccessfully,
            @NoWait Settable<Boolean> retryActivity) {
        retryActivity.set(false);
    }

    // Runs only after retryActivity is ready.
    @Asynchronous
    private void restartRunUnreliableActivityTillSuccess(
            Settable<Boolean> retryActivity) {
        if (retryActivity.get()) {
            runUnreliableActivityTillSuccess();
        }
    }
}
The workflow works as follows:
1. runUnreliableActivityTillSuccess creates a Settable<Boolean> object named retryActivity, which is used to indicate whether the activity failed and should be retried. Settable<T> is derived from Promise<T> and works much the same way, but you set a Settable<T> object's value manually.
2. runUnreliableActivityTillSuccess implements an anonymous nested TryCatch class to handle any exceptions that are thrown by the unreliableActivity activity. For more discussion of how to handle exceptions thrown by asynchronous code, see Error Handling.
3. doTry executes the unreliableActivity activity, which returns a Promise<Void> object named activityRanSuccessfully.
4. doTry calls the asynchronous setRetryActivityToFalse method, which has two parameters:
   - activityRanSuccessfully takes the Promise<Void> object returned by the unreliableActivity activity.
   - retryActivity takes the retryActivity object.
   If unreliableActivity completes, activityRanSuccessfully becomes ready and setRetryActivityToFalse sets retryActivity to false. Otherwise, activityRanSuccessfully never becomes ready and setRetryActivityToFalse doesn't execute.
5. If unreliableActivity throws an exception, the framework calls doCatch and passes it the exception object. doCatch sets retryActivity to true.
6. runUnreliableActivityTillSuccess calls the asynchronous restartRunUnreliableActivityTillSuccess method and passes it the retryActivity object. Because retryActivity is a Promise<T> type, restartRunUnreliableActivityTillSuccess defers execution until retryActivity is ready, which occurs after TryCatch completes.
7. When retryActivity is ready, restartRunUnreliableActivityTillSuccess extracts the value.
   - If the value is false, the attempt succeeded. restartRunUnreliableActivityTillSuccess does nothing and the retry sequence terminates.
   - If the value is true, the attempt failed. restartRunUnreliableActivityTillSuccess calls runUnreliableActivityTillSuccess to execute the activity again.
8. Steps 1 - 7 repeat until unreliableActivity completes.
Note
doCatch doesn't handle the exception; it simply sets the retryActivity object to true to indicate that the activity failed. The retry is handled by the asynchronous restartRunUnreliableActivityTillSuccess method, which defers execution until TryCatch completes. The reason for this approach is that, if you retry an activity in doCatch, you can't cancel it. Retrying the activity in restartRunUnreliableActivityTillSuccess allows you to execute cancellable activities.
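Stripped of promises and asynchronous methods, the retry-until-success control flow reduces to a flag that is set on failure and checked after the attempt finishes. The following plain-Java sketch is not Flow Framework code: it runs synchronously, replaces the recursive call with a loop, and uses a hypothetical class name and a hypothetical activity that fails twice before completing. It is meant only to show where each part of the workflow above fits.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RetryUntilSuccessSketch {
    // Runs the activity until it completes without throwing.
    // Returns the number of attempts that were needed.
    static int runTillSuccess(Runnable activity) {
        int attempts = 0;
        boolean retryActivity;
        do {
            attempts++;
            try {
                activity.run();
                retryActivity = false; // counterpart of setRetryActivityToFalse
            } catch (RuntimeException e) {
                retryActivity = true;  // counterpart of doCatch
            }
        } while (retryActivity);       // counterpart of restartRunUnreliableActivityTillSuccess
        return attempts;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical activity that fails twice and then completes.
        Runnable flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("Intentional failure");
            }
        };
        System.out.println("Attempts: " + runTillSuccess(flaky)); // prints "Attempts: 3"
    }
}
```

Note that, like the workflow, the loop has no upper bound: an activity that never succeeds is retried forever.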
Exponential Retry Strategy
With the exponential retry strategy, the framework executes a failed activity again after a specified period of time, N seconds. If that attempt fails, the framework executes the activity again after 2N seconds, then after 4N seconds, and so on. Because the wait time can get quite large, you typically stop the retry attempts at some point rather than continuing indefinitely.
The framework provides three ways to implement an exponential retry strategy:
- The @ExponentialRetry annotation is the simplest approach, but you must set the retry configuration options at compile time.
- The RetryDecorator class allows you to set retry configuration at run time and change it as needed.
- The AsyncRetryingExecutor class allows you to set retry configuration at run time and change it as needed. In addition, the framework calls a user-implemented AsyncRunnable.run method to run each retry attempt.
All approaches support the following configuration options, where time values are in seconds:
- The initial retry wait time.
- The back-off coefficient, which is used to compute the retry intervals, as follows:
  retryInterval = initialRetryIntervalSeconds * Math.pow(backoffCoefficient, numberOfTries - 2)
  The default value is 2.0.
- The maximum number of retry attempts. The default value is unlimited.
- The maximum retry interval. The default value is unlimited.
- The expiration time. Retry attempts stop when the total duration of the process exceeds this value. The default value is unlimited.
- The exceptions that will trigger the retry process. By default, every exception triggers the retry process.
- The exceptions that will not trigger a retry attempt. By default, no exceptions are excluded.
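To make the interval formula concrete, the following plain-Java sketch evaluates it for a few attempts and applies a maximum retry interval as a cap. It assumes that numberOfTries counts the attempt being scheduled, so the first retry has numberOfTries equal to 2 and waits exactly the initial interval. The class name, the UNLIMITED constant, and the way the cap is applied are illustrative assumptions rather than the framework's implementation.

```java
class RetryIntervalSketch {
    // Hypothetical marker for "no maximum retry interval".
    static final long UNLIMITED = -1;

    // Evaluates the documented formula, then applies the optional cap.
    // Assumption: the first retry is scheduled with numberOfTries == 2.
    static long retryIntervalSeconds(long initialRetryIntervalSeconds,
                                     double backoffCoefficient,
                                     long maximumRetryIntervalSeconds,
                                     int numberOfTries) {
        long interval = (long) (initialRetryIntervalSeconds
                * Math.pow(backoffCoefficient, numberOfTries - 2));
        if (maximumRetryIntervalSeconds != UNLIMITED) {
            interval = Math.min(interval, maximumRetryIntervalSeconds);
        }
        return interval;
    }

    public static void main(String[] args) {
        // Initial interval 5 s, coefficient 2.0, capped at 30 s.
        for (int numberOfTries = 2; numberOfTries <= 6; numberOfTries++) {
            System.out.println("numberOfTries=" + numberOfTries + " -> "
                    + retryIntervalSeconds(5, 2.0, 30, numberOfTries) + " s");
        }
        // Intervals printed: 5, 10, 20, 30, 30
    }
}
```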
The following sections describe the various ways that you can implement an exponential retry strategy.
Exponential Retry with @ExponentialRetry
The simplest way to implement an exponential retry strategy for an activity is to apply an @ExponentialRetry annotation to the activity in the interface definition. If the activity fails, the framework handles the retry process automatically, based on the specified option values. The basic pattern is:
1. Apply @ExponentialRetry to the appropriate activities and specify the retry configuration.
2. If an annotated activity fails, the framework automatically retries the activity according to the configuration specified by the annotation's arguments.
The ExponentialRetryAnnotationWorkflow workflow worker implements the exponential retry strategy by using an @ExponentialRetry annotation. It uses an unreliableActivity activity whose interface is defined in ExponentialRetryAnnotationActivities, as follows:
@Activities(version = "1.0")
@ActivityRegistrationOptions(
    defaultTaskScheduleToStartTimeoutSeconds = 30,
    defaultTaskStartToCloseTimeoutSeconds = 30)
public interface ExponentialRetryAnnotationActivities {
    @ExponentialRetry(
        initialRetryIntervalSeconds = 5,
        maximumAttempts = 5,
        exceptionsToRetry = IllegalStateException.class)
    public void unreliableActivity();
}
The @ExponentialRetry options specify the following strategy:
- Retry only if the activity throws IllegalStateException.
- Use an initial wait time of 5 seconds.
- Make no more than 5 retry attempts.
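The effect of restricting retries to particular exception types can be pictured with the following plain-Java sketch. It is not the framework's code. The class name and the shouldRetry helper are hypothetical, and the sketch assumes that an exception matches a listed type when it is an instance of that type and that the excluded list is consulted first.

```java
import java.util.Collections;
import java.util.List;

class RetryFilterSketch {
    // Returns true if the failure is eligible for a retry attempt.
    // Assumption: excluded types take precedence over retried types.
    static boolean shouldRetry(Throwable failure,
                               List<Class<? extends Throwable>> exceptionsToRetry,
                               List<Class<? extends Throwable>> exceptionsToExclude) {
        for (Class<? extends Throwable> excluded : exceptionsToExclude) {
            if (excluded.isInstance(failure)) {
                return false;
            }
        }
        for (Class<? extends Throwable> retried : exceptionsToRetry) {
            if (retried.isInstance(failure)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors exceptionsToRetry = IllegalStateException.class with nothing excluded.
        List<Class<? extends Throwable>> exceptionsToRetry =
                Collections.singletonList(IllegalStateException.class);
        List<Class<? extends Throwable>> exceptionsToExclude = Collections.emptyList();

        System.out.println(shouldRetry(new IllegalStateException("failed"),
                exceptionsToRetry, exceptionsToExclude));    // prints "true"
        System.out.println(shouldRetry(new IllegalArgumentException("bad input"),
                exceptionsToRetry, exceptionsToExclude));    // prints "false"
    }
}
```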
The workflow interface is implemented in RetryWorkflow and has one method, process, which is the workflow's entry point. The workflow worker is implemented in ExponentialRetryAnnotationWorkflowImpl, as follows:
public class ExponentialRetryAnnotationWorkflowImpl implements RetryWorkflow {
    public void process() {
        handleUnreliableActivity();
    }

    public void handleUnreliableActivity() {
        client.unreliableActivity();
    }
}
The workflow works as follows:
1. process runs the synchronous handleUnreliableActivity method.
2. handleUnreliableActivity executes the unreliableActivity activity.
If the activity fails by throwing IllegalStateException, the framework automatically runs the retry strategy specified in ExponentialRetryAnnotationActivities.
Exponential Retry with the RetryDecorator Class
@ExponentialRetry is simple to use. However, the configuration is static and set at compile time, so the framework uses the same retry strategy every time the activity fails. You can implement a more flexible exponential retry strategy by using the RetryDecorator class, which allows you to specify the configuration at run time and change it as needed. The basic pattern is:
1. Create and configure an ExponentialRetryPolicy object that specifies the retry configuration.
2. Create a RetryDecorator object and pass the ExponentialRetryPolicy object from Step 1 to the constructor.
3. Apply the decorator object to the activity by passing the activity client's class and the client object to the RetryDecorator object's decorate method.
4. Execute the activity.
If the activity fails, the framework retries the activity according to the ExponentialRetryPolicy object's configuration. You can change the retry configuration as needed by modifying this object.
Note
The @ExponentialRetry annotation and the RetryDecorator class are mutually exclusive. You can't use RetryDecorator to dynamically override a retry policy specified by an @ExponentialRetry annotation.
The following workflow implementation shows how to use the RetryDecorator class to implement an exponential retry strategy. It uses an unreliableActivity activity that doesn't have an @ExponentialRetry annotation. The workflow interface is implemented in RetryWorkflow and has one method, process, which is the workflow's entry point. The workflow worker is implemented in DecoratorRetryWorkflowImpl, as follows:
public class DecoratorRetryWorkflowImpl implements RetryWorkflow {
    ...
    public void process() {
        long initialRetryIntervalSeconds = 5;
        int maximumAttempts = 5;
        ExponentialRetryPolicy retryPolicy =
            new ExponentialRetryPolicy(initialRetryIntervalSeconds)
                .withMaximumAttempts(maximumAttempts);

        Decorator retryDecorator = new RetryDecorator(retryPolicy);
        client = retryDecorator.decorate(RetryActivitiesClient.class, client);
        handleUnreliableActivity();
    }

    public void handleUnreliableActivity() {
        client.unreliableActivity();
    }
}
The workflow works as follows:
1. process creates and configures an ExponentialRetryPolicy object by:
   - Passing the initial retry interval to the constructor.
   - Calling the object's withMaximumAttempts method to set the maximum number of attempts to 5. ExponentialRetryPolicy exposes other with methods that you can use to specify other configuration options.
2. process creates a RetryDecorator object named retryDecorator and passes the ExponentialRetryPolicy object from Step 1 to the constructor.
3. process applies the decorator to the activity by calling the retryDecorator.decorate method and passing it the activity client's class and the client object.
4. handleUnreliableActivity executes the activity.
If the activity fails, the framework retries it according to the configuration specified in Step 1.
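The idea of decorating a client so that every call through it is retried can be sketched in plain Java with a dynamic proxy. This is an analogy only and says nothing about how RetryDecorator is implemented. The RetryDecoratorSketch class and its nested Activities interface are hypothetical, the sketch treats maximumAttempts as the total number of calls, and it retries immediately instead of waiting for a back-off interval.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

class RetryDecoratorSketch {
    // Hypothetical stand-in for an activities client interface.
    interface Activities {
        void unreliableActivity();
    }

    private final int maximumAttempts;

    RetryDecoratorSketch(int maximumAttempts) {
        this.maximumAttempts = maximumAttempts;
    }

    // Returns an object that implements iface and retries each call made on target.
    <T> T decorate(Class<T> iface, T target) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (p, method, args) -> {
                    for (int attempt = 1; ; attempt++) {
                        try {
                            return method.invoke(target, args);
                        } catch (InvocationTargetException e) {
                            if (attempt >= maximumAttempts) {
                                throw e.getCause(); // attempts exhausted: rethrow the failure
                            }
                            // A real policy would wait for the back-off interval here.
                        }
                    }
                });
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical client that fails twice and then completes.
        Activities client = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("Intentional failure");
            }
        };
        client = new RetryDecoratorSketch(5).decorate(Activities.class, client);
        client.unreliableActivity();
        System.out.println("Calls made: " + calls[0]); // prints "Calls made: 3"
    }
}
```

As in the workflow above, the calling code is unchanged after decoration: it still calls client.unreliableActivity(), and the retry behavior comes entirely from the wrapper.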
Note
Several of the ExponentialRetryPolicy class's with methods have a corresponding set method that you can call to modify the corresponding configuration option at any time: setBackoffCoefficient, setMaximumAttempts, setMaximumRetryIntervalSeconds, and setMaximumRetryExpirationIntervalSeconds.
Exponential Retry with the AsyncRetryingExecutor Class
The RetryDecorator class provides more flexibility in configuring the retry process than @ExponentialRetry, but the framework still runs the retry attempts automatically, based on the ExponentialRetryPolicy object's current configuration. A more flexible approach is to use the AsyncRetryingExecutor class. It lets you configure the retry process at run time, and the framework calls a user-implemented AsyncRunnable.run method to run each retry attempt instead of simply executing the activity.
The basic pattern is:
1. Create and configure an ExponentialRetryPolicy object to specify the retry configuration.
2. Create an AsyncRetryingExecutor object, and pass it the ExponentialRetryPolicy object and an instance of the workflow clock.
3. Implement an anonymous nested TryCatch or TryCatchFinally class.
4. Implement an anonymous AsyncRunnable class and override the run method to implement custom code for running the activity.
5. Override doTry to call the AsyncRetryingExecutor object's execute method and pass it the AsyncRunnable object from Step 4. The AsyncRetryingExecutor object calls AsyncRunnable.run to run the activity.
6. If the activity fails, the AsyncRetryingExecutor object calls the AsyncRunnable.run method again, according to the retry policy specified in Step 1.
The following workflow shows how to use the AsyncRetryingExecutor class to implement an exponential retry strategy. It uses the same unreliableActivity activity as the DecoratorRetryWorkflow workflow discussed earlier. The workflow interface is implemented in RetryWorkflow and has one method, process, which is the workflow's entry point. The workflow worker is implemented in AsyncExecutorRetryWorkflowImpl, as follows:
public class AsyncExecutorRetryWorkflowImpl implements RetryWorkflow {
    private final RetryActivitiesClient client = new RetryActivitiesClientImpl();
    private final DecisionContextProvider contextProvider =
        new DecisionContextProviderImpl();
    private final WorkflowClock clock =
        contextProvider.getDecisionContext().getWorkflowClock();

    public void process() {
        long initialRetryIntervalSeconds = 5;
        int maximumAttempts = 5;
        handleUnreliableActivity(initialRetryIntervalSeconds, maximumAttempts);
    }

    public void handleUnreliableActivity(long initialRetryIntervalSeconds,
                                         int maximumAttempts) {
        ExponentialRetryPolicy retryPolicy =
            new ExponentialRetryPolicy(initialRetryIntervalSeconds)
                .withMaximumAttempts(maximumAttempts);
        final AsyncExecutor executor = new AsyncRetryingExecutor(retryPolicy, clock);

        new TryCatch() {
            @Override
            protected void doTry() throws Throwable {
                executor.execute(new AsyncRunnable() {
                    @Override
                    public void run() throws Throwable {
                        client.unreliableActivity();
                    }
                });
            }

            @Override
            protected void doCatch(Throwable e) throws Throwable {
            }
        };
    }
}
The workflow works as follows:
1. process calls the handleUnreliableActivity method and passes it the configuration settings.
2. handleUnreliableActivity uses the configuration settings from Step 1 to create an ExponentialRetryPolicy object, retryPolicy.
3. handleUnreliableActivity creates an AsyncRetryingExecutor object, executor, and passes the ExponentialRetryPolicy object from Step 2 and an instance of the workflow clock to the constructor.
4. handleUnreliableActivity implements an anonymous nested TryCatch class and overrides the doTry and doCatch methods to run the retry attempts and handle any exceptions.
5. doTry creates an anonymous AsyncRunnable class and overrides the run method to implement custom code to execute unreliableActivity. For simplicity, run just executes the activity, but you can implement more sophisticated approaches as appropriate.
6. doTry calls executor.execute and passes it the AsyncRunnable object. execute calls the AsyncRunnable object's run method to run the activity.
7. If the activity fails, executor calls run again, according to the retryPolicy object configuration.
For more discussion of how to use the TryCatch class to handle errors, see AWS Flow Framework for Java Exceptions.
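The division of labor in this approach, where the executor owns the retry schedule and user code owns what happens in each attempt, can be illustrated with a plain-Java sketch. It is synchronous and is not the framework's implementation: RetryingExecutorSketch and its Attempt interface are hypothetical, the back-off coefficient is fixed at 2.0, maximumAttempts is treated as the total number of attempts, and the delays are only recorded rather than waited on a workflow clock.

```java
import java.util.ArrayList;
import java.util.List;

class RetryingExecutorSketch {
    // Hypothetical counterpart of AsyncRunnable: user code that performs one attempt.
    interface Attempt {
        void run();
    }

    private final long initialRetryIntervalSeconds;
    private final int maximumAttempts;
    // Delays that a real executor would wait before each retry.
    final List<Long> scheduledDelaysSeconds = new ArrayList<>();

    RetryingExecutorSketch(long initialRetryIntervalSeconds, int maximumAttempts) {
        this.initialRetryIntervalSeconds = initialRetryIntervalSeconds;
        this.maximumAttempts = maximumAttempts;
    }

    // Calls attempt.run() until it succeeds or the attempts are exhausted.
    void execute(Attempt attempt) {
        for (int numberOfTries = 1; ; numberOfTries++) {
            try {
                attempt.run();
                return;
            } catch (RuntimeException e) {
                if (numberOfTries >= maximumAttempts) {
                    throw e; // policy exhausted: surface the failure to the caller
                }
                // The next attempt is number numberOfTries + 1, so the exponent
                // (next attempt - 2) equals numberOfTries - 1.
                long delay = (long) (initialRetryIntervalSeconds
                        * Math.pow(2.0, numberOfTries - 1));
                scheduledDelaysSeconds.add(delay);
            }
        }
    }

    public static void main(String[] args) {
        RetryingExecutorSketch executor = new RetryingExecutorSketch(5, 5);
        int[] calls = {0};
        executor.execute(() -> {
            // Custom per-attempt code goes here; this version fails three times.
            System.out.println("Attempt " + (++calls[0]));
            if (calls[0] <= 3) {
                throw new IllegalStateException("Intentional failure");
            }
        });
        System.out.println("Delays: " + executor.scheduledDelaysSeconds); // prints "Delays: [5, 10, 20]"
    }
}
```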
Custom Retry Strategy
The most flexible approach to retrying failed activities is a custom strategy, which recursively calls an asynchronous method that runs the retry attempt, much like the retry-until-success strategy. However, instead of simply running the activity again, you implement custom logic that decides whether and how to run each successive retry attempt. The basic pattern is:
1. Create a Settable<T> status object, which is used to indicate whether the activity failed.
2. Implement a nested TryCatch or TryCatchFinally class.
3. doTry executes the activity.
4. If the activity fails, doCatch sets the status object to indicate that the activity failed.
5. Call an asynchronous failure handling method and pass it the status object. The method defers execution until TryCatch or TryCatchFinally completes.
6. The failure handling method decides whether to retry the activity, and if so, when.
The following workflow shows how to implement a custom retry strategy. It uses the same unreliableActivity activity as the DecoratorRetryWorkflow and AsyncExecutorRetryWorkflow workflows. The workflow interface is implemented in RetryWorkflow and has one method, process, which is the workflow's entry point. The workflow worker is implemented in CustomLogicRetryWorkflowImpl, as follows:
public class CustomLogicRetryWorkflowImpl implements RetryWorkflow {
    ...
    public void process() {
        callActivityWithRetry();
    }

    @Asynchronous
    public void callActivityWithRetry() {
        final Settable<Throwable> failure = new Settable<Throwable>();
        new TryCatchFinally() {
            protected void doTry() throws Throwable {
                client.unreliableActivity();
            }

            protected void doCatch(Throwable e) {
                failure.set(e);
            }

            protected void doFinally() throws Throwable {
                if (!failure.isReady()) {
                    failure.set(null);
                }
            }
        };
        retryOnFailure(failure);
    }

    @Asynchronous
    private void retryOnFailure(Promise<Throwable> failureP) {
        Throwable failure = failureP.get();
        if (failure != null && shouldRetry(failure)) {
            callActivityWithRetry();
        }
    }

    protected Boolean shouldRetry(Throwable e) {
        // custom logic to decide to retry the activity or not
        return true;
    }
}
The workflow works as follows:
1. process calls the asynchronous callActivityWithRetry method.
2. callActivityWithRetry creates a Settable<Throwable> object named failure, which is used to indicate whether the activity has failed. Settable<T> is derived from Promise<T> and works much the same way, but you set a Settable<T> object's value manually.
3. callActivityWithRetry implements an anonymous nested TryCatchFinally class to handle any exceptions that are thrown by unreliableActivity. For more discussion of how to handle exceptions thrown by asynchronous code, see AWS Flow Framework for Java Exceptions.
4. doTry executes unreliableActivity.
5. If unreliableActivity throws an exception, the framework calls doCatch and passes it the exception object. doCatch sets failure to the exception object, which indicates that the activity failed and puts the object in a ready state.
6. doFinally checks whether failure is ready, which will be true only if failure was set by doCatch.
   - If failure is ready, doFinally does nothing.
   - If failure isn't ready, the activity completed and doFinally sets failure to null.
7. callActivityWithRetry calls the asynchronous retryOnFailure method and passes it failure. Because failure is a Settable<T> type, retryOnFailure defers execution until failure is ready, which occurs after TryCatchFinally completes.
8. retryOnFailure gets the value from failure.
   - If failure is set to null, the attempt was successful. retryOnFailure does nothing, which terminates the retry process.
   - If failure is set to an exception object and shouldRetry returns true, retryOnFailure calls callActivityWithRetry to retry the activity. shouldRetry implements custom logic to decide whether to retry a failed activity. For simplicity, shouldRetry always returns true and retryOnFailure executes the activity immediately, but you can implement more sophisticated logic as needed.
9. Steps 2 - 8 repeat until unreliableActivity completes or shouldRetry decides to stop the process.
Note
doCatch doesn't handle the retry process; it simply sets failure to indicate that the activity failed. The retry process is handled by the asynchronous retryOnFailure method, which defers execution until TryCatchFinally completes. The reason for this approach is that, if you retry an activity in doCatch, you can't cancel it. Retrying the activity in retryOnFailure allows you to execute cancellable activities.
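The sample's shouldRetry always returns true, so it helps to see what more selective logic might look like. The following plain-Java sketch decides both whether and when to retry by returning a delay or a "do not retry" marker. Everything in it is a hypothetical illustration: the class name, the DO_NOT_RETRY constant, the rule that IllegalArgumentException is never retried, the limit of three failed attempts, and the linear 10-second step are arbitrary choices, not framework behavior.

```java
class CustomRetryLogicSketch {
    // Hypothetical marker meaning "stop retrying".
    static final long DO_NOT_RETRY = -1;

    // Returns the number of seconds to wait before the next attempt,
    // or DO_NOT_RETRY if the activity should not be retried.
    static long nextDelaySeconds(Throwable failure, int failedAttempts) {
        if (failure instanceof IllegalArgumentException) {
            return DO_NOT_RETRY; // bad input is unlikely to fix itself
        }
        if (failedAttempts >= 3) {
            return DO_NOT_RETRY; // give up after three failed attempts
        }
        return 10L * failedAttempts; // linear back-off: 10 s, then 20 s
    }

    public static void main(String[] args) {
        Throwable transientFailure = new IllegalStateException("Intentional failure");
        System.out.println(nextDelaySeconds(transientFailure, 1)); // prints "10"
        System.out.println(nextDelaySeconds(transientFailure, 2)); // prints "20"
        System.out.println(nextDelaySeconds(transientFailure, 3)); // prints "-1"
        System.out.println(nextDelaySeconds(new IllegalArgumentException("bad input"), 1)); // prints "-1"
    }
}
```

In a workflow such as the one above, logic of this kind would take the place of the body of shouldRetry, with retryOnFailure acting on the result.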