

# Array jobs
<a name="array_jobs"></a>

An array job is a job that shares common parameters, such as the job definition, vCPUs, and memory. It runs as a collection of related yet separate basic jobs that might be distributed across multiple hosts and might run concurrently. Array jobs are the most efficient way to run embarrassingly parallel jobs such as Monte Carlo simulations, parametric sweeps, or large rendering jobs.

AWS Batch array jobs are submitted just like regular jobs. However, you specify an array size (between 2 and 10,000) to define how many child jobs should run in the array. For example, if you submit a job with an array size of 1,000, a single job runs and spawns 1,000 child jobs. The array job is a reference or pointer that manages all the child jobs. This way, you can submit large workloads with a single request. The timeout specified in the `attemptDurationSeconds` parameter applies to each child job. The parent array job does not have a timeout.
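
For example, a minimal array job submission with the AWS CLI might look like the following sketch. The job name, job queue, and job definition names are placeholders that you replace with your own values.

```
$ aws batch submit-job \
    --job-name example-array-job \
    --job-queue example-job-queue \
    --job-definition example-job-definition \
    --array-properties size=1000 \
    --timeout attemptDurationSeconds=600
```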

When you submit an array job, the parent array job gets a normal AWS Batch job ID. Each child job has the same base ID. However, the array index for the child job is appended to the end of the parent ID, such as `example_job_ID:0` for the first child job of the array. 

The parent array job can enter a `SUBMITTED`, `PENDING`, `FAILED`, or `SUCCEEDED` status. An array parent job is updated to `PENDING` when any child job is updated to `RUNNABLE`. For more information about job dependencies, see [Job dependencies](job_dependencies.md).

At runtime, the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable is set to the container's corresponding job array index number. The first array job index is numbered `0`, and subsequent indexes are in ascending order (for example, 1, 2, and 3). You can use this index value to control how your array job children are differentiated. For more information, see [Use the array job index to control job differentiation](array_index_example.md).
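
For example, a child job's container might use the index to select its own input object, as in the following hypothetical sketch. The bucket name, object naming scheme, and `process-input` command are assumptions for illustration only.

```
#!/bin/sh
# Hypothetical sketch: each child job copies the input shard that matches its
# array index and processes only that shard.
aws s3 cp "s3://amzn-s3-demo-source-bucket/shard-${AWS_BATCH_JOB_ARRAY_INDEX}.csv" /tmp/input.csv
process-input /tmp/input.csv
```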

For array job dependencies, you can specify a type for a dependency, such as `SEQUENTIAL` or `N_TO_N`. You can specify a `SEQUENTIAL` type dependency (without specifying a job ID) so that each child array job completes sequentially, starting at index 0. For example, if you submit an array job with an array size of 100, and specify a dependency with type `SEQUENTIAL`, 100 child jobs are spawned sequentially, where the first child job must succeed before the next child job starts. The figure below shows Job A, an array job with an array size of 10. Each job in Job A's child index is dependent on the previous child job. Job A:1 can't start until job A:0 finishes.

![\[Flowchart showing Job-A with sequential child jobs A:0 through A:9, connected by arrows.\]](http://docs.aws.amazon.com/batch/latest/userguide/images/sequential-dep.png)
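
In a `SubmitJob` request, a `SEQUENTIAL` dependency like Job A's is declared without a job ID, as in the following sketch. The job name, queue, and job definition names are placeholders.

```
{
    "jobName": "JobA",
    "jobQueue": "example-job-queue",
    "jobDefinition": "example-job-definition:1",
    "arrayProperties": {
        "size": 10
    },
    "dependsOn": [
        {
            "type": "SEQUENTIAL"
        }
    ]
}
```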


You can also specify an `N_TO_N` type dependency with a job ID for array jobs. That way, each index child of this job must wait for the corresponding index child of each dependency to complete before it can begin. The following figure shows Job A and Job B, two array jobs with an array size of 10,000 each. Each job in Job B's child index is dependent on the corresponding index in Job A. Job B:1 can't start until job A:1 finishes. 

![\[Two array jobs, Job-A and Job-B, with 10,000 indexed tasks each, showing N_TO_N dependency.\]](http://docs.aws.amazon.com/batch/latest/userguide/images/n-to-n-dep.png)
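
In a `SubmitJob` request, this dependency is declared in the `dependsOn` list, as in the following sketch, where `example_job_ID` is a placeholder for the job ID of the array job that Job B depends on.

```
"dependsOn": [
    {
        "jobId": "example_job_ID",
        "type": "N_TO_N"
    }
]
```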


If you cancel or terminate a parent array job, all of its child jobs are canceled or terminated with it. You can cancel or terminate individual child jobs (which moves them to the `FAILED` status) without affecting the other child jobs. However, if a child job fails (on its own, or because you manually canceled or terminated it), the parent job also fails. In this scenario, the parent job transitions to `FAILED` when all of its child jobs are complete.
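
For example, you can terminate a single child job by its indexed job ID without affecting its siblings. In the following AWS CLI sketch, the job ID and index are placeholders.

```
$ aws batch terminate-job \
    --job-id example_job_ID:42 \
    --reason "Terminating only child job 42"
```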

For more information about searching and filtering array jobs, see [Search for jobs in a job queue](searching-filtering-jobs.md).

**Topics**
+ [Example of an array job workflow](example_array_job.md)
+ [Use the array job index to control job differentiation](array_index_example.md)

# Example of an array job workflow
<a name="example_array_job"></a>

A common workflow for AWS Batch customers is to run a prerequisite setup job, run a series of commands against a large number of input tasks, and then conclude with a job that aggregates results and writes summary data to Amazon S3, DynamoDB, Amazon Redshift, or Aurora.

For example:
+ `JobA`: A standard, non-array job that performs a quick listing and metadata validation of objects in an Amazon S3 bucket, `BucketA`. The [SubmitJob](https://docs.aws.amazon.com/batch/latest/APIReference/API_SubmitJob.html) JSON syntax is as follows.

  ```
  {
      "jobName": "JobA",
      "jobQueue": "ProdQueue",
      "jobDefinition": "JobA-list-and-validate:1"
  }
  ```
+ `JobB`: An array job with 10,000 copies that depends on `JobA`. It runs CPU-intensive commands against each object in `BucketA` and uploads results to `BucketB`. The [SubmitJob](https://docs.aws.amazon.com/batch/latest/APIReference/API_SubmitJob.html) JSON syntax is as follows.

  ```
  {
      "jobName": "JobB",
      "jobQueue": "ProdQueue",
      "jobDefinition": "JobB-CPU-Intensive-Processing:1",
      "containerOverrides": {
          "resourceRequirements": [
              {
                  "type": "MEMORY",
                  "value": "4096"
              },
              {
                  "type": "VCPU",
                  "value": "32"
              }
          ]
      },
      "arrayProperties": {
          "size": 10000
      },
      "dependsOn": [
          {
              "jobId": "JobA_job_ID"
          }
      ]
  }
  ```
+ `JobC`: Another 10,000-copy array job that depends on `JobB` with an `N_TO_N` dependency model. It runs memory-intensive commands against each item in `BucketB`, writes metadata to DynamoDB, and uploads the resulting output to `BucketC`. The [SubmitJob](https://docs.aws.amazon.com/batch/latest/APIReference/API_SubmitJob.html) JSON syntax is as follows.

  ```
  {
      "jobName": "JobC",
      "jobQueue": "ProdQueue",
      "jobDefinition": "JobC-Memory-Intensive-Processing:1",
      "containerOverrides": {
          "resourceRequirements": [
              {
                  "type": "MEMORY",
                  "value": "32768"
              },
              {
                  "type": "VCPU",
                  "value": "1"
              }
          ]
      },
      "arrayProperties": {
          "size": 10000
      },
      "dependsOn": [
          {
              "jobId": "JobB_job_ID",
              "type": "N_TO_N"
          }
      ]
  }
  ```
+ `JobD`: An array job that performs 10 validation steps that each need to query DynamoDB and might interact with any of the preceding Amazon S3 buckets. Each of the steps in `JobD` runs the same command. However, the behavior is different based on the value of the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable within the job's container. These validation steps run sequentially (for example, `JobD:0` and then `JobD:1`). The [SubmitJob](https://docs.aws.amazon.com/batch/latest/APIReference/API_SubmitJob.html) JSON syntax is as follows.

  ```
  {
      "jobName": "JobD",
      "jobQueue": "ProdQueue",
      "jobDefinition": "JobD-Sequential-Validation:1",
      "containerOverrides": {
          "resourceRequirements": [
              {
                  "type": "MEMORY",
                  "value": "32768"
              },
              {
                  "type": "VCPU",
                  "value": "1"
              }
          ]
      },
      "arrayProperties": {
          "size": 10
      },
      "dependsOn": [
          {
              "jobId": "JobC_job_ID"
          },
          {
              "type": "SEQUENTIAL"
          }
      ]
  }
  ```
+ `JobE`: A final, non-array job that performs some simple cleanup operations and sends an Amazon SNS notification with a message that the pipeline has completed and a link to the output URL. The [SubmitJob](https://docs.aws.amazon.com/batch/latest/APIReference/API_SubmitJob.html) JSON syntax is as follows.

  ```
  {
      "jobName": "JobE",
      "jobQueue": "ProdQueue",
      "jobDefinition": "JobE-Cleanup-and-Notification:1",
      "parameters": {
          "SourceBucket": "s3://amzn-s3-demo-source-bucket",
          "Recipient": "pipeline-notifications@mycompany.com"
      },
      "dependsOn": [
          {
              "jobId": "JobD_job_ID"
          }
      ]
  }
  ```

# Use the array job index to control job differentiation
<a name="array_index_example"></a>

This tutorial describes how to use the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable to differentiate child jobs. Each child job has this variable set to its own index number. The example uses the child job's index number to read a specific line in a file, and then substitutes the parameter associated with that line number into a command inside the job's container. The result is that multiple AWS Batch jobs can run the same Docker image and command arguments, yet produce different results because the array job index is used as a modifier.

In this tutorial, you create a text file that lists all of the colors of the rainbow, each on its own line. Then, you create an entrypoint script for a Docker container that converts a child job's index into a line number in the color file (the index starts at zero, but line numbers start at one). Next, you create a Dockerfile that copies the color file and entrypoint script to the container image and sets the image's `ENTRYPOINT` to the entrypoint script. You build the Dockerfile and resources into a Docker image and push it to Amazon ECR. Finally, you register a job definition that uses your new container image, submit an AWS Batch array job with that job definition, and view the results.

**Topics**
+ [Prerequisites](array-tutorial-prereqs.md)
+ [Build a container image](build-index-container.md)
+ [Push your image to Amazon ECR](push-array-image.md)
+ [Create and register a job definition](create-array-job-def.md)
+ [Submit an AWS Batch array job](submit-array-job.md)
+ [View your array job logs](#array-tutorial-logs)

# Prerequisites
<a name="array-tutorial-prereqs"></a>

This tutorial workflow has the following prerequisites:
+ An AWS Batch compute environment. For more information, see [Create a compute environment](create-compute-environment.md).
+ An AWS Batch job queue and associated compute environment. For more information, see [Create a job queue](create-job-queue.md).
+ The AWS CLI installed on your local system. For more information, see [Installing the AWS Command Line Interface](https://docs.aws.amazon.com/cli/latest/userguide/installing.html) in the *AWS Command Line Interface User Guide*.
+ Docker installed on your local system. For more information, see [About Docker CE](https://docs.docker.com/install/) in the Docker documentation.

# Build a container image
<a name="build-index-container"></a>

You can use the `AWS_BATCH_JOB_ARRAY_INDEX` variable in the `command` parameter of a job definition. However, we recommend that you create a container image that uses the variable in an entrypoint script instead. This section describes how to create such a container image.

**To build your Docker container image**

1. Create a new directory to use as your Docker image workspace and navigate to it.
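
   For example (the directory name is arbitrary):

   ```
   $ mkdir print-color && cd print-color
   ```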

1. Create a file named `colors.txt` in your workspace directory and paste the following into it.

   ```
   red
   orange
   yellow
   green
   blue
   indigo
   violet
   ```

1. Create a file named `print-color.sh` in your workspace directory and paste the following into it.
**Note**  
The `LINE` variable is set to `AWS_BATCH_JOB_ARRAY_INDEX + 1` because the array index starts at 0, but line numbers start at 1. The `COLOR` variable is set to the color in `colors.txt` that's associated with that line number.

   ```
   #!/bin/sh
   LINE=$((AWS_BATCH_JOB_ARRAY_INDEX + 1))
   COLOR=$(sed -n ${LINE}p /tmp/colors.txt)
   echo My favorite color of the rainbow is $COLOR.
   ```

1. Create a file named `Dockerfile` in your workspace directory and paste the following content into it. This Dockerfile copies the previous files to your container and sets the entrypoint script to run when the container starts.

   ```
   FROM busybox
   COPY print-color.sh /tmp/print-color.sh
   COPY colors.txt /tmp/colors.txt
   RUN chmod +x /tmp/print-color.sh
   ENTRYPOINT /tmp/print-color.sh
   ```

1. Build your Docker image.

   ```
   $ docker build -t print-color .
   ```

1. Test your container with the following script. This script sets the `AWS_BATCH_JOB_ARRAY_INDEX` variable to 0 locally and then increments it to simulate what an array job with seven children does.

   ```
   $ AWS_BATCH_JOB_ARRAY_INDEX=0
   while [ $AWS_BATCH_JOB_ARRAY_INDEX -le 6 ]
   do
       docker run -e AWS_BATCH_JOB_ARRAY_INDEX=$AWS_BATCH_JOB_ARRAY_INDEX print-color
       AWS_BATCH_JOB_ARRAY_INDEX=$((AWS_BATCH_JOB_ARRAY_INDEX + 1))
   done
   ```

   The following is the output.

   ```
   My favorite color of the rainbow is red.
   My favorite color of the rainbow is orange.
   My favorite color of the rainbow is yellow.
   My favorite color of the rainbow is green.
   My favorite color of the rainbow is blue.
   My favorite color of the rainbow is indigo.
   My favorite color of the rainbow is violet.
   ```

# Push your image to Amazon ECR
<a name="push-array-image"></a>

Now that you've built and tested your Docker container, push it to an image repository. This example uses Amazon ECR, but you can use another registry, such as Docker Hub.

1. Create an Amazon ECR image repository to store your container image. This example only uses the AWS CLI, but you can also use the AWS Management Console. For more information, see [Creating a Repository](https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-create.html) in the *Amazon Elastic Container Registry User Guide*.

   ```
   $ aws ecr create-repository --repository-name print-color
   ```

1. Tag your `print-color` image with your Amazon ECR repository URI that was returned from the previous step.

   ```
   $ docker tag print-color aws_account_id.dkr.ecr.region.amazonaws.com/print-color
   ```

1. Log in to your Amazon ECR registry. For more information, see [Registry Authentication](https://docs.aws.amazon.com/AmazonECR/latest/userguide/Registries.html#registry_auth) in the *Amazon Elastic Container Registry User Guide*.

   ```
   $ aws ecr get-login-password \
       --region region | docker login \
       --username AWS \
       --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com
   ```

1. Push your image to Amazon ECR.

   ```
   $ docker push aws_account_id.dkr.ecr.region.amazonaws.com/print-color
   ```

# Create and register a job definition
<a name="create-array-job-def"></a>

Now that your Docker image is in an image registry, you can specify it in an AWS Batch job definition and use that job definition to run an array job. This example only uses the AWS CLI. However, you can also use the AWS Management Console. For more information, see [Create a single-node job definition](create-job-definition.md).

**To create a job definition**

1. Create a file named `print-color-job-def.json` in your workspace directory and paste the following into it. Replace the image repository URI with your own image's URI.

   ```
   {
     "jobDefinitionName": "print-color",
     "type": "container",
     "containerProperties": {
       "image": "aws_account_id.dkr.ecr.region.amazonaws.com/print-color",
       "resourceRequirements": [
           {
               "type": "MEMORY",
               "value": "250"
           },
           {
               "type": "VCPU",
               "value": "1"
           }
       ]
     }
   }
   ```

1. Register the job definition with AWS Batch.

   ```
   $ aws batch register-job-definition --cli-input-json file://print-color-job-def.json
   ```

# Submit an AWS Batch array job
<a name="submit-array-job"></a>

After you register your job definition, you can submit an AWS Batch array job that uses your new container image.

**To submit an AWS Batch array job**

1. Create a file named `print-color-job.json` in your workspace directory and paste the following into it.
**Note**  
This example uses the job queue mentioned in the [Prerequisites](array-tutorial-prereqs.md) section.

   ```
   {
     "jobName": "print-color",
     "jobQueue": "existing-job-queue",
     "arrayProperties": {
       "size": 7
     },
     "jobDefinition": "print-color"
   }
   ```

1. Submit the job to your AWS Batch job queue. Note the job ID that's returned in the output.

   ```
   $ aws batch submit-job --cli-input-json file://print-color-job.json
   ```
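
   The response includes the job name and the new job ID. The following is an illustrative sketch; your job ID will differ.

   ```
   {
       "jobName": "print-color",
       "jobId": "a1b2c3d4-example-job-id"
   }
   ```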

1. Describe the job's status and wait for the job to move to `SUCCEEDED`.
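
   For example, you can poll the status with the AWS CLI, replacing the placeholder with the job ID from the previous step. You can also describe an individual child job by appending its index, such as `example_job_ID:0`.

   ```
   $ aws batch describe-jobs --jobs example_job_ID
   ```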

## View your array job logs
<a name="array-tutorial-logs"></a>

After your job reaches the `SUCCEEDED` status, you can view the CloudWatch Logs from the job's container.

**To view your job's logs in CloudWatch Logs**

1. Open the AWS Batch console at [https://console.aws.amazon.com/batch/](https://console.aws.amazon.com/batch/).

1. In the left navigation pane, choose **Jobs**.

1. For **Job queue**, select a queue. 

1. In the **Status** section, choose **succeeded**.

1. To display all of the child jobs for your array job, select the job ID that was returned in the previous section.

1. To see the logs from the job's container, select one of the child jobs and choose **View logs**.  
![\[Array job container logs\]](http://docs.aws.amazon.com/batch/latest/userguide/images/array-logs.png)

1. View the other child jobs' logs. Each job returns a different color of the rainbow.