

# Amazon SageMaker geospatial capabilities
<a name="geospatial"></a>

**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. If you created an Amazon SageMaker AI domain prior to November 30, 2023, Studio Classic remains the default experience. Domains created after November 30, 2023 default to the new Studio experience.  
Amazon SageMaker geospatial features and resources are *only* available in Studio Classic. To learn more about setting up a domain and getting started with Studio, see [Getting started with Amazon SageMaker geospatial](geospatial-getting-started.md).

Amazon SageMaker geospatial capabilities make it easier for data scientists and machine learning (ML) engineers to build, train, and deploy ML models faster using geospatial data. You have access to open-source and third-party data, processing, and visualization tools that make it more efficient to prepare geospatial data for ML. You can increase your productivity by using purpose-built algorithms and pre-trained ML models to speed up model building and training, and use built-in visualization tools to explore prediction outputs on an interactive map and then collaborate across teams on insights and results.

**Note**  
Currently, SageMaker geospatial capabilities are only supported in the US West (Oregon) Region.  
If you don't see the SageMaker geospatial UI available in your current Studio Classic instance, check to make sure that you are currently in the US West (Oregon) Region.
<a name="why-use-geo"></a>
**Why use SageMaker geospatial capabilities?**  
You can use SageMaker geospatial capabilities to make predictions on geospatial data faster than do-it-yourself solutions. SageMaker geospatial capabilities make it easier to access geospatial data from your existing customer data lakes, open-source datasets, and other SageMaker geospatial data providers. SageMaker geospatial capabilities minimize the need for building custom infrastructure and data preprocessing functions by offering purpose-built algorithms for efficient data preparation, model training, and inference. You can also create and share custom visualizations and data with your company from Amazon SageMaker Studio Classic. SageMaker geospatial capabilities offer pre-trained models for common uses in agriculture, real estate, insurance, and financial services.

## How can I use SageMaker geospatial capabilities?
<a name="how-use-geo"></a>

You can use SageMaker geospatial capabilities in two ways.
+ Through the SageMaker geospatial UI, as part of the Amazon SageMaker Studio Classic UI.
+ Through a Studio Classic notebook instance that uses the **Geospatial 1.0** image.

**SageMaker AI has the following geospatial capabilities**
+ Use a purpose-built SageMaker geospatial image that supports both CPU- and GPU-based notebook instances, and that includes open-source libraries commonly used in geospatial machine learning workflows.
+ Use Amazon SageMaker Processing and the SageMaker geospatial container to run large-scale workloads with your own datasets, including soil, weather, climate, LiDAR, and commercial aerial and satellite imagery.
+ Run an [Earth Observation job](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-eoj.html) for raster data processing.
+ Run a [Vector Enrichment job](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-vej.html) to convert latitude and longitude into human-readable addresses, and to match noisy GPS traces to specific roads.
+ Use built-in [visualization tools](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-visualize.html) right in Studio Classic to interactively view geospatial data or model predictions on a map.

You can also use data from a collection of geospatial data providers. Currently, the available data collections include:
+ [USGS Landsat](https://www.usgs.gov/centers/eros/data-citation?qt-science_support_page_related_con=0#qt-science_support_page_related_con)
+ [Sentinel-1](https://sentinels.copernicus.eu/documents/247904/690755/Sentinel_Data_Legal_Notice)
+ [Sentinel-2](https://sentinel.esa.int/web/sentinel/missions/sentinel-2)
+ [Copernicus DEM](https://registry.opendata.aws/copernicus-dem/)
+ [NAIP: National Agriculture Imagery Program](https://registry.opendata.aws/naip/)

## Are you a first-time user of SageMaker geospatial?
<a name="first-time-geospatial-data"></a>

As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. New domains created after November 30, 2023 default to the Studio experience. Access to SageMaker geospatial is limited to Studio Classic. To learn more, see [Accessing SageMaker geospatial](access-studio-classic-geospatial.md).

If you're a first-time user of AWS or Amazon SageMaker AI, we recommend that you do the following:

1. **Create an AWS account.**

   To learn about setting up an AWS account and getting started with SageMaker AI, see [Complete Amazon SageMaker AI prerequisites](gs-set-up.md).

1. **Create a user role and execution role that work with SageMaker geospatial**.

   As a managed service, Amazon SageMaker geospatial capabilities perform operations on your behalf on the AWS hardware that SageMaker AI manages. A SageMaker AI execution role can perform only the operations that you grant it. To work with SageMaker geospatial capabilities, you must set up a user role and an execution role. For more information, see [SageMaker geospatial capabilities roles](sagemaker-geospatial-roles.md).

1. **Update your trust policy to include SageMaker geospatial**.

   SageMaker geospatial defines an additional service principal. To learn how to create or update your SageMaker AI execution role's trust policy, see [Adding the SageMaker geospatial service principal to an existing SageMaker AI execution role](sagemaker-geospatial-roles-pass-role.md).

1. **Set up an Amazon SageMaker AI domain to access Amazon SageMaker Studio Classic.**

   To use SageMaker geospatial, a domain is required. For domains created before November 30, 2023, the default experience is Studio Classic. Domains created after November 30, 2023 default to the Studio experience. To learn more about accessing Studio Classic from Studio, see [Accessing SageMaker geospatial](access-studio-classic-geospatial.md).

1. **Remember to shut down resources.**

   When you have finished using SageMaker geospatial capabilities, shut down the instance it runs on to avoid incurring additional charges. For more information, see [Shut Down Resources from Amazon SageMaker Studio Classic](notebooks-run-and-manage-shut-down.md). 

**Topics**
+ [How can I use SageMaker geospatial capabilities?](#how-use-geo)
+ [Are you a first-time user of SageMaker geospatial?](#first-time-geospatial-data)
+ [Getting started with Amazon SageMaker geospatial](geospatial-getting-started.md)
+ [Using processing jobs for custom geospatial workloads](geospatial-custom-operations.md)
+ [Earth Observation Jobs](geospatial-eoj.md)
+ [Vector Enrichment Jobs](geospatial-vej.md)
+ [Visualization Using SageMaker geospatial capabilities](geospatial-visualize.md)
+ [Amazon SageMaker geospatial Map SDK](geospatial-notebook-sdk.md)
+ [SageMaker geospatial capabilities FAQ](geospatial-faq.md)
+ [SageMaker geospatial Security and Permissions](geospatial-security-general.md)
+ [Types of compute instances](geospatial-instances.md)
+ [Data collections](geospatial-data-collections.md)

# Getting started with Amazon SageMaker geospatial
<a name="geospatial-getting-started"></a>

SageMaker geospatial provides a purpose-built **Image** and **Instance type** for Amazon SageMaker Studio Classic notebooks. You can use either CPU- or GPU-enabled notebooks with the SageMaker geospatial **Image**. You can also visualize your geospatial data using a purpose-built visualizer. Furthermore, SageMaker geospatial provides APIs that allow you to query raster data collections. You can also use pre-trained models for geospatial data analysis, reverse geocoding, and map matching.

**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. If you created an Amazon SageMaker AI domain prior to November 30, 2023, Studio Classic remains the default experience. Domains created after November 30, 2023 default to the new Studio experience.

To access and get started using Amazon SageMaker geospatial, do the following:

**Topics**
+ [Accessing SageMaker geospatial](access-studio-classic-geospatial.md)
+ [Create an Amazon SageMaker Studio Classic notebook using the geospatial image](geospatial-launch-notebook.md)
+ [Access the Sentinel-2 raster data collection and create an earth observation job to perform land segmentation](geospatial-demo.md)

# Accessing SageMaker geospatial
<a name="access-studio-classic-geospatial"></a>

**Note**  
Currently, SageMaker geospatial capabilities are only supported in the US West (Oregon) Region and in Studio Classic.  
If you don't see the SageMaker geospatial UI available in your current Studio Classic instance, check to make sure that you are currently in the US West (Oregon) Region.

A domain is required to access SageMaker geospatial. If you created a domain prior to November 30, 2023, the default experience is Studio Classic.

If you created a domain after November 30, 2023, or if you have migrated to Studio, you can use the following procedure to activate Studio Classic from within Studio and use the SageMaker geospatial features.

To learn more about creating a domain, see [Onboard to Amazon SageMaker AI domain](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-studio-onboard.html).

**To access Studio Classic from Studio**

1. Launch Amazon SageMaker Studio.

1. Under **Applications**, choose **Studio Classic**.

1. Then, choose **Create Studio Classic space**.

1. On the **Create Studio Classic space** page, enter a **Name**.

1. Disable the **Share with my domain** option. SageMaker geospatial is not available in shared domains.

1. Then choose **Create space**.

When successful, the **Status** changes to **Updating**. When your Studio Classic application is ready to be used, the status changes to **Stopped**.

To start your Studio Classic application, choose **Run**.

# Create an Amazon SageMaker Studio Classic notebook using the geospatial image
<a name="geospatial-launch-notebook"></a>

**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the Studio Classic application. For information about using the updated Studio experience, see [Amazon SageMaker Studio](studio-updated.md).  
Studio Classic is still maintained for existing workloads but is no longer available for onboarding. You can only stop or delete existing Studio Classic applications and cannot create new ones. We recommend that you [migrate your workload to the new Studio experience](studio-updated-migrate.md).

**Note**  
Currently, SageMaker geospatial is only supported in the US West (Oregon) Region.  
If you don't see SageMaker geospatial available in your current domain or notebook instance, make sure that you're currently in the US West (Oregon) Region.

Use the following procedure to create a Studio Classic notebook with the SageMaker geospatial image. If your default experience is Studio, see [Accessing SageMaker geospatial](access-studio-classic-geospatial.md) to learn about starting a Studio Classic application.

**To create a Studio Classic notebook with the SageMaker geospatial image**

1. Launch Studio Classic.

1. Choose **Home** in the menu bar.

1. Under **Quick actions**, choose **Open Launcher**.

1. When the **Launcher** dialog box opens, choose **Change environment** under **Notebooks and compute resources**.

1. When the **Change environment** dialog box opens, choose the **Image** dropdown, and then choose or type **Geospatial 1.0**.  
![\[A dialog box showing the correct geospatial image and instance type selected.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/geospatial-environment-dialogue.png)

1. Next, choose an **Instance type** from the dropdown.

   SageMaker geospatial supports two types of notebook instances: CPU and GPU. The supported CPU instance is called **ml.geospatial.interactive**. Any of the G5-family of GPU instances can be used with the Geospatial 1.0 image.
**Note**  
If you receive a `ResourceLimitExceeded` error when attempting to start a GPU-based instance, you need to request a quota increase. To get started with a quota increase request, see [Requesting a quota increase](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html) in the *Service Quotas User Guide*.

1. Choose **Select**.

1. Choose **Create notebook**.

After creating a notebook, to learn more about SageMaker geospatial, try the [SageMaker geospatial tutorial](geospatial-demo.md). It shows you how to process Sentinel-2 image data and perform land segmentation on it using the earth observation jobs API. 

# Access the Sentinel-2 raster data collection and create an earth observation job to perform land segmentation
<a name="geospatial-demo"></a>

This Python-based tutorial uses the SDK for Python (Boto3) and an Amazon SageMaker Studio Classic notebook. To complete this demo successfully, make sure that you have the required AWS Identity and Access Management (IAM) permissions to use SageMaker geospatial and Studio Classic. SageMaker geospatial requires that you have a user, group, or role that can access Studio Classic. You must also have a SageMaker AI execution role that specifies the SageMaker geospatial service principal, `sagemaker-geospatial.amazonaws.com`, in its trust policy.

To learn more about these requirements, see [SageMaker geospatial IAM roles](sagemaker-geospatial-roles.md).

This tutorial shows you how to use the SageMaker geospatial API to complete the following tasks:
+ Find the available raster data collections with `list_raster_data_collections`.
+ Search a specified raster data collection by using `search_raster_data_collection`.
+ Create an earth observation job (EOJ) by using `start_earth_observation_job`.

## Using `list_raster_data_collections` to find available data collections
<a name="demo-use-list-rdc"></a>

SageMaker geospatial supports multiple raster data collections. To learn more about the available data collections, see [Data collections](geospatial-data-collections.md).

This demo uses satellite data that's collected from [Sentinel-2 Cloud-Optimized GeoTIFF](https://registry.opendata.aws/sentinel-2-l2a-cogs/) satellites. These satellites provide global coverage of Earth's land surface every five days. In addition to collecting surface images of Earth, the Sentinel-2 satellites also collect data across a variety of spectral bands.

To search an area of interest (AOI), you need the ARN that's associated with the Sentinel-2 satellite data. To find the available data collections and their associated ARNs in your AWS Region, use the `list_raster_data_collections` API operation.

Because the response can be paginated, you must use the `get_paginator` operation to return all of the relevant data:

```
import boto3
import sagemaker

## SageMaker geospatial is currently only available in us-west-2
session = boto3.Session(region_name='us-west-2')
execution_role = sagemaker.get_execution_role()

## Creates a SageMaker geospatial client instance
sm_geo_client = session.client(service_name="sagemaker-geospatial")

# Creates a reusable paginator for the list_raster_data_collections API operation
paginator = sm_geo_client.get_paginator("list_raster_data_collections")

# Creates a PageIterator from the paginator
page_iterator = paginator.paginate()

# Iterates through the results of list_raster_data_collections
results = []
for page in page_iterator:
    results.append(page['RasterDataCollectionSummaries'])

print(results)
```

This is a sample JSON response from the `list_raster_data_collections` API operation. It's truncated to include only the data collection (Sentinel-2) that's used in this code example. For more details about a specific raster data collection, use `get_raster_data_collection`:

```
{
    "Arn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8",
    "Description": "Sentinel-2a and Sentinel-2b imagery, processed to Level 2A (Surface Reflectance) and converted to Cloud-Optimized GeoTIFFs",
    "DescriptionPageUrl": "https://registry.opendata.aws/sentinel-2-l2a-cogs",
    "Name": "Sentinel 2 L2A COGs",
    "SupportedFilters": [
        {
            "Maximum": 100,
            "Minimum": 0,
            "Name": "EoCloudCover",
            "Type": "number"
        },
        {
            "Maximum": 90,
            "Minimum": 0,
            "Name": "ViewOffNadir",
            "Type": "number"
        },
        {
            "Name": "Platform",
            "Type": "string"
        }
    ],
    "Tags": {},
    "Type": "PUBLIC"
}
```

After running the previous code sample, you get the ARN of the Sentinel-2 raster data collection, `arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8`. In the [next section](#demo-search-raster-data), you can query the Sentinel-2 data collection using the `search_raster_data_collection` API.
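
If you'd rather pick the ARN out of the gathered summaries programmatically than copy it from the printed output, you can filter on the collection's `Name`. The following is a minimal sketch using a truncated sample entry shaped like the response above:

```python
# Sample entries shaped like RasterDataCollectionSummaries (truncated)
summaries = [
    {
        "Arn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8",
        "Name": "Sentinel 2 L2A COGs",
        "Type": "PUBLIC",
    },
]

# Look up the ARN of a collection by its Name
sentinel_2_arn = next(s["Arn"] for s in summaries if s["Name"] == "Sentinel 2 L2A COGs")
print(sentinel_2_arn)
```

In your notebook, flatten the pages collected in `results` and search those entries instead of the sample list.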

## Searching the Sentinel-2 raster data collection using `search_raster_data_collection`
<a name="demo-search-raster-data"></a>

In the preceding section, you used `list_raster_data_collections` to get the ARN for the Sentinel-2 data collection. Now you can use that ARN to search the data collection over a given area of interest (AOI), time range, properties, and available image bands.

To call the `search_raster_data_collection` API, you must pass a Python dictionary to the `RasterDataCollectionQuery` parameter. This example uses `AreaOfInterest`, `TimeRangeFilter`, `PropertyFilters`, and `BandFilter`. For convenience, the variable `search_rdc_query` holds the search query parameters:

```
search_rdc_query = {
    "AreaOfInterest": {
        "AreaOfInterestGeometry": {
            "PolygonGeometry": {
                "Coordinates": [
                    [
                        # coordinates are input as longitude followed by latitude
                        [-114.529, 36.142],
                        [-114.373, 36.142],
                        [-114.373, 36.411],
                        [-114.529, 36.411],
                        [-114.529, 36.142],
                    ]
                ]
            }
        }
    },
    "TimeRangeFilter": {
        "StartTime": "2022-01-01T00:00:00Z",
        "EndTime": "2022-07-10T23:59:59Z"
    },
    "PropertyFilters": {
        "Properties": [
            {
                "Property": {
                    "EoCloudCover": {
                        "LowerBound": 0,
                        "UpperBound": 1
                    }
                }
            }
        ],
        "LogicalOperator": "AND"
    },
    "BandFilter": [
        "visual"
    ]
}
```

In this example, you query an `AreaOfInterest` that includes [Lake Mead](https://en.wikipedia.org/wiki/Lake_Mead), on the border of Nevada and Arizona. Furthermore, Sentinel-2 supports multiple types of image bands. To measure the change in the surface of the water, you only need the `visual` band.

After you create the query parameters, you can use the `search_raster_data_collection` API to make the request. 

The following code sample implements a `search_raster_data_collection` API request. This API does not support pagination through `get_paginator`. To make sure that the full API response is gathered, the code sample loops until `NextToken` is no longer present, using `.extend()` to append the satellite image URLs and other response metadata to `items_list`.

To learn more about `search_raster_data_collection`, see [SearchRasterDataCollection](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_geospatial_SearchRasterDataCollection.html) in the *Amazon SageMaker AI API Reference*.

```
search_rdc_response = sm_geo_client.search_raster_data_collection(
    Arn='arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8',
    RasterDataCollectionQuery=search_rdc_query
)

## items_list accumulates the Items from every page of the API response
items_list = []

## Extend items_list before checking for 'NextToken' so that the final page,
## which has no token, is also kept
while True:
    items_list.extend(search_rdc_response['Items'])
    next_token = search_rdc_response.get('NextToken')
    if not next_token:
        break
    search_rdc_response = sm_geo_client.search_raster_data_collection(
        Arn='arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8',
        RasterDataCollectionQuery=search_rdc_query,
        NextToken=next_token
    )

## Print the number of observations returned by the query
print(len(items_list))
```
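
You can exercise this token-handling pattern offline with a stub in place of the real client. In the following sketch (the page contents and token values are made up), `items_list` is extended before the token check so that the final page, which has no `NextToken`, is also kept:

```python
# Stub pages mimicking paginated search_raster_data_collection responses;
# the final page carries no NextToken
_pages = {
    None: {"Items": ["item-1", "item-2"], "NextToken": "token-1"},
    "token-1": {"Items": ["item-3"]},
}

def fake_search(NextToken=None):
    # Stands in for sm_geo_client.search_raster_data_collection
    return _pages[NextToken]

items_list = []
response = fake_search()
while True:
    items_list.extend(response["Items"])
    token = response.get("NextToken")
    if not token:
        break
    response = fake_search(NextToken=token)

print(len(items_list))  # 3
```

All three items are gathered, including the one on the final page.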

The following is a JSON response from your query, truncated for clarity. Only the `visual` band specified by **"BandFilter": ["visual"]** in the request is returned in the `Assets` key-value pair:

```
{
    'Assets': {
        'visual': {
            'Href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/15/T/UH/2022/6/S2A_15TUH_20220623_0_L2A/TCI.tif'
        }
    },
    'DateTime': datetime.datetime(2022, 6, 23, 17, 22, 5, 926000, tzinfo = tzlocal()),
    'Geometry': {
        'Coordinates': [
            [
                [-114.529, 36.142],
                [-114.373, 36.142],
                [-114.373, 36.411],
                [-114.529, 36.411],
                [-114.529, 36.142],
            ]
        ],
        'Type': 'Polygon'
    },
    'Id': 'S2A_15TUH_20220623_0_L2A',
    'Properties': {
        'EoCloudCover': 0.046519,
        'Platform': 'sentinel-2a'
    }
}
```

Now that you have your query results, in the next section you visualize them by using `matplotlib` to verify that they come from the correct geographical region.
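
Later steps download the source images again, so it's handy to collect the image URLs from your results up front. The following is a minimal sketch, using a truncated sample item in place of your real `items_list`:

```python
# A truncated sample item shaped like the search_raster_data_collection
# response; in your notebook, iterate over your real items_list instead
sample_items = [
    {
        "Assets": {"visual": {"Href": "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/15/T/UH/2022/6/S2A_15TUH_20220623_0_L2A/TCI.tif"}},
        "Id": "S2A_15TUH_20220623_0_L2A",
    },
]

# Collect the visual-band image URLs for downloading and plotting
tci_urls = [item["Assets"]["visual"]["Href"] for item in sample_items]
print(tci_urls[0].split("/")[-2])  # S2A_15TUH_20220623_0_L2A
```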

## Visualizing your `search_raster_data_collection` using `matplotlib`
<a name="demo-geospatial-visualize"></a>

Before you start the earth observation job (EOJ), you can visualize a result from your query with `matplotlib`. The following code sample takes the first item, `items_list[0]["Assets"]["visual"]["Href"]`, from the `items_list` variable created in the previous code sample and prints an image using `matplotlib`.

```
# Visualize an example image.
import os
from urllib import request
import tifffile
import matplotlib.pyplot as plt

image_dir = "./images/lake_mead"
os.makedirs(image_dir, exist_ok=True)


image_url = items_list[0]["Assets"]["visual"]["Href"]
img_id = image_url.split("/")[-2]
path_to_image = image_dir + "/" + img_id + "_TCI.tif"
response = request.urlretrieve(image_url, path_to_image)
print("Downloaded image: " + img_id)

tci = tifffile.imread(path_to_image)
plt.figure(figsize=(6, 6))
plt.imshow(tci)
plt.show()
```

After checking that the results are in the correct geographical region, you can start the Earth Observation Job (EOJ) in the next step. You use the EOJ to identify the water bodies from the satellite images by using a process called land segmentation.

## Starting an earth observation job (EOJ) that performs land segmentation on a series of satellite images
<a name="demo-start-eoj"></a>

SageMaker geospatial provides multiple pre-trained models that you can use to process geospatial data from raster data collections. To learn more about the available pre-trained models and custom operations, see [Types of Operations](geospatial-eoj-models.md).

To calculate the change in the water surface area, you need to identify which pixels in the images correspond to water. Land cover segmentation is a semantic segmentation model supported by the `start_earth_observation_job` API. Semantic segmentation models associate a label with every pixel in each image. In the results, each pixel is assigned a label that's based on the class map for the model. The following is the class map for the land segmentation model:

```
{
    0: "No_data",
    1: "Saturated_or_defective",
    2: "Dark_area_pixels",
    3: "Cloud_shadows",
    4: "Vegetation",
    5: "Not_vegetated",
    6: "Water",
    7: "Unclassified",
    8: "Cloud_medium_probability",
    9: "Cloud_high_probability",
    10: "Thin_cirrus",
    11: "Snow_ice"
}
```
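
If you need to look up a class index by its label, for example when masking water pixels later in this tutorial, you can invert the class map above. A small sketch:

```python
# The land cover segmentation class map from the documentation
class_map = {
    0: "No_data", 1: "Saturated_or_defective", 2: "Dark_area_pixels",
    3: "Cloud_shadows", 4: "Vegetation", 5: "Not_vegetated", 6: "Water",
    7: "Unclassified", 8: "Cloud_medium_probability",
    9: "Cloud_high_probability", 10: "Thin_cirrus", 11: "Snow_ice",
}

# Invert the map to look up a class index by its label
label_to_index = {label: index for index, label in class_map.items()}
print(label_to_index["Water"])  # 6
```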

To start an earth observation job, use the `start_earth_observation_job` API. When you submit your request, you must specify the following:
+ `InputConfig` (*dict*) – Used to specify the coordinates of the area that you want to search, and other metadata that's associated with your search.
+ `JobConfig` (*dict*) – Used to specify the type of EOJ operation to perform on the data. This example uses **LandCoverSegmentationConfig**.
+ `ExecutionRoleArn` (*string*) – The ARN of the SageMaker AI execution role with the necessary permissions to run the job.
+ `Name` (*string*) – A name for the earth observation job.

The `InputConfig` is a Python dictionary. Use the following variable `eoj_input_config` to hold the search query parameters. Use this variable when you make the `start_earth_observation_job` API request.

```
# Perform land cover segmentation on images returned from the Sentinel-2 dataset.
eoj_input_config = {
    "RasterDataCollectionQuery": {
        "RasterDataCollectionArn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8",
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    "Coordinates":[
                        [
                            [-114.529, 36.142],
                            [-114.373, 36.142],
                            [-114.373, 36.411],
                            [-114.529, 36.411],
                            [-114.529, 36.142],
                        ]
                    ]
                }
            }
        },
        "TimeRangeFilter": {
            "StartTime": "2021-01-01T00:00:00Z",
            "EndTime": "2022-07-10T23:59:59Z",
        },
        "PropertyFilters": {
            "Properties": [{"Property": {"EoCloudCover": {"LowerBound": 0, "UpperBound": 1}}}],
            "LogicalOperator": "AND",
        },
    }
}
```

The `JobConfig` is a Python dictionary that is used to specify the EOJ operation that you want performed on your data:

```
eoj_config = {"LandCoverSegmentationConfig": {}}
```

With the dictionary elements now specified, you can submit your `start_earth_observation_job` API request using the following code sample:

```
# Gets the execution role arn associated with current notebook instance 
execution_role_arn = sagemaker.get_execution_role()

# Starts an earth observation job
response = sm_geo_client.start_earth_observation_job(
    Name="lake-mead-landcover",
    InputConfig=eoj_input_config,
    JobConfig=eoj_config,
    ExecutionRoleArn=execution_role_arn,
)
            
print(response)
```

The `start_earth_observation_job` request returns an ARN along with other metadata.

To get a list of all current and previous earth observation jobs, use the `list_earth_observation_jobs` API. To monitor the status of a single earth observation job, use the `get_earth_observation_job` API. To make this request, use the ARN returned after submitting your EOJ request. To learn more, see [GetEarthObservationJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_geospatial_GetEarthObservationJob.html) in the *Amazon SageMaker AI API Reference*.
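
A common pattern is to poll `get_earth_observation_job` until the job reaches a terminal status. The following sketch uses a stub in place of the real client call, with an illustrative status sequence; in your notebook, call `sm_geo_client.get_earth_observation_job(Arn=...)` instead:

```python
import time

# Illustrative status sequence; a real job reports Status values such as
# INITIALIZING and COMPLETED
_statuses = iter(["INITIALIZING", "IN_PROGRESS", "COMPLETED"])

def fake_get_earth_observation_job(Arn):
    # Stands in for sm_geo_client.get_earth_observation_job
    return {"Arn": Arn, "Status": next(_statuses)}

def wait_for_eoj(arn, poll_seconds=0):
    # Poll until the job reaches a terminal status
    while True:
        status = fake_get_earth_observation_job(Arn=arn)["Status"]
        if status in ("COMPLETED", "FAILED"):
            return status
        time.sleep(poll_seconds)

final_status = wait_for_eoj(
    "arn:aws:sagemaker-geospatial:us-west-2:111122223333:earth-observation-job/futg3vuq935t"
)
print(final_status)  # COMPLETED
```

With the real client, set `poll_seconds` to a minute or more to avoid hammering the API.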

To find the ARNs associated with your EOJs, use the `list_earth_observation_jobs` API operation. To learn more, see [ListEarthObservationJobs](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_geospatial_ListEarthObservationJobs.html) in the *Amazon SageMaker AI API Reference*.

```
# List all earth observation jobs in the account
sm_geo_client.list_earth_observation_jobs()["EarthObservationJobSummaries"]
```

The following is an example JSON response:

```
{
    'Arn': 'arn:aws:sagemaker-geospatial:us-west-2:111122223333:earth-observation-job/futg3vuq935t',
    'CreationTime': datetime.datetime(2023, 10, 19, 4, 33, 54, 21481, tzinfo = tzlocal()),
    'DurationInSeconds': 3493,
    'Name': 'lake-mead-landcover',
    'OperationType': 'LAND_COVER_SEGMENTATION',
    'Status': 'COMPLETED',
    'Tags': {}
}, {
    'Arn': 'arn:aws:sagemaker-geospatial:us-west-2:111122223333:earth-observation-job/wu8j9x42zw3d',
    'CreationTime': datetime.datetime(2023, 10, 20, 0, 3, 27, 270920, tzinfo = tzlocal()),
    'DurationInSeconds': 1,
    'Name': 'mt-shasta-landcover',
    'OperationType': 'LAND_COVER_SEGMENTATION',
    'Status': 'INITIALIZING',
    'Tags': {}
}
```

After your EOJ's status changes to `COMPLETED`, proceed to the next section to calculate the change in Lake Mead's surface area.

## Calculating the change in the Lake Mead surface area
<a name="demo-geospatial-calc"></a>

To calculate the change in Lake Mead's surface area, first export the results of the EOJ to Amazon S3 by using `export_earth_observation_job`:

```
sagemaker_session = sagemaker.Session()
s3_bucket_name = sagemaker_session.default_bucket()  # Replace with your own bucket if needed
s3_bucket = session.resource("s3").Bucket(s3_bucket_name)
prefix = "export-lake-mead-eoj"  # Replace with the S3 prefix desired
export_bucket_and_key = f"s3://{s3_bucket_name}/{prefix}/"

eoj_output_config = {"S3Data": {"S3Uri": export_bucket_and_key}}
export_response = sm_geo_client.export_earth_observation_job(
    Arn="arn:aws:sagemaker-geospatial:us-west-2:111122223333:earth-observation-job/7xgwzijebynp",
    ExecutionRoleArn=execution_role_arn,
    OutputConfig=eoj_output_config,
    ExportSourceImages=False,
)
```

To see the status of your export job, use `get_earth_observation_job`:

```
export_job_details = sm_geo_client.get_earth_observation_job(Arn=export_response["Arn"])
```

To calculate the changes in Lake Mead's water level, download the land cover masks to the local SageMaker AI notebook instance along with the source images from your previous query. In the class map for the land segmentation model, water's class index is 6.

To extract the water mask from a Sentinel-2 image, count the number of pixels marked as water (class index 6) in the image, and then multiply the count by the area that each pixel covers. Bands can differ in their spatial resolution. For the land cover segmentation model, all bands are downsampled to a spatial resolution of 60 meters.
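
The two steps above can be checked with simple arithmetic on a toy mask; the notebook code below applies the same logic with NumPy on the real masks:

```python
# Toy 4x4 class mask; water has class index 6 in the land cover class map
mask = [
    [6, 6, 4, 5],
    [6, 6, 4, 5],
    [0, 6, 6, 7],
    [3, 3, 6, 6],
]

# Step 1: count the pixels labeled as water (the notebook code does the same
# with np.isin(mask, [6]).sum())
pixel_count = sum(row.count(6) for row in mask)  # 8 water pixels

# Step 2: multiply by the area each pixel covers (60 m x 60 m),
# converted to square kilometers
area_sq_km = pixel_count * 60 * 60 / (1000 * 1000)
print(area_sq_km)  # 0.0288
```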

```
import os
from glob import glob
import cv2
import numpy as np
import tifffile
import matplotlib.pyplot as plt
from urllib.parse import urlparse
from botocore import UNSIGNED
from botocore.config import Config

# Download land cover masks
mask_dir = "./masks/lake_mead"
os.makedirs(mask_dir, exist_ok=True)
image_paths = []
for s3_object in s3_bucket.objects.filter(Prefix=prefix).all():
    path, filename = os.path.split(s3_object.key)
    if "output" in path:
        mask_name = mask_dir + "/" + filename
        s3_bucket.download_file(s3_object.key, mask_name)
        print("Downloaded mask: " + mask_name)

# Download source images for visualization
for tci_url in tci_urls:
    url_parts = urlparse(tci_url)
    img_id = url_parts.path.split("/")[-2]
    tci_download_path = image_dir + "/" + img_id + "_TCI.tif"
    cogs_bucket = session.resource(
        "s3", config=Config(signature_version=UNSIGNED, region_name="us-west-2")
    ).Bucket(url_parts.hostname.split(".")[0])
    cogs_bucket.download_file(url_parts.path[1:], tci_download_path)
    print("Downloaded image: " + img_id)

print("Downloads complete.")

image_files = glob("images/lake_mead/*.tif")
mask_files = glob("masks/lake_mead/*.tif")
image_files.sort(key=lambda x: x.split("SQA_")[1])
mask_files.sort(key=lambda x: x.split("SQA_")[1])
overlay_dir = "./masks/lake_mead_overlay"
os.makedirs(overlay_dir, exist_ok=True)
lake_areas = []
mask_dates = []

for image_file, mask_file in zip(image_files, mask_files):
    image_id = image_file.split("/")[-1].split("_TCI")[0]
    mask_id = mask_file.split("/")[-1].split(".tif")[0]
    mask_date = mask_id.split("_")[2]
    mask_dates.append(mask_date)
    assert image_id == mask_id
    image = tifffile.imread(image_file)
    image_ds = cv2.resize(image, (1830, 1830), interpolation=cv2.INTER_LINEAR)
    mask = tifffile.imread(mask_file)
    water_mask = np.isin(mask, [6]).astype(np.uint8)  # water has a class index 6
    lake_mask = water_mask[1000:, :1100]
    lake_area = lake_mask.sum() * 60 * 60 / (1000 * 1000)  # calculate the surface area
    lake_areas.append(lake_area)
    contour, _ = cv2.findContours(water_mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    combined = cv2.drawContours(image_ds, contour, -1, (255, 0, 0), 4)
    lake_crop = combined[1000:, :1100]
    cv2.putText(lake_crop, f"{mask_date}", (10,50), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 0), 3, cv2.LINE_AA)
    cv2.putText(lake_crop, f"{lake_area} [sq km]", (10,100), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 0), 3, cv2.LINE_AA)
    overlay_file = overlay_dir + '/' + mask_date + '.png'
    cv2.imwrite(overlay_file, cv2.cvtColor(lake_crop, cv2.COLOR_RGB2BGR))

# Plot water surface area vs. time.
plt.figure(figsize=(20,10))
plt.title('Lake Mead surface area for the 2021.02 - 2022.07 period.', fontsize=20)
plt.xticks(rotation=45)
plt.ylabel('Water surface area [sq km]', fontsize=14)
plt.plot(mask_dates, lake_areas, marker='o')
plt.grid('on')
plt.ylim(240, 320)
for i, v in enumerate(lake_areas):
    plt.text(i, v+2, "%d" %v, ha='center')
plt.show()
```

Using `matplotlib`, you can visualize the results as a graph. The graph shows that the surface area of Lake Mead decreased from January 2021 to July 2022.

![\[A line graph showing that the surface area of Lake Mead decreased from January 2021-July 2022\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/lake-mead-decrease.png)


# Using processing jobs for custom geospatial workloads
<a name="geospatial-custom-operations"></a>

With [Amazon SageMaker Processing](processing-job.md), you can use a simplified, managed experience on SageMaker AI to run your data processing workloads with the purpose-built geospatial container.

The underlying infrastructure for an Amazon SageMaker Processing job is fully managed by SageMaker AI. During a processing job, cluster resources are provisioned for the duration of your job and cleaned up when the job completes.

![\[Running a processing job.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/Processing-1.png)


The preceding diagram shows how SageMaker AI runs a geospatial processing job. SageMaker AI takes your geospatial workload script, copies your geospatial data from Amazon Simple Storage Service (Amazon S3), and then pulls the specified geospatial container. The underlying infrastructure for the processing job is fully managed by SageMaker AI. Cluster resources are provisioned for the duration of your job and cleaned up when the job completes. The output of the processing job is stored in the bucket you specified.

**Path naming constraints**  
The local paths inside a processing job container must begin with **/opt/ml/processing/**.

SageMaker geospatial provides a purpose-built container, `081189585635.dkr.ecr.us-west-2.amazonaws.com/sagemaker-geospatial-v1-0:latest`, that you can specify when running a processing job.

**Topics**
+ [Overview: Run processing jobs using `ScriptProcessor` and a SageMaker geospatial container](geospatial-custom-operations-overview.md)
+ [Using `ScriptProcessor` to calculate the Normalized Difference Vegetation Index (NDVI) using Sentinel-2 satellite data](geospatial-custom-operations-procedure.md)

# Overview: Run processing jobs using `ScriptProcessor` and a SageMaker geospatial container
<a name="geospatial-custom-operations-overview"></a>

SageMaker geospatial provides a purpose-built processing container, `081189585635.dkr.ecr.us-west-2.amazonaws.com/sagemaker-geospatial-v1-0:latest`. You can use this container when running a job with Amazon SageMaker Processing. When you create an instance of the [https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ScriptProcessor](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ScriptProcessor) class that is available through the *Amazon SageMaker Python SDK for Processing*, specify this `image_uri`.

**Note**  
If you receive a `ResourceLimitExceeded` error when attempting to start a processing job, you need to request a quota increase. To get started on a Service Quotas quota increase request, see [Requesting a quota increase](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html) in the *Service Quotas User Guide*.

**Prerequisites for using `ScriptProcessor`**

1. You have created a Python script that specifies your geospatial ML workload.

1. You have granted the SageMaker AI execution role access to any Amazon S3 buckets that are needed.

1. Prepare your data for import into the container. Amazon SageMaker Processing jobs support setting the `s3_data_type` to either `"ManifestFile"` or `"S3Prefix"`.

The following procedure shows you how to create an instance of `ScriptProcessor` and submit an Amazon SageMaker Processing job using the SageMaker geospatial container.

**To create a `ScriptProcessor` instance and submit an Amazon SageMaker Processing job using a SageMaker geospatial container**

1. Instantiate an instance of the `ScriptProcessor` class using the SageMaker geospatial image:

   ```
   import sagemaker
   from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput
   
   sm_session = sagemaker.session.Session()
   execution_role_arn = sagemaker.get_execution_role()
   
   # purpose-built geospatial container
   image_uri = '081189585635.dkr.ecr.us-west-2.amazonaws.com/sagemaker-geospatial-v1-0:latest'
   
   script_processor = ScriptProcessor(
   	command=['python3'],
   	image_uri=image_uri,
   	role=execution_role_arn,
   	instance_count=4,
   	instance_type='ml.m5.4xlarge',
   	sagemaker_session=sm_session
   )
   ```

   Replace *execution\_role\_arn* with the ARN of the SageMaker AI execution role that has access to the input data stored in Amazon S3 and any other AWS services that you want to call in your processing job. You can update the `instance_count` and the `instance_type` to match the requirements of your processing job.

1. To start a processing job, use the `.run()` method:

   ```
   # Can be replaced with any S3 compliant string for the name of the folder.
   s3_folder = 'geospatial-data-analysis'
   
   # Use .default_bucket() to get the name of the S3 bucket associated with your current SageMaker session
   s3_bucket = sm_session.default_bucket()
   					
   s3_manifest_uri = f's3://{s3_bucket}/{s3_folder}/manifest.json'
   s3_prefix_uri = f's3://{s3_bucket}/{s3_folder}/image-prefix'
   
   script_processor.run(
   	code='preprocessing.py',
   	inputs=[
   		ProcessingInput(
   			source=s3_manifest_uri | s3_prefix_uri ,
   			destination='/opt/ml/processing/input_data/',
   			s3_data_type= "ManifestFile" | "S3Prefix",
   			s3_data_distribution_type= "ShardedByS3Key" | "FullyReplicated"
   		)
   	],
   	outputs=[
           ProcessingOutput(
               source='/opt/ml/processing/output_data/',
               destination=s3_output_prefix_url
           )
       ]
   )
   ```
   + Replace *preprocessing.py* with the name of your own Python data processing script.
   + A processing job supports two methods for formatting your input data. You can either create a manifest file that points to all of the input data for your processing job, or you can use a common prefix on each individual data input. If you created a manifest file, set `s3_data_type` equal to `"ManifestFile"`. If you used a file prefix, set `s3_data_type` equal to `"S3Prefix"`. You specify the path to your data using `source`.
   + You can distribute your processing job data two ways:
     + Distribute your data to all processing instances by setting `s3_data_distribution_type` equal to `FullyReplicated`.
     + Distribute your data in shards based on the Amazon S3 key by setting `s3_data_distribution_type` equal to `ShardedByS3Key`. When you use `ShardedByS3Key` one shard of data is sent to each processing instance.

    You can use a script to process SageMaker geospatial data. That script can be found in [Step 3: Writing a script that can calculate the NDVI](geospatial-custom-operations-procedure.md#geospatial-custom-operations-script-mode). To learn more about the `.run()` API operation, see [https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ScriptProcessor.run](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ScriptProcessor.run) in the *Amazon SageMaker Python SDK for Processing*.
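To make the two distribution types concrete, the following local sketch mimics how a set of object keys could be split across instances; the actual assignment is handled by SageMaker AI, and the bucket name here is only a placeholder:

```python
# Hypothetical input object keys for a 3-instance processing job
keys = [f"s3://amzn-s3-demo-bucket/data/part-{i}.json" for i in range(6)]
instance_count = 3

# ShardedByS3Key: each instance receives a disjoint subset of the objects
shards = [keys[i::instance_count] for i in range(instance_count)]

# FullyReplicated: every instance receives the complete set of objects
replicas = [list(keys) for _ in range(instance_count)]

print([len(s) for s in shards])    # [2, 2, 2]
print([len(r) for r in replicas])  # [6, 6, 6]
```

Sharding lets you divide a large workload across instances; full replication is useful when every instance needs the complete dataset.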

To monitor the progress of your processing job, the `ProcessingJobs` class supports a [https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingJob.describe](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingJob.describe) method. This method returns a response from the `DescribeProcessingJob` API call. To learn more, see [`DescribeProcessingJob` in the *Amazon SageMaker AI API Reference*](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeProcessingJob.html).

The next topic shows you how to create an instance of the `ScriptProcessor` class using the SageMaker geospatial container, and then how to use it to calculate the Normalized Difference Vegetation Index (NDVI) with Sentinel-2 images.



# Using `ScriptProcessor` to calculate the Normalized Difference Vegetation Index (NDVI) using Sentinel-2 satellite data
<a name="geospatial-custom-operations-procedure"></a>

The following code samples show you how to calculate the normalized difference vegetation index of a specific geographical area using the purpose-built geospatial image within a Studio Classic notebook and run a large-scale workload with Amazon SageMaker Processing using [https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ScriptProcessor](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ScriptProcessor) from the SageMaker AI Python SDK.
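NDVI is computed per pixel as (NIR − red) / (NIR + red), which yields values in the range [−1, 1]. As a quick synthetic check, with small NumPy arrays standing in for the two Sentinel-2 bands (the reflectance values here are illustrative only):

```python
import numpy as np

# Synthetic reflectance values standing in for the NIR and red bands
nir = np.array([[0.8, 0.6], [0.5, 0.2]])
red = np.array([[0.1, 0.2], [0.5, 0.4]])

# Per-pixel NDVI: values near 1 suggest dense vegetation,
# values near 0 or below suggest bare or non-vegetated surfaces
ndvi = (nir - red) / (nir + red)
print(ndvi)
```

The processing script later in this topic applies the same formula to full Sentinel-2 rasters.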

This demo also uses an Amazon SageMaker Studio Classic notebook instance that uses the geospatial kernel and instance type. To learn how to create a Studio Classic geospatial notebook instance, see [Create an Amazon SageMaker Studio Classic notebook using the geospatial image](geospatial-launch-notebook.md).

You can follow along with this demo in your own notebook instance by copying and pasting the following code snippets:

1. [Use `search_raster_data_collection` to query a specific area of interest (AOI) over a given time range using a specific raster data collection, Sentinel-2.](#geospatial-custom-operations-procedure-search)

1. [Create a manifest file that specifies what data will be processed during the processing job.](#geospatial-custom-operations-procedure-manifest)

1. [Write a data processing Python script calculating the NDVI.](#geospatial-custom-operations-script-mode)

1. [Create a `ScriptProcessor` instance and start the Amazon SageMaker Processing job](#geospatial-custom-operations-create).

1. [Visualizing the results of your processing job](#geospatial-custom-operations-visual).

## Query the Sentinel-2 raster data collection using `SearchRasterDataCollection`
<a name="geospatial-custom-operations-procedure-search"></a>

With `search_raster_data_collection`, you can query supported raster data collections. This example uses data that's pulled from Sentinel-2 satellites. The area of interest (`AreaOfInterest`) specified is rural northern Iowa, and the time range (`TimeRangeFilter`) is January 1, 2022 to December 30, 2022. To see the available raster data collections in your AWS Region, use `list_raster_data_collections`. To see a code example using this API, see [`ListRasterDataCollections`](geospatial-data-collections.md) in the *Amazon SageMaker AI Developer Guide*.

In the following code examples, you use the ARN associated with the Sentinel-2 raster data collection, `arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8`.

A `search_raster_data_collection` API request requires two parameters:
+ You need to specify an `Arn` parameter that corresponds to the raster data collection that you want to query.
+ You also need to specify a `RasterDataCollectionQuery` parameter, which takes in a Python dictionary.

The following code example contains the necessary key-value pairs needed for the `RasterDataCollectionQuery` parameter saved to the `search_rdc_query` variable.

```
search_rdc_query = {
    "AreaOfInterest": {
        "AreaOfInterestGeometry": {
            "PolygonGeometry": {
                "Coordinates": [[
                    [
              -94.50938680498298,
              43.22487436936203
            ],
            [
              -94.50938680498298,
              42.843474642037194
            ],
            [
              -93.86520004156142,
              42.843474642037194
            ],
            [
              -93.86520004156142,
              43.22487436936203
            ],
            [
              -94.50938680498298,
              43.22487436936203
            ]
               ]]
            }
        }
    },
    "TimeRangeFilter": {"StartTime": "2022-01-01T00:00:00Z", "EndTime": "2022-12-30T23:59:59Z"}
}
```

To make the `search_raster_data_collection` request, you must specify the ARN of the Sentinel-2 raster data collection: `arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8`. You also need to pass in the Python dictionary defined previously, which specifies the query parameters.

```
## Creates a SageMaker geospatial client instance
sm_geo_client = session.client(service_name="sagemaker-geospatial")

search_rdc_response1 = sm_geo_client.search_raster_data_collection(
    Arn='arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8',
    RasterDataCollectionQuery=search_rdc_query
)
```

The results of this API can be paginated. To collect all the satellite images returned by the `search_raster_data_collection` operation, implement a `while` loop that checks for `NextToken` in the API response:

```
## Holds the items from each search_raster_data_collection response
items_list = []
while True:
    items_list.extend(search_rdc_response1['Items'])
    next_token = search_rdc_response1.get('NextToken')
    if not next_token:
        break
    search_rdc_response1 = sm_geo_client.search_raster_data_collection(
        Arn='arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8',
        RasterDataCollectionQuery=search_rdc_query,
        NextToken=next_token
    )
```

The API response returns a list of URLs under the `Assets` key corresponding to specific image bands. The following is a truncated version of the API response. Some of the image bands were removed for clarity.

```
{
	'Assets': {
        'aot': {
            'Href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/15/T/UH/2022/12/S2A_15TUH_20221230_0_L2A/AOT.tif'
        },
        'blue': {
            'Href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/15/T/UH/2022/12/S2A_15TUH_20221230_0_L2A/B02.tif'
        },
        'swir22-jp2': {
            'Href': 's3://sentinel-s2-l2a/tiles/15/T/UH/2022/12/30/0/B12.jp2'
        },
        'visual-jp2': {
            'Href': 's3://sentinel-s2-l2a/tiles/15/T/UH/2022/12/30/0/TCI.jp2'
        },
        'wvp-jp2': {
            'Href': 's3://sentinel-s2-l2a/tiles/15/T/UH/2022/12/30/0/WVP.jp2'
        }
    },
    'DateTime': datetime.datetime(2022, 12, 30, 17, 21, 52, 469000, tzinfo = tzlocal()),
    'Geometry': {
        'Coordinates': [
            [
                [-95.46676936182894, 43.32623760511659],
                [-94.11293433656887, 43.347431265475954],
                [-94.09532154452742, 42.35884880571144],
                [-95.42776890002203, 42.3383710796791],
                [-95.46676936182894, 43.32623760511659]
            ]
        ],
        'Type': 'Polygon'
    },
    'Id': 'S2A_15TUH_20221230_0_L2A',
    'Properties': {
        'EoCloudCover': 62.384969,
        'Platform': 'sentinel-2a'
    }
}
```

In the [next section](#geospatial-custom-operations-procedure-manifest), you create a manifest file using the `'Id'` key from the API response.
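A scene `'Id'` such as `S2A_15TUH_20221230_0_L2A` encodes the platform, the MGRS tile, the acquisition date, and the processing level. A quick sketch of pulling the year and month out of an ID of that shape:

```python
scene_id = "S2A_15TUH_20221230_0_L2A"  # platform_tile_date_sequence_level

# The third underscore-separated segment is the acquisition date (YYYYMMDD)
date_segment = scene_id.split("_")[2]

# The first six characters give the year and month, useful for grouping scenes
yyyymm = date_segment[:6]
print(yyyymm)  # 202212
```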

## Create an input manifest file using the `Id` key from the `search_raster_data_collection` API response
<a name="geospatial-custom-operations-procedure-manifest"></a>

When you run a processing job, you must specify a data input from Amazon S3. The input data can either be a manifest file, which points to the individual data files, or a prefix that is shared by each file that you want processed. The following code examples define the folder where your manifest files will be generated.
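For reference, the manifest file that a processing job consumes is a JSON array whose first element is an object with a `prefix` key, and whose remaining elements are file names appended to that prefix. A minimal sketch of that shape, with a hypothetical bucket and file names:

```python
import json

# Hypothetical bucket, folder, and file names for illustration
manifest = [
    {"prefix": "s3://amzn-s3-demo-bucket/script-processor-input-manifest/"},
    "manifest_202201.json",
    "manifest_202202.json",
]

# Round-trip through JSON to confirm the structure serializes cleanly
serialized = json.dumps(manifest)
restored = json.loads(serialized)
print(restored[0]["prefix"])
```

The code later in this section builds a parent manifest with exactly this layout before uploading it to Amazon S3.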

Use SDK for Python (Boto3) to get the default bucket and the ARN of the execution role that is associated with your Studio Classic notebook instance:

```
sm_session = sagemaker.session.Session()
s3 = boto3.resource('s3')
# Gets the default execution role associated with the notebook
execution_role_arn = sagemaker.get_execution_role() 

# Gets the default bucket associated with the notebook
s3_bucket = sm_session.default_bucket() 

# Can be replaced with any name
s3_folder = "script-processor-input-manifest"
```

Next, you create a manifest file. It holds the URLs of the satellite images that you want processed when you run your processing job later in step 4.

```
# Format of a manifest file
manifest_prefix = {}
manifest_prefix['prefix'] = 's3://' + s3_bucket + '/' + s3_folder + '/'
manifest = [manifest_prefix]

print(manifest)
```

The preceding code prints the S3 URI where your manifest files will be created:

```
[{'prefix': 's3://sagemaker-us-west-2-111122223333/script-processor-input-manifest/'}]
```

Not all of the elements in the `search_raster_data_collection` response are needed to run the processing job.

The following code snippet removes the unneeded elements `'Properties'`, `'Geometry'`, and `'DateTime'`. The `'Id'` key-value pair, `'Id': 'S2A_15TUH_20221230_0_L2A'`, contains the year and the month. The code parses that data to create new keys in the Python dictionary `dict_month_items`. The values are the assets returned from the `SearchRasterDataCollection` query.

```
# For each response get the month and year, and then remove the metadata not related to the satellite images.
dict_month_items = {}
for item in items_list:
    # Example ID being split: 'S2A_15TUH_20221230_0_L2A' 
    yyyymm = item['Id'].split("_")[2][:6]
    if yyyymm not in dict_month_items:
        dict_month_items[yyyymm] = []
    
    # Removes unneeded metadata elements for this demo
    item.pop('Properties', None)
    item.pop('Geometry', None)
    item.pop('DateTime', None)

    # Appends the response from search_raster_data_collection to newly created key above
    dict_month_items[yyyymm].append(item)
```

This code example uploads the `dict_month_items` entries to Amazon S3 as JSON objects using the [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/upload_file.html](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/upload_file.html) API operation:

```
## key_ is the yyyymm timestamp formatted above
## value_ is the reference to all the satellite images collected via our searchRDC query 
for key_, value_ in dict_month_items.items():
    filename = f'manifest_{key_}.json'
    with open(filename, 'w') as fp:
        json.dump(value_, fp)
    s3.meta.client.upload_file(filename, s3_bucket, s3_folder + '/' + filename)
    manifest.append(filename)
    os.remove(filename)
```

This code example uploads a parent `manifest.json` file that points to all the other manifests uploaded to Amazon S3. It also saves the path to a local variable, `s3_manifest_uri`. You'll use that variable again to specify the source of the input data when you run the processing job in step 4.

```
with open('manifest.json', 'w') as fp:
    json.dump(manifest, fp)
s3.meta.client.upload_file('manifest.json', s3_bucket, s3_folder + '/' + 'manifest.json')
os.remove('manifest.json')

s3_manifest_uri = f's3://{s3_bucket}/{s3_folder}/manifest.json'
```

Now that you created the input manifest files and uploaded them, you can write a script that processes your data in the processing job. It processes the data from the satellite images, calculates the NDVI, and then returns the results to a different Amazon S3 location.

## Write a script that calculates the NDVI
<a name="geospatial-custom-operations-script-mode"></a>

Amazon SageMaker Studio Classic supports the `%%writefile` cell magic command. After running a cell with this command, its contents are saved to your local Studio Classic directory. The following script is specific to calculating NDVI, but these points can be useful when you write your own script for a processing job:
+ In your processing job container, the local paths inside the container must begin with `/opt/ml/processing/`. In this example, `input_data_path = '/opt/ml/processing/input_data/'` and `processed_data_path = '/opt/ml/processing/output_data/'` are specified in that way.
+ With Amazon SageMaker Processing, a script that a processing job runs can upload your processed data directly to Amazon S3. To do so, make sure that the execution role associated with your `ScriptProcessor` instance has the necessary requirements to access the S3 bucket. You can also specify an outputs parameter when you run your processing job. To learn more, see the [`.run()` API operation ](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ScriptProcessor.run) in the *Amazon SageMaker Python SDK*. In this code example, the results of the data processing are uploaded directly to Amazon S3.
+ To manage the size of the Amazon EBS volume attached to your processing job, use the `volume_size_in_gb` parameter. The volume's default size is 30 GB. You can also optionally use the Python [garbage collection](https://docs.python.org/3/library/gc.html) module to manage memory in your processing job container.

  The following code example loads the arrays into the processing job container. When arrays build up and fill the memory, the processing job crashes. To prevent this crash, the following example contains commands that remove the arrays from the processing job's container.

```
%%writefile compute_ndvi.py

import gc
import os
import json
import rioxarray

if __name__ == "__main__":
    print("Starting processing")
    
    input_data_path = '/opt/ml/processing/input_data/'
    input_files = []
    
    for current_path, sub_dirs, files in os.walk(input_data_path):
        for file in files:
            if file.endswith(".json"):
                input_files.append(os.path.join(current_path, file))
    
    print("Received {} input_files: {}".format(len(input_files), input_files))

    items = []
    for input_file in input_files:
        full_file_path = os.path.join(input_data_path, input_file)
        print(full_file_path)
        with open(full_file_path, 'r') as f:
            items.append(json.load(f))
            
    items = [item for sub_items in items for item in sub_items]

    for item in items:
        red_uri = item["Assets"]["red"]["Href"]
        nir_uri = item["Assets"]["nir"]["Href"]

        red = rioxarray.open_rasterio(red_uri, masked=True)
        nir = rioxarray.open_rasterio(nir_uri, masked=True)

        ndvi = (nir - red) / (nir + red)
        
        file_name = 'ndvi_' + item["Id"] + '.tif'
        output_path = '/opt/ml/processing/output_data'
        output_file_path = f"{output_path}/{file_name}"
        
        ndvi.rio.to_raster(output_file_path)
        print("Written output:", output_file_path)

        # Remove the arrays from memory so they don't accumulate across iterations
        del red, nir, ndvi
        gc.collect()
```

You now have a script that can calculate the NDVI. Next, you can create an instance of `ScriptProcessor` and run your processing job.

## Creating an instance of the `ScriptProcessor` class
<a name="geospatial-custom-operations-create"></a>

This demo uses the [ScriptProcessor](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ScriptProcessor) class that is available via the Amazon SageMaker Python SDK. First, you need to create an instance of the class, and then you can start your Processing job by using the `.run()` method.

```
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

image_uri = '081189585635.dkr.ecr.us-west-2.amazonaws.com/sagemaker-geospatial-v1-0:latest'

processor = ScriptProcessor(
	command=['python3'],
	image_uri=image_uri,
	role=execution_role_arn,
	instance_count=4,
	instance_type='ml.m5.4xlarge',
	sagemaker_session=sm_session
)

print('Starting processing job.')
```

When you start your Processing job, you need to specify a [https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingInput](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingInput) object. In that object, you specify the following:
+ The path to the manifest file that you created in step 2, `s3_manifest_uri`. This is the source of the input data to the container.
+ The path to where you want the input data to be saved in the container. This must match the path that you specified in your script.
+ Use the `s3_data_type` parameter to specify the input as `"ManifestFile"`.

```
s3_output_prefix_url = f"s3://{s3_bucket}/{s3_folder}/output"

processor.run(
    code='compute_ndvi.py',
    inputs=[
        ProcessingInput(
            source=s3_manifest_uri,
            destination='/opt/ml/processing/input_data/',
            s3_data_type="ManifestFile",
            s3_data_distribution_type="ShardedByS3Key"
        ),
    ],
    outputs=[
        ProcessingOutput(
            source='/opt/ml/processing/output_data/',
            destination=s3_output_prefix_url,
            s3_upload_mode="Continuous"
        )
    ]
)
```

The following code example uses the [`.describe()` method](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html#sagemaker.processing.ProcessingJob.describe) to get details of your Processing job.

```
preprocessing_job_descriptor = processor.jobs[-1].describe()
s3_output_uri = preprocessing_job_descriptor["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
print(s3_output_uri)
```

## Visualizing your results using `matplotlib`
<a name="geospatial-custom-operations-visual"></a>

With the [Matplotlib](https://matplotlib.org/stable/index.html) Python library, you can plot raster data. Before you plot the data, you need to calculate the NDVI using sample images from the Sentinel-2 satellites. The following code example opens the image arrays using the `.open_rasterio()` API operation, and then calculates the NDVI using the `nir` and `red` image bands from the Sentinel-2 satellite data. 

```
# Opens the python arrays 
import rioxarray

red_uri = items[25]["Assets"]["red"]["Href"]
nir_uri = items[25]["Assets"]["nir"]["Href"]

red = rioxarray.open_rasterio(red_uri, masked=True)
nir = rioxarray.open_rasterio(nir_uri, masked=True)

# Calculates the NDVI
ndvi = (nir - red)/ (nir + red)

# Common plotting library in Python 
import matplotlib.pyplot as plt

f, ax = plt.subplots(figsize=(18, 18))
ndvi.plot(cmap='viridis', ax=ax)
ax.set_title("NDVI for {}".format(items[25]["Id"]))
ax.set_axis_off()
plt.show()
```

The output of the preceding code example is a satellite image with the NDVI values overlaid on it. An NDVI value near 1 indicates that lots of vegetation is present, and values near 0 indicate that no vegetation is present.

![\[A satellite image of northern Iowa with the NDVI overlaid on top\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/ndvi-iowa.png)


This completes the demo of using `ScriptProcessor`.

# Earth Observation Jobs
<a name="geospatial-eoj"></a>

Using an Earth Observation job (EOJ), you can acquire, transform, and visualize geospatial data to make predictions. You can choose an operation based on your use case from a wide range of operations and models. You get the flexibility of choosing your area of interest, selecting the data providers, and setting time-range-based and cloud-cover-percentage-based filters. After SageMaker AI creates an EOJ for you, you can visualize the inputs and outputs of the job using the visualization functionality. An EOJ has various use cases, including comparing deforestation over time and diagnosing plant health. You can create an EOJ by using a SageMaker notebook with a SageMaker geospatial image. You can also access the SageMaker geospatial UI, part of the Amazon SageMaker Studio Classic UI, to view the list of all your jobs and to pause or stop an ongoing job. You can choose a job from the list of available EOJs to view the **Job summary** and the **Job details**, as well as visualize the **Job output**.

**Topics**
+ [Create an Earth Observation Job Using an Amazon SageMaker Studio Classic Notebook with a SageMaker geospatial Image](geospatial-eoj-ntb.md)
+ [Types of Operations](geospatial-eoj-models.md)

# Create an Earth Observation Job Using an Amazon SageMaker Studio Classic Notebook with a SageMaker geospatial Image
<a name="geospatial-eoj-ntb"></a>

**To use a SageMaker Studio Classic notebook with a SageMaker geospatial image:**

1. From the **Launcher**, choose **Change environment** under **Notebooks and compute resources**.

1. The **Change environment** dialog opens.

1. Select the **Image** dropdown and choose **Geospatial 1.0**. The **Instance type** should be **ml.geospatial.interactive**. Do not change the default values for other settings.

1. Choose **Select**.

1. Choose **Create notebook**.

You can initiate an EOJ using an Amazon SageMaker Studio Classic notebook with a SageMaker geospatial image by using the following code.

```
import boto3
import sagemaker
import sagemaker_geospatial_map

session = boto3.Session()
execution_role = sagemaker.get_execution_role()
sg_client = session.client(service_name="sagemaker-geospatial")
```

The following is an example showing how to create an EOJ in the US West (Oregon) Region.

```
#Query and Access Data
search_rdc_args = {
    "Arn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8",  # sentinel-2 L2A COG
    "RasterDataCollectionQuery": {
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    "Coordinates": [
                        [
                            [-114.529, 36.142],
                            [-114.373, 36.142],
                            [-114.373, 36.411],
                            [-114.529, 36.411],
                            [-114.529, 36.142],
                        ]
                    ]
                }
            }
        },
        "TimeRangeFilter": {
            "StartTime": "2021-01-01T00:00:00Z",
            "EndTime": "2022-07-10T23:59:59Z",
        },
        "PropertyFilters": {
            "Properties": [{"Property": {"EoCloudCover": {"LowerBound": 0, "UpperBound": 1}}}],
            "LogicalOperator": "AND",
        },
        "BandFilter": ["visual"],
    },
}

# Page through the search results and collect the visual (TCI) asset URLs.
tci_urls = []
data_manifests = []
while search_rdc_args.get("NextToken", True):
    search_result = sg_client.search_raster_data_collection(**search_rdc_args)
    if search_result.get("NextToken"):
        data_manifests.append(search_result)
    for item in search_result["Items"]:
        tci_url = item["Assets"]["visual"]["Href"]
        print(tci_url)
        tci_urls.append(tci_url)

    search_rdc_args["NextToken"] = search_result.get("NextToken")
        
# Perform land cover segmentation on images returned from the sentinel dataset.
eoj_input_config = {
    "RasterDataCollectionQuery": {
        "RasterDataCollectionArn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8",
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    "Coordinates": [
                        [
                            [-114.529, 36.142],
                            [-114.373, 36.142],
                            [-114.373, 36.411],
                            [-114.529, 36.411],
                            [-114.529, 36.142],
                        ]
                    ]
                }
            }
        },
        "TimeRangeFilter": {
            "StartTime": "2021-01-01T00:00:00Z",
            "EndTime": "2022-07-10T23:59:59Z",
        },
        "PropertyFilters": {
            "Properties": [{"Property": {"EoCloudCover": {"LowerBound": 0, "UpperBound": 1}}}],
            "LogicalOperator": "AND",
        },
    }
}
eoj_config = {"LandCoverSegmentationConfig": {}}

response = sg_client.start_earth_observation_job(
    Name="lake-mead-landcover",
    InputConfig=eoj_input_config,
    JobConfig=eoj_config,
    ExecutionRoleArn=execution_role,
)
```
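
The polygon rings in the queries above are closed bounding boxes: the first coordinate pair is repeated as the last. If you build such rings often, a small helper can generate them from the corner coordinates. The helper below is illustrative only and is not part of the SageMaker geospatial SDK.

```
def bbox_to_polygon_coordinates(min_lon, min_lat, max_lon, max_lat):
    # Build the ring as [longitude, latitude] pairs, with the first
    # point repeated at the end to close the polygon.
    return [
        [
            [min_lon, min_lat],
            [max_lon, min_lat],
            [max_lon, max_lat],
            [min_lon, max_lat],
            [min_lon, min_lat],
        ]
    ]

# Reproduces the Lake Mead ring used in the queries above.
coordinates = bbox_to_polygon_coordinates(-114.529, 36.142, -114.373, 36.411)
```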

After your EOJ is created, the `Arn` is returned to you. You use the `Arn` to identify a job and perform further operations. To get the status of a job, you can run `sg_client.get_earth_observation_job(Arn = response['Arn'])`.

The following example shows how to query the status of an EOJ until it is completed.

```
eoj_arn = response["Arn"]
job_details = sg_client.get_earth_observation_job(Arn=eoj_arn)
{k: v for k, v in job_details.items() if k in ["Arn", "Status", "DurationInSeconds"]}
# List all jobs in the account
sg_client.list_earth_observation_jobs()["EarthObservationJobSummaries"]
```
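
The snippet above checks the status once. To block until the job reaches a terminal state, you can use a small polling helper such as the following sketch. The function name `wait_for_eoj` and the terminal status strings are assumptions; confirm the status values returned by `get_earth_observation_job` in your account.

```
import time

def wait_for_eoj(client, arn, poll_seconds=30):
    # Poll the job until it reaches a terminal status.
    while True:
        details = client.get_earth_observation_job(Arn=arn)
        if details["Status"] in ("COMPLETED", "FAILED", "STOPPED", "DELETED"):
            return details
        time.sleep(poll_seconds)

# Example usage with the client created earlier:
# job_details = wait_for_eoj(sg_client, eoj_arn)
```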

After the EOJ is completed, you can visualize the EOJ outputs directly in the notebook. The following example shows you how an interactive map can be rendered.

```
map = sagemaker_geospatial_map.create_map({"is_raster": True})
map.set_sagemaker_geospatial_client(sg_client)
# render the map
map.render()
```

The following example shows how the map can be centered on an area of interest and the input and output of the EOJ can be rendered as separate layers within the map.

```
# visualize the area of interest
config = {"label": "Lake Mead AOI"}
aoi_layer = map.visualize_eoj_aoi(Arn=eoj_arn, config=config)

# Visualize input.
time_range_filter = {
    "start_date": "2022-07-01T00:00:00Z",
    "end_date": "2022-07-10T23:59:59Z",
}
config = {"label": "Input"}

input_layer = map.visualize_eoj_input(
    Arn=eoj_arn, config=config, time_range_filter=time_range_filter
)
# Visualize output, EOJ needs to be in completed status.
time_range_filter = {
    "start_date": "2022-07-01T00:00:00Z",
    "end_date": "2022-07-10T23:59:59Z",
}
config = {"preset": "singleBand", "band_name": "mask"}
output_layer = map.visualize_eoj_output(
    Arn=eoj_arn, config=config, time_range_filter=time_range_filter
)
```

You can use the `export_earth_observation_job` function to export the EOJ results to your Amazon S3 bucket. The export function makes it convenient to share results across teams. SageMaker AI also simplifies dataset management: you can share the EOJ results by using the job ARN instead of crawling thousands of files in the S3 bucket. Each EOJ becomes an asset in the data catalog, because results can be grouped by the job ARN. The following example shows how you can export the results of an EOJ.

```
sagemaker_session = sagemaker.Session()
s3_bucket_name = sagemaker_session.default_bucket()  # Replace with your own bucket if needed
s3_bucket = session.resource("s3").Bucket(s3_bucket_name)
prefix = "eoj_lakemead"  # Replace with the S3 prefix desired
export_bucket_and_key = f"s3://{s3_bucket_name}/{prefix}/"

eoj_output_config = {"S3Data": {"S3Uri": export_bucket_and_key}}
export_response = sg_client.export_earth_observation_job(
    Arn=eoj_arn,
    ExecutionRoleArn=execution_role,
    OutputConfig=eoj_output_config,
    ExportSourceImages=False,
)
```

You can monitor the status of your export job using the following snippet.

```
# Monitor the export job status
export_job_details = sg_client.get_earth_observation_job(Arn=export_response["Arn"])
{k: v for k, v in export_job_details.items() if k in ["Arn", "Status", "DurationInSeconds"]}
```

You are not charged the storage fees after you delete the EOJ.

For an example that showcases how to run an EOJ, see this [blog post](https://aws.amazon.com/blogs/machine-learning/monitoring-lake-mead-drought-using-the-new-amazon-sagemaker-geospatial-capabilities/).

For more example notebooks on SageMaker geospatial capabilities, see this [GitHub repository](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-geospatial).

# Types of Operations
<a name="geospatial-eoj-models"></a>

When you create an EOJ, you select an operation based on your use case. Amazon SageMaker geospatial capabilities provide a combination of purpose-built operations and pre-trained models. You can use these operations to understand the impact of environmental changes and human activities over time or identify cloud and cloud-free pixels.

**Cloud Masking**

Identifying clouds in satellite images is an essential preprocessing step in producing high-quality geospatial data. Ignoring cloud pixels can lead to errors in analysis, and over-detection of cloud pixels can decrease the number of valid observations. Cloud masking identifies cloudy and cloud-free pixels in satellite images. An accurate cloud mask helps prepare satellite images for processing and improves data generation. The following is the class map for cloud masking.

```
{
    0: "No_cloud",
    1: "cloud"
}
```
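
As an illustration, once a cloud-mask raster has been decoded into per-pixel class values, the class map can be used to summarize cloud coverage. The helper below is hypothetical and operates on plain Python lists for clarity.

```
CLOUD_CLASS_MAP = {0: "No_cloud", 1: "cloud"}

def cloud_fraction(mask):
    # Flatten the 2D mask and count pixels labeled "cloud".
    pixels = [value for row in mask for value in row]
    cloudy = sum(1 for value in pixels if CLOUD_CLASS_MAP[value] == "cloud")
    return cloudy / len(pixels)

mask = [
    [0, 0, 1],
    [0, 1, 1],
]
print(cloud_fraction(mask))  # 0.5
```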

**Cloud Removal**

Cloud removal for Sentinel-2 data uses an ML-based semantic segmentation model to identify clouds in the image. Cloudy pixels can be replaced with pixels from other timestamps. USGS Landsat data contains Landsat metadata that is used for cloud removal.

**Temporal Statistics**

Temporal statistics calculate statistics for geospatial data through time. The temporal statistics currently supported include mean, median, and standard deviation. You can calculate these statistics by using `GROUPBY` set to either `all` or `yearly`. You can also specify the `TargetBands`.
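
As a sketch, the `JobConfig` for a temporal-statistics EOJ could look like the following. The field names follow the `TemporalStatisticsConfig` request shape, but treat the exact enum values as assumptions to verify against the current API reference.

```
eoj_config = {
    "TemporalStatisticsConfig": {
        "GroupBy": "ALL",  # or "YEARLY"
        "Statistics": ["MEAN", "MEDIAN", "STANDARD_DEVIATION"],
        "TargetBands": ["red", "green", "blue"],  # optional band selection
    }
}
```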

**Zonal Statistics**

Zonal statistics performs statistical operations over a specified area on the image. 

**Resampling**

Resampling is used to upscale and downscale the resolution of a geospatial image. The `value` attribute in resampling represents the length of a side of the pixel.
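
For example, assuming `value` is expressed in meters, the effect of resampling on raster dimensions is simple arithmetic, which the following sketch illustrates:

```
import math

def resampled_size(width_px, height_px, src_pixel_m, dst_pixel_m):
    # Pixel size and pixel count scale inversely: larger pixels, fewer of them.
    scale = src_pixel_m / dst_pixel_m
    return math.ceil(width_px * scale), math.ceil(height_px * scale)

# Downsampling a 10980 x 10980 Sentinel-2 tile from 10 m to 30 m pixels:
print(resampled_size(10980, 10980, 10, 30))  # (3660, 3660)
```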

**Geomosaic**

Geomosaic allows you to stitch smaller images into a large image.

**Band Stacking**

Band stacking takes more than one image band as input and stacks them into a single GeoTIFF. The `OutputResolution` attribute determines the resolution of the output image. Based on the resolutions of the input images, you can set it to `lowest`, `highest`, or `average`.
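
As an illustration, the following sketch mirrors how such a choice could be resolved, assuming each input resolution is given as a pixel size in meters (so the lowest resolution corresponds to the largest pixel size). The helper is hypothetical, not part of the service API.

```
def output_resolution(input_resolutions_m, mode):
    # Lowest resolution means the coarsest input, that is, the largest pixel size.
    if mode == "lowest":
        return max(input_resolutions_m)
    if mode == "highest":
        return min(input_resolutions_m)
    if mode == "average":
        return sum(input_resolutions_m) / len(input_resolutions_m)
    raise ValueError(f"Unknown mode: {mode!r}")

# Sentinel-2 bands come in 10 m, 20 m, and 60 m resolutions:
print(output_resolution([10, 20, 60], "highest"))  # 10
```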

**Band Math**

Band math, also known as spectral index, transforms observations from multiple spectral bands into a single band that indicates the relative abundance of features of interest. For instance, the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI) are helpful for observing the presence of green vegetation.
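
For example, NDVI is computed per pixel from the red and near-infrared reflectances as (NIR - Red) / (NIR + Red); values close to 1 indicate dense green vegetation. A minimal sketch:

```
def ndvi(nir, red):
    # Normalized Difference Vegetation Index for a single pixel.
    return (nir - red) / (nir + red)

# A vegetated pixel typically reflects much more NIR than red light:
print(round(ndvi(0.45, 0.05), 2))  # 0.8
```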

**Land Cover Segmentation**

Land Cover segmentation is a semantic segmentation model that identifies the physical material, such as vegetation, water, and bare ground, at the Earth's surface. Having an accurate way to map land cover patterns helps you understand the impact of environmental change and human activities over time. Land Cover segmentation is often used for region planning, disaster response, ecological management, and environmental impact assessment. The following is the class map for Land Cover segmentation.

```
{
    0: "No_data",
    1: "Saturated_or_defective",
    2: "Dark_area_pixels",
    3: "Cloud_shadows",
    4: "Vegetation",
    5: "Not_vegetated",
    6: "Water",
    7: "Unclassified",
    8: "Cloud_medium_probability",
    9: "Cloud_high_probability",
    10: "Thin_cirrus",
    11: "Snow_ice"
}
```
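
As an illustration, once a segmentation mask has been decoded into class values, the class map can translate pixel counts into per-class area. The helper below is hypothetical and assumes 10 m pixels (100 square meters each), as in Sentinel-2 imagery.

```
LAND_COVER_CLASS_MAP = {0: "No_data", 4: "Vegetation", 6: "Water"}  # abridged

def class_areas_m2(mask, pixel_area_m2=100):
    # Count pixels per class, then convert counts to area.
    counts = {}
    for row in mask:
        for value in row:
            counts[value] = counts.get(value, 0) + 1
    return {
        LAND_COVER_CLASS_MAP.get(value, str(value)): count * pixel_area_m2
        for value, count in counts.items()
    }

mask = [[4, 4, 6], [4, 6, 6]]
print(class_areas_m2(mask))  # {'Vegetation': 300, 'Water': 300}
```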

## Availability of EOJ Operations
<a name="geospatial-eoj-models-avail"></a>

The availability of operations depends on whether you are using the SageMaker geospatial UI or the Amazon SageMaker Studio Classic notebooks with a SageMaker geospatial image. Currently, notebooks support all functionalities. To summarize, the following geospatial operations are supported by SageMaker AI:


| Operations |  Description  |  Availability  | 
| --- | --- | --- | 
| Cloud Masking | Identify cloud and cloud-free pixels to get improved and accurate satellite imagery. | UI, Notebook | 
| Cloud Removal | Remove pixels containing parts of a cloud from satellite imagery. | Notebook | 
| Temporal Statistics | Calculate statistics through time for a given GeoTIFF. | Notebook | 
| Zonal Statistics | Calculate statistics on user-defined regions. | Notebook | 
| Resampling | Scale images to different resolutions. | Notebook | 
| Geomosaic | Combine multiple images for greater fidelity. | Notebook | 
| Band Stacking | Combine multiple spectral bands to create a single image. | Notebook | 
| Band Math / Spectral Index | Obtain a combination of spectral bands that indicate the abundance of features of interest. | UI, Notebook | 
| Land Cover Segmentation | Identify land cover types such as vegetation and water in satellite imagery. | UI, Notebook | 

# Vector Enrichment Jobs
<a name="geospatial-vej"></a>

A Vector Enrichment Job (VEJ) performs operations on your vector data. Currently, you can use a VEJ to do reverse geocoding or map matching.
<a name="geospatial-vej-rev-geo"></a>
**Reverse Geocoding**  
With a reverse geocoding VEJ, you can convert geographic coordinates (latitude, longitude) to human-readable addresses, powered by Amazon Location Service. When you upload a CSV file containing the longitude and latitude coordinates, it returns the address number, country, label, municipality, neighborhood, postal code, and region of that location. The output file consists of your input data along with columns containing these values appended at the end. These jobs are optimized to accept tens of thousands of GPS traces.
<a name="geospatial-vej-map-match"></a>
**Map Matching**  
Map matching allows you to snap GPS coordinates to road segments. The input should be a CSV file containing the trace ID (route), longitude, latitude, and timestamp attributes. There can be multiple GPS coordinates per route, and the input can contain multiple routes. The output is a GeoJSON file that contains the links of the predicted route, along with the snap points provided in the input. These jobs are optimized to accept tens of thousands of drives in one request. Map matching is supported by [OpenStreetMap](https://www.openstreetmap.org/). Map matching fails if the names in the input source field don't match the ones in `MapMatchingConfig`. The error message you receive contains the field names present in the input file and the expected field name that is not found in `MapMatchingConfig`.

The input CSV file for a VEJ must contain the following:
+ A header row
+ Latitude and longitude in separate columns
+ ID and timestamp columns in numeric or string format (all other column data must be in numeric format only)
+ No mismatched quotes

For the timestamp column, SageMaker geospatial capabilities support epoch time in seconds and milliseconds (long integer). The supported string formats are as follows:
+ "dd.MM.yyyy HH:mm:ss z"
+ "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
+ "yyyy-MM-dd'T'HH:mm:ss"
+ "yyyy-MM-dd hh:mm:ss a"
+ "yyyy-MM-dd HH:mm:ss"
+ "yyyyMMddHHmmss"
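
These patterns use Java-style date letters. If you want to validate timestamps locally with Python before submitting a job, the approximate `strptime` equivalents for several of the formats are shown below; the mapping is an illustration, not part of the service.

```
from datetime import datetime

# Java-style pattern -> Python strptime equivalent. "%f" accepts the
# 3-digit millisecond field; the time-zone pattern is omitted because
# strptime handles "%Z" inconsistently.
PATTERNS = {
    "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'": "%Y-%m-%dT%H:%M:%S.%fZ",
    "yyyy-MM-dd'T'HH:mm:ss": "%Y-%m-%dT%H:%M:%S",
    "yyyy-MM-dd hh:mm:ss a": "%Y-%m-%d %I:%M:%S %p",
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
    "yyyyMMddHHmmss": "%Y%m%d%H%M%S",
}

def parse_timestamp(value):
    for fmt in PATTERNS.values():
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unsupported timestamp format: {value!r}")

print(parse_timestamp("2022-07-10T23:59:59"))  # 2022-07-10 23:59:59
```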

While you need to use an Amazon SageMaker Studio Classic notebook to execute a VEJ, you can view all the jobs you create using the UI. To use the visualization in the notebook, you first need to export your output to your S3 bucket. The VEJ actions you can perform are as follows.
+ [StartVectorEnrichmentJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_geospatial_StartVectorEnrichmentJob.html)
+ [GetVectorEnrichmentJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_geospatial_GetVectorEnrichmentJob.html)
+ [ListVectorEnrichmentJobs](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_geospatial_ListVectorEnrichmentJobs.html)
+ [StopVectorEnrichmentJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_geospatial_StopVectorEnrichmentJob.html)
+ [DeleteVectorEnrichmentJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_geospatial_DeleteVectorEnrichmentJob.html)
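
As an illustration, a reverse-geocoding VEJ request might be assembled as follows, using the `sg_client` created earlier. The structure follows the `StartVectorEnrichmentJob` request shape, but the S3 URI and the column names are placeholders, and you should verify the exact field names against the API reference.

```
# Input CSV location and format; the bucket, key, and column names are placeholders.
vej_input_config = {
    "DataSourceConfig": {"S3Data": {"S3Uri": "s3://amzn-s3-demo-bucket/input/points.csv"}},
    "DocumentType": "CSV",
}

# Map the longitude/latitude columns of the CSV to the reverse-geocoding job.
vej_job_config = {
    "ReverseGeocodingConfig": {
        "XAttributeName": "longitude",
        "YAttributeName": "latitude",
    }
}

# response = sg_client.start_vector_enrichment_job(
#     Name="reverse-geocode-points",
#     ExecutionRoleArn=execution_role,
#     InputConfig=vej_input_config,
#     JobConfig=vej_job_config,
# )
```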

# Visualization Using SageMaker geospatial capabilities
<a name="geospatial-visualize"></a>

Using the visualization functionalities provided by Amazon SageMaker geospatial, you can visualize geospatial data: the inputs to your EOJs and VEJs, as well as the outputs exported from your Amazon S3 bucket. The visualization tool is powered by [Foursquare Studio](https://studio.foursquare.com/home). The following image depicts the visualization tool supported by SageMaker geospatial capabilities.

![\[Visualization tool using SageMaker geospatial capabilities shows a map of the California coast.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/geospatial_vis.png)


You can use the left navigation panel to add data, layers, filters, and columns. You can also make modifications to how you interact with the map.

**Dataset**

The source of data used for visualization is called a **Dataset**. To add data for visualization, choose **Add Data** in the left navigation panel. You can upload the data either from your Amazon S3 bucket or from your local machine. The supported data formats are CSV, JSON, and GeoJSON. You can add multiple datasets to your map. After you upload a dataset, you can see it loaded on the map screen.

**Layers**

In the layer panel, a layer is created and populated automatically when you add a dataset. If your map consists of more than one dataset, you can select which dataset belongs to a layer. You can create new layers and group them. SageMaker geospatial capabilities support various layer types, including point, arc, icon, and polygon.

You can choose any data point in a layer to have an **Outline**. You can also further customize the data points. For example, you can choose the layer type as **Point** and then **Fill Color** based on any column of your dataset. You can also change the radius of the points. 

The following image shows the layers panel supported by SageMaker geospatial capabilities.

![\[The layers panel with data points on a USA map, supported by SageMaker geospatial capabilities.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/geospatial_vis_layer.png)


**Columns**

You can view the columns present in your dataset by using the **Columns** tab in the left navigation panel.

**Filters**

You can use filters to limit the data points that display on the map.

**Interactions**

In the **Interactions** panel, you can customize how you interact with the map. For example, you can choose what metrics to display when you hover the tooltip over a data point.

**Base map**

Currently, SageMaker AI only supports the Amazon Dark base map.

**Split Map Modes**

You can have a **Single Map**, **Dual Maps** or **Swipe Maps**. With **Dual Maps**, you can compare the same map side-by-side using different layers. Use **Swipe Maps** to overlay two maps on each other and use the sliding separator to compare them. You can choose the split map mode by choosing the **Split Mode** button on the top right corner of your map.

## Legends for EOJ in the SageMaker geospatial UI
<a name="geo-legends-eoj"></a>

The output visualization of an EOJ depends on the operation you choose to create it. The legend is based on the default color scale. You can view the legend by choosing the **Show legend** button on the top right corner of your map.

**Spectral Index**

When you visualize the output for an EOJ that uses the spectral index operation, you can map the category based on the color from the legend as shown.

![\[The legend for spectral index mapping.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/geo_spectral_index.png)


**Cloud Masking**

When you visualize the output for an EOJ that uses the cloud masking operation, you can map the category based on the color from the legend as shown.

![\[The legend for cloud masking mapping.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/geo_cloud_masking.png)


**Land Cover Segmentation**

When you visualize the output for an EOJ that uses the Land Cover Segmentation operation, you can map the category based on the color from the legend as shown.

![\[The legend for land cover segmentation mapping.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/geo_landcover_ss.png)


# Amazon SageMaker geospatial Map SDK
<a name="geospatial-notebook-sdk"></a>

You can use Amazon SageMaker geospatial capabilities to visualize maps within the SageMaker geospatial UI as well as in SageMaker notebooks with a geospatial image. These visualizations are supported by the map visualization library [Foursquare Studio](https://studio.foursquare.com/home).

You can use the APIs provided by the SageMaker geospatial map SDK to visualize your geospatial data, including the input, output, and AoI of an EOJ.

**Topics**
+ [add\_dataset API](#geo-add-dataset)
+ [update\_dataset API](#geo-update-dataset)
+ [add\_layer API](#geo-add-layer)
+ [update\_layer API](#geo-update-layer)
+ [visualize\_eoj\_aoi API](#geo-visualize-eoj-aoi)
+ [visualize\_eoj\_input API](#geo-visualize-eoj-input)
+ [visualize\_eoj\_output API](#geo-visualize-eoj-output)

## add\_dataset API
<a name="geo-add-dataset"></a>

Adds a raster or vector dataset object to the map.

**Request syntax**

```
Request = 
    add_dataset(
      self,
      dataset: Union[Dataset, Dict, None] = None,
      *,
      auto_create_layers: bool = True,
      center_map: bool = True,
      **kwargs: Any,
    ) -> Optional[Dataset]
```

**Request parameters**

The request accepts the following parameters.

Positional arguments


| Argument |  Type  |  Description  | 
| --- | --- | --- | 
| `dataset` | Union[Dataset, Dict, None] | Data used to create a dataset, in CSV, JSON, or GeoJSON format (for local datasets) or a UUID string. | 

Keyword arguments


| Argument |  Type  |  Description  | 
| --- | --- | --- | 
| `auto_create_layers` | Boolean | Whether to attempt to create new layers when adding a dataset. Default value is `True`. | 
| `center_map` | Boolean | Whether to center the map on the created dataset. Default value is `True`. | 
| `id` | String | Unique identifier of the dataset. If you do not provide it, a random ID is generated. | 
| `label` | String | Dataset label which is displayed. | 
| `color` | Tuple[float, float, float] | Color label of the dataset. | 
| `metadata` | Dictionary | Object containing tileset metadata (for tiled datasets). | 

**Response**

This API returns the [Dataset](https://location.foursquare.com/developer/docs/studio-map-sdk-types#dataset) object that was added to the map.

## update\_dataset API
<a name="geo-update-dataset"></a>

Updates an existing dataset's settings.

**Request syntax**

```
Request = 
    update_dataset(
        self,
        dataset_id: str,
        values: Union[_DatasetUpdateProps, dict, None] = None,
        **kwargs: Any,
    ) -> Dataset
```

**Request parameters**

The request accepts the following parameters.

Positional arguments


| Argument |  Type  |  Description  | 
| --- | --- | --- | 
| `dataset_id` | String | The identifier of the dataset to be updated. | 
| `values` | Union[[\_DatasetUpdateProps](https://location.foursquare.com/developer/docs/studio-map-sdk-types#datasetupdateprops), dict, None] | The values to update. | 

Keyword arguments


| Argument |  Type  |  Description  | 
| --- | --- | --- | 
| `label` | String | Dataset label which is displayed. | 
| `color` | [RGBColor](https://location.foursquare.com/developer/docs/studio-map-sdk-types#rgbcolor) | Color label of the dataset. | 

**Response**

This API returns the updated dataset object for interactive maps, or `None` for non-interactive HTML environments. 

## add\_layer API
<a name="geo-add-layer"></a>

Adds a new layer to the map. This function requires at least one valid layer configuration.

**Request syntax**

```
Request = 
    add_layer(
        self,
        layer: Union[LayerCreationProps, dict, None] = None,
        **kwargs: Any,
    ) -> Layer
```

**Request parameters**

The request accepts the following parameters.

Arguments


| Argument |  Type  |  Description  | 
| --- | --- | --- | 
| `layer` | Union[[LayerCreationProps](https://location.foursquare.com/developer/docs/studio-map-sdk-types#layercreationprops), dict, None] | A set of properties used to create a layer. | 

**Response**

The layer object that was added to the map.

## update\_layer API
<a name="geo-update-layer"></a>

Updates an existing layer with the given values.

**Request syntax**

```
Request = 
    update_layer(
        self,
        layer_id: str,
        values: Union[LayerUpdateProps, dict, None],
        **kwargs: Any,
    ) -> Layer
```

**Request parameters**

The request accepts the following parameters.

Arguments


| Positional argument |  Type  |  Description  | 
| --- | --- | --- | 
| `layer_id` | String | The ID of the layer to be updated. | 
| `values` | Union[[LayerUpdateProps](https://location.foursquare.com/developer/docs/studio-map-sdk-types#layerupdateprops), dict, None] | The values to update. | 

Keyword arguments


| Argument |  Type  |  Description  | 
| --- | --- | --- | 
| `type` | [LayerType](https://location.foursquare.com/developer/docs/studio-map-sdk-types#layertype) | The type of layer. | 
| `data_id` | String | Unique identifier of the dataset this layer visualizes. | 
| `fields` | Dict [string, Optional[string]] | Dictionary that maps fields that the layer requires for visualization to appropriate dataset fields. | 
| `label` | String | Canonical label of this layer. | 
| `is_visible` | Boolean | Whether the layer is visible or not. | 
| `config` | [LayerConfig](https://location.foursquare.com/developer/docs/studio-map-sdk-types#layerconfig) | Layer configuration specific to its type.  | 

**Response**

Returns the updated layer object.

## visualize\_eoj\_aoi API
<a name="geo-visualize-eoj-aoi"></a>

Visualize the AoI of the given job ARN.

**Request parameters**

The request accepts the following parameters.

Arguments


| Argument |  Type  |  Description  | 
| --- | --- | --- | 
|  `Arn`  |  String  |  The ARN of the job.  | 
|  `config`  |  Dictionary: `config = { label: <string> }`, where `label` is the custom label of the added AoI layer (default `AoI`)  |  An option to pass layer properties.  | 

**Response**

Reference of the added input layer object.

## visualize\_eoj\_input API
<a name="geo-visualize-eoj-input"></a>

Visualize the input of the given EOJ ARN.

**Request parameters**

The request accepts the following parameters.

Arguments


| Argument |  Type  |  Description  | 
| --- | --- | --- | 
| `Arn` | String | The ARN of the job. | 
| `time_range_filter` |  Dictionary: `time_range_filter = { start_date: <string>, end_date: <string> }`, dates in ISO format  | An option to provide the start and end time. Defaults to the raster data collection search start and end date. | 
| `config` |  Dictionary: `config = { label: <string> }`, where `label` is the custom label of the added input layer (default `Input`)  | An option to pass layer properties. | 

**Response**

Reference of the added input layer object.

## visualize\_eoj\_output API
<a name="geo-visualize-eoj-output"></a>

Visualize the output of the given EOJ ARN.

**Request parameters**

The request accepts the following parameters.

Arguments


| Argument |  Type  |  Description  | 
| --- | --- | --- | 
|  `Arn`  |  String  |  The ARN of the job.  | 
|  `time_range_filter`  |  Dictionary: `time_range_filter = { start_date: <string>, end_date: <string> }`, dates in ISO format  | An option to provide the start and end time. Defaults to the raster data collection search start and end date. | 
| `config` |  Dictionary: `config = { label: <string>, preset: <string>, band_name: <string> }`, where `label` is the custom label of the added output layer (default `Output`), `preset` is either `singleBand` or `trueColor`, and `band_name` (required only for the `singleBand` preset) is one of the allowed bands for an EOJ  | An option to pass layer properties. | 

**Response**

Reference of the added output Layer object.

To learn more about visualizing your geospatial data, refer to [Visualization Using Amazon SageMaker geospatial](https://docs.aws.amazon.com/sagemaker/latest/dg/geospatial-visualize.html).

# SageMaker geospatial capabilities FAQ
<a name="geospatial-faq"></a>

Use the following FAQ items to find answers to commonly asked questions about SageMaker geospatial capabilities.

1. **What regions are Amazon SageMaker geospatial capabilities available in?**

   Currently, SageMaker geospatial capabilities are only supported in the US West (Oregon) Region. To view SageMaker geospatial, choose the name of the currently displayed Region in the navigation bar of the console. Then choose the US West (Oregon) Region.

1. **What AWS Identity and Access Management permissions and policies are required to use SageMaker geospatial?**

   To use SageMaker geospatial, you need a user, group, or role that can access SageMaker AI. You also need to create a SageMaker AI execution role so that SageMaker geospatial can perform operations on your behalf. To learn more, see [SageMaker geospatial capabilities roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-geospatial-roles.html).

1. **I have an existing SageMaker AI execution role. Do I need to update it?**

   Yes. To use SageMaker geospatial, you must specify an additional service principal in your IAM trust policy: `sagemaker-geospatial.amazonaws.com`. To learn about specifying a service principal in a trust relationship, see [Adding the SageMaker geospatial service principal to an existing SageMaker AI execution role](sagemaker-geospatial-roles-pass-role.md) in the *Amazon SageMaker AI Developer Guide*.

1. **Can I use SageMaker geospatial capabilities through my VPC environment?**

   Yes, you can use SageMaker geospatial through a VPC. To learn more, see [Use Amazon SageMaker geospatial capabilities in Your Amazon Virtual Private Cloud](geospatial-notebooks-and-internet-access-vpc-requirements.md).

1. **Why can't I see the SageMaker geospatial map visualizer, image or instance type when I navigate to Amazon SageMaker Studio Classic?**

   Verify that you are launching Amazon SageMaker Studio Classic in the US West (Oregon) Region and that you are not using a shared space.

1. **Why can't I see the SageMaker geospatial image or instance type when I try to create a notebook instance in Studio Classic?**

   Verify that you are launching Amazon SageMaker Studio Classic in the US West (Oregon) Region and that you are not using a shared space. To learn more, see [Create an Amazon SageMaker Studio Classic notebook using the geospatial image](geospatial-launch-notebook.md).

1. **What bands are supported for the various raster data collections?**

   Use the `GetRasterDataCollection` API response and refer to the `ImageSourceBands` field to find the bands supported for that particular data collection.

# SageMaker geospatial Security and Permissions
<a name="geospatial-security-general"></a>

Use the topics on this page to learn about SageMaker geospatial capabilities security features, how to use SageMaker geospatial capabilities in an Amazon Virtual Private Cloud, and how to protect your data at rest using encryption.

For more information about IAM users and roles, see [Identities (Users, Groups, and Roles)](https://docs.aws.amazon.com/IAM/latest/UserGuide/id.html) in the IAM User Guide. 

To learn more about using IAM with SageMaker AI, see [AWS Identity and Access Management for Amazon SageMaker AI](security-iam.md).

**Topics**
+ [Configuration and Vulnerability Analysis in SageMaker geospatial](geospatial-config-vulnerability.md)
+ [Security Best Practices for SageMaker geospatial capabilities](geospatial-sec-best-practices.md)
+ [Use Amazon SageMaker geospatial capabilities in Your Amazon Virtual Private Cloud](geospatial-notebooks-and-internet-access-vpc-requirements.md)
+ [Use AWS KMS Permissions for Amazon SageMaker geospatial capabilities](geospatial-kms.md)

# Configuration and Vulnerability Analysis in SageMaker geospatial
<a name="geospatial-config-vulnerability"></a>

Configuration and IT controls are a shared responsibility between AWS and you, our customer. AWS handles basic security tasks like guest operating system (OS) and database patching, firewall configuration, and disaster recovery. These procedures have been reviewed and certified by the appropriate third parties. For more details, see the following resources: 
+ [Shared Responsibility Model](https://aws.amazon.com/compliance/shared-responsibility-model/).
+ [Amazon Web Services: Overview of Security Processes](https://d0.awsstatic.com/whitepapers/Security/AWS_Security_Whitepaper.pdf).

# Security Best Practices for SageMaker geospatial capabilities
<a name="geospatial-sec-best-practices"></a>

Amazon SageMaker geospatial capabilities provide a number of security features to consider as you develop and implement your own security policies. The following best practices are general guidelines and don't represent a complete security solution. Because these best practices might not be appropriate or sufficient for your environment, treat them as helpful considerations rather than prescriptions.
<a name="geospatial-least-privilege"></a>
**Apply principle of least privilege**  
Amazon SageMaker geospatial capabilities provide granular access policies for applications through IAM roles. We recommend granting each role only the minimum set of privileges required by the job. We also recommend auditing job permissions regularly and after any change to your application.
<a name="geospatial-role-access"></a>
**Role-based access control (RBAC) permissions**  
Administrators should strictly control Role-based access control (RBAC) permissions for Amazon SageMaker geospatial capabilities.
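As an illustration, an identity-based policy like the following hypothetical sketch could scope a role to read-only geospatial operations; the wildcarded actions are illustrative, so adjust the actions and resources to your own requirements:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker-geospatial:Get*",
                "sagemaker-geospatial:List*"
            ],
            "Resource": "*"
        }
    ]
}
```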
<a name="geospatial-temp-creditentials"></a>
**Use temporary credentials whenever possible**  
Where possible, use temporary credentials instead of long-term credentials, such as access keys. For scenarios in which you need IAM users with programmatic access and long-term credentials, we recommend that you rotate access keys. Regularly rotating long-term credentials helps you familiarize yourself with the process. This is useful in case you are ever in a situation where you must rotate credentials, such as when an employee leaves your company. We recommend that you use IAM access last used information to rotate and remove access keys safely. For more information, see [Rotating access keys](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_RotateAccessKey) and [Security best practices in IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html).
<a name="geospatial-cloudtrail-log"></a>
**Use AWS CloudTrail to view and log API calls**  
AWS CloudTrail tracks API calls made in your AWS account. API calls are logged whenever anyone uses the SageMaker geospatial capabilities API, the console, or AWS CLI commands. Enable logging and specify an Amazon S3 bucket to store the logs.
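As an illustration, logged events can also be queried programmatically with the CloudTrail `LookupEvents` operation. The sketch below only builds the lookup parameters and accepts any client object shaped like boto3's CloudTrail client, so the call itself (which needs valid credentials) is left to the caller:

```python
# Request parameters for CloudTrail LookupEvents, filtering on one of the
# event names emitted when SageMaker geospatial capabilities calls AWS KMS.
lookup_params = {
    "LookupAttributes": [
        {"AttributeKey": "EventName", "AttributeValue": "CreateGrant"}
    ],
    "MaxResults": 50,
}

def lookup_geospatial_events(cloudtrail_client, params=lookup_params):
    """Return CloudTrail events matching the given lookup parameters.

    `cloudtrail_client` is expected to behave like
    boto3.client("cloudtrail"); for example:
        events = lookup_geospatial_events(boto3.client("cloudtrail"))
    """
    response = cloudtrail_client.lookup_events(**params)
    return response.get("Events", [])
```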

Your trust, privacy, and the security of your content are our highest priorities. We implement responsible and sophisticated technical and physical controls designed to prevent unauthorized access to, or disclosure of, your content and ensure that our use complies with our commitments to you. For more information, see [AWS Data Privacy FAQ](https://aws.amazon.com/compliance/data-privacy-faq/).

# Use Amazon SageMaker geospatial capabilities in Your Amazon Virtual Private Cloud
<a name="geospatial-notebooks-and-internet-access-vpc-requirements"></a>

The following topic gives information on how to use SageMaker notebooks with a SageMaker geospatial image in an Amazon SageMaker AI domain in `VPC only` mode. For more information on VPCs in Amazon SageMaker Studio Classic, see [Choose an Amazon VPC](https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-vpc.html).

## `VPC only` communication with the internet
<a name="studio-notebooks-and-internet-access-vpc-geospatial"></a>

By default, a SageMaker AI domain uses two Amazon VPCs. One VPC is managed by Amazon SageMaker AI and provides direct internet access. You specify the other VPC, which provides encrypted traffic between the domain and your Amazon Elastic File System (Amazon EFS) volume.

You can change this behavior so that SageMaker AI sends all traffic over your specified Amazon VPC. If `VPC only` was chosen as the network access mode during SageMaker AI domain creation, the following requirements must be met so that SageMaker Studio Classic notebooks can still be used within the domain.

## Requirements to use `VPC only` mode
<a name="studio-notebooks-and-internet-access-vpc-geospatial-requirements"></a>

**Note**  
In order to use the visualization components of SageMaker geospatial capabilities, the browser you use to access the SageMaker Studio Classic UI needs to be connected to the internet.

When you choose `VpcOnly`, follow these steps:

1. You must use private subnets only. You cannot use public subnets in `VpcOnly` mode.

1. Ensure that your subnets have the required number of IP addresses. The expected number of IP addresses per user varies by use case; we recommend between 2 and 4 IP addresses per user. The total IP address capacity for a Studio Classic domain is the sum of available IP addresses for each subnet provided when the domain is created. Ensure that your estimated IP address usage does not exceed the capacity supported by the number of subnets you provide. Additionally, distributing subnets across multiple Availability Zones can aid IP address availability. For more information, see [VPC and subnet sizing for IPv4](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Subnets.html#vpc-sizing-ipv4).
**Note**  
You can configure only subnets with a default tenancy VPC in which your instance runs on shared hardware. For more information on the tenancy attribute for VPCs, see [Dedicated Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/dedicated-instance.html).

1. Set up one or more security groups with inbound and outbound rules that together allow the following traffic:
   + [NFS traffic over TCP on port 2049](https://docs.aws.amazon.com/efs/latest/ug/network-access.html) between the domain and the Amazon EFS volume.
   + [TCP traffic within the security group](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/security-group-rules-reference.html#sg-rules-other-instances). This is required for connectivity between the JupyterServer app and the KernelGateway apps. You must allow access to at least the port range `8192-65535`.

1. If you want to allow internet access, you must use a [NAT gateway](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html#nat-gateway-working-with) with access to the internet, for example through an [internet gateway](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html).

1. If you don't want to allow internet access, [create interface VPC endpoints](https://docs.aws.amazon.com/vpc/latest/privatelink/vpce-interface.html) (AWS PrivateLink) to allow Studio Classic to access the following services with the corresponding service names. You must also associate the security groups for your VPC with these endpoints.
**Note**  
Currently, SageMaker geospatial capabilities are only supported in the US West (Oregon) Region.
   + SageMaker API: `com.amazonaws.us-west-2.sagemaker.api`
   + SageMaker AI runtime: `com.amazonaws.us-west-2.sagemaker.runtime`. This is required to run Studio Classic notebooks with a SageMaker geospatial image.
   + Amazon S3: `com.amazonaws.us-west-2.s3`.
   + To use SageMaker Projects: `com.amazonaws.us-west-2.servicecatalog`.
   + SageMaker geospatial capabilities: `com.amazonaws.us-west-2.sagemaker-geospatial`

    If you use the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) to run remote training jobs, you must also create the following Amazon VPC endpoints.
   + AWS Security Token Service: `com.amazonaws.region.sts`
   + Amazon CloudWatch: `com.amazonaws.region.logs`. This is required to allow SageMaker Python SDK to get the remote training job status from Amazon CloudWatch.

**Note**  
For a customer working within `VPC only` mode, company firewalls can cause connection issues with SageMaker Studio Classic or between JupyterServer and the KernelGateway. Make the following checks if you encounter one of these issues when using SageMaker Studio Classic from behind a firewall.  
Check that the Studio Classic URL is in your network's allowlist.  
Check that websocket connections are not blocked. Jupyter uses websockets under the hood. Even if the KernelGateway application is `InService`, JupyterServer may not be able to connect to it. You also see this problem when opening the System Terminal.
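The interface endpoints listed in step 5 can also be created programmatically with the EC2 `CreateVpcEndpoint` operation. The following sketch only builds the request parameters; the VPC, subnet, and security group IDs are placeholders you must replace, and the boto3 call itself is shown in the docstring:

```python
REGION = "us-west-2"  # SageMaker geospatial is available only in us-west-2

# Interface endpoints required for Studio Classic without internet access.
SERVICE_NAMES = [
    f"com.amazonaws.{REGION}.sagemaker.api",
    f"com.amazonaws.{REGION}.sagemaker.runtime",
    f"com.amazonaws.{REGION}.s3",
    f"com.amazonaws.{REGION}.servicecatalog",
    f"com.amazonaws.{REGION}.sagemaker-geospatial",
]

def endpoint_params(service_name, vpc_id, subnet_ids, security_group_ids):
    """Build kwargs for ec2.create_vpc_endpoint for one interface endpoint.

    Usage (assuming valid credentials and real resource IDs):
        ec2 = boto3.client("ec2", region_name=REGION)
        for name in SERVICE_NAMES:
            ec2.create_vpc_endpoint(**endpoint_params(name, vpc_id, subnets, sgs))
    """
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": service_name,
        "SubnetIds": subnet_ids,
        "SecurityGroupIds": security_group_ids,
        "PrivateDnsEnabled": True,
    }
```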

# Use AWS KMS Permissions for Amazon SageMaker geospatial capabilities
<a name="geospatial-kms"></a>

You can protect your data at rest using encryption for SageMaker geospatial capabilities. By default, it uses server-side encryption with an Amazon SageMaker geospatial owned key. SageMaker geospatial capabilities also supports an option for server-side encryption with a customer managed KMS key.

## Server-Side Encryption with Amazon SageMaker geospatial managed key (Default)
<a name="geospatial-managed-key"></a>

SageMaker geospatial capabilities encrypts all your data, including computational results from your Earth Observation jobs (EOJs) and Vector Enrichment jobs (VEJs), along with all your service metadata. No data is stored unencrypted within SageMaker geospatial capabilities. A default AWS owned key is used to encrypt all your data.

## Server-Side Encryption with customer managed KMS key (Optional)
<a name="geospatial-customer-managed-key"></a>

SageMaker geospatial capabilities supports the use of a symmetric customer managed key that you create, own, and manage to add a second layer of encryption over the existing AWS owned encryption. Because you have full control of this layer of encryption, you can perform such tasks as:
+ Establishing and maintaining key policies
+ Establishing and maintaining IAM policies and grants
+ Enabling and disabling key policies
+ Rotating key cryptographic material
+ Adding tags
+ Creating key aliases
+ Scheduling keys for deletion

For more information, see [Customer managed keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#customer-cmk) in the *AWS Key Management Service Developer Guide*.

## How SageMaker geospatial capabilities uses grants in AWS KMS
<a name="geospatial-grants-cmk"></a>

SageMaker geospatial capabilities requires a grant to use your customer managed key. When you create an EOJ or a VEJ encrypted with a customer managed key, SageMaker geospatial capabilities creates a grant on your behalf by sending a `CreateGrant` request to AWS KMS. Grants in AWS KMS give SageMaker geospatial capabilities access to a KMS key in a customer account. You can revoke access to the grant, or remove the service's access to the customer managed key, at any time. If you do, SageMaker geospatial capabilities can't access any of the data encrypted by the customer managed key, which affects operations that depend on that data.
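You can audit these grants with the AWS KMS `ListGrants` operation and revoke them with `RevokeGrant`. The following sketch assumes a client object shaped like boto3's KMS client, passed in by the caller:

```python
def geospatial_grants(kms_client, key_arn):
    """Return grants on the key whose grantee is the geospatial service."""
    grants = kms_client.list_grants(KeyId=key_arn).get("Grants", [])
    return [
        g for g in grants
        if "sagemaker-geospatial" in g.get("GranteePrincipal", "")
    ]

def revoke_grants(kms_client, key_arn, grants):
    """Revoke each grant; operations that depend on the encrypted data fail afterward."""
    for g in grants:
        kms_client.revoke_grant(KeyId=key_arn, GrantId=g["GrantId"])
```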

## Create a customer managed key
<a name="geospatial-create-cmk"></a>

You can create a symmetric customer managed key by using the AWS Management Console or the AWS KMS APIs.

**To create a symmetric customer managed key**

Follow the steps for [Creating symmetric encryption KMS keys](https://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html#create-symmetric-cmk) in the AWS Key Management Service Developer Guide.
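Such a key can also be created programmatically with the AWS KMS `CreateKey` operation. The sketch below only prepares the parameters and takes the KMS client as an argument; the description string is illustrative:

```python
# Parameters for a symmetric encryption key, as required by
# SageMaker geospatial capabilities.
create_key_params = {
    "Description": "Customer managed key for SageMaker geospatial jobs",
    "KeySpec": "SYMMETRIC_DEFAULT",   # symmetric encryption key
    "KeyUsage": "ENCRYPT_DECRYPT",
}

def create_cmk(kms_client, params=create_key_params):
    """Create the key and return its ARN.

    Example (with valid credentials):
        arn = create_cmk(boto3.client("kms", region_name="us-west-2"))
    """
    return kms_client.create_key(**params)["KeyMetadata"]["Arn"]
```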

**Key policy**

Key policies control access to your customer managed key. Every customer managed key must have exactly one key policy, which contains statements that determine who can use the key and how they can use it. When you create your customer managed key, you can specify a key policy. For more information, see [Determining access to AWS KMS keys](https://docs.aws.amazon.com/kms/latest/developerguide/determining-access.html) in the *AWS Key Management Service Developer Guide*.

To use your customer managed key with your SageMaker geospatial capabilities resources, the following API operations must be permitted in the key policy. The principal for these operations should be the Execution Role you provide in the SageMaker geospatial capabilities request. SageMaker geospatial capabilities assumes the provided Execution Role in the request to perform these KMS operations.
+ `[kms:CreateGrant](https://docs.aws.amazon.com/kms/latest/APIReference/API_CreateGrant.html)`
+ `kms:GenerateDataKey`
+ `kms:Decrypt`
+ `kms:GenerateDataKeyWithoutPlaintext`

The following are policy statement examples you can add for SageMaker geospatial capabilities:

**CreateGrant**

```
"Statement" : [
    {
      "Sid" : "Allow access to Amazon SageMaker geospatial capabilities",
      "Effect" : "Allow",
      "Principal" : {
        "AWS" : "<Customer provided Execution Role ARN>"
      },
      "Action" : [
          "kms:CreateGrant",
          "kms:Decrypt",
          "kms:GenerateDataKey",
          "kms:GenerateDataKeyWithoutPlaintext"
      ],
      "Resource" : "*"
    }
]
```

For more information about specifying permissions in a policy, see [AWS KMS permissions](https://docs.aws.amazon.com/kms/latest/developerguide/kms-api-permissions-reference.html) in the *AWS Key Management Service Developer Guide*. For more information about troubleshooting, see [Troubleshooting key access](https://docs.aws.amazon.com/kms/latest/developerguide/policy-evaluation.html) in the *AWS Key Management Service Developer Guide*. 

If your key policy does not have your account root as key administrator, you need to add the same KMS permissions to your execution role. Here is a sample policy you can add to the execution role:

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "kms:CreateGrant",
                "kms:Decrypt",
                "kms:GenerateDataKey",
                "kms:GenerateDataKeyWithoutPlaintext"
            ],
            "Resource": [
              "arn:aws:kms:us-east-1:111122223333:key/key-id"
            ],
            "Effect": "Allow"
        }
    ]
}
```

------

## Monitoring your encryption keys for SageMaker geospatial capabilities
<a name="geospatial-monitor-cmk"></a>

When you use an AWS KMS customer managed key with your SageMaker geospatial capabilities resources, you can use AWS CloudTrail or Amazon CloudWatch Logs to track requests that SageMaker geospatial sends to AWS KMS.

The following tabs show examples of AWS CloudTrail events for the KMS operations called by SageMaker geospatial capabilities to access data encrypted by your customer managed key.

------
#### [ CreateGrant ]

```
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "AROAIGDTESTANDEXAMPLE:SageMaker-Geospatial-StartEOJ-KMSAccess",
        "arn": "arn:aws:sts::111122223333:assumed-role/SageMakerGeospatialCustomerRole/SageMaker-Geospatial-StartEOJ-KMSAccess",
        "accountId": "111122223333",
        "accessKeyId": "AKIAIOSFODNN7EXAMPLE3",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "AKIAIOSFODNN7EXAMPLE3",
                "arn": "arn:aws:sts::111122223333:assumed-role/SageMakerGeospatialCustomerRole",
                "accountId": "111122223333",
                "userName": "SageMakerGeospatialCustomerRole"
            },
            "webIdFederationData": {},
            "attributes": {
                "creationDate": "2023-03-17T18:02:06Z",
                "mfaAuthenticated": "false"
            }
        },
        "invokedBy": "arn:aws:iam::111122223333:root"
    },
    "eventTime": "2023-03-17T18:02:06Z",
    "eventSource": "kms.amazonaws.com",
    "eventName": "CreateGrant",
    "awsRegion": "us-west-2",
    "sourceIPAddress": "172.12.34.56",
    "userAgent": "ExampleDesktop/1.0 (V1; OS)",
    "requestParameters": {
        "retiringPrincipal": "sagemaker-geospatial.us-west-2.amazonaws.com",
        "keyId": "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-123456SAMPLE",
        "operations": [
            "Decrypt"
        ],
        "granteePrincipal": "sagemaker-geospatial.us-west-2.amazonaws.com"
    },
    "responseElements": {
        "grantId": "0ab0ac0d0b000f00ea00cc0a0e00fc00bce000c000f0000000c0bc0a0000aaafSAMPLE",
        "keyId": "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-123456SAMPLE"
    },
    "requestID": "ff000af-00eb-00ce-0e00-ea000fb0fba0SAMPLE",
    "eventID": "ff000af-00eb-00ce-0e00-ea000fb0fba0SAMPLE",
    "readOnly": false,
    "resources": [
        {
            "accountId": "111122223333",
            "type": "AWS::KMS::Key",
            "ARN": "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-123456SAMPLE"
        }
    ],
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "111122223333",
    "eventCategory": "Management"
}
```

------
#### [ GenerateDataKey ]

```
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AWSService",
        "invokedBy": "sagemaker-geospatial.amazonaws.com"
    },
    "eventTime": "2023-03-24T00:29:45Z",
    "eventSource": "kms.amazonaws.com",
    "eventName": "GenerateDataKey",
    "awsRegion": "us-west-2",
    "sourceIPAddress": "sagemaker-geospatial.amazonaws.com",
    "userAgent": "sagemaker-geospatial.amazonaws.com",
    "requestParameters": {
        "encryptionContext": {
            "aws:s3:arn": "arn:aws:s3:::axis-earth-observation-job-378778860802/111122223333/napy9eintp64/output/consolidated/32PPR/2022-01-04T09:58:03Z/S2B_32PPR_20220104_0_L2A_msavi.tif"
        },
        "keyId": "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-123456SAMPLE",
        "keySpec": "AES_256"
    },
    "responseElements": null,
    "requestID": "ff000af-00eb-00ce-0e00-ea000fb0fba0SAMPLE",
    "eventID": "ff000af-00eb-00ce-0e00-ea000fb0fba0SAMPLE",
    "readOnly": true,
    "resources": [
        {
            "accountId": "111122223333",
            "type": "AWS::KMS::Key",
            "ARN": "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-123456SAMPLE"
        }
    ],
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "111122223333",
    "eventCategory": "Management"
}
```

------
#### [ Decrypt ]

```
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AWSService",
        "invokedBy": "sagemaker-geospatial.amazonaws.com"
    },
    "eventTime": "2023-03-28T22:04:24Z",
    "eventSource": "kms.amazonaws.com",
    "eventName": "Decrypt",
    "awsRegion": "us-west-2",
    "sourceIPAddress": "sagemaker-geospatial.amazonaws.com",
    "userAgent": "sagemaker-geospatial.amazonaws.com",
    "requestParameters": {
        "encryptionAlgorithm": "SYMMETRIC_DEFAULT",
        "encryptionContext": {
            "aws:s3:arn": "arn:aws:s3:::axis-earth-observation-job-378778860802/111122223333/napy9eintp64/output/consolidated/32PPR/2022-01-04T09:58:03Z/S2B_32PPR_20220104_0_L2A_msavi.tif"
        },
    },
    "responseElements": null,
    "requestID": "ff000af-00eb-00ce-0e00-ea000fb0fba0SAMPLE",
    "eventID": "ff000af-00eb-00ce-0e00-ea000fb0fba0SAMPLE",
    "readOnly": true,
    "resources": [
        {
            "accountId": "111122223333",
            "type": "AWS::KMS::Key",
            "ARN": "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-123456SAMPLE"
        }
    ],
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "111122223333",
    "eventCategory": "Management"
}
```

------
#### [ GenerateDataKeyWithoutPlaintext ]

```
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "AROAIGDTESTANDEXAMPLE:SageMaker-Geospatial-StartEOJ-KMSAccess",
        "arn": "arn:aws:sts::111122223333:assumed-role/SageMakerGeospatialCustomerRole/SageMaker-Geospatial-StartEOJ-KMSAccess",
        "accountId": "111122223333",
        "accessKeyId": "AKIAIOSFODNN7EXAMPLE3",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "AKIAIOSFODNN7EXAMPLE3",
                "arn": "arn:aws:sts::111122223333:assumed-role/SageMakerGeospatialCustomerRole",
                "accountId": "111122223333",
                "userName": "SageMakerGeospatialCustomerRole"
            },
            "webIdFederationData": {},
            "attributes": {
                "creationDate": "2023-03-17T18:02:06Z",
                "mfaAuthenticated": "false"
            }
        },
        "invokedBy": "arn:aws:iam::111122223333:root"
    },
    "eventTime": "2023-03-28T22:09:16Z",
    "eventSource": "kms.amazonaws.com",
    "eventName": "GenerateDataKeyWithoutPlaintext",
    "awsRegion": "us-west-2",
    "sourceIPAddress": "172.12.34.56",
    "userAgent": "ExampleDesktop/1.0 (V1; OS)",
    "requestParameters": {
        "keySpec": "AES_256",
        "keyId": "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-123456SAMPLE"
    },
    "responseElements": null,
    "requestID": "ff000af-00eb-00ce-0e00-ea000fb0fba0SAMPLE",
    "eventID": "ff000af-00eb-00ce-0e00-ea000fb0fba0SAMPLE",
    "readOnly": true,
    "resources": [
        {
            "accountId": "111122223333",
            "type": "AWS::KMS::Key",
            "ARN": "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-123456SAMPLE"
        }
    ],
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "111122223333",
    "eventCategory": "Management"
}
```

------

# Types of compute instances
<a name="geospatial-instances"></a>

SageMaker geospatial capabilities offer three types of compute instances.
+ **SageMaker Studio Classic geospatial notebook instances** – SageMaker geospatial supports both CPU- and GPU-based notebook instances in Studio Classic. Notebook instances are used to build, train, and deploy ML models. For a list of available notebook instance types that work with the geospatial image, see [Supported notebook instance types](#notebook-instances). 
+ **SageMaker geospatial jobs instances** – Run processing jobs to transform satellite image data.
+ **SageMaker geospatial model inference types** – Make predictions by using pre-trained ML models on satellite imagery.

The instance type is determined by the operations that you run.

The following table shows the available SageMaker geospatial specific operations and the instance types that you can use.


|  Operations  |  Instance  | 
| --- | --- | 
| Temporal Statistics | ml.geospatial.jobs | 
| Zonal Statistics | ml.geospatial.jobs | 
| Resampling | ml.geospatial.jobs | 
| Geomosaic | ml.geospatial.jobs | 
| Band Stacking | ml.geospatial.jobs | 
| Band Math | ml.geospatial.jobs | 
| Cloud Removal with Landsat8 | ml.geospatial.jobs | 
| Cloud Removal with Sentinel-2 | ml.geospatial.models | 
| Cloud Masking | ml.geospatial.models | 
| Land Cover Segmentation | ml.geospatial.models | 

## SageMaker geospatial supported notebook instance types
<a name="notebook-instances"></a>

SageMaker geospatial supports both CPU- and GPU-based notebook instances in Studio Classic. If you receive a `ResourceLimitExceeded` error when starting a GPU-enabled notebook instance, you need to request a quota increase. To get started on a Service Quotas quota increase request, see [Requesting a quota increase](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html) in the *Service Quotas User Guide*.
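If you script the request, the Service Quotas `ListServiceQuotas` operation (service code `sagemaker`) returns quota names and codes, and the matched code can be passed to `RequestServiceQuotaIncrease`. A small helper sketch, assuming you pass in the `Quotas` list from that API response:

```python
def find_quota_code(quotas, name_substring):
    """Return the QuotaCode of the first quota whose QuotaName matches.

    `quotas` is the "Quotas" list from
    service_quotas.list_service_quotas(ServiceCode="sagemaker"); pass the
    matched code to request_service_quota_increase along with a DesiredValue.
    Returns None if no quota name contains the substring.
    """
    for q in quotas:
        if name_substring.lower() in q.get("QuotaName", "").lower():
            return q["QuotaCode"]
    return None
```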

Supported Studio Classic notebook instance types


|  Name  |  Instance type  | 
| --- | --- | 
| ml.geospatial.interactive | CPU | 
| ml.g5.xlarge | GPU | 
| ml.g5.2xlarge | GPU | 
| ml.g5.4xlarge | GPU | 
| ml.g5.8xlarge | GPU | 
| ml.g5.16xlarge | GPU | 
| ml.g5.12xlarge | GPU | 
| ml.g5.24xlarge | GPU | 
| ml.g5.48xlarge | GPU | 

You are charged different rates for each type of compute instance that you use. For more information about pricing, see [Geospatial ML with Amazon SageMaker AI](https://aws.amazon.com/sagemaker/geospatial).

## SageMaker geospatial libraries
<a name="geospatial-notebook-libraries"></a>

The SageMaker geospatial specific instance type, **ml.geospatial.interactive**, contains the following Python libraries.

Geospatial libraries available on the geospatial instance type


|  Library name  |  Version available  | 
| --- | --- | 
| numpy | 1.23.4 | 
| scipy | 1.11.2 | 
| pandas | 1.4.4 | 
| gdal | 3.2.2 | 
| fiona | 1.8.22 | 
| geopandas | 0.11.1 | 
| shapely | 1.8.4 | 
| seaborn | 0.11.2 | 
| notebook | 1.8.22 | 
| scikit-image | 0.11.2 | 
| rasterio | 6.4.12 | 
| scikit-learn | 0.19.2 | 
| ipyleaflet | 1.0.1 | 
| rtree | 0.17.2 | 
| opencv | 4.6.0.66 | 
| supy | 2022.4.7 | 
| SNAP toolbox | 9.0 | 
| cdsapi | 0.6.1 | 
| arosics | 1.8.1 | 
| rasterstats | 0.18.0 | 
| rioxarray | 0.14.1 | 
| pyroSAR | 0.20.0 | 
| eo-learn | 1.4.1 | 
| deepforest | 1.2.7 | 
| scrapy | 2.8.0 | 
| netCDF4 | 1.6.3 | 
| xarray[complete] | 0.20.1 | 
| Orfeotoolbox | OTB-8.1.1 | 
| pytorch | 2.0.1 | 
| pytorch-cuda | 11.8 | 
| torchvision | 0.15.2 | 
| torchaudio | 2.0.2 | 
| pytorch-lightning | 2.0.6 | 
| tensorflow | 2.13.0 | 

# Data collections
<a name="geospatial-data-collections"></a>

Amazon SageMaker geospatial supports the following raster data collections. Of these, you can use the USGS Landsat and the Sentinel-2 Cloud-Optimized GeoTIFF data collections when starting an Earth Observation Job (EOJ). To learn more about EOJs, see [Earth Observation Jobs](geospatial-eoj.md).
+ [Copernicus Digital Elevation Model (DEM) – GLO-30](https://registry.opendata.aws/copernicus-dem/)
+ [Copernicus Digital Elevation Model (DEM) – GLO-90](https://registry.opendata.aws/copernicus-dem/)
+ [Sentinel-2 Cloud-Optimized GeoTIFFs](https://registry.opendata.aws/sentinel-2-l2a-cogs/)
+ [Sentinel-1](https://registry.opendata.aws/sentinel-1/)
+ [National Agriculture Imagery Program (NAIP) on AWS](https://registry.opendata.aws/naip/)
+ [USGS Landsat](https://registry.opendata.aws/usgs-landsat/)

To find the list of available raster data collections in your AWS Regions, use `ListRasterDataCollections`. In the [`ListRasterDataCollections` response](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_geospatial_ListRasterDataCollections.html#API_geospatial_ListRasterDataCollections_ResponseSyntax), you get a `RasterDataCollectionSummaries` object that contains details about the available raster data collections.

**Example – Calling the `ListRasterDataCollections` API using the AWS SDK for Python (Boto3)**  <a name="list-raster-data-collections"></a>
When you use the SDK for Python (Boto3) and SageMaker geospatial, you must create a geospatial client, `geospatial_client`. Use the following Python snippet to make a call to the `list_raster_data_collections` API:  

```
import boto3
import sagemaker

# SageMaker geospatial capabilities are currently only available in us-west-2
session = boto3.Session(region_name='us-west-2')
execution_role = sagemaker.get_execution_role()

# Create a SageMaker geospatial client instance
geospatial_client = session.client(service_name="sagemaker-geospatial")

# Create a reusable paginator for the list_raster_data_collections API operation
paginator = geospatial_client.get_paginator("list_raster_data_collections")

# Create a PageIterator from the paginator
page_iterator = paginator.paginate()

# Iterate through the results of list_raster_data_collections
results = []
for page in page_iterator:
    results.append(page['RasterDataCollectionSummaries'])

print(results)
```
In the JSON response, you receive entries like the following, truncated for clarity:  

```
{
    "Arn": "arn:aws:sagemaker-geospatial:us-west-2:555555555555:raster-data-collection/public/dxxbpqwvu9041ny8",
    "Description": "Copernicus DEM is a Digital Surface Model which represents the surface of the Earth including buildings, infrastructure, and vegetation. GLO-30 is instance of Copernicus DEM that provides limited worldwide coverage at 30 meters.",
    "DescriptionPageUrl": "https://registry.opendata.aws/copernicus-dem/",
    "Name": "Copernicus DEM GLO-30",
    "Tags": {},
    "Type": "PUBLIC"
}
```
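Because the `Arn` values are account- and Region-specific, a common pattern is to select a collection by its `Name` field from the paginated results. A small helper sketch, assuming `summaries` is the flattened list of summary dicts returned in `RasterDataCollectionSummaries`:

```python
def collection_arn_by_name(summaries, name):
    """Return the ARN of the first raster data collection whose Name matches.

    `summaries` is a flat list of summary dicts, as returned in
    RasterDataCollectionSummaries pages by list_raster_data_collections.
    Returns None when no collection has the given name.
    """
    for item in summaries:
        if item.get("Name") == name:
            return item["Arn"]
    return None
```

The matched ARN can then be supplied as the raster data collection ARN when starting an EOJ.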

## Image band information from the USGS Landsat and Sentinel-2 data collections
<a name="image-band-information"></a>

Image band information for the USGS Landsat 8 and Sentinel-2 data collections is provided in the following tables.

USGS Landsat


| Band name | Wavelength range (nm) | Units | Valid range | Fill value | Spatial resolution | 
| --- | --- | --- | --- | --- | --- | 
| coastal | 435 - 451 | Unitless | 1 - 65455 | 0 (No Data) | 30m | 
| blue | 452 - 512 | Unitless | 1 - 65455 | 0 (No Data) | 30m | 
| green | 533 - 590 | Unitless | 1 - 65455 | 0 (No Data) | 30m | 
| red | 636 - 673 | Unitless | 1 - 65455 | 0 (No Data) | 30m | 
| nir | 851 - 879 | Unitless | 1 - 65455 | 0 (No Data) | 30m | 
| swir16 | 1566 - 1651 | Unitless | 1 - 65455 | 0 (No Data) | 30m | 
| swir22 | 2107 - 2294 | Unitless | 1 - 65455 | 0 (No Data) | 30m | 
| qa\_aerosol | NA | Bit Index | 0 - 255 | 1 | 30m | 
| qa\_pixel | NA | Bit Index | 1 - 65455 | 1 (bit 0) | 30m | 
| qa\_radsat | NA | Bit Index | 1 - 65455 | NA | 30m | 
| t | 10600 - 11190 | Scaled Kelvin | 1 - 65455 | 0 (No Data) | 30m (scaled from 100m) | 
| atran | NA | Unitless | 0 - 10000 | -9999 (No Data) | 30m | 
| cdist | NA | Kilometers | 0 - 24000 | -9999 (No Data) | 30m | 
| drad | NA | W/(m^2 sr µm)/DN | 0 - 28000 | -9999 (No Data) | 30m | 
| urad | NA | W/(m^2 sr µm)/DN | 0 - 28000 | -9999 (No Data) | 30m | 
| trad | NA | W/(m^2 sr µm)/DN | 0 - 28000 | -9999 (No Data) | 30m | 
| emis | NA | Emissivity coefficient | 1 - 10000 | -9999 (No Data) | 30m | 
| emsd | NA | Emissivity coefficient | 1 - 10000 | -9999 (No Data) | 30m | 

Sentinel-2


| Band name | Wavelength range (nm) | Scale | Valid range | Fill value | Spatial resolution | 
| --- | --- | --- | --- | --- | --- | 
| coastal | 443 | 0.0001 | NA | 0 (No Data) | 60m | 
| blue | 490 | 0.0001 | NA | 0 (No Data) | 10m | 
| green | 560 | 0.0001 | NA | 0 (No Data) | 10m | 
| red | 665 | 0.0001 | NA | 0 (No Data) | 10m | 
| rededge1 | 705 | 0.0001 | NA | 0 (No Data) | 20m | 
| rededge2 | 740 | 0.0001 | NA | 0 (No Data) | 20m | 
| rededge3 | 783 | 0.0001 | NA | 0 (No Data) | 20m | 
| nir | 842 | 0.0001 | NA | 0 (No Data) | 10m | 
| nir08 | 865 | 0.0001 | NA | 0 (No Data) | 20m | 
| nir09 | 940 | 0.0001 | NA | 0 (No Data) | 60m | 
| swir16 | 1610 | 0.0001 | NA | 0 (No Data) | 20m | 
| swir22 | 2190 | 0.0001 | NA | 0 (No Data) | 20m | 
| aot | Aerosol optical thickness | 0.001 | NA | 0 (No Data) | 10m | 
| wvp | Scene-average water vapor | 0.001 | NA | 0 (No Data) | 10m | 
| scl | Scene classification data | NA | 1 - 11 | 0 (No Data) | 20m | 
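The `Scale` column means that stored digital numbers must be multiplied by the scale factor (for example, 0.0001 for the reflectance bands) to recover physical values. As an illustration, the following pure-Python sketch computes NDVI from raw `red` and `nir` digital numbers, treating the 0 fill value as no data:

```python
SCALE = 0.0001  # Sentinel-2 L2A reflectance scale factor (table above)
NO_DATA = 0     # Sentinel-2 fill value (table above)

def ndvi(red_dn, nir_dn):
    """Compute NDVI per pixel from raw digital numbers (flat lists).

    Pixels where either band equals the fill value yield None.
    """
    out = []
    for r, n in zip(red_dn, nir_dn):
        if r == NO_DATA or n == NO_DATA:
            out.append(None)
            continue
        red, nir = r * SCALE, n * SCALE
        out.append((nir - red) / (nir + red))
    return out

# Example: one vegetated pixel (high NIR reflectance) and one no-data pixel.
print(ndvi([1200, 0], [4800, 3000]))
```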