

# Getting started with Amazon DataZone
<a name="getting-started"></a>

The information in this section helps you get started using Amazon DataZone. If you are new to Amazon DataZone, start by becoming familiar with the concepts and terminology presented in [Amazon DataZone terminology and concepts](datazone-concepts.md).

Before you begin the steps in either of these quickstart workflows, you must complete the procedures described in the [Setting Up](setting-up.md) section of this guide. If you are using a brand new AWS account, you must [configure permissions required to use the Amazon DataZone management console](create-iam-roles.md). If you are using an AWS account that has existing AWS Glue Data Catalog objects, you must also [configure Lake Formation permissions to Amazon DataZone](lake-formation-permissions-for-datazone.md). 

This getting started section takes you through the following Amazon DataZone quickstart workflows:

**Topics**
+ [Amazon DataZone quickstart with AWS Glue data](quickstart-glue.md)
+ [Amazon DataZone quickstart with Amazon Redshift data](quickstart-rs.md)
+ [Amazon DataZone quickstart with sample scripts](quickstart-apis.md)

# Amazon DataZone quickstart with AWS Glue data
<a name="quickstart-glue"></a>

Complete the following quickstart steps to run through the complete data producer and data consumer workflows in Amazon DataZone with sample AWS Glue data. 

**Topics**
+ [Step 1 - Create the Amazon DataZone domain and data portal](#create-domain-gs-glue)
+ [Step 2 - Create the publishing project](#create-publishing-project-gs-glue)
+ [Step 3 - Create the environment](#create-environment-gs-glue)
+ [Step 4 - Produce data for publishing](#produce-data-for-publishing-gs-glue)
+ [Step 5 - Gather metadata from AWS Glue](#gather-metadata-from-glue-gs-glue)
+ [Step 6 - Curate and publish the data asset](#curate-data-asset-gs-glue)
+ [Step 7 - Create the project for data analysis](#create-project-for-data-analysis-gs-glue)
+ [Step 8 - Create an environment for data analysis](#create-environment-gs2-glue)
+ [Step 9 - Search the data catalog and subscribe to data](#search-catalog-subscribe-gs-glue)
+ [Step 10 - Approve the subscription request](#approve-subscription-request-gs-glue)
+ [Step 11 - Build a query and analyze data in Amazon Athena](#analyze-data-gs-glue)

## Step 1 - Create the Amazon DataZone domain and data portal
<a name="create-domain-gs-glue"></a>

This section describes the steps of creating an Amazon DataZone domain and data portal for this workflow.

Complete the following procedure to create an Amazon DataZone domain. For more information about Amazon DataZone domains, see [Amazon DataZone terminology and concepts](datazone-concepts.md). 

1. Navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone), sign in, and then choose **Create domain**. 
**Note**  
If you want to use an existing Amazon DataZone domain for this workflow, choose **View domains**, then choose the domain that you want to use, and then proceed to Step 2 of creating a publishing project.

1. On the **Create domain** page, provide values for the following fields: 
   + **Name** - specify a name for your domain. For the purposes of this workflow, you can call this domain **Marketing**.
   + **Description** - specify an optional domain description.
   + **Data encryption** - your data is encrypted by default with a key that AWS owns and manages for you. For this use case, you can leave the default data encryption settings.

     For more information about using customer managed keys, see [Data encryption at rest for Amazon DataZone](encryption-rest-datazone.md). If you use your own KMS key for data encryption, you must include the following statement in your default [AmazonDataZoneDomainExecutionRole](AmazonDataZoneDomainExecutionRole.md).

------
#### [ JSON ]


     ```
     {
          "Version": "2012-10-17",
         "Statement": [
             {
                 "Sid": "Statement1",
                 "Effect": "Allow",
                 "Action": [
                     "kms:Decrypt",
                     "kms:DescribeKey",
                     "kms:GenerateDataKey"
                 ],
                 "Resource": [
                     "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
                 ]
             }
         ]
     }
     ```

------
   + **Service access** - leave the default **Use a default role** option selected.
**Note**  
If you are using an existing Amazon DataZone domain for this workflow, you can choose **Use an existing service role** option and then choose an existing role from the drop-down menu.
   + Under **Quick setup**, choose **Set up this account for data consumption and publishing**. This option enables the built-in Amazon DataZone blueprints of **Data lake** and **Data warehouse**, and configures the required permissions, resources, a default project, and default data lake and data warehouse environment profiles for this account. For more information about Amazon DataZone blueprints, see [Amazon DataZone terminology and concepts](datazone-concepts.md).
   + Keep the remaining fields under **Permissions details** unchanged. 
**Note**  
If you have an existing Amazon DataZone domain, you can choose the **Use an existing service role** option and then choose an existing role from the drop-down menu for the **Glue Manage Access role**, **Redshift Manage Access role**, and **Provisioning role**. 
   + Keep the fields under **Tags** unchanged.
   + Choose **Create domain**.

1. Once the domain is successfully created, choose this domain, and on the domain's summary page, note the **Data portal URL** for this domain. You can use this URL to access your Amazon DataZone data portal in order to complete the rest of the steps in this workflow. You can also navigate to the data portal by choosing **Open data portal**.

**Note**  
In the current release of Amazon DataZone, once the domain is created, the URL generated for the data portal cannot be modified.

Domain creation can take several minutes to complete. Wait for the domain to have a status of **Available** before proceeding to the next step.
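Rather than refreshing the console, you can also poll the domain status programmatically. The following is a minimal sketch using the `GetDomain` API; the domain ID (for example, `dzd_4example`) is a placeholder you would take from the console or the `CreateDomain` response, and `dz_client` is assumed to be created with `boto3.client("datazone")`.

```python
import time

def wait_for_domain(dz_client, domain_id, delay=15, max_attempts=40):
    """Poll GetDomain until the domain is no longer CREATING, then return its status."""
    for _ in range(max_attempts):
        status = dz_client.get_domain(identifier=domain_id)["status"]
        if status != "CREATING":
            return status  # typically AVAILABLE, or CREATION_FAILED on error
        time.sleep(delay)
    raise TimeoutError(f"domain {domain_id} still CREATING")
```

Once the function returns `AVAILABLE`, it is safe to proceed to the next step.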

## Step 2 - Create the publishing project
<a name="create-publishing-project-gs-glue"></a>

This section describes the steps required to create the publishing project for this workflow.

1. Once you complete Step 1 above and create a domain, you'll see the **Welcome to Amazon DataZone!** window. In this window, choose **Create project**.

1. Specify the project name, for example, for this workflow, you can name it **SalesDataPublishingProject**, then leave the rest of the fields unchanged, and then choose **Create**.

## Step 3 - Create the environment
<a name="create-environment-gs-glue"></a>

This section describes the steps required to create an environment for this workflow.

1. Once you complete Step 2 above and create your project, you'll see the **Your project is ready to use** window. In this window, choose **Create environment**.

1. On the **Create environment** page, specify the following and then choose **Create environment**.

1. Specify values for the following:
   + **Name** - specify the name for the environment. For this walkthrough, you can call it `Default data lake environment`.
   + **Description** - specify a description for the environment.
   + **Environment profile** - choose the **DataLakeProfile** environment profile. This enables you to use Amazon DataZone in this workflow to work with data in Amazon S3, AWS Glue Catalog, and Amazon Athena.
   + For this walkthrough, keep the rest of the fields unchanged.

1. Choose **Create environment**.
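The same environment can be created programmatically with the `CreateEnvironment` API. The following is a hedged sketch; all identifiers are placeholders you would read from your own domain, project, and environment profile, and the client is assumed to be `boto3.client("datazone")`.

```python
def create_data_lake_environment(dz_client, domain_id, project_id, profile_id):
    """Create an environment from the DataLakeProfile environment profile."""
    return dz_client.create_environment(
        domainIdentifier=domain_id,
        projectIdentifier=project_id,
        environmentProfileIdentifier=profile_id,
        name="Default data lake environment",
        description="Environment for the AWS Glue quickstart",
    )
```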

## Step 4 - Produce data for publishing
<a name="produce-data-for-publishing-gs-glue"></a>

This section describes the steps required to produce data for publishing in this workflow.

1. Once you complete step 3 above, in your `SalesDataPublishingProject` project, in the right-hand panel, under **Analytics tools**, choose **Amazon Athena**. This opens the Athena query editor using your project’s credentials for authentication. Make sure that your publishing environment is selected in the **Amazon DataZone environment** dropdown and that the `<environment_name>%_pub_db` database is selected in the query editor.

1. For this walkthrough, you are using the **Create Table as Select** (CTAS) query script to create a new table that you want to publish to Amazon DataZone. In your query editor, execute this CTAS script to create a `mkt_sls_table` table that you can publish and make available for search and subscription. 

   ```
   CREATE TABLE mkt_sls_table AS
   SELECT 146776932 AS ord_num, 23 AS sales_qty_sld, 23.4 AS wholesale_cost, 45.0 as lst_pr, 43.0 as sell_pr, 2.0 as disnt, 12 as ship_mode,13 as warehouse_id, 23 as item_id, 34 as ctlg_page, 232 as ship_cust_id, 4556 as bill_cust_id
   UNION ALL SELECT 46776931, 24, 24.4, 46, 44, 1, 14, 15, 24, 35, 222, 4551
   UNION ALL SELECT 46777394, 42, 43.4, 60, 50, 10, 30, 20, 27, 43, 241, 4565
   UNION ALL SELECT 46777831, 33, 40.4, 51, 46, 15, 16, 26, 33, 40, 234, 4563
   UNION ALL SELECT 46779160, 29, 26.4, 50, 61, 8, 31, 15, 36, 40, 242, 4562
   UNION ALL SELECT 46778595, 43, 28.4, 49, 47, 7, 28, 22, 27, 43, 224, 4555
   UNION ALL SELECT 46779482, 34, 33.4, 64, 44, 10, 17, 27, 43, 52, 222, 4556
   UNION ALL SELECT 46779650, 39, 37.4, 51, 62, 13, 31, 25, 31, 52, 224, 4551
   UNION ALL SELECT 46780524, 33, 40.4, 60, 53, 18, 32, 31, 31, 39, 232, 4563
   UNION ALL SELECT 46780634, 39, 35.4, 46, 44, 16, 33, 19, 31, 52, 242, 4557
   UNION ALL SELECT 46781887, 24, 30.4, 54, 62, 13, 18, 29, 24, 52, 223, 4561
   ```

   Make sure that the `mkt_sls_table` table is successfully created in the **Tables and views** section on the left-hand side. Now you have a data asset that can be published into the Amazon DataZone catalog.
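To sanity-check the result programmatically, you could start a count query through the Athena API (the CTAS above produces 11 rows). The following is a sketch; the database and workgroup names are assumptions you would replace with your environment's values, and `athena_client` is assumed to be `boto3.client("athena")`.

```python
def start_verification_query(athena_client, database, workgroup):
    """Start an Athena query that counts rows in mkt_sls_table."""
    return athena_client.start_query_execution(
        QueryString="SELECT COUNT(*) AS row_cnt FROM mkt_sls_table",
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
    )
```

The returned `QueryExecutionId` can then be passed to `get_query_results` once the query succeeds.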

## Step 5 - Gather metadata from AWS Glue
<a name="gather-metadata-from-glue-gs-glue"></a>

This section describes the step of gathering metadata from AWS Glue for this workflow.

1. Once you complete step 4 above, in the Amazon DataZone data portal, choose the `SalesDataPublishingProject` project, then choose the **Data** tab, and then choose **Data sources** in the left-hand panel.

1. Choose the source that was created as part of the environment creation process. 

1. Choose **Run** next to the **Action** dropdown menu and then choose the refresh button. Once the data source run is complete, the assets are added to the Amazon DataZone inventory.
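The console's **Run** button corresponds to the `StartDataSourceRun` API. The following is a hedged sketch; the domain and data source IDs are placeholders, and `dz_client` is assumed to be `boto3.client("datazone")`.

```python
def trigger_data_source_run(dz_client, domain_id, data_source_id):
    """Start a data source run and return its run ID for polling with GetDataSourceRun."""
    run = dz_client.start_data_source_run(
        domainIdentifier=domain_id,
        dataSourceIdentifier=data_source_id,
    )
    return run["id"]
```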

## Step 6 - Curate and publish the data asset
<a name="curate-data-asset-gs-glue"></a>

This section describes the steps of curating and publishing the data asset in this workflow.

1. Once you complete step 5 above, in the Amazon DataZone data portal, choose the `SalesDataPublishingProject` project that you created in the previous step, choose the **Data** tab, choose **Inventory data** in the left-hand panel, and locate the `mkt_sls_table` table.

1. Open `mkt_sls_table` asset's details page to see the automatically generated business names. Choose the **Automatically generated metadata** icon to view the auto-generated names for asset and columns. You can either accept or reject each name individually or choose **Accept all** to apply the generated names. Optionally, you can also add the available metadata form to your asset and select glossary terms to classify your data.

1. Choose **Publish asset** to publish the `mkt_sls_table` asset.
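Publishing an inventory asset maps to the `CreateListingChangeSet` API with a `PUBLISH` action. A sketch with placeholder IDs, assuming `dz_client` is `boto3.client("datazone")`:

```python
def publish_asset(dz_client, domain_id, asset_id):
    """Publish an inventory asset so it becomes discoverable in the catalog."""
    return dz_client.create_listing_change_set(
        domainIdentifier=domain_id,
        entityIdentifier=asset_id,
        entityType="ASSET",
        action="PUBLISH",
    )
```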

## Step 7 - Create the project for data analysis
<a name="create-project-for-data-analysis-gs-glue"></a>

This section describes the steps of creating the project for data analysis. This is the beginning of the data consumer steps of this workflow.

1. Once you complete step 6 above, in the Amazon DataZone data portal, choose **Create project** from the **Project** drop-down menu.

1. On the **Create project** page, specify the project name, for example, for this workflow, you can name it **MarketingDataAnalysisProject**, then leave the rest of the fields unchanged, and then choose **Create**.

## Step 8 - Create an environment for data analysis
<a name="create-environment-gs2-glue"></a>

This section describes the steps of creating an environment for data analysis.

1. Once you complete step 7 above, in the Amazon DataZone data portal, choose the `MarketingDataAnalysisProject` project, then choose the **Environments** tab, and then choose **Create environment**.

1. On the **Create environment** page, specify the following and then choose **Create environment**.
   + **Name** - specify the name for the environment. For this walkthrough, you can call it `Default data lake environment`.
   + **Description** - specify a description for the environment.
   + **Environment profile** - choose the built-in **DataLakeProfile** environment profile.
   + For this walkthrough, keep the rest of the fields unchanged.

## Step 9 - Search the data catalog and subscribe to data
<a name="search-catalog-subscribe-gs-glue"></a>

This section describes the steps of searching the data catalog and subscribing to data.

1. Once you complete step 8 above, in the Amazon DataZone data portal, choose the Amazon DataZone icon, and then search for data assets using keywords (for example, 'catalog' or 'sales') in the data portal's **Search** bar. 

   If necessary, apply filters or sorting, and once you locate the **Product Sales Data** asset, you can choose it to open the asset's details page.

1. On the **Product Sales Data** asset's details page, choose **Subscribe**.

1. In the **Subscribe** dialog, choose your **MarketingDataAnalysisProject** consumer project from the dropdown, then specify the reason for your subscription request, and then choose **Subscribe**.
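The same request can be raised through the `CreateSubscriptionRequest` API. The following is a hedged sketch; the listing and project IDs are placeholders, and `dz_client` is assumed to be `boto3.client("datazone")`.

```python
def request_subscription(dz_client, domain_id, listing_id, consumer_project_id, reason):
    """Ask for access to a published listing on behalf of a consumer project."""
    return dz_client.create_subscription_request(
        domainIdentifier=domain_id,
        subscribedListings=[{"identifier": listing_id}],
        subscribedPrincipals=[{"project": {"identifier": consumer_project_id}}],
        requestReason=reason,
    )
```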

## Step 10 - Approve the subscription request
<a name="approve-subscription-request-gs-glue"></a>

This section describes the steps of approving the subscription request.

1. Once you complete step 9 above, in the Amazon DataZone data portal, choose the **SalesDataPublishingProject** project with which you published your asset.

1. Choose the **Data** tab, then **Published data**, and then choose **Incoming requests**.

1. Now you can see the row for the new request that needs an approval. Choose **View request**. Provide a reason for approval and choose **Approve**.
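Approval maps to the `AcceptSubscriptionRequest` API. A sketch with placeholder IDs, assuming `dz_client` is `boto3.client("datazone")`:

```python
def approve_subscription(dz_client, domain_id, request_id, comment):
    """Approve a pending subscription request as the data producer."""
    return dz_client.accept_subscription_request(
        domainIdentifier=domain_id,
        identifier=request_id,
        decisionComment=comment,
    )
```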

## Step 11 - Build a query and analyze data in Amazon Athena
<a name="analyze-data-gs-glue"></a>

Now that you have successfully published an asset to the Amazon DataZone catalog and subscribed to it, you can analyze it.

1. In the Amazon DataZone data portal, choose your **MarketingDataAnalysisProject** consumer project, and then, in the right-hand panel, under **Analytics tools**, choose **Amazon Athena**. This opens the Amazon Athena query editor using your project’s credentials for authentication. Choose the **MarketingDataAnalysisProject** consumer environment from the **Amazon DataZone Environment** dropdown in the query editor and then choose your project's `<environment_name>%sub_db` database from the database dropdown.

1. You can now run queries on the subscribed table. You can choose the table from **Tables and Views**, and then choose **Preview** to have the select statement on the editor screen. Run the query to see the results. 

# Amazon DataZone quickstart with Amazon Redshift data
<a name="quickstart-rs"></a>

Complete the following quickstart steps to run through the complete data producer and data consumer workflows in Amazon DataZone with sample Amazon Redshift data. 

**Topics**
+ [Step 1 - Create the Amazon DataZone domain and data portal](#create-domain-gs-rs)
+ [Step 2 - Create the publishing project](#create-publishing-project-gs-rs)
+ [Step 3 - Create the environment](#create-environment-gs-rs)
+ [Step 4 - Produce data for publishing](#produce-data-for-publishing-gs-rs)
+ [Step 5 - Gather metadata from Amazon Redshift](#gather-metadata-from-glue-gs-rs)
+ [Step 6 - Curate and publish the data asset](#curate-data-asset-gs-rs)
+ [Step 7 - Create the project for data analysis](#create-project-for-data-analysis-gs-rs)
+ [Step 8 - Create an environment for data analysis](#create-environment-gs2-rs)
+ [Step 9 - Search the data catalog and subscribe to data](#search-catalog-subscribe-gs-rs)
+ [Step 10 - Approve the subscription request](#approve-subscription-request-gs-rs)
+ [Step 11 - Build a query and analyze data in Amazon Redshift](#analyze-data-gs-rs)

## Step 1 - Create the Amazon DataZone domain and data portal
<a name="create-domain-gs-rs"></a>

Complete the following procedure to create an Amazon DataZone domain. For more information about Amazon DataZone domains, see [Amazon DataZone terminology and concepts](datazone-concepts.md). 

1. Navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone), sign in, and then choose **Create domain**.
**Note**  
If you want to use an existing Amazon DataZone domain for this workflow, choose View domains, then choose the domain that you want to use, and then proceed to Step 2 of creating a publishing project.

1. On the **Create domain** page, provide values for the following fields: 
   + **Name** - specify a name for your domain. For the purposes of this workflow, you can call this domain `Marketing`.
   + **Description** - specify an optional domain description.
   + **Data encryption** - your data is encrypted by default with a key that AWS owns and manages for you. For this walkthrough, you can leave the default data encryption settings.

     For more information about using customer managed keys, see [Data encryption at rest for Amazon DataZone](encryption-rest-datazone.md). If you use your own KMS key for data encryption, you must include the following statement in your default [AmazonDataZoneDomainExecutionRole](AmazonDataZoneDomainExecutionRole.md).

------
#### [ JSON ]


     ```
     {
          "Version": "2012-10-17",
         "Statement": [
             {
                 "Sid": "Statement1",
                 "Effect": "Allow",
                 "Action": [
                     "kms:Decrypt",
                     "kms:DescribeKey",
                     "kms:GenerateDataKey"
                 ],
                 "Resource": [
                     "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
                 ]
             }
         ]
     }
     ```

------
   + **Service access** - choose the **Use a custom service role** option and then choose the **AmazonDataZoneDomainExecutionRole** from the drop-down menu.
   + Under **Quick setup**, choose **Set up this account for data consumption and publishing**. This option enables the built-in Amazon DataZone blueprints of **Data lake** and **Data warehouse**, and configures the required permissions and resources to complete the rest of the steps in this workflow. For more information about Amazon DataZone blueprints, see [Amazon DataZone terminology and concepts](datazone-concepts.md).
   + Keep the remaining fields under **Permissions details** and **Tags** unchanged and then choose **Create domain**.

1. Once the domain is successfully created, choose this domain, and on the domain's summary page, note the **Data portal URL** for this domain. You can use this URL to access your Amazon DataZone data portal in order to complete the rest of the steps in this workflow.

**Note**  
In the current release of Amazon DataZone, once the domain is created, the URL generated for the data portal cannot be modified.

Domain creation can take several minutes to complete. Wait for the domain to have a status of **Available** before proceeding to the next step.

## Step 2 - Create the publishing project
<a name="create-publishing-project-gs-rs"></a>

The following section describes the steps of creating the publishing project in this workflow.

1. Once you complete Step 1, navigate to the Amazon DataZone data portal using the data portal URL and log in using your single sign-on (SSO) or AWS IAM credentials. 

1. Choose **Create project**, specify the project name, for example, for this workflow, you can name it **SalesDataPublishingProject**, then leave the rest of the fields unchanged, and then choose **Create**.

## Step 3 - Create the environment
<a name="create-environment-gs-rs"></a>

The following section describes the steps of creating an environment in this workflow.

1. Once you complete Step 2, in the Amazon DataZone data portal, choose the `SalesDataPublishingProject` project that you created in the previous step, then choose the **Environments** tab, and then choose **Create environment**.

1. On the **Create environment** page, specify the following and then choose **Create environment**.
   + **Name** - specify the name for the environment. For this walkthrough, you can call it `Default data warehouse environment`.
   + **Description** - specify a description for the environment.
   + **Environment profile** - choose the **DataWarehouseProfile** environment profile.
   + Provide the name of your Amazon Redshift cluster, database name, and the secret ARN for the Amazon Redshift cluster where your data is stored. 
**Note**  
Make sure that your secret in AWS Secrets Manager includes the following tags (key/value):  
For an Amazon Redshift cluster - datazone.rs.cluster: <cluster_name:database_name>  
For an Amazon Redshift Serverless workgroup - datazone.rs.workgroup: <workgroup_name:database_name>  
AmazonDataZoneProject: <projectID>  
AmazonDataZoneDomain: <domainID>  
For more information, see [Storing database credentials in AWS Secrets Manager](https://docs.aws.amazon.com/redshift/latest/mgmt/data-api-access.html#data-api-secrets).  
The database user that you provide in AWS Secrets Manager must have superuser permissions.
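To create a secret that carries the tags above, you can assemble the `CreateSecret` arguments first and then pass them to the Secrets Manager client. The following is a hedged sketch; every name and ID is a placeholder you would replace with your own values.

```python
import json

def build_redshift_secret(cluster_name, database_name, project_id, domain_id,
                          username, password):
    """Return kwargs for secretsmanager.create_secret carrying the tags DataZone expects."""
    return {
        "Name": f"datazone-{cluster_name}-{database_name}",  # hypothetical naming scheme
        "SecretString": json.dumps({"username": username, "password": password}),
        "Tags": [
            {"Key": "datazone.rs.cluster", "Value": f"{cluster_name}:{database_name}"},
            {"Key": "AmazonDataZoneProject", "Value": project_id},
            {"Key": "AmazonDataZoneDomain", "Value": domain_id},
        ],
    }
```

Pass the returned dictionary to `boto3.client("secretsmanager").create_secret(**kwargs)`; for a Serverless workgroup, swap the first tag for `datazone.rs.workgroup`.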

## Step 4 - Produce data for publishing
<a name="produce-data-for-publishing-gs-rs"></a>

The following section describes the steps of producing data for publishing in this workflow.

1. Once you complete Step 3, in the Amazon DataZone data portal, choose the `SalesDataPublishingProject` project, and then, in the right-hand panel, under **Analytics tools**, choose **Amazon Redshift**. This opens the Amazon Redshift query editor using your project’s credentials for authentication.

1. For this walkthrough, you are using the **Create Table as Select** (CTAS) query script to create a new table that you want to publish to Amazon DataZone. In your query editor, execute this CTAS script to create a `mkt_sls_table` table that you can publish and make available for search and subscription. 

   ```
   CREATE TABLE mkt_sls_table AS
   SELECT 146776932 AS ord_num, 23 AS sales_qty_sld, 23.4 AS wholesale_cost, 45.0 as lst_pr, 43.0 as sell_pr, 2.0 as disnt, 12 as ship_mode,13 as warehouse_id, 23 as item_id, 34 as ctlg_page, 232 as ship_cust_id, 4556 as bill_cust_id
   UNION ALL SELECT 46776931, 24, 24.4, 46, 44, 1, 14, 15, 24, 35, 222, 4551
   UNION ALL SELECT 46777394, 42, 43.4, 60, 50, 10, 30, 20, 27, 43, 241, 4565
   UNION ALL SELECT 46777831, 33, 40.4, 51, 46, 15, 16, 26, 33, 40, 234, 4563
   UNION ALL SELECT 46779160, 29, 26.4, 50, 61, 8, 31, 15, 36, 40, 242, 4562
   UNION ALL SELECT 46778595, 43, 28.4, 49, 47, 7, 28, 22, 27, 43, 224, 4555
   UNION ALL SELECT 46779482, 34, 33.4, 64, 44, 10, 17, 27, 43, 52, 222, 4556
   UNION ALL SELECT 46779650, 39, 37.4, 51, 62, 13, 31, 25, 31, 52, 224, 4551
   UNION ALL SELECT 46780524, 33, 40.4, 60, 53, 18, 32, 31, 31, 39, 232, 4563
   UNION ALL SELECT 46780634, 39, 35.4, 46, 44, 16, 33, 19, 31, 52, 242, 4557
   UNION ALL SELECT 46781887, 24, 30.4, 54, 62, 13, 18, 29, 24, 52, 223, 4561
   ```

   Make sure that the `mkt_sls_table` table is successfully created. Now you have a data asset that can be published into the Amazon DataZone catalog.

## Step 5 - Gather metadata from Amazon Redshift
<a name="gather-metadata-from-glue-gs-rs"></a>

The following section describes the steps of gathering metadata from Amazon Redshift.

1. Once you complete Step 4, in the Amazon DataZone data portal, choose the `SalesDataPublishingProject` project, then choose the **Data** tab, and then choose **Data sources**.

1. Choose the source that was created as part of the environment creation process. 

1. Choose **Run** next to the **Action** dropdown menu and then choose the refresh button. Once the data source run is complete, the assets are added to the Amazon DataZone inventory.

## Step 6 - Curate and publish the data asset
<a name="curate-data-asset-gs-rs"></a>

The following section describes the steps of curating and publishing the data asset in this workflow.

1. Once you complete step 5, in the Amazon DataZone data portal, choose the `SalesDataPublishingProject` project, then choose the **Data** tab, choose **Inventory data**, and locate the `mkt_sls_table` table.

1. Open `mkt_sls_table` asset's details page to see the automatically generated business names. Choose the **Automatically generated metadata** icon to view the auto-generated names for asset and columns. You can either accept or reject each name individually or choose **Accept all** to apply the generated names. Optionally, you can also add the available metadata form to your asset and select glossary terms to classify your data.

1. Choose **Publish** to publish the `mkt_sls_table` asset.

## Step 7 - Create the project for data analysis
<a name="create-project-for-data-analysis-gs-rs"></a>

The following section describes the steps of creating the project for data analysis in this workflow.

1. Once you complete Step 6, in the Amazon DataZone data portal, choose **Create project**.

1. In the **Create project** page, specify the project name, for example, for this workflow, you can name it **MarketingDataAnalysisProject**, then leave the rest of the fields unchanged, and then choose **Create**.

## Step 8 - Create an environment for data analysis
<a name="create-environment-gs2-rs"></a>

The following section describes the steps of creating an environment for data analysis in this workflow.

1. Once you complete Step 7, in the Amazon DataZone data portal, choose the `MarketingDataAnalysisProject` project that you created in the previous step, then choose the **Environments** tab, and then choose **Add environment**.

1. On the **Create environment** page, specify the following and then choose **Create environment**.
   + **Name** - specify the name for the environment. For this walkthrough, you can call it `Default data warehouse environment`.
   + **Description** - specify a description for the environment.
   + **Environment profile** - choose **DataWarehouseProfile** environment profile.
   + Provide the name of your Amazon Redshift cluster, database name, and the secret ARN for the Amazon Redshift cluster where your data is stored. 
**Note**  
Make sure that your secret in AWS Secrets Manager includes the following tags (key/value):  
For an Amazon Redshift cluster - datazone.rs.cluster: <cluster_name:database_name>  
For an Amazon Redshift Serverless workgroup - datazone.rs.workgroup: <workgroup_name:database_name>  
AmazonDataZoneProject: <projectID>  
AmazonDataZoneDomain: <domainID>  
For more information, see [Storing database credentials in AWS Secrets Manager](https://docs.aws.amazon.com/redshift/latest/mgmt/data-api-access.html#data-api-secrets).  
The database user that you provide in AWS Secrets Manager must have superuser permissions.
   + For this walkthrough, keep the rest of the fields unchanged.

## Step 9 - Search the data catalog and subscribe to data
<a name="search-catalog-subscribe-gs-rs"></a>

The following section describes the steps of searching the data catalog and subscribing to data.

1. Once you complete Step 8, in the Amazon DataZone data portal, search for data assets using keywords (e.g., 'catalog' or 'sales') in the data portal's **Search** bar. 

   If necessary, apply filters or sorting, and once you locate the Product Sales Data asset, you can choose it to open the asset's details page.

1. On the Product Sales Data asset's details page, choose **Subscribe**.

1. In the dialog, choose your consumer project from the dropdown, provide the reason for access request, and then choose **Subscribe**.

## Step 10 - Approve the subscription request
<a name="approve-subscription-request-gs-rs"></a>

The following section describes the steps of approving the subscription request in this workflow.

1. Once you complete Step 9, in the Amazon DataZone data portal, choose the **SalesDataPublishingProject** project with which you published your asset.

1. Choose the **Data** tab, then **Published data**, and then **Incoming requests**.

1. Choose the **View request** link and then choose **Approve**. 

## Step 11 - Build a query and analyze data in Amazon Redshift
<a name="analyze-data-gs-rs"></a>

Now that you have successfully published an asset to the Amazon DataZone catalog and subscribed to it, you can analyze it.

1. In the Amazon DataZone data portal, choose your **MarketingDataAnalysisProject** consumer project, and then, in the right-hand panel, choose the **Amazon Redshift** link. This opens the Amazon Redshift query editor using your project’s credentials for authentication.

1. You can now run queries on the subscribed table. Choose the three-vertical-dots option next to the table and choose **Preview** to place a select statement on the editor screen. Run the query to see the results. 
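The same preview can be issued through the Amazon Redshift Data API. The following is a hedged sketch; the cluster identifier, database name, and secret ARN are placeholders, and `rsd_client` is assumed to be `boto3.client("redshift-data")`.

```python
def preview_subscribed_table(rsd_client, cluster_id, database, secret_arn):
    """Run a preview SELECT on the subscribed table through the Redshift Data API."""
    return rsd_client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        SecretArn=secret_arn,
        Sql="SELECT * FROM mkt_sls_table LIMIT 10",
    )
```

The response includes a statement `Id` that you can pass to `get_statement_result` once the statement finishes.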

# Amazon DataZone quickstart with sample scripts
<a name="quickstart-apis"></a>

You can access Amazon DataZone via the management portal or the Amazon DataZone data portal, or programmatically by using the Amazon DataZone HTTPS API, which lets you issue HTTPS requests directly to the service. This section contains sample scripts that invoke Amazon DataZone APIs that you can use to complete the following common tasks:

**Topics**
+ [Create an Amazon DataZone domain and data portal](#create-domain-gs-glue-api)
+ [Create a publishing project](#create-publishing-project-gs-glue-api)
+ [Create an environment profile](#create-environment-profile-gs-glue-api)
+ [Create an environment](#create-environment-gs-glue-api)
+ [Gather metadata from AWS Glue](#gather-metadata-from-glue-gs-glue-api)
+ [Curate and publish a data asset](#curate-data-asset-gs-glue-api)
+ [Search the data catalog and subscribe to data](#search-catalog-subscribe-gs-glue-api)
+ [Search for assets in the data catalog](#search-catalog-subscribe-gs-glue-api)
+ [Other useful sample scripts](#other-useful-scripts-api)

## Create an Amazon DataZone domain and data portal
<a name="create-domain-gs-glue-api"></a>

You can use the following sample script to create an Amazon DataZone domain. For more information about Amazon DataZone domains, see [Amazon DataZone terminology and concepts](datazone-concepts.md). 

```
import boto3

# Initialize the Amazon DataZone client
region = 'us-east-1'
dzclient = boto3.client(service_name='datazone', region_name=region)

# Create the Amazon DataZone domain
def create_domain(name):
    return dzclient.create_domain(
        name = name,
        description = "this is a description",
        domainExecutionRole = "arn:aws:iam::<account>:role/AmazonDataZoneDomainExecutionRole",
    )
```

## Create a publishing project
<a name="create-publishing-project-gs-glue-api"></a>

You can use the following sample script to create a publishing project in Amazon DataZone.

```
# Create a project
def create_project(domainId):
    return dzclient.create_project(
        domainIdentifier = domainId,
        name = "sample-project"
    )
```

## Create an environment profile
<a name="create-environment-profile-gs-glue-api"></a>

You can use the following sample scripts to create an environment profile in Amazon DataZone.

This sample payload is used when the `CreateEnvironmentProfile` API is invoked:

```
{
    "Content":{
        "project_name": "Admin_project",
        "domain_name": "Drug-Research-and-Development",
        "blueprint_account_region": [
            {
                "blueprint_name": "DefaultDataLake",
                "account_id": ["066535990535",
                "413878397724",
                "676266385322", 
                "747721550195", 
                "755347404384"
                ],
                "region": ["us-west-2", "us-east-1"]
            },
            {
                "blueprint_name": "DefaultDataWarehouse",
                "account_id": ["066535990535",
                "413878397724",
                "676266385322", 
                "747721550195", 
                "755347404384"
                ],
                "region":["us-west-2", "us-east-1"]
            }
        ]
    }
}
```
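
To preview exactly which environment profiles the nested loops below will create, you can flatten the `blueprint_account_region` list first. The following is a minimal sketch (the helper name is illustrative, not part of the DataZone API):

```
import json

def expand_blueprint_targets(payload):
    """Flatten blueprint_account_region into (blueprint, account, region)
    tuples, mirroring the nested loops in create_environment_profile."""
    targets = []
    for entry in payload["Content"]["blueprint_account_region"]:
        for account in entry["account_id"]:
            for region in entry["region"]:
                targets.append((entry["blueprint_name"], account, region))
    return targets

payload = json.loads("""
{"Content": {"blueprint_account_region": [
    {"blueprint_name": "DefaultDataLake",
     "account_id": ["066535990535"],
     "region": ["us-west-2", "us-east-1"]}
]}}
""")

# Each tuple becomes one profile named "<blueprint><account><region>_profile".
for blueprint, account, region in expand_blueprint_targets(payload):
    print(blueprint + account + region + "_profile")
```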

This sample script invokes the `CreateEnvironmentProfile` API:

```
def create_environment_profile(domain_id, project_id, blueprint_account_region):
    try:
        response = dz.list_environment_blueprints(
            domainIdentifier=domain_id,
            managed=True
        )
        env_blueprints = response.get("items")
        env_blueprints_map = {}
        for i in env_blueprints:
            env_blueprints_map[i["name"]] = i["id"]

        print("Environment blueprint map", env_blueprints_map)
        for i in blueprint_account_region:
            print(i)
            for j in i["account_id"]:
                for k in i["region"]:
                    print("The env blueprint name is", i["blueprint_name"])
                    dz.create_environment_profile(
                        description='This is a test environment profile created via lambda function',
                        domainIdentifier=domain_id,
                        awsAccountId=j,
                        awsAccountRegion=k,
                        environmentBlueprintIdentifier=env_blueprints_map.get(i["blueprint_name"]),
                        name=i["blueprint_name"] + j + k + "_profile",
                        projectIdentifier=project_id
                    )
    except Exception as e:
        print("Failed to create environment profile")
        raise e
```

This sample payload includes the `user_parameters` (for example, `dataAccessSecretsArn`) that can be passed when the blueprint is `DefaultDataWarehouse`:

```
{
    "Content":{
        "project_name": "Admin_project",
        "domain_name": "Drug-Research-and-Development",
        "blueprint_account_region": [
            {
                "blueprint_name": "DefaultDataWarehouse",
                "account_id": ["111111111111"],
                "region":["us-west-2"],
                "user_parameters":[
                    {
                        "name": "dataAccessSecretsArn",
                        "value": ""
                    }
                ] 
            }
        ]
    }
}
```

## Create an environment
<a name="create-environment-gs-glue-api"></a>

You can use the following sample script to create an environment in Amazon DataZone.

```
def create_environment(domain_id, project_id, blueprint_account_region):
    try:
        # Refer to get_domain_id and get_project_id for fetching ids using names.
        sts_client = boto3.client("sts")
        # Get the current account ID
        account_id = sts_client.get_caller_identity()["Account"]
        print("Fetching environment profile ids")
        env_profile_map = get_env_profile_map(domain_id, project_id)

        for i in blueprint_account_region:
            for j in i["account_id"]:
                for k in i["region"]:
                    print("env blueprint name", i["blueprint_name"])
                    profile_name = i["blueprint_name"] + j + k + "_profile"
                    env_name = i["blueprint_name"] + j + k + "_env"
                    description = f'This environment is created for {profile_name}, account {account_id}, and region {k}'
                    try:
                        dz.create_environment(
                            description=description,
                            domainIdentifier=domain_id,
                            environmentProfileIdentifier=env_profile_map.get(profile_name),
                            name=env_name,
                            projectIdentifier=project_id
                        )
                        print(f"Environment created - {env_name}")
                    except Exception:
                        # Retry with user parameters, which some blueprints
                        # (such as DefaultDataWarehouse) require.
                        dz.create_environment(
                            description=description,
                            domainIdentifier=domain_id,
                            environmentProfileIdentifier=env_profile_map.get(profile_name),
                            name=env_name,
                            projectIdentifier=project_id,
                            userParameters=i["user_parameters"]
                        )
                        print(f"Environment created - {env_name}")
    except Exception as e:
        print("Failed to create environment")
        raise e
```
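
The script above calls a `get_env_profile_map` helper that isn't shown. A minimal sketch could look like the following; here the client is passed in explicitly, and the helper assumes the `<blueprint><account><region>_profile` naming convention used when the profiles were created:

```
def get_env_profile_map(dz, domain_id, project_id):
    """Map environment profile names to their IDs for a project."""
    profile_map = {}
    response = dz.list_environment_profiles(
        domainIdentifier=domain_id,
        projectIdentifier=project_id,
    )
    for item in response.get("items", []):
        profile_map[item["name"]] = item["id"]
    return profile_map
```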

## Gather metadata from AWS Glue
<a name="gather-metadata-from-glue-gs-glue-api"></a>

You can use this sample script to gather metadata from AWS Glue. The script creates an AWS Glue data source that runs on a recurring schedule, which you can change in the cron expression near the end of the script. Fetch the project, environment, and domain IDs by using standard helper functions, and make the remaining parameters (such as the data source description and the AWS Glue database name) global or pass them in.

```
def create_data_source(domain_id, project_id, environment_id, data_source_name,
                       data_source_description, glue_database_name,
                       account_id, current_region):
    print("Creating data source")
    data_source_creation = dz.create_data_source(
        # Name of the data source to create, for example: name='TestGlueDataSource'
        name=data_source_name,
        # Description for the data source (optional)
        description=data_source_description,
        # Identifier of the domain to which the data source will belong, for example: domainIdentifier='dzd_6f3gst5jjmrrmv'
        domainIdentifier=domain_id,
        # Environment identifier, for example: environmentIdentifier='3weyt6hhn8qcvb'
        environmentIdentifier=environment_id,
        # Corresponding project identifier, for example: projectIdentifier='6tl4csoyrg16ef'
        projectIdentifier=project_id,
        enableSetting="ENABLED",
        # publishOnImport selects whether assets are added to the inventory and/or discovery catalog.
        # publishOnImport=True: assets are added to the project's inventory and published to the discovery catalog.
        # publishOnImport=False: assets are only added to the project's inventory.
        # You can later curate the metadata of the assets and choose subscription terms to publish them from the inventory to the discovery catalog.
        publishOnImport=False,
        # Automated business name generation: use AI to automatically generate metadata for assets as they are published or updated by this data source run.
        # Automatically generated metadata can be approved, rejected, or edited by data publishers.
        # Automatically generated metadata is badged with a small icon next to the corresponding metadata field.
        recommendation={"enableBusinessNameGeneration": True},
        type="GLUE",
        configuration={
            "glueRunConfiguration": {
                "dataAccessRole": "arn:aws:iam::"
                + account_id
                + ":role/service-role/AmazonDataZoneGlueAccess-"
                + current_region
                + "-"
                + domain_id,
                "relationalFilterConfigurations": [
                    {
                        "databaseName": glue_database_name,
                        "filterExpressions": [
                            {"expression": "*", "type": "INCLUDE"},
                        ],
                        #    "schemaName": "TestSchemaName",
                    },
                ],
            },
        },
        # Add metadata forms to the data source (OPTIONAL).
        # Metadata forms are automatically applied to any assets that are created by the data source.
        # assetFormsInput=[
        #     {
        #         "content": "string",
        #         "formName": "string",
        #         "typeIdentifier": "string",
        #         "typeRevision": "string",
        #     },
        # ],
        schedule={
            "schedule": "cron(5 20 * * ? *)",
            "timezone": "UTC",
        },
    )
    print("Data source created")
    return data_source_creation
```

This is a sample payload with the parameters that the script uses:

```
{
    "Content":{
        "project_name": "Admin",
        "domain_name": "Drug-Research-and-Development",
        "env_name": "GlueEnvironment",
        "glue_database_name": "test",
        "data_source_name" : "test",
        "data_source_description" : "This is a test data source"
    }
}
```
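
The data source above runs on its cron schedule, but you can also trigger a run on demand with the `StartDataSourceRun` API. The following is a minimal sketch (the wrapper name is illustrative, and the client is passed in explicitly):

```
def run_data_source_now(dz, domain_id, data_source_id):
    """Trigger an on-demand run of an existing data source."""
    response = dz.start_data_source_run(
        domainIdentifier=domain_id,
        dataSourceIdentifier=data_source_id,
    )
    # The response includes the run's id, which you can use to check its status later.
    return response["id"]
```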

## Curate and publish a data asset
<a name="curate-data-asset-gs-glue-api"></a>

You can use the following sample scripts to curate and publish data assets in Amazon DataZone.

You can use the following script to create custom form types:

```
def create_form_type(domainId, projectId):
    return dzclient.create_form_type(
        domainIdentifier = domainId,
        name = "customForm",
        model = {
            "smithy": "structure customForm { simple: String }"
        },
        owningProjectIdentifier = projectId,
        status = "ENABLED"
    )
```

You can use the following sample script to create custom asset types:

```
def create_custom_asset_type(domainId, projectId):
    return dzclient.create_asset_type(
        domainIdentifier = domainId,
        name = "userCustomAssetType",
        formsInput = {
            "Model": {
                "typeIdentifier": "customForm",
                "typeRevision": "1",
                "required": False
            }
        },
        owningProjectIdentifier = projectId,
    )
```

You can use the following sample script to create custom assets:

```
def create_custom_asset(domainId, projectId):
    return dzclient.create_asset(
        domainIdentifier = domainId,
        name = 'custom asset',
        description = "custom asset",
        owningProjectIdentifier = projectId,
        typeIdentifier = "userCustomAssetType",
        formsInput = [
            {
                "formName": "UserCustomForm",
                "typeIdentifier": "customForm",
                "content": "{\"simple\":\"sample-catalogId\"}"
            }
        ]
    )
```

You can use the following sample script to create a glossary:

```
def create_glossary(domainId, projectId):
    return dzclient.create_glossary(
        domainIdentifier = domainId,
        name = "test7",
        description = "this is a test glossary",
        owningProjectIdentifier = projectId
    )
```

You can use the following sample script to create a glossary term:

```
def create_glossary_term(domainId, glossaryId):
    return dzclient.create_glossary_term(
        domainIdentifier = domainId,
        name = "soccer",
        shortDescription = "this is a test glossary",
        glossaryIdentifier = glossaryId,
    )
```

You can use the following sample script to create an asset using a system-defined asset type:

```
def create_asset(domainId, projectId):
    return dzclient.create_asset(
        domainIdentifier = domainId,
        name = 'sample asset name',
        description = "this is a glue table asset",
        owningProjectIdentifier = projectId,
        typeIdentifier = "amazon.datazone.GlueTableAssetType",
        formsInput = [
            {
                "formName": "GlueTableForm",
                "content": "{\"catalogId\":\"sample-catalogId\",\"columns\":[{\"columnDescription\":\"sample-columnDescription\",\"columnName\":\"sample-columnName\",\"dataType\":\"sample-dataType\",\"lakeFormationTags\":{\"sample-key1\":\"sample-value1\",\"sample-key2\":\"sample-value2\"}}],\"compressionType\":\"sample-compressionType\",\"lakeFormationDetails\":{\"lakeFormationManagedTable\":false,\"lakeFormationTags\":{\"sample-key1\":\"sample-value1\",\"sample-key2\":\"sample-value2\"}},\"primaryKeys\":[\"sample-Key1\",\"sample-Key2\"],\"region\":\"us-east-1\",\"sortKeys\":[\"sample-sortKey1\"],\"sourceClassification\":\"sample-sourceClassification\",\"sourceLocation\":\"sample-sourceLocation\",\"tableArn\":\"sample-tableArn\",\"tableDescription\":\"sample-tableDescription\",\"tableName\":\"sample-tableName\"}"
            }
        ]
    )
```

You can use the following sample script to create an asset revision and attach a glossary term:

```
def create_asset_revision(domainId, assetId):
    return dzclient.create_asset_revision(
        domainIdentifier = domainId,
        identifier = assetId,
        name = 'glue table asset 7',
        description = "glue table asset description update",
        formsInput = [
            {
                "formName": "GlueTableForm",
                "content": "{\"catalogId\":\"sample-catalogId\",\"columns\":[{\"columnDescription\":\"sample-columnDescription\",\"columnName\":\"sample-columnName\",\"dataType\":\"sample-dataType\",\"lakeFormationTags\":{\"sample-key1\":\"sample-value1\",\"sample-key2\":\"sample-value2\"}}],\"compressionType\":\"sample-compressionType\",\"lakeFormationDetails\":{\"lakeFormationManagedTable\":false,\"lakeFormationTags\":{\"sample-key1\":\"sample-value1\",\"sample-key2\":\"sample-value2\"}},\"primaryKeys\":[\"sample-Key1\",\"sample-Key2\"],\"region\":\"us-east-1\",\"sortKeys\":[\"sample-sortKey1\"],\"sourceClassification\":\"sample-sourceClassification\",\"sourceLocation\":\"sample-sourceLocation\",\"tableArn\":\"sample-tableArn\",\"tableDescription\":\"sample-tableDescription\",\"tableName\":\"sample-tableName\"}"
            }
        ],
        glossaryTerms = ["<glossaryTermId>"]
    )
```

You can use the following sample script to publish an asset:

```
def publish_asset(domainId, assetId):
    return dzclient.create_listing_change_set(
        domainIdentifier = domainId,
        entityIdentifier = assetId,
        entityType = "ASSET",
        action = "PUBLISH",
    )
```
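
`CreateListingChangeSet` also accepts an `UNPUBLISH` action, so the same call can remove an asset's listing from the catalog. The following is a minimal sketch (here the client is passed in explicitly):

```
def unpublish_asset(dz, domainId, assetId):
    # Remove the asset's listing from the discovery catalog
    return dz.create_listing_change_set(
        domainIdentifier = domainId,
        entityIdentifier = assetId,
        entityType = "ASSET",
        action = "UNPUBLISH",
    )
```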

## Search the data catalog and subscribe to data
<a name="search-catalog-subscribe-gs-glue-api"></a>

You can use the following sample scripts to search the data catalog and subscribe to data:

```
def search_asset(domainId, projectId, text):
    return dzclient.search(
        domainIdentifier = domainId,
        owningProjectIdentifier = projectId,
        searchScope = "ASSET",
        searchText = text,
    )
```

You can use the following sample script to get the listing ID for the asset:

```
def search_listings(domainId, assetName, assetId):
    listings = dzclient.search_listings(
        domainIdentifier=domainId,
        searchText=assetName,
        additionalAttributes=["FORMS"]
    )

    assetListing = None
    for listing in listings['items']:
        if listing['assetListing']['entityId'] == assetId:
            assetListing = listing
            break

    return assetListing['assetListing']['listingId']
```

You can use the following sample scripts to create a subscription request using the listing ID:

```
def create_subscription_request(domainId, projectId, listingId):
    return dzclient.create_subscription_request(
        domainIdentifier=domainId,
        subscribedPrincipals=[{
            "project": {
                "identifier": projectId
            }
        }],
        subscribedListings=[{
            "identifier": listingId
        }],
        requestReason="Give request reason here."
    )
```

Call `create_subscription_request` and save the response as `create_subscription_response`. Get the `subscription_request_id` from it, and then accept the subscription request by using the following sample script: 

```
create_subscription_response = create_subscription_request(domainId, projectId, listingId)
subscription_request_id = create_subscription_response["id"]

def accept_subscription_request(domainId, subscriptionRequestId): 
    return dzclient.accept_subscription_request(
        domainIdentifier=domainId,
        identifier=subscriptionRequestId
    )
```
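
Putting the two calls together, a minimal end-to-end flow might look like the following sketch (the wrapper name is illustrative, and the client is passed in explicitly; in practice the request is usually approved by a user in the publishing project):

```
def subscribe_and_approve(dz, domain_id, project_id, listing_id):
    # Submit the subscription request on behalf of the consuming project.
    request = dz.create_subscription_request(
        domainIdentifier=domain_id,
        subscribedPrincipals=[{"project": {"identifier": project_id}}],
        subscribedListings=[{"identifier": listing_id}],
        requestReason="Quickstart subscription",
    )
    # Accept the request by its id.
    return dz.accept_subscription_request(
        domainIdentifier=domain_id,
        identifier=request["id"],
    )
```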

## Search for assets in the data catalog
<a name="search-catalog-subscribe-gs-glue-api"></a>

You can use the following sample AWS CLI commands, which use free-text search to look up your published data assets (listings) in the Amazon DataZone catalog.
+ The following example performs a free-text keyword search in the domain and returns all the listings that match the keyword 'credit':

  ```
  aws datazone search-listings \
    --domain-identifier dzd_c1s7uxe71prrtz \
    --search-text "credit"
  ```
+ You can also combine multiple keywords to further narrow down the search scope. For example, if you are looking for all the published data assets (listings) that have data related to sales in Mexico, you can formulate your query with the two keywords 'Mexico' and 'sales'. 

  ```
  aws datazone search-listings \
    --domain-identifier dzd_c1s7uxe71prrtz \
    --search-text "mexico sales"
  ```

You can also search for listings by using filters. The `filters` parameter in the `SearchListings` API lets you retrieve filtered results from the domain. The API supports several default filters, and you can combine two or more of them with AND/OR operations. Each filter clause takes two parameters: attribute and value. The default supported filter attributes are `typeName`, `owningProjectId`, and `glossaryTerms`. 
+ The following example searches all the listings in a given domain by using the `typeName` filter attribute, where the listing is a Redshift table asset type.

  ```
  aws datazone search-listings \
  --domain-identifier dzd_c1s7uxe71prrtz \
  --filters '{"or":[{"filter":{"attribute":"typeName","value":"RedshiftTableAssetType"}} ]}'
  ```
+ You can also combine multiple filters by using AND/OR operations. In the following example, you combine the `typeName` and `owningProjectId` filters.

  ```
  aws datazone search-listings \
  --domain-identifier dzd_c1s7uxe71prrtz \
  --filters '{"or":[{"filter":{"attribute":"typeName","value":"RedshiftTableAssetType"}},  {"filter":{"attribute":"owningProjectId","value":"cwrrjch7f5kppj"}} ]}'
  ```
+ You can also combine free-text search with filters to narrow the results, and then sort them by the listing's creation or last-updated time, as shown in the following example:

  ```
  aws datazone search-listings \
  --domain-identifier dzd_c1s7uxe71prrtz \
  --search-text "finance sales" \
  --filters '{"or":[{"filter":{"attribute":"typeName","value":"GlueTableViewType"}} ]}' \
  --sort '{"attribute": "UPDATED_AT", "order":"ASCENDING"}'
  ```
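
If you are scripting the same searches with the SDK instead of the AWS CLI, a small helper can build the `filters` clause shown in the commands above (the helper name is illustrative):

```
def build_or_filters(attribute_values):
    """Build a SearchListings `filters` clause that ORs filter pairs.

    attribute_values is a list of (attribute, value) tuples, for example
    [("typeName", "RedshiftTableAssetType"), ("owningProjectId", "cwrrjch7f5kppj")].
    """
    return {"or": [
        {"filter": {"attribute": attribute, "value": value}}
        for attribute, value in attribute_values
    ]}
```

You can then pass the result as the `filters` argument to `dzclient.search_listings`.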

## Other useful sample scripts
<a name="other-useful-scripts-api"></a>

You can use the following sample scripts to complete various tasks as you work with your data in Amazon DataZone.

Use the following sample script to list existing Amazon DataZone domains:

```
def list_domains():
    datazone = boto3.client('datazone')
    response = datazone.list_domains(status='AVAILABLE')
    for item in response['items']:
        print("%12s | %16s | %12s | %52s" % (item['id'], item['name'], item['managedAccountId'], item['portalUrl']))
```

Use the following sample script to list existing Amazon DataZone projects:

```
def list_projects(domain_id):
    datazone = boto3.client('datazone')
    response = datazone.list_projects(domainIdentifier=domain_id)
    for item in response['items']:
        print("%12s | %16s " % (item['id'], item['name']))
```

Use the following sample script to list existing Amazon DataZone metadata forms:

```
def list_metadata_forms(domain_id):
    datazone = boto3.client('datazone')
    response = datazone.search_types(domainIdentifier=domain_id,
        managed=False,
        searchScope='FORM_TYPE')
    for item in response['items']:
        print("%16s | %16s | %3s | %8s" % (item['formTypeItem']['name'], item['formTypeItem']['owningProjectId'], item['formTypeItem']['revision'], item['formTypeItem']['status']))
```