
Appendix

Multi-tenancy comparison

Table 2 — Multi-tenancy comparison

| Multi-domain | Multi-account | Attribute-based access control (ABAC) within a single domain |
|---|---|---|
| Resource isolation is achieved using tags. SageMaker AI Studio automatically tags all resources with the domain ARN and user profile/space ARN. | Each tenant is in its own account, so there is absolute resource isolation. | Resource isolation is achieved using tags. Users have to manage the tagging of created resources for ABAC. |
| List APIs cannot be restricted by tags. The UI filters resources in shared spaces; however, List API calls made through the AWS CLI or the Boto3 SDK list resources across the Region. | List API isolation is also possible, since each tenant is in its own dedicated account. | List APIs cannot be restricted by tags. List API calls made through the AWS CLI or the Boto3 SDK list resources across the Region. |
| SageMaker AI Studio compute and storage costs per tenant can be easily monitored by using the domain ARN as a cost allocation tag. | SageMaker AI Studio compute and storage costs per tenant are easy to monitor with a dedicated account. | SageMaker AI Studio compute costs per tenant need to be calculated using custom tags. Storage costs cannot be monitored per tenant, since all tenants share the same EFS volume. |
| Service quotas are set at the account level, so a single tenant could still use up all resources. | Service quotas can be set at the account level for each tenant. | Service quotas are set at the account level, so a single tenant could still use up all resources. |
| Scaling to multiple tenants can be achieved through infrastructure as code (IaC) or Service Catalog. | Scaling to multiple tenants involves AWS Organizations and vending multiple accounts. | Scaling requires a tenant-specific role for each new tenant, and user profiles need to be manually tagged with tenant names. |
| Collaboration between users within a tenant is possible through shared spaces. | Collaboration between users within a tenant is possible through shared spaces. | All tenants have access to the same shared space for collaboration. |
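
For the multi-domain column above, per-tenant cost monitoring reduces to a Cost Explorer query grouped by the domain ARN tag. The following is a minimal boto3 sketch, assuming the automatic sagemaker:domain-arn tag has been activated as a cost allocation tag in the Billing console (the dates are placeholders):

import boto3

# Cost Explorer client
ce = boto3.client("ce")

# Group monthly Studio costs by the domain ARN tag. Until the
# sagemaker:domain-arn key is activated as a cost allocation tag,
# Cost Explorer returns no groups for it.
report = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-06-01", "End": "2024-07-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "sagemaker:domain-arn"}],
)

for group in report["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])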

SageMaker AI Studio domain backup and recovery

In the event of an accidental EFS volume deletion, or when a domain must be re-created due to changes in networking or authentication, use one of the following options.

Option 1: Back up from existing EFS using EC2

SageMaker Studio domain backup

  1. List user profiles and spaces in SageMaker Studio (CLI, SDK).

  2. Map user profiles/spaces to UIDs on EFS (see the boto3 sketch after this list).

    1. For each user in list of users/spaces, describe the user profile/space (CLI, SDK).

    2. Map user profile/space to HomeEfsFileSystemUid.

    3. Map user profile to UserSettings['ExecutionRole'] if users have distinct execution roles.

    4. Identify the default Space execution role.

  3. Create a new domain and specify the default Space execution role.

  4. Create user profiles and spaces.

    • For each user in list of users, create user profile (CLI, SDK) using the execution role mapping.

  5. Create a mapping for the new EFS and UIDs.

    1. For each user in list of users, describe user profile (CLI, SDK).

    2. Map user profile to HomeEfsFileSystemUid.

  6. Optionally, delete all apps, user profiles, spaces, and then delete the domain.
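
A minimal boto3 sketch of steps 1, 2, and 5 follows. The domain ID is a placeholder, and spaces can be enumerated the same way with list_spaces and describe_space; this illustrates the API calls, not a complete backup tool.

import boto3

sagemaker = boto3.client("sagemaker")
domain_id = "d-xxxxxxxxxxxx"  # placeholder domain ID

# Step 1: list the user profiles in the domain (paginated)
profiles = []
paginator = sagemaker.get_paginator("list_user_profiles")
for page in paginator.paginate(DomainIdEquals=domain_id):
    profiles.extend(page["UserProfiles"])

# Step 2: map each user profile to its EFS UID and execution role
uid_map = {}
for profile in profiles:
    detail = sagemaker.describe_user_profile(
        DomainId=domain_id,
        UserProfileName=profile["UserProfileName"],
    )
    uid_map[profile["UserProfileName"]] = {
        "HomeEfsFileSystemUid": detail["HomeEfsFileSystemUid"],
        "ExecutionRole": detail.get("UserSettings", {}).get("ExecutionRole"),
    }

# After re-creating the domain and profiles (steps 3 and 4), rerun the same
# describe calls against the new domain to build the new UID mapping (step 5).
print(uid_map)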

EFS backup

To back up EFS, use the following instructions:

  1. Launch an EC2 instance, and attach the old SageMaker Studio domain’s inbound/outbound security groups to it (allow NFS traffic over TCP on port 2049). Refer to Connect SageMaker Studio Notebooks in a VPC to External Resources.

  2. Mount the SageMaker Studio EFS volume to the new EC2 instance. Refer to Mounting EFS file systems.

  3. Copy the files over to local EBS storage (sudo cp -rp /efs /studio-backup), and then restore them to the new domain’s EFS volume:

    1. Attach the new domain security groups to the EC2 instance.

    2. Mount the new EFS volume to the EC2 instance.

    3. Copy files to the new EFS volume.

    4. For each user in the user mapping (see the Python sketch after this list):

      1. Create the directory: mkdir <new_UID>.

      2. Copy files from old UID directory to new UID directory.

      3. Change the ownership of all copied files: chown -R <new_UID> <new_UID_directory>.
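
The per-user loop in step 3 can also be scripted. The following Python sketch runs on the EC2 instance with both volumes mounted; the mount points and UID values are placeholders:

import os
import shutil

# Old-to-new HomeEfsFileSystemUid pairs, built from the user profile
# mappings gathered earlier (placeholder values)
uid_map = {"200001": "200005", "200002": "200008"}

old_root = "/studio-backup"  # EBS copy of the old EFS volume
new_root = "/new-efs"        # mount point of the new domain's EFS volume

for old_uid, new_uid in uid_map.items():
    src = os.path.join(old_root, old_uid)
    dst = os.path.join(new_root, new_uid)
    # Copy the user's home directory, preserving metadata where possible
    shutil.copytree(src, dst, dirs_exist_ok=True)
    # Re-own every directory and file so the new profile's UID can access it
    for dirpath, dirnames, filenames in os.walk(dst):
        os.chown(dirpath, int(new_uid), int(new_uid))
        for name in filenames:
            os.chown(os.path.join(dirpath, name), int(new_uid), int(new_uid))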

Option 2: Back up from existing EFS using S3 and lifecycle configuration

  1. Refer to Migrate your work to an Amazon SageMaker notebook instance with Amazon Linux 2.

  2. Create an S3 bucket for backup (such as studio-backup).

  3. List all user profiles with execution roles.

  4. In the current SageMaker Studio domain, set a default LCC script at the domain level.

    • In the LCC, copy everything in /home/sagemaker-user to the user profile prefix in S3 (for example, s3://studio-backup/studio-user1).

  5. Restart all default Jupyter Server apps (for the LCC to be run).

  6. Delete all apps, user profiles, and domains.

  7. Create a new SageMaker Studio domain.

  8. Create new user profiles from the list of user profiles and execution roles.

  9. Set up an LCC at the domain level (see the boto3 sketch after this list):

    • In the LCC, copy everything in the user profile prefix in S3 to /home/sagemaker-user.

  10. Create default Jupyter Server apps for all users with the LCC configuration (CLI, SDK).
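
Steps 4 and 9 can be scripted with boto3. The sketch below creates a JupyterServer lifecycle configuration that restores a home directory from S3 and attaches it as the domain default. The bucket name and domain ID are placeholders, and reading the user profile name from the app metadata file follows the pattern used in SageMaker LCC examples:

import base64
import boto3

sagemaker = boto3.client("sagemaker")
domain_id = "d-xxxxxxxxxxxx"  # placeholder domain ID
bucket = "studio-backup"      # placeholder backup bucket

# Restore /home/sagemaker-user from the user's S3 prefix. The user profile
# name is read from the metadata file Studio writes for each app.
script = f"""#!/bin/bash
set -eux
USER_PROFILE=$(jq -r '.UserProfileName' /opt/ml/metadata/resource-metadata.json)
aws s3 cp --recursive "s3://{bucket}/${{USER_PROFILE}}/" /home/sagemaker-user/
"""

lcc = sagemaker.create_studio_lifecycle_config(
    StudioLifecycleConfigName="restore-home-from-s3",
    StudioLifecycleConfigContent=base64.b64encode(script.encode()).decode(),
    StudioLifecycleConfigAppType="JupyterServer",
)

# Attach the LCC at the domain level and make it the default for new
# Jupyter Server apps
sagemaker.update_domain(
    DomainId=domain_id,
    DefaultUserSettings={
        "JupyterServerAppSettings": {
            "DefaultResourceSpec": {
                "InstanceType": "system",
                "LifecycleConfigArn": lcc["StudioLifecycleConfigArn"],
            },
            "LifecycleConfigArns": [lcc["StudioLifecycleConfigArn"]],
        }
    },
)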

SageMaker Studio access using SAML assertion

Solution setup:

  1. Create a SAML application in your external IdP.

  2. Set up the external IdP as an Identity Provider in IAM.

  3. Create a SAMLValidator Lambda function that can be accessed by the IdP (through a function URL or API Gateway).

  4. Create a GeneratePresignedUrl Lambda function and an API Gateway to access the function.

  5. Create an IAM role that users can assume to invoke the API Gateway. This role should be passed in the SAML assertion as an attribute in the following format:

    • Attribute name: https://aws.amazon.com/SAML/Attributes/Role

    • Attribute value: <IdentityProviderARN>, <RoleARN>

  6. Update the SAML Assertion Consumer Service (ACS) endpoint to the SAMLValidator invoke URL.

SAML validator example code:

import base64
from urllib.parse import parse_qs

import boto3
import requests
from aws_requests_auth.aws_auth import AWSRequestsAuth

# Config for calling AssumeRoleWithSAML
idp_arn = "arn:aws:iam::0123456789:saml-provider/MyIdentityProvider"
api_gw_role_arn = "arn:aws:iam::0123456789:role/APIGWAccessRole"
studio_api_url = "abcdef.execute-api.us-east-1.amazonaws.com"
studio_api_gw_path = "https://" + studio_api_url + "/Prod"

# Every customer will need to get the SAML response from the POST call
def get_saml_response(event):
    saml_response_uri = base64.b64decode(event['body']).decode('ascii')
    request_body = parse_qs(saml_response_uri)
    print(f"b64 saml response: {request_body['SAMLResponse'][0]}")
    return request_body['SAMLResponse'][0]

def lambda_handler(event, context):
    sts = boto3.client('sts')
    saml_response_data = get_saml_response(event)

    # Get temporary credentials by exchanging the SAML assertion
    response = sts.assume_role_with_saml(
        RoleArn=api_gw_role_arn,
        PrincipalArn=idp_arn,
        SAMLAssertion=saml_response_data
    )

    # Sign the API Gateway request with the temporary credentials (SigV4)
    auth = AWSRequestsAuth(
        aws_access_key=response['Credentials']['AccessKeyId'],
        aws_secret_access_key=response['Credentials']['SecretAccessKey'],
        aws_host=studio_api_url,
        aws_region='us-east-1',
        aws_service='execute-api',
        aws_token=response['Credentials']['SessionToken']
    )

    # Forward the SAML response to the GeneratePresignedUrl API and
    # return its response body to the caller
    presigned_response = requests.post(
        studio_api_gw_path,
        data=saml_response_data,
        auth=auth
    )
    return presigned_response.text
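
The GeneratePresignedUrl function invoked by this validator is not shown above; the following is a minimal sketch. The domain ID and user profile lookup are placeholders, and in practice the function would derive the profile name from the validated SAML attributes forwarded in the request. It uses the documented CreatePresignedDomainUrl API:

import json

import boto3

sagemaker = boto3.client("sagemaker")
domain_id = "d-xxxxxxxxxxxx"  # placeholder domain ID

def lambda_handler(event, context):
    # Placeholder mapping: in practice, parse the forwarded SAML attributes
    # in the request body to resolve the caller's user profile name
    user_profile_name = "studio-user1"

    # Generate a short-lived presigned Studio URL for this user profile
    response = sagemaker.create_presigned_domain_url(
        DomainId=domain_id,
        UserProfileName=user_profile_name,
        ExpiresInSeconds=300,
    )
    return {
        "statusCode": 200,
        "body": json.dumps({"AuthorizedUrl": response["AuthorizedUrl"]}),
    }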