AWS Glue examples using SDK for Rust - AWS SDK for Rust

AWS Glue examples using SDK for Rust

The following code examples show you how to perform actions and implement common scenarios by using the AWS SDK for Rust with AWS Glue.

Basics are code examples that show you how to perform the essential operations within a service.

Actions are code excerpts from larger programs and must be run in context. While actions show you how to call individual service functions, you can see actions in context in their related scenarios.

Each example includes a link to the complete source code, where you can find instructions on how to set up and run the code in context.

Get started

The following code examples show how to get started using AWS Glue.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

let mut list_jobs = glue.list_jobs().into_paginator().send(); while let Some(list_jobs_output) = list_jobs.next().await { match list_jobs_output { Ok(list_jobs) => { let names = list_jobs.job_names(); info!(?names, "Found these jobs") } Err(err) => return Err(GlueMvpError::from_glue_sdk(err)), } }
  • For API details, see ListJobs in AWS SDK for Rust API reference.

Basics

The following code example shows how to:

  • Create a crawler that crawls a public Amazon S3 bucket and generates a database of CSV-formatted metadata.

  • List information about databases and tables in your AWS Glue Data Catalog.

  • Create a job to extract CSV data from the S3 bucket, transform the data, and load JSON-formatted output into another S3 bucket.

  • List information about job runs, view transformed data, and clean up resources.

For more information, see Tutorial: Getting started with AWS Glue Studio.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

Create and run a crawler that crawls a public Amazon Simple Storage Service (Amazon S3) bucket and generates a metadata database that describes the CSV-formatted data it finds.

let create_crawler = glue .create_crawler() .name(self.crawler()) .database_name(self.database()) .role(self.iam_role.expose_secret()) .targets( CrawlerTargets::builder() .s3_targets(S3Target::builder().path(CRAWLER_TARGET).build()) .build(), ) .send() .await; match create_crawler { Err(err) => { let glue_err: aws_sdk_glue::Error = err.into(); match glue_err { aws_sdk_glue::Error::AlreadyExistsException(_) => { info!("Using existing crawler"); Ok(()) } _ => Err(GlueMvpError::GlueSdk(glue_err)), } } Ok(_) => Ok(()), }?; let start_crawler = glue.start_crawler().name(self.crawler()).send().await; match start_crawler { Ok(_) => Ok(()), Err(err) => { let glue_err: aws_sdk_glue::Error = err.into(); match glue_err { aws_sdk_glue::Error::CrawlerRunningException(_) => Ok(()), _ => Err(GlueMvpError::GlueSdk(glue_err)), } } }?;

List information about databases and tables in your AWS Glue Data Catalog.

let database = glue .get_database() .name(self.database()) .send() .await .map_err(GlueMvpError::from_glue_sdk)? .to_owned(); let database = database .database() .ok_or_else(|| GlueMvpError::Unknown("Could not find database".into()))?; let tables = glue .get_tables() .database_name(self.database()) .send() .await .map_err(GlueMvpError::from_glue_sdk)?; let tables = tables.table_list();

Create and run a job that extracts CSV data from the source Amazon S3 bucket, transforms it by removing and renaming fields, and loads JSON-formatted output into another Amazon S3 bucket.

let create_job = glue .create_job() .name(self.job()) .role(self.iam_role.expose_secret()) .command( JobCommand::builder() .name("glueetl") .python_version("3") .script_location(format!("s3://{}/job.py", self.bucket())) .build(), ) .glue_version("3.0") .send() .await .map_err(GlueMvpError::from_glue_sdk)?; let job_name = create_job.name().ok_or_else(|| { GlueMvpError::Unknown("Did not get job name after creating job".into()) })?; let job_run_output = glue .start_job_run() .job_name(self.job()) .arguments("--input_database", self.database()) .arguments( "--input_table", self.tables .first() .ok_or_else(|| GlueMvpError::Unknown("Missing crawler table".into()))? .name(), ) .arguments("--output_bucket_url", self.bucket()) .send() .await .map_err(GlueMvpError::from_glue_sdk)?; let job = job_run_output .job_run_id() .ok_or_else(|| GlueMvpError::Unknown("Missing run id from just started job".into()))? .to_string();

Delete all resources created by the demo.

glue.delete_job() .job_name(self.job()) .send() .await .map_err(GlueMvpError::from_glue_sdk)?; for t in &self.tables { glue.delete_table() .name(t.name()) .database_name(self.database()) .send() .await .map_err(GlueMvpError::from_glue_sdk)?; } glue.delete_database() .name(self.database()) .send() .await .map_err(GlueMvpError::from_glue_sdk)?; glue.delete_crawler() .name(self.crawler()) .send() .await .map_err(GlueMvpError::from_glue_sdk)?;

Actions

The following code example shows how to use CreateCrawler.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

let create_crawler = glue .create_crawler() .name(self.crawler()) .database_name(self.database()) .role(self.iam_role.expose_secret()) .targets( CrawlerTargets::builder() .s3_targets(S3Target::builder().path(CRAWLER_TARGET).build()) .build(), ) .send() .await; match create_crawler { Err(err) => { let glue_err: aws_sdk_glue::Error = err.into(); match glue_err { aws_sdk_glue::Error::AlreadyExistsException(_) => { info!("Using existing crawler"); Ok(()) } _ => Err(GlueMvpError::GlueSdk(glue_err)), } } Ok(_) => Ok(()), }?;
  • For API details, see CreateCrawler in AWS SDK for Rust API reference.

The following code example shows how to use CreateJob.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

let create_job = glue .create_job() .name(self.job()) .role(self.iam_role.expose_secret()) .command( JobCommand::builder() .name("glueetl") .python_version("3") .script_location(format!("s3://{}/job.py", self.bucket())) .build(), ) .glue_version("3.0") .send() .await .map_err(GlueMvpError::from_glue_sdk)?; let job_name = create_job.name().ok_or_else(|| { GlueMvpError::Unknown("Did not get job name after creating job".into()) })?;
  • For API details, see CreateJob in AWS SDK for Rust API reference.

The following code example shows how to use DeleteCrawler.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

glue.delete_crawler() .name(self.crawler()) .send() .await .map_err(GlueMvpError::from_glue_sdk)?;
  • For API details, see DeleteCrawler in AWS SDK for Rust API reference.

The following code example shows how to use DeleteDatabase.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

glue.delete_database() .name(self.database()) .send() .await .map_err(GlueMvpError::from_glue_sdk)?;
  • For API details, see DeleteDatabase in AWS SDK for Rust API reference.

The following code example shows how to use DeleteJob.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

glue.delete_job() .job_name(self.job()) .send() .await .map_err(GlueMvpError::from_glue_sdk)?;
  • For API details, see DeleteJob in AWS SDK for Rust API reference.

The following code example shows how to use DeleteTable.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

for t in &self.tables { glue.delete_table() .name(t.name()) .database_name(self.database()) .send() .await .map_err(GlueMvpError::from_glue_sdk)?; }
  • For API details, see DeleteTable in AWS SDK for Rust API reference.

The following code example shows how to use GetCrawler.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

let tmp_crawler = glue .get_crawler() .name(self.crawler()) .send() .await .map_err(GlueMvpError::from_glue_sdk)?;
  • For API details, see GetCrawler in AWS SDK for Rust API reference.

The following code example shows how to use GetDatabase.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

let database = glue .get_database() .name(self.database()) .send() .await .map_err(GlueMvpError::from_glue_sdk)? .to_owned(); let database = database .database() .ok_or_else(|| GlueMvpError::Unknown("Could not find database".into()))?;
  • For API details, see GetDatabase in AWS SDK for Rust API reference.

The following code example shows how to use GetJobRun.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

let get_job_run = || async { Ok::<JobRun, GlueMvpError>( glue.get_job_run() .job_name(self.job()) .run_id(job_run_id.to_string()) .send() .await .map_err(GlueMvpError::from_glue_sdk)? .job_run() .ok_or_else(|| GlueMvpError::Unknown("Failed to get job_run".into()))? .to_owned(), ) }; let mut job_run = get_job_run().await?; let mut state = job_run.job_run_state().unwrap_or(&unknown_state).to_owned(); while matches!( state, JobRunState::Starting | JobRunState::Stopping | JobRunState::Running ) { info!(?state, "Waiting for job to finish"); tokio::time::sleep(self.wait_delay).await; job_run = get_job_run().await?; state = job_run.job_run_state().unwrap_or(&unknown_state).to_owned(); }
  • For API details, see GetJobRun in AWS SDK for Rust API reference.

The following code example shows how to use GetTables.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

let tables = glue .get_tables() .database_name(self.database()) .send() .await .map_err(GlueMvpError::from_glue_sdk)?; let tables = tables.table_list();
  • For API details, see GetTables in AWS SDK for Rust API reference.

The following code example shows how to use ListJobs.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

let mut list_jobs = glue.list_jobs().into_paginator().send(); while let Some(list_jobs_output) = list_jobs.next().await { match list_jobs_output { Ok(list_jobs) => { let names = list_jobs.job_names(); info!(?names, "Found these jobs") } Err(err) => return Err(GlueMvpError::from_glue_sdk(err)), } }
  • For API details, see ListJobs in AWS SDK for Rust API reference.

The following code example shows how to use StartCrawler.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

let start_crawler = glue.start_crawler().name(self.crawler()).send().await; match start_crawler { Ok(_) => Ok(()), Err(err) => { let glue_err: aws_sdk_glue::Error = err.into(); match glue_err { aws_sdk_glue::Error::CrawlerRunningException(_) => Ok(()), _ => Err(GlueMvpError::GlueSdk(glue_err)), } } }?;
  • For API details, see StartCrawler in AWS SDK for Rust API reference.

The following code example shows how to use StartJobRun.

SDK for Rust
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

let job_run_output = glue .start_job_run() .job_name(self.job()) .arguments("--input_database", self.database()) .arguments( "--input_table", self.tables .first() .ok_or_else(|| GlueMvpError::Unknown("Missing crawler table".into()))? .name(), ) .arguments("--output_bucket_url", self.bucket()) .send() .await .map_err(GlueMvpError::from_glue_sdk)?; let job = job_run_output .job_run_id() .ok_or_else(|| GlueMvpError::Unknown("Missing run id from just started job".into()))? .to_string();
  • For API details, see StartJobRun in AWS SDK for Rust API reference.