AWS Glue esempi che utilizzano SDK for C++ - AWS Esempi di codice SDK

Sono disponibili altri esempi AWS SDK nel repository AWS Doc SDK Examples. GitHub

Le traduzioni sono generate tramite traduzione automatica. In caso di conflitto tra il contenuto di una traduzione e la versione originale in Inglese, quest'ultima prevarrà.

AWS Glue esempi che utilizzano SDK for C++

I seguenti esempi di codice mostrano come eseguire azioni e implementare scenari comuni utilizzando AWS SDK for C++ with AWS Glue.

Le nozioni di base sono esempi di codice che mostrano come eseguire le operazioni essenziali all'interno di un servizio.

Le operazioni sono estratti di codice da programmi più grandi e devono essere eseguite nel contesto. Sebbene le operazioni mostrino come richiamare le singole funzioni del servizio, è possibile visualizzarle contestualizzate negli scenari correlati.

Ogni esempio include un collegamento al codice sorgente completo, in cui è possibile trovare istruzioni su come configurare ed eseguire il codice nel contesto.

Nozioni di base

L'esempio di codice seguente mostra come iniziare a utilizzare AWS Glue.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Codice per il CMake file CMake Lists.txt.

# Set the minimum required version of CMake for this project. cmake_minimum_required(VERSION 3.13) # Set the AWS service components used by this project. set(SERVICE_COMPONENTS glue) # Set this project's name. project("hello_glue") # Set the C++ standard to use to build this target. # At least C++ 11 is required for the AWS SDK for C++. set(CMAKE_CXX_STANDARD 11) # Use the MSVC variable to determine if this is a Windows build. set(WINDOWS_BUILD ${MSVC}) if (WINDOWS_BUILD) # Set the location where CMake can find the installed libraries for the AWS SDK. string(REPLACE ";" "/aws-cpp-sdk-all;" SYSTEM_MODULE_PATH "${CMAKE_SYSTEM_PREFIX_PATH}/aws-cpp-sdk-all") list(APPEND CMAKE_PREFIX_PATH ${SYSTEM_MODULE_PATH}) endif () # Find the AWS SDK for C++ package. find_package(AWSSDK REQUIRED COMPONENTS ${SERVICE_COMPONENTS}) if (WINDOWS_BUILD AND AWSSDK_INSTALL_AS_SHARED_LIBS) # Copy relevant AWS SDK for C++ libraries into the current binary directory for running and debugging. # set(BIN_SUB_DIR "/Debug") # if you are building from the command line you may need to uncomment this # and set the proper subdirectory to the executables' location. AWSSDK_CPY_DYN_LIBS(SERVICE_COMPONENTS "" ${CMAKE_CURRENT_BINARY_DIR}${BIN_SUB_DIR}) endif () add_executable(${PROJECT_NAME} hello_glue.cpp) target_link_libraries(${PROJECT_NAME} ${AWSSDK_LINK_LIBRARIES})

Codice per il file di origine hello_glue.cpp.

#include <aws/core/Aws.h> #include <aws/glue/GlueClient.h> #include <aws/glue/model/ListJobsRequest.h> #include <iostream> /* * A "Hello Glue" starter application which initializes an AWS Glue client and lists the * AWS Glue job definitions. * * main function * * Usage: 'hello_glue' * */ int main(int argc, char **argv) { Aws::SDKOptions options; // Optionally change the log level for debugging. // options.loggingOptions.logLevel = Utils::Logging::LogLevel::Debug; Aws::InitAPI(options); // Should only be called once. int result = 0; { Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient glueClient(clientConfig); std::vector<Aws::String> jobs; Aws::String nextToken; // Used for pagination. do { Aws::Glue::Model::ListJobsRequest listJobsRequest; if (!nextToken.empty()) { listJobsRequest.SetNextToken(nextToken); } Aws::Glue::Model::ListJobsOutcome listRunsOutcome = glueClient.ListJobs( listJobsRequest); if (listRunsOutcome.IsSuccess()) { const std::vector<Aws::String> &jobNames = listRunsOutcome.GetResult().GetJobNames(); jobs.insert(jobs.end(), jobNames.begin(), jobNames.end()); nextToken = listRunsOutcome.GetResult().GetNextToken(); } else { std::cerr << "Error listing jobs. " << listRunsOutcome.GetError().GetMessage() << std::endl; result = 1; break; } } while (!nextToken.empty()); std::cout << "Your account has " << jobs.size() << " jobs." << std::endl; for (size_t i = 0; i < jobs.size(); ++i) { std::cout << " " << i + 1 << ". " << jobs[i] << std::endl; } } Aws::ShutdownAPI(options); // Should only be called once. return result; }
  • Per i dettagli sull'API, consulta la ListJobssezione AWS SDK for C++ API Reference.

Nozioni di base

L'esempio di codice seguente mostra come:

  • Crea un crawler che esegue la scansione di un bucket Amazon S3 pubblico e genera un database di metadati in formato CSV.

  • Elenca le informazioni su database e tabelle nel tuo AWS Glue Data Catalog.

  • Crea un processo per estrarre i dati CSV dal bucket S3, trasformare i dati e caricare l'output in formato JSON in un altro bucket S3.

  • Elenca le informazioni sulle esecuzioni dei processi, visualizza i dati trasformati e pulisci le risorse.

Per ulteriori informazioni, consulta Tutorial: Guida introduttiva a AWS Glue Studio.

SDK per C++
Nota

C'è altro da sapere GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

//! Scenario which demonstrates using AWS Glue to add a crawler and run a job. /*! \\sa runGettingStartedWithGlueScenario() \param bucketName: An S3 bucket created in the setup. \param roleName: An AWS Identity and Access Management (IAM) role created in the setup. \param clientConfig: AWS client configuration. \return bool: Successful completion. */ bool AwsDoc::Glue::runGettingStartedWithGlueScenario(const Aws::String &bucketName, const Aws::String &roleName, const Aws::Client::ClientConfiguration &clientConfig) { Aws::Glue::GlueClient client(clientConfig); Aws::String roleArn; if (!getRoleArn(roleName, roleArn, clientConfig)) { std::cerr << "Error getting role ARN for role." << std::endl; return false; } // 1. Upload the job script to the S3 bucket. { std::cout << "Uploading the job script '" << AwsDoc::Glue::PYTHON_SCRIPT << "'." << std::endl; if (!AwsDoc::Glue::uploadFile(bucketName, AwsDoc::Glue::PYTHON_SCRIPT_PATH, AwsDoc::Glue::PYTHON_SCRIPT, clientConfig)) { std::cerr << "Error uploading the job file." << std::endl; return false; } } // 2. Create a crawler. { Aws::Glue::Model::S3Target s3Target; s3Target.SetPath("s3://crawler-public-us-east-1/flight/2016/csv"); Aws::Glue::Model::CrawlerTargets crawlerTargets; crawlerTargets.AddS3Targets(s3Target); Aws::Glue::Model::CreateCrawlerRequest request; request.SetTargets(crawlerTargets); request.SetName(CRAWLER_NAME); request.SetDatabaseName(CRAWLER_DATABASE_NAME); request.SetTablePrefix(CRAWLER_DATABASE_PREFIX); request.SetRole(roleArn); Aws::Glue::Model::CreateCrawlerOutcome outcome = client.CreateCrawler(request); if (outcome.IsSuccess()) { std::cout << "Successfully created the crawler." << std::endl; } else { std::cerr << "Error creating a crawler. " << outcome.GetError().GetMessage() << std::endl; deleteAssets("", CRAWLER_DATABASE_NAME, "", bucketName, clientConfig); return false; } } // 3. Get a crawler. { Aws::Glue::Model::GetCrawlerRequest request; request.SetName(CRAWLER_NAME); Aws::Glue::Model::GetCrawlerOutcome outcome = client.GetCrawler(request); if (outcome.IsSuccess()) { Aws::Glue::Model::CrawlerState crawlerState = outcome.GetResult().GetCrawler().GetState(); std::cout << "Retrieved crawler with state " << Aws::Glue::Model::CrawlerStateMapper::GetNameForCrawlerState( crawlerState) << "." << std::endl; } else { std::cerr << "Error retrieving a crawler. " << outcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, "", bucketName, clientConfig); return false; } } // 4. Start a crawler. { Aws::Glue::Model::StartCrawlerRequest request; request.SetName(CRAWLER_NAME); Aws::Glue::Model::StartCrawlerOutcome outcome = client.StartCrawler(request); if (outcome.IsSuccess() || (Aws::Glue::GlueErrors::CRAWLER_RUNNING == outcome.GetError().GetErrorType())) { if (!outcome.IsSuccess()) { std::cout << "Crawler was already started." << std::endl; } else { std::cout << "Successfully started crawler." << std::endl; } std::cout << "This may take a while to run." << std::endl; Aws::Glue::Model::CrawlerState crawlerState = Aws::Glue::Model::CrawlerState::NOT_SET; int iterations = 0; while (Aws::Glue::Model::CrawlerState::READY != crawlerState) { std::this_thread::sleep_for(std::chrono::seconds(1)); ++iterations; if ((iterations % 10) == 0) { // Log status every 10 seconds. std::cout << "Crawler status " << Aws::Glue::Model::CrawlerStateMapper::GetNameForCrawlerState( crawlerState) << ". After " << iterations << " seconds elapsed." << std::endl; } Aws::Glue::Model::GetCrawlerRequest getCrawlerRequest; getCrawlerRequest.SetName(CRAWLER_NAME); Aws::Glue::Model::GetCrawlerOutcome getCrawlerOutcome = client.GetCrawler( getCrawlerRequest); if (getCrawlerOutcome.IsSuccess()) { crawlerState = getCrawlerOutcome.GetResult().GetCrawler().GetState(); } else { std::cerr << "Error getting crawler. " << getCrawlerOutcome.GetError().GetMessage() << std::endl; break; } } if (Aws::Glue::Model::CrawlerState::READY == crawlerState) { std::cout << "Crawler finished running after " << iterations << " seconds." << std::endl; } } else { std::cerr << "Error starting a crawler. " << outcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, "", bucketName, clientConfig); return false; } } // 5. Get a database. { Aws::Glue::Model::GetDatabaseRequest request; request.SetName(CRAWLER_DATABASE_NAME); Aws::Glue::Model::GetDatabaseOutcome outcome = client.GetDatabase(request); if (outcome.IsSuccess()) { const Aws::Glue::Model::Database &database = outcome.GetResult().GetDatabase(); std::cout << "Successfully retrieve the database\n" << database.Jsonize().View().WriteReadable() << "'." << std::endl; } else { std::cerr << "Error getting the database. " << outcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, "", bucketName, clientConfig); return false; } } // 6. Get tables. Aws::String tableName; { Aws::Glue::Model::GetTablesRequest request; request.SetDatabaseName(CRAWLER_DATABASE_NAME); std::vector<Aws::Glue::Model::Table> all_tables; Aws::String nextToken; // Used for pagination. do { Aws::Glue::Model::GetTablesOutcome outcome = client.GetTables(request); if (outcome.IsSuccess()) { const std::vector<Aws::Glue::Model::Table> &tables = outcome.GetResult().GetTableList(); all_tables.insert(all_tables.end(), tables.begin(), tables.end()); nextToken = outcome.GetResult().GetNextToken(); } else { std::cerr << "Error getting the tables. " << outcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, "", bucketName, clientConfig); return false; } } while (!nextToken.empty()); std::cout << "The database contains " << all_tables.size() << (all_tables.size() == 1 ? " table." : "tables.") << std::endl; std::cout << "Here is a list of the tables in the database."; for (size_t index = 0; index < all_tables.size(); ++index) { std::cout << " " << index + 1 << ": " << all_tables[index].GetName() << std::endl; } if (!all_tables.empty()) { int tableIndex = askQuestionForIntRange( "Enter an index to display the database detail ", 1, static_cast<int>(all_tables.size())); std::cout << all_tables[tableIndex - 1].Jsonize().View().WriteReadable() << std::endl; tableName = all_tables[tableIndex - 1].GetName(); } } // 7. Create a job. { Aws::Glue::Model::CreateJobRequest request; request.SetName(JOB_NAME); request.SetRole(roleArn); request.SetGlueVersion(GLUE_VERSION); Aws::Glue::Model::JobCommand command; command.SetName(JOB_COMMAND_NAME); command.SetPythonVersion(JOB_PYTHON_VERSION); command.SetScriptLocation( Aws::String("s3://") + bucketName + "/" + PYTHON_SCRIPT); request.SetCommand(command); Aws::Glue::Model::CreateJobOutcome outcome = client.CreateJob(request); if (outcome.IsSuccess()) { std::cout << "Successfully created the job." << std::endl; } else { std::cerr << "Error creating the job. " << outcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, "", bucketName, clientConfig); return false; } } // 8. Start a job run. { Aws::Glue::Model::StartJobRunRequest request; request.SetJobName(JOB_NAME); Aws::Map<Aws::String, Aws::String> arguments; arguments["--input_database"] = CRAWLER_DATABASE_NAME; arguments["--input_table"] = tableName; arguments["--output_bucket_url"] = Aws::String("s3://") + bucketName + "/"; request.SetArguments(arguments); Aws::Glue::Model::StartJobRunOutcome outcome = client.StartJobRun(request); if (outcome.IsSuccess()) { std::cout << "Successfully started the job." << std::endl; Aws::String jobRunId = outcome.GetResult().GetJobRunId(); int iterator = 0; bool done = false; while (!done) { ++iterator; std::this_thread::sleep_for(std::chrono::seconds(1)); Aws::Glue::Model::GetJobRunRequest jobRunRequest; jobRunRequest.SetJobName(JOB_NAME); jobRunRequest.SetRunId(jobRunId); Aws::Glue::Model::GetJobRunOutcome jobRunOutcome = client.GetJobRun( jobRunRequest); if (jobRunOutcome.IsSuccess()) { const Aws::Glue::Model::JobRun &jobRun = jobRunOutcome.GetResult().GetJobRun(); Aws::Glue::Model::JobRunState jobRunState = jobRun.GetJobRunState(); if ((jobRunState == Aws::Glue::Model::JobRunState::STOPPED) || (jobRunState == Aws::Glue::Model::JobRunState::FAILED) || (jobRunState == Aws::Glue::Model::JobRunState::TIMEOUT)) { std::cerr << "Error running job. " << jobRun.GetErrorMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, JOB_NAME, bucketName, clientConfig); return false; } else if (jobRunState == Aws::Glue::Model::JobRunState::SUCCEEDED) { std::cout << "Job run succeeded after " << iterator << " seconds elapsed." << std::endl; done = true; } else if ((iterator % 10) == 0) { // Log status every 10 seconds. std::cout << "Job run status " << Aws::Glue::Model::JobRunStateMapper::GetNameForJobRunState( jobRunState) << ". " << iterator << " seconds elapsed." << std::endl; } } else { std::cerr << "Error retrieving job run state. " << jobRunOutcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, JOB_NAME, bucketName, clientConfig); return false; } } } else { std::cerr << "Error starting a job. " << outcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, JOB_NAME, bucketName, clientConfig); return false; } } // 9. List the output data stored in the S3 bucket. { Aws::S3::S3Client s3Client; Aws::S3::Model::ListObjectsV2Request request; request.SetBucket(bucketName); request.SetPrefix(OUTPUT_FILE_PREFIX); Aws::String continuationToken; // Used for pagination. std::vector<Aws::S3::Model::Object> allObjects; do { if (!continuationToken.empty()) { request.SetContinuationToken(continuationToken); } Aws::S3::Model::ListObjectsV2Outcome outcome = s3Client.ListObjectsV2( request); if (outcome.IsSuccess()) { const std::vector<Aws::S3::Model::Object> &objects = outcome.GetResult().GetContents(); allObjects.insert(allObjects.end(), objects.begin(), objects.end()); continuationToken = outcome.GetResult().GetNextContinuationToken(); } else { std::cerr << "Error listing objects. " << outcome.GetError().GetMessage() << std::endl; break; } } while (!continuationToken.empty()); std::cout << "Data from your job is in " << allObjects.size() << " files in the S3 bucket, " << bucketName << "." << std::endl; for (size_t i = 0; i < allObjects.size(); ++i) { std::cout << " " << i + 1 << ". " << allObjects[i].GetKey() << std::endl; } int objectIndex = askQuestionForIntRange( std::string( "Enter the number of a block to download it and see the first ") + std::to_string(LINES_OF_RUN_FILE_TO_DISPLAY) + " lines of JSON output in the block: ", 1, static_cast<int>(allObjects.size())); Aws::String objectKey = allObjects[objectIndex - 1].GetKey(); std::stringstream stringStream; if (getObjectFromBucket(bucketName, objectKey, stringStream, clientConfig)) { for (int i = 0; i < LINES_OF_RUN_FILE_TO_DISPLAY && stringStream; ++i) { std::string line; std::getline(stringStream, line); std::cout << " " << line << std::endl; } } else { deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, JOB_NAME, bucketName, clientConfig); return false; } } // 10. List all the jobs. Aws::String jobName; { Aws::Glue::Model::ListJobsRequest listJobsRequest; Aws::String nextToken; std::vector<Aws::String> allJobNames; do { if (!nextToken.empty()) { listJobsRequest.SetNextToken(nextToken); } Aws::Glue::Model::ListJobsOutcome listRunsOutcome = client.ListJobs( listJobsRequest); if (listRunsOutcome.IsSuccess()) { const std::vector<Aws::String> &jobNames = listRunsOutcome.GetResult().GetJobNames(); allJobNames.insert(allJobNames.end(), jobNames.begin(), jobNames.end()); nextToken = listRunsOutcome.GetResult().GetNextToken(); } else { std::cerr << "Error listing jobs. " << listRunsOutcome.GetError().GetMessage() << std::endl; } } while (!nextToken.empty()); std::cout << "Your account has " << allJobNames.size() << " jobs." << std::endl; for (size_t i = 0; i < allJobNames.size(); ++i) { std::cout << " " << i + 1 << ". " << allJobNames[i] << std::endl; } int jobIndex = askQuestionForIntRange( Aws::String("Enter a number between 1 and ") + std::to_string(allJobNames.size()) + " to see the list of runs for a job: ", 1, static_cast<int>(allJobNames.size())); jobName = allJobNames[jobIndex - 1]; } // 11. Get the job runs for a job. Aws::String jobRunID; if (!jobName.empty()) { Aws::Glue::Model::GetJobRunsRequest getJobRunsRequest; getJobRunsRequest.SetJobName(jobName); Aws::String nextToken; // Used for pagination. std::vector<Aws::Glue::Model::JobRun> allJobRuns; do { if (!nextToken.empty()) { getJobRunsRequest.SetNextToken(nextToken); } Aws::Glue::Model::GetJobRunsOutcome jobRunsOutcome = client.GetJobRuns( getJobRunsRequest); if (jobRunsOutcome.IsSuccess()) { const std::vector<Aws::Glue::Model::JobRun> &jobRuns = jobRunsOutcome.GetResult().GetJobRuns(); allJobRuns.insert(allJobRuns.end(), jobRuns.begin(), jobRuns.end()); nextToken = jobRunsOutcome.GetResult().GetNextToken(); } else { std::cerr << "Error getting job runs. " << jobRunsOutcome.GetError().GetMessage() << std::endl; break; } } while (!nextToken.empty()); std::cout << "There are " << allJobRuns.size() << " runs in the job '" << jobName << "'." << std::endl; for (size_t i = 0; i < allJobRuns.size(); ++i) { std::cout << " " << i + 1 << ". " << allJobRuns[i].GetJobName() << std::endl; } int runIndex = askQuestionForIntRange( Aws::String("Enter a number between 1 and ") + std::to_string(allJobRuns.size()) + " to see details for a run: ", 1, static_cast<int>(allJobRuns.size())); jobRunID = allJobRuns[runIndex - 1].GetId(); } // 12. Get a single job run. if (!jobRunID.empty()) { Aws::Glue::Model::GetJobRunRequest jobRunRequest; jobRunRequest.SetJobName(jobName); jobRunRequest.SetRunId(jobRunID); Aws::Glue::Model::GetJobRunOutcome jobRunOutcome = client.GetJobRun( jobRunRequest); if (jobRunOutcome.IsSuccess()) { std::cout << "Displaying the job run JSON description." << std::endl; std::cout << jobRunOutcome.GetResult().GetJobRun().Jsonize().View().WriteReadable() << std::endl; } else { std::cerr << "Error get a job run. " << jobRunOutcome.GetError().GetMessage() << std::endl; } } return deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, JOB_NAME, bucketName, clientConfig); } //! Cleanup routine to delete created assets. /*! \\sa deleteAssets() \param crawler: Name of an AWS Glue crawler. \param database: The name of an AWS Glue database. \param job: The name of an AWS Glue job. \param bucketName: The name of an S3 bucket. \param clientConfig: AWS client configuration. \return bool: Successful completion. */ bool AwsDoc::Glue::deleteAssets(const Aws::String &crawler, const Aws::String &database, const Aws::String &job, const Aws::String &bucketName, const Aws::Client::ClientConfiguration &clientConfig) { const Aws::Glue::GlueClient client(clientConfig); bool result = true; // 13. Delete a job. if (!job.empty()) { Aws::Glue::Model::DeleteJobRequest request; request.SetJobName(job); Aws::Glue::Model::DeleteJobOutcome outcome = client.DeleteJob(request); if (outcome.IsSuccess()) { std::cout << "Successfully deleted the job." << std::endl; } else { std::cerr << "Error deleting the job. " << outcome.GetError().GetMessage() << std::endl; result = false; } } // 14. Delete a database. if (!database.empty()) { Aws::Glue::Model::DeleteDatabaseRequest request; request.SetName(database); Aws::Glue::Model::DeleteDatabaseOutcome outcome = client.DeleteDatabase( request); if (outcome.IsSuccess()) { std::cout << "Successfully deleted the database." << std::endl; } else { std::cerr << "Error deleting database. " << outcome.GetError().GetMessage() << std::endl; result = false; } } // 15. Delete a crawler. if (!crawler.empty()) { Aws::Glue::Model::DeleteCrawlerRequest request; request.SetName(crawler); Aws::Glue::Model::DeleteCrawlerOutcome outcome = client.DeleteCrawler(request); if (outcome.IsSuccess()) { std::cout << "Successfully deleted the crawler." << std::endl; } else { std::cerr << "Error deleting the crawler. " << outcome.GetError().GetMessage() << std::endl; result = false; } } // 16. Delete the job script and run data from the S3 bucket. result &= AwsDoc::Glue::deleteAllObjectsInS3Bucket(bucketName, clientConfig); return result; } //! Routine which uploads a file to an S3 bucket. /*! \\sa uploadFile() \param bucketName: An S3 bucket created in the setup. \param filePath: The path of the file to upload. \param fileName The name for the uploaded file. \param clientConfig: AWS client configuration. \return bool: Successful completion. */ bool AwsDoc::Glue::uploadFile(const Aws::String &bucketName, const Aws::String &filePath, const Aws::String &fileName, const Aws::Client::ClientConfiguration &clientConfig) { Aws::S3::S3Client s3_client(clientConfig); Aws::S3::Model::PutObjectRequest request; request.SetBucket(bucketName); request.SetKey(fileName); std::shared_ptr<Aws::IOStream> inputData = Aws::MakeShared<Aws::FStream>("SampleAllocationTag", filePath.c_str(), std::ios_base::in | std::ios_base::binary); if (!*inputData) { std::cerr << "Error unable to read file " << filePath << std::endl; return false; } request.SetBody(inputData); Aws::S3::Model::PutObjectOutcome outcome = s3_client.PutObject(request); if (!outcome.IsSuccess()) { std::cerr << "Error: PutObject: " << outcome.GetError().GetMessage() << std::endl; } else { std::cout << "Added object '" << filePath << "' to bucket '" << bucketName << "'." << std::endl; } return outcome.IsSuccess(); } //! Routine which deletes all objects in an S3 bucket. /*! \\sa deleteAllObjectsInS3Bucket() \param bucketName: The S3 bucket name. \param clientConfig: AWS client configuration. \return bool: Successful completion. */ bool AwsDoc::Glue::deleteAllObjectsInS3Bucket(const Aws::String &bucketName, const Aws::Client::ClientConfiguration &clientConfig) { Aws::S3::S3Client client(clientConfig); Aws::S3::Model::ListObjectsV2Request listObjectsRequest; listObjectsRequest.SetBucket(bucketName); Aws::String continuationToken; // Used for pagination. bool result = true; do { if (!continuationToken.empty()) { listObjectsRequest.SetContinuationToken(continuationToken); } Aws::S3::Model::ListObjectsV2Outcome listObjectsOutcome = client.ListObjectsV2( listObjectsRequest); if (listObjectsOutcome.IsSuccess()) { const std::vector<Aws::S3::Model::Object> &objects = listObjectsOutcome.GetResult().GetContents(); if (!objects.empty()) { Aws::S3::Model::DeleteObjectsRequest deleteObjectsRequest; deleteObjectsRequest.SetBucket(bucketName); std::vector<Aws::S3::Model::ObjectIdentifier> objectIdentifiers; for (const Aws::S3::Model::Object &object: objects) { objectIdentifiers.push_back( Aws::S3::Model::ObjectIdentifier().WithKey( object.GetKey())); } Aws::S3::Model::Delete objectsDelete; objectsDelete.SetObjects(objectIdentifiers); objectsDelete.SetQuiet(true); deleteObjectsRequest.SetDelete(objectsDelete); Aws::S3::Model::DeleteObjectsOutcome deleteObjectsOutcome = client.DeleteObjects(deleteObjectsRequest); if (!deleteObjectsOutcome.IsSuccess()) { std::cerr << "Error deleting objects. " << deleteObjectsOutcome.GetError().GetMessage() << std::endl; result = false; break; } else { std::cout << "Successfully deleted the objects." << std::endl; } } else { std::cout << "No objects to delete in '" << bucketName << "'." << std::endl; } continuationToken = listObjectsOutcome.GetResult().GetNextContinuationToken(); } else { std::cerr << "Error listing objects. " << listObjectsOutcome.GetError().GetMessage() << std::endl; result = false; break; } } while (!continuationToken.empty()); return result; } //! Routine which retrieves an object from an S3 bucket. /*! \\sa getObjectFromBucket() \param bucketName: The S3 bucket name. \param objectKey: The object's name. \param objectStream: A stream to receive the retrieved data. \param clientConfig: AWS client configuration. \return bool: Successful completion. */ bool AwsDoc::Glue::getObjectFromBucket(const Aws::String &bucketName, const Aws::String &objectKey, std::ostream &objectStream, const Aws::Client::ClientConfiguration &clientConfig) { Aws::S3::S3Client client(clientConfig); Aws::S3::Model::GetObjectRequest request; request.SetBucket(bucketName); request.SetKey(objectKey); Aws::S3::Model::GetObjectOutcome outcome = client.GetObject(request); if (outcome.IsSuccess()) { std::cout << "Successfully retrieved '" << objectKey << "'." << std::endl; auto &body = outcome.GetResult().GetBody(); objectStream << body.rdbuf(); } else { std::cerr << "Error retrieving object. " << outcome.GetError().GetMessage() << std::endl; } return outcome.IsSuccess(); }

Operazioni

Il seguente esempio di codice mostra come usareCreateCrawler.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region in which the bucket was created (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient client(clientConfig); Aws::Glue::Model::S3Target s3Target; s3Target.SetPath("s3://crawler-public-us-east-1/flight/2016/csv"); Aws::Glue::Model::CrawlerTargets crawlerTargets; crawlerTargets.AddS3Targets(s3Target); Aws::Glue::Model::CreateCrawlerRequest request; request.SetTargets(crawlerTargets); request.SetName(CRAWLER_NAME); request.SetDatabaseName(CRAWLER_DATABASE_NAME); request.SetTablePrefix(CRAWLER_DATABASE_PREFIX); request.SetRole(roleArn); Aws::Glue::Model::CreateCrawlerOutcome outcome = client.CreateCrawler(request); if (outcome.IsSuccess()) { std::cout << "Successfully created the crawler." << std::endl; } else { std::cerr << "Error creating a crawler. " << outcome.GetError().GetMessage() << std::endl; deleteAssets("", CRAWLER_DATABASE_NAME, "", bucketName, clientConfig); return false; }
  • Per i dettagli sull'API, CreateCrawlerconsulta AWS SDK for C++ API Reference.

Il seguente esempio di codice mostra come utilizzareCreateJob.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region in which the bucket was created (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient client(clientConfig); Aws::Glue::Model::CreateJobRequest request; request.SetName(JOB_NAME); request.SetRole(roleArn); request.SetGlueVersion(GLUE_VERSION); Aws::Glue::Model::JobCommand command; command.SetName(JOB_COMMAND_NAME); command.SetPythonVersion(JOB_PYTHON_VERSION); command.SetScriptLocation( Aws::String("s3://") + bucketName + "/" + PYTHON_SCRIPT); request.SetCommand(command); Aws::Glue::Model::CreateJobOutcome outcome = client.CreateJob(request); if (outcome.IsSuccess()) { std::cout << "Successfully created the job." << std::endl; } else { std::cerr << "Error creating the job. " << outcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, "", bucketName, clientConfig); return false; }
  • Per i dettagli sull'API, CreateJobconsulta AWS SDK for C++ API Reference.

Il seguente esempio di codice mostra come utilizzareDeleteCrawler.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region in which the bucket was created (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient client(clientConfig); Aws::Glue::Model::DeleteCrawlerRequest request; request.SetName(crawler); Aws::Glue::Model::DeleteCrawlerOutcome outcome = client.DeleteCrawler(request); if (outcome.IsSuccess()) { std::cout << "Successfully deleted the crawler." << std::endl; } else { std::cerr << "Error deleting the crawler. " << outcome.GetError().GetMessage() << std::endl; result = false; }
  • Per i dettagli sull'API, DeleteCrawlerconsulta AWS SDK for C++ API Reference.

Il seguente esempio di codice mostra come utilizzareDeleteDatabase.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region in which the bucket was created (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient client(clientConfig); Aws::Glue::Model::DeleteDatabaseRequest request; request.SetName(database); Aws::Glue::Model::DeleteDatabaseOutcome outcome = client.DeleteDatabase( request); if (outcome.IsSuccess()) { std::cout << "Successfully deleted the database." << std::endl; } else { std::cerr << "Error deleting database. " << outcome.GetError().GetMessage() << std::endl; result = false; }
  • Per i dettagli sull'API, DeleteDatabaseconsulta AWS SDK for C++ API Reference.

Il seguente esempio di codice mostra come utilizzareDeleteJob.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region in which the bucket was created (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient client(clientConfig); Aws::Glue::Model::DeleteJobRequest request; request.SetJobName(job); Aws::Glue::Model::DeleteJobOutcome outcome = client.DeleteJob(request); if (outcome.IsSuccess()) { std::cout << "Successfully deleted the job." << std::endl; } else { std::cerr << "Error deleting the job. " << outcome.GetError().GetMessage() << std::endl; result = false; }
  • Per i dettagli sull'API, DeleteJobconsulta AWS SDK for C++ API Reference.

Il seguente esempio di codice mostra come utilizzareGetCrawler.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region in which the bucket was created (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient client(clientConfig); Aws::Glue::Model::GetCrawlerRequest request; request.SetName(CRAWLER_NAME); Aws::Glue::Model::GetCrawlerOutcome outcome = client.GetCrawler(request); if (outcome.IsSuccess()) { Aws::Glue::Model::CrawlerState crawlerState = outcome.GetResult().GetCrawler().GetState(); std::cout << "Retrieved crawler with state " << Aws::Glue::Model::CrawlerStateMapper::GetNameForCrawlerState( crawlerState) << "." << std::endl; } else { std::cerr << "Error retrieving a crawler. " << outcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, "", bucketName, clientConfig); return false; }
  • Per i dettagli sull'API, GetCrawlerconsulta AWS SDK for C++ API Reference.

Il seguente esempio di codice mostra come utilizzareGetDatabase.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region in which the bucket was created (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient client(clientConfig); Aws::Glue::Model::GetDatabaseRequest request; request.SetName(CRAWLER_DATABASE_NAME); Aws::Glue::Model::GetDatabaseOutcome outcome = client.GetDatabase(request); if (outcome.IsSuccess()) { const Aws::Glue::Model::Database &database = outcome.GetResult().GetDatabase(); std::cout << "Successfully retrieve the database\n" << database.Jsonize().View().WriteReadable() << "'." << std::endl; } else { std::cerr << "Error getting the database. " << outcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, "", bucketName, clientConfig); return false; }
  • Per i dettagli sull'API, GetDatabaseconsulta AWS SDK for C++ API Reference.

Il seguente esempio di codice mostra come utilizzareGetJobRun.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region in which the bucket was created (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient client(clientConfig); Aws::Glue::Model::GetJobRunRequest jobRunRequest; jobRunRequest.SetJobName(jobName); jobRunRequest.SetRunId(jobRunID); Aws::Glue::Model::GetJobRunOutcome jobRunOutcome = client.GetJobRun( jobRunRequest); if (jobRunOutcome.IsSuccess()) { std::cout << "Displaying the job run JSON description." << std::endl; std::cout << jobRunOutcome.GetResult().GetJobRun().Jsonize().View().WriteReadable() << std::endl; } else { std::cerr << "Error get a job run. " << jobRunOutcome.GetError().GetMessage() << std::endl; }
  • Per i dettagli sull'API, GetJobRunconsulta AWS SDK for C++ API Reference.

Il seguente esempio di codice mostra come utilizzareGetJobRuns.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region in which the bucket was created (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient client(clientConfig); Aws::Glue::Model::GetJobRunsRequest getJobRunsRequest; getJobRunsRequest.SetJobName(jobName); Aws::String nextToken; // Used for pagination. std::vector<Aws::Glue::Model::JobRun> allJobRuns; do { if (!nextToken.empty()) { getJobRunsRequest.SetNextToken(nextToken); } Aws::Glue::Model::GetJobRunsOutcome jobRunsOutcome = client.GetJobRuns( getJobRunsRequest); if (jobRunsOutcome.IsSuccess()) { const std::vector<Aws::Glue::Model::JobRun> &jobRuns = jobRunsOutcome.GetResult().GetJobRuns(); allJobRuns.insert(allJobRuns.end(), jobRuns.begin(), jobRuns.end()); nextToken = jobRunsOutcome.GetResult().GetNextToken(); } else { std::cerr << "Error getting job runs. " << jobRunsOutcome.GetError().GetMessage() << std::endl; break; } } while (!nextToken.empty());
  • Per i dettagli sull'API, GetJobRunsconsulta AWS SDK for C++ API Reference.

Il seguente esempio di codice mostra come utilizzareGetTables.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region in which the bucket was created (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient client(clientConfig); Aws::Glue::Model::GetTablesRequest request; request.SetDatabaseName(CRAWLER_DATABASE_NAME); std::vector<Aws::Glue::Model::Table> all_tables; Aws::String nextToken; // Used for pagination. do { Aws::Glue::Model::GetTablesOutcome outcome = client.GetTables(request); if (outcome.IsSuccess()) { const std::vector<Aws::Glue::Model::Table> &tables = outcome.GetResult().GetTableList(); all_tables.insert(all_tables.end(), tables.begin(), tables.end()); nextToken = outcome.GetResult().GetNextToken(); } else { std::cerr << "Error getting the tables. " << outcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, "", bucketName, clientConfig); return false; } } while (!nextToken.empty()); std::cout << "The database contains " << all_tables.size() << (all_tables.size() == 1 ? " table." : "tables.") << std::endl; std::cout << "Here is a list of the tables in the database."; for (size_t index = 0; index < all_tables.size(); ++index) { std::cout << " " << index + 1 << ": " << all_tables[index].GetName() << std::endl; } if (!all_tables.empty()) { int tableIndex = askQuestionForIntRange( "Enter an index to display the database detail ", 1, static_cast<int>(all_tables.size())); std::cout << all_tables[tableIndex - 1].Jsonize().View().WriteReadable() << std::endl; tableName = all_tables[tableIndex - 1].GetName(); }
  • Per i dettagli sull'API, GetTablesconsulta AWS SDK for C++ API Reference.

Il seguente esempio di codice mostra come utilizzareListJobs.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region in which the bucket was created (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient client(clientConfig); Aws::Glue::Model::ListJobsRequest listJobsRequest; Aws::String nextToken; std::vector<Aws::String> allJobNames; do { if (!nextToken.empty()) { listJobsRequest.SetNextToken(nextToken); } Aws::Glue::Model::ListJobsOutcome listRunsOutcome = client.ListJobs( listJobsRequest); if (listRunsOutcome.IsSuccess()) { const std::vector<Aws::String> &jobNames = listRunsOutcome.GetResult().GetJobNames(); allJobNames.insert(allJobNames.end(), jobNames.begin(), jobNames.end()); nextToken = listRunsOutcome.GetResult().GetNextToken(); } else { std::cerr << "Error listing jobs. " << listRunsOutcome.GetError().GetMessage() << std::endl; } } while (!nextToken.empty());
  • Per i dettagli sull'API, ListJobsconsulta AWS SDK for C++ API Reference.

Il seguente esempio di codice mostra come utilizzareStartCrawler.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region in which the bucket was created (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient client(clientConfig); Aws::Glue::Model::StartCrawlerRequest request; request.SetName(CRAWLER_NAME); Aws::Glue::Model::StartCrawlerOutcome outcome = client.StartCrawler(request); if (outcome.IsSuccess() || (Aws::Glue::GlueErrors::CRAWLER_RUNNING == outcome.GetError().GetErrorType())) { if (!outcome.IsSuccess()) { std::cout << "Crawler was already started." << std::endl; } else { std::cout << "Successfully started crawler." << std::endl; } std::cout << "This may take a while to run." << std::endl; Aws::Glue::Model::CrawlerState crawlerState = Aws::Glue::Model::CrawlerState::NOT_SET; int iterations = 0; while (Aws::Glue::Model::CrawlerState::READY != crawlerState) { std::this_thread::sleep_for(std::chrono::seconds(1)); ++iterations; if ((iterations % 10) == 0) { // Log status every 10 seconds. std::cout << "Crawler status " << Aws::Glue::Model::CrawlerStateMapper::GetNameForCrawlerState( crawlerState) << ". After " << iterations << " seconds elapsed." << std::endl; } Aws::Glue::Model::GetCrawlerRequest getCrawlerRequest; getCrawlerRequest.SetName(CRAWLER_NAME); Aws::Glue::Model::GetCrawlerOutcome getCrawlerOutcome = client.GetCrawler( getCrawlerRequest); if (getCrawlerOutcome.IsSuccess()) { crawlerState = getCrawlerOutcome.GetResult().GetCrawler().GetState(); } else { std::cerr << "Error getting crawler. " << getCrawlerOutcome.GetError().GetMessage() << std::endl; break; } } if (Aws::Glue::Model::CrawlerState::READY == crawlerState) { std::cout << "Crawler finished running after " << iterations << " seconds." << std::endl; } } else { std::cerr << "Error starting a crawler. " << outcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, "", bucketName, clientConfig); return false; }
  • Per i dettagli sull'API, StartCrawlerconsulta AWS SDK for C++ API Reference.

Il seguente esempio di codice mostra come utilizzareStartJobRun.

SDK per C++
Nota

C'è altro da fare GitHub. Trova l'esempio completo e scopri di più sulla configurazione e l'esecuzione nel Repository di esempi di codice AWS.

Aws::Client::ClientConfiguration clientConfig; // Optional: Set to the AWS Region in which the bucket was created (overrides config file). // clientConfig.region = "us-east-1"; Aws::Glue::GlueClient client(clientConfig); Aws::Glue::Model::StartJobRunRequest request; request.SetJobName(JOB_NAME); Aws::Map<Aws::String, Aws::String> arguments; arguments["--input_database"] = CRAWLER_DATABASE_NAME; arguments["--input_table"] = tableName; arguments["--output_bucket_url"] = Aws::String("s3://") + bucketName + "/"; request.SetArguments(arguments); Aws::Glue::Model::StartJobRunOutcome outcome = client.StartJobRun(request); if (outcome.IsSuccess()) { std::cout << "Successfully started the job." << std::endl; Aws::String jobRunId = outcome.GetResult().GetJobRunId(); int iterator = 0; bool done = false; while (!done) { ++iterator; std::this_thread::sleep_for(std::chrono::seconds(1)); Aws::Glue::Model::GetJobRunRequest jobRunRequest; jobRunRequest.SetJobName(JOB_NAME); jobRunRequest.SetRunId(jobRunId); Aws::Glue::Model::GetJobRunOutcome jobRunOutcome = client.GetJobRun( jobRunRequest); if (jobRunOutcome.IsSuccess()) { const Aws::Glue::Model::JobRun &jobRun = jobRunOutcome.GetResult().GetJobRun(); Aws::Glue::Model::JobRunState jobRunState = jobRun.GetJobRunState(); if ((jobRunState == Aws::Glue::Model::JobRunState::STOPPED) || (jobRunState == Aws::Glue::Model::JobRunState::FAILED) || (jobRunState == Aws::Glue::Model::JobRunState::TIMEOUT)) { std::cerr << "Error running job. " << jobRun.GetErrorMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, JOB_NAME, bucketName, clientConfig); return false; } else if (jobRunState == Aws::Glue::Model::JobRunState::SUCCEEDED) { std::cout << "Job run succeeded after " << iterator << " seconds elapsed." << std::endl; done = true; } else if ((iterator % 10) == 0) { // Log status every 10 seconds. std::cout << "Job run status " << Aws::Glue::Model::JobRunStateMapper::GetNameForJobRunState( jobRunState) << ". " << iterator << " seconds elapsed." << std::endl; } } else { std::cerr << "Error retrieving job run state. " << jobRunOutcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, JOB_NAME, bucketName, clientConfig); return false; } } } else { std::cerr << "Error starting a job. " << outcome.GetError().GetMessage() << std::endl; deleteAssets(CRAWLER_NAME, CRAWLER_DATABASE_NAME, JOB_NAME, bucketName, clientConfig); return false; }
  • Per i dettagli sull'API, StartJobRunconsulta AWS SDK for C++ API Reference.