You are viewing documentation for version 2 of the AWS SDK for Ruby. Version 3 documentation can be found here.
Class: Aws::Glue::Client
- Inherits: Seahorse::Client::Base
  - Object
  - Seahorse::Client::Base
  - Aws::Glue::Client
- Defined in: (unknown)
Overview
An API client for AWS Glue. To construct a client, you need to configure a :region and :credentials.
glue = Aws::Glue::Client.new(
  region: region_name,
  credentials: credentials,
  # ...
)
See #initialize for a full list of supported configuration options.
Region
You can configure a default region in the following locations:
- ENV['AWS_REGION']
- Aws.config[:region]
Go here for a list of supported regions.
Credentials
Default credentials are loaded automatically from the following locations:
- ENV['AWS_ACCESS_KEY_ID'] and ENV['AWS_SECRET_ACCESS_KEY']
- Aws.config[:credentials]
- The shared credentials ini file at ~/.aws/credentials (more information)
- From an instance profile when running on EC2
You can also construct a credentials object from one of the following classes:
Alternatively, you can configure credentials with :access_key_id and :secret_access_key:
# load credentials from disk
require 'yaml'
creds = YAML.load(File.read('/path/to/secrets'))
Aws::Glue::Client.new(
  access_key_id: creds['access_key_id'],
  secret_access_key: creds['secret_access_key']
)
Always load your credentials from outside your application. Avoid configuring credentials statically and never commit them to source control.
Instance Attribute Summary
Attributes inherited from Seahorse::Client::Base
Constructor
-
#initialize(options = {}) ⇒ Aws::Glue::Client
constructor
Constructs an API client.
API Operations
-
#batch_create_partition(options = {}) ⇒ Types::BatchCreatePartitionResponse
Creates one or more partitions in a batch operation.
-
#batch_delete_connection(options = {}) ⇒ Types::BatchDeleteConnectionResponse
Deletes a list of connection definitions from the Data Catalog.
-
#batch_delete_partition(options = {}) ⇒ Types::BatchDeletePartitionResponse
Deletes one or more partitions in a batch operation.
-
#batch_delete_table(options = {}) ⇒ Types::BatchDeleteTableResponse
Deletes multiple tables at once.
After completing this operation, you no longer have access to the table versions and partitions that belong to the deleted table.
-
#batch_delete_table_version(options = {}) ⇒ Types::BatchDeleteTableVersionResponse
Deletes a specified batch of versions of a table.
-
#batch_get_crawlers(options = {}) ⇒ Types::BatchGetCrawlersResponse
Returns a list of resource metadata for a given list of crawler names.
-
#batch_get_dev_endpoints(options = {}) ⇒ Types::BatchGetDevEndpointsResponse
Returns a list of resource metadata for a given list of development endpoint names.
-
#batch_get_jobs(options = {}) ⇒ Types::BatchGetJobsResponse
Returns a list of resource metadata for a given list of job names.
-
#batch_get_partition(options = {}) ⇒ Types::BatchGetPartitionResponse
Retrieves partitions in a batch request.
-
#batch_get_triggers(options = {}) ⇒ Types::BatchGetTriggersResponse
Returns a list of resource metadata for a given list of trigger names.
-
#batch_get_workflows(options = {}) ⇒ Types::BatchGetWorkflowsResponse
Returns a list of resource metadata for a given list of workflow names.
-
#batch_stop_job_run(options = {}) ⇒ Types::BatchStopJobRunResponse
Stops one or more job runs for a specified job definition.
-
#batch_update_partition(options = {}) ⇒ Types::BatchUpdatePartitionResponse
Updates one or more partitions in a batch operation.
-
#cancel_ml_task_run(options = {}) ⇒ Types::CancelMLTaskRunResponse
Cancels (stops) a task run.
-
#check_schema_version_validity(options = {}) ⇒ Types::CheckSchemaVersionValidityResponse
Validates the supplied schema.
-
#create_classifier(options = {}) ⇒ Struct
Creates a classifier in the user's account.
-
#create_connection(options = {}) ⇒ Struct
Creates a connection definition in the Data Catalog.
-
#create_crawler(options = {}) ⇒ Struct
Creates a new crawler with specified targets, role, configuration, and optional schedule.
-
#create_database(options = {}) ⇒ Struct
Creates a new database in a Data Catalog.
-
#create_dev_endpoint(options = {}) ⇒ Types::CreateDevEndpointResponse
Creates a new development endpoint.
-
#create_job(options = {}) ⇒ Types::CreateJobResponse
Creates a new job definition.
-
#create_ml_transform(options = {}) ⇒ Types::CreateMLTransformResponse
Creates an AWS Glue machine learning transform.
-
#create_partition(options = {}) ⇒ Struct
Creates a new partition.
-
#create_registry(options = {}) ⇒ Types::CreateRegistryResponse
Creates a new registry which may be used to hold a collection of schemas.
-
#create_schema(options = {}) ⇒ Types::CreateSchemaResponse
Creates a new schema set and registers the schema definition.
-
#create_script(options = {}) ⇒ Types::CreateScriptResponse
Transforms a directed acyclic graph (DAG) into code.
-
#create_security_configuration(options = {}) ⇒ Types::CreateSecurityConfigurationResponse
Creates a new security configuration.
-
#create_table(options = {}) ⇒ Struct
Creates a new table definition in the Data Catalog.
-
#create_trigger(options = {}) ⇒ Types::CreateTriggerResponse
Creates a new trigger.
-
#create_user_defined_function(options = {}) ⇒ Struct
Creates a new function definition in the Data Catalog.
-
#create_workflow(options = {}) ⇒ Types::CreateWorkflowResponse
Creates a new workflow.
-
#delete_classifier(options = {}) ⇒ Struct
Removes a classifier from the Data Catalog.
-
#delete_column_statistics_for_partition(options = {}) ⇒ Struct
Deletes the partition column statistics of a column.
The Identity and Access Management (IAM) permission required for this operation is DeletePartition.
-
#delete_column_statistics_for_table(options = {}) ⇒ Struct
Deletes table statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is DeleteTable.
-
#delete_connection(options = {}) ⇒ Struct
Deletes a connection from the Data Catalog.
-
#delete_crawler(options = {}) ⇒ Struct
Removes a specified crawler from the AWS Glue Data Catalog, unless the crawler state is RUNNING.
-
#delete_database(options = {}) ⇒ Struct
Removes a specified database from a Data Catalog.
After completing this operation, you no longer have access to the tables (and all table versions and partitions that might belong to the tables) and the user-defined functions in the deleted database.
-
#delete_dev_endpoint(options = {}) ⇒ Struct
Deletes a specified development endpoint.
-
#delete_job(options = {}) ⇒ Types::DeleteJobResponse
Deletes a specified job definition.
-
#delete_ml_transform(options = {}) ⇒ Types::DeleteMLTransformResponse
Deletes an AWS Glue machine learning transform.
-
#delete_partition(options = {}) ⇒ Struct
Deletes a specified partition.
-
#delete_registry(options = {}) ⇒ Types::DeleteRegistryResponse
Deletes the entire registry, including schema and all of its versions.
-
#delete_resource_policy(options = {}) ⇒ Struct
Deletes a specified policy.
-
#delete_schema(options = {}) ⇒ Types::DeleteSchemaResponse
Deletes the entire schema set, including the schema set and all of its versions.
-
#delete_schema_versions(options = {}) ⇒ Types::DeleteSchemaVersionsResponse
Remove versions from the specified schema.
-
#delete_security_configuration(options = {}) ⇒ Struct
Deletes a specified security configuration.
-
#delete_table(options = {}) ⇒ Struct
Removes a table definition from the Data Catalog.
After completing this operation, you no longer have access to the table versions and partitions that belong to the deleted table.
-
#delete_table_version(options = {}) ⇒ Struct
Deletes a specified version of a table.
-
#delete_trigger(options = {}) ⇒ Types::DeleteTriggerResponse
Deletes a specified trigger.
-
#delete_user_defined_function(options = {}) ⇒ Struct
Deletes an existing function definition from the Data Catalog.
-
#delete_workflow(options = {}) ⇒ Types::DeleteWorkflowResponse
Deletes a workflow.
-
#get_catalog_import_status(options = {}) ⇒ Types::GetCatalogImportStatusResponse
Retrieves the status of a migration operation.
-
#get_classifier(options = {}) ⇒ Types::GetClassifierResponse
Retrieve a classifier by name.
-
#get_classifiers(options = {}) ⇒ Types::GetClassifiersResponse
Lists all classifier objects in the Data Catalog.
-
#get_column_statistics_for_partition(options = {}) ⇒ Types::GetColumnStatisticsForPartitionResponse
Retrieves partition statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is GetPartition.
-
#get_column_statistics_for_table(options = {}) ⇒ Types::GetColumnStatisticsForTableResponse
Retrieves table statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is GetTable.
-
#get_connection(options = {}) ⇒ Types::GetConnectionResponse
Retrieves a connection definition from the Data Catalog.
-
#get_connections(options = {}) ⇒ Types::GetConnectionsResponse
Retrieves a list of connection definitions from the Data Catalog.
-
#get_crawler(options = {}) ⇒ Types::GetCrawlerResponse
Retrieves metadata for a specified crawler.
-
#get_crawler_metrics(options = {}) ⇒ Types::GetCrawlerMetricsResponse
Retrieves metrics about specified crawlers.
-
#get_crawlers(options = {}) ⇒ Types::GetCrawlersResponse
Retrieves metadata for all crawlers defined in the customer account.
-
#get_data_catalog_encryption_settings(options = {}) ⇒ Types::GetDataCatalogEncryptionSettingsResponse
Retrieves the security configuration for a specified catalog.
-
#get_database(options = {}) ⇒ Types::GetDatabaseResponse
Retrieves the definition of a specified database.
-
#get_databases(options = {}) ⇒ Types::GetDatabasesResponse
Retrieves all databases defined in a given Data Catalog.
-
#get_dataflow_graph(options = {}) ⇒ Types::GetDataflowGraphResponse
Transforms a Python script into a directed acyclic graph (DAG).
-
#get_dev_endpoint(options = {}) ⇒ Types::GetDevEndpointResponse
Retrieves information about a specified development endpoint.
When you create a development endpoint in a virtual private cloud (VPC), AWS Glue returns only a private IP address, and the public IP address field is not populated.
-
#get_dev_endpoints(options = {}) ⇒ Types::GetDevEndpointsResponse
Retrieves all the development endpoints in this AWS account.
When you create a development endpoint in a virtual private cloud (VPC), AWS Glue returns only a private IP address and the public IP address field is not populated.
-
#get_job(options = {}) ⇒ Types::GetJobResponse
Retrieves an existing job definition.
-
#get_job_bookmark(options = {}) ⇒ Types::GetJobBookmarkResponse
Returns information on a job bookmark entry.
-
#get_job_run(options = {}) ⇒ Types::GetJobRunResponse
Retrieves the metadata for a given job run.
-
#get_job_runs(options = {}) ⇒ Types::GetJobRunsResponse
Retrieves metadata for all runs of a given job definition.
-
#get_jobs(options = {}) ⇒ Types::GetJobsResponse
Retrieves all current job definitions.
-
#get_mapping(options = {}) ⇒ Types::GetMappingResponse
Creates mappings.
-
#get_ml_task_run(options = {}) ⇒ Types::GetMLTaskRunResponse
Gets details for a specific task run on a machine learning transform.
-
#get_ml_task_runs(options = {}) ⇒ Types::GetMLTaskRunsResponse
Gets a list of runs for a machine learning transform.
-
#get_ml_transform(options = {}) ⇒ Types::GetMLTransformResponse
Gets an AWS Glue machine learning transform artifact and all its corresponding metadata.
-
#get_ml_transforms(options = {}) ⇒ Types::GetMLTransformsResponse
Gets a sortable, filterable list of existing AWS Glue machine learning transforms.
-
#get_partition(options = {}) ⇒ Types::GetPartitionResponse
Retrieves information about a specified partition.
-
#get_partition_indexes(options = {}) ⇒ Types::GetPartitionIndexesResponse
Retrieves the partition indexes associated with a table.
-
#get_partitions(options = {}) ⇒ Types::GetPartitionsResponse
Retrieves information about the partitions in a table.
-
#get_plan(options = {}) ⇒ Types::GetPlanResponse
Gets code to perform a specified mapping.
-
#get_registry(options = {}) ⇒ Types::GetRegistryResponse
Describes the specified registry in detail.
-
#get_resource_policies(options = {}) ⇒ Types::GetResourcePoliciesResponse
Retrieves the security configurations for the resource policies set on individual resources, and also the account-level policy.
This operation also returns the Data Catalog resource policy.
-
#get_resource_policy(options = {}) ⇒ Types::GetResourcePolicyResponse
Retrieves a specified resource policy.
-
#get_schema(options = {}) ⇒ Types::GetSchemaResponse
Describes the specified schema in detail.
-
#get_schema_by_definition(options = {}) ⇒ Types::GetSchemaByDefinitionResponse
Retrieves a schema by the SchemaDefinition.
-
#get_schema_version(options = {}) ⇒ Types::GetSchemaVersionResponse
Get the specified schema by its unique ID assigned when a version of the schema is created or registered.
-
#get_schema_versions_diff(options = {}) ⇒ Types::GetSchemaVersionsDiffResponse
Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry.
This API allows you to compare two schema versions between two schema definitions under the same schema.
-
#get_security_configuration(options = {}) ⇒ Types::GetSecurityConfigurationResponse
Retrieves a specified security configuration.
-
#get_security_configurations(options = {}) ⇒ Types::GetSecurityConfigurationsResponse
Retrieves a list of all security configurations.
-
#get_table(options = {}) ⇒ Types::GetTableResponse
Retrieves the Table definition in a Data Catalog for a specified table.
-
#get_table_version(options = {}) ⇒ Types::GetTableVersionResponse
Retrieves a specified version of a table.
-
#get_table_versions(options = {}) ⇒ Types::GetTableVersionsResponse
Retrieves a list of strings that identify available versions of a specified table.
-
#get_tables(options = {}) ⇒ Types::GetTablesResponse
Retrieves the definitions of some or all of the tables in a given Database.
-
#get_tags(options = {}) ⇒ Types::GetTagsResponse
Retrieves a list of tags associated with a resource.
-
#get_trigger(options = {}) ⇒ Types::GetTriggerResponse
Retrieves the definition of a trigger.
-
#get_triggers(options = {}) ⇒ Types::GetTriggersResponse
Gets all the triggers associated with a job.
-
#get_user_defined_function(options = {}) ⇒ Types::GetUserDefinedFunctionResponse
Retrieves a specified function definition from the Data Catalog.
-
#get_user_defined_functions(options = {}) ⇒ Types::GetUserDefinedFunctionsResponse
Retrieves multiple function definitions from the Data Catalog.
-
#get_workflow(options = {}) ⇒ Types::GetWorkflowResponse
Retrieves resource metadata for a workflow.
-
#get_workflow_run(options = {}) ⇒ Types::GetWorkflowRunResponse
Retrieves the metadata for a given workflow run.
-
#get_workflow_run_properties(options = {}) ⇒ Types::GetWorkflowRunPropertiesResponse
Retrieves the workflow run properties which were set during the run.
-
#get_workflow_runs(options = {}) ⇒ Types::GetWorkflowRunsResponse
Retrieves metadata for all runs of a given workflow.
-
#import_catalog_to_glue(options = {}) ⇒ Struct
Imports an existing Amazon Athena Data Catalog to AWS Glue.
-
#list_crawlers(options = {}) ⇒ Types::ListCrawlersResponse
Retrieves the names of all crawler resources in this AWS account, or the resources with the specified tag.
-
#list_dev_endpoints(options = {}) ⇒ Types::ListDevEndpointsResponse
Retrieves the names of all DevEndpoint resources in this AWS account, or the resources with the specified tag.
-
#list_jobs(options = {}) ⇒ Types::ListJobsResponse
Retrieves the names of all job resources in this AWS account, or the resources with the specified tag.
-
#list_ml_transforms(options = {}) ⇒ Types::ListMLTransformsResponse
Retrieves a sortable, filterable list of existing AWS Glue machine learning transforms in this AWS account, or the resources with the specified tag.
-
#list_registries(options = {}) ⇒ Types::ListRegistriesResponse
Returns a list of registries that you have created, with minimal registry information.
-
#list_schema_versions(options = {}) ⇒ Types::ListSchemaVersionsResponse
Returns a list of schema versions that you have created, with minimal information.
-
#list_schemas(options = {}) ⇒ Types::ListSchemasResponse
Returns a list of schemas with minimal details.
-
#list_triggers(options = {}) ⇒ Types::ListTriggersResponse
Retrieves the names of all trigger resources in this AWS account, or the resources with the specified tag.
-
#list_workflows(options = {}) ⇒ Types::ListWorkflowsResponse
Lists names of workflows created in the account.
-
#put_data_catalog_encryption_settings(options = {}) ⇒ Struct
Sets the security configuration for a specified catalog.
-
#put_resource_policy(options = {}) ⇒ Types::PutResourcePolicyResponse
Sets the Data Catalog resource policy for access control.
-
#put_schema_version_metadata(options = {}) ⇒ Types::PutSchemaVersionMetadataResponse
Puts the metadata key value pair for a specified schema version ID.
-
#put_workflow_run_properties(options = {}) ⇒ Struct
Puts the specified workflow run properties for the given workflow run.
-
#query_schema_version_metadata(options = {}) ⇒ Types::QuerySchemaVersionMetadataResponse
Queries for the schema version metadata information.
-
#register_schema_version(options = {}) ⇒ Types::RegisterSchemaVersionResponse
Adds a new version to the existing schema.
-
#remove_schema_version_metadata(options = {}) ⇒ Types::RemoveSchemaVersionMetadataResponse
Removes a key value pair from the schema version metadata for the specified schema version ID.
-
#reset_job_bookmark(options = {}) ⇒ Types::ResetJobBookmarkResponse
Resets a bookmark entry.
-
#resume_workflow_run(options = {}) ⇒ Types::ResumeWorkflowRunResponse
Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run.
-
#search_tables(options = {}) ⇒ Types::SearchTablesResponse
Searches a set of tables based on properties in the table metadata as well as on the parent database.
-
#start_crawler(options = {}) ⇒ Struct
Starts a crawl using the specified crawler, regardless of what is scheduled.
-
#start_crawler_schedule(options = {}) ⇒ Struct
Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED.
-
#start_export_labels_task_run(options = {}) ⇒ Types::StartExportLabelsTaskRunResponse
Begins an asynchronous task to export all labeled data for a particular transform.
-
#start_import_labels_task_run(options = {}) ⇒ Types::StartImportLabelsTaskRunResponse
Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality.
-
#start_job_run(options = {}) ⇒ Types::StartJobRunResponse
Starts a job run using a job definition.
-
#start_ml_evaluation_task_run(options = {}) ⇒ Types::StartMLEvaluationTaskRunResponse
Starts a task to estimate the quality of the transform.
-
#start_ml_labeling_set_generation_task_run(options = {}) ⇒ Types::StartMLLabelingSetGenerationTaskRunResponse
Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels.
When the StartMLLabelingSetGenerationTaskRun finishes, AWS Glue will have generated a "labeling set" or a set of questions for humans to answer.
In the case of the FindMatches transform, these questions are of the form, “What is the correct way to group these rows together into groups composed entirely of matching records?”
After the labeling process is finished, you can upload your labels with a call to StartImportLabelsTaskRun.
-
#start_trigger(options = {}) ⇒ Types::StartTriggerResponse
Starts an existing trigger.
-
#start_workflow_run(options = {}) ⇒ Types::StartWorkflowRunResponse
Starts a new run of the specified workflow.
-
#stop_crawler(options = {}) ⇒ Struct
If the specified crawler is running, stops the crawl.
-
#stop_crawler_schedule(options = {}) ⇒ Struct
Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running.
-
#stop_trigger(options = {}) ⇒ Types::StopTriggerResponse
Stops a specified trigger.
-
#stop_workflow_run(options = {}) ⇒ Struct
Stops the execution of the specified workflow run.
-
#tag_resource(options = {}) ⇒ Struct
Adds tags to a resource.
-
#untag_resource(options = {}) ⇒ Struct
Removes tags from a resource.
-
#update_classifier(options = {}) ⇒ Struct
Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present).
-
#update_column_statistics_for_partition(options = {}) ⇒ Types::UpdateColumnStatisticsForPartitionResponse
Creates or updates partition statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is UpdatePartition.
-
#update_column_statistics_for_table(options = {}) ⇒ Types::UpdateColumnStatisticsForTableResponse
Creates or updates table statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is UpdateTable.
-
#update_connection(options = {}) ⇒ Struct
Updates a connection definition in the Data Catalog.
-
#update_crawler(options = {}) ⇒ Struct
Updates a crawler.
-
#update_crawler_schedule(options = {}) ⇒ Struct
Updates the schedule of a crawler using a cron expression.
-
#update_database(options = {}) ⇒ Struct
Updates an existing database definition in a Data Catalog.
-
#update_dev_endpoint(options = {}) ⇒ Struct
Updates a specified development endpoint.
-
#update_job(options = {}) ⇒ Types::UpdateJobResponse
Updates an existing job definition.
-
#update_ml_transform(options = {}) ⇒ Types::UpdateMLTransformResponse
Updates an existing machine learning transform.
-
#update_partition(options = {}) ⇒ Struct
Updates a partition.
-
#update_registry(options = {}) ⇒ Types::UpdateRegistryResponse
Updates an existing registry which is used to hold a collection of schemas.
-
#update_schema(options = {}) ⇒ Types::UpdateSchemaResponse
Updates the description, compatibility setting, or version checkpoint for a schema set.
For updating the compatibility setting, the call will not validate compatibility for the entire set of schema versions with the new compatibility setting.
-
#update_table(options = {}) ⇒ Struct
Updates a metadata table in the Data Catalog.
-
#update_trigger(options = {}) ⇒ Types::UpdateTriggerResponse
Updates a trigger definition.
-
#update_user_defined_function(options = {}) ⇒ Struct
Updates an existing function definition in the Data Catalog.
-
#update_workflow(options = {}) ⇒ Types::UpdateWorkflowResponse
Updates an existing workflow.
Instance Method Summary
-
#wait_until(waiter_name, params = {}) {|waiter| ... } ⇒ Boolean
Waiters poll an API operation until a resource enters a desired state.
-
#waiter_names ⇒ Array<Symbol>
Returns the list of supported waiters.
Methods inherited from Seahorse::Client::Base
add_plugin, api, #build_request, clear_plugins, define, new, #operation, #operation_names, plugins, remove_plugin, set_api, set_plugins
Methods included from Seahorse::Client::HandlerBuilder
#handle, #handle_request, #handle_response
Constructor Details
#initialize(options = {}) ⇒ Aws::Glue::Client
Constructs an API client.
Instance Method Details
#batch_create_partition(options = {}) ⇒ Types::BatchCreatePartitionResponse
Creates one or more partitions in a batch operation.
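A minimal sketch of calling this operation in chunks, assuming `glue` is an `Aws::Glue::Client` (or any object that responds to `batch_create_partition`). The helper name `create_partitions_in_batches` and the default chunk size of 100 are assumptions made for illustration, not part of the SDK; check the service limits for the actual per-request maximum.

```ruby
# Hypothetical helper (not part of the SDK): sends partition inputs in
# chunks and collects the per-partition errors reported by each response.
# The default batch_size of 100 is an assumption; verify it against the
# service limits before relying on it.
def create_partitions_in_batches(glue, database_name, table_name,
                                 partition_inputs, batch_size: 100)
  errors = []
  partition_inputs.each_slice(batch_size) do |batch|
    resp = glue.batch_create_partition(
      database_name: database_name,
      table_name: table_name,
      partition_input_list: batch
    )
    # resp.errors lists the partitions that could not be created.
    errors.concat(resp.errors || [])
  end
  errors
end

# Usage (requires the aws-sdk gem and valid credentials):
#   glue = Aws::Glue::Client.new(region: 'us-east-1')
#   failed = create_partitions_in_batches(glue, 'my_db', 'my_table', inputs)
```

Returning the collected errors rather than raising lets the caller decide whether a partial failure should abort the whole load.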
#batch_delete_connection(options = {}) ⇒ Types::BatchDeleteConnectionResponse
Deletes a list of connection definitions from the Data Catalog.
#batch_delete_partition(options = {}) ⇒ Types::BatchDeletePartitionResponse
Deletes one or more partitions in a batch operation.
#batch_delete_table(options = {}) ⇒ Types::BatchDeleteTableResponse
Deletes multiple tables at once.
After completing this operation, you no longer have access to the table versions and partitions that belong to the deleted table. AWS Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service. To ensure the immediate deletion of all related resources, before calling BatchDeleteTable, use DeleteTableVersion or BatchDeleteTableVersion, and DeletePartition or BatchDeletePartition, to delete any resources that belong to the table.
#batch_delete_table_version(options = {}) ⇒ Types::BatchDeleteTableVersionResponse
Deletes a specified batch of versions of a table.
#batch_get_crawlers(options = {}) ⇒ Types::BatchGetCrawlersResponse
Returns a list of resource metadata for a given list of crawler names. After calling the ListCrawlers operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
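The list-then-batch-get pattern described above can be sketched as follows, assuming `glue` is an `Aws::Glue::Client`. The helper name `crawler_states` and the default chunk size of 100 are assumptions for illustration. Only the first page of `list_crawlers` results is read here; a complete implementation would follow `next_token`.

```ruby
# Hypothetical helper (not part of the SDK): lists crawler names, then
# fetches their metadata in chunks and maps each name to its state.
# Only the first page of list_crawlers results is used in this sketch.
def crawler_states(glue, tags: nil, batch_size: 100)
  list_params = {}
  list_params[:tags] = tags if tags
  names = glue.list_crawlers(list_params).crawler_names

  states = {}
  names.each_slice(batch_size) do |batch|
    resp = glue.batch_get_crawlers(crawler_names: batch)
    resp.crawlers.each { |crawler| states[crawler.name] = crawler.state }
  end
  states
end

# Usage (requires the aws-sdk gem and valid credentials):
#   glue = Aws::Glue::Client.new(region: 'us-east-1')
#   crawler_states(glue, tags: { 'team' => 'analytics' })
```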
#batch_get_dev_endpoints(options = {}) ⇒ Types::BatchGetDevEndpointsResponse
Returns a list of resource metadata for a given list of development endpoint names. After calling the ListDevEndpoints operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
#batch_get_jobs(options = {}) ⇒ Types::BatchGetJobsResponse
Returns a list of resource metadata for a given list of job names. After calling the ListJobs operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
#batch_get_partition(options = {}) ⇒ Types::BatchGetPartitionResponse
Retrieves partitions in a batch request.
#batch_get_triggers(options = {}) ⇒ Types::BatchGetTriggersResponse
Returns a list of resource metadata for a given list of trigger names. After calling the ListTriggers operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
#batch_get_workflows(options = {}) ⇒ Types::BatchGetWorkflowsResponse
Returns a list of resource metadata for a given list of workflow names. After calling the ListWorkflows operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that use tags.
#batch_stop_job_run(options = {}) ⇒ Types::BatchStopJobRunResponse
Stops one or more job runs for a specified job definition.
#batch_update_partition(options = {}) ⇒ Types::BatchUpdatePartitionResponse
Updates one or more partitions in a batch operation.
#cancel_ml_task_run(options = {}) ⇒ Types::CancelMLTaskRunResponse
Cancels (stops) a task run. Machine learning task runs are asynchronous tasks that AWS Glue runs on your behalf as part of various machine learning workflows. You can cancel a machine learning task run at any time by calling CancelMLTaskRun with a task run's parent transform's TransformID and the task run's TaskRunId.
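A minimal sketch of the call described above, assuming `glue` is an `Aws::Glue::Client`. The helper name `cancel_task_run` and the example IDs are hypothetical.

```ruby
# Hypothetical helper (not part of the SDK): cancels one machine learning
# task run and returns the status reported in the response.
def cancel_task_run(glue, transform_id, task_run_id)
  resp = glue.cancel_ml_task_run(
    transform_id: transform_id, # ID of the task run's parent transform
    task_run_id: task_run_id    # ID of the task run to cancel
  )
  resp.status
end

# Usage (requires the aws-sdk gem and valid credentials; IDs are placeholders):
#   glue = Aws::Glue::Client.new(region: 'us-east-1')
#   cancel_task_run(glue, 'tfm-example', 'tsk-example')
```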
#check_schema_version_validity(options = {}) ⇒ Types::CheckSchemaVersionValidityResponse
Validates the supplied schema. This call has no side effects; it simply validates the supplied schema using DataFormat as the format. Since it does not take a schema set name, no compatibility checks are performed.
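As an illustration, a schema can be checked before it is registered. This is a sketch that assumes `glue` is an `Aws::Glue::Client` and that the schema is held as a Ruby hash; the helper name `schema_validation_error` is hypothetical.

```ruby
require 'json'

# Hypothetical helper (not part of the SDK): returns nil when the service
# reports the schema as valid, otherwise the error message from the response.
def schema_validation_error(glue, schema_hash, data_format: 'AVRO')
  resp = glue.check_schema_version_validity(
    data_format: data_format,
    schema_definition: JSON.generate(schema_hash)
  )
  resp.valid ? nil : resp.error
end

# Usage (requires the aws-sdk gem and valid credentials):
#   glue = Aws::Glue::Client.new(region: 'us-east-1')
#   schema = { type: 'record', name: 'User',
#              fields: [{ name: 'id', type: 'long' }] }
#   error = schema_validation_error(glue, schema)
#   warn "invalid schema: #{error}" if error
```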
#create_classifier(options = {}) ⇒ Struct
Creates a classifier in the user's account. This can be a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field of the request is present.
#create_connection(options = {}) ⇒ Struct
Creates a connection definition in the Data Catalog.
#create_crawler(options = {}) ⇒ Struct
Creates a new crawler with specified targets, role, configuration, and optional schedule. At least one crawl target must be specified, in the s3Targets field, the jdbcTargets field, or the DynamoDBTargets field.
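The "at least one crawl target" rule can be enforced on the client side before the request is sent. The sketch below builds a parameter hash for `create_crawler`; the helper name `crawler_params` and its keyword arguments are hypothetical, and the snake_case target keys (`s3_targets`, `jdbc_targets`, `dynamo_db_targets`) are assumed to be the Ruby spellings of the fields named above.

```ruby
# Hypothetical helper (not part of the SDK): builds a create_crawler
# parameter hash and rejects requests that specify no crawl target.
def crawler_params(name:, role:, database_name:, s3_paths: [],
                   jdbc_targets: [], dynamodb_tables: [], schedule: nil)
  targets = {}
  targets[:s3_targets] = s3_paths.map { |path| { path: path } } unless s3_paths.empty?
  targets[:jdbc_targets] = jdbc_targets unless jdbc_targets.empty?
  unless dynamodb_tables.empty?
    targets[:dynamo_db_targets] = dynamodb_tables.map { |table| { path: table } }
  end
  raise ArgumentError, 'at least one crawl target is required' if targets.empty?

  params = { name: name, role: role, database_name: database_name, targets: targets }
  params[:schedule] = schedule if schedule # optional cron expression
  params
end

# Usage (requires the aws-sdk gem and valid credentials; names are placeholders):
#   glue = Aws::Glue::Client.new(region: 'us-east-1')
#   glue.create_crawler(crawler_params(
#     name: 'example-crawler', role: 'ExampleGlueRole',
#     database_name: 'example_db', s3_paths: ['s3://example-bucket/data/']
#   ))
```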
#create_database(options = {}) ⇒ Struct
Creates a new database in a Data Catalog.
#create_dev_endpoint(options = {}) ⇒ Types::CreateDevEndpointResponse
Creates a new development endpoint.
#create_job(options = {}) ⇒ Types::CreateJobResponse
Creates a new job definition.
#create_ml_transform(options = {}) ⇒ Types::CreateMLTransformResponse
Creates an AWS Glue machine learning transform. This operation creates the transform and all the necessary parameters to train it.
Call this operation as the first step in the process of using a machine learning transform (such as the FindMatches transform) for deduplicating data. You can provide an optional Description, in addition to the parameters that you want to use for your algorithm.
You must also specify certain parameters for the tasks that AWS Glue runs on your behalf as part of learning from your data and creating a high-quality machine learning transform. These parameters include Role, and optionally, AllocatedCapacity, Timeout, and MaxRetries. For more information, see Jobs.
#create_partition(options = {}) ⇒ Struct
Creates a new partition.
#create_registry(options = {}) ⇒ Types::CreateRegistryResponse
Creates a new registry which may be used to hold a collection of schemas.
#create_schema(options = {}) ⇒ Types::CreateSchemaResponse
Creates a new schema set and registers the schema definition. Returns an error if the schema set already exists without actually registering the version.
When the schema set is created, a version checkpoint will be set to the first version. Compatibility mode "DISABLED" restricts any additional schema versions from being added after the first schema version. For all other compatibility modes, validation of compatibility settings will be applied only from the second version onwards when the RegisterSchemaVersion API is used.
When this API is called without a RegistryId, this will create an entry for a "default-registry" in the registry database tables, if it is not already present.
#create_script(options = {}) ⇒ Types::CreateScriptResponse
Transforms a directed acyclic graph (DAG) into code.
#create_security_configuration(options = {}) ⇒ Types::CreateSecurityConfigurationResponse
Creates a new security configuration. A security configuration is a set of security properties that can be used by AWS Glue. You can use a security configuration to encrypt data at rest. For information about using security configurations in AWS Glue, see Encrypting Data Written by Crawlers, Jobs, and Development Endpoints.
#create_table(options = {}) ⇒ Struct
Creates a new table definition in the Data Catalog.
#create_trigger(options = {}) ⇒ Types::CreateTriggerResponse
Creates a new trigger.
#create_user_defined_function(options = {}) ⇒ Struct
Creates a new function definition in the Data Catalog.
#create_workflow(options = {}) ⇒ Types::CreateWorkflowResponse
Creates a new workflow.
#delete_classifier(options = {}) ⇒ Struct
Removes a classifier from the Data Catalog.
#delete_column_statistics_for_partition(options = {}) ⇒ Struct
Deletes the partition column statistics of a column.
The Identity and Access Management (IAM) permission required for this operation is DeletePartition.
#delete_column_statistics_for_table(options = {}) ⇒ Struct
Deletes table statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is DeleteTable.
#delete_connection(options = {}) ⇒ Struct
Deletes a connection from the Data Catalog.
#delete_crawler(options = {}) ⇒ Struct
Removes a specified crawler from the AWS Glue Data Catalog, unless the crawler state is RUNNING.
#delete_database(options = {}) ⇒ Struct
Removes a specified database from a Data Catalog.
After completing this operation, you no longer have access to the tables (and all table versions and partitions that might belong to the tables) and the user-defined functions in the deleted database. AWS Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service. To ensure the immediate deletion of all related resources, before calling DeleteDatabase, use DeleteTableVersion or BatchDeleteTableVersion, DeletePartition or BatchDeletePartition, DeleteUserDefinedFunction, and DeleteTable or BatchDeleteTable, to delete any resources that belong to the database.
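The cleanup order described above can be sketched as follows. This is a minimal illustration covering only the table step (table versions, partitions, and user-defined functions would need the analogous calls). It assumes `client` behaves like an Aws::Glue::Client, that `get_tables` pages via `next_token`, and that `batch_delete_table` accepts at most 100 names per call; `drop_database_with_tables` is a hypothetical helper name.

```ruby
# Sketch: delete a database's tables explicitly, then the database itself.
# `client` is assumed to respond to the same methods as Aws::Glue::Client.
# The batch_size default of 100 is an assumed per-call limit.
def drop_database_with_tables(client, database_name, batch_size: 100)
  # Collect every table name, following next_token across pages.
  names = []
  token = nil
  loop do
    params = { database_name: database_name }
    params[:next_token] = token if token
    page = client.get_tables(params)
    names.concat(page.table_list.map(&:name))
    token = page.next_token
    break if token.nil? || token.empty?
  end

  # Delete the tables in batches, then the now-empty database.
  names.each_slice(batch_size) do |batch|
    client.batch_delete_table(database_name: database_name, tables_to_delete: batch)
  end
  client.delete_database(name: database_name)

  names.size # number of tables removed
end
```

Deleting the tables first keeps the cleanup explicit rather than relying on the asynchronous removal of orphaned resources.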
#delete_dev_endpoint(options = {}) ⇒ Struct
Deletes a specified development endpoint.
#delete_job(options = {}) ⇒ Types::DeleteJobResponse
Deletes a specified job definition. If the job definition is not found, no exception is thrown.
#delete_ml_transform(options = {}) ⇒ Types::DeleteMLTransformResponse
Deletes an AWS Glue machine learning transform. Machine learning transforms are a special type of transform that use machine learning to learn the details of the transformation to be performed by learning from examples provided by humans. These transformations are then saved by AWS Glue. If you no longer need a transform, you can delete it by calling DeleteMLTransform. However, any AWS Glue jobs that still reference the deleted transform will no longer succeed.
#delete_partition(options = {}) ⇒ Struct
Deletes a specified partition.
#delete_registry(options = {}) ⇒ Types::DeleteRegistryResponse
Deletes the entire registry, including its schemas and all of their versions. To get the status of the delete operation, you can call the GetRegistry API after the asynchronous call. Deleting a registry will disable all online operations for the registry, such as the UpdateRegistry, CreateSchema, UpdateSchema, and RegisterSchemaVersion APIs.
#delete_resource_policy(options = {}) ⇒ Struct
Deletes a specified policy.
#delete_schema(options = {}) ⇒ Types::DeleteSchemaResponse
Deletes the entire schema set, including all of its versions. To get the status of the delete operation, you can call the GetSchema API after the asynchronous call. Deleting a schema set will disable all online operations for the schema, such as the GetSchemaByDefinition and RegisterSchemaVersion APIs.
#delete_schema_versions(options = {}) ⇒ Types::DeleteSchemaVersionsResponse
Removes versions from the specified schema. A version number or range may be supplied. If the compatibility mode forbids deleting a version that is necessary, such as BACKWARDS_FULL, an error is returned. Calling the GetSchemaVersions API after this call will list the status of the deleted versions.
When the range of version numbers contains a checkpointed version, the API returns a 409 conflict and does not proceed with the deletion. You must first remove the checkpoint using the DeleteSchemaCheckpoint API before using this API.
You cannot use the DeleteSchemaVersions API to delete the first schema version in the schema set. The first schema version can only be deleted by the DeleteSchema API. This operation also deletes the attached SchemaVersionMetadata under the schema versions. Hard deletes will be enforced on the database.
#delete_security_configuration(options = {}) ⇒ Struct
Deletes a specified security configuration.
#delete_table(options = {}) ⇒ Struct
Removes a table definition from the Data Catalog.
After completing this operation, you no longer have access to the table versions and partitions that belong to the deleted table. AWS Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service. To ensure the immediate deletion of all related resources, before calling DeleteTable, use DeleteTableVersion or BatchDeleteTableVersion, and DeletePartition or BatchDeletePartition, to delete any resources that belong to the table.
#delete_table_version(options = {}) ⇒ Struct
Deletes a specified version of a table.
#delete_trigger(options = {}) ⇒ Types::DeleteTriggerResponse
Deletes a specified trigger. If the trigger is not found, no exception is thrown.
#delete_user_defined_function(options = {}) ⇒ Struct
Deletes an existing function definition from the Data Catalog.
#delete_workflow(options = {}) ⇒ Types::DeleteWorkflowResponse
Deletes a workflow.
#get_catalog_import_status(options = {}) ⇒ Types::GetCatalogImportStatusResponse
Retrieves the status of a migration operation.
#get_classifier(options = {}) ⇒ Types::GetClassifierResponse
Retrieves a classifier by name.
#get_classifiers(options = {}) ⇒ Types::GetClassifiersResponse
Lists all classifier objects in the Data Catalog.
#get_column_statistics_for_partition(options = {}) ⇒ Types::GetColumnStatisticsForPartitionResponse
Retrieves partition statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is GetPartition.
#get_column_statistics_for_table(options = {}) ⇒ Types::GetColumnStatisticsForTableResponse
Retrieves table statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is GetTable.
#get_connection(options = {}) ⇒ Types::GetConnectionResponse
Retrieves a connection definition from the Data Catalog.
#get_connections(options = {}) ⇒ Types::GetConnectionsResponse
Retrieves a list of connection definitions from the Data Catalog.
#get_crawler(options = {}) ⇒ Types::GetCrawlerResponse
Retrieves metadata for a specified crawler.
#get_crawler_metrics(options = {}) ⇒ Types::GetCrawlerMetricsResponse
Retrieves metrics about specified crawlers.
#get_crawlers(options = {}) ⇒ Types::GetCrawlersResponse
Retrieves metadata for all crawlers defined in the customer account.
#get_data_catalog_encryption_settings(options = {}) ⇒ Types::GetDataCatalogEncryptionSettingsResponse
Retrieves the security configuration for a specified catalog.
#get_database(options = {}) ⇒ Types::GetDatabaseResponse
Retrieves the definition of a specified database.
#get_databases(options = {}) ⇒ Types::GetDatabasesResponse
Retrieves all databases defined in a given Data Catalog.
#get_dataflow_graph(options = {}) ⇒ Types::GetDataflowGraphResponse
Transforms a Python script into a directed acyclic graph (DAG).
#get_dev_endpoint(options = {}) ⇒ Types::GetDevEndpointResponse
Retrieves information about a specified development endpoint.
When you create a development endpoint in a virtual private cloud (VPC), AWS Glue returns only a private IP address, and the public IP address field is not populated. When you create a non-VPC development endpoint, AWS Glue returns only a public IP address.
#get_dev_endpoints(options = {}) ⇒ Types::GetDevEndpointsResponse
Retrieves all the development endpoints in this AWS account.
When you create a development endpoint in a virtual private cloud (VPC), AWS Glue returns only a private IP address and the public IP address field is not populated. When you create a non-VPC development endpoint, AWS Glue returns only a public IP address.
#get_job(options = {}) ⇒ Types::GetJobResponse
Retrieves an existing job definition.
#get_job_bookmark(options = {}) ⇒ Types::GetJobBookmarkResponse
Returns information on a job bookmark entry.
#get_job_run(options = {}) ⇒ Types::GetJobRunResponse
Retrieves the metadata for a given job run.
#get_job_runs(options = {}) ⇒ Types::GetJobRunsResponse
Retrieves metadata for all runs of a given job definition.
#get_jobs(options = {}) ⇒ Types::GetJobsResponse
Retrieves all current job definitions.
#get_mapping(options = {}) ⇒ Types::GetMappingResponse
Creates mappings.
#get_ml_task_run(options = {}) ⇒ Types::GetMLTaskRunResponse
Gets details for a specific task run on a machine learning transform. Machine learning task runs are asynchronous tasks that AWS Glue runs on your behalf as part of various machine learning workflows. You can check the stats of any task run by calling GetMLTaskRun with the TaskRunID and its parent transform's TransformID.
#get_ml_task_runs(options = {}) ⇒ Types::GetMLTaskRunsResponse
Gets a list of runs for a machine learning transform. Machine learning task runs are asynchronous tasks that AWS Glue runs on your behalf as part of various machine learning workflows. You can get a sortable, filterable list of machine learning task runs by calling GetMLTaskRuns with their parent transform's TransformID and other optional parameters as documented in this section.
This operation returns a list of historic runs and must be paginated.
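Because the result is paginated, collecting the full history means following the token from one page to the next. The sketch below assumes `client` behaves like an Aws::Glue::Client and that each response exposes `task_runs` and `next_token`; `all_ml_task_runs` is a hypothetical helper name.

```ruby
# Sketch: gather every task run for a transform by following next_token.
# `client` is assumed to respond like Aws::Glue::Client#get_ml_task_runs.
def all_ml_task_runs(client, transform_id)
  runs = []
  token = nil
  loop do
    params = { transform_id: transform_id }
    params[:next_token] = token if token
    page = client.get_ml_task_runs(params)
    runs.concat(page.task_runs)
    token = page.next_token
    # Stop once the service no longer returns a continuation token.
    break if token.nil? || token.empty?
  end
  runs
end
```

For a transform with a long history, processing each page as it arrives instead of accumulating an array would keep memory use flat.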
#get_ml_transform(options = {}) ⇒ Types::GetMLTransformResponse
Gets an AWS Glue machine learning transform artifact and all its corresponding metadata. Machine learning transforms are a special type of transform that use machine learning to learn the details of the transformation to be performed by learning from examples provided by humans. These transformations are then saved by AWS Glue. You can retrieve their metadata by calling GetMLTransform.
#get_ml_transforms(options = {}) ⇒ Types::GetMLTransformsResponse
Gets a sortable, filterable list of existing AWS Glue machine learning transforms. Machine learning transforms are a special type of transform that use machine learning to learn the details of the transformation to be performed by learning from examples provided by humans. These transformations are then saved by AWS Glue, and you can retrieve their metadata by calling GetMLTransforms.
#get_partition(options = {}) ⇒ Types::GetPartitionResponse
Retrieves information about a specified partition.
#get_partition_indexes(options = {}) ⇒ Types::GetPartitionIndexesResponse
Retrieves the partition indexes associated with a table.
#get_partitions(options = {}) ⇒ Types::GetPartitionsResponse
Retrieves information about the partitions in a table.
#get_plan(options = {}) ⇒ Types::GetPlanResponse
Gets code to perform a specified mapping.
#get_registry(options = {}) ⇒ Types::GetRegistryResponse
Describes the specified registry in detail.
#get_resource_policies(options = {}) ⇒ Types::GetResourcePoliciesResponse
Retrieves the security configurations for the resource policies set on individual resources, and also the account-level policy.
This operation also returns the Data Catalog resource policy. However, if you enabled metadata encryption in Data Catalog settings, and you do not have permission on the AWS KMS key, the operation can't return the Data Catalog resource policy.
#get_resource_policy(options = {}) ⇒ Types::GetResourcePolicyResponse
Retrieves a specified resource policy.
#get_schema(options = {}) ⇒ Types::GetSchemaResponse
Describes the specified schema in detail.
#get_schema_by_definition(options = {}) ⇒ Types::GetSchemaByDefinitionResponse
Retrieves a schema by the SchemaDefinition. The schema definition is sent to the Schema Registry, canonicalized, and hashed. If the hash is matched within the scope of the SchemaName or ARN (or the default registry, if none is supplied), that schema’s metadata is returned. Otherwise, a 404 or NotFound error is returned. Schema versions in Deleted status will not be included in the results.
#get_schema_version(options = {}) ⇒ Types::GetSchemaVersionResponse
Get the specified schema by its unique ID assigned when a version of the schema is created or registered. Schema versions in Deleted status will not be included in the results.
#get_schema_versions_diff(options = {}) ⇒ Types::GetSchemaVersionsDiffResponse
Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry.
This API allows you to compare two schema versions between two schema definitions under the same schema.
#get_security_configuration(options = {}) ⇒ Types::GetSecurityConfigurationResponse
Retrieves a specified security configuration.
#get_security_configurations(options = {}) ⇒ Types::GetSecurityConfigurationsResponse
Retrieves a list of all security configurations.
#get_table(options = {}) ⇒ Types::GetTableResponse
Retrieves the Table definition in a Data Catalog for a specified table.
#get_table_version(options = {}) ⇒ Types::GetTableVersionResponse
Retrieves a specified version of a table.
#get_table_versions(options = {}) ⇒ Types::GetTableVersionsResponse
Retrieves a list of strings that identify available versions of a specified table.
#get_tables(options = {}) ⇒ Types::GetTablesResponse
Retrieves the definitions of some or all of the tables in a given Database.
#get_tags(options = {}) ⇒ Types::GetTagsResponse
Retrieves a list of tags associated with a resource.
#get_trigger(options = {}) ⇒ Types::GetTriggerResponse
Retrieves the definition of a trigger.
#get_triggers(options = {}) ⇒ Types::GetTriggersResponse
Gets all the triggers associated with a job.
#get_user_defined_function(options = {}) ⇒ Types::GetUserDefinedFunctionResponse
Retrieves a specified function definition from the Data Catalog.
#get_user_defined_functions(options = {}) ⇒ Types::GetUserDefinedFunctionsResponse
Retrieves multiple function definitions from the Data Catalog.
#get_workflow(options = {}) ⇒ Types::GetWorkflowResponse
Retrieves resource metadata for a workflow.
#get_workflow_run(options = {}) ⇒ Types::GetWorkflowRunResponse
Retrieves the metadata for a given workflow run.
#get_workflow_run_properties(options = {}) ⇒ Types::GetWorkflowRunPropertiesResponse
Retrieves the workflow run properties which were set during the run.
#get_workflow_runs(options = {}) ⇒ Types::GetWorkflowRunsResponse
Retrieves metadata for all runs of a given workflow.
#import_catalog_to_glue(options = {}) ⇒ Struct
Imports an existing Amazon Athena Data Catalog to AWS Glue.
#list_crawlers(options = {}) ⇒ Types::ListCrawlersResponse
Retrieves the names of all crawler resources in this AWS account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.
This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag are retrieved.
#list_dev_endpoints(options = {}) ⇒ Types::ListDevEndpointsResponse
Retrieves the names of all DevEndpoint resources in this AWS account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.
This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag are retrieved.
#list_jobs(options = {}) ⇒ Types::ListJobsResponse
Retrieves the names of all job resources in this AWS account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.
This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag are retrieved.
#list_ml_transforms(options = {}) ⇒ Types::ListMLTransformsResponse
Retrieves a sortable, filterable list of existing AWS Glue machine learning transforms in this AWS account, or the resources with the specified tag. This operation takes the optional Tags field, which you can use as a filter of the responses so that tagged resources can be retrieved as a group. If you choose to use tag filtering, only resources with the tags are retrieved.
#list_registries(options = {}) ⇒ Types::ListRegistriesResponse
Returns a list of registries that you have created, with minimal registry information. Registries in the Deleting status will not be included in the results. Empty results will be returned if there are no registries available.
#list_schema_versions(options = {}) ⇒ Types::ListSchemaVersionsResponse
Returns a list of schema versions that you have created, with minimal information. Schema versions in Deleted status will not be included in the results. Empty results will be returned if there are no schema versions available.
#list_schemas(options = {}) ⇒ Types::ListSchemasResponse
Returns a list of schemas with minimal details. Schemas in Deleting status will not be included in the results. Empty results will be returned if there are no schemas available.
When the RegistryId is not provided, all the schemas across registries will be part of the API response.
#list_triggers(options = {}) ⇒ Types::ListTriggersResponse
Retrieves the names of all trigger resources in this AWS account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.
This operation takes the optional Tags field, which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag are retrieved.
#list_workflows(options = {}) ⇒ Types::ListWorkflowsResponse
Lists names of workflows created in the account.
#put_data_catalog_encryption_settings(options = {}) ⇒ Struct
Sets the security configuration for a specified catalog. After the configuration has been set, the specified encryption is applied to every catalog write thereafter.
#put_resource_policy(options = {}) ⇒ Types::PutResourcePolicyResponse
Sets the Data Catalog resource policy for access control.
#put_schema_version_metadata(options = {}) ⇒ Types::PutSchemaVersionMetadataResponse
Puts the metadata key value pair for a specified schema version ID. A maximum of 10 key value pairs will be allowed per schema version. They can be added over one or more calls.
#put_workflow_run_properties(options = {}) ⇒ Struct
Puts the specified workflow run properties for the given workflow run. If a property already exists for the specified run, its value is overridden; otherwise, the property is added to the existing properties.
#query_schema_version_metadata(options = {}) ⇒ Types::QuerySchemaVersionMetadataResponse
Queries for the schema version metadata information.
#register_schema_version(options = {}) ⇒ Types::RegisterSchemaVersionResponse
Adds a new version to the existing schema. Returns an error if new version of schema does not meet the compatibility requirements of the schema set. This API will not create a new schema set and will return a 404 error if the schema set is not already present in the Schema Registry.
If this is the first schema definition to be registered in the Schema Registry, this API will store the schema version and return immediately. Otherwise, this call has the potential to run longer than other operations due to compatibility modes. You can call the GetSchemaVersion API with the SchemaVersionId to check compatibility modes.
If the same schema definition is already stored in Schema Registry as a version, the schema ID of the existing schema is returned to the caller.
#remove_schema_version_metadata(options = {}) ⇒ Types::RemoveSchemaVersionMetadataResponse
Removes a key value pair from the schema version metadata for the specified schema version ID.
#reset_job_bookmark(options = {}) ⇒ Types::ResetJobBookmarkResponse
Resets a bookmark entry.
#resume_workflow_run(options = {}) ⇒ Types::ResumeWorkflowRunResponse
Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run. The selected nodes and all nodes that are downstream from the selected nodes are run.
#search_tables(options = {}) ⇒ Types::SearchTablesResponse
Searches a set of tables based on properties in the table metadata as well as on the parent database. You can search against text or filter conditions.
You can only get tables that you have access to based on the security policies defined in Lake Formation. You need at least a read-only access to the table for it to be returned. If you do not have access to all the columns in the table, these columns will not be searched against when returning the list of tables back to you. If you have access to the columns but not the data in the columns, those columns and the associated metadata for those columns will be included in the search.
#start_crawler(options = {}) ⇒ Struct
Starts a crawl using the specified crawler, regardless of what is scheduled. If the crawler is already running, returns a CrawlerRunningException.
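A caller that only needs the crawler to be running can treat the "already running" error as a non-failure. The sketch below assumes `client` behaves like an Aws::Glue::Client and that, with the SDK loaded, the error surfaces as an exception class whose name ends in CrawlerRunningException (for example under Aws::Glue::Errors). Matching on the class name is a choice made here so the snippet loads without the gem; `start_crawler_if_idle` is a hypothetical helper name.

```ruby
# Sketch: start a crawl, treating "already running" as a non-error.
# Returns :started or :already_running; any other error is re-raised.
def start_crawler_if_idle(client, crawler_name)
  client.start_crawler(name: crawler_name)
  :started
rescue StandardError => e
  # Assumption: the service error class is named ...::CrawlerRunningException.
  raise unless e.class.name.to_s.end_with?('CrawlerRunningException')
  :already_running
end
```

In application code that already requires the SDK, rescuing the specific error class directly would be the more conventional form.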
#start_crawler_schedule(options = {}) ⇒ Struct
Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED.
#start_export_labels_task_run(options = {}) ⇒ Types::StartExportLabelsTaskRunResponse
Begins an asynchronous task to export all labeled data for a particular transform. This task is the only label-related API call that is not part of the typical active learning workflow. You typically use StartExportLabelsTaskRun when you want to work with all of your existing labels at the same time, such as when you want to remove or change labels that were previously submitted as truth. This API operation accepts the TransformId whose labels you want to export and an Amazon Simple Storage Service (Amazon S3) path to export the labels to. The operation returns a TaskRunId. You can check on the status of your task run by calling the GetMLTaskRun API.
#start_import_labels_task_run(options = {}) ⇒ Types::StartImportLabelsTaskRunResponse
Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality. This API operation is generally used as part of the active learning workflow that starts with the StartMLLabelingSetGenerationTaskRun call and that ultimately results in improving the quality of your machine learning transform.
After the StartMLLabelingSetGenerationTaskRun finishes, AWS Glue machine learning will have generated a series of questions for humans to answer. (Answering these questions is often called 'labeling' in the machine learning workflows.) In the case of the FindMatches transform, these questions are of the form, “What is the correct way to group these rows together into groups composed entirely of matching records?” After the labeling process is finished, users upload their answers/labels with a call to StartImportLabelsTaskRun. After StartImportLabelsTaskRun finishes, all future runs of the machine learning transform use the new and improved labels and perform a higher-quality transformation.
By default, StartMLLabelingSetGenerationTaskRun continually learns from and combines all labels that you upload unless you set Replace to true. If you set Replace to true, StartImportLabelsTaskRun deletes and forgets all previously uploaded labels and learns only from the exact set that you upload. Replacing labels can be helpful if you realize that you previously uploaded incorrect labels, and you believe that they are having a negative effect on your transform quality.
You can check on the status of your task run by calling the GetMLTaskRun operation.
#start_job_run(options = {}) ⇒ Types::StartJobRunResponse
Starts a job run using a job definition.
#start_ml_evaluation_task_run(options = {}) ⇒ Types::StartMLEvaluationTaskRunResponse
Starts a task to estimate the quality of the transform.
When you provide label sets as examples of truth, AWS Glue machine learning uses some of those examples to learn from them. The rest of the labels are used as a test to estimate quality.
Returns a unique identifier for the run. You can call GetMLTaskRun to get more information about the stats of the EvaluationTaskRun.
#start_ml_labeling_set_generation_task_run(options = {}) ⇒ Types::StartMLLabelingSetGenerationTaskRunResponse
Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels.
When the StartMLLabelingSetGenerationTaskRun finishes, AWS Glue will have generated a "labeling set" or a set of questions for humans to answer.
In the case of the FindMatches transform, these questions are of the form, “What is the correct way to group these rows together into groups composed entirely of matching records?”
After the labeling process is finished, you can upload your labels with a call to StartImportLabelsTaskRun. After StartImportLabelsTaskRun finishes, all future runs of the machine learning transform will use the new and improved labels and perform a higher-quality transformation.
#start_trigger(options = {}) ⇒ Types::StartTriggerResponse
Starts an existing trigger. See Triggering Jobs for information about how different types of trigger are started.
#start_workflow_run(options = {}) ⇒ Types::StartWorkflowRunResponse
Starts a new run of the specified workflow.
#stop_crawler(options = {}) ⇒ Struct
If the specified crawler is running, stops the crawl.
#stop_crawler_schedule(options = {}) ⇒ Struct
Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running.
#stop_trigger(options = {}) ⇒ Types::StopTriggerResponse
Stops a specified trigger.
#stop_workflow_run(options = {}) ⇒ Struct
Stops the execution of the specified workflow run.
#tag_resource(options = {}) ⇒ Struct
Adds tags to a resource. A tag is a label you can assign to an AWS resource. In AWS Glue, you can tag only certain resources. For information about what resources you can tag, see AWS Tags in AWS Glue.
#untag_resource(options = {}) ⇒ Struct
Removes tags from a resource.
#update_classifier(options = {}) ⇒ Struct
Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present).
#update_column_statistics_for_partition(options = {}) ⇒ Types::UpdateColumnStatisticsForPartitionResponse
Creates or updates partition statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is UpdatePartition.
#update_column_statistics_for_table(options = {}) ⇒ Types::UpdateColumnStatisticsForTableResponse
Creates or updates table statistics of columns.
The Identity and Access Management (IAM) permission required for this operation is UpdateTable.
#update_connection(options = {}) ⇒ Struct
Updates a connection definition in the Data Catalog.
#update_crawler(options = {}) ⇒ Struct
Updates a crawler. If a crawler is running, you must stop it using StopCrawler before updating it.
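The stop-then-update sequence can be sketched as below. It assumes `client` behaves like an Aws::Glue::Client, that `get_crawler` returns an object whose `crawler.state` is a string such as 'RUNNING' or 'READY', and that polling every few seconds is acceptable. The helper name `update_crawler_safely` and its `delay` and `max_attempts` defaults are illustrative, not part of the SDK.

```ruby
# Sketch: stop a running crawler, wait for it to settle, then update it.
# State names ('RUNNING', 'READY') and polling limits are assumptions.
def update_crawler_safely(client, name, changes, delay: 5, max_attempts: 24)
  state = client.get_crawler(name: name).crawler.state
  client.stop_crawler(name: name) if state == 'RUNNING'

  # Poll until the crawler reports READY, giving up after max_attempts.
  attempts = 0
  until state == 'READY'
    attempts += 1
    raise "crawler #{name} still #{state} after #{max_attempts} polls" if attempts > max_attempts
    sleep(delay)
    state = client.get_crawler(name: name).crawler.state
  end

  client.update_crawler(changes.merge(name: name))
end
```

A call such as `update_crawler_safely(glue, 'nightly', description: 'updated')` would then apply the change only once the crawler is idle.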
#update_crawler_schedule(options = {}) ⇒ Struct
Updates the schedule of a crawler using a cron expression.
#update_database(options = {}) ⇒ Struct
Updates an existing database definition in a Data Catalog.
#update_dev_endpoint(options = {}) ⇒ Struct
Updates a specified development endpoint.
#update_job(options = {}) ⇒ Types::UpdateJobResponse
Updates an existing job definition.
#update_ml_transform(options = {}) ⇒ Types::UpdateMLTransformResponse
Updates an existing machine learning transform. Call this operation to tune the algorithm parameters to achieve better results.
After calling this operation, you can call the StartMLEvaluationTaskRun operation to assess how well your new parameters achieved your goals (such as improving the quality of your machine learning transform, or making it more cost-effective).
#update_partition(options = {}) ⇒ Struct
Updates a partition.
#update_registry(options = {}) ⇒ Types::UpdateRegistryResponse
Updates an existing registry which is used to hold a collection of schemas. The updated properties relate to the registry, and do not modify any of the schemas within the registry.
#update_schema(options = {}) ⇒ Types::UpdateSchemaResponse
Updates the description, compatibility setting, or version checkpoint for a schema set.
For updating the compatibility setting, the call will not validate compatibility for the entire set of schema versions with the new compatibility setting. If the value for Compatibility is provided, the VersionNumber (a checkpoint) is also required. The API will validate the checkpoint version number for consistency.
If the value for the VersionNumber (checkpoint) is provided, Compatibility is optional and this can be used to set/reset a checkpoint for the schema.
This update will happen only if the schema is in the AVAILABLE state.
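The rule that Compatibility requires a checkpoint version, while a checkpoint alone is allowed, can be enforced before the request is sent. The sketch below builds a parameter hash in the shape the request is assumed to take (`schema_id`, `compatibility`, `schema_version_number`, `description`). Those key names and the helper `update_schema_params` are assumptions for illustration and should be checked against the #update_schema parameter reference.

```ruby
# Sketch: build update_schema parameters, rejecting a compatibility change
# that has no checkpoint version. Key names are assumed, not confirmed here.
def update_schema_params(schema_name:, registry_name: nil, compatibility: nil,
                         version_number: nil, description: nil)
  if compatibility && version_number.nil?
    raise ArgumentError, 'a checkpoint version_number is required when compatibility is set'
  end

  schema_id = { schema_name: schema_name }
  schema_id[:registry_name] = registry_name if registry_name

  params = { schema_id: schema_id }
  params[:compatibility] = compatibility if compatibility
  params[:schema_version_number] = { version_number: version_number } if version_number
  params[:description] = description if description
  params
end
```

The resulting hash would be passed as `client.update_schema(params)`; setting only `version_number` moves the checkpoint without touching the compatibility mode.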
#update_table(options = {}) ⇒ Struct
Updates a metadata table in the Data Catalog.
#update_trigger(options = {}) ⇒ Types::UpdateTriggerResponse
Updates a trigger definition.
#update_user_defined_function(options = {}) ⇒ Struct
Updates an existing function definition in the Data Catalog.
#update_workflow(options = {}) ⇒ Types::UpdateWorkflowResponse
Updates an existing workflow.
#wait_until(waiter_name, params = {}) {|waiter| ... } ⇒ Boolean
Waiters poll an API operation until a resource enters a desired state.
Basic Usage
Waiters will poll until they are successful, they fail by entering a terminal state, or until a maximum number of attempts are made.
# polls in a loop, sleeping between attempts
client.wait_until(waiter_name, params)
Configuration
You can configure the maximum number of polling attempts, and the delay (in seconds) between each polling attempt. You configure waiters by passing a block to #wait_until:
# poll for ~25 seconds
client.wait_until(...) do |w|
  w.max_attempts = 5
  w.delay = 5
end
Callbacks
You can be notified before each polling attempt and before each delay. If you throw :success or :failure from these callbacks, it will terminate the waiter.
started_at = Time.now
client.wait_until(...) do |w|
  # disable max attempts
  w.max_attempts = nil
  # poll for 1 hour, instead of a number of attempts
  w.before_wait do |attempts, response|
    throw :failure if Time.now - started_at > 3600
  end
end
Handling Errors
When a waiter is successful, it returns true. When a waiter fails, it raises an error. All errors raised extend from Aws::Waiters::Errors::WaiterFailed.
begin
  client.wait_until(...)
rescue Aws::Waiters::Errors::WaiterFailed
  # resource did not enter the desired state in time
end
#waiter_names ⇒ Array<Symbol>
Returns the list of supported waiters. The following table lists the supported waiters and the client method they call:
Waiter Name | Client Method | Default Delay | Default Max Attempts
---|---|---|---