Column statistics API

The column statistics API describes AWS Glue APIs for returning statistics on columns in a table.

Data types

ColumnStatisticsTaskRun structure

The object that shows the details of the column stats run.

Fields
  • CustomerId – UTF-8 string, not more than 12 bytes long.

    The AWS account ID.

  • ColumnStatisticsTaskRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The identifier for the particular column statistics task run.

  • DatabaseName – UTF-8 string.

    The database where the table resides.

  • TableName – UTF-8 string.

    The name of the table for which column statistics are generated.

  • ColumnNameList – An array of UTF-8 strings.

    A list of the column names. If none is supplied, all column names for the table will be used by default.

  • CatalogID – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • Role – UTF-8 string.

    The IAM role that the service assumes to generate statistics.

  • SampleSize – Number (double), not more than 100.

    The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

  • SecurityConfiguration – UTF-8 string, not more than 128 bytes long.

    Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

  • NumberOfWorkers – Number (integer), at least 1.

    The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.

  • WorkerType – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The type of workers being used for generating stats. The default is g.1x.

  • ComputationType – UTF-8 string (valid values: FULL).

    The type of column statistics computation.

  • Status – UTF-8 string (valid values: STARTING | RUNNING | SUCCEEDED | FAILED | STOPPED).

    The status of the task run.

  • CreationTime – Timestamp.

    The time that this task was created.

  • LastUpdated – Timestamp.

    The last point in time when this task was modified.

  • StartTime – Timestamp.

    The start time of the task.

  • EndTime – Timestamp.

    The end time of the task.

  • ErrorMessage – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    The error message for the job.

  • DPUSeconds – Number (double).

    The calculated DPU usage in seconds for all autoscaled workers.
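As an illustration of how the fields above fit together, the following sketch derives a wall-clock duration and a DPU-hour figure from a ColumnStatisticsTaskRun. It assumes the structure arrives as a Python dictionary with datetime values for the timestamps, as an SDK would typically deliver it. The helper name summarize_task_run and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def summarize_task_run(run):
    """Derive duration and DPU-hours from a ColumnStatisticsTaskRun dict.

    Hypothetical helper: any field that is absent yields None instead of
    raising, because a run that has not finished may lack EndTime.
    """
    start, end = run.get("StartTime"), run.get("EndTime")
    duration = (end - start).total_seconds() if start and end else None
    dpu_seconds = run.get("DPUSeconds")
    return {
        "run_id": run.get("ColumnStatisticsTaskRunId"),
        "status": run.get("Status"),
        "duration_seconds": duration,
        # DPUSeconds is reported in seconds; divide by 3600 for DPU-hours.
        "dpu_hours": dpu_seconds / 3600 if dpu_seconds is not None else None,
    }


# Sample values invented for illustration only.
started = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
example_run = {
    "ColumnStatisticsTaskRunId": "example-run-id",
    "Status": "SUCCEEDED",
    "StartTime": started,
    "EndTime": started + timedelta(minutes=5),
    "DPUSeconds": 1800.0,
}
summary = summarize_task_run(example_run)
```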

ColumnStatisticsTaskSettings structure

The settings for a column statistics task.

Fields
  • DatabaseName – UTF-8 string.

    The name of the database where the table resides.

  • TableName – UTF-8 string.

    The name of the table for which to generate column statistics.

  • Schedule – A Schedule object.

    A schedule for running the column statistics, specified in CRON syntax.

  • ColumnNameList – An array of UTF-8 strings.

    A list of column names for which to run statistics.

  • CatalogID – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog in which the database resides.

  • Role – UTF-8 string.

    The role used for running the column statistics.

  • SampleSize – Number (double), not more than 100.

    The percentage of data to sample.

  • SecurityConfiguration – UTF-8 string, not more than 128 bytes long.

    Name of the security configuration that is used to encrypt CloudWatch logs.

  • ScheduleType – UTF-8 string (valid values: CRON | AUTO).

    The type of schedule for a column statistics task. Possible values are CRON or AUTO.

  • SettingSource – UTF-8 string (valid values: CATALOG | TABLE).

    The source of the settings for the column statistics task. Possible values are CATALOG or TABLE.

  • LastExecutionAttempt – An ExecutionAttempt object.

    The last ExecutionAttempt for the column statistics task run.

ExecutionAttempt structure

A run attempt for a column statistics task run.

Fields
  • Status – UTF-8 string (valid values: FAILED | STARTED).

    The status of the last column statistics task run.

  • ColumnStatisticsTaskRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    A task run ID for the last column statistics task run.

  • ExecutionTimestamp – Timestamp.

    A timestamp when the last column statistics task run occurred.

  • ErrorMessage – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    An error message associated with the last column statistics task run.

Operations

StartColumnStatisticsTaskRun action (Python: start_column_statistics_task_run)

Starts a column statistics task run, for a specified table and columns.

Request
  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table for which to generate statistics.

  • ColumnNameList – An array of UTF-8 strings.

    A list of the column names for which to generate statistics. If none is supplied, all column names for the table will be used by default.

  • Role – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The IAM role that the service assumes to generate statistics.

  • SampleSize – Number (double), not more than 100.

    The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

  • CatalogID – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • SecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

Response
  • ColumnStatisticsTaskRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The identifier for the column statistics task run.

Errors
  • AccessDeniedException

  • EntityNotFoundException

  • ColumnStatisticsTaskRunningException

  • OperationTimeoutException

  • ResourceNumberLimitExceededException

  • InvalidInputException
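The request constraints above can be checked on the client side before the call is made. The sketch below assembles the keyword arguments for start_column_statistics_task_run and omits optional fields that were not supplied. It assumes glue_client is an SDK client object exposing that method (for example, one created with boto3); the helper names build_start_request and start_run are hypothetical.

```python
def build_start_request(database_name, table_name, role, column_names=None,
                        sample_size=None, catalog_id=None,
                        security_configuration=None):
    """Assemble keyword arguments for start_column_statistics_task_run."""
    required = {"DatabaseName": database_name, "TableName": table_name,
                "Role": role}
    for label, value in required.items():
        # Required strings are documented as 1 to 255 bytes long.
        if not 1 <= len(value.encode("utf-8")) <= 255:
            raise ValueError(f"{label} must be 1 to 255 bytes long")
    if sample_size is not None and sample_size > 100:
        raise ValueError("SampleSize must not be more than 100")
    optional = {
        "ColumnNameList": column_names,
        "SampleSize": sample_size,
        "CatalogID": catalog_id,
        "SecurityConfiguration": security_configuration,
    }
    # Leave out unset fields so the documented service defaults apply.
    request = dict(required)
    request.update({k: v for k, v in optional.items() if v is not None})
    return request


def start_run(glue_client, **kwargs):
    """Start a task run and return its ColumnStatisticsTaskRunId."""
    request = build_start_request(**kwargs)
    response = glue_client.start_column_statistics_task_run(**request)
    return response["ColumnStatisticsTaskRunId"]
```

Leaving ColumnNameList and SampleSize out of the request means that all columns and the entire table are used, as described in the field notes above.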

GetColumnStatisticsTaskRun action (Python: get_column_statistics_task_run)

Gets the metadata for a task run, given a task run ID.

Request
  • ColumnStatisticsTaskRunId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The identifier for the particular column statistics task run.

Response
  • ColumnStatisticsTaskRun – A ColumnStatisticsTaskRun object.

    A ColumnStatisticsTaskRun object representing the details of the column stats run.

Errors
  • EntityNotFoundException

  • OperationTimeoutException

  • InvalidInputException
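A common use of this operation is to poll a run until it finishes. The following is a minimal sketch, assuming glue_client exposes get_column_statistics_task_run and that SUCCEEDED, FAILED, and STOPPED are the final states among the documented Status values. That reading of the status list is an assumption, as are the helper name wait_for_task_run and its polling defaults.

```python
import time

# Assumption: these documented Status values mean the run will not change again.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_task_run(glue_client, run_id, poll_seconds=30, max_polls=120,
                      sleep=time.sleep):
    """Poll a task run until it reaches a terminal status, then return it.

    The sleep function is injectable so the loop can be exercised without
    real delays. Raises TimeoutError after max_polls unsuccessful polls.
    """
    for _ in range(max_polls):
        response = glue_client.get_column_statistics_task_run(
            ColumnStatisticsTaskRunId=run_id)
        run = response["ColumnStatisticsTaskRun"]
        if run.get("Status") in TERMINAL_STATUSES:
            return run
        sleep(poll_seconds)
    raise TimeoutError(f"task run {run_id} not finished after {max_polls} polls")
```

When the returned run has a Status of FAILED, its ErrorMessage field is the place to look for the cause.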

GetColumnStatisticsTaskRuns action (Python: get_column_statistics_task_runs)

Retrieves information about all runs associated with the specified table.

Request
  • DatabaseName – Required: UTF-8 string.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table.

  • MaxResults – Number (integer), not less than 1 or more than 1000.

    The maximum size of the response.

  • NextToken – UTF-8 string.

    A continuation token, if this is a continuation call.

Response
  • ColumnStatisticsTaskRuns – An array of ColumnStatisticsTaskRun objects.

    A list of column statistics task runs.

  • NextToken – UTF-8 string.

    A continuation token, if not all task runs have yet been returned.

Errors
  • OperationTimeoutException
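Because the response carries a NextToken whenever more runs remain, callers usually loop until the token is absent. The sketch below shows one way to do that, assuming glue_client exposes get_column_statistics_task_runs. The generator name iter_task_runs and the default page size are hypothetical.

```python
def iter_task_runs(glue_client, database_name, table_name, page_size=100):
    """Yield every ColumnStatisticsTaskRun for a table, following NextToken."""
    if not 1 <= page_size <= 1000:
        raise ValueError("MaxResults must be between 1 and 1000")
    kwargs = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "MaxResults": page_size,
    }
    while True:
        page = glue_client.get_column_statistics_task_runs(**kwargs)
        yield from page.get("ColumnStatisticsTaskRuns", [])
        token = page.get("NextToken")
        if not token:
            return
        # Pass the continuation token back to request the next page.
        kwargs["NextToken"] = token
```

The same loop shape applies to any operation in this API that accepts and returns a NextToken.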

ListColumnStatisticsTaskRuns action (Python: list_column_statistics_task_runs)

Lists all task runs for a particular account.

Request
  • MaxResults – Number (integer), not less than 1 or more than 1000.

    The maximum size of the response.

  • NextToken – UTF-8 string.

    A continuation token, if this is a continuation call.

Response
  • ColumnStatisticsTaskRunIds – An array of UTF-8 strings, not more than 100 strings.

    A list of column statistics task run IDs.

  • NextToken – UTF-8 string.

    A continuation token, if not all task run IDs have yet been returned.

Errors
  • OperationTimeoutException

StopColumnStatisticsTaskRun action (Python: stop_column_statistics_task_run)

Stops a task run for the specified table.

Request
  • DatabaseName – Required: UTF-8 string.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table.

Response
  • No Response parameters.

Errors
  • EntityNotFoundException

  • ColumnStatisticsTaskNotRunningException

  • ColumnStatisticsTaskStoppingException

  • OperationTimeoutException
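Stopping a run for a table that has nothing running raises ColumnStatisticsTaskNotRunningException, which callers may prefer to treat as a no-op. The following sketch assumes glue_client exposes stop_column_statistics_task_run and publishes its modeled error classes under an exceptions attribute, as boto3 clients do. The helper name stop_if_running is hypothetical.

```python
def stop_if_running(glue_client, database_name, table_name):
    """Stop the task run for a table.

    Returns True if a stop was requested and False if no task was running.
    Other errors, such as EntityNotFoundException, propagate to the caller.
    """
    try:
        glue_client.stop_column_statistics_task_run(
            DatabaseName=database_name, TableName=table_name)
    except glue_client.exceptions.ColumnStatisticsTaskNotRunningException:
        return False
    return True
```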

CreateColumnStatisticsTaskSettings action (Python: create_column_statistics_task_settings)

Creates settings for a column statistics task.

Request
  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table for which to generate column statistics.

  • Role – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The role used for running the column statistics.

  • Schedule – UTF-8 string.

    A schedule for running the column statistics, specified in CRON syntax.

  • ColumnNameList – An array of UTF-8 strings.

    A list of column names for which to run statistics.

  • SampleSize – Number (double), not more than 100.

    The percentage of data to sample.

  • CatalogID – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog in which the database resides.

  • SecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Name of the security configuration that is used to encrypt CloudWatch logs.

  • Tags – A map array of key-value pairs, not more than 50 pairs.

    Each key is a UTF-8 string, not less than 1 or more than 128 bytes long.

    Each value is a UTF-8 string, not more than 256 bytes long.

    A map of tags.

Response
  • No Response parameters.

Errors
  • AlreadyExistsException

  • AccessDeniedException

  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

  • ResourceNumberLimitExceededException

  • ColumnStatisticsTaskRunningException
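The tag limits above are easy to exceed by accident, so a client-side check can be useful before calling create_column_statistics_task_settings. The sketch below validates the Tags map and assembles the request keyword arguments. The helper names are hypothetical, and the cron(...) expression in the example is an assumed Glue-style schedule string shown only for illustration.

```python
def validate_tags(tags):
    """Check a Tags map against the documented size limits."""
    if len(tags) > 50:
        raise ValueError("Tags must not contain more than 50 pairs")
    for key, value in tags.items():
        if not 1 <= len(key.encode("utf-8")) <= 128:
            raise ValueError(f"tag key {key!r} must be 1 to 128 bytes long")
        if len(value.encode("utf-8")) > 256:
            raise ValueError(f"tag value for {key!r} exceeds 256 bytes")
    return tags


def build_settings_request(database_name, table_name, role, schedule=None,
                           column_names=None, sample_size=None, tags=None):
    """Assemble keyword arguments for create_column_statistics_task_settings."""
    if sample_size is not None and sample_size > 100:
        raise ValueError("SampleSize must not be more than 100")
    request = {"DatabaseName": database_name, "TableName": table_name,
               "Role": role}
    optional = {
        "Schedule": schedule,
        "ColumnNameList": column_names,
        "SampleSize": sample_size,
        "Tags": validate_tags(tags) if tags is not None else None,
    }
    # Only send the optional fields that the caller actually set.
    request.update({k: v for k, v in optional.items() if v is not None})
    return request


# Example values are invented; the schedule string format is an assumption.
settings_request = build_settings_request(
    "example_db", "example_table", "example-role",
    schedule="cron(0 2 * * ? *)", sample_size=20.0, tags={"team": "analytics"})
```

The resulting dictionary can be passed to the operation as keyword arguments, for example glue_client.create_column_statistics_task_settings(**settings_request), where glue_client is an SDK client assumed to exist in the caller's code.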

UpdateColumnStatisticsTaskSettings action (Python: update_column_statistics_task_settings)

Updates settings for a column statistics task.

Request
  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table for which to generate column statistics.

  • Role – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The role used for running the column statistics.

  • Schedule – UTF-8 string.

    A schedule for running the column statistics, specified in CRON syntax.

  • ColumnNameList – An array of UTF-8 strings.

    A list of column names for which to run statistics.

  • SampleSize – Number (double), not more than 100.

    The percentage of data to sample.

  • CatalogID – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog in which the database resides.

  • SecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Name of the security configuration that is used to encrypt CloudWatch logs.

Response
  • No Response parameters.

Errors
  • AccessDeniedException

  • EntityNotFoundException

  • InvalidInputException

  • VersionMismatchException

  • OperationTimeoutException

GetColumnStatisticsTaskSettings action (Python: get_column_statistics_task_settings)

Gets settings for a column statistics task.

Request
  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table for which to retrieve column statistics.

Response
  • ColumnStatisticsTaskSettings – A ColumnStatisticsTaskSettings object.

    A ColumnStatisticsTaskSettings object representing the settings for the column statistics task.

Errors
  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

DeleteColumnStatisticsTaskSettings action (Python: delete_column_statistics_task_settings)

Deletes settings for a column statistics task.

Request
  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table for which to delete column statistics.

Response
  • No Response parameters.

Errors
  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

StartColumnStatisticsTaskRunSchedule action (Python: start_column_statistics_task_run_schedule)

Starts a column statistics task run schedule.

Request
  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table for which to start a column statistics task run schedule.

Response
  • No Response parameters.

Errors
  • AccessDeniedException

  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

StopColumnStatisticsTaskRunSchedule action (Python: stop_column_statistics_task_run_schedule)

Stops a column statistics task run schedule.

Request
  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table for which to stop a column statistics task run schedule.

Response
  • No Response parameters.

Errors
  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

Exceptions

ColumnStatisticsTaskRunningException structure

An exception thrown when you try to start another job while a column statistics generation job is already running.

Fields
  • Message – UTF-8 string.

    A message describing the problem.

ColumnStatisticsTaskNotRunningException structure

An exception thrown when you try to stop a task run when there is no task running.

Fields
  • Message – UTF-8 string.

    A message describing the problem.

ColumnStatisticsTaskStoppingException structure

An exception thrown when you try to stop a task run.

Fields
  • Message – UTF-8 string.

    A message describing the problem.

ColumnStatisticsTaskAutoConcurrencyLimitException structure

An exception thrown when you have already reached the limit of concurrent auto statistics jobs.

Fields
  • Message – UTF-8 string.

    A message describing the problem.

InvalidCatalogSettingException structure

An exception thrown when there is a problem with the catalog settings.

Fields
  • Message – UTF-8 string.

    A message describing the problem.

© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.