AWS Glue Data Catalog views

This topic describes how to create views in the AWS Glue Data Catalog. You can use views in the Data Catalog to access data in different data sources using the same schema.

By creating views in the Data Catalog, you can create a single common view schema and metadata object to use across engines such as Amazon Athena and Amazon EMR Spark. This lets you use the same views across your data lakes and data warehouses to fit your use cases. Views in the Data Catalog are definer views: access permissions are determined by the user who created the view rather than by the user who queries it. The following are some use cases and benefits of creating views in the Data Catalog:

Create a view that restricts data access based on the permissions the user needs. For example, you can use views in the Data Catalog to prevent employees who don’t work in the HR department from seeing personally identifiable information (PII).
Make sure that users can’t access incomplete records. By applying filters in the view definition, you make sure that the records returned by a view in the Data Catalog are always complete.
Data Catalog views include a security benefit: the query definition must complete successfully before the view can be created. As a result, views in the Data Catalog aren’t susceptible to SQL commands from malicious actors.
Views in the Data Catalog support the same advantages as normal views, such as letting users access a view without making the underlying table available to users.
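The following toy model makes the definer-view behavior and the PII use case concrete. It is a minimal sketch and does not reflect how Lake Formation actually evaluates permissions. The `Catalog` class, the principals `hr_admin` and `analyst`, and the `employees` table are all hypothetical. In the model, the querying user needs SELECT only on the view. Access to the base table is checked against the view's definer, and the view returns only the columns it selects.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy definer-view model (hypothetical; not the Lake Formation API)."""
    # Table name -> rows.
    tables: dict[str, list[dict]] = field(default_factory=dict)
    # (principal, object name) pairs that hold SELECT permission.
    grants: set[tuple[str, str]] = field(default_factory=set)
    # View name -> (definer, base table, selected columns).
    views: dict[str, tuple[str, str, list[str]]] = field(default_factory=dict)

    def create_view(self, definer, name, base_table, columns):
        # The definer must be able to read the base table to create the view.
        if (definer, base_table) not in self.grants:
            raise PermissionError(f"{definer} cannot read {base_table}")
        self.views[name] = (definer, base_table, columns)

    def query_view(self, user, name):
        # The querying user needs SELECT on the view only...
        if (user, name) not in self.grants:
            raise PermissionError(f"{user} cannot read {name}")
        definer, base_table, columns = self.views[name]
        # ...while base-table access is evaluated for the definer.
        if (definer, base_table) not in self.grants:
            raise PermissionError(f"{definer} lost access to {base_table}")
        # Project only the view's columns, so PII such as "ssn" stays hidden.
        return [{c: row[c] for c in columns} for row in self.tables[base_table]]


catalog = Catalog(
    tables={"employees": [{"name": "Ana", "dept": "Sales", "ssn": "123-45-6789"}]},
    grants={("hr_admin", "employees"), ("analyst", "employees_public")},
)
catalog.create_view("hr_admin", "employees_public", "employees", ["name", "dept"])
print(catalog.query_view("analyst", "employees_public"))
# [{'name': 'Ana', 'dept': 'Sales'}]
```

The `analyst` principal never holds a grant on `employees`, yet it can read the non-PII columns through the view.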

To create a view in the Data Catalog, you must have an Amazon Redshift Spectrum external table, an object that’s contained within a Lake Formation-managed datashare, or an Apache Iceberg table.

Definitions of Data Catalog views are stored in the AWS Glue Data Catalog. Use AWS Lake Formation to grant access through resource grants, column grants, or tag-based access controls. For more information about granting and revoking access in Lake Formation, see Granting and revoking permissions on Data Catalog resources.

Prerequisites

Before you can create a view in the Data Catalog, make sure that you have completed the following prerequisites:

Make sure that your IAM role has the following trust policy.


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "glue.amazonaws.com",
                    "lakeformation.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

You also need the following policy, which allows the role to be passed to AWS Glue and Lake Formation.


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1",
            "Action": [
                "iam:PassRole"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": [ 
                        "glue.amazonaws.com",
                        "lakeformation.amazonaws.com"
                    ]
                }
            }
        }
    ]
}

Finally, you need the following AWS Glue permissions:
- glue:GetDatabase
- glue:GetDatabases
- glue:CreateTable
- glue:GetTable
- glue:UpdateTable
- glue:DeleteTable
- glue:GetTables
- glue:SearchTables
- glue:BatchGetPartition
- glue:GetPartitions
- glue:GetPartition
- glue:GetTableVersion
- glue:GetTableVersions
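If you manage permissions as policy documents, the following Python sketch assembles the AWS Glue actions listed above into an identity-based policy. Some parts are assumptions: the statement ID, the `"*"` resource, and which IAM identity you attach the policy to. In practice, scope `Resource` to your own Data Catalog, database, and table ARNs. IAM action names are case-insensitive, so `glue:GetTable` and `Glue:GetTable` refer to the same action.

```python
import json

# AWS Glue actions from the prerequisites list.
GLUE_ACTIONS = [
    "GetDatabase", "GetDatabases", "CreateTable", "GetTable", "UpdateTable",
    "DeleteTable", "GetTables", "SearchTables", "BatchGetPartition",
    "GetPartitions", "GetPartition", "GetTableVersion", "GetTableVersions",
]


def build_glue_view_policy(resource="*"):
    """Return an identity-based policy that allows the listed Glue actions.

    resource defaults to "*" only for brevity (assumption); narrow it to
    catalog, database, and table ARNs in a real policy.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GlueDataCatalogViews",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [f"glue:{action}" for action in GLUE_ACTIONS],
                "Resource": resource,
            }
        ],
    }


policy = build_glue_view_policy()
print(json.dumps(policy, indent=4))
```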

End-to-end example

Start by creating an external schema based on your Data Catalog database.


CREATE EXTERNAL SCHEMA IF NOT EXISTS external_schema FROM DATA CATALOG DATABASE 'external_data_catalog_db' 
IAM_ROLE 'arn:aws:iam::123456789012:role/sample-role';

You can now create a Data Catalog view.


CREATE EXTERNAL PROTECTED VIEW external_schema.remote_view 
AS SELECT * FROM external_schema.remote_table;

You can then start querying your view.


SELECT * FROM external_schema.remote_view;

For more information about the SQL commands related to views in the Data Catalog, see CREATE EXTERNAL VIEW, ALTER EXTERNAL VIEW, and DROP EXTERNAL VIEW.

Secure logging

Amazon Redshift masks metadata in its system logs when a query references a multi-dialect AWS Glue view. Multi-dialect means that the view supports the SQL dialects of different query engines, such as Amazon Redshift and Amazon EMR. Data in the following tables is masked for all queries with the same query ID. The following table lists the system tables and views, and the columns in each that have secure logging applied.

System table	Sensitive columns
SYS_EXTERNAL_QUERY_DETAIL	Columns: source_type, total_partitions, qualified_partitions, scanned_files, returned_rows, returned_bytes, file_format, file_location, external_query_text, warning_message. For more information, see SYS_EXTERNAL_QUERY_DETAIL.
SYS_EXTERNAL_QUERY_ERROR	Columns: file_location, rowid, column_name, original_value, modified_value, trigger, action, action_value, error_code. For more information, see SYS_EXTERNAL_QUERY_ERROR.
SYS_QUERY_DETAIL	Columns: step_name, table_id, table_name, input_bytes, input_rows, output_bytes, output_rows, blocks_read, blocks_write, local_read_IO, remote_read_IO, spilled_block_local_disk, spilled_block_remote_disk. For more information, see SYS_QUERY_DETAIL.
SYS_QUERY_HISTORY	Columns: returned_rows, returned_bytes. For more information, see SYS_QUERY_HISTORY.
STL_AGGR	Columns: rows, bytes, tbl, type. For more information, see STL_AGGR.
STL_BCAST	Columns: rows, bytes, packets. For more information, see STL_BCAST.
STL_DDLTEXT	Columns: text. For more information, see STL_DDLTEXT.
STL_DELETE	Columns: rows, tbl. For more information, see STL_DELETE.
STL_DIST	Columns: rows, bytes, packets. For more information, see STL_DIST.
STL_EXPLAIN	Columns: plannode, info. For more information, see STL_EXPLAIN.
STL_HASH	Columns: rows, bytes, tbl, est_rows. For more information, see STL_HASH.
STL_HASHJOIN	Columns: rows, tbl, num_parts, join_type. For more information, see STL_HASHJOIN.
STL_INSERT	Columns: rows, tbl. For more information, see STL_INSERT.
STL_LIMIT	Columns: rows. For more information, see STL_LIMIT.
STL_MERGE	Columns: rows. For more information, see STL_MERGE.
STL_MERGEJOIN	Columns: rows, tbl. For more information, see STL_MERGEJOIN.
STL_NESTLOOP	Columns: rows, tbl. For more information, see STL_NESTLOOP.
STL_PARSE	Columns: rows. For more information, see STL_PARSE.
STL_PLAN_INFO	Columns: rows, bytes. For more information, see STL_PLAN_INFO.
STL_PROJECT	Columns: rows, tbl. For more information, see STL_PROJECT.
STL_QUERY	Columns: querytxt. For more information, see STL_QUERY.
STL_QUERY_METRICS	Columns: max_rows, rows, max_blocks_read, blocks_read, max_blocks_to_disk, blocks_to_disk, max_query_scan_size, query_scan_size. For more information, see STL_QUERY_METRICS.
STL_QUERYTEXT	Columns: text. For more information, see STL_QUERYTEXT.
STL_RETURN	Columns: rows, bytes. For more information, see STL_RETURN.
STL_SAVE	Columns: rows, bytes, tbl. For more information, see STL_SAVE.
STL_SCAN	Columns: rows, bytes, fetches, type, tbl, rows_pre_filter, perm_table_name, scanned_mega_value. For more information, see STL_SCAN.
STL_SORT	Columns: rows, bytes, tbl. For more information, see STL_SORT.
STL_TR_CONFLICT	Columns: table_id. For more information, see STL_TR_CONFLICT.
STL_UNDONE	Columns: table_id. For more information, see STL_UNDONE.
STL_UNIQUE	Columns: rows, type, bytes. For more information, see STL_UNIQUE.
STL_UTILITYTEXT	Columns: text. For more information, see STL_UTILITYTEXT.
STL_WINDOW	Columns: rows. For more information, see STL_WINDOW.
STV_BLOCKLIST	Columns: col, tbl, num_values, minvalue, maxvalue. For more information, see STV_BLOCKLIST.
STV_EXEC_STATE	Columns: rows, bytes, label. For more information, see STV_EXEC_STATE.
STV_LOCKS	Columns: table_id. For more information, see STV_LOCKS.
STV_QUERY_METRICS	Columns: rows, max_rows, blocks_read, max_blocks_read, max_blocks_to_disk, blocks_to_disk, max_query_scan_size, query_scan_size. For more information, see STV_QUERY_METRICS.
STV_STARTUP_RECOVERY_STATE	Columns: table_id, table_name. For more information, see STV_STARTUP_RECOVERY_STATE.
STV_TBL_PERM	Columns: id, rows, sorted_rows, temp, block_count, query_scan_size. For more information, see STV_TBL_PERM.
STV_TBL_TRANS	Columns: id, rows, size. For more information, see STV_TBL_TRANS.
SVCS_EXPLAIN	Columns: plannode, info. For more information, see SVCS_EXPLAIN.
SVCS_PLAN_INFO	Columns: rows, bytes. For more information, see SVCS_PLAN_INFO.
SVCS_QUERY_SUMMARY	Columns: step, rows, bytes, rate_row, rate_byte, label, rows_pre_filter. For more information, see SVCS_QUERY_SUMMARY.
SVCS_S3LIST	Columns: bucket, prefix, max_file_size, avg_file_size. For more information, see SVCS_S3LIST.
SVCS_S3LOG	Columns: message. For more information, see SVCS_S3LOG.
SVCS_S3PARTITION_SUMMARY	Columns: total_partitions, qualified_partitions, min_assigned_partitions, max_assigned_partitions, avg_assigned_partitions. For more information, see SVCS_S3PARTITION_SUMMARY.
SVCS_S3QUERY_SUMMARY	Columns: external_table_name, file_format, s3_scanned_rows, s3_scanned_bytes, s3query_returned_rows, s3query_returned_bytes. For more information, see SVCS_S3QUERY_SUMMARY.
SVL_QUERY_METRICS	Columns: step_label, scan_row_count, join_row_count, nested_loop_join_row_count, return_row_count, spectrum_scan_row_count, spectrum_scan_size_mb. For more information, see SVL_QUERY_METRICS.
SVL_QUERY_METRICS_SUMMARY	Columns: step_label, scan_row_count, join_row_count, nested_loop_join_row_count, return_row_count, spectrum_scan_row_count, spectrum_scan_size_mb. For more information, see SVL_QUERY_METRICS_SUMMARY.
SVL_QUERY_REPORT	Columns: rows, bytes, label, rows_pre_filter. For more information, see SVL_QUERY_REPORT.
SVL_QUERY_SUMMARY	Columns: rows, bytes, rows_pre_filter. For more information, see SVL_QUERY_SUMMARY.
SVL_S3LIST	Columns: bucket, prefix, max_file_size, avg_file_size. For more information, see SVL_S3LIST.
SVL_S3LOG	Columns: message. For more information, see SVL_S3LOG.
SVL_S3PARTITION	Columns: rows, bytes, label, rows_pre_filter. For more information, see SVL_S3PARTITION.
SVL_S3PARTITION_SUMMARY	Columns: total_partitions, qualified_partitions, min_assigned_partitions, max_assigned_partitions, avg_assigned_partitions. For more information, see SVL_S3PARTITION_SUMMARY.
SVL_S3QUERY	Columns: external_table_name, file_format, s3_scanned_rows, s3_scanned_bytes, s3query_returned_rows, s3query_returned_bytes. For more information, see SVL_S3QUERY.
SVL_S3QUERY_SUMMARY	Columns: external_table_name, file_format, s3_scanned_rows, s3_scanned_bytes, s3query_returned_rows, s3query_returned_bytes. For more information, see SVL_S3QUERY_SUMMARY.
SVL_S3RETRIES	Columns: file_size, location, message. For more information, see SVL_S3RETRIES.
SVL_SPECTRUM_SCAN_ERROR	Columns: location, rowid, colname, original_value, modified_value. For more information, see SVL_SPECTRUM_SCAN_ERROR.
SVL_STATEMENTTEXT	Columns: type, text. For more information, see SVL_STATEMENTTEXT.
SVL_STORED_PROC_CALL	Columns: querytxt. For more information, see SVL_STORED_PROC_CALL.
SVL_STORED_PROC_MESSAGES	Columns: querytext. For more information, see SVL_STORED_PROC_MESSAGES.
SVL_UDF_LOG	Columns: funcname. For more information, see SVL_UDF_LOG.
SVV_DISKUSAGE	Columns: name, col, tbl, blocknum, num_values, minvalue, maxvalue. For more information, see SVV_DISKUSAGE.
SVV_QUERY_STATE	Columns: rows, bytes, label. For more information, see SVV_QUERY_STATE.
SVV_TABLE_INFO	Columns: table_id, table. For more information, see SVV_TABLE_INFO.
SVV_TRANSACTIONS	Columns: relation. For more information, see SVV_TRANSACTIONS.
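When you write monitoring queries against these system tables, it can help to know in advance which columns are masked for queries that reference a multi-dialect view. The following Python sketch encodes a hand-copied subset of the table above as a lookup. `MASKED_COLUMNS`, `is_masked`, and `unmasked_columns` are hypothetical helpers, and you would add the remaining rows from the table yourself.

```python
# Subset of the secure-logging table (system table -> masked columns).
MASKED_COLUMNS = {
    "SYS_QUERY_HISTORY": {"returned_rows", "returned_bytes"},
    "SYS_EXTERNAL_QUERY_DETAIL": {
        "source_type", "total_partitions", "qualified_partitions",
        "scanned_files", "returned_rows", "returned_bytes", "file_format",
        "file_location", "external_query_text", "warning_message",
    },
    "STL_QUERY": {"querytxt"},
    "STL_QUERYTEXT": {"text"},
    "SVL_STATEMENTTEXT": {"type", "text"},
}


def is_masked(system_table: str, column: str) -> bool:
    """True if MASKED_COLUMNS lists the column as masked for that table."""
    return column.lower() in MASKED_COLUMNS.get(system_table.upper(), set())


def unmasked_columns(system_table: str, columns: list[str]) -> list[str]:
    """Keep only the columns that secure logging doesn't mask."""
    return [c for c in columns if not is_masked(system_table, c)]


print(unmasked_columns("SYS_QUERY_HISTORY",
                       ["query_id", "status", "returned_rows", "elapsed_time"]))
# ['query_id', 'status', 'elapsed_time']
```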

Considerations and limitations

The following are considerations and limitations that apply to views created in the Data Catalog.

You can’t create a Data Catalog view that is based on another view.
A Data Catalog view can have at most 10 base tables.
The definer of the view must have full SELECT GRANTABLE permissions on the base tables.
Views can contain only Lake Formation objects and built-in functions. The following objects aren’t permitted in a view:
- System tables
- User-defined functions (UDFs)
- Amazon Redshift tables, views, materialized views, and late-binding views that aren’t in a Lake Formation-managed datashare
Views can’t contain nested Redshift Spectrum tables.
AWS Glue representations of the base objects of a view must be in the same AWS account and Region as the view.
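You can check several of these limits before you run CREATE EXTERNAL VIEW. The following Python sketch is a hypothetical pre-flight check, not an AWS tool. `BaseObject`, `check_view_plan`, and the `kind` labels are illustrative names. The sketch does three things: it enforces the 10-base-table limit, rejects base objects other than the three kinds that this topic allows (Spectrum external tables, Lake Formation-managed datashare objects, and Apache Iceberg tables), and requires every base object to be in the same account and Region as the view. It doesn't check SELECT GRANTABLE permissions or nested Spectrum tables, because those depend on catalog state.

```python
from dataclasses import dataclass

MAX_BASE_TABLES = 10
# Hypothetical labels for the base object kinds a Data Catalog view may use.
ALLOWED_KINDS = {
    "spectrum_external_table",
    "lake_formation_datashare_object",
    "iceberg_table",
}


@dataclass(frozen=True)
class BaseObject:
    name: str
    kind: str  # e.g. "iceberg_table", or disallowed ones like "view", "udf"
    account_id: str
    region: str


def check_view_plan(view_account: str, view_region: str,
                    bases: list[BaseObject]) -> list[str]:
    """Return rule violations; an empty list means none were found."""
    problems = []
    if len(bases) > MAX_BASE_TABLES:
        problems.append(f"{len(bases)} base tables; the limit is {MAX_BASE_TABLES}")
    for b in bases:
        if b.kind not in ALLOWED_KINDS:
            problems.append(f"{b.name}: {b.kind} is not allowed in a Data Catalog view")
        if (b.account_id, b.region) != (view_account, view_region):
            problems.append(f"{b.name}: must be in {view_account}/{view_region}")
    return problems


print(check_view_plan("123456789012", "us-east-1", [
    BaseObject("sales", "iceberg_table", "123456789012", "us-east-1"),
    BaseObject("orders_v", "view", "123456789012", "us-east-1"),
]))
# ['orders_v: view is not allowed in a Data Catalog view']
```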
