

 Amazon Redshift will no longer support the creation of new Python UDFs starting Patch 198. Existing Python UDFs will continue to function until June 30, 2026. For more information, see the [ blog post ](https://aws.amazon.com/blogs/big-data/amazon-redshift-python-user-defined-functions-will-reach-end-of-support-after-june-30-2026/). 

# Run commands to define and use a database in your data warehouse
<a name="database-tasks"></a>

Both Redshift Serverless data warehouses and Amazon Redshift provisioned data warehouses contain databases. After you have launched your data warehouse, you can manage most database actions using SQL commands. With few exceptions, the functionality and syntax of SQL are the same for all Amazon Redshift databases. For details of SQL commands available with Amazon Redshift, see [SQL commands](https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_commands.html) in the *Amazon Redshift Database Developer Guide*.

When you create your data warehouse, in most scenarios, Amazon Redshift also creates the default `dev` database. After you connect to the `dev` database, you can create another database. 

The following sections walk through common database tasks when working with Amazon Redshift databases. The tasks begin with creating a database. If you continue through to the last task, you can delete all the resources that you created by dropping the database.

The examples in this section assume the following:
+ You have created an Amazon Redshift data warehouse.
+ You have established a connection to the data warehouse from your SQL client tool, such as the Amazon Redshift query editor v2. For more information about query editor v2, see [Querying a database using the Amazon Redshift query editor v2](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-v2.html) in the *Amazon Redshift Management Guide*.

**Topics**
+ [Connecting to Amazon Redshift data warehouses](#connection)
+ [Create a database](t_creating_database.md)
+ [Create a user](t_adding_redshift_user_cmd.md)
+ [Create a schema](t_creating_schema.md)
+ [Create a table](t_creating_table.md)
+ [Load data](cm-dev-t-load-sample-data.md)
+ [Query the system tables and views](t_querying_redshift_system_tables.md)
+ [Cancel a query](cancel_query.md)

## Connecting to Amazon Redshift data warehouses
<a name="connection"></a>

To connect to Amazon Redshift clusters, from the Amazon Redshift console **Clusters** page, expand **Connect to Amazon Redshift clusters** and do one of the following:
+ Choose **Query data** to use the query editor v2 to run queries on databases hosted by your Amazon Redshift cluster. After creating your cluster, you can immediately run queries by using the query editor v2.

  For more information, see [Querying a database using the Amazon Redshift query editor v2](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-v2.html) in the *Amazon Redshift Management Guide*.
+ In **Work with your client tools**, choose your cluster and connect to Amazon Redshift from your client tools using JDBC or ODBC drivers by copying the JDBC or ODBC driver URL. Use this URL from your client computer or instance. Code your applications to use JDBC or ODBC data access API operations, or use SQL client tools that support either JDBC or ODBC.

  For more information on how to find your cluster connection string, see [Finding your cluster connection string](https://docs.aws.amazon.com/redshift/latest/mgmt/configuring-connections.html#connecting-drivers.html).
+ If your SQL client tool requires a driver, you can **Choose your JDBC or ODBC driver** to download an operating system-specific driver to connect to Amazon Redshift from your client tools.

  For more information on how to install the appropriate driver for your SQL client, see [Configuring a JDBC driver version 2.2 connection](https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-install.html).

  For more information on how to configure an ODBC connection, see [Configuring an ODBC connection](https://docs.aws.amazon.com/redshift/latest/mgmt/configure-odbc-connection.html).

To connect to a Redshift Serverless data warehouse, from the Amazon Redshift console **Serverless dashboard** page, do one of the following:
+ Use the Amazon Redshift query editor v2 to run queries on databases hosted by your Redshift Serverless data warehouse. After creating your data warehouse, you can immediately run queries by using the query editor v2.

  For more information, see [Querying a database using the Amazon Redshift query editor v2](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-v2.html).
+ Connect to Amazon Redshift from your client tools using JDBC or ODBC drivers by copying the JDBC or ODBC driver URL.

  To work with data in your data warehouse, you need JDBC or ODBC drivers for connectivity from your client computer or instance. Code your applications to use JDBC or ODBC data access API operations, or use SQL client tools that support either JDBC or ODBC.

  For more information on how to find your connection string, see [Connecting to Redshift Serverless](https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-connecting) in the *Amazon Redshift Management Guide*.
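
For reference, a JDBC URL typically follows the general forms shown below. The endpoint, Region, account ID, and workgroup names here are placeholders; copy the actual URL from the console rather than constructing it by hand.

```
# Provisioned cluster
jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev

# Redshift Serverless workgroup
jdbc:redshift://default-workgroup.123456789012.us-west-2.redshift-serverless.amazonaws.com:5439/dev
```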

# Create a database
<a name="t_creating_database"></a>

After you verify that your data warehouse is up and running, you can create a database. This database is where you actually create tables, load data, and run queries. A data warehouse can host multiple databases. For example, you can have a database for sales data named `SALESDB` and a database for orders data named `ORDERSDB` in the same data warehouse.

To create a database named **SALESDB**, run the following command in your SQL client tool.

```
CREATE DATABASE salesdb;
```

**Note**  
After running the command, refresh your SQL client tool's list of objects in your data warehouse to see the new `salesdb` database.

For this exercise, accept the defaults. For information about more command options, see [CREATE DATABASE](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_DATABASE) in the *Amazon Redshift Database Developer Guide*. To delete a database and its contents, see [DROP DATABASE](https://docs.aws.amazon.com/redshift/latest/dg/r_DROP_DATABASE) in the *Amazon Redshift Database Developer Guide*. 

After you have created the SALESDB database, you can connect to the new database from your SQL client. Use the same connection parameters as you used for your current connection, but change the database name to `SALESDB`.

# Create a user
<a name="t_adding_redshift_user_cmd"></a>

By default, only the admin user that you created when you launched the data warehouse has access to the default database in the data warehouse. To grant other users access, create one or more accounts. Database user accounts are global across all the databases in a data warehouse, and not per individual database.

Use the CREATE USER command to create a new user. When you create a new user, you specify the user name and a password. The password must have 8–64 characters, and it must include at least one uppercase letter, one lowercase letter, and one numeral.

For example, to create a user named **GUEST** with password **ABCd4321**, run the following command.

```
CREATE USER GUEST PASSWORD 'ABCd4321';
```

To connect to the `SALESDB` database as the `GUEST` user, use the password that you specified when you created the user, in this case `ABCd4321`.

For information about other command options, see [CREATE USER](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_USER.html) in the *Amazon Redshift Database Developer Guide*.
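
For example, CREATE USER accepts additional options. The following sketch (the user name and values are illustrative) creates a user who can create databases and is limited to 10 concurrent connections:

```
CREATE USER report_user PASSWORD 'ABCd4321' CREATEDB CONNECTION LIMIT 10;
```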

# Create a schema
<a name="t_creating_schema"></a>

After you create a new database, you can create a new schema in the current database. A *schema* is a namespace that contains named database objects such as tables, views, and user-defined functions (UDFs). A database can contain one or more schemas, and each schema belongs to only one database. Different schemas in the same database can contain objects with the same name.

You can create multiple schemas in the same database to organize data the way that you want or to group your data functionally. For example, you can create a schema to store all your staging data and another schema to store all the reporting tables. You can also create different schemas to store data relevant to different business groups that are in the same database. Each schema can store different database objects, such as tables, views, and UDFs. In addition, you can create a schema with the AUTHORIZATION clause, which gives ownership of the schema to a specified user, and with the QUOTA clause, which sets the maximum amount of disk space that the schema can use.

Amazon Redshift automatically creates a schema called `public` for every new database. When you don't specify the schema name while creating database objects, the objects go into the `public` schema.

To access an object in a schema, qualify the object by using the `schema_name.table_name` notation. The qualified name consists of the schema name and the table name, separated by a dot. For example, you might have a `sales` schema that has a `price` table and an `inventory` schema that also has a `price` table. When you refer to the `price` table, you must qualify it as `sales.price` or `inventory.price`.
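
For example, assuming both schemas and their `price` tables exist, the following queries reference each table unambiguously.

```
SELECT * FROM sales.price;

SELECT * FROM inventory.price;
```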

The following example creates a schema named **SALES** for the user `GUEST`.

```
CREATE SCHEMA SALES AUTHORIZATION GUEST;
```

For information about more command options, see [CREATE SCHEMA](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_SCHEMA.html) in the *Amazon Redshift Database Developer Guide*.

To view the list of schemas in your database, run the following command.

```
select * from pg_namespace;
```

The output should look similar to the following.

```
  nspname             | nspowner |         nspacl
----------------------+----------+--------------------------
  sales               |  100     |
  pg_toast            |   1      |
  pg_internal         |   1      |
  catalog_history     |   1      |
  pg_temp_1           |   1      | 
  pg_catalog          |   1      | {rdsdb=UC/rdsdb,=U/rdsdb}
  public              |   1      | {rdsdb=UC/rdsdb,=U/rdsdb}
  information_schema  |   1      | {rdsdb=UC/rdsdb,=U/rdsdb}
```

For more information on how to query catalog tables, see [Querying the catalog tables](https://docs.aws.amazon.com/redshift/latest/dg/c_join_PG.html) in the *Amazon Redshift Database Developer Guide.*

Use the GRANT statement to give permissions to users for the schemas.

The following example grants the `GUEST` user the privilege to select data from all tables or views in the `SALES` schema using a SELECT statement.

```
GRANT SELECT ON ALL TABLES IN SCHEMA SALES TO GUEST;
```
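
Note that to reference objects in a schema that they don't own, users also need the USAGE privilege on the schema itself. For example:

```
GRANT USAGE ON SCHEMA SALES TO GUEST;
```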

The following example grants all available schema privileges (CREATE and USAGE) on the `SALES` schema to the `GUEST` user at one time.

```
GRANT ALL ON SCHEMA SALES TO GUEST;
```

# Create a table
<a name="t_creating_table"></a>

After you create your new database, create tables to hold your data. Specify the column information when you create the table.

For example, to create a table named **DEMO**, run the following command.

```
CREATE TABLE Demo (
  PersonID int,
  City varchar (255)
);
```

By default, new database objects, such as tables, are created in the default schema named `public` created during data warehouse creation. You can use another schema to create database objects. For more information about schemas, see [Managing database security](https://docs.aws.amazon.com/redshift/latest/dg/r_Database_objects.html) in the *Amazon Redshift Database Developer Guide*.

You can also create a table using the `schema_name.object_name` notation to create the table in the `SALES` schema.

```
CREATE TABLE SALES.DEMO (
  PersonID int,
  City varchar (255)
);
```

To view and inspect schemas and their tables, you can use the Amazon Redshift query editor v2. You can also see the list of tables in schemas by using system views. For more information, see [Query the system tables and views](t_querying_redshift_system_tables.md).

Table properties such as the column compression encoding, distribution key (`DISTKEY`), and sort key (`SORTKEY`) control how Amazon Redshift stores and processes data in parallel. For more information about designing tables that incorporate these elements, see [Amazon Redshift best practices for designing tables](https://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html).
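
For example, the following sketch (the table and column names are illustrative) sets a distribution key, a sort key, and a column encoding at table creation time:

```
CREATE TABLE sales_demo (
  saleid    INTEGER,
  sellerid  INTEGER DISTKEY,
  city      VARCHAR(255) ENCODE lzo,
  saletime  TIMESTAMP SORTKEY
);
```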

## Insert data rows into a table
<a name="t_inserting_data_into_table"></a>

After you create a table, insert rows of data into that table.

**Note**  
The [INSERT](https://docs.aws.amazon.com/redshift/latest/dg/r_INSERT_30.html) command inserts rows into a table. For standard bulk loads, use the [COPY](https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) command. For more information, see [Use a COPY command to load data](https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-use-copy.html).

For example, to insert values into the `DEMO` table, run the following command.

```
INSERT INTO DEMO VALUES (781, 'San Jose'), (990, 'Palo Alto');
```

To insert data into a table that's in a specific schema, run the following command.

```
INSERT INTO SALES.DEMO VALUES (781, 'San Jose'), (990, 'Palo Alto');
```
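
You can also name the target columns explicitly, which makes the statement independent of the table's column order:

```
INSERT INTO SALES.DEMO (PersonID, City) VALUES (781, 'San Jose');
```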

## Select data from a table
<a name="t_selecting_data"></a>

After you create a table and populate it with data, use a SELECT statement to display the data contained in the table. The SELECT * statement returns all the column names and row values for all of the data in a table. Using SELECT is a good way to verify that recently added data was correctly inserted into the table.

To view the data that you entered in the **DEMO** table, run the following command.

```
SELECT * from DEMO;
```

The result should look like the following.

```
 personid |   city    
----------+-----------
      781 | San Jose
      990 | Palo Alto
(2 rows)
```
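
To return a subset of rows, add a WHERE clause. For example, the following query returns only the row for `PersonID` 781.

```
SELECT personid, city
FROM demo
WHERE personid = 781;
```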

For more information about using the SELECT statement to query tables, see [SELECT](https://docs.aws.amazon.com/redshift/latest/dg/r_SELECT_synopsis.html).

# Load data
<a name="cm-dev-t-load-sample-data"></a>

Many of the examples in this guide use the TICKIT sample dataset. You can download the file [tickitdb.zip](samples/tickitdb.zip), which contains individual sample data files. You can then upload the sample data to your own Amazon S3 bucket.

To load the sample data for your database, first create the tables. Then use the COPY command to load the tables with sample data that is stored in an Amazon S3 bucket. For steps to create tables and load sample data, see [Step 4: Load data from Amazon S3 to Amazon Redshift](new-user.md#rs-gsg-create-sample-db).
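
A COPY command for one of the TICKIT tables looks similar to the following sketch. The bucket path and IAM role ARN are placeholders; replace them with your own values.

```
COPY users
FROM 's3://amzn-s3-demo-bucket/tickit/allusers_pipe.txt'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
DELIMITER '|'
REGION 'us-west-2';
```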

# Query the system tables and views
<a name="t_querying_redshift_system_tables"></a>

In addition to the tables that you create, your data warehouse contains a number of system tables and views. These tables and views contain information about your installation and the various queries and processes that are running on the system. You can query these system tables and views to collect information about your database. For more information, see [System tables and views reference](https://docs.aws.amazon.com/redshift/latest/dg/cm_chap_system-tables.html) in the *Amazon Redshift Database Developer Guide*. The description for each table or view indicates whether a table is visible to all users or only to superusers. Log in as a superuser to query tables that are visible only to superusers. 

## View a list of table names
<a name="t_querying_redshift_system_tables-view-a-list-of-table-names"></a>

To view a list of all tables in a schema, you can query the PG_TABLE_DEF system catalog table. You can first examine the setting for `search_path`.

```
SHOW search_path;
```

The result should look similar to the following.

```
  search_path
---------------
 $user, public
```

The following example adds the `SALES` schema to the search path and shows all the tables in the `SALES` schema.

```
set search_path to '$user', 'public', 'sales';
                
SHOW search_path;

      search_path       
------------------------
 "$user", public, sales


select * from pg_table_def where schemaname = 'sales';

 schemaname | tablename |  column  |          type          | encoding | distkey | sortkey | notnull 
------------+-----------+----------+------------------------+----------+---------+---------+---------
 sales      | demo      | personid | integer                | az64     | f       |       0 | f
 sales      | demo      | city     | character varying(255) | lzo      | f       |       0 | f
```

The following example shows a list of all tables called `DEMO` in all schemas on the current database.

```
set search_path to '$user', 'public', 'sales';
select * from pg_table_def where tablename = 'demo';

 schemaname | tablename |  column  |          type          | encoding | distkey | sortkey | notnull 
------------+-----------+----------+------------------------+----------+---------+---------+---------
 public     | demo      | personid | integer                | az64     | f       |       0 | f
 public     | demo      | city     | character varying(255) | lzo      | f       |       0 | f
 sales      | demo      | personid | integer                | az64     | f       |       0 | f
 sales      | demo      | city     | character varying(255) | lzo      | f       |       0 | f
```

For more information, see [PG_TABLE_DEF](https://docs.aws.amazon.com/redshift/latest/dg/r_PG_TABLE_DEF.html).

You can also use the Amazon Redshift query editor v2 to view all the tables in a specified schema by first choosing a database that you want to connect to.
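
As an alternative, you can list the tables in a schema from the standard information schema, which doesn't depend on the `search_path` setting:

```
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema = 'sales';
```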

## View users
<a name="t_querying_redshift_system_tables-view-database-users"></a>

You can query the PG_USER catalog to view a list of all users, along with the user ID (USESYSID) and user privileges. 

```
SELECT * FROM pg_user;

  usename   | usesysid | usecreatedb | usesuper | usecatupd |  passwd  | valuntil | useconfig
------------+----------+-------------+----------+-----------+----------+----------+-----------
 rdsdb      |        1 | true        | true     | true      | ******** | infinity |
 awsuser    |      100 | true        | true     | false     | ******** |          |
 guest      |      104 | true        | false    | false     | ******** |          |
```

The user name `rdsdb` is used internally by Amazon Redshift to perform routine administrative and maintenance tasks. You can filter your query to show only user-defined user names by adding `where usesysid > 1` to your SELECT statement.

```
SELECT * FROM pg_user WHERE usesysid > 1;

  usename   | usesysid | usecreatedb | usesuper | usecatupd |  passwd  | valuntil | useconfig
------------+----------+-------------+----------+-----------+----------+----------+-----------
 awsuser    |      100 | true        | true     | false     | ******** |          |
 guest      |      104 | true        | false    | false     | ******** |          |
```

## View recent queries
<a name="t_querying_redshift_system_tables-view-recent-queries"></a>

In the previous example, the user ID (user_id) for `awsuser` is 100. To list the four most recent queries run by `awsuser`, you can query the SYS_QUERY_HISTORY view. 

You can use this view to find the query ID (query_id) or session ID (session_id) for a recently run query. You can also use this view to check how long it took a query to complete. SYS_QUERY_HISTORY includes the first 4,000 characters of the query string (query_text) to help you locate a specific query. Use the LIMIT clause with your SELECT statement to limit the results. 

```
SELECT query_id, session_id, elapsed_time, query_text 
FROM sys_query_history
WHERE user_id = 100
ORDER BY start_time desc
LIMIT 4;
```

The result looks something like the following. 

```
 query_id |  session_id  |  elapsed_time |   query_text
----------+--------------+---------------+----------------------------------------------------------------
 892      |    21046     |       55868   | SELECT query, pid, elapsed, substring from ...
 620      |    17635     |     1296265   | SELECT query, pid, elapsed, substring from ...
 610      |    17607     |       82555   | SELECT * from DEMO; 
 596      |    16762     |      226372   | INSERT INTO DEMO VALUES (100);
```
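
You can also filter SYS_QUERY_HISTORY by the query text itself. For example, to find recent queries that reference the `DEMO` table:

```
SELECT query_id, session_id, start_time, elapsed_time
FROM sys_query_history
WHERE query_text ILIKE '%demo%'
ORDER BY start_time DESC
LIMIT 10;
```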

## Determine the session ID of a running query
<a name="determine_pid"></a>

To retrieve system table information about a query, you might need to specify the session ID (process ID) associated with that query. Or, you might need to find the session ID for a query that is still running. For example, you need the session ID to cancel a query that is taking too long to run on a provisioned cluster. You can query the SYS_QUERY_HISTORY system view to obtain a list of session IDs for running queries, along with the corresponding query string. If your query returns multiple sessions, you can look at the query text to determine which session ID you need.

To determine the session ID of a running query, run the following SELECT statement.

```
SELECT session_id, user_id, start_time, query_text
FROM sys_query_history
WHERE status='running';
```

# Cancel a query
<a name="cancel_query"></a>

If you run a query that is taking too long or is consuming excessive resources, cancel the query. For example, create a list of ticket sellers that includes the seller's name and quantity of tickets sold. The following query selects data from the `SALES` table and `USERS` table and joins the two tables by matching SELLERID and USERID in the WHERE clause.

```
SELECT sellerid, firstname, lastname, sum(qtysold)
FROM sales, users
WHERE sales.sellerid = users.userid
GROUP BY sellerid, firstname, lastname
ORDER BY 4 desc;
```

The result looks something like the following.

```
 sellerid | firstname | lastname | sum
----------+-----------+----------+------
  48950   |   Nayda   |   Hood   | 184
  19123   |   Scott   | Simmons  | 164
  20029   |    Drew   | Mcguire  | 164
  36791   |  Emerson  | Delacruz | 160
  13567   |   Imani   |   Adams  | 156
  9697    |  Dorian   |    Ray   | 156
  41579   | Harrison  | Durham   | 156
  15591   |  Phyllis  |  Clay    | 152
  3008    |  Lucas    | Stanley  | 148
  44956   |  Rachel   |Villarreal| 148
```

**Note**  
This is a complex query. For this tutorial, you don't need to worry about how this query is constructed.

The previous query runs in seconds and returns 2,102 rows.

Suppose that you forget to put in the WHERE clause.

```
SELECT sellerid, firstname, lastname, sum(qtysold)
FROM sales, users
GROUP BY sellerid, firstname, lastname
ORDER BY 4 desc;
```

The result set includes all of the rows in the `SALES` table multiplied by all the rows in the `USERS` table (49,989 × 3,766). This is called a Cartesian join, and it isn't recommended. The result is over 188 million rows and takes a long time to run.

To cancel a running query, use the CANCEL command with the query's session ID. With the Amazon Redshift query editor v2, you can cancel a query by choosing the **Cancel** button while the query is running.

To find the session ID, start a new session and query the SYS_QUERY_HISTORY view, as shown in the previous step.

To determine the session ID of a running query, run the following SELECT statement.

```
SELECT user_id, session_id, start_time, query_text
FROM sys_query_history
WHERE status='running';
```

The result looks something like the following.

```
 user_id |   session_id  |   start_time               |   query_text
---------+---------------+----------------------------+----------------------------------------------------------------
 100     |    1073791534 | 2024-03-19 22:26:21.205739 | SELECT user_id, session_id, start_time, query_text FROM  ...
```

To cancel the query with session ID `1073791534`, run the following command.

```
CANCEL 1073791534;
```

**Note**  
The CANCEL command doesn't stop a transaction. To stop or roll back a transaction, use the ABORT or ROLLBACK command. To cancel a query associated with a transaction, first cancel the query, then stop the transaction.

If the query that you canceled is associated with a transaction, use the ABORT or ROLLBACK command to cancel the transaction and discard any changes made to the data:

```
ABORT;
```

Unless you are signed on as a superuser, you can cancel only your own queries. A superuser can cancel all queries.

If your query tool doesn't support running queries concurrently, start another session to cancel the query.

For more information about canceling a query, see [CANCEL](https://docs.aws.amazon.com/redshift/latest/dg/r_CANCEL.html) in the *Amazon Redshift Database Developer Guide*.
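
If a query doesn't respond to the CANCEL command, you can terminate the session that is running it instead, which also ends that session's connection and rolls back any open transaction. For example, using the session ID (process ID) from the earlier output:

```
SELECT PG_TERMINATE_BACKEND(1073791534);
```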

## Cancel a query using the superuser queue
<a name="cancel_query-cancel-a-query-using-the-superuser-queue"></a>

If your current session has too many queries running concurrently, you might not be able to run the CANCEL command until another query finishes. In that case, run the CANCEL command using a different workload management query queue.

By using workload management, you can run queries in different query queues so that you don't need to wait for another query to complete. The workload manager creates a separate queue, called the superuser queue, that you can use for troubleshooting. To use the superuser queue, log on as a superuser and set the query group to 'superuser' using the SET command. After running your commands, reset the query group using the RESET command.

To cancel a query using the superuser queue, run these commands.

```
SET query_group TO 'superuser';
CANCEL 1073791534;
RESET query_group;
```